Hacker News
gunalx
3 months ago
on:
DeepSeekMath-V2: Towards Self-Verifiable Mathemati...
Well, it's problematic because they are using their own verifier as a panel of experts, with their own model trained specifically to satisfy this verifier. For the benchmark runs, they don't mention using human experts to cross-validate their scores.
cubefox
3 months ago
I assume they use self-verification only during RL training to provide the reward signal, but not for benchmarks.