Hacker News

Doing an eval on itself is clever but confusing for the reader. How about a tutorial explaining how to do an eval on something more normal?


I'd be happy to. One thing that's tough is knowing what will resonate with the audience without being too simple or too complex.

What do you think would resonate with you or with the audience you're thinking about?

That repo also has an illustrative eval for an Agent Skill in Airflow for localization:

https://github.com/Alexhans/eval-ception/tree/main/exams/air...


How about taking a small, real open source project that has an AGENTS.md and showing how to add evals and optimize it?

The question I have is: what are we optimizing for and how do we measure it?

In your own repos, I see you have a fork of safepass, which seems like a nice simple project, but it doesn't have an agents file yet.
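To make "what are we optimizing for and how do we measure it" concrete, here is a minimal, hypothetical eval loop. Everything in it (`run_agent`, the exact-match `score`, and the test cases) is a made-up stand-in; in a real eval, `run_agent` would call the actual model or agent, and the metric would fit the task:

```python
def run_agent(prompt: str) -> str:
    # Stand-in agent: a real eval would call the model/agent here.
    return "42" if "6 * 7" in prompt else "unknown"

def score(expected: str, actual: str) -> float:
    # Simplest possible metric: exact match, 1.0 or 0.0.
    return 1.0 if expected.strip() == actual.strip() else 0.0

def run_eval(cases):
    # Run every case through the agent and average the scores.
    results = [score(expected, run_agent(prompt)) for prompt, expected in cases]
    return sum(results) / len(results)

cases = [
    ("What is 6 * 7?", "42"),
    ("What is the capital of France?", "Paris"),
]

if __name__ == "__main__":
    print(f"pass rate: {run_eval(cases):.2f}")
```

The point of even a toy harness like this is that "optimize the AGENTS.md" becomes "change the file, re-run the eval, compare pass rates" instead of eyeballing transcripts.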



