It's fairly surprising to me how naive/early the techniques we use here still are.
Anthropic's post on the Claude Agent SDK (formerly Claude Code SDK) talks about how the agent "gathers context", and is fairly accurate as to how people do it today.
1. Agentic Search (give the agent tools and let it run its own search trajectory): specifically, the industry seems to have made really strong advances in giving agents POSIX filesystems and UNIX utilities (grep/sed/awk/jq/head, etc.) for navigating data; see the sketch after this list. MCP for data retrieval also falls into this category, since the agent can choose to invoke tools that hit MCP servers for the required data. But because coding agents know filesystems really well, this approach seems to be outperforming everything else today ("bash is all you need").
2. Semantic Search (essentially chunking + embedding, a la RAG in 2022/2023): I've definitely noticed a growing trend amongst leading AI companies to move away from this. Especially if your data is easily represented as a filesystem, (1) seems to be the winning approach.
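To make (1) concrete, here's a minimal sketch of what that tool surface can look like: a single shell tool the agent calls in a loop, with the output fed back in as context. The names (run_bash, TOOL_SPEC) and the JSON-schema-style spec are illustrative, not tied to any particular SDK.

    # Sketch of the "bash is all you need" approach: expose one shell tool
    # and let the agent drive its own search trajectory over a filesystem.
    # Names here (run_bash, TOOL_SPEC) are illustrative, not from any SDK.
    import subprocess

    TOOL_SPEC = {
        "name": "run_bash",
        "description": "Run a read-only shell command (grep/sed/awk/jq/head) "
                       "against the project filesystem and return its output.",
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    }

    def run_bash(command: str, timeout: int = 30) -> str:
        """Execute the agent's command and hand back (truncated) output
        as context for its next step."""
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
        output = result.stdout or result.stderr
        return output[:10_000]  # keep the context window in check

The appeal is that the model already knows these utilities cold, so you get decent retrieval behavior without building or maintaining a bespoke index.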
Interestingly, though, all of these approaches share a pretty glaring flaw: they only ever hand the agent raw, unprocessed data. There's a ton of recomputation on that raw data! An agent that has sifted through it once (say it reads v1, v2, and v_final of a design document) will have to do the same thing again in the next session.
I have a strong thesis that this will change in 2026 (Knowledge Curation, not search, is the next data problem for AI: https://www.daft.ai/blog/knowledge-curation-not-search-is-th...), and we're building towards this future as well. Related ideas that have anecdotal evidence of providing benefits but haven't really stuck in practice yet include agentic memory, processing agent trajectory logs, continuous learning, and persistent note-taking.
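As a rough sketch of what I mean by persistent note-taking (all names here are hypothetical): at the end of a session the agent distills what it learned from the raw data into a small curated store, and the next session preloads that instead of re-reading v1/v2/v_final from scratch.

    # Hypothetical sketch of persistent note-taking: distill findings once,
    # reload them next session instead of re-sifting the raw documents.
    # NOTES_PATH, load_notes, save_notes are made-up names for illustration.
    import json
    from pathlib import Path

    NOTES_PATH = Path(".agent/knowledge.json")

    def load_notes() -> dict:
        """Preload previously curated knowledge at session start."""
        if NOTES_PATH.exists():
            return json.loads(NOTES_PATH.read_text())
        return {}

    def save_notes(notes: dict, topic: str, distilled: str) -> None:
        """Persist a distilled finding so the next session skips the raw sift."""
        notes[topic] = distilled
        NOTES_PATH.parent.mkdir(parents=True, exist_ok=True)
        NOTES_PATH.write_text(json.dumps(notes, indent=2))

    # Session N: the agent reads v1, v2, v_final of the design doc once ...
    notes = load_notes()
    save_notes(notes, "design_doc", "v_final supersedes v1/v2; distilled summary of the final decisions goes here")
    # Session N+1: load_notes() returns the summary, no recomputation needed.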