Wrapping up: the context window
How to manage the context-window. Getting agents to perform.
The anatomy of effective context
Aim for well-balanced prompts: specific enough to guide behavior, yet flexible enough to give the model strong heuristics.
Such a prompt avoids wasting the model’s limited attention budget on unnecessary instructions.
Tools: These let agents engage with external sources and bring data into their context window. Each tool should be narrowly defined and handle a small, specific set of tasks; the fewer, the better.
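As a rough illustration of a narrowly scoped tool, here is a minimal Python sketch. The `Tool` dataclass, the `read_file_head` tool, and the `call_tool` helper are all hypothetical names invented for this example; they are not any particular framework's API, and the JSON-Schema-style `parameters` dict is just one common way to describe arguments.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable


@dataclass(frozen=True)
class Tool:
    """A narrowly scoped capability the agent may call (hypothetical shape)."""
    name: str
    description: str
    parameters: dict          # JSON-Schema-style description of the arguments
    run: Callable[..., str]   # the function that does the work


def read_file_head(path: str, max_lines: int = 20) -> str:
    """Return at most max_lines lines from the start of a text file."""
    with open(path, encoding="utf-8") as fh:
        return "".join(islice(fh, max_lines))


# One tool, one job: peek at a file without loading all of it into context.
READ_FILE_HEAD = Tool(
    name="read_file_head",
    description="Return the first lines of a UTF-8 text file.",
    parameters={
        "type": "object",
        "properties": {
            "path": {"type": "string"},
            "max_lines": {"type": "integer", "minimum": 1},
        },
        "required": ["path"],
    },
    run=read_file_head,
)


def call_tool(tool: Tool, arguments: dict) -> str:
    """Reject unknown or missing arguments, then run the tool."""
    allowed = set(tool.parameters["properties"])
    required = set(tool.parameters.get("required", []))
    unknown = set(arguments) - allowed
    missing = required - set(arguments)
    if unknown or missing:
        raise ValueError(
            f"{tool.name}: unknown={sorted(unknown)} missing={sorted(missing)}"
        )
    return tool.run(**arguments)
```

Keeping the tool this small means its description fits in one line, and the agent has little room to misuse it.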
Context retrieval and agentic search
Agents can pull in data dynamically at runtime by maintaining lightweight identifiers for it (think SQL queries, file paths, web links, timestamps) rather than holding the data itself in context.
Agents can also use a hybrid approach: retrieve some data upfront for speed, then explore further when needed, which keeps them flexible.
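The hybrid approach can be sketched roughly as follows, using file paths as the lightweight identifiers. The `HybridContext` class and its `index` and `fetch` methods are hypothetical names for this example, and the `max_chars` truncation is an assumed way to bound how much each fetch adds to the context.

```python
from pathlib import Path


class HybridContext:
    """Preload a few files; keep only paths (identifiers) for the rest."""

    def __init__(self, preload: list[str], references: list[str]):
        # Upfront retrieval: small, always-needed files go straight into context.
        self.loaded = {p: Path(p).read_text(encoding="utf-8") for p in preload}
        # Just-in-time retrieval: store identifiers only, not contents.
        self.references = list(references)

    def index(self) -> str:
        """A cheap listing the agent can scan to decide what to fetch."""
        return "\n".join(
            f"{p} ({Path(p).stat().st_size} bytes)" for p in self.references
        )

    def fetch(self, ref: str, max_chars: int = 2000) -> str:
        """Load a referenced file on demand, truncated to bound context use."""
        if ref not in self.references:
            raise KeyError(f"unknown reference: {ref}")
        text = Path(ref).read_text(encoding="utf-8")[:max_chars]
        self.loaded[ref] = text
        return text
```

The agent sees the preloaded text plus the short `index()` listing, and pays the context cost of a referenced file only when it decides to call `fetch`.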
Context engineering for long-horizon tasks
Techniques to handle long working tasks:
- Compaction: Summarize the contents of the context window as it nears its limit, then use the summary to seed a new context window. Over many windows this can hurt, because each round of summarization risks losing important information and finer details. Good compaction preserves architectural decisions, unresolved bugs, and implementation details, while discarding redundant tool outputs and messages.
- Good for tasks that require extensive back-and-forth messaging.
- Structured note-taking (also known as agentic memory): The agent writes notes that are persisted to memory outside the context window, e.g. by writing to a .md file. After the context window resets, the agent can read its own notes and pick up where it left off.
- Good for iterative development with clear milestones.
- Sub-agent architectures: Rather than one agent attempting to maintain state across an entire project, specialized sub-agents handle focused tasks with clean context windows. The main agent works from a high-level plan while sub-agents perform deep technical work or use tools to find relevant information. Each sub-agent might use thousands of tokens but returns only a high-quality summary for the main agent to process. Think of it as separation of concerns: each sub-agent works only on a specific task or set of tasks.
- Good for complex research and analysis where parallelization pays dividends.
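The three techniques above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, `estimate_tokens` uses a crude four-characters-per-token guess, and `summarize` and `subagent` are stand-in callables where a real system would make model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Crude size estimate (assumption: roughly 4 characters per token)."""
    return len(text) // 4


def compact(messages: list[str], summarize: Callable[[list[str]], str],
            budget: int, keep_recent: int = 2) -> list[str]:
    """Compaction: fold older messages into one summary when over budget."""
    total = sum(estimate_tokens(m) for m in messages)
    if total <= budget or len(messages) <= keep_recent:
        return messages
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    # summarize() should keep decisions and open bugs and drop redundant output.
    return ["Summary of earlier work: " + summarize(older)] + recent


def append_note(path: str, note: str) -> None:
    """Structured note-taking: persist a note outside the context window."""
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(f"- {note}\n")


def load_notes(path: str) -> str:
    """Read notes back after a context reset; empty if none exist yet."""
    p = Path(path)
    return p.read_text(encoding="utf-8") if p.exists() else ""


def delegate(tasks: list[str], subagent: Callable[[str], str]) -> list[str]:
    """Sub-agents: run focused tasks in parallel, return only their summaries."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(subagent, tasks))
```

In each case the main context receives something small (a summary, a notes file, a list of sub-agent results) instead of the full working history that produced it.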