Understanding Context Window & Context Engineering
A deeper look at what the context window is and how to manage it efficiently.
Prompt vs Context Engineering
Prompt Engineering
This is about writing better-worded instructions so that LLMs produce the outcomes we want. Better-structured, more guided prompts help the LLM process our input and get us closer to what we actually want.
Context Window
Think of this window as a bag. The bag is only so big and can fit only a limited number of items. Whatever is in the bag is what your LLM has access to in your chat/session. It contains external sources/files, the system prompt, the chat history, and the current question.
Context Engineering
This is the art of managing what we put into the bag (the context window). We limit the junk going in to improve the quality of the end result: the better the quality of what is in the bag, the better the LLM can perform. This becomes extremely important when working with agents. Because agents loop on their own output, you want that output to be high quality, so that the next iteration has everything it needs to complete the objective.
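One way to picture "managing what goes into the bag" is a small packing routine. The sketch below is only an illustration: `count_tokens` and `build_context` are hypothetical helpers, and counting whitespace-separated words is a crude stand-in for a real tokenizer. It always keeps the system prompt and the current question, then adds chat history from newest to oldest until the budget runs out.

```python
def count_tokens(text: str) -> int:
    # Crude assumption: one whitespace-separated word = one token.
    # Real tokenizers split text differently.
    return len(text.split())


def build_context(system_prompt: str, history: list[str],
                  question: str, budget: int) -> list[str]:
    """Keep the system prompt and question, then add the newest history that fits."""
    used = count_tokens(system_prompt) + count_tokens(question)
    if used > budget:
        raise ValueError("system prompt and question alone exceed the budget")

    kept: list[str] = []
    # Walk history from newest to oldest so recent turns are kept first.
    for message in reversed(history):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost

    kept.reverse()  # restore chronological order
    return [system_prompt, *kept, question]


context = build_context(
    system_prompt="You are helpful",
    history=["old message one two three four", "recent reply here"],
    question="What is attention",
    budget=10,
)
print(context)
```

With a budget of 10, the older, longer message is left out of the bag while the recent one fits. Dropping the oldest turns first is only one possible policy; summarising them is another.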
Attention & Attention Scarcity
For every word (token) in the input, an LLM pairs that token with every other token, including itself. Scoring these pairs is what is referred to as Attention.
During inference, the LLM uses the weights it learned during training to score how relevant each token is to every other token. The pairs exist to produce these relevance scores, which the transformer layers then use to build meaning.
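To make "relevance scores" a little more concrete, here is a minimal sketch of scaled dot-product scoring followed by a softmax. The 2-dimensional vectors are invented for illustration, and the same vector is used for both sides of each pair. In a real transformer, separate query and key vectors are produced by learned weight matrices, so treat the numbers as placeholders only.

```python
import math

# Toy 2-dimensional vectors, invented for illustration only.
# In a real model these come from learned weights.
vectors = {
    "the": [0.1, 0.0],
    "cat": [1.0, 0.5],
    "sat": [0.4, 1.0],
}


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(vectors):
    """Score every (token, token) pair, then normalise each row to sum to 1."""
    tokens = list(vectors)
    dim = len(next(iter(vectors.values())))
    weights = {}
    for query in tokens:
        raw = [
            sum(a * b for a, b in zip(vectors[query], vectors[key])) / math.sqrt(dim)
            for key in tokens
        ]
        weights[query] = dict(zip(tokens, softmax(raw)))
    return weights


weights = attention_weights(vectors)
for token, row in weights.items():
    print(token, {k: round(v, 3) for k, v in row.items()})
```

Each row is one token's share of attention spread across all tokens. With these made-up vectors, "cat" puts more weight on itself and on "sat" than on "the".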
E.g. Sentence: “The cat sat on mat”
<u>Pairs: </u>
| with the | with cat | with sat | with on | with mat |
|---|---|---|---|---|
| the, the | the, cat | the, sat | the, on | the, mat |
| cat, the | cat, cat | cat, sat | cat, on | cat, mat |
| sat, the | sat, cat | sat, sat | sat, on | sat, mat |
| on, the | on, cat | on, sat | on, on | on, mat |
| mat, the | mat, cat | mat, sat | mat, on | mat, mat |
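The table above can be reproduced in a few lines. This is a minimal sketch that assumes whitespace-separated words stand in for tokens. `token_pairs` is a hypothetical helper name, not part of any library.

```python
from itertools import product


def token_pairs(sentence: str) -> list[tuple[str, str]]:
    """Pair every token with every token, including itself."""
    tokens = sentence.lower().split()  # assumption: one word = one token
    return list(product(tokens, repeat=2))


pairs = token_pairs("The cat sat on mat")

# Print one row per token, matching the layout of the table.
row_length = 5
for start in range(0, len(pairs), row_length):
    print(pairs[start:start + row_length])

print(len(pairs), "pairs in total")
```

Five tokens give 5 × 5 = 25 pairs, one for each cell of the table.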
Attention Scarcity is the result of there being only so much compute budget to go around. The number of pairs grows with the square of the number of tokens: 5 tokens give 25 pairs, as in the table above, and a long context gives millions. If most of those pairs are junk, compute budget is wasted on understanding junk. The LLM should rather spend that budget on the pairs that matter.
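A quick back-of-the-envelope calculation shows how fast the pair count grows. The sketch assumes every token is paired with every token, exactly as in the table. `pair_count` is a hypothetical helper, and the context sizes are arbitrary examples.

```python
def pair_count(n_tokens: int) -> int:
    # Every token is paired with every token (including itself): n * n.
    return n_tokens * n_tokens


for n in (5, 1_000, 100_000):
    print(f"{n:>7,} tokens -> {pair_count(n):>14,} pairs")
```

Doubling the amount of text in the bag roughly quadruples the number of pairs, which is why trimming junk from the context pays off so quickly.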