Every time you send a request to a model you're already providing all of the context history along with it. To edit the context, just send a different context history. You can send whatever you want as history, it's entirely up to you and entirely arbitrary.
We only think in conversational turns because that's what we've expected a conversation to 'look like'. But that's just a very deeply ingrained convention.
Forget that there is such a thing as 'turns' in an LLM convo for now; imagine that it's all 'one-shot'.
So you ask A, it responds A1.
But when you ask B, and expect B1 - which depends on A and A1 already being in the convo history - consider that you are actually sending that again anyhow.
Behind the scenes when you think you're sending just 'B' (next prompt) you're actually sending A + A1 + B aka including the history.
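To make that concrete, here's a minimal sketch of what a 'second turn' actually looks like on the wire, assuming an OpenAI-style chat completions endpoint over plain HTTP (the URL, model name, and env var are just placeholders - any compatible host works the same way):

```python
# Minimal sketch, assuming an OpenAI-style chat completions endpoint.
# Model name, URL, and env var are placeholders, not a specific recommendation.
import os
import requests

API_URL = "https://api.openai.com/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

def ask(messages):
    """Send the full message list and return the model's reply text."""
    resp = requests.post(
        API_URL,
        headers=HEADERS,
        json={"model": "gpt-4o-mini", "messages": messages},  # placeholder model
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

history = [{"role": "user", "content": "A"}]
a1 = ask(history)                                    # model sees: A
history += [{"role": "assistant", "content": a1},
            {"role": "user", "content": "B"}]
b1 = ask(history)                                    # model sees: A + A1 + B
```

The point is that ask() has no memory at all - the only 'memory' is whatever list you choose to pass in.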
A and A1 are usually 'cached', but that's not the simplest way to think about it; the caching is just an optimization.
Without caching, the model would just process all of A + A1 + B and return B1 just the same.
And then you'd send A + A1 + B + B1 + C and expect C1 in return.
It just so happens it will cache the state of the convo at your previous turn, so it's optimized - but the key insight is that you can send whatever context you want at any time.
If, after you send A + A1 + B + B1 + C and get C1, you want to then send A + B + C + D and expect D1 ... (basically sending the prompts with no responses) - you can totally do that. It will have to re-process all of that, aka no cached state, but it will definitely do it for you.
Heck you can send Z + A + X, or A + A1 + X + Y - or whatever you want.
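For example (reusing the hypothetical ask() helper from the sketch above), you can send a history that never actually happened:

```python
# Reusing the hypothetical ask() helper from the earlier sketch.

# Send only the prompts, with no assistant responses in between.
# (Most chat APIs accept this; some are stricter about alternating roles.)
d1 = ask([{"role": "user", "content": "A"},
          {"role": "user", "content": "B"},
          {"role": "user", "content": "C"},
          {"role": "user", "content": "D"}])

# Or graft a made-up exchange in front of a new question.
x1 = ask([{"role": "user", "content": "A"},
          {"role": "assistant", "content": "A1"},  # an 'answer' you wrote yourself
          {"role": "user", "content": "X"},
          {"role": "user", "content": "Y"}])
```

Either way, the 'history' is just whatever you put in the list.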
So in that sense - what you are really doing (if you're using the simplest form of the API) is sending 'a bunch of content' and 'expecting a response'. That's it. Everything is actually 'one shot' (prefill => response). It feels conversational, but that's just structural and operational convention.
So the very simple answer to your question is: send whatever context you want. That's it.
When history is cached, conversations tend not to get slower as they grow, because the LLM can 'continue' from a previous state.
So if there was already A + A1 + B + B1 + C + C1 and you ask 'D' ... well, [A->C1] is saved as state. It costs ~10ms to prepare. Then they add 'D' as your question, and that gets processed 'all tokens at once' in bulk - which is fast.
Then - when they generate D1 (the response) they have to do it one token at a time, which is slow. Each token has to be processed separately.
Also - even if they had to redo all of [A->C1] 'from scratch' - it's not that slow, because the entire block of tokens can be processed in one pass.
'Prefill' (aka processing A->C1) is fast - which, by the way, is why it's 10x cheaper.
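Here's a back-of-envelope sketch of that speed difference, with made-up rates purely to show the shape of the math (real numbers vary a lot by model and host):

```python
# Back-of-envelope numbers, made up purely for illustration - real rates vary
# by model, host, and hardware. The point is the shape: prefill is batched,
# generation is one token at a time.
prompt_tokens = 4000   # the whole A..C1 history plus the new question D
output_tokens = 400    # the D1 response

prefill_rate = 500     # tokens/sec read in one parallel pass (hypothetical, ~10x decode)
decode_rate = 50       # tokens/sec generated one at a time (hypothetical)

uncached_prefill_s = prompt_tokens / prefill_rate   # 8.0 s to re-read everything
cached_prefill_s = 0.01                             # ~10 ms to restore the saved state
generation_s = output_tokens / decode_rate          # 8.0 s to write the answer

print(f"prefill, no cache: {uncached_prefill_s:.1f}s")
print(f"prefill, cached:   {cached_prefill_s:.2f}s")
print(f"generation:        {generation_s:.1f}s")
```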
So prefill is 10x faster than generation, and cached prefill is 10x cheaper than uncached prefill, as a very general rule of thumb.
Prefill is roughly 10x faster than generation without caching, and maybe 100x faster with caching - as a very crude measure. Those aren't a single fixed ratio; they're different scenarios. Some hosts are better than others at managing caching, but the better ones provide a decent SLA on that.
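And the pricing side of that rule of thumb, again with made-up prices just to illustrate the ratios (real per-token prices differ by host and model):

```python
# Hypothetical prices, chosen only to match the rough 10x / 10x rule of thumb above.
# Real per-token prices differ by host and model.
output_price = 10.00             # $ per 1M generated tokens (made up)
input_price = output_price / 10  # prefill ~10x cheaper than generation
cached_price = input_price / 10  # cached prefill ~10x cheaper again

history_tokens = 50_000   # everything you resend each turn
answer_tokens = 500       # the new response

no_cache_hit = history_tokens / 1e6 * input_price + answer_tokens / 1e6 * output_price
with_cache_hit = history_tokens / 1e6 * cached_price + answer_tokens / 1e6 * output_price

print(f"turn, no cache hit:   ${no_cache_hit:.4f}")    # ~$0.0550
print(f"turn, with cache hit: ${with_cache_hit:.4f}")  # ~$0.0100
```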