How can that be? Let's set aside quality, hallucinations, etc. The largest context window on an accessible/affordable LLM is 32k tokens (Mixtral or GPT-4). That's barely enough for a TODO app, let alone a real project. The smallest project I work on, a desktop app, is 60k LOC / 6M characters / 1.5M tokens.
So what changes are coming that would allow an LLM to modify an existing codebase, e.g. to change a feature and write its tests (without us having to spoon-feed it the perfect context, the way we do now in ChatGPT)?
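For scale, here is a rough back-of-the-envelope sketch of the numbers above. The 4-characters-per-token ratio is an assumed rule of thumb (it is what turns 6M characters into 1.5M tokens), and the function names are made up for illustration; a real tokenizer would give somewhat different counts.

```python
# Back-of-the-envelope estimate only; not a real tokenizer.
CHARS_PER_TOKEN = 4  # assumption: common rule of thumb, varies by language and tokenizer


def estimate_tokens(num_chars: int, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Approximate token count from a character count."""
    return round(num_chars / chars_per_token)


def windows_needed(num_tokens: int, context_window: int) -> int:
    """How many full context windows it would take to hold num_tokens (ceiling division)."""
    return -(-num_tokens // context_window)


project_chars = 6_000_000   # the 60k LOC desktop app from the post
context_window = 32_000     # "32k"; some models use 32,768, the conclusion is the same

tokens = estimate_tokens(project_chars)
print(tokens)                                   # 1500000
print(windows_needed(tokens, context_window))   # 47
print(f"{context_window / tokens:.1%}")         # 2.1%
```

In other words, under these assumptions a single 32k window sees about 2% of the smallest project at a time, and that is before leaving any room for instructions or the model's output.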