# Token Limits

This document explains the "Token limits per call type" controls in the UI and the corresponding backend settings.
These limits do not change the model’s total context window. They control the maximum output length for a single model call at a specific stage.
In practice:

- Raising a limit gives that stage more room to finish a proof, theorem statement, or draft.
- Lowering a limit makes the stage cheaper and faster, but increases the chance of truncation.

The most useful limits to tune depend on which theory pipeline you are using.
## Where These Settings Live

Backend configuration lives in:

- `eurekaclaw/config.py`

The UI config mapping lives in:

- `eurekaclaw/ui/server.py` (`_CONFIG_FIELDS`)

The UI form lives in:

- `frontend/index.html`
- `eurekaclaw/ui/static/index.html`
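As a rough illustration of what the backend side might look like, here is a hypothetical sketch of per-stage limits in `eurekaclaw/config.py`. The field names follow the `max_tokens_*` pattern mentioned later in this document, but the exact names and default values here are assumptions, not the real configuration:

```python
# Hypothetical sketch only -- the real eurekaclaw/config.py may use
# different field names, defaults, and structure.
from dataclasses import dataclass


@dataclass
class TokenLimits:
    # Maximum output tokens for a single model call at each stage.
    max_tokens_agent_loop: int = 4096  # SurveyAgent, WriterAgent, fallbacks
    max_tokens_prover: int = 8192      # lemma-level proof generation
    max_tokens_assembler: int = 8192   # final proof assembly


# Raise one knob while leaving the others at their defaults.
limits = TokenLimits(max_tokens_prover=16384)
```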
## Quick Reference

| Variable | Default | Used by |
|---|---|---|
| … | … | Generic agent loops (SurveyAgent, WriterAgent, fallback paths) |
| … | … | Prover: theorem-proof generation at the lemma level |
| … | … | Architect: ProofArchitect in the `default` pipeline |
| … | … | Analyst: MemoryGuidedAnalyzer, TemplateSelector, ProofSkeletonBuilder |
| … | … | Decomposer: decomposition-style stages, including KeyLemmaExtractor |
| … | … | Assembler: produces `state.assembled_proof` |
| … | … | TheoremCrystallizer: produces `state.formal_statement` |
| … | … | Verifier: lemma peer review and consistency checking |
| … | … | Formalizer / Refiner: Formalizer, Refiner, CounterexampleSearcher, ResourceAnalyst, PaperReader |
| … | … | Context compression summaries (fast model) |
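For illustration, limits like the ones above could be overridden per run through environment variables. The helper below is a hypothetical sketch of that pattern, not part of eurekaclaw:

```python
import os


def limit(name: str, default: int) -> int:
    # Hypothetical helper: read a token limit from an environment
    # variable (e.g. MAX_TOKENS_PROVER) if set, else use the default.
    return int(os.environ.get(name.upper(), default))


os.environ["MAX_TOKENS_PROVER"] = "16384"
print(limit("max_tokens_prover", 8192))  # -> 16384
```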
## Which Pipeline Uses What

There are two theory pipelines:

- `default`
- `memory_guided`

### `default` Pipeline

The stages that matter most for `default` are:

- Architect
- Prover
- Assembler
- TheoremCrystallizer
- Verifier

Relevant code:

If the `default` pipeline is struggling, the most useful knobs are usually:

- Architect
- Prover
- Assembler
- TheoremCrystallizer
- Verifier

### `memory_guided` Pipeline

The stages that matter most for `memory_guided` are:

- Analyst
- Decomposer
- Prover
- Assembler
- TheoremCrystallizer
- Verifier

Relevant code:

If the `memory_guided` pipeline is struggling, the most useful knobs are usually:

- Analyst
- Decomposer
- Prover
- Assembler
- TheoremCrystallizer
- Verifier
## What Each Important Stage Does

### Agent loop

Used by generic agent calls, including:

- SurveyAgent
- WriterAgent
- any stage that falls back to the generic base agent path

Relevant code:

Raise this when:

- survey answers are too short
- the writer keeps stopping early
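Before raising a limit, it helps to confirm that a stage really hit its per-call cap rather than stopping for some other reason. This sketch assumes an OpenAI-style chat response dict, where `finish_reason == "length"` indicates the output was cut off by `max_tokens`; `hit_output_limit` is a hypothetical helper and field names vary by provider:

```python
def hit_output_limit(response: dict) -> bool:
    # OpenAI-style responses report finish_reason == "length" when the
    # output was truncated by max_tokens, versus "stop" for a natural end.
    return any(
        choice.get("finish_reason") == "length"
        for choice in response.get("choices", [])
    )
```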
### Prover

Used for theorem-proof generation at the lemma level.

Relevant code:

Raise this when:

- lemma proofs stop mid-argument
- the prover keeps omitting technical steps
### Architect

Used by ProofArchitect in the `default` pipeline.

Relevant code:

Raise this when:

- proof plans are too shallow
- the architect returns incomplete lemma structure
### Analyst

Used by MemoryGuidedAnalyzer, TemplateSelector, and ProofSkeletonBuilder in the `memory_guided` pipeline.

Relevant code:

Raise this when:

- the pipeline picks poor proof templates
- proof skeletons are too sketchy
- memory-guided analysis is too thin
### Decomposer

Used by decomposition-style stages, including KeyLemmaExtractor.

Relevant code:

Raise this when:

- key lemmas are missing
- decomposition is too coarse
### Assembler

Used to produce `state.assembled_proof`.

Relevant code:

Raise this when:

- the proof body ends mid-sentence
- `assembled_proof` looks truncated
- theorem extraction later fails because the proof narrative is incomplete
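A quick way to spot a cut-off proof body is to check how the text ends. The helper below is a rough, purely illustrative heuristic, not part of eurekaclaw:

```python
def looks_truncated(proof: str) -> bool:
    # Heuristic sketch: a proof narrative that ends without terminal
    # punctuation or a QED marker was probably cut off mid-sentence by
    # the stage's output token limit.
    tail = proof.rstrip()
    return not tail.endswith((".", "!", "?", "∎", "QED"))
```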
### TheoremCrystallizer

Used to produce `state.formal_statement`.

Relevant code:

Raise this when:

- the theorem statement is truncated
- the theorem ends mid-formula
- the theorem block is missing assumptions or terms that are present in the assembled proof
### Verifier

Used for:

- lemma peer review
- consistency checking

Relevant code:

Raise this when:

- reviewer outputs are too terse
- consistency checks miss obvious issues
### Formalizer / Refiner

Shared budget for several theorem-support stages:

- Formalizer
- Refiner
- CounterexampleSearcher
- ResourceAnalyst
- PaperReader

Relevant code:

Raise this when:

- paper extraction is too shallow
- counterexample search is too thin
- refinement suggestions are incomplete
## Practical Guidance

### If the assembled proof is truncated

Increase:

- Assembler
- Prover, if lemma proofs are also too short

### If the theorem statement is truncated

Increase:

- TheoremCrystallizer
- Assembler, if the proof narrative itself is incomplete

### If `default` planning is weak

Increase:

- Architect
- Prover

### If `memory_guided` planning is weak

Increase:

- Analyst
- Decomposer
- Prover

### If writer or survey outputs are too short

Increase:

- Agent loop
## Important Distinction

These controls are not the same as the model’s full context window.

They only control:

- how much a single stage may output in one call

They do not directly control:

- how much total conversation history can be sent
- how large the model’s full context window is

So a stage can fail in two different ways:

- the stage output is too short because its own token limit is too low
- the whole request is too heavy or too complex even though the per-call limit is large enough
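A toy sketch makes the two failure modes concrete. Everything here (the window size, the function, the echo behavior) is invented for illustration and has nothing to do with a real model API:

```python
# Toy model: it can read plus write at most CONTEXT_WINDOW tokens total.
CONTEXT_WINDOW = 10


def call_stage(prompt: list[str], max_output_tokens: int) -> list[str]:
    # Failure mode 2: the whole request is too heavy, regardless of
    # how the per-call output limit is set.
    if len(prompt) + max_output_tokens > CONTEXT_WINDOW:
        raise ValueError("request too large for the context window")
    # Failure mode 1: output is silently cut off at the per-call limit.
    return prompt[:max_output_tokens]
```

With a small `max_output_tokens` the call succeeds but the result is truncated; with a large prompt the call fails outright even though the output limit was generous.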
## Current UI Coverage

The UI now exposes the major token-limit controls used by both theory pipelines, including:

- Architect
- Assembler
- TheoremCrystallizer
- Analyst
- Sketch

If you add new max_tokens_* fields in the backend later, remember to update the UI mapping in `eurekaclaw/ui/server.py` (`_CONFIG_FIELDS`) and the UI form.