Agents#
EurekaClaw has five specialized agents coordinated by the MetaOrchestrator. Each agent runs a tool-use loop with periodic context compression.
BaseAgent#
All agents inherit from eurekaclaw/agents/base.py:
Key Methods:
Method |
Description |
|---|---|
|
Abstract. Run the agent on a task |
|
Abstract. Return allowed tool names for this agent |
|
Combine role prompt + injected skills |
|
Tool-use loop with context compression |
|
Summarize conversation with fast model every N turns |
|
LLM call with exponential-backoff retry |
Context Compression: Every CONTEXT_COMPRESS_AFTER_TURNS turns (default 6), history longer than 12 messages is compressed to a single summary using the fast model. This bounds input token growth across long runs.
SurveyAgent#
Role: SURVEY
File: eurekaclaw/agents/survey/agent.py
Max Turns: SURVEY_MAX_TURNS (default 8)
Purpose: Search the literature and populate the KnowledgeBus with papers, open problems, and key mathematical objects.
Tools:
arxiv_search— Find relevant papers on arXivsemantic_scholar_search— Search Semantic Scholar for citation counts and metadataweb_search— Supplement with general web searchcitation_manager— Format and store references
Inputs (from KnowledgeBus):
ResearchBrief.domainResearchBrief.queryResearchBrief.conjecture
Outputs:
Appends papers to
Bibliographyon the busWrites to
ResearchBrief:open_problems— list of identified open problemskey_mathematical_objects— core concepts and structures
Output JSON keys: papers, open_problems, key_mathematical_objects, research_frontier, insights
IdeationAgent#
Role: IDEATION
File: eurekaclaw/agents/ideation/agent.py
Max Turns: 3
Purpose: Generate 5 novel research hypotheses from survey findings. Each direction is scored on novelty_score, feasibility_score, and impact_score (mapped internally to the ResearchDirection fields novelty_score, soundness_score, transformative_score).
Direction selection does not happen inside IdeationAgent. After IdeationAgent writes ResearchBrief.directions, the orchestrator’s direction_selection_gate task invokes DivergentConvergentPlanner.converge() to pick the highest-scoring direction and set ResearchBrief.selected_direction.
Inputs (from KnowledgeBus):
Survey findings (
ResearchBrief)Bibliography
Outputs:
ResearchBrief.directions— 5ResearchDirectionobjects with composite scores
TheoryAgent#
Role: THEORY
File: eurekaclaw/agents/theory/agent.py
Max Iterations: THEORY_MAX_ITERATIONS (default 10)
Inner Stage Max Turns: THEORY_STAGE_MAX_TURNS (default 6)
Purpose: Prove the selected research direction via a 7-stage bottom-up proof pipeline.
Tools:
arxiv_search— Look up lemmas and techniques from paperswolfram_alpha— Symbolic computation and bound verificationlean4_verify— Formal proof verification in Lean4execute_python— Numerical checks and sanity tests
Inputs (from KnowledgeBus):
ResearchBrief.selected_direction
Outputs:
TheoryStatewithstatus=proved/refuted/abandoned
7-Stage Inner Loop (inner_loop_yaml.py):
Stage |
Class |
Input |
Output |
|---|---|---|---|
1 |
|
|
|
2 |
|
known_results + conjecture |
|
3 |
|
research_gap + known_results |
|
4 |
|
proof_plan, open_goals |
|
5 |
|
proven_lemmas |
|
6 |
|
assembled_proof |
|
7 |
|
full TheoryState |
consistency report |
LemmaDeveloper inner loop (per lemma):
for each open_goal:
Prover → Verifier → (if failed) Refiner → repeat
CounterexampleSearcher runs in parallel
Stagnation detection: if same error N times → force Refiner
Provenance system: Each lemma in the proof plan is annotated as known (directly citable), adapted (needs modification), or new (must be fully proved). Only adapted and new lemmas enter the proof loop.
Auto-verify: Proofs with confidence ≥ AUTO_VERIFY_CONFIDENCE (default 0.95) are accepted without an LLM verifier call. The LLM Verifier itself uses a separate pass threshold VERIFIER_PASS_CONFIDENCE (default 0.90).
ProofArchitect retry policy: If the full provenance-annotated plan fails (e.g. the LLM returns a field as null), the architect retries with a simplified 3-lemma prompt (foundational → central bound → main result). Only if both attempts fail does it fall back to a single main_result goal.
Outer iteration loop: After the Assembler runs, TheoremCrystallizer + ConsistencyChecker iterate up to theory_max_iterations times. The ConsistencyChecker classifies every failure into one of three severity levels, and the retry path is chosen accordingly:
Severity |
Meaning |
Retry path |
|---|---|---|
|
Proof logic is sound but proved lemmas are not cited in the assembled text |
Re-run |
|
A specific lemma is incorrect or the logical link between two lemmas is broken |
Re-run |
|
Fundamental proof breakdown — wrong approach or multiple incorrect lemmas |
Re-run from |
If the LLM does not return a severity field, it is inferred heuristically: failures with only uncited_lemmas and no issues are classified as uncited; all others as major.
Citation convention: The Assembler is instructed to cite every proved lemma by its identifier in square brackets, e.g. By [arm_pull_count_bound], .... The ConsistencyChecker verifies that all proved lemma IDs appear in the assembled proof and flags any that are missing.
Knowledge Graph writes: Lemma nodes and dependency edges are written to the Tier 3 KG whenever lemmas are proved, regardless of whether the final consistency check passes. This preserves the lemma-level graph even when the theorem statement crystallization fails.
ExperimentAgent (under development)#
Note
The ExperimentAgent and the execute_python tool are future work. Automated code execution against LLM-generated Python is not yet safely sandboxed for general use. Keep EXPERIMENT_MODE=false until proper sandboxing lands in a future release.
Role: EXPERIMENT
File: eurekaclaw/agents/experiment/agent.py
Max Turns: 5
Purpose: Empirically validate theoretical bounds via numerical experiments, particularly for low-confidence lemmas.
Tools:
execute_python— Run numerical simulations (future work — see note above)wolfram_alpha— Symbolic bound checkingDomain-specific tools (e.g.,
run_bandit_experimentfor MAB domain)
Auto-skip logic: The agent checks TheoryState.formal_statement for quantitative signals before running:
Run experiments:
O(,\Omega(, inequality operators,sample complexity,regret, convergence/generalization termsSkip:
\exists, existence quantifiers, pure algebraic/combinatorial structures
Low-confidence lemma testing:
Separates
proven_lemmasintoverifiedandlow_confidencegroupsFor each low-confidence lemma: samples random instances, checks the conclusion holds, computes
violation_rateLemmas with
violation_rate > 1%are flagged asnumerically_suspect
Inputs (from KnowledgeBus):
TheoryState(proven lemmas, formal statement)
Outputs:
ExperimentResulton KnowledgeBus withalignment_scoreand per-lemma check results
WriterAgent#
Role: WRITER
File: eurekaclaw/agents/writer/agent.py
Max Turns: WRITER_MAX_TURNS (default 4)
Purpose: Produce a complete academic paper in LaTeX (or Markdown) from all pipeline artifacts.
Tools:
citation_manager— Format bibliography entries and generate consistent cite keys
Inputs (from KnowledgeBus):
ResearchBrief(domain, conjecture, selected direction)TheoryState(all proofs, lemmas, formal statement)Bibliography(papers with exact cite keys)ExperimentResult(if available)
Outputs:
latex_paperstring stored inResearchOutput
LaTeX features:
Full
LATEX_PREAMBLEwith 13 theorem environments:theorem,lemma,corollary,definition,proposition,assumption,conjecture,claim,example,fact,observation,maintheorem,remarkCommon math macros:
\R,\N,\Z,\E,\Prob,\softmax,\Att,\argmax,\argmin,\norm,\abs,\inner
_extract_latex normalization pipeline:
Strip preamble /
\begin{document}/\end{document}wrappersNormalize environment names (
\begin{Proof}→\begin{proof},rem→remark,prop→proposition, etc.)Convert Markdown headings to LaTeX section commands
Remove
tikzpictureenvironments and emptyfigureenvironmentsFix syntax errors (
\begin lemma}→\begin{lemma},\begin{flushright→\begin{flushright})Replace
\endproofwith\end{proof}Remove manual QED boxes (
\begin{flushright}$\square$\end{flushright})Strip
\bibliographystyleand\bibliographylines from the bodyClose unclosed environments (two-pass: remove orphaned
\end{X}, then append missing ones)
Proof style enforcement (ENFORCE_PROOF_STYLE=true):
No skip words (“clearly”, “trivially”, “by standard arguments”) without immediate justification
Every inequality must name the lemma or theorem it uses
Each lemma proof opens with informal intuition
Low-confidence lemmas tagged with
\textcolor{orange}{[Unverified step]}Limitations section explains all unverified steps
Citation consistency: The writer prompt includes \cite{key} for each reference using the same key generation algorithm as _generate_bibtex in main.py.