Research Tools#

Tools are callable functions exposed to agents via the Anthropic tool-use API. Each tool has a name, description, input schema, and async call() method.

Tool Architecture#

from abc import ABC
from typing import ClassVar

class BaseTool(ABC):
    name: ClassVar[str]          # tool name sent to the API
    description: ClassVar[str]   # description shown to the model

    def input_schema(self) -> dict: ...          # JSON Schema for inputs
    async def call(self, **kwargs) -> str: ...   # returns a JSON string
    def to_anthropic_tool_def(self) -> dict: ... # tool definition in API format

Tools are stored in a ToolRegistry. The default registry (build_default_registry()) includes the 7 built-in tools. Domain plugins can add extra tools via DomainPlugin.register_tools().
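
To make the interface concrete, here is a minimal sketch of a tool subclass. The BaseTool class below is a stand-in written for this example, not the project's source, and WordCountTool is a hypothetical tool that is not among the built-ins. The only external fact it relies on is that an Anthropic tool definition carries name, description, and input_schema fields.

```python
import asyncio
import json
from abc import ABC, abstractmethod
from typing import ClassVar


class BaseTool(ABC):
    """Stand-in for the interface described above (assumed, not the real source)."""

    name: ClassVar[str]
    description: ClassVar[str]

    @abstractmethod
    def input_schema(self) -> dict: ...

    @abstractmethod
    async def call(self, **kwargs) -> str: ...

    def to_anthropic_tool_def(self) -> dict:
        # An Anthropic tool definition has name, description, and input_schema.
        return {
            "name": self.name,
            "description": self.description,
            "input_schema": self.input_schema(),
        }


class WordCountTool(BaseTool):
    """Hypothetical example tool; not one of the built-in tools."""

    name = "word_count"
    description = "Count whitespace-separated words in a text."

    def input_schema(self) -> dict:
        return {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        }

    async def call(self, **kwargs) -> str:
        # Tools return a JSON string rather than a Python object.
        return json.dumps({"words": len(kwargs["text"].split())})
```

Calling asyncio.run(WordCountTool().call(text="a b c")) returns a JSON string whose words field is 3.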


Built-in Tools#




lean4_verify#

File: eurekaclaw/tools/lean4.py

Purpose: Formally verify a proof using the Lean4 theorem prover.

Inputs:

| Parameter | Type | Default | Description |
|---|---|---|---|
| proof_code | string | required | Lean4 proof code |
| theorem_name | string | "" | Optional theorem name for reporting |

Output:

{
  "verified": true,
  "theorem": "my_theorem",
  "message": "Proof checked successfully"
}

Or on failure:

{
  "verified": false,
  "lean4_output": "error: ...",
  "message": "Verification failed"
}

External dependency: Lean4 binary at LEAN4_BIN (default: lean). Imports Mathlib and Aesop. Timeout: 120 seconds. Max heartbeats: 400,000.
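
As an illustration of the two output shapes, the following sketch builds an example input and summarizes a result string. The helper summarize_lean4_result and the example_inputs dict are hypothetical names introduced here; only the JSON field names come from the description above.

```python
import json

# Example inputs for lean4_verify (the proof itself is an illustrative one-liner).
example_inputs = {
    "proof_code": "theorem my_theorem : 1 + 1 = 2 := rfl",
    "theorem_name": "my_theorem",
}


def summarize_lean4_result(raw: str) -> str:
    """Turn the tool's JSON string into a one-line (or two-line) summary."""
    result = json.loads(raw)
    if result.get("verified"):
        theorem = result.get("theorem") or "<unnamed>"
        return f"OK: {theorem}: {result.get('message', '')}"
    # On failure the Lean output carries the compiler errors.
    return f"FAILED: {result.get('message', '')}\n{result.get('lean4_output', '')}"
```

A caller would typically branch on the verified flag and feed lean4_output back to the agent when verification fails.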


execute_python (under development)#

Warning

Safe sandboxed code execution is future work. Without Docker properly configured, this tool runs LLM-generated Python directly in a host subprocess with no filesystem or network isolation. Do not enable EXPERIMENT_MODE until a future release adds proper sandbox support.

File: eurekaclaw/tools/code_exec.py

Purpose: Execute Python code for numerical experiments and sanity checks.

Inputs:

| Parameter | Type | Default | Description |
|---|---|---|---|
| code | string | required | Python code to execute |
| requirements | list[string] | [] | Extra packages to install before running |

Output:

{"output": "stdout + stderr from execution"}

Or on error:

{"error": "exception message"}

Sandbox: By default, code runs in a host subprocess with a 30-second timeout. Set USE_DOCKER_SANDBOX=true to run it in a Docker container (python:3.11-slim, 512 MB RAM, network disabled) instead. If Docker is unavailable, the tool silently falls back to the host subprocess. Packages are installed with uv pip, falling back to pip.
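
The host-subprocess path can be sketched as follows. This is a simplified illustration, not the code in code_exec.py: the function name run_python_snippet is hypothetical, package installation and the Docker path are omitted, and, like the host fallback described above, it provides no filesystem or network isolation.

```python
import json
import subprocess
import sys


def run_python_snippet(code: str, timeout_s: float = 30.0) -> str:
    """Run code in a child interpreter and return a JSON string.

    WARNING: no sandboxing; the child has the same access as the parent.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return json.dumps({"error": f"timed out after {timeout_s} seconds"})
    # Mirror the documented shape: stdout and stderr combined in "output".
    return json.dumps({"output": proc.stdout + proc.stderr})
```

Reporting a timeout through the error key is an assumption of this sketch; the source only states that errors carry an exception message.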


wolfram_alpha#

File: eurekaclaw/tools/wolfram.py

Purpose: Symbolic computation, formula simplification, and bound verification.

Inputs:

| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | required | Natural language or symbolic query |

Output: JSON array of Wolfram Alpha pods:

[{"title": "Result", "result": "..."}]

External dependency: Wolfram Alpha API v2. Requires WOLFRAM_APP_ID.
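
Since the output is a JSON array of pods, a consumer usually picks one pod by title. A minimal sketch, assuming only the title and result fields shown above (the helper name first_pod_result is hypothetical):

```python
import json
from typing import Optional


def first_pod_result(raw: str, title: str = "Result") -> Optional[str]:
    """Return the result text of the first pod with the given title, if any."""
    pods = json.loads(raw)
    for pod in pods:
        if pod.get("title") == title:
            return pod.get("result")
    return None  # no pod with that title
```

Returning None rather than raising lets the caller fall back to another pod title when "Result" is absent.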


citation_manager#

File: eurekaclaw/tools/citation.py

Purpose: Generate BibTeX entries and format citation keys consistently.

Actions:

| Action | Description |
|---|---|
| generate_bibtex | Generate a BibTeX entry from paper metadata |
| format_cite | Return the \cite{key} command for a paper |
| list_entries | List all citation entries in the current session |

Output: JSON with cite_key and bibtex strings.

Note: Uses the same key-generation algorithm as _generate_bibtex in main.py to ensure consistency between the writer’s \cite{} commands and the .bib file.
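
The consistency requirement amounts to routing both actions through one key function. The sketch below shows that pattern with a hypothetical key scheme (surname, year, first title word); the actual algorithm in _generate_bibtex is not documented here and may differ. The metadata is placeholder data.

```python
import re


def make_cite_key(first_author: str, year: int, title: str) -> str:
    """Hypothetical key scheme: surname + year + first title word, lowercased."""
    surname = first_author.split()[-1]
    words = title.split()
    first_word = words[0] if words else ""
    return re.sub(r"[^a-z0-9]", "", f"{surname}{year}{first_word}".lower())


def generate_bibtex(first_author: str, year: int, title: str) -> dict:
    """Build a BibTeX entry whose key comes from make_cite_key."""
    key = make_cite_key(first_author, year, title)
    bibtex = (
        f"@article{{{key},\n"
        f"  author = {{{first_author}}},\n"
        f"  title = {{{title}}},\n"
        f"  year = {{{year}}}\n"
        "}"
    )
    return {"cite_key": key, "bibtex": bibtex}


def format_cite(first_author: str, year: int, title: str) -> str:
    """Return the LaTeX cite command, using the same key function."""
    return "\\cite{" + make_cite_key(first_author, year, title) + "}"
```

Because both functions derive the key from the same helper, a \cite{} command in the paper always matches an entry in the .bib file.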


ToolRegistry#

File: eurekaclaw/tools/registry.py

class ToolRegistry:
    def register(self, tool: BaseTool) -> None: ...
    def get(self, name: str) -> BaseTool | None: ...
    def all_definitions(self) -> list[dict]: ...                    # all tools as Anthropic defs
    def definitions_for(self, names: list[str]) -> list[dict]: ...  # subset by name
    async def call(self, name: str, inputs: dict) -> str: ...
    def __contains__(self, name: str) -> bool: ...
    def __len__(self) -> int: ...

def build_default_registry() -> ToolRegistry: ...   # registry with all 7 built-in tools
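
One plausible way to implement this interface is a dictionary keyed by tool name. The sketch below is an assumption-laden illustration, not the contents of registry.py: EchoTool is a hypothetical tool, and returning an error JSON for an unknown tool name is a choice made for this example.

```python
import asyncio
import json
from typing import Optional


class EchoTool:
    """Hypothetical tool used only to exercise the registry sketch."""

    name = "echo"
    description = "Return the inputs unchanged."

    async def call(self, **kwargs) -> str:
        return json.dumps(kwargs)

    def to_anthropic_tool_def(self) -> dict:
        return {
            "name": self.name,
            "description": self.description,
            "input_schema": {"type": "object", "properties": {}},
        }


class ToolRegistry:
    """Dictionary-backed sketch of the interface listed above."""

    def __init__(self) -> None:
        self._tools: dict = {}

    def register(self, tool) -> None:
        self._tools[tool.name] = tool

    def get(self, name: str) -> Optional[object]:
        return self._tools.get(name)

    def all_definitions(self) -> list:
        return [t.to_anthropic_tool_def() for t in self._tools.values()]

    def definitions_for(self, names: list) -> list:
        # Unknown names are skipped silently (an assumption of this sketch).
        return [self._tools[n].to_anthropic_tool_def() for n in names if n in self._tools]

    async def call(self, name: str, inputs: dict) -> str:
        tool = self.get(name)
        if tool is None:
            return json.dumps({"error": f"unknown tool: {name}"})
        return await tool.call(**inputs)

    def __contains__(self, name: str) -> bool:
        return name in self._tools

    def __len__(self) -> int:
        return len(self._tools)
```

With this shape, definitions_for() can hand each agent only the tools it is allowed to use, while call() dispatches a tool-use request by name.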

Domain-Specific Tools#

Domain plugins can register additional tools via DomainPlugin.register_tools(registry).
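
The registration hook can be pictured as follows. Everything in this sketch is hypothetical (MiniRegistry, CoinFlipTool, ExamplePlugin); it only assumes that register_tools receives a registry and calls register() on it, as the signature above suggests.

```python
class MiniRegistry:
    """Hypothetical stand-in exposing only register() and membership tests."""

    def __init__(self) -> None:
        self._tools: dict = {}

    def register(self, tool) -> None:
        self._tools[tool.name] = tool

    def __contains__(self, name: str) -> bool:
        return name in self._tools


class CoinFlipTool:
    """Hypothetical domain tool; only its name matters for this sketch."""

    name = "coin_flip"


class ExamplePlugin:
    """Hypothetical plugin; the real DomainPlugin base class is not shown here."""

    def register_tools(self, registry) -> None:
        # A plugin adds its tools to the registry it is handed.
        registry.register(CoinFlipTool())


registry = MiniRegistry()
ExamplePlugin().register_tools(registry)
```

After the call, the registry contains coin_flip alongside whatever tools were registered before.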

MAB Domain: run_bandit_experiment#

File: eurekaclaw/domains/mab/tools/bandit_tool.py

Purpose: Run multi-armed bandit simulations to empirically validate regret bounds.

Inputs:

| Parameter | Type | Description |
|---|---|---|
| algorithm | string | ucb1 or thompson_sampling |
| n_arms | integer | Number of arms K |
| n_rounds | integer | Time horizon T |
| distribution | string | gaussian or bernoulli |
| n_trials | integer | Monte Carlo trials for averaging |

Output: JSON with empirical regret, per-arm stats, and comparison against theoretical bound.
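
To illustrate what such an experiment computes, here is a self-contained sketch of UCB1 on Bernoulli arms with Monte Carlo averaging. It is not the code in bandit_tool.py; the function names are hypothetical, and the theoretical bound used for comparison is the standard finite-time UCB1 bound of Auer et al., which the actual tool may or may not use.

```python
import math
import random


def run_ucb1_bernoulli(means: list, n_rounds: int, rng: random.Random) -> float:
    """One UCB1 run on Bernoulli arms; returns cumulative pseudo-regret."""
    n_arms = len(means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    best = max(means)
    regret = 0.0
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: play each arm once
        else:
            # Empirical mean plus the exploration bonus sqrt(2 ln t / n_i).
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


def average_regret(means: list, n_rounds: int, n_trials: int, seed: int = 0) -> float:
    """Average pseudo-regret over n_trials independent runs."""
    rng = random.Random(seed)
    total = sum(run_ucb1_bernoulli(means, n_rounds, rng) for _ in range(n_trials))
    return total / n_trials


def ucb1_bound(means: list, n_rounds: int) -> float:
    """Finite-time UCB1 bound: sum 8 ln T / gap + (1 + pi^2 / 3) * sum gap."""
    best = max(means)
    gaps = [best - m for m in means if m < best]
    return sum(8 * math.log(n_rounds) / g for g in gaps) + (1 + math.pi**2 / 3) * sum(gaps)
```

For example, average_regret([0.2, 0.8], n_rounds=2000, n_trials=20) can be compared against ucb1_bound([0.2, 0.8], 2000) to check that empirical regret stays within the theoretical guarantee.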

Supporting modules:

  • domains/mab/envs/stochastic.py: GaussianBandit, BernoulliBandit

  • domains/mab/envs/runner.py: run_experiment(), sweep_T()

  • domains/mab/tools/concentration.py: Hoeffding, Bernstein, sub-Gaussian bounds

  • domains/mab/tools/regret.py: Regret decomposition, Lai-Robbins lower bound

  • domains/mab/tools/information.py: KL(Bernoulli), KL(Gaussian), Fano’s inequality
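
The quantities named in these modules are standard and can be sketched in a few lines. The functions below are illustrative, with hypothetical names and signatures that need not match the modules listed above. The Lai-Robbins constant is the coefficient c in the asymptotic lower bound liminf R_T / ln T >= c, with c the sum over suboptimal arms of gap_i / KL(mu_i, mu*).

```python
import math


def kl_bernoulli(p: float, q: float) -> float:
    """KL(Ber(p) || Ber(q)) in nats, with clamping to avoid log(0)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_gaussian(mu1: float, mu2: float, sigma: float = 1.0) -> float:
    """KL between two Gaussians with equal variance sigma^2."""
    return (mu1 - mu2) ** 2 / (2 * sigma**2)


def lai_robbins_constant(means: list) -> float:
    """Coefficient of ln T in the Lai-Robbins lower bound for Bernoulli arms."""
    best = max(means)
    return sum((best - m) / kl_bernoulli(m, best) for m in means if m < best)


def hoeffding_radius(n: int, delta: float) -> float:
    """Two-sided Hoeffding radius for n samples in [0, 1] at confidence 1 - delta."""
    return math.sqrt(math.log(2 / delta) / (2 * n))
```

For means [0.2, 0.8], kl_bernoulli(0.2, 0.8) equals 0.6 ln 4, so the Lai-Robbins constant is 0.6 divided by that value.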