Agentic Integration Plan¶

This page describes the planned boundary between ooai-llm, future ooai-agents, LangChain create_agent, raw LangGraph graphs, and Deep Agents.

The short version:

ooai-llm
├── model/runtime substrate
├── provider and model catalog logic
├── serializable profiles and lazy runtimes
├── usage, cost, context, cache, and metadata accounting
└── LangChain-compatible runnable adapters

ooai-agents
├── agent profiles and graph templates
├── prompts, tools, memory, subagents, skills, and workflows
├── LangGraph and Deep Agents orchestration
└── application-level routing decisions

ooai-llm should be boring, stable infrastructure. ooai-agents can then be more opinionated about agent behavior without reimplementing provider/model plumbing.

What `ooai-llm` Should Implement¶

Planned primitives:

ChatModelProfile: serializable provider/model configuration.
LLM: lazy runtime wrapper with id, uuid, runnable, usage/cost totals, cache behavior, logging metadata, and refresh support.
LLMRegistry: keyed access to many LLM runtimes with lookup by key, runtime id, profile id, or runtime UUID.
ContextBudget: reserved output tokens, warning threshold, hard limit, and max context-used percentage.
ContextSnapshot: model, max context tokens, estimated input tokens, reserved output tokens, remaining tokens, context-used percentage, and count source.
UsageRecorder: shared usage and cost events across runtimes, graph nodes, subagents, and callbacks.
Minimal adapter contracts such as AgentInjection, PromptPolicy, ToolPolicy, and AgentRoutingDecision when they are useful without a full agent package.

The most important interop rule:

model = runtime.runnable

runtime.runnable is the stable boundary for LangChain, LangGraph, and Deep Agents. runtime.invoke(...) remains an ooai-llm convenience path for direct calls with built-in usage recording.

Why This Split¶

This avoids a muddled object that is sometimes a model, sometimes an agent, and sometimes a graph. The separation keeps responsibilities clear:

Model choice, pricing, context, caches, and provider kwargs belong in ooai-llm.
Prompt assembly, tool choice, graph topology, memory, subagents, and workflows belong in ooai-agents.
LangChain and Deep Agents adapters should stay thin so they can change when upstream middleware APIs evolve.

The result is that ooai-agents can switch among LangChain create_agent, raw LangGraph, and Deep Agents without rewriting model accounting.

LangChain `create_agent`¶

Target shape:

from langchain.agents import create_agent
from ooai_llm import ChatModelProfile, LLMRegistry
from ooai_llm.agentic import ContextBudget, PromptPolicy, ToolPolicy

registry = LLMRegistry.from_profiles(
    {
        "cheap": ChatModelProfile(model="openai:gpt-5.4-mini", temperature=0),
        "coding": ChatModelProfile(model="anthropic:claude-sonnet-4.6"),
    }
)

augmented = registry.augment(
    "coding",
    prompt=PromptPolicy.static("Review code carefully and return concrete fixes."),
    tools=ToolPolicy(
        tools=[read_file, edit_file, run_tests],
        selection="filtered",
        allow_parallel_tool_calls=False,
        max_tool_retries=2,
    ),
    context=ContextBudget(max_used_percent=80, reserve_output_tokens=2_000),
)

agent = create_agent(**augmented.to_create_agent_kwargs())

The exported kwargs should include only the pieces LangChain needs:

model: usually runtime.runnable or a request-scoped bound runnable.
tools: all candidate tools for the agent.
middleware: prompt, model-routing, tool-policy, context, and accounting middleware.
Optional response_format and context_schema.

Middleware should select models and tools late:

def select_runtime(request, registry):
    if request.state.get("task_type") == "code":
        return registry["coding"]
    return registry["cheap"]


middleware = registry.model_router(selector=select_runtime)

Raw LangGraph¶

For lower-level LangGraph nodes, ooai-agents can use either the runnable or the runtime wrapper:

async def coding_node(state, config):
    runtime = registry["coding"]
    snapshot = runtime.context_snapshot(
        messages=state["messages"],
        reserve_output_tokens=2_000,
    )

    if snapshot.context_used_percent > 85:
        state = await summarize_or_trim(state, snapshot=snapshot)

    response = await runtime.ainvoke(state["messages"], config=config)
    return {"messages": [response]}

Use runtime.runnable directly when LangGraph or LangChain should own callback execution. Use runtime.invoke(...) / runtime.ainvoke(...) when ooai-llm should record usage directly from the response metadata.

Deep Agents¶

Deep Agents already owns the deep-agent harness: planning, todo, filesystem, skills, subagents, backend/store integration, context offloading, and summarization. ooai-llm should feed configured models and accounting metadata into that harness, not duplicate it.

Target shape:

from deepagents import create_deep_agent

research = registry.augment(
    "research",
    prompt=PromptPolicy.static("Research only. Return citations."),
    tools=ToolPolicy(tools=[web_search, read_file]),
)

agent = create_deep_agent(
    **registry.augment("default").to_deep_agent_kwargs(
        subagents=[research.to_subagent(name="research-agent")],
    ),
    backend=backend,
    store=store,
)

Deep Agents integration rules:

Prefer runtime.runnable when provider kwargs, reasoning config, caches, or metadata need to be preserved.
Treat each subagent as a separate augmented binding with its own runtime UUID and cost labels.
Record parent/child UUIDs in routing decisions so subagent spend can be traced.
Let Deep Agents assemble its own harness prompt. PromptPolicy supplies the user/task system instructions, not the entire deep-agent prompt.
Context snapshots should complement Deep Agents summarization/offloading by exposing context-used percentage and budget warnings.

Future `ooai-agents`¶

ooai-agents can consume ooai-llm like this:

registry = LLMRegistry.from_profiles(
    {
        "default": ChatModelProfile(model="openai:gpt-5.4-mini"),
        "research": ChatModelProfile(model="anthropic:claude-sonnet-4.6"),
    }
)

agent_profile = AgentProfile(
    name="research_assistant",
    default_llm="default",
    subagents=[
        SubagentProfile(
            name="research",
            llm="research",
            tools=["web_search", "read_file"],
        )
    ],
)

agent = create_ooai_agent(profile=agent_profile, llms=registry)

In this shape, ooai-agents decides what an agent is. ooai-llm supplies the registered model runtimes, UUIDs, usage recorder, model metadata, context percentages, and LangChain-compatible runnable objects.

Implementation Phases¶

Add pure data contracts. Implement ContextBudget, ContextSnapshot, AgentRoutingDecision, and minimal lookup-friendly registry types without importing LangChain middleware internals.
Add LLMRegistry. Support keyed runtime construction from profiles and suites, shared UsageRecorder, lookup by key/id/profile id/UUID, and aggregate summaries.
Add context estimation. Provide runtime.context_snapshot(...) for strings/messages/files using best-effort local counting first, with provider preflight counting as a later extension.
Add prompt/tool policies. Support strings, callables, LangChain PromptTemplate, ChatPromptTemplate, tool filtering, parallel-tool-call preferences, tool retry policy, and response-format metadata.
Add LangChain adapters. Export to_create_agent_kwargs(), model-routing middleware, dynamic prompt middleware, tool-policy middleware, and accounting middleware.
Add Deep Agents adapters. Export to_deep_agent_kwargs() and to_subagent(...) without reimplementing Deep Agents memory, filesystem, skills, todo, or summarization behavior.
Add ooai-agents consumption examples. Use ooai-llm as the model substrate while ooai-agents owns graph and workflow orchestration.

Testing Gates¶

Each phase should land with focused tests:

Serialization tests for new Pydantic models.
Registry lookup tests by key, id, profile id, and UUID.
Context percentage tests using deterministic fake token counts.
Prompt policy tests without invoking an LLM.
Tool policy tests that verify selected tools and parallel-tool-call behavior.
Middleware smoke tests using fake LangChain request objects.
Deep Agents adapter tests that assert exported kwargs shape without requiring live provider calls.