Agentic Integration Plan

This page describes the planned boundary between ooai-llm, the future ooai-agents package, LangChain's create_agent, raw LangGraph graphs, and Deep Agents.

The short version:

ooai-llm
├── model/runtime substrate
├── provider and model catalog logic
├── serializable profiles and lazy runtimes
├── usage, cost, context, cache, and metadata accounting
└── LangChain-compatible runnable adapters

ooai-agents
├── agent profiles and graph templates
├── prompts, tools, memory, subagents, skills, and workflows
├── LangGraph and Deep Agents orchestration
└── application-level routing decisions

ooai-llm should be boring, stable infrastructure. ooai-agents can then be more opinionated about agent behavior without reimplementing provider/model plumbing.

What ooai-llm Should Implement

Planned primitives:

  • ChatModelProfile: serializable provider/model configuration.

  • LLM: lazy runtime wrapper with id, uuid, runnable, usage/cost totals, cache behavior, logging metadata, and refresh support.

  • LLMRegistry: keyed access to many LLM runtimes with lookup by key, runtime id, profile id, or runtime UUID.

  • ContextBudget: reserved output tokens, warning threshold, hard limit, and max context-used percentage.

  • ContextSnapshot: model, max context tokens, estimated input tokens, reserved output tokens, remaining tokens, context-used percentage, and count source.

  • UsageRecorder: shared usage and cost events across runtimes, graph nodes, subagents, and callbacks.

  • Minimal adapter contracts such as AgentInjection, PromptPolicy, ToolPolicy, and AgentRoutingDecision when they are useful without a full agent package.
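As a rough illustration of the two context primitives, the sketch below models ContextBudget and ContextSnapshot as plain dataclasses. It is not the planned implementation: the real contracts may be Pydantic models, and `warn_used_percent`, `check_budget`, and the choice to count reserved output tokens as "used" are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextBudget:
    """Hypothetical budget contract; only two field names come from this page."""
    reserve_output_tokens: int = 2_000
    max_used_percent: float = 80.0    # hard limit
    warn_used_percent: float = 70.0   # warning threshold (assumed name)


@dataclass(frozen=True)
class ContextSnapshot:
    """Hypothetical snapshot; derived values are computed from the raw counts."""
    model: str
    max_context_tokens: int
    estimated_input_tokens: int
    reserved_output_tokens: int
    count_source: str = "local-estimate"

    @property
    def remaining_tokens(self) -> int:
        used = self.estimated_input_tokens + self.reserved_output_tokens
        return max(self.max_context_tokens - used, 0)

    @property
    def context_used_percent(self) -> float:
        # Assumption: reserved output tokens count toward the used share.
        used = self.estimated_input_tokens + self.reserved_output_tokens
        return 100.0 * used / self.max_context_tokens


def check_budget(snapshot: ContextSnapshot, budget: ContextBudget) -> str:
    """Classify a snapshot against the budget thresholds."""
    if snapshot.context_used_percent > budget.max_used_percent:
        return "over_limit"
    if snapshot.context_used_percent > budget.warn_used_percent:
        return "warning"
    return "ok"


snapshot = ContextSnapshot(
    model="example-model",
    max_context_tokens=100_000,
    estimated_input_tokens=73_000,
    reserved_output_tokens=2_000,
)
status = check_budget(snapshot, ContextBudget())
```

Keeping the derived values as properties means a serialized snapshot only needs the raw counts, which may simplify the serialization tests listed under Testing Gates.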

The most important interop rule:

model = runtime.runnable

runtime.runnable is the stable boundary for LangChain, LangGraph, and Deep Agents. runtime.invoke(...) remains an ooai-llm convenience path for direct calls with built-in usage recording.
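To make the two paths concrete, here is a minimal sketch of a lazy runtime wrapper. `LazyRuntime` and `FakeChatModel` are hypothetical stand-ins, not ooai-llm classes, and the dict-shaped response is a simplification of a real chat model message. The point is only that `runnable` is built on first access and handed out unchanged, while `invoke` adds usage recording on top of it.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional
from uuid import uuid4


class FakeChatModel:
    """Stand-in for a chat model; the response shape is a simplification."""

    def invoke(self, messages, config=None):
        return {
            "content": "ok",
            "usage_metadata": {"input_tokens": 12, "output_tokens": 3},
        }


@dataclass
class LazyRuntime:
    """Hypothetical runtime wrapper: lazy model construction plus usage totals."""
    factory: Callable[[], Any]  # builds the underlying model on demand
    uuid: str = field(default_factory=lambda: str(uuid4()))
    input_tokens: int = 0
    output_tokens: int = 0
    _runnable: Optional[Any] = field(default=None, init=False, repr=False)

    @property
    def runnable(self) -> Any:
        # Build the model once, on first access, then reuse it.
        if self._runnable is None:
            self._runnable = self.factory()
        return self._runnable

    def invoke(self, messages, config=None):
        # Convenience path: call the model and record usage from the response.
        response = self.runnable.invoke(messages, config=config)
        usage = response.get("usage_metadata", {})
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)
        return response


runtime = LazyRuntime(factory=FakeChatModel)
model = runtime.runnable      # what a framework would receive
runtime.invoke(["hello"])     # direct call with built-in accounting
```

A framework that receives `model` never sees the wrapper, so accounting on that path would have to come from callbacks or middleware instead.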

Why This Split

The split avoids a muddled object that is sometimes a model, sometimes an agent, and sometimes a graph. Each concern has exactly one owner:

  • Model choice, pricing, context, caches, and provider kwargs belong in ooai-llm.

  • Prompt assembly, tool choice, graph topology, memory, subagents, and workflows belong in ooai-agents.

  • LangChain and Deep Agents adapters should stay thin so they can change when upstream middleware APIs evolve.

The result is that ooai-agents can switch among LangChain create_agent, raw LangGraph, and Deep Agents without rewriting model accounting.

LangChain create_agent

Target shape:

from langchain.agents import create_agent
from ooai_llm import ChatModelProfile, LLMRegistry
from ooai_llm.agentic import ContextBudget, PromptPolicy, ToolPolicy

# read_file, edit_file, and run_tests are assumed to be tools defined elsewhere.

registry = LLMRegistry.from_profiles(
    {
        "cheap": ChatModelProfile(model="openai:gpt-5.4-mini", temperature=0),
        "coding": ChatModelProfile(model="anthropic:claude-sonnet-4.6"),
    }
)

augmented = registry.augment(
    "coding",
    prompt=PromptPolicy.static("Review code carefully and return concrete fixes."),
    tools=ToolPolicy(
        tools=[read_file, edit_file, run_tests],
        selection="filtered",
        allow_parallel_tool_calls=False,
        max_tool_retries=2,
    ),
    context=ContextBudget(max_used_percent=80, reserve_output_tokens=2_000),
)

agent = create_agent(**augmented.to_create_agent_kwargs())

The exported kwargs should include only the pieces LangChain needs:

  • model: usually runtime.runnable or a request-scoped bound runnable.

  • tools: all candidate tools for the agent.

  • middleware: prompt, model-routing, tool-policy, context, and accounting middleware.

  • Optional response_format and context_schema.

Middleware should select models and tools late:

def select_runtime(request, registry):
    if request.state.get("task_type") == "code":
        return registry["coding"]
    return registry["cheap"]


middleware = registry.model_router(selector=select_runtime)
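Because the selector is a plain function, it can be exercised without LangChain at all. The sketch below repeats the selector and calls it with fake inputs: a `SimpleNamespace` stands in for the middleware request and a dict stands in for the registry. Both stand-ins are assumptions for illustration; the real request object will carry more than a `state` mapping.

```python
from types import SimpleNamespace


def select_runtime(request, registry):
    # Same selector as above: route coding tasks to the stronger model.
    if request.state.get("task_type") == "code":
        return registry["coding"]
    return registry["cheap"]


# A dict of placeholder strings is enough to check the routing logic.
fake_registry = {"cheap": "cheap-runtime", "coding": "coding-runtime"}

code_request = SimpleNamespace(state={"task_type": "code"})
chat_request = SimpleNamespace(state={})

code_choice = select_runtime(code_request, fake_registry)
chat_choice = select_runtime(chat_request, fake_registry)
```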

Raw LangGraph

For lower-level LangGraph nodes, ooai-agents can use either the runnable or the runtime wrapper:

async def coding_node(state, config):
    runtime = registry["coding"]
    snapshot = runtime.context_snapshot(
        messages=state["messages"],
        reserve_output_tokens=2_000,
    )

    if snapshot.context_used_percent > 85:
        state = await summarize_or_trim(state, snapshot=snapshot)

    response = await runtime.ainvoke(state["messages"], config=config)
    return {"messages": [response]}

Use runtime.runnable directly when LangGraph or LangChain should own callback execution. Use runtime.invoke(...) / runtime.ainvoke(...) when ooai-llm should record usage directly from the response metadata.

Deep Agents

Deep Agents already owns the deep-agent harness: planning, todo, filesystem, skills, subagents, backend/store integration, context offloading, and summarization. ooai-llm should feed configured models and accounting metadata into that harness, not duplicate it.

Target shape:

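from deepagents import create_deep_agent
from ooai_llm.agentic import PromptPolicy, ToolPolicy

# Assumes a registry with "default" and "research" profiles, and that
# web_search, read_file, backend, and store are defined elsewhere.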

research = registry.augment(
    "research",
    prompt=PromptPolicy.static("Research only. Return citations."),
    tools=ToolPolicy(tools=[web_search, read_file]),
)

agent = create_deep_agent(
    **registry.augment("default").to_deep_agent_kwargs(
        subagents=[research.to_subagent(name="research-agent")],
    ),
    backend=backend,
    store=store,
)

Deep Agents integration rules:

  • Prefer runtime.runnable when provider kwargs, reasoning config, caches, or metadata need to be preserved.

  • Treat each subagent as a separate augmented binding with its own runtime UUID and cost labels.

  • Record parent/child UUIDs in routing decisions so subagent spend can be traced.

  • Let Deep Agents assemble its own harness prompt. PromptPolicy supplies the user/task system instructions, not the entire deep-agent prompt.

  • Context snapshots should complement Deep Agents summarization/offloading by exposing context-used percentage and budget warnings.
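The parent/child rule above can be illustrated with a small roll-up. This is a sketch under stated assumptions: the fields on `AgentRoutingDecision`, the `UsageEvent` type, and `spend_by_root` are hypothetical names, not the planned contracts. It shows how recording a parent UUID on each routing decision lets subagent spend be attributed to the top-level agent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentRoutingDecision:
    """Hypothetical shape: which runtime ran, and under which parent."""
    runtime_uuid: str
    parent_uuid: Optional[str]
    label: str


@dataclass(frozen=True)
class UsageEvent:
    """Hypothetical usage record keyed by runtime UUID."""
    runtime_uuid: str
    cost_usd: float


def spend_by_root(decisions, events):
    """Sum event costs under the top-most ancestor of each runtime."""
    parent_of = {d.runtime_uuid: d.parent_uuid for d in decisions}
    totals = defaultdict(float)
    for event in events:
        node = event.runtime_uuid
        # Walk up the parent chain until a runtime with no parent is reached.
        while parent_of.get(node) is not None:
            node = parent_of[node]
        totals[node] += event.cost_usd
    return dict(totals)


decisions = [
    AgentRoutingDecision("uuid-parent", None, "default"),
    AgentRoutingDecision("uuid-child", "uuid-parent", "research-agent"),
]
events = [UsageEvent("uuid-parent", 0.25), UsageEvent("uuid-child", 0.5)]
totals = spend_by_root(decisions, events)
```

The same decision list also supports per-subagent reporting, since each event keeps its own runtime UUID.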

Future ooai-agents

ooai-agents can consume ooai-llm like this:

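from ooai_llm import ChatModelProfile, LLMRegistry

# AgentProfile, SubagentProfile, and create_ooai_agent are planned
# ooai-agents names and do not exist yet.
registry = LLMRegistry.from_profiles(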
    {
        "default": ChatModelProfile(model="openai:gpt-5.4-mini"),
        "research": ChatModelProfile(model="anthropic:claude-sonnet-4.6"),
    }
)

agent_profile = AgentProfile(
    name="research_assistant",
    default_llm="default",
    subagents=[
        SubagentProfile(
            name="research",
            llm="research",
            tools=["web_search", "read_file"],
        )
    ],
)

agent = create_ooai_agent(profile=agent_profile, llms=registry)

In this shape, ooai-agents decides what an agent is. ooai-llm supplies the registered model runtimes, UUIDs, usage recorder, model metadata, context percentages, and LangChain-compatible runnable objects.

Implementation Phases

  1. Add pure data contracts. Implement ContextBudget, ContextSnapshot, AgentRoutingDecision, and minimal lookup-friendly registry types without importing LangChain middleware internals.

  2. Add LLMRegistry. Support keyed runtime construction from profiles and suites, shared UsageRecorder, lookup by key/id/profile id/UUID, and aggregate summaries.

  3. Add context estimation. Provide runtime.context_snapshot(...) for strings/messages/files using best-effort local counting first, with provider preflight counting as a later extension.

  4. Add prompt/tool policies. Accept prompts as strings, callables, LangChain PromptTemplate, or ChatPromptTemplate. For tools, support filtering, parallel-tool-call preferences, retry policy, and response-format metadata.

  5. Add LangChain adapters. Export to_create_agent_kwargs(), model-routing middleware, dynamic prompt middleware, tool-policy middleware, and accounting middleware.

  6. Add Deep Agents adapters. Export to_deep_agent_kwargs() and to_subagent(...) without reimplementing Deep Agents memory, filesystem, skills, todo, or summarization behavior.

  7. Add ooai-agents consumption examples. Use ooai-llm as the model substrate while ooai-agents owns graph and workflow orchestration.
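The lookup requirement in phase 2 could be met with a single alias index. The sketch below is one possible approach, not the planned LLMRegistry: `RuntimeHandle`, `MiniRegistry`, and `find` are hypothetical names, and it assumes keys, runtime ids, profile ids, and UUIDs never collide with each other.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass(frozen=True)
class RuntimeHandle:
    """Hypothetical record for one registered runtime."""
    key: str
    profile_id: str
    runtime_id: str
    uuid: str = field(default_factory=lambda: str(uuid4()))


class MiniRegistry:
    """Keyed access plus lookup by any known identifier."""

    def __init__(self):
        self._by_key = {}
        self._index = {}

    def register(self, handle: RuntimeHandle) -> RuntimeHandle:
        self._by_key[handle.key] = handle
        # Assumption: identifiers are unique across all four namespaces.
        for alias in (handle.key, handle.runtime_id, handle.profile_id, handle.uuid):
            self._index[alias] = handle
        return handle

    def __getitem__(self, key: str) -> RuntimeHandle:
        # registry["coding"] style access, by key only.
        return self._by_key[key]

    def find(self, identifier: str):
        # Returns None when the identifier is unknown.
        return self._index.get(identifier)


registry = MiniRegistry()
coding = registry.register(
    RuntimeHandle(key="coding", profile_id="profile-1", runtime_id="runtime-1")
)
```

If collisions between namespaces are possible, separate indexes per identifier type would be the safer design.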

Testing Gates

Each phase should land with focused tests:

  • Serialization tests for new Pydantic models.

  • Registry lookup tests by key, id, profile id, and UUID.

  • Context percentage tests using deterministic fake token counts.

  • Prompt policy tests without invoking an LLM.

  • Tool policy tests that verify selected tools and parallel-tool-call behavior.

  • Middleware smoke tests using fake LangChain request objects.

  • Deep Agents adapter tests that assert exported kwargs shape without requiring live provider calls.
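As an example of the deterministic style these gates call for, the sketch below tests a context percentage with an injected fake token counter. The helper `context_used_percent` and its parameters are assumptions for illustration; the real API is expected to be runtime.context_snapshot(...), whose signature is not fixed yet.

```python
def fake_count_tokens(messages):
    # Deterministic fake: one token per whitespace-separated word.
    return sum(len(message.split()) for message in messages)


def context_used_percent(messages, max_context_tokens, reserve_output_tokens, count_tokens):
    """Hypothetical helper: share of the window taken by input plus reserved output."""
    used = count_tokens(messages) + reserve_output_tokens
    return 100.0 * used / max_context_tokens


def test_context_percent_is_deterministic():
    messages = ["one two three", "four five"]
    assert fake_count_tokens(messages) == 5
    # 5 input tokens + 45 reserved output tokens in a 100-token window.
    percent = context_used_percent(messages, 100, 45, fake_count_tokens)
    assert percent == 50.0


test_context_percent_is_deterministic()
```

Injecting the counter keeps the test independent of any tokenizer or provider, so it stays stable when model catalogs change.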