Agentic Integration Plan¶
This page describes the planned boundary between ooai-llm, future
ooai-agents, LangChain create_agent, raw LangGraph graphs, and Deep Agents.
The short version:
ooai-llm
├── model/runtime substrate
├── provider and model catalog logic
├── serializable profiles and lazy runtimes
├── usage, cost, context, cache, and metadata accounting
└── LangChain-compatible runnable adapters
ooai-agents
├── agent profiles and graph templates
├── prompts, tools, memory, subagents, skills, and workflows
├── LangGraph and Deep Agents orchestration
└── application-level routing decisions
ooai-llm should be boring, stable infrastructure. ooai-agents can then be
more opinionated about agent behavior without reimplementing provider/model
plumbing.
What ooai-llm Should Implement¶
Planned primitives:
ChatModelProfile: serializable provider/model configuration.LLM: lazy runtime wrapper withid,uuid,runnable, usage/cost totals, cache behavior, logging metadata, and refresh support.LLMRegistry: keyed access to manyLLMruntimes with lookup by key, runtime id, profile id, or runtime UUID.ContextBudget: reserved output tokens, warning threshold, hard limit, and max context-used percentage.ContextSnapshot: model, max context tokens, estimated input tokens, reserved output tokens, remaining tokens, context-used percentage, and count source.UsageRecorder: shared usage and cost events across runtimes, graph nodes, subagents, and callbacks.Minimal adapter contracts such as
AgentInjection,PromptPolicy,ToolPolicy, andAgentRoutingDecisionwhen they are useful without a full agent package.
The most important interop rule:
model = runtime.runnable
runtime.runnable is the stable boundary for LangChain, LangGraph, and Deep
Agents. runtime.invoke(...) remains an ooai-llm convenience path for direct
calls with built-in usage recording.
Why This Split¶
This avoids a muddled object that is sometimes a model, sometimes an agent, and sometimes a graph. The separation keeps responsibilities clear:
Model choice, pricing, context, caches, and provider kwargs belong in
ooai-llm.Prompt assembly, tool choice, graph topology, memory, subagents, and workflows belong in
ooai-agents.LangChain and Deep Agents adapters should stay thin so they can change when upstream middleware APIs evolve.
The result is that ooai-agents can switch among LangChain create_agent, raw
LangGraph, and Deep Agents without rewriting model accounting.
LangChain create_agent¶
Target shape:
from langchain.agents import create_agent
from ooai_llm import ChatModelProfile, LLMRegistry
from ooai_llm.agentic import ContextBudget, PromptPolicy, ToolPolicy
registry = LLMRegistry.from_profiles(
{
"cheap": ChatModelProfile(model="openai:gpt-5.4-mini", temperature=0),
"coding": ChatModelProfile(model="anthropic:claude-sonnet-4.6"),
}
)
augmented = registry.augment(
"coding",
prompt=PromptPolicy.static("Review code carefully and return concrete fixes."),
tools=ToolPolicy(
tools=[read_file, edit_file, run_tests],
selection="filtered",
allow_parallel_tool_calls=False,
max_tool_retries=2,
),
context=ContextBudget(max_used_percent=80, reserve_output_tokens=2_000),
)
agent = create_agent(**augmented.to_create_agent_kwargs())
The exported kwargs should include only the pieces LangChain needs:
model: usuallyruntime.runnableor a request-scoped bound runnable.tools: all candidate tools for the agent.middleware: prompt, model-routing, tool-policy, context, and accounting middleware.Optional
response_formatandcontext_schema.
Middleware should select models and tools late:
def select_runtime(request, registry):
if request.state.get("task_type") == "code":
return registry["coding"]
return registry["cheap"]
middleware = registry.model_router(selector=select_runtime)
Raw LangGraph¶
For lower-level LangGraph nodes, ooai-agents can use either the runnable or
the runtime wrapper:
async def coding_node(state, config):
runtime = registry["coding"]
snapshot = runtime.context_snapshot(
messages=state["messages"],
reserve_output_tokens=2_000,
)
if snapshot.context_used_percent > 85:
state = await summarize_or_trim(state, snapshot=snapshot)
response = await runtime.ainvoke(state["messages"], config=config)
return {"messages": [response]}
Use runtime.runnable directly when LangGraph or LangChain should own callback
execution. Use runtime.invoke(...) / runtime.ainvoke(...) when
ooai-llm should record usage directly from the response metadata.
Deep Agents¶
Deep Agents already owns the deep-agent harness: planning, todo, filesystem,
skills, subagents, backend/store integration, context offloading, and
summarization. ooai-llm should feed configured models and accounting metadata
into that harness, not duplicate it.
Target shape:
from deepagents import create_deep_agent
research = registry.augment(
"research",
prompt=PromptPolicy.static("Research only. Return citations."),
tools=ToolPolicy(tools=[web_search, read_file]),
)
agent = create_deep_agent(
**registry.augment("default").to_deep_agent_kwargs(
subagents=[research.to_subagent(name="research-agent")],
),
backend=backend,
store=store,
)
Deep Agents integration rules:
Prefer
runtime.runnablewhen provider kwargs, reasoning config, caches, or metadata need to be preserved.Treat each subagent as a separate augmented binding with its own runtime UUID and cost labels.
Record parent/child UUIDs in routing decisions so subagent spend can be traced.
Let Deep Agents assemble its own harness prompt.
PromptPolicysupplies the user/task system instructions, not the entire deep-agent prompt.Context snapshots should complement Deep Agents summarization/offloading by exposing context-used percentage and budget warnings.
Future ooai-agents¶
ooai-agents can consume ooai-llm like this:
registry = LLMRegistry.from_profiles(
{
"default": ChatModelProfile(model="openai:gpt-5.4-mini"),
"research": ChatModelProfile(model="anthropic:claude-sonnet-4.6"),
}
)
agent_profile = AgentProfile(
name="research_assistant",
default_llm="default",
subagents=[
SubagentProfile(
name="research",
llm="research",
tools=["web_search", "read_file"],
)
],
)
agent = create_ooai_agent(profile=agent_profile, llms=registry)
In this shape, ooai-agents decides what an agent is. ooai-llm supplies the
registered model runtimes, UUIDs, usage recorder, model metadata, context
percentages, and LangChain-compatible runnable objects.
Implementation Phases¶
Add pure data contracts. Implement
ContextBudget,ContextSnapshot,AgentRoutingDecision, and minimal lookup-friendly registry types without importing LangChain middleware internals.Add
LLMRegistry. Support keyed runtime construction from profiles and suites, sharedUsageRecorder, lookup by key/id/profile id/UUID, and aggregate summaries.Add context estimation. Provide
runtime.context_snapshot(...)for strings/messages/files using best-effort local counting first, with provider preflight counting as a later extension.Add prompt/tool policies. Support strings, callables, LangChain
PromptTemplate,ChatPromptTemplate, tool filtering, parallel-tool-call preferences, tool retry policy, and response-format metadata.Add LangChain adapters. Export
to_create_agent_kwargs(), model-routing middleware, dynamic prompt middleware, tool-policy middleware, and accounting middleware.Add Deep Agents adapters. Export
to_deep_agent_kwargs()andto_subagent(...)without reimplementing Deep Agents memory, filesystem, skills, todo, or summarization behavior.Add
ooai-agentsconsumption examples. Useooai-llmas the model substrate whileooai-agentsowns graph and workflow orchestration.
Testing Gates¶
Each phase should land with focused tests:
Serialization tests for new Pydantic models.
Registry lookup tests by key, id, profile id, and UUID.
Context percentage tests using deterministic fake token counts.
Prompt policy tests without invoking an LLM.
Tool policy tests that verify selected tools and parallel-tool-call behavior.
Middleware smoke tests using fake LangChain request objects.
Deep Agents adapter tests that assert exported kwargs shape without requiring live provider calls.