DSPy Integration Plan
This page records the DSPy integration contract across ooai-llm and the
future ooai-agents package. The ooai-llm LM substrate is implemented; the
program, runnable, node, optimizer, and artifact layers remain planned for
ooai-agents.
The short version:
ooai-llm
|-- Owns DSPy model configuration from ChatModelProfile
|-- Creates and configures dspy.LM instances
|-- Extracts DSPy usage/cost into UsageRecorder
`-- Exposes thin, optional helpers behind ooai-llm[dspy]
ooai-agents
|-- Owns DSPy programs, signatures, modules, tools, and optimizers
|-- Wraps DSPy programs as LangChain Runnables or LangGraph nodes
|-- Converts dspy.Prediction outputs into messages, dict updates, or typed data
`-- Stores optimized DSPy artifacts and evaluation results
Research Summary
DSPy is not just another model provider. It is a framework for programming language-model systems through declarative signatures, modules, adapters, and optimizers.
The main pieces relevant to OOAI are:
dspy.LM: model client configured with LiteLLM-style model strings such as openai/gpt-5-mini, anthropic/claude-sonnet-..., or gemini/gemini-2.5-flash.
Signatures: typed input/output contracts such as "question -> answer: float" or class-based dspy.Signature declarations.
Modules: reusable strategies such as Predict, ChainOfThought, ReAct, ProgramOfThought, CodeAct, Parallel, and custom dspy.Module classes.
Adapters: conversion from signatures and examples into model messages, including chat, JSON, XML, and two-step styles.
Optimizers: offline or experiment-time compilation methods such as BootstrapFewShot, MIPROv2, GEPA, SIMBA, and finetuning workflows.
Runtime concerns: DSPy has its own caching, async, streaming, usage tracking, and save/load paths for compiled programs.
The key design consequence is that ooai-llm should provide model substrate
support, while ooai-agents should provide program and workflow support.
Upstream Stability
DSPy currently documents a planned BaseLM transition for DSPy 3.3 through
4.0. The OOAI integration should avoid subclassing DSPy internals and should
prefer adapter functions and protocols.
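As a rough illustration of the "adapter functions and protocols" preference, the sketch below types a helper against a structural protocol instead of a DSPy base class. The names SupportsLMCall, FakeLM, and describe_lm are hypothetical and exist only for this example; they are not part of DSPy or ooai-llm.

```python
from typing import Any, Protocol


class SupportsLMCall(Protocol):
    """Hypothetical structural type: anything that looks like an LM client."""

    model: str

    def __call__(self, prompt: str, **kwargs: Any) -> list[str]: ...


class FakeLM:
    """Stand-in client for the example; it inherits from no DSPy class."""

    def __init__(self, model: str) -> None:
        self.model = model

    def __call__(self, prompt: str, **kwargs: Any) -> list[str]:
        return [f"{self.model} says: {prompt}"]


def describe_lm(lm: SupportsLMCall) -> dict[str, str]:
    """Adapter function that relies only on the protocol surface."""
    return {"model": lm.model, "client_type": type(lm).__name__}
```

Because the helper depends on attributes rather than inheritance, a change to DSPy's base classes would only require updating the protocol, not every wrapper that uses it.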
Package Boundary
ooai-llm implements the minimum useful bridge:
Optional extra: ooai-llm[dspy].
DSPyLMConfig: serializable config for dspy.LM.
create_dspy_lm(...): create a DSPy LM from ChatModelProfile, DSPyLMConfig, ModelString, or a raw model string.
configure_dspy_lm(...): create the LM and call dspy.configure(lm=...).
resolve_dspy_model_name(...): convert OOAI model choices to LiteLLM-style names.
extract_dspy_usage(...): pull prediction.get_lm_usage() or LM history usage into OOAI usage events when available.
record_dspy_usage(...): record DSPy usage into UsageRecorder.
create_dspy_lm_bundle(...): return a native DSPy LM with resolved model, metadata, config, and trace metadata.
Documentation and examples showing how to choose a model from the catalog or profile layer and hand it to DSPy.
ooai-llm should not implement:
DSPy program builders.
DSPy signatures for particular tasks.
Agent workflows or graph topology.
Optimizer jobs and dataset management.
ARC-specific DSPy modules.
ooai-agents should implement the richer layer:
DSPyProgramSpec: declarative program/signature/module definition.
DSPyRunnable: LangChain-style runnable wrapper around a DSPy program.
DSPyNode: LangGraph node wrapper around a DSPy program.
DSPyOutputPolicy: conversion policy from dspy.Prediction to messages, state updates, JSON, or Pydantic models.
DSPyOptimizationJob: optimizer configuration, datasets, metrics, budgets, and artifact paths.
DSPyArtifactRegistry: saved compiled programs, run metadata, and eval reports.
Domain packages such as ARC, coding, research, and extraction programs.
Model Flow
The model flow should stay boring:
from ooai_llm import ChatModelProfile, configure_dspy_lm

profile = ChatModelProfile(
    id="dspy-coding",
    model="openai:gpt-5-mini",
    temperature=0,
    max_tokens=2000,
    cache={"namespace": "dspy", "key": "coding-v1"},
)
lm = configure_dspy_lm(profile, model_type="responses")
For downstream wrappers that need trace metadata, use the bundle helper:
from ooai_llm import create_dspy_lm_bundle
bundle = create_dspy_lm_bundle(profile, model_type="responses")
print(bundle.lm)
print(bundle.model.as_litellm())
print(bundle.trace_metadata["ooai_profile_id"])
The same bridge is available from an LLM runtime:
runtime = profile.create_runtime(id="coding-runtime")
lm = runtime.create_dspy_lm(model_type="responses")
Internally this should resolve:
ChatModelProfile(model="openai:gpt-5-mini")
`-- ModelString("openai:gpt-5-mini")
    `-- "openai/gpt-5-mini"
        `-- dspy.LM("openai/gpt-5-mini", ...)
ChatModelProfile remains the single source of model configuration. DSPy-only
options should live in DSPyLMConfig.lm_kwargs or explicit DSPy config fields.
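The name-resolution step in that chain can be pictured with a small sketch. The function to_litellm_name below is a hypothetical stand-in, not the actual resolve_dspy_model_name(...) helper, which may also handle aliases, catalog lookups, or provider-specific rules; the only assumption here is that "provider:model" maps to "provider/model".

```python
def to_litellm_name(model: str) -> str:
    """Convert 'provider:model' into LiteLLM-style 'provider/model'.

    Illustrative only: names already in 'provider/model' form pass through.
    """
    provider, sep, name = model.partition(":")
    if sep and provider and name and "/" not in provider:
        return f"{provider}/{name}"
    if "/" in model:
        return model  # already LiteLLM-style
    raise ValueError(f"expected 'provider:model' or 'provider/model', got {model!r}")
```

Rejecting bare model names, rather than guessing a provider, keeps the profile as the only place where the provider is decided.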
Output Compatibility
DSPy returns dspy.Prediction objects. LangChain and LangGraph usually want
messages, runnable outputs, or graph state updates. The compatibility layer
belongs in ooai-agents, not ooai-llm.
Planned output modes:
from typing import Literal

type DSPyOutputMode = Literal[
    "prediction",
    "dict",
    "ai_message",
    "state_update",
    "pydantic",
]
Recommended conversions:
prediction: return the raw dspy.Prediction for pure DSPy workflows.
dict: return JSON-safe field values from the prediction.
ai_message: return a LangChain AIMessage whose content is one selected field or a rendered JSON summary.
state_update: return a LangGraph state update such as {"messages": [message], "dspy": metadata}.
pydantic: validate prediction fields into a configured Pydantic model.
The AIMessage conversion should preserve provenance:
AIMessage(
    content=rendered_output,
    additional_kwargs={
        "dspy": {
            "program_id": "arc-hypothesis",
            "signature": "task, examples -> hypothesis, confidence: float",
            "module": "chain_of_thought",
            "fields": prediction_dict,
        }
    },
    response_metadata={
        "ooai_runtime_id": runtime.id,
        "ooai_runtime_uuid": str(runtime.uuid),
        "dspy_program_id": "arc-hypothesis",
    },
    usage_metadata=usage_metadata,
)
The LangGraph node conversion should return only serializable state:
{
    "messages": [ai_message],
    "dspy_results": {
        "arc-hypothesis": {
            "fields": prediction_dict,
            "confidence": prediction_dict.get("confidence"),
            "usage": usage_snapshot,
        }
    },
}
Live DSPy objects such as modules, LMs, adapters, or compiled programs should not be stored in graph state. Keep them on the wrapper object or registry.
Traceability Contract
DSPy calls should be traceable by default when they are routed through OOAI wrappers. The trace boundary should be the runnable or graph node, not a hidden tool call, so LangChain and LangGraph can show named DSPy program steps.
Recommended trace shape:
agent_or_graph
|-- load_task
|-- dspy.arc_hypothesis
|-- model.call
|-- verify_candidate
`-- dspy.adjudicate
Every DSPy program call should preserve:
{
    "ooai_runtime_id": runtime.id,
    "ooai_runtime_uuid": str(runtime.uuid),
    "ooai_profile_id": runtime.profile.id,
    "ooai_model": resolved_model.as_langchain(),
    "ooai_litellm_model": resolved_model.as_litellm(),
    "dspy_program_id": spec.id,
    "dspy_module": spec.module,
    "dspy_signature": spec.signature,
    "dspy_output_mode": spec.output.mode,
}
Usage and cost should be recorded after every DSPy prediction when DSPy or the underlying LM exposes usage data. Missing usage should not fail the agent run; record a usage event, with its count source clearly labeled, only when a real count is available.
Preferred tracing rules:
Use DSPyRunnable for reusable, traceable program calls.
Use create_dspy_node(...) for LangGraph state updates.
Use DSPy as a LangChain tool only when the language model should choose whether to call that program.
Keep runtime id, runtime UUID, program id, signature, module, and output mode in every converted output.
Async, Batch, And Streaming Contract
The OOAI wrapper should support the LangChain runnable surface even when DSPy’s native support varies by version, program, or LM:
DSPyRunnable.invoke(...)
DSPyRunnable.ainvoke(...)
DSPyRunnable.batch(...)
DSPyRunnable.abatch(...)
DSPyRunnable.stream(...)
DSPyRunnable.astream(...)
Implementation rules:
If DSPy exposes native async for the program, use it.
If DSPy exposes native streaming for the program, convert streamed chunks through the configured DSPyOutputPolicy.
If async is unavailable, run the sync invocation in a worker thread or executor so LangGraph async paths still work.
If streaming is unavailable, yield one final converted output chunk.
Preserve the same metadata and usage recording behavior across sync, async, batch, and streaming paths.
This means a DSPy program can be used as a normal LangGraph node even before every DSPy module has first-class streaming semantics.
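The two fallback rules (worker thread when async is unavailable, a single final chunk when streaming is unavailable) can be sketched with the standard library alone. SyncOnlyRunnable is a hypothetical class for illustration; the planned DSPyRunnable would additionally apply the output policy and record usage on each path.

```python
import asyncio
from collections.abc import AsyncIterator, Callable
from typing import Any


class SyncOnlyRunnable:
    """Hypothetical wrapper around a program that only has a sync call."""

    def __init__(self, program: Callable[[dict[str, Any]], dict[str, Any]]) -> None:
        self._program = program

    def invoke(self, inputs: dict[str, Any]) -> dict[str, Any]:
        return self._program(inputs)

    async def ainvoke(self, inputs: dict[str, Any]) -> dict[str, Any]:
        # No native async: run the sync call in a worker thread so the
        # event loop is not blocked.
        return await asyncio.to_thread(self.invoke, inputs)

    async def astream(self, inputs: dict[str, Any]) -> AsyncIterator[dict[str, Any]]:
        # No native streaming: yield one final converted chunk.
        yield await self.ainvoke(inputs)
```

Callers that iterate over astream(...) see the same shape whether the underlying program streams natively or not, which is what lets graph code stay unchanged.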
Serialization And Artifacts
Serialize declarations and artifacts, not live DSPy clients or modules.
Serializable:
DSPyProgramSpec
DSPyOutputPolicy
DSPy LM binding reference, usually an OOAI runtime key
optimizer config
dataset and metric references
artifact metadata
compiled program path, hash, and version metadata
Do not put these into config or graph state:
live dspy.Module objects
live dspy.LM clients
provider SDK clients
callbacks
open files
LangGraph runtime objects
The durable shape should be:
Config serializes what to build.
Artifact metadata serializes where compiled DSPy output lives.
Runtime wrappers reconstruct live objects when needed.
Graph state stores only JSON-safe outputs and provenance.
Planned artifact reference:
from datetime import datetime
from pathlib import Path

from pydantic import BaseModel


class DSPyArtifactRef(BaseModel):
    id: str
    program_id: str
    path: Path
    sha256: str
    dspy_version: str
    ooai_agents_version: str
    created_at: datetime
    optimizer: str | None = None
    metric: str | None = None
Loading pickle or cloudpickle-based DSPy artifacts should be treated as a trusted-only operation. The metadata file should be safe to inspect without loading executable artifact payloads.
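A minimal sketch of the "inspect metadata without loading the payload" idea, assuming the metadata is stored as a JSON file next to the compiled program and records a relative path and a sha256 digest. The file layout and the helper names sha256_of, inspect_artifact, and verify_artifact are assumptions for illustration, not a defined ooai-agents format.

```python
import hashlib
import json
from pathlib import Path
from typing import Any


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large artifacts do not load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inspect_artifact(metadata_path: Path) -> dict[str, Any]:
    """Read JSON metadata only; the payload is never unpickled here."""
    return json.loads(metadata_path.read_text(encoding="utf-8"))


def verify_artifact(metadata_path: Path) -> bool:
    """Check that the payload on disk matches the recorded digest."""
    meta = inspect_artifact(metadata_path)
    payload = metadata_path.parent / meta["path"]
    return sha256_of(payload) == meta["sha256"]
```

A digest match only shows the payload is the one that was recorded; it does not make an untrusted pickle safe, so the trusted-only rule still applies before any load.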
Target ooai-agents API
The target program declaration should be compact:
from ooai_agents.dspy import DSPyOutputPolicy, DSPyProgramSpec

spec = DSPyProgramSpec(
    id="arc-hypothesis",
    llm="reasoning",
    module="chain_of_thought",
    signature="task, examples -> hypothesis: str, confidence: float",
    output=DSPyOutputPolicy(
        mode="state_update",
        message_field="hypothesis",
        state_key="dspy_results",
    ),
)
Compile it into a LangGraph node:
from ooai_agents.dspy import create_dspy_node
node = create_dspy_node(spec, llms=registry)
graph.add_node("hypothesis", node)
Compile it into a LangChain-style runnable:
from ooai_agents.dspy import create_dspy_runnable
runnable = create_dspy_runnable(spec, llms=registry)
result = runnable.invoke({"task": task_text, "examples": examples})
Use it inside a normal OOAI agent:
agent = create_agent(
    id="arc-agent",
    runtimes=registry,
    tools=[load_arc_task, verify_candidate],
    middleware=[
        create_dspy_node_middleware(
            programs=[spec],
            run_before_model=["arc-hypothesis"],
        )
    ],
)
ooai-agents Module Layout
Target package layout:
src/ooai_agents/dspy/
|-- __init__.py
|-- config.py # DSPyProgramSpec, DSPyOutputPolicy, optimizer specs
|-- programs.py # builders for Predict, CoT, ReAct, custom modules
|-- runnable.py # LangChain Runnable wrappers
|-- nodes.py # LangGraph node wrappers
|-- output.py # Prediction -> AIMessage/dict/state/Pydantic
|-- usage.py # DSPy usage -> ooai UsageRecorder
|-- optimization.py # optimizer job specs and runners
|-- artifacts.py # save/load compiled programs and metadata
`-- testing.py # fake DSPy objects for tests
Optional domain packages can then consume this:
src/ooai_agents/agents/arc/dspy_programs.py
src/ooai_agents/agents/coding/dspy_programs.py
src/ooai_agents/agents/research/dspy_programs.py
Task Breakdown
Phase 1: ooai-llm DSPy Substrate
Add the optional dspy extra.
Add DSPyLMConfig.
Add resolve_dspy_model_name(...).
Add create_dspy_lm(...).
Add configure_dspy_lm(...).
Map ChatModelProfile fields to DSPy safely: temperature, max_tokens, cache, num_retries, timeout, parallel_tool_calls, reasoning kwargs, and pass-through lm_kwargs.
Add DSPy usage extraction helpers.
Add unit tests with fake DSPy modules.
Add docs and examples.
Acceptance criteria:
Importing ooai_llm does not import DSPy.
Missing DSPy raises an actionable optional-extra error.
Profile-based model names resolve to LiteLLM format.
Unit tests do not require network, provider keys, or real DSPy.
Docs clearly mark DSPy program support as owned by ooai-agents.
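The first two criteria (no DSPy import at package import time, and an actionable error when the extra is missing) are commonly met with a deferred import helper. The sketch below shows the general pattern; require_module and MissingOptionalDependency are hypothetical names, and the pip command in the message is an assumed install hint.

```python
import importlib
from types import ModuleType


class MissingOptionalDependency(ImportError):
    """Raised when an optional extra is needed but not installed."""


def require_module(name: str, extra: str, package: str = "ooai-llm") -> ModuleType:
    """Import a module on first use and explain how to install it if absent."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as exc:
        raise MissingOptionalDependency(
            f"{name!r} is required for this feature. "
            f"Install it with: pip install '{package}[{extra}]'"
        ) from exc
```

Calling require_module("dspy", "dspy") inside create_dspy_lm(...) rather than at module top level is what keeps a plain import of the package free of the DSPy dependency.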
Phase 2: ooai-agents DSPy Program Specs
Add an optional dspy extra to ooai-agents.
Add DSPyProgramSpec.
Add DSPyOutputPolicy.
Add builders for common modules: predict, chain_of_thought, react, program_of_thought, and code_act.
Add support for raw inline signatures and imported class-based signatures.
Add tests using fake DSPy modules.
Acceptance criteria:
A spec can be serialized to JSON/YAML-like data.
Program construction stays lazy and does not require DSPy at import time.
Unsupported module names fail with clear messages.
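A name-to-builder registry is one way to get both lazy construction and clear failures for unsupported modules. In this sketch the builders return plain dicts so the example runs without DSPy; real builders would import DSPy lazily and construct the corresponding module. The registry, register_builder, and build_program are hypothetical names.

```python
from collections.abc import Callable
from typing import Any

# Hypothetical registry: module name -> builder taking a signature string.
_BUILDERS: dict[str, Callable[[str], Any]] = {}


def register_builder(name: str) -> Callable[[Callable[[str], Any]], Callable[[str], Any]]:
    def decorator(builder: Callable[[str], Any]) -> Callable[[str], Any]:
        _BUILDERS[name] = builder
        return builder

    return decorator


@register_builder("predict")
def _build_predict(signature: str) -> dict[str, str]:
    # Placeholder result; a real builder would construct the DSPy module here.
    return {"module": "predict", "signature": signature}


@register_builder("chain_of_thought")
def _build_chain_of_thought(signature: str) -> dict[str, str]:
    return {"module": "chain_of_thought", "signature": signature}


def build_program(module: str, signature: str) -> Any:
    """Look up a builder by name and fail with the list of supported names."""
    try:
        builder = _BUILDERS[module]
    except KeyError:
        supported = ", ".join(sorted(_BUILDERS))
        raise ValueError(
            f"unsupported DSPy module {module!r}; expected one of: {supported}"
        ) from None
    return builder(signature)
```

Listing the supported names in the error message means a typo in a spec file is fixable from the traceback alone.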
Phase 3: Runnable And LangGraph Compatibility
Implement DSPyRunnable.
Implement create_dspy_runnable(...).
Implement create_dspy_node(...).
Convert dspy.Prediction to: dict, AIMessage, state update, raw prediction, and Pydantic output.
Preserve program id, runtime id, runtime UUID, signature, module type, and usage metadata.
Add sync and async paths when DSPy supports them.
Acceptance criteria:
The runnable supports .invoke(...) and .ainvoke(...).
The node returns a serializable LangGraph state update.
No live DSPy object is written into graph state.
Usage can be recorded into an existing UsageRecorder.
Phase 4: Optimizer And Artifact Workflows
Add DSPyOptimizationJob.
Add optimizer registry names for bootstrap_few_shot, mipro_v2, gepa, simba, and future optimizers.
Add dataset and metric references instead of embedding large datasets in config.
Add budget fields: max examples, max trials, max LM calls, max estimated cost, and timeout.
Add artifact save/load metadata around DSPy’s native save/load.
Add CLI commands:
ooai-agents dspy run arc-hypothesis --input task.json
ooai-agents dspy optimize arc-hypothesis --optimizer mipro-v2
ooai-agents dspy inspect artifacts/dspy/arc-hypothesis
Acceptance criteria:
Optimizer jobs are explicit, budgeted, and opt-in.
Artifacts include spec, model profile id, optimizer config, dataset refs, metrics, timestamp, and package versions.
Loading pickle/cloudpickle artifacts is clearly marked trusted-only.
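The budget fields listed for this phase could be modeled as a small validated value object. This is a sketch under assumptions: the class name OptimizationBudget, the field names, and the USD and seconds units are illustrative choices, not a settled schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OptimizationBudget:
    """Hypothetical explicit limits for one optimizer job."""

    max_examples: int
    max_trials: int
    max_lm_calls: int
    max_estimated_cost_usd: float
    timeout_seconds: float

    def __post_init__(self) -> None:
        # Every limit must be set to a positive value; there is no "unlimited".
        for name, value in vars(self).items():
            if value <= 0:
                raise ValueError(f"{name} must be positive, got {value!r}")

    def allows(self, *, lm_calls: int, estimated_cost_usd: float) -> bool:
        """Return True while the running totals stay within the budget."""
        return (
            lm_calls <= self.max_lm_calls
            and estimated_cost_usd <= self.max_estimated_cost_usd
        )
```

Requiring every limit up front is one way to make jobs "explicit, budgeted, and opt-in": a job without a budget simply cannot be constructed.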
Phase 5: Domain Integrations
ARC: hypothesis generation, rule proposal, transformation proposal, verification critique, and ensemble adjudication.
Coding: bug classification, repair-plan generation, patch critique, and generated-test evaluation.
Research: query decomposition, citation faithfulness, extraction, and synthesis.
Multi-agent: DSPy program nodes as specialist subagents or pre-model evidence generators.
Acceptance criteria:
Each domain has at least one runnable example.
DSPy program output can be compared against LangChain-agent output.
Usage/cost is visible in the same OOAI recorder summaries.
Testing Strategy
ooai-llm tests:
Fake dspy.LM constructor receives expected model name and kwargs.
ChatModelProfile maps to DSPyLMConfig.
configure_dspy_lm calls dspy.configure.
Missing dependency raises an optional-extra error.
Usage extraction handles absent usage gracefully.
ooai-agents tests:
DSPyProgramSpec serializes.
Program builders call the expected DSPy module constructor.
Runnable adapters return the configured output mode.
LangGraph nodes return serializable state updates.
Usage recording works with fake predictions.
Optimizer job specs validate budget and artifact paths.
Live tests should be opt-in only:
OOAI_REQUIRE_LIVE=true OOAI_LIVE_PROVIDERS=openai pdm run pytest -m live --no-cov
Open Questions
Should DSPyRunnable live in ooai-agents only, or should a tiny protocol live in ooai-llm for reuse outside agent workflows?
Which DSPy output field should become AIMessage.content by default: first output field, explicitly configured field, or JSON rendering?
Should optimizer artifacts live under ooai-agents storage policies or a separate experiment/artifact package later?
How much of DSPy’s native cache configuration should be mapped from CacheKeyPolicy versus left as DSPy-native kwargs?
Should ooai-agents expose CLI commands for DSPy immediately, or only after the program/runnable/node layer is stable?