Usage
Model strings
from ooai_llm import ModelString
model = ModelString.parse("gpt-5.4-mini")
assert model.provider_prefix == "openai"
assert model.model_name == "gpt-5.4-mini"
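The parsing behavior above can be approximated with a short standalone sketch. It assumes (this is not taken from the library source) that an explicit `provider:model` prefix wins and that bare names fall back to a small prefix table. `split_model_string`, `ParsedModel`, and `ASSUMED_PREFIXES` are hypothetical names used only for illustration:

```python
from typing import NamedTuple, Optional


class ParsedModel(NamedTuple):
    provider_prefix: Optional[str]
    model_name: str


# Hypothetical lookup table; the real package may infer providers differently.
ASSUMED_PREFIXES = {
    "gpt-": "openai",
    "claude-": "anthropic",
    "gemini-": "google_genai",
}


def split_model_string(value: str) -> ParsedModel:
    """Split 'provider:model', or infer the provider from a bare model name."""
    if ":" in value:
        provider, _, name = value.partition(":")
        return ParsedModel(provider, name)
    for prefix, provider in ASSUMED_PREFIXES.items():
        if value.startswith(prefix):
            return ParsedModel(provider, value)
    # Unknown bare names keep their name and carry no provider.
    return ParsedModel(None, value)


parsed = split_model_string("gpt-5.4-mini")
explicit = split_model_string("google_genai:gemini-2.5-pro")
```

Returning `None` for an unrecognized bare name, rather than guessing, leaves the decision to the caller.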
Provider defaults
from ooai_llm import AppSettings
settings = AppSettings()
assert settings.resolve_model(alias="cheap") == "openai:gpt-5.4-nano"
assert settings.resolve_model(alias="latest") == "openai:gpt-5.5"
reasoning_model = settings.resolve_model(provider="google", preset="reasoning")
assert reasoning_model == "google_genai:gemini-2.5-pro"
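Conceptually, the resolution above behaves like two lookups: one keyed by alias and one keyed by a (provider, preset) pair. The sketch below is a simplified stand-in, not the library's implementation. The tables only mirror the values asserted above, and the function body and error handling are assumptions:

```python
# Hypothetical tables mirroring the values asserted above; not the library's data.
ALIASES = {
    "cheap": "openai:gpt-5.4-nano",
    "latest": "openai:gpt-5.5",
}
PRESETS = {
    ("google", "reasoning"): "google_genai:gemini-2.5-pro",
}


def resolve_model(alias=None, provider=None, preset=None):
    """Return a model string from an alias, or from a (provider, preset) pair."""
    if alias is not None:
        return ALIASES[alias]
    if provider is not None and preset is not None:
        return PRESETS[(provider, preset)]
    raise ValueError("pass either alias=... or provider=... together with preset=...")


cheap = resolve_model(alias="cheap")
reasoning = resolve_model(provider="google", preset="reasoning")
```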
Cache bootstrap
from ooai_llm import AppSettings, configure_global_llm_cache
settings = AppSettings()
cache = configure_global_llm_cache(settings)
print(cache)
Supported cache backends are sqlite, memory, sqlalchemy, redis, and
upstash_redis:
settings = AppSettings(
    llm={
        "cache": {
            "backend": "redis",
            "redis_url": "redis://localhost:6379/0",
            "ttl": 3600,
        }
    }
)
configure_global_llm_cache(settings)
Factory helper
from ooai_llm import create_llm
llm = create_llm("openai:gpt-5.4-mini", temperature=0)
Bare model names can be paired with a provider when you want the provider to be explicit in code:
from ooai_llm import create_llm
llm = create_llm("claude-3-5-haiku-20241022", provider="anthropic")
Serializable profiles and runtime
Use ChatModelProfile when the same model configuration should be serialized,
validated, and reused across application code:
from ooai_llm import ChatModelProfile
profile = ChatModelProfile(
    id="assistant-profile",
    model="openai:gpt-5.4-mini",
    temperature=0,
    top_p=0.9,
    max_retries=2,
    reasoning="fast",
    cache={"namespace": "agents", "key": "assistant-v1"},
    run_name="assistant",
    tags=["prod"],
    cost_labels={"team": "platform"},
)
json_text = profile.to_json()
llm = profile.create_llm()
bundle = profile.create_bundle()
Use LLM when you want lazy construction plus observed usage/cost totals:
from ooai_llm import LLM, UsageRecorder, configure_logging
configure_logging(preset="dev")
runtime = LLM(id="assistant-runtime", profile=profile, recorder=UsageRecorder())
# result = runtime.invoke("Hello")
print(runtime.id)
print(runtime.uuid)
print(runtime.usage_summary.model_dump())
Profile controls include temperature, max_tokens, top_p, penalties,
seed, stop, max_retries, parallel_tool_calls, timeout, streaming,
model_kwargs, and constructor_kwargs.
Parallel tool calls are provider/model dependent. If omitted, the provider
wrapper keeps its native default. Set parallel_tool_calls=False when a tool
loop requires one tool call at a time:
profile = ChatModelProfile(
    model="openai:gpt-5.4-mini",
    parallel_tool_calls=False,
)
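One way a wrapper can keep the provider's native default is to forward only the options that were set explicitly. The sketch below illustrates that pattern under the assumption that `None` means "not set". `build_constructor_kwargs` is a hypothetical helper, not part of ooai_llm:

```python
from typing import Any, Dict, Optional


def build_constructor_kwargs(
    temperature: Optional[float] = None,
    parallel_tool_calls: Optional[bool] = None,
) -> Dict[str, Any]:
    """Forward only explicitly set options; None means 'keep the provider default'."""
    candidates = {
        "temperature": temperature,
        "parallel_tool_calls": parallel_tool_calls,
    }
    # False and 0 are real values and must be forwarded,
    # so compare against None instead of relying on truthiness.
    return {key: value for key, value in candidates.items() if value is not None}


omitted = build_constructor_kwargs(temperature=0)
explicit = build_constructor_kwargs(parallel_tool_calls=False)
```

The `is not None` check matters here: a truthiness test would silently drop `parallel_tool_calls=False` and `temperature=0`.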
Profile JSON can be checked from the CLI:
ooai-llm profiles validate --input profile.json
ooai-llm profiles render --input profile.json
ooai-llm profiles resolve --input profile.json --format json
Live model discovery
from ooai_llm import AppSettings, ListModelsConfig, list_available_models
settings = AppSettings()
result = list_available_models(
    "openai",
    settings=settings,
    config=ListModelsConfig(limit=5),
)
for model in result.models:
    print(model.model_string, model.display_name)
Provider SDKs are preferred when installed. REST fallbacks are used for supported providers where SDK listing is unavailable or explicitly disabled:
from ooai_llm import ListModelsConfig, list_available_models
result = list_available_models(
    "anthropic",
    config=ListModelsConfig(prefer_sdk=False, page_size=20),
)
Use list_model_catalog(...) or the CLI for cross-provider model inventory
with release-date, capability, cost, and context filters:
from ooai_llm import list_model_catalog
catalog = list_model_catalog(
    providers=["openai", "anthropic", "mistral"],
    source="litellm",
    capabilities=["tool_calling", "structured_output"],
    released_after="2026-01",
    min_context_tokens=128_000,
    min_output_tokens=8_000,
    max_output_cost_per_1m=200,
    sort_by="output_tokens",
)
for model in catalog.models:
    print(
        model.model_string,
        model.release_date,
        model.context_window,
        model.max_output_tokens,
        model.capability_labels,
    )
ooai-llm models list \
--source litellm \
--providers openai,anthropic,mistral \
--tool-calling-only \
--structured-output-only \
--released-after 2026-01 \
--min-input-tokens 128000 \
--min-output-tokens 8000 \
--max-output-cost-per-1m 200 \
--sort output_tokens
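The filters above can be read as a conjunction of predicates over catalog records. The following sketch uses made-up records to show that reading. The `CatalogEntry` fields, the inclusive treatment of `released_after`, and the descending sort are all assumptions for illustration, not the library's schema or documented behavior:

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CatalogEntry:
    # Hypothetical record; field names are illustrative only.
    model_string: str
    release_date: str  # ISO "YYYY-MM-DD"
    context_window: int
    max_output_tokens: int
    output_cost_per_1m: float
    capabilities: Set[str] = field(default_factory=set)


def filter_catalog(entries, capabilities, released_after, min_context_tokens,
                   min_output_tokens, max_output_cost_per_1m) -> List[CatalogEntry]:
    """Keep entries that satisfy every filter, largest output limit first."""
    required = set(capabilities)
    kept = [
        e for e in entries
        if required <= e.capabilities
        # ISO dates order correctly as strings, so "2026-01" acts as a lower bound.
        and e.release_date >= released_after
        and e.context_window >= min_context_tokens
        and e.max_output_tokens >= min_output_tokens
        and e.output_cost_per_1m <= max_output_cost_per_1m
    ]
    return sorted(kept, key=lambda e: e.max_output_tokens, reverse=True)


both = {"tool_calling", "structured_output"}
entries = [
    CatalogEntry("example:small", "2026-02-01", 128_000, 8_000, 2.0, set(both)),
    CatalogEntry("example:large", "2026-03-10", 200_000, 32_000, 15.0, both | {"vision"}),
    CatalogEntry("example:old", "2025-06-01", 200_000, 16_000, 10.0, set(both)),
    CatalogEntry("example:no-tools", "2026-02-15", 128_000, 8_000, 1.0, {"structured_output"}),
]
shortlist = filter_catalog(entries, ["tool_calling", "structured_output"],
                           "2026-01", 128_000, 8_000, 200)
```

Here `example:old` fails the release-date filter and `example:no-tools` lacks a required capability, so only two entries remain.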
Interactive table output uses Rich when available. The Rich view is intentionally
more compact than CSV/JSON: it shows a summary panel, combined input/output
pricing, input/output token limits, release date, and capability badges. Capability
filters include tool_calling, tool_choice, parallel_tool_calls, and
structured_output in addition to the chat, reasoning, coding, vision, and cheap labels.
Install ooai-llm[cli] for Rich output. Automation should use --format json
or --format csv; deterministic plain terminal output can use --no-rich:
ooai-llm models list --source litellm --providers openai,mistral --limit 5
ooai-llm models list --source litellm --providers openai,mistral --format csv
ooai-llm models list --source litellm --structured-output-only --sort output_tokens
ooai-llm models list --source litellm --providers openai,mistral --no-rich
ooai-llm recipes --topic rich
pdm run ooai-llm tui --providers mistral --views cheapest,catalog --limit 10
Use compare_model_catalog(...) when you want cost-ranked planning views for
a representative call shape:
from ooai_llm import compare_model_catalog, get_coding_model_comparison
comparison = compare_model_catalog(
    providers=["openai", "anthropic", "google", "deepseek", "mistral"],
    source="litellm",
    input_tokens=10_000,
    output_tokens=2_000,
    per_provider=True,
)
for estimate in comparison.estimates:
    print(
        estimate.model,
        estimate.call_cost_usd,
        estimate.calls_per_usd,
        estimate.max_input_tokens,
        estimate.max_output_tokens,
    )
coding = get_coding_model_comparison(
    providers=["openai", "deepseek", "mistral"],
    source="litellm",
    input_tokens=10_000,
    output_tokens=2_000,
)
CLI equivalents:
ooai-llm models compare \
--source litellm \
--providers openai,anthropic,google,deepseek,mistral \
--input-tokens 10000 \
--output-tokens 2000 \
--per-provider
ooai-llm models compare \
--source litellm \
--providers openai,anthropic,deepseek,mistral \
--coding-only \
--tool-calling-only \
--baseline openai:gpt-5-mini
ooai-llm models compare \
--source litellm \
--providers openai,mistral \
--structured-output-only \
--min-output-tokens 8000 \
--sort output_tokens \
--input-tokens 10000 \
--output-tokens 2000 \
--baseline openai:gpt-5-mini \
--limit 10 \
--format csv
ModelCostComparison.model_list(), model_dict(), by_provider(), and
equivalents(...) are useful for LangGraph routing experiments and budget
dashboards. The numbers are estimates from the selected catalog source; real
post-call accounting should use provider usage metadata.
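For intuition, an estimate for a representative call shape presumably reduces to per-million-token arithmetic. The sketch below shows that calculation with placeholder prices. The formula and the helper `estimate_call_cost` are assumptions about how such figures are typically derived, not the library's code:

```python
def estimate_call_cost(input_tokens, output_tokens,
                       input_cost_per_1m, output_cost_per_1m):
    """Return (call_cost_usd, calls_per_usd) for one representative call."""
    call_cost_usd = (
        input_tokens / 1_000_000 * input_cost_per_1m
        + output_tokens / 1_000_000 * output_cost_per_1m
    )
    # Guard against free models to avoid dividing by zero.
    calls_per_usd = 1 / call_cost_usd if call_cost_usd else float("inf")
    return call_cost_usd, calls_per_usd


# Placeholder prices in USD per 1M tokens, not real catalog data.
cost, calls = estimate_call_cost(
    10_000, 2_000, input_cost_per_1m=0.50, output_cost_per_1m=2.00
)
# About 0.009 USD per call, or roughly 111 calls per USD.
```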
Optional benchmark exploration
Benchmark exploration helpers are isolated under ooai_llm.benchmarks because
they may depend on third-party benchmark websites with less stable APIs than
provider model catalogs. Install the optional extra when you want the Rich CLI
views:
pdm add "ooai-llm[benchmarks]"
LiveCodeBench Pro support is intentionally exploratory. It reads the public leaderboard backend used by the website and can show overall ratings, easy/medium/hard pass rates, per-problem verdicts, and individual submission details. The backend is public but undocumented, so automation should treat it as best-effort.
from ooai_llm.benchmarks.livecodebench_pro import LiveCodeBenchProClient
client = LiveCodeBenchProClient()
models = client.list_models(status="active", limit=10)
hard = client.get_difficulty("hard", providers=["openai"], limit=5)
print([model.label for model in models])
print([(row.label, row.passrate_percent) for row in hard.llms])
ooai-llm benchmarks lcb-pro summary
ooai-llm benchmarks lcb-pro models --status active --limit 10
ooai-llm benchmarks lcb-pro difficulty --difficulty hard --provider openai
ooai-llm benchmarks lcb-pro submissions \
--model-name gpt-5.2-2025-12-11 \
--model-provider openai \
--difficulty hard \
--format json
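Because the leaderboard backend is undocumented, automation may want lookups to degrade gracefully instead of failing a whole job. The wrapper below is one possible pattern. `best_effort` and `flaky_lookup` are hypothetical names, and the lookup is a stand-in rather than a real client call:

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("benchmarks.best_effort")


def best_effort(fetch: Callable[[], T], default: Optional[T] = None) -> Optional[T]:
    """Run a benchmark lookup and fall back to `default` if it fails."""
    try:
        return fetch()
    except Exception as exc:  # broad on purpose: the backend contract is unknown
        logger.warning("benchmark lookup failed: %s", exc)
        return default


def flaky_lookup():
    # Stand-in for a client call such as client.list_models(...).
    raise ConnectionError("leaderboard backend unavailable")


models = best_effort(flaky_lookup, default=[])
```

Callers can then treat an empty result as "no benchmark data available" and continue with catalog data alone.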
Model suites
Use model suites when you want a repeatable shortlist for model comparisons, LangGraph node variants, or provider experiments:
from ooai_llm import get_model_suite
suite = get_model_suite(
    "comparison",
    providers=["openai", "anthropic", "deepseek", "mistral"],
    temperature=0,
    parallel_tool_calls=False,
)
assert suite.model_list()
profiles = suite.filter(roles=["cheap", "balanced"]).to_profiles()
runtimes = suite.create_runtimes()
The preset-backed suites use current AppSettings, so refreshed provider
defaults flow into them. Catalog-backed suites use the same filters as
list_model_catalog(...):
from ooai_llm import model_suite_from_catalog
suite = model_suite_from_catalog(
    providers=["openai", "anthropic", "mistral"],
    source="litellm",
    capabilities=["tool_calling", "structured_output"],
    released_after="2026-01",
    min_context_tokens=128_000,
    min_output_tokens=8_000,
    max_output_cost_per_1m=200,
    sort_by="output_tokens",
    limit=5,
)
ooai-llm models suite --suite practical --providers openai,anthropic,mistral
ooai-llm models suite --suite comparison --parallel-tool-calls false
ooai-llm models suite \
--from-catalog \
--source litellm \
--tool-calling-only \
--structured-output-only \
--sort output_tokens \
--limit 5
ooai-llm models suite --suite comparison --providers openai,mistral --no-rich
Update factory aliases and provider presets from live catalogs or LiteLLM metadata:
from ooai_llm import AppSettings, update_model_defaults
settings = AppSettings()
update = update_model_defaults(
    settings,
    providers=["openai", "anthropic", "mistral"],
    source="litellm",
)
settings = update.settings
print(settings.resolve_model(alias="latest"))
print(settings.resolve_model(provider="mistral", preset="coding"))
Write reusable overrides from the CLI:
ooai-llm models update --source litellm --providers openai,anthropic,mistral --format json
ooai-llm models update --source auto --provider openai --format env --output .env.models
Enable factory-time automatic refresh when the application should refresh aliases before model creation:
from ooai_llm import AppSettings, create_llm
settings = AppSettings(
    llm={
        "auto_refresh_models": {
            "enabled": True,
            "source": "auto",
            "providers": ["openai", "anthropic", "mistral"],
        }
    }
)
llm = create_llm(alias="latest", settings=settings)
This path is opt-in and cached for one hour by default. Use
force_model_refresh=True on a factory call to bypass the cache once.
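The caching behavior described above (reuse a refresh result for an hour unless a call forces a refresh) can be pictured with a small time-to-live cache. This is an illustrative sketch only. `RefreshCache` and `fake_refresh` are hypothetical, and the library's actual cache may work differently:

```python
import time
from typing import Callable, Dict, Optional, Tuple


class RefreshCache:
    """Cache a refresh result for `ttl_seconds`, with an optional one-off bypass."""

    def __init__(self, refresh: Callable[[], Dict[str, str]],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._refresh = refresh
        self._ttl = ttl_seconds
        self._clock = clock
        self._entry: Optional[Tuple[float, Dict[str, str]]] = None

    def get(self, force: bool = False) -> Dict[str, str]:
        now = self._clock()
        stale = self._entry is None or now - self._entry[0] >= self._ttl
        if force or stale:
            # Refresh and restart the TTL window from now.
            self._entry = (now, self._refresh())
        return self._entry[1]


calls = []


def fake_refresh():
    # Stand-in for a catalog lookup; records each invocation.
    calls.append(1)
    return {"latest": f"example:model-v{len(calls)}"}


cache = RefreshCache(fake_refresh)
first = cache.get()
second = cache.get()            # served from the cache
forced = cache.get(force=True)  # bypasses the cache once
```

Injecting the clock keeps the sketch testable without sleeping; a forced refresh also restarts the TTL window in this version.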
Reasoning
from ooai_llm import ReasoningConfig, build_reasoning_resolution, create_llm
resolution = build_reasoning_resolution(
    model="anthropic:claude-sonnet-4-20250514",
    reasoning="deep",
)
assert resolution is not None
assert resolution.constructor_kwargs["thinking"]["type"] == "adaptive"
llm = create_llm(
    "google_genai:gemini-2.5-flash",
    reasoning=ReasoningConfig(budget_tokens=1024, include_thoughts=True),
)
Metadata and cost accounting
from ooai_llm import BudgetPolicy, UsageRecorder, create_llm_bundle, make_litellm_cost_callback
bundle = create_llm_bundle("openai:gpt-5.4-mini", reasoning="fast")
print(bundle.metadata.capabilities.raw_profile)
print(bundle.metadata.pricing.input_cost_per_token)
recorder = UsageRecorder()
callback = make_litellm_cost_callback(
    recorder,
    budget=BudgetPolicy(warn_total_tokens=5000),
)
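A warning threshold such as `warn_total_tokens=5000` can be understood as a running total that triggers a warning once it is crossed. The sketch below models that idea with the standard library only. `TokenBudget` is a hypothetical stand-in, and warning exactly once is an assumption rather than documented behavior:

```python
import warnings
from dataclasses import dataclass


@dataclass
class TokenBudget:
    # Hypothetical stand-in for a budget policy with a warning threshold.
    warn_total_tokens: int
    total_tokens: int = 0
    warned: bool = False

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one call's usage and warn once when the threshold is crossed."""
        self.total_tokens += input_tokens + output_tokens
        if not self.warned and self.total_tokens >= self.warn_total_tokens:
            self.warned = True
            warnings.warn(f"token budget reached: {self.total_tokens} tokens used")


budget = TokenBudget(warn_total_tokens=5000)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    budget.record(input_tokens=3000, output_tokens=1000)  # 4000, below threshold
    budget.record(input_tokens=800, output_tokens=400)    # 5200, crosses threshold
    budget.record(input_tokens=100, output_tokens=100)    # already warned
```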
For LangChain and LangGraph flows, attach LangChainUsageCallbackHandler when
you want observed framework callback usage to land in the same recorder:
from ooai_llm import LangChainUsageCallbackHandler, UsageRecorder
recorder = UsageRecorder()
handler = LangChainUsageCallbackHandler(
    recorder,
    model="openai:gpt-5.4-mini",
    run_name="graph-run",
)