Usage

Model strings

from ooai_llm import ModelString

model = ModelString.parse("gpt-5.4-mini")
assert model.provider_prefix == "openai"
assert model.model_name == "gpt-5.4-mini"

Provider defaults

from ooai_llm import AppSettings

settings = AppSettings()

assert settings.resolve_model(alias="cheap") == "openai:gpt-5.4-nano"
assert settings.resolve_model(alias="latest") == "openai:gpt-5.5"
reasoning_model = settings.resolve_model(provider="google", preset="reasoning")
assert reasoning_model == "google_genai:gemini-2.5-pro"

Cache bootstrap

from ooai_llm import AppSettings, configure_global_llm_cache

settings = AppSettings()
cache = configure_global_llm_cache(settings)
print(cache)

Supported cache backends are sqlite, memory, sqlalchemy, redis, and upstash_redis. Select one under llm.cache in AppSettings, for example Redis:

from ooai_llm import AppSettings, configure_global_llm_cache

settings = AppSettings(
    llm={
        "cache": {
            "backend": "redis",
            "redis_url": "redis://localhost:6379/0",
            "ttl": 3600,
        }
    }
)
configure_global_llm_cache(settings)

Factory helper

from ooai_llm import create_llm

llm = create_llm("openai:gpt-5.4-mini", temperature=0)

Bare model names can be paired with a provider when you want the provider to be explicit in code:

from ooai_llm import create_llm

llm = create_llm("claude-3-5-haiku-20241022", provider="anthropic")
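To make the two call shapes above concrete, here is a minimal sketch of how a bare model name and an explicit provider could be combined into the provider:model form. It is not the ooai_llm implementation: to_model_string is a hypothetical helper, and it assumes the provider name is used verbatim as the prefix (the library may map names, as with google and google_genai in the provider defaults example).

```python
from typing import Optional


def to_model_string(model: str, provider: Optional[str] = None) -> str:
    """Hypothetical helper: return a "provider:model" string."""
    prefix, sep, _name = model.partition(":")
    if sep:
        # The string already carries a provider prefix; keep it as-is.
        if provider is not None and provider != prefix:
            raise ValueError(
                f"provider {provider!r} conflicts with prefix {prefix!r}"
            )
        return model
    if provider is None:
        # Assumption: this sketch does no provider inference for bare names.
        raise ValueError("a bare model name needs an explicit provider here")
    return f"{provider}:{model}"


print(to_model_string("openai:gpt-5.4-mini"))
print(to_model_string("claude-3-5-haiku-20241022", provider="anthropic"))
```

Rejecting a mismatched prefix and provider is a design choice of the sketch; it surfaces configuration mistakes early instead of silently preferring one of the two values.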

Serializable profiles and runtime

Use ChatModelProfile when the same model configuration should be serialized, validated, and reused across application code:

from ooai_llm import ChatModelProfile

profile = ChatModelProfile(
    id="assistant-profile",
    model="openai:gpt-5.4-mini",
    temperature=0,
    top_p=0.9,
    max_retries=2,
    reasoning="fast",
    cache={"namespace": "agents", "key": "assistant-v1"},
    run_name="assistant",
    tags=["prod"],
    cost_labels={"team": "platform"},
)

json_text = profile.to_json()
llm = profile.create_llm()
bundle = profile.create_bundle()

Use LLM when you want lazy construction plus observed usage/cost totals:

from ooai_llm import LLM, UsageRecorder, configure_logging

configure_logging(preset="dev")

# Reuses the `profile` object defined in the previous example.
runtime = LLM(id="assistant-runtime", profile=profile, recorder=UsageRecorder())

# result = runtime.invoke("Hello")
print(runtime.id)
print(runtime.uuid)
print(runtime.usage_summary.model_dump())

Profile controls include temperature, max_tokens, top_p, penalties, seed, stop, max_retries, parallel_tool_calls, timeout, streaming, model_kwargs, and constructor_kwargs.

Support for parallel tool calls depends on the provider and model. If parallel_tool_calls is omitted, the provider wrapper keeps its native default. Set parallel_tool_calls=False when a tool loop requires one tool call at a time:

from ooai_llm import ChatModelProfile

profile = ChatModelProfile(
    model="openai:gpt-5.4-mini",
    parallel_tool_calls=False,
)
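The "omitted means native default" behavior can be pictured as dropping unset options before they reach the provider wrapper. The sketch below is illustrative only; build_call_kwargs is a hypothetical function, not part of ooai_llm, and the real library may handle this differently.

```python
from typing import Any, Dict, Optional


def build_call_kwargs(
    parallel_tool_calls: Optional[bool] = None, **other: Any
) -> Dict[str, Any]:
    """Hypothetical helper: forward only the options that were set."""
    candidates = {"parallel_tool_calls": parallel_tool_calls, **other}
    # Compare against None rather than truthiness, so an explicit False
    # is forwarded while an unset option is left to the provider default.
    return {key: value for key, value in candidates.items() if value is not None}


print(build_call_kwargs(temperature=0))
print(build_call_kwargs(parallel_tool_calls=False, temperature=0))
```

The `is not None` check is the important detail: filtering on truthiness would also drop False and temperature=0, which are exactly the values a caller sets on purpose.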

Profile JSON can be checked from the CLI:

ooai-llm profiles validate --input profile.json
ooai-llm profiles render --input profile.json
ooai-llm profiles resolve --input profile.json --format json

Live model discovery

from ooai_llm import AppSettings, ListModelsConfig, list_available_models

settings = AppSettings()
result = list_available_models(
    "openai",
    settings=settings,
    config=ListModelsConfig(limit=5),
)

for model in result.models:
    print(model.model_string, model.display_name)

Provider SDKs are preferred when installed. REST fallbacks are used for supported providers where SDK listing is unavailable or explicitly disabled:

from ooai_llm import ListModelsConfig, list_available_models

result = list_available_models(
    "anthropic",
    config=ListModelsConfig(prefer_sdk=False, page_size=20),
)
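The SDK-first, REST-fallback order described above follows a common pattern, sketched here with plain callables. The function name, the use of None for "SDK not installed", and NotImplementedError for "SDK cannot list models" are all assumptions made for illustration, not the library's internals.

```python
from typing import Callable, List, Optional

Lister = Callable[[], List[str]]


def list_models_with_fallback(
    sdk_lister: Optional[Lister],
    rest_lister: Lister,
    prefer_sdk: bool = True,
) -> List[str]:
    """Hypothetical sketch: use the SDK when preferred and usable, else REST."""
    if prefer_sdk and sdk_lister is not None:
        try:
            return sdk_lister()
        except NotImplementedError:
            # Assumption: the SDK signals "listing unavailable" this way.
            pass
    return rest_lister()


def sdk() -> List[str]:
    return ["example:from-sdk"]


def rest() -> List[str]:
    return ["example:from-rest"]


print(list_models_with_fallback(sdk, rest))
print(list_models_with_fallback(None, rest))
print(list_models_with_fallback(sdk, rest, prefer_sdk=False))
```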

Use list_model_catalog(...) or the CLI for cross-provider model inventory with release-date, capability, cost, and context filters:

from ooai_llm import list_model_catalog

catalog = list_model_catalog(
    providers=["openai", "anthropic", "mistral"],
    source="litellm",
    capabilities=["tool_calling", "structured_output"],
    released_after="2026-01",
    min_context_tokens=128_000,
    min_output_tokens=8_000,
    max_output_cost_per_1m=200,
    sort_by="output_tokens",
)

for model in catalog.models:
    print(
        model.model_string,
        model.release_date,
        model.context_window,
        model.max_output_tokens,
        model.capability_labels,
    )

CLI equivalent:

ooai-llm models list \
  --source litellm \
  --providers openai,anthropic,mistral \
  --tool-calling-only \
  --structured-output-only \
  --released-after 2026-01 \
  --min-input-tokens 128000 \
  --min-output-tokens 8000 \
  --max-output-cost-per-1m 200 \
  --sort output_tokens
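To show what the catalog filters above select, the following sketch applies the same kinds of conditions to a few made-up records. CatalogEntry, filter_catalog, and the example:model-* entries are hypothetical; treating released_after as inclusive and sorting by output tokens in descending order are assumptions, not documented behavior.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class CatalogEntry:
    model_string: str
    release_date: str  # ISO date, so string comparison follows date order
    context_window: int
    max_output_tokens: int
    output_cost_per_1m: float
    capabilities: Set[str]


def filter_catalog(
    entries: Iterable[CatalogEntry],
    capabilities: Iterable[str],
    released_after: str,
    min_context_tokens: int,
    min_output_tokens: int,
    max_output_cost_per_1m: float,
) -> List[CatalogEntry]:
    """Keep entries that satisfy every filter, largest output limit first."""
    required = set(capabilities)
    kept = [
        entry
        for entry in entries
        if required <= entry.capabilities  # all requested capabilities present
        and entry.release_date >= released_after
        and entry.context_window >= min_context_tokens
        and entry.max_output_tokens >= min_output_tokens
        and entry.output_cost_per_1m <= max_output_cost_per_1m
    ]
    return sorted(kept, key=lambda entry: entry.max_output_tokens, reverse=True)


both = {"tool_calling", "structured_output"}
entries = [
    CatalogEntry("example:model-a", "2026-02-10", 200_000, 16_000, 10.0, set(both)),
    CatalogEntry("example:model-b", "2025-11-01", 200_000, 32_000, 10.0, set(both)),
    CatalogEntry("example:model-c", "2026-03-05", 128_000, 64_000, 15.0, both | {"vision"}),
    CatalogEntry("example:model-d", "2026-01-20", 256_000, 8_000, 5.0, {"tool_calling"}),
]

selected = filter_catalog(entries, both, "2026-01", 128_000, 8_000, 200)
print([entry.model_string for entry in selected])
# ['example:model-c', 'example:model-a']
```

Here model-b is dropped for its release date and model-d for lacking structured_output, which is the kind of narrowing the combined filters perform.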

Interactive table output uses Rich when it is available; install ooai-llm[cli] to get it. The Rich view is intentionally more compact than CSV or JSON: it shows a summary panel, combined input/output pricing, input- and output-token limits, release date, and capability badges. Capability filters include tool_calling, tool_choice, parallel_tool_calls, and structured_output, in addition to the chat, reasoning, coding, vision, and cheap labels. For automation, use --format json or --format csv; for deterministic plain terminal output, use --no-rich:

ooai-llm models list --source litellm --providers openai,mistral --limit 5
ooai-llm models list --source litellm --providers openai,mistral --format csv
ooai-llm models list --source litellm --structured-output-only --sort output_tokens
ooai-llm models list --source litellm --providers openai,mistral --no-rich
ooai-llm recipes --topic rich
pdm run ooai-llm tui --providers mistral --views cheapest,catalog --limit 10

Use compare_model_catalog(...) when you want cost-ranked planning views for a representative call shape:

from ooai_llm import compare_model_catalog, get_coding_model_comparison

comparison = compare_model_catalog(
    providers=["openai", "anthropic", "google", "deepseek", "mistral"],
    source="litellm",
    input_tokens=10_000,
    output_tokens=2_000,
    per_provider=True,
)

for estimate in comparison.estimates:
    print(
        estimate.model,
        estimate.call_cost_usd,
        estimate.calls_per_usd,
        estimate.max_input_tokens,
        estimate.max_output_tokens,
    )

coding = get_coding_model_comparison(
    providers=["openai", "deepseek", "mistral"],
    source="litellm",
    input_tokens=10_000,
    output_tokens=2_000,
)

CLI equivalents:

ooai-llm models compare \
  --source litellm \
  --providers openai,anthropic,google,deepseek,mistral \
  --input-tokens 10000 \
  --output-tokens 2000 \
  --per-provider

ooai-llm models compare \
  --source litellm \
  --providers openai,anthropic,deepseek,mistral \
  --coding-only \
  --tool-calling-only \
  --baseline openai:gpt-5-mini

ooai-llm models compare \
  --source litellm \
  --providers openai,mistral \
  --structured-output-only \
  --min-output-tokens 8000 \
  --sort output_tokens \
  --input-tokens 10000 \
  --output-tokens 2000 \
  --baseline openai:gpt-5-mini \
  --limit 10 \
  --format csv

ModelCostComparison.model_list(), model_dict(), by_provider(), and equivalents(...) are useful for LangGraph routing experiments and budget dashboards. The numbers are estimates from the selected catalog source; real post-call accounting should use provider usage metadata.
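As a worked example of the kind of estimate involved, the sketch below prices the representative call shape used above (10,000 input tokens, 2,000 output tokens). The per-million-token prices are hypothetical placeholders, and deriving calls per USD as the reciprocal of the call cost is an assumption about how such a field could be computed, not a statement about the library.

```python
def estimate_call_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_cost_per_1m: float,
    output_cost_per_1m: float,
) -> float:
    """Cost of one call, given prices quoted per one million tokens."""
    total = input_tokens * input_cost_per_1m + output_tokens * output_cost_per_1m
    return total / 1_000_000


# Hypothetical prices: $0.25 per 1M input tokens, $2.00 per 1M output tokens.
cost = estimate_call_cost_usd(10_000, 2_000, 0.25, 2.00)
calls_per_usd = 1 / cost

print(f"{cost:.4f}")           # 0.0065
print(f"{calls_per_usd:.1f}")  # 153.8
```

The input side contributes $0.0025 and the output side $0.0040, so the output price dominates even though the call has five times more input tokens.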

Optional benchmark exploration

Benchmark exploration helpers are isolated under ooai_llm.benchmarks because they may depend on third-party benchmark websites with less stable APIs than provider model catalogs. Install the optional extra when you want the Rich CLI views:

pdm add "ooai-llm[benchmarks]"

LiveCodeBench Pro support is intentionally exploratory. It reads the public leaderboard backend used by the website and can show overall ratings, easy/medium/hard pass rates, per-problem verdicts, and individual submission details. The backend is public but undocumented, so automation should treat it as best-effort.

from ooai_llm.benchmarks.livecodebench_pro import LiveCodeBenchProClient

client = LiveCodeBenchProClient()
models = client.list_models(status="active", limit=10)
hard = client.get_difficulty("hard", providers=["openai"], limit=5)

print([model.label for model in models])
print([(row.label, row.passrate_percent) for row in hard.llms])

The same views are available from the CLI:

ooai-llm benchmarks lcb-pro summary
ooai-llm benchmarks lcb-pro models --status active --limit 10
ooai-llm benchmarks lcb-pro difficulty --difficulty hard --provider openai
ooai-llm benchmarks lcb-pro submissions \
  --model-name gpt-5.2-2025-12-11 \
  --model-provider openai \
  --difficulty hard \
  --format json
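Because the LiveCodeBench Pro backend is public but undocumented, automation may want to wrap lookups so that a failure degrades to a default instead of aborting a job. The following is one possible sketch: best_effort and fetch_leaderboard are hypothetical names, and the set of exception types to catch is an assumption that should be adjusted to whatever the client actually raises.

```python
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def best_effort(
    fetch: Callable[[], T],
    default: Optional[T] = None,
    errors: Tuple[Type[BaseException], ...] = (OSError, ValueError, KeyError),
) -> Optional[T]:
    """Run fetch(); on an expected failure, report it and return default."""
    try:
        return fetch()
    except errors as exc:
        print(f"benchmark lookup skipped: {exc!r}")
        return default


def fetch_leaderboard() -> list:
    """Stand-in for a real client call; simulates an unreachable backend."""
    raise OSError("leaderboard backend unreachable")


rows = best_effort(fetch_leaderboard, default=[])
print(len(rows))
```

Listing the expected exception types explicitly, rather than catching everything, keeps programming errors visible while still tolerating network and schema failures.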

Model suites

Use model suites when you want a repeatable shortlist for model comparisons, LangGraph node variants, or provider experiments:

from ooai_llm import get_model_suite

suite = get_model_suite(
    "comparison",
    providers=["openai", "anthropic", "deepseek", "mistral"],
    temperature=0,
    parallel_tool_calls=False,
)

assert suite.model_list()
profiles = suite.filter(roles=["cheap", "balanced"]).to_profiles()
runtimes = suite.create_runtimes()

The preset-backed suites use the current AppSettings, so refreshed provider defaults flow into them. Catalog-backed suites accept the same filters as list_model_catalog(...):

from ooai_llm import model_suite_from_catalog

suite = model_suite_from_catalog(
    providers=["openai", "anthropic", "mistral"],
    source="litellm",
    capabilities=["tool_calling", "structured_output"],
    released_after="2026-01",
    min_context_tokens=128_000,
    min_output_tokens=8_000,
    max_output_cost_per_1m=200,
    sort_by="output_tokens",
    limit=5,
)

Suites are also available from the CLI:

ooai-llm models suite --suite practical --providers openai,anthropic,mistral
ooai-llm models suite --suite comparison --parallel-tool-calls false
ooai-llm models suite \
  --from-catalog \
  --source litellm \
  --tool-calling-only \
  --structured-output-only \
  --sort output_tokens \
  --limit 5
ooai-llm models suite --suite comparison --providers openai,mistral --no-rich

Update factory aliases and provider presets from live catalogs or LiteLLM metadata:

from ooai_llm import AppSettings, update_model_defaults

settings = AppSettings()
update = update_model_defaults(
    settings,
    providers=["openai", "anthropic", "mistral"],
    source="litellm",
)

settings = update.settings
print(settings.resolve_model(alias="latest"))
print(settings.resolve_model(provider="mistral", preset="coding"))

Write reusable overrides from the CLI:

ooai-llm models update --source litellm --providers openai,anthropic,mistral --format json
ooai-llm models update --source auto --provider openai --format env --output .env.models

Enable factory-time automatic refresh when the application should refresh aliases before model creation:

from ooai_llm import AppSettings, create_llm

settings = AppSettings(
    llm={
        "auto_refresh_models": {
            "enabled": True,
            "source": "auto",
            "providers": ["openai", "anthropic", "mistral"],
        }
    }
)

llm = create_llm(alias="latest", settings=settings)

Automatic refresh is opt-in, and refreshed results are cached for one hour by default. Pass force_model_refresh=True on a factory call to bypass the cache for that call.
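The one-hour cache with a one-off bypass can be modeled as a small time-gated wrapper. This is a sketch of the general technique, not the library's refresh code: CachedAliases, its force argument, and fake_refresh are hypothetical names.

```python
import time
from typing import Callable, Dict, List, Optional


class CachedAliases:
    """Hypothetical cache: refresh aliases at most once per TTL window."""

    def __init__(
        self,
        refresh: Callable[[], Dict[str, str]],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._refresh = refresh
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[Dict[str, str]] = None
        self._fetched_at = 0.0

    def get(self, force: bool = False) -> Dict[str, str]:
        now = self._clock()
        stale = self._value is None or now - self._fetched_at >= self._ttl
        if force or stale:
            # force=True bypasses the cache for this call only; later calls
            # are served from the newly stored value until it goes stale.
            self._value = self._refresh()
            self._fetched_at = now
        return self._value


refresh_calls: List[int] = []


def fake_refresh() -> Dict[str, str]:
    """Stand-in for a catalog lookup; records how often it runs."""
    refresh_calls.append(1)
    return {"latest": f"example:model-v{len(refresh_calls)}"}


aliases = CachedAliases(fake_refresh)
aliases.get()            # first call refreshes
aliases.get()            # served from the cache
aliases.get(force=True)  # refreshes again despite the fresh cache
print(len(refresh_calls))  # 2
```

Injecting the clock makes the expiry logic testable without waiting an hour.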

Reasoning

from ooai_llm import ReasoningConfig, build_reasoning_resolution, create_llm

resolution = build_reasoning_resolution(
    model="anthropic:claude-sonnet-4-20250514",
    reasoning="deep",
)
assert resolution is not None
assert resolution.constructor_kwargs["thinking"]["type"] == "adaptive"

llm = create_llm(
    "google_genai:gemini-2.5-flash",
    reasoning=ReasoningConfig(budget_tokens=1024, include_thoughts=True),
)

Metadata and cost accounting

from ooai_llm import BudgetPolicy, UsageRecorder, create_llm_bundle, make_litellm_cost_callback

bundle = create_llm_bundle("openai:gpt-5.4-mini", reasoning="fast")
print(bundle.metadata.capabilities.raw_profile)
print(bundle.metadata.pricing.input_cost_per_token)

recorder = UsageRecorder()
callback = make_litellm_cost_callback(
    recorder,
    budget=BudgetPolicy(warn_total_tokens=5000),
)
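A warning threshold such as warn_total_tokens can be thought of as a running total compared against a limit after each call. The sketch below illustrates that idea with a hypothetical TokenBudget class; it does not reproduce UsageRecorder or BudgetPolicy, and warning once the total reaches the limit (rather than strictly exceeds it) is an assumption.

```python
class TokenBudget:
    """Hypothetical tracker: accumulate tokens and flag a warning threshold."""

    def __init__(self, warn_total_tokens: int) -> None:
        self.warn_total_tokens = warn_total_tokens
        self.total_tokens = 0

    def record(self, input_tokens: int, output_tokens: int) -> bool:
        """Add one call's usage; return True once the threshold is reached."""
        self.total_tokens += input_tokens + output_tokens
        return self.total_tokens >= self.warn_total_tokens


budget = TokenBudget(warn_total_tokens=5000)
print(budget.record(3000, 500))   # False
print(budget.record(1000, 600))   # True
print(budget.total_tokens)        # 5100
```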

For LangChain and LangGraph flows, attach LangChainUsageCallbackHandler when usage observed through framework callbacks should be recorded in the same recorder:

from ooai_llm import LangChainUsageCallbackHandler, UsageRecorder

recorder = UsageRecorder()
handler = LangChainUsageCallbackHandler(
    recorder,
    model="openai:gpt-5.4-mini",
    run_name="graph-run",
)