CLI Guide

The CLI is designed around the same mental model as the Python package:

  • Catalog and suite commands help you choose and compare models.

  • ChatModelProfile stores serializable model configuration.

  • LLM is the runtime wrapper that invokes a model and records observed usage/cost.

Use ooai-llm --help when you need orientation. Use ooai-llm recipes when you want copy/paste CLI and Python package examples.

Install

The base console script is installed with the package. Rich table rendering is optional:

pip install "ooai-llm[cli]"

For provider catalogs through LiteLLM:

pip install "ooai-llm[litellm]"

For the interactive Textual UI:

pip install "ooai-llm[tui]"

Common Commands

Print guided help:

ooai-llm --help
ooai-llm models --help
ooai-llm models cheapest --help

Print package and terminal recipes:

ooai-llm recipes --topic cheapest
ooai-llm recipes --topic coding
ooai-llm recipes --topic rich
ooai-llm recipes --topic runtime --format markdown

Pretty terminal output is opt-in through extras. Install ooai-llm[cli] for Rich tables and ooai-llm[tui] for the Textual explorer. Rich tables are used automatically when installed; pass --no-rich for a plain deterministic table.
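The optional-extra behaviour described above can be pictured as an import-time fallback. This is a minimal sketch, not the package's actual renderer: render_plain and print_table are hypothetical names, and the only Rich calls used are Table(...), Table.add_row(...), and Console().print(...).

```python
from typing import Sequence


def render_plain(headers: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Render a fixed-width text table with no optional dependencies."""
    # Each column is as wide as its longest cell, header included.
    widths = [
        max(len(str(cell)) for cell in column)
        for column in zip(headers, *rows)
    ]
    lines = [
        "  ".join(str(cell).ljust(width) for cell, width in zip(line, widths)).rstrip()
        for line in (headers, *rows)
    ]
    return "\n".join(lines)


def print_table(headers: Sequence[str], rows: Sequence[Sequence[str]], use_rich: bool = True) -> None:
    """Use Rich when it is importable and allowed, otherwise print plain text."""
    if use_rich:
        try:
            from rich.console import Console
            from rich.table import Table
        except ImportError:
            # The extra is not installed, so fall back silently.
            use_rich = False
    if not use_rich:
        print(render_plain(headers, rows))
        return
    table = Table(*headers)
    for row in rows:
        table.add_row(*(str(cell) for cell in row))
    Console().print(table)
```

Calling print_table(headers, rows, use_rich=False) plays the role of --no-rich in this sketch: the output depends only on the data, which makes it easy to diff or snapshot in tests.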

Machine output is available for catalog, comparison, suite, and benchmark commands:

ooai-llm models cheapest --providers mistral --format json
ooai-llm models coding --providers openai,mistral --format csv
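Machine output is mostly useful from scripts. The sketch below shells out to the CLI and decodes the JSON. It uses only flags shown in this guide; the helper names are hypothetical, and the shape of the decoded payload is not documented here, so the sketch returns whatever json.loads produces without assuming any field names.

```python
import json
import subprocess


def build_cheapest_command(providers: list[str], fmt: str = "json") -> list[str]:
    """Build argv for `ooai-llm models cheapest` using flags shown in this guide."""
    return [
        "ooai-llm", "models", "cheapest",
        "--providers", ",".join(providers),
        "--format", fmt,
    ]


def run_json(command: list[str]):
    """Run the command and decode stdout as JSON.

    check=True raises CalledProcessError on a non-zero exit code, so a failed
    CLI call is not mistaken for an empty result.
    """
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

A script would call run_json(build_cheapest_command(["mistral"])) and inspect the result before relying on any particular keys.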

Interactive exploration:

ooai-llm tui \
  --source litellm \
  --providers openai,anthropic,mistral \
  --input-tokens 10000 \
  --output-tokens 2000 \
  --budget-usd 10 \
  --theme slate \
  --views cheapest,coding,catalog \
  --refresh-cooldown 2

Inside the TUI, use the top navigation buttons, Tab/Shift+Tab, n/p, or 1-7 to switch views. Use / to search visible metadata, the provider selector to narrow rows, and r to refresh after the cooldown. Use --refresh-cooldown 0 only when you are sure the selected source will not make expensive provider calls.
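A refresh cooldown of this kind is usually a small time gate. The following is an illustrative sketch only, with a hypothetical RefreshGate class; it is not taken from the TUI source. It shows why a cooldown of 0 lets every refresh through, and therefore every provider call.

```python
import time
from typing import Callable, Optional


class RefreshGate:
    """Allow a refresh only when the cooldown has elapsed since the last one."""

    def __init__(self, cooldown_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so the gate can be tested without sleeping
        self._last_refresh: Optional[float] = None

    def try_refresh(self) -> bool:
        """Return True and record the time if a refresh is allowed right now."""
        now = self._clock()
        if self._last_refresh is not None and now - self._last_refresh < self.cooldown_seconds:
            return False
        self._last_refresh = now
        return True
```

The monotonic clock is used so that system clock adjustments cannot shorten or lengthen the cooldown.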

Use --theme paper, --theme slate, --theme mono, or --theme forest to control the TUI palette.

Use --views cheapest,catalog or repeat --view catalog to load only selected table-backed surfaces. The selection is applied before the catalog, comparison, and suite helpers are called, so unselected views are never loaded. This saves more work than loading every view and filtering afterwards.
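One way to accept both spellings with argparse is to merge a comma-separated option with a repeatable one, then run only the loaders that were selected. This is a sketch under assumptions: the parser, selected_views, and load_selected are hypothetical and only mirror the flag names above.

```python
import argparse
from typing import Callable, Dict, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tui-sketch")
    parser.add_argument("--views", default="", help="comma-separated view names")
    parser.add_argument("--view", action="append", default=[], help="repeatable view name")
    return parser


def selected_views(argv: List[str]) -> List[str]:
    """Merge --views and repeated --view into one ordered, de-duplicated list."""
    args = build_parser().parse_args(argv)
    names = [name.strip() for name in args.views.split(",") if name.strip()]
    names.extend(args.view)
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(names))


def load_selected(names: List[str], loaders: Dict[str, Callable[[], object]]) -> Dict[str, object]:
    """Call only the loaders for the requested views; the others never run."""
    return {name: loaders[name]() for name in names if name in loaders}
```

Because selection happens before any loader is called, an expensive view that was not requested costs nothing.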

CLI Package Layout

The terminal surface is implemented as a package so new command groups and rendering helpers can be added without growing one giant module:

ooai_llm.cli.app      parser construction, dispatch, and command handlers
ooai_llm.cli.help     examples, choices, and shared parser arguments
ooai_llm.cli.recipes  copy/paste CLI and Python recipes

The console script still points at ooai_llm.cli:main, so existing installs and tests do not need a new entry point. The old ooai_llm.cli_help and ooai_llm.cli_recipes imports remain as compatibility shims.

Future CLI extensions should stay optional. Python’s standard-library argparse remains the base because it avoids another runtime dependency, while Rich handles current table output. If shell completion becomes a priority, add argcomplete as an optional cli-completion extra. If the command surface becomes truly typed/declarative, consider a Typer migration only after the current argparse behavior is covered by compatibility tests.
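Keeping completion optional could look like the sketch below. It assumes the argcomplete package, whose autocomplete(parser) call is a no-op unless the shell's completion hook is active. The parser here is a stand-in with a single subcommand, not the real ooai-llm parser.

```python
import argparse

try:
    import argcomplete  # optional; only present if the extra is installed
except ImportError:
    argcomplete = None


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser; the real CLI defines many more commands."""
    parser = argparse.ArgumentParser(prog="ooai-llm")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.add_parser("recipes")
    return parser


def parse(argv: list[str]) -> argparse.Namespace:
    parser = build_parser()
    if argcomplete is not None:
        # Must run before parse_args so completion requests are intercepted.
        argcomplete.autocomplete(parser)
    return parser.parse_args(argv)
```

With this shape the base install keeps working with plain argparse, and completion only activates when the optional dependency is present.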

Cheapest Models

Use models cheapest when the question is “how many calls do I get for this budget and token shape?”

ooai-llm models cheapest \
  --source litellm \
  --providers mistral \
  --input-tokens 10000 \
  --output-tokens 2000 \
  --budget-usd 10 \
  --limit 20

Compare one winner per provider:

ooai-llm models cheapest \
  --source litellm \
  --providers openai,anthropic,google,deepseek,mistral \
  --input-tokens 10000 \
  --output-tokens 2000 \
  --per-provider

Equivalent Python:

from ooai_llm import compare_model_catalog

comparison = compare_model_catalog(
    providers=["mistral"],
    source="litellm",
    input_tokens=10_000,
    output_tokens=2_000,
    budget_usd=10,
    sort_by="call_cost",
)

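# Each estimate pairs a model with its per-call cost and the number of
# calls that fit in the budget.
for row in comparison.estimates:
    print(row.model.as_langchain(), row.call_cost_usd, row.calls_per_budget)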
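The budget question reduces to simple arithmetic. The sketch below assumes per-million-token prices and assumes that whole calls are counted by rounding down. The Price class, the helper names, and the prices themselves are hypothetical and are not part of ooai_llm.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Price:
    provider: str
    model: str
    input_usd_per_1m: float   # hypothetical price per 1M input tokens
    output_usd_per_1m: float  # hypothetical price per 1M output tokens


def _token_dollars(price: Price, input_tokens: int, output_tokens: int) -> float:
    # Cost scaled by 1e6; dividing late avoids needless float rounding.
    return input_tokens * price.input_usd_per_1m + output_tokens * price.output_usd_per_1m


def call_cost_usd(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost of one representative call for the given token shape."""
    return _token_dollars(price, input_tokens, output_tokens) / 1_000_000


def calls_per_budget(price: Price, input_tokens: int, output_tokens: int, budget_usd: float) -> int:
    """Whole calls that fit in the budget (assumption: round down)."""
    scaled = _token_dollars(price, input_tokens, output_tokens)
    if scaled <= 0:
        raise ValueError("call cost must be positive")
    return math.floor(budget_usd * 1_000_000 / scaled)


def cheapest_per_provider(prices: Iterable[Price], input_tokens: int, output_tokens: int) -> Dict[str, Price]:
    """Keep the lowest-cost model for each provider."""
    winners: Dict[str, Price] = {}
    for price in prices:
        current = winners.get(price.provider)
        if current is None or (
            call_cost_usd(price, input_tokens, output_tokens)
            < call_cost_usd(current, input_tokens, output_tokens)
        ):
            winners[price.provider] = price
    return winners


PRICES = [
    Price("alpha", "alpha-small", 0.25, 2.0),
    Price("alpha", "alpha-large", 2.0, 8.0),
    Price("beta", "beta-mini", 0.5, 1.5),
]
```

With 10,000 input tokens and 2,000 output tokens, alpha-small costs (10,000 × 0.25 + 2,000 × 2.0) / 1,000,000 = 0.0065 USD per call, so a 10 USD budget covers 1,538 whole calls.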

Coding Models

Use models coding when you want models marked or inferred as coding-oriented. It accepts the same cost, context, date, and capability filters as models compare.

ooai-llm models coding \
  --source litellm \
  --providers openai,anthropic,google,deepseek,mistral \
  --tool-calling-only \
  --structured-output-only \
  --input-tokens 10000 \
  --output-tokens 2000 \
  --per-provider

Useful sorts:

  • --sort call_cost ranks by representative call cost.

  • --sort output_tokens finds larger completion budgets.

  • --sort context finds larger input/context windows.

  • --sort calls_per_usd ranks by calls per dollar.

Equivalent Python:

from ooai_llm import get_coding_model_comparison

comparison = get_coding_model_comparison(
    providers=["openai", "anthropic", "mistral"],
    source="litellm",
    input_tokens=10_000,
    output_tokens=2_000,
    capabilities=["tool_calling", "structured_output"],
)

for row in comparison.estimates:
    print(row.model.as_langchain(), row.call_cost_usd, row.capabilities)
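The sort options listed earlier can be read as a key plus a direction. The table below is a guess at those directions (cheapest first, largest limits first), and the row fields are hypothetical; check ooai-llm models coding --help for the real behaviour.

```python
from operator import itemgetter
from typing import Dict, List, Tuple

# sort name -> (row field, descending?)  Directions are assumptions.
SORTS: Dict[str, Tuple[str, bool]] = {
    "call_cost": ("call_cost_usd", False),
    "output_tokens": ("max_output_tokens", True),
    "context": ("max_input_tokens", True),
    "calls_per_usd": ("calls_per_usd", True),
}


def sort_rows(rows: List[dict], sort: str) -> List[dict]:
    """Return rows ordered by the named sort without mutating the input."""
    field, descending = SORTS[sort]
    return sorted(rows, key=itemgetter(field), reverse=descending)


def make_row(model: str, call_cost_usd: float, max_output_tokens: int, max_input_tokens: int) -> dict:
    """Build a hypothetical row; calls_per_usd is derived as 1 / call cost."""
    return {
        "model": model,
        "call_cost_usd": call_cost_usd,
        "max_output_tokens": max_output_tokens,
        "max_input_tokens": max_input_tokens,
        "calls_per_usd": 1 / call_cost_usd,
    }
```

If calls per dollar is derived as the reciprocal of call cost, as assumed here, then call_cost and calls_per_usd produce the same ranking and differ only in the number shown.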

Catalog Filters

Use models list when you want raw catalog metadata:

ooai-llm models list \
  --source litellm \
  --providers mistral \
  --sort cost \
  --limit 0

Capability filters can be combined:

ooai-llm models list \
  --source litellm \
  --tool-calling-only \
  --parallel-tool-calls-only \
  --structured-output-only \
  --min-input-tokens 128000 \
  --min-output-tokens 8000 \
  --sort output_tokens

Cost and release filters:

ooai-llm models list \
  --source litellm \
  --released-after 2026-01 \
  --max-input-cost-per-1m 1 \
  --max-output-cost-per-1m 5
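Combined filters behave like a conjunction: a row is kept only if it passes every filter that was given. The sketch below uses a hypothetical CatalogRow and treats the release bound as inclusive, which is an assumption; none of these names come from ooai_llm.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class CatalogRow:
    model: str
    capabilities: FrozenSet[str]
    max_input_tokens: int
    max_output_tokens: int
    released: str  # "YYYY-MM", which compares correctly as a string
    input_cost_per_1m: float
    output_cost_per_1m: float


def matches(
    row: CatalogRow,
    *,
    required_capabilities: Iterable[str] = (),
    min_input_tokens: int = 0,
    min_output_tokens: int = 0,
    released_after: Optional[str] = None,
    max_input_cost_per_1m: Optional[float] = None,
    max_output_cost_per_1m: Optional[float] = None,
) -> bool:
    """Return True only if the row passes every supplied filter."""
    return all([
        set(required_capabilities) <= row.capabilities,
        row.max_input_tokens >= min_input_tokens,
        row.max_output_tokens >= min_output_tokens,
        # Assumption: the bound itself is included.
        released_after is None or row.released >= released_after,
        max_input_cost_per_1m is None or row.input_cost_per_1m <= max_input_cost_per_1m,
        max_output_cost_per_1m is None or row.output_cost_per_1m <= max_output_cost_per_1m,
    ])
```

Filters left at their defaults always pass, so adding a flag can only narrow the result set.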

Model Suites

Suites are reusable shortlists. They are useful for LangGraph node variants, experiments, and enum/dict-style runtime setup.

ooai-llm models suite \
  --suite comparison \
  --providers openai,anthropic,mistral

Build a suite from catalog filters:

ooai-llm models suite \
  --from-catalog \
  --source litellm \
  --coding-only \
  --tool-calling-only \
  --structured-output-only \
  --sort cost \
  --limit 5 \
  --format json

Equivalent Python:

from ooai_llm import UsageRecorder, model_suite_from_catalog

suite = model_suite_from_catalog(
    providers=["mistral"],
    source="litellm",
    capabilities=["coding"],
    max_output_cost_per_1m=3,
    sort_by="cost",
    limit=5,
    temperature=0,
    parallel_tool_calls=True,
)

# Pass the same recorder to every runtime built from the suite.
recorder = UsageRecorder()
runtimes = {
    key: profile.create_runtime(recorder=recorder, id=key)
    for key, profile in suite.to_profiles().items()
}

Profiles And Runtime

Validate, normalize, and resolve profile files:

ooai-llm profiles validate --input profile.json
ooai-llm profiles render --input profile.json
ooai-llm profiles resolve --input profile.json --format json

Package-side runtime setup:

from enum import StrEnum  # StrEnum requires Python 3.11+
from ooai_llm import ChatModelProfile, UsageRecorder

class ModelChoice(StrEnum):
    CHEAP = "cheap"
    CODING = "coding"

profiles = {
    ModelChoice.CHEAP: ChatModelProfile(
        id="cheap",
        model="openai:gpt-5-mini",
        temperature=0,
    ),
    ModelChoice.CODING: ChatModelProfile(
        id="coding",
        model="mistral:YOUR_CODING_MODEL_FROM_CATALOG",
        temperature=0,
        parallel_tool_calls=True,
    ),
}

recorder = UsageRecorder()
runtimes = {
    key: profile.create_runtime(recorder=recorder, id=str(key))
    for key, profile in profiles.items()
}

Catalog comparison estimates cost from assumed token counts. Runtime accounting records observed usage after model calls when LangChain/provider metadata is available.