LLM cost optimization, AI latency, model routing
Measure unit economics early
Cost per successful workflow matters more than cost per token alone, because retries, failed runs, and escalations all add to what a completed task actually costs.
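A rough way to compare models on this basis is to divide the expected spend per attempt by the success rate. The sketch below assumes a workflow is retried until it succeeds, that attempts are independent, and that token prices are quoted per million tokens. The model names, prices, token counts, and success rates are hypothetical placeholders, not measured figures.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    """Hypothetical pricing and quality profile for one model."""
    name: str
    input_price_per_mtok: float   # USD per 1M input tokens (assumed pricing unit)
    output_price_per_mtok: float  # USD per 1M output tokens (assumed pricing unit)
    success_rate: float           # fraction of workflow attempts that succeed


def cost_per_attempt(model: ModelProfile, input_tokens: int, output_tokens: int) -> float:
    """Token spend for a single workflow attempt."""
    return (input_tokens * model.input_price_per_mtok
            + output_tokens * model.output_price_per_mtok) / 1_000_000


def cost_per_success(model: ModelProfile, input_tokens: int, output_tokens: int) -> float:
    """Expected spend per successful workflow, assuming retries until success.

    With independent attempts, the expected number of attempts is 1 / success_rate.
    """
    if not 0 < model.success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt(model, input_tokens, output_tokens) / model.success_rate


if __name__ == "__main__":
    # Placeholder profiles; replace with your own measured prices and success rates.
    small = ModelProfile("small-model", 0.20, 0.80, success_rate=0.40)
    large = ModelProfile("frontier-model", 2.50, 10.00, success_rate=0.95)
    for m in (small, large):
        print(m.name, round(cost_per_success(m, 6_000, 800), 5))
```

Per-failure overheads such as human review or escalation could be added to the numerator in the same way, which tends to narrow the gap between cheap and expensive models.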
Cache grounded contexts when the same queries repeat within a support shift, so retrieval work is not redone for every request.
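One way to do this is a small in-process cache keyed on a normalized query string, with entries expiring after roughly one shift. In the sketch below, the eight-hour TTL, the lowercase-and-collapse-whitespace normalization, and the `build_context` callback are all assumptions standing in for whatever retrieval pipeline actually produces the grounded context.

```python
import time
from typing import Callable, Dict, Tuple


class ShiftContextCache:
    """Caches grounded context per normalized query, expiring after a TTL.

    The TTL approximates one support shift (an assumed 8 hours by default).
    """

    def __init__(self, ttl_seconds: float = 8 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (stored_at, context)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(query: str) -> str:
        # Hypothetical normalization: case-fold and collapse whitespace.
        return " ".join(query.lower().split())

    def get_or_build(self, query: str, build_context: Callable[[str], str]) -> str:
        key = self._key(query)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            self.hits += 1
            return entry[1]
        # Miss or expired: rebuild the grounded context and store it with a timestamp.
        self.misses += 1
        context = build_context(query)
        self._entries[key] = (now, context)
        return context


if __name__ == "__main__":
    def retrieve(q: str) -> str:
        # Stand-in for a real retriever that assembles grounding documents.
        return f"[retrieved docs for: {q}]"

    cache = ShiftContextCache()
    cache.get_or_build("How do I reset my password?", retrieve)
    cache.get_or_build("how do I reset my  password?", retrieve)
    print(cache.hits, cache.misses)  # 1 1
```

Injecting the clock keeps expiry testable; a shared store such as Redis would be needed if several workers should reuse the same entries.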
Routing strategies
Route simple lookups to smaller models; reserve frontier models for multi-step reasoning.
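A routing layer can be as simple as a classifier placed in front of two model endpoints. The sketch below uses a hypothetical keyword-and-length heuristic to label a request as either a simple lookup or a multi-step reasoning task. The model names, marker words, and word-count threshold are placeholders; in practice the classifier might be a trained model or a small LLM call instead.

```python
import re
from dataclasses import dataclass

# Hypothetical model identifiers; substitute the endpoints you actually use.
SMALL_MODEL = "small-model"
FRONTIER_MODEL = "frontier-model"

# Assumed signals that a request needs multi-step reasoning (whole-word matches only).
REASONING_PATTERN = re.compile(r"\b(why|compare|plan|troubleshoot|explain|step by step)\b")


@dataclass(frozen=True)
class RouteDecision:
    model: str
    reason: str


def route(query: str, max_lookup_words: int = 12) -> RouteDecision:
    """Send short, lookup-style queries to the small model; escalate everything else."""
    text = query.lower()
    if REASONING_PATTERN.search(text):
        return RouteDecision(FRONTIER_MODEL, "reasoning marker present")
    if len(text.split()) > max_lookup_words:
        return RouteDecision(FRONTIER_MODEL, "long request")
    return RouteDecision(SMALL_MODEL, "simple lookup")


if __name__ == "__main__":
    for q in ("What is the refund window?", "Why was my card charged twice?"):
        print(q, "->", route(q))
```

Logging `reason` alongside cost and outcome makes it possible to check later whether the cheap route is actually succeeding, which ties routing back to cost per successful workflow.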
Retrieval shortcuts skip generation entirely when search alone returns an answer with sufficient confidence.
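A retrieval shortcut can be expressed as a threshold check on the top search result: if its score clears a confidence bar, return the stored answer directly and skip the model call. The `SearchHit` type, the 0.85 threshold, and the `search` and `generate` callbacks below are hypothetical; scores are assumed to be normalized to [0, 1], and a real threshold would need tuning against labeled queries.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SearchHit:
    """Hypothetical search result: a stored answer plus a normalized relevance score."""
    answer: str
    score: float  # assumed to be in [0, 1]


def answer_query(query: str,
                 search: Callable[[str], List[SearchHit]],
                 generate: Callable[[str, List[SearchHit]], str],
                 threshold: float = 0.85) -> Tuple[str, bool]:
    """Return (answer, used_generation).

    Skips generation when the best hit's score meets the threshold.
    """
    hits = sorted(search(query), key=lambda h: h.score, reverse=True)
    if hits and hits[0].score >= threshold:
        return hits[0].answer, False  # shortcut: no LLM call, no generation cost
    return generate(query, hits), True  # fall back to grounded generation
```

Tracking how often `used_generation` comes back `False` gives a direct measure of how much generation spend the shortcut avoids.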