Illustrative RESTful API for per-request AI cost tracking, model tier comparison, workflow cost tracing, and spend analytics. This documentation is part of a demonstration environment using synthetic request, workflow, and pricing examples.
// Demo API · Synthetic responses · Not connected to any live system or provider
The RequestCost API provides programmatic access to per-request AI cost attribution, model tier comparison, workflow cost tracing, and spend analytics. This RESTful API is designed to integrate with AI gateways, agent orchestration systems, FinOps platforms, and internal cost management tooling.
All API requests use Bearer token authentication passed via the Authorization header. API keys are scoped per workspace and carry configurable permission levels.
Authorization: Bearer rc_live_xxxxxxxxxxxxxxxxxxxxxxxx
// Demo key format — not a real credential
Authorization: Bearer rc_demo_a1b2c3d4e5f6g7h8i9j0
| Key Type | Prefix | Scope | Rate Limit |
|---|---|---|---|
| Live Key | rc_live_ | Full API access | 1,000 / min |
| Read-Only Key | rc_read_ | GET endpoints only | 500 / min |
| Demo Key | rc_demo_ | Synthetic data only | 100 / min |
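A minimal Python sketch of attaching the Bearer header and checking a key's prefix against the key-type table. The environment variable name REQUESTCOST_API_KEY and the helper names are hypothetical; only the header format and the prefixes come from this page.

```python
from __future__ import annotations

import os

# Prefixes from the key-type table; the labels on the right are illustrative.
KEY_PREFIXES = {
    "rc_live_": "live",
    "rc_read_": "read_only",
    "rc_demo_": "demo",
}


def key_type(api_key: str) -> str:
    """Return the key type implied by the key's prefix."""
    for prefix, kind in KEY_PREFIXES.items():
        if api_key.startswith(prefix):
            return kind
    raise ValueError("unrecognized API key prefix")


def auth_headers(api_key: str | None = None) -> dict[str, str]:
    """Build the Authorization header, reading a hypothetical env var if no key is given."""
    key = api_key if api_key is not None else os.environ.get("REQUESTCOST_API_KEY", "")
    key_type(key)  # fail fast before sending a request with a malformed key
    return {"Authorization": f"Bearer {key}"}
```

For example, auth_headers("rc_demo_a1b2c3d4e5f6g7h8i9j0") returns {"Authorization": "Bearer rc_demo_a1b2c3d4e5f6g7h8i9j0"}.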
All API requests should be directed to the following base URL:
https://api.requestcost.com/v1
// All endpoints are relative to this base
// Example: GET https://api.requestcost.com/v1/request/cost
Rate limits are applied per API key per minute. Requests that exceed the limit receive a 429 Too Many Requests response with a Retry-After header.
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1735689600
Retry-After: 34
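The retry behavior can be sketched in Python as below. It honors a numeric Retry-After value and otherwise falls back to exponential backoff, which is an assumption rather than documented behavior. send is a hypothetical callable wrapping your HTTP client; headers are treated as a plain mapping, so pass a case-insensitive one if your client provides it.

```python
import time
from typing import Callable, Mapping, Tuple

# (status_code, headers, body) as returned by whatever HTTP client you plug in.
Response = Tuple[int, Mapping[str, str], bytes]


def retry_delay(headers: Mapping[str, str], attempt: int, base: float = 1.0) -> float:
    """Seconds to wait before the next attempt after a 429."""
    value = headers.get("Retry-After")
    # Retry-After may also be an HTTP date; this sketch only handles delta-seconds.
    if value is not None and value.strip().isdigit():
        return float(value.strip())
    return base * (2 ** attempt)  # assumed fallback: exponential backoff


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), retrying on 429 for up to max_attempts total attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, headers, body = send()
        attempt += 1
        if status != 429 or attempt >= max_attempts:
            return status, headers, body
        sleep(retry_delay(headers, attempt - 1))
```

Injecting sleep keeps the helper testable without real waiting; the last response is returned unchanged once attempts run out, so callers still see the 429.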
| Status | Code | Description |
|---|---|---|
| 200 | success | Request completed successfully |
| 400 | invalid_params | Missing or malformed request parameters |
| 401 | unauthorized | Invalid or missing API key |
| 422 | unprocessable | Valid format but semantically invalid input |
| 429 | rate_limited | Rate limit exceeded; see the Retry-After header |
| 500 | server_error | Unexpected server-side error |
Returns a detailed cost breakdown for a specific AI request, including token costs, tool call overhead, workflow depth multiplier, and cache savings applied.
| Parameter | Type | Required | Description |
|---|---|---|---|
| request_id | string | Yes | Unique identifier for the AI request |
| model_tier | string | No | Filter by model tier: efficient, advanced, frontier, opensource |
| include_breakdown | boolean | No | Include per-factor cost breakdown in response. Default: true |
| currency | string | No | Response currency code. Default: USD |
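A hedged sketch of assembling the query string for this endpoint. It assumes the path is /request/cost (the example shown under the base URL) and that booleans are sent as lowercase true/false; the parameter table confirms neither.

```python
from __future__ import annotations

from urllib.parse import urlencode

BASE_URL = "https://api.requestcost.com/v1"


def request_cost_url(
    request_id: str,
    model_tier: str | None = None,
    include_breakdown: bool = True,
    currency: str = "USD",
) -> str:
    """Build a GET URL from the documented query parameters."""
    params = {"request_id": request_id}
    if model_tier is not None:
        params["model_tier"] = model_tier  # optional filter; omitted when unset
    params["include_breakdown"] = "true" if include_breakdown else "false"  # assumed encoding
    params["currency"] = currency
    # Assumed path, taken from the example request under the base URL.
    return f"{BASE_URL}/request/cost?{urlencode(params)}"
```

urlencode percent-encodes values, so request IDs containing reserved characters stay safe in the query string.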
Submits a request payload for cost analysis. Returns a full cost attribution breakdown including all contributing factors, workflow depth multiplier, and optimization recommendations.
| Field | Type | Required | Description |
|---|---|---|---|
| model_tier | string | Yes | Model tier used: efficient, advanced, frontier, opensource |
| input_tokens | integer | Yes | Total input tokens including system prompt |
| output_tokens | integer | Yes | Total output tokens generated |
| tool_calls | integer | No | Number of tool calls invoked. Default: 0 |
| workflow_depth | integer | No | Number of workflow steps. Default: 1 |
| cache_hit | boolean | No | Whether a cache hit was applied. Default: false |
| request_type | string | No | Type: chat, rag, agent, tool, batch, embedding |
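Validating the payload client-side can catch inputs that would otherwise come back as 400 or 422. The sketch below mirrors the field table and its defaults; the class name is hypothetical, and the non-negativity and minimum-depth checks are assumptions about what the API treats as semantically invalid.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Optional

MODEL_TIERS = {"efficient", "advanced", "frontier", "opensource"}
REQUEST_TYPES = {"chat", "rag", "agent", "tool", "batch", "embedding"}


@dataclass
class CostAnalysisRequest:
    model_tier: str
    input_tokens: int
    output_tokens: int
    tool_calls: int = 0          # documented default
    workflow_depth: int = 1      # documented default
    cache_hit: bool = False      # documented default
    request_type: Optional[str] = None  # optional; no documented default

    def __post_init__(self) -> None:
        if self.model_tier not in MODEL_TIERS:
            raise ValueError(f"unknown model_tier: {self.model_tier!r}")
        if self.request_type is not None and self.request_type not in REQUEST_TYPES:
            raise ValueError(f"unknown request_type: {self.request_type!r}")
        # Assumed semantic checks; the API's own rules may differ.
        if min(self.input_tokens, self.output_tokens, self.tool_calls) < 0:
            raise ValueError("token and tool-call counts must be non-negative")
        if self.workflow_depth < 1:
            raise ValueError("workflow_depth must be at least 1")

    def to_body(self) -> dict:
        """Return a JSON-serializable body, omitting request_type when unset."""
        body = asdict(self)
        if body["request_type"] is None:
            del body["request_type"]
        return body
```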
Returns an illustrative cost comparison across all available model tiers for a given request configuration. Useful for evaluating tier selection decisions based on workload type and volume.
Returns a step-by-step cost attribution trace for a workflow or agent chain. Each step includes its individual cost contribution, percentage of total cost, and step type classification.
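To illustrate how per-step percentages relate to the chain total, the sketch below computes each step's share from a list of step costs. The Step fields and step-type labels are hypothetical; the endpoint's actual response shape is not shown on this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    name: str        # hypothetical step label
    step_type: str   # hypothetical classification, e.g. "inference" or "tool_call"
    cost_usd: float


def trace_with_shares(steps: list[Step]) -> list[dict]:
    """Attach each step's percentage of the chain's total cost."""
    total = sum(s.cost_usd for s in steps)
    return [
        {
            "step": s.name,
            "type": s.step_type,
            "cost_usd": s.cost_usd,
            # Guard against an all-zero trace instead of dividing by zero.
            "pct": round(100 * s.cost_usd / total, 1) if total else 0.0,
        }
        for s in steps
    ]
```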
Accepts an array of request configurations and returns aggregated cost estimates for the entire batch, with per-request and summary-level breakdowns.
{
"requests": [
{
"model_tier": "efficient",
"input_tokens": 1100,
"output_tokens": 400,
"request_type": "batch"
},
{
"model_tier": "advanced",
"input_tokens": 3200,
"output_tokens": 1200,
"request_type": "rag"
}
],
"currency": "USD"
}
{
"status": "success",
"environment": "demo",
"request_count": 2,
"total_cost_usd": 0.0288,
"avg_cost_usd": 0.0144,
"results": [
{
"index": 0,
"model_tier": "efficient",
"estimated_cost": 0.0012,
"input_cost": 0.0006,
"output_cost": 0.0006
},
{
"index": 1,
"model_tier": "advanced",
"estimated_cost": 0.0276,
"input_cost": 0.0096,
"output_cost": 0.0180
}
],
"note": "Synthetic demo response. Illustrative pricing only."
}
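Batch totals should equal the sum of the per-request estimates, and the average is that sum divided by request_count. A minimal sketch of the aggregation using Decimal, rounding half-up to four places to match the precision of the sample response (the function name and rounding mode are assumptions):

```python
from decimal import ROUND_HALF_UP, Decimal

FOUR_PLACES = Decimal("0.0001")  # the sample response reports costs to 4 decimal places


def summarize_batch(results: list) -> dict:
    """Aggregate per-request estimated_cost values into batch-level figures."""
    # str() avoids carrying binary float error into Decimal.
    costs = [Decimal(str(r["estimated_cost"])) for r in results]
    total = sum(costs, Decimal("0"))
    avg = total / len(costs) if costs else Decimal("0")
    return {
        "request_count": len(costs),
        "total_cost_usd": total.quantize(FOUR_PLACES, rounding=ROUND_HALF_UP),
        "avg_cost_usd": avg.quantize(FOUR_PLACES, rounding=ROUND_HALF_UP),
    }
```

Summing before rounding avoids the drift that appears when already-rounded per-request values are added up.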
Returns an aggregated spend summary for a given time period, broken down by model tier, request type, and workflow category.
{
"status": "success",
"environment": "demo",
"period": "2026-01",
"total_requests": 248400,
"total_cost_usd": 3847.22,
"by_model_tier": {
"frontier": { "requests": 12400, "cost_usd": 1842.10 },
"advanced": { "requests": 98200, "cost_usd": 1624.88 },
"efficient": { "requests": 128600, "cost_usd": 342.14 },
"opensource": { "requests": 9200, "cost_usd": 38.10 }
},
"by_request_type": {
"rag": 1284.40,
"agent": 1102.18,
"chat": 842.34,
"batch": 482.10,
"embedding": 136.20
},
"note": "Synthetic demo response. Illustrative figures only."
}
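Because the response reports the same spend in three ways, a client can cheaply check that the breakdowns are internally consistent before charting them. The sketch below does this with Decimal; the function name and the one-cent tolerance are assumptions. Run against the sample above, all three checks pass.

```python
from decimal import Decimal


def _d(value) -> Decimal:
    return Decimal(str(value))  # str() keeps two-decimal inputs exact


def reconcile(summary: dict, tolerance: Decimal = Decimal("0.01")) -> dict:
    """Check that each breakdown adds back up to the reported totals."""
    total = _d(summary["total_cost_usd"])
    tier_cost = sum((_d(t["cost_usd"]) for t in summary["by_model_tier"].values()), Decimal("0"))
    type_cost = sum((_d(v) for v in summary["by_request_type"].values()), Decimal("0"))
    tier_requests = sum(t["requests"] for t in summary["by_model_tier"].values())
    return {
        "tier_costs_match": abs(tier_cost - total) <= tolerance,
        "type_costs_match": abs(type_cost - total) <= tolerance,
        "tier_requests_match": tier_requests == summary["total_requests"],
    }
```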
Returns a projected cost forecast based on current usage patterns and a configurable growth rate assumption. All projections are illustrative estimates based on synthetic data.
{
"status": "success",
"environment": "demo",
"forecast_months": 3,
"growth_assumption": "10% monthly",
"projection": [
{ "month": "2026-02", "est_cost_usd": 4231.94, "est_requests": 273240 },
{ "month": "2026-03", "est_cost_usd": 4655.14, "est_requests": 300564 },
{ "month": "2026-04", "est_cost_usd": 5120.65, "est_requests": 330620 }
],
"note": "Illustrative projection. Synthetic assumptions only."
}
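The sample projection matches compounding the January 2026 figures from the spend summary (3847.22 USD, 248400 requests) by 10% per month, rounding only the final values. A minimal sketch of that calculation; the function and parameter names are hypothetical.

```python
def project(base_cost: float, base_requests: int, year: int, month: int,
            months: int, growth: float) -> list:
    """Compound base figures forward month by month from (year, month)."""
    out = []
    for k in range(1, months + 1):
        # Advance the calendar month, rolling over the year after December.
        m_index = (month - 1) + k
        y, m = year + m_index // 12, m_index % 12 + 1
        factor = (1 + growth) ** k  # compound from the base, not from rounded values
        out.append({
            "month": f"{y:04d}-{m:02d}",
            "est_cost_usd": round(base_cost * factor, 2),
            "est_requests": round(base_requests * factor),
        })
    return out
```

project(3847.22, 248400, 2026, 1, 3, 0.10) reproduces the three rows shown above. Compounding from the unrounded base matters: re-rounding each month would give 4655.13 rather than 4655.14 for March.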
Returns an aggregated breakdown of cost by contributing factor across all requests in the specified period. Useful for identifying the primary cost drivers in your AI workload.
{
"status": "success",
"environment": "demo",
"period": "2026-01",
"factors": {
"input_tokens": { "cost_usd": 1542.10, "pct": 40.1 },
"output_tokens": { "cost_usd": 1284.40, "pct": 33.4 },
"tool_calls": { "cost_usd": 482.18, "pct": 12.5 },
"workflow_depth": { "cost_usd": 346.24, "pct": 9.0 },
"retries": { "cost_usd": 154.22, "pct": 4.0 },
"cache_savings": { "cost_usd": -61.92, "pct": -1.6 }
},
"note": "Synthetic demo response. Illustrative figures only."
}
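Each pct in the sample is that factor's cost divided by the period's total spend (3847.22 USD in the spend summary), not by the sum of the listed factors. A small sketch that computes shares this way and also reports any spend the listed factors do not cover; the function name and the unattributed_usd key are hypothetical.

```python
def factor_shares(factors_usd: dict, period_total: float) -> dict:
    """Percent of period_total per factor, plus any unattributed remainder."""
    shares = {
        # Savings such as caching carry negative cost, so their share is negative.
        name: round(100 * cost / period_total, 1)
        for name, cost in factors_usd.items()
    }
    unattributed = round(period_total - sum(factors_usd.values()), 2)
    return {"pct": shares, "unattributed_usd": unattributed}
```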
A single AI request can vary significantly in cost depending on a combination of the following factors. Understanding these factors is the foundation of request-level cost intelligence.
| Factor | Impact | Description |
|---|---|---|
| model_tier_selected | High | Primary cost driver; pricing varies significantly across tiers |
| input_output_token_length | High | Token count directly determines base inference cost |
| tool_calls_invoked | Medium | Each tool call adds overhead beyond base token cost |
| workflow_depth | Medium | Multi-step chains scale base cost by a depth multiplier |
| retries_and_fallbacks | Medium | Failed or retried requests add to effective cost |
| caching_behavior | Reduces | Prompt caching can reduce effective input token cost |
| routing_logic | Variable | Smart routing may reduce cost by directing requests to lower tiers |
| provider_pricing_tier | Variable | Volume discounts and tier agreements affect the effective rate |
The RequestCost demo uses four illustrative model tier classifications. These tiers are generic descriptors and are not affiliated with any specific provider or model family.
| Tier | Input / 1K tokens | Output / 1K tokens | Typical Use |
|---|---|---|---|
| frontier | ~$0.015 | ~$0.060 | Complex reasoning, research, advanced agents |
| advanced | ~$0.003 | ~$0.015 | Balanced workloads, RAG, moderate complexity |
| efficient | ~$0.0005 | ~$0.0015 | High-volume, simple chat, classification |
| opensource | ~$0.0001 | ~$0.0003 | Self-hosted; infrastructure cost not included |
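With these approximate per-1K prices, a tier comparison is simple arithmetic: tokens / 1000 × price, summed over input and output. A sketch with the prices copied from the table; the function name is hypothetical, and the opensource figure excludes self-hosting infrastructure, so it understates that tier's real cost.

```python
# Approximate USD per 1K tokens (input, output), copied from the illustrative tier table.
TIER_PRICES = {
    "frontier": (0.015, 0.060),
    "advanced": (0.003, 0.015),
    "efficient": (0.0005, 0.0015),
    "opensource": (0.0001, 0.0003),  # excludes self-hosting infrastructure cost
}


def compare_tiers(input_tokens: int, output_tokens: int) -> list:
    """Return (tier, estimated_cost_usd) pairs, cheapest first."""
    rows = [
        (tier, input_tokens / 1000 * p_in + output_tokens / 1000 * p_out)
        for tier, (p_in, p_out) in TIER_PRICES.items()
    ]
    return sorted(rows, key=lambda row: row[1])
```

For 3200 input and 1200 output tokens, the order is opensource, efficient, advanced, frontier, with advanced at about 0.0276 USD and frontier at about 0.12 USD.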
Workflow depth refers to the number of sequential steps in an agent or automation chain. Each additional step can add cost through additional inference calls, tool invocations, and retries.
| Classification | Steps | Cost Multiplier | Example |
|---|---|---|---|
| simple | 1 | ×1.0 | Single chat turn, classification |
| moderate | 2–4 | ×1.4 | RAG pipeline, simple tool use |
| complex | 5–10 | ×2.1 | Multi-tool agent, research workflow |
| deep | 11+ | ×3.5 | Deep agent chain, autonomous task runner |
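A sketch of applying the depth multiplier to a base (single-step) cost. The helper names are hypothetical. The sketch treats exactly 10 steps as complex, so adjust that boundary if the service classifies it differently.

```python
# (upper bound on steps, classification, multiplier) from the workflow depth table.
DEPTH_CLASSES = [
    (1, "simple", 1.0),
    (4, "moderate", 1.4),
    (10, "complex", 2.1),  # assumption: exactly 10 steps counts as complex
]


def classify_depth(steps: int) -> tuple:
    """Map a step count to (classification, cost multiplier)."""
    if steps < 1:
        raise ValueError("a workflow has at least one step")
    for upper, label, multiplier in DEPTH_CLASSES:
        if steps <= upper:
            return label, multiplier
    return "deep", 3.5


def depth_adjusted_cost(base_cost_usd: float, steps: int) -> float:
    """Scale a single-step cost by the depth multiplier."""
    return base_cost_usd * classify_depth(steps)[1]
```

For example, a 3-step RAG chain with a 0.0276 USD base cost comes to about 0.0386 USD.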
List endpoints use cursor-based pagination. Each response includes a next_cursor value that can be passed to retrieve the next page of results.
{
"data": [ /* ... results ... */ ],
"pagination": {
"has_more": true,
"next_cursor": "cur_a1b2c3d4e5f6",
"page_size": 50,
"total_count": 4820
}
}
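A minimal sketch of draining a cursor-paginated list endpoint. fetch_page is a hypothetical callable that performs one request with the given cursor (None for the first page). This page does not specify which query parameter carries the cursor, so that detail is left to the caller.

```python
from typing import Callable, Iterator, Optional


def iter_all(fetch_page: Callable[[Optional[str]], dict]) -> Iterator:
    """Yield every item across pages until has_more is false."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        pagination = page["pagination"]
        if not pagination.get("has_more"):
            return
        cursor = pagination["next_cursor"]
```

Because it is a generator, callers can stop early (for example with itertools.islice) without fetching the remaining pages.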