- Refactored handleChatCore to include Caveman functionality, allowing for terse-style system prompts to reduce output token usage.
- Updated APIPageClient to manage Caveman settings, including enabling/disabling and selecting compression levels.
- Adjusted AntigravityExecutor to consolidate function declarations for compatibility with Gemini.
- Removed unnecessary console logs during translator initialization across multiple routes.
Add clientDetector utility to identify CLI tools (Claude Code, Gemini CLI,
Antigravity, Codex) from request headers. When the CLI tool and provider
are a native pair, skip all translation — only swap model and Bearer token.
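A minimal sketch of the detection idea, assuming illustrative helper names, user-agent substrings, and pair table (the real utility may key off other headers and clients too):

```js
// Illustrative only: names, substrings, and the pair table are assumptions.
// Antigravity omitted for brevity.
const NATIVE_PAIRS = {
  'claude-code': 'anthropic',
  'gemini-cli': 'gemini',
  codex: 'openai',
};

function detectClient(headers) {
  const ua = (headers['user-agent'] || '').toLowerCase();
  if (ua.includes('claude')) return 'claude-code';
  if (ua.includes('gemini')) return 'gemini-cli';
  if (ua.includes('codex')) return 'codex';
  return null;
}

// Native pair → pass the body through untouched; only rewrite the model
// name and the Authorization: Bearer token.
function shouldSkipTranslation(headers, provider) {
  const client = detectClient(headers);
  return client !== null && NATIVE_PAIRS[client] === provider;
}
```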
Made-with: Cursor
- Respect Accept: application/json header to return non-streaming JSON
instead of SSE, fixing AI SDK generateObject/generateText compatibility
- Strip markdown code block markers (```json...```) from Claude
non-streaming responses to prevent JSON parse errors
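A sketch of the fence-stripping step, assuming a simple regex approach (the actual implementation may differ):

```js
// Remove a leading ```json (or bare ```) fence and the trailing ``` so the
// body parses as JSON. Returns the input unchanged when no fence is found.
function stripCodeFences(text) {
  const match = text.trim().match(/^```(?:json)?\s*\n([\s\S]*?)\n?```$/);
  return match ? match[1] : text;
}

// stripCodeFences('```json\n{"ok":true}\n```')  →  '{"ok":true}'
```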
Cherry-picked and adapted from PR #290 by @rothnic
https://github.com/decolua/9router/pull/290
Made-with: Cursor
- Centralize proxy management with reusable proxy pools
- Per-connection proxy binding with legacy fallback
- Add strictProxy option: fail hard instead of silently falling back to direct
- Resolve alicode-intl conflict: keep alicode-intl support + proxy support
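A rough sketch of the resolution order, with hypothetical names (`resolveProxy`, the pool shape) standing in for the real API:

```js
// Hypothetical shape: per-connection binding first, then the connection's
// pool, then the legacy global fallback.
function resolveProxy(connection, pools, { strictProxy = false } = {}) {
  const proxy =
    connection.proxy ||
    pools[connection.proxyPoolId]?.next() ||
    pools.legacy?.next() ||
    null;

  if (!proxy && strictProxy) {
    // strictProxy: fail hard instead of silently falling back to direct.
    throw new Error(`No proxy available for connection ${connection.id}`);
  }
  return proxy; // null means direct, and only when strictProxy is off
}
```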
Made-with: Cursor
- Tracks endpoints like /v1/chat/completions, /v1/messages, /v1/responses
- New sortable/groupable table in usage dashboard with expandable groups
- Enhanced usage database aggregation by endpoint + model + provider
- Added endpoint tracking to all saveRequestUsage/saveRequestDetail calls
- Maintains backward compatibility with existing data structure
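A sketch of the endpoint + model + provider grouping (field and key shapes are assumptions):

```js
// Group usage rows by endpoint + model + provider for the dashboard table.
function aggregateByEndpoint(rows) {
  const groups = {};
  for (const row of rows) {
    const key = `${row.endpoint}|${row.model}|${row.provider}`;
    const g = (groups[key] ||= { requests: 0, inputTokens: 0, outputTokens: 0 });
    g.requests += 1;
    g.inputTokens += row.inputTokens || 0;
    g.outputTokens += row.outputTokens || 0;
  }
  return groups; // e.g. "/v1/messages|claude-sonnet|anthropic" → totals
}
```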
- Add response logging for non-streaming requests (5_res_provider.json, 7_res_client.json)
- Fix extractUsageFromResponse() to check Claude format before OpenAI format
- Prevents format misidentification that caused tokens to show as 0
- Claude uses input_tokens/output_tokens vs OpenAI's prompt_tokens/completion_tokens
Fixes dashboard Details tab showing 0 tokens for Claude requests
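A sketch of the corrected check order (the function body is illustrative, not the exact source):

```js
function extractUsageFromResponse(response) {
  const u = response?.usage;
  if (!u) return { inputTokens: 0, outputTokens: 0 };

  // Claude first: both formats carry a `usage` object, so an OpenAI-first
  // check matched Claude responses and read the (absent) OpenAI keys as 0.
  if (u.input_tokens !== undefined) {
    return { inputTokens: u.input_tokens, outputTokens: u.output_tokens ?? 0 };
  }
  // OpenAI format
  return {
    inputTokens: u.prompt_tokens ?? 0,
    outputTokens: u.completion_tokens ?? 0,
  };
}
```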
- Added detailed request logging and latency tracking in handleChatCore.
- Improved error handling for SSE to JSON conversion and JSON parsing in streamToJsonConverter.
- Introduced a safe JSON parsing utility to handle potential parsing errors gracefully in requestDetailsDb.
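The safe-parse utility is plausibly along these lines (the exact signature is an assumption):

```js
// Return a fallback instead of throwing on malformed JSON.
function safeJsonParse(text, fallback = null) {
  try {
    return JSON.parse(text);
  } catch {
    return fallback;
  }
}
```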
Co-authored-by: zx <me@char.moe>
* feat: add AI request details feature with latency tracking
Add comprehensive request history and debugging capability to the Usage dashboard:
**Storage Layer** (usageDb.js):
- Add saveRequestDetail() for storing full request/response details
- Implement FIFO queue with 1000-record limit in request-details.json
- Auto-sanitize sensitive headers (authorization, api-key, cookie, token)
- Add getRequestDetails() with pagination and filtering support
- Add getRequestDetailById() for single record lookup
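A minimal sketch of the sanitization and FIFO cap, assuming illustrative helper names:

```js
const SENSITIVE_HEADERS = ['authorization', 'api-key', 'cookie', 'token'];

// Redact sensitive header values before persisting a record.
function sanitizeHeaders(headers = {}) {
  const clean = {};
  for (const [name, value] of Object.entries(headers)) {
    clean[name] = SENSITIVE_HEADERS.includes(name.toLowerCase())
      ? '[REDACTED]'
      : value;
  }
  return clean;
}

// FIFO cap: keep only the newest 1000 records.
function enforceLimit(records, max = 1000) {
  return records.length > max ? records.slice(-max) : records;
}
```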
**Pipeline Integration** (chatCore.js):
- Track request start time and calculate total latency
- Record TTFT (Time To First Token) and total latency for all requests
- Capture full request details (messages, model, parameters)
- Save response content for non-streaming, mark streaming responses
- Handle error cases with detailed error information
- Async non-blocking saves to avoid impacting request performance
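The timing bookkeeping is roughly this shape (function boundaries and the record layout are assumptions):

```js
// Sketch: wrap a request handler with TTFT/total latency capture.
async function withLatencyTracking(runRequest, saveRequestDetail) {
  const startTime = Date.now();
  let ttft = null;

  await runRequest({
    onFirstToken() {
      if (ttft === null) ttft = Date.now() - startTime; // Time To First Token
    },
  });

  // Fire-and-forget: the save never blocks the response pipeline.
  Promise.resolve(
    saveRequestDetail({ latency: { ttft, total: Date.now() - startTime } })
    // plus messages, model, parameters, response content or error info
  ).catch(() => {});
}
```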
**API Layer** (/api/usage/request-details):
- GET endpoint with pagination (page, pageSize: 1-100)
- Filter by provider, model, connectionId, status, date range
- Returns { details: [...], pagination: {...} } format
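For example, a client might call it like this (query values are illustrative):

```js
// e.g. from a dashboard component:
const res = await fetch(
  '/api/usage/request-details?page=1&pageSize=20&provider=anthropic'
);
const { details, pagination } = await res.json();
// details: sanitized request/response records for the current page
// pagination: page metadata (page, pageSize, totals)
```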
**UI Components**:
- Drawer.js: Right slide-out panel with backdrop blur and ESC close
- Pagination.js: Full pagination with page size selector (10/20/50)
- RequestDetailsTab.js: Complete table view with filters and detail drawer
**Dashboard Integration**:
- Add "Details" tab to Usage page (4th tab after Overview/Logger/Limits)
- Table columns: Timestamp, Model, Provider, Input Tokens, Output Tokens, Latency (TTFT/Total), Action
- Provider filter dropdown (9 providers supported)
- Date range filters (start/end datetime)
- Click "Detail" button to view full request/response JSON in slide-out drawer
**Features**:
- Real-time latency monitoring (TTFT & Total)
- Complete request/response inspection for debugging
- Filterable and searchable request history
- Responsive design with mobile-friendly filters
- Data security with automatic header sanitization
- Performance: async saves don't block request pipeline
**Files Created/Modified**:
- src/lib/usageDb.js (modified)
- open-sse/handlers/chatCore.js (modified)
- src/app/api/usage/request-details/route.js (new)
- src/shared/components/Drawer.js (new)
- src/shared/components/Pagination.js (new)
- src/app/(dashboard)/dashboard/usage/components/RequestDetailsTab.js (new)
- src/app/(dashboard)/dashboard/usage/page.js (modified)
Closes: AI Observability Dashboard feature
* feat: enhance request details with full config and streaming content capture
Improve Request Details feature to capture comprehensive request parameters
and actual streaming response content:
**Request Configuration Enhancement** (chatCore.js):
- Add extractRequestConfig() helper function to capture all request parameters
- Include temperature controls: temperature, top_p, top_k
- Include token limits: max_tokens, max_completion_tokens
- Include thinking/reasoning modes: thinking, reasoning, enable_thinking
- Include OpenAI parameters: presence_penalty, frequency_penalty, seed, stop,
tools, tool_choice, response_format, n, logprobs, top_logprobs, logit_bias,
user, parallel_tool_calls, prediction, store, metadata
- Apply to all request types: non-streaming, streaming, and error cases
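A plausible shape for the helper, using the parameter list above as a whitelist:

```js
const CONFIG_KEYS = [
  'temperature', 'top_p', 'top_k',
  'max_tokens', 'max_completion_tokens',
  'thinking', 'reasoning', 'enable_thinking',
  'presence_penalty', 'frequency_penalty', 'seed', 'stop',
  'tools', 'tool_choice', 'response_format', 'n',
  'logprobs', 'top_logprobs', 'logit_bias', 'user',
  'parallel_tool_calls', 'prediction', 'store', 'metadata',
];

// Copy only the parameters the client actually sent.
function extractRequestConfig(body = {}) {
  const config = {};
  for (const key of CONFIG_KEYS) {
    if (body[key] !== undefined) config[key] = body[key];
  }
  return config;
}
```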
**Streaming Content Capture** (chatCore.js & stream.js):
- Add onStreamComplete callback mechanism to stream processors
- Accumulate content from all formats: OpenAI, Claude, Gemini
- Track content from delta.content, delta.reasoning_content, delta.text,
delta.thinking, and Gemini content.parts
- Save initial record with "[Streaming in progress...]" marker
- Update record with actual content when stream completes
- Include usage tokens when available from stream
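Per chunk, the accumulation plausibly looks like this (field paths follow the list above; the surrounding structure is illustrative):

```js
let accumulated = '';

function accumulateChunk(parsed) {
  const delta = parsed.choices?.[0]?.delta ?? {};
  if (delta.content) accumulated += delta.content;                    // OpenAI
  if (delta.reasoning_content) accumulated += delta.reasoning_content;
  if (parsed.delta?.text) accumulated += parsed.delta.text;           // Claude
  if (parsed.delta?.thinking) accumulated += parsed.delta.thinking;
  for (const part of parsed.candidates?.[0]?.content?.parts ?? []) {
    if (part.text) accumulated += part.text;                          // Gemini
  }
}

// On stream end the processor hands the result back:
//   onStreamComplete?.({ content: accumulated, usage: lastSeenUsage });
```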
**Files Modified**:
- open-sse/handlers/chatCore.js - extractRequestConfig() + streaming capture
- open-sse/utils/stream.js - onStreamComplete callback + content accumulation
**Benefits**:
- View complete request configuration in Request Details (thinking mode, etc.)
- See actual streaming response content instead of placeholder
- Better debugging and observability for AI requests
Refs: #request-details-enhancement
* feat: separate thinking/reasoning content from response content
Improve Request Details to display thinking process separately from final response:
**Backend Changes**:
- stream.js: Capture content and thinking separately in streaming mode
- Add accumulatedThinking variable alongside accumulatedContent
- Route delta.content to content, delta.reasoning_content to thinking
- Support OpenAI (reasoning_content), Claude (thinking), Gemini (part.thought)
- Update onStreamComplete callback to return { content, thinking } object
- chatCore.js: Update response structure to include thinking field
- Non-streaming: Extract thinking from reasoning_content field
- Streaming: Receive { content, thinking } from stream callback
- Error responses: Include thinking: null
- Initial streaming save: Include thinking: null
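A sketch of the split routing described above (function boundaries are illustrative):

```js
let accumulatedContent = '';
let accumulatedThinking = '';

// OpenAI-style deltas: reasoning_content is thinking, content is the answer.
function routeOpenAIDelta(delta = {}) {
  if (delta.content) accumulatedContent += delta.content;
  if (delta.reasoning_content) accumulatedThinking += delta.reasoning_content;
  if (delta.thinking) accumulatedThinking += delta.thinking; // Claude-style
}

// Gemini parts: part.thought === true marks reasoning output.
function routeGeminiPart(part = {}) {
  if (!part.text) return;
  if (part.thought === true) accumulatedThinking += part.text;
  else accumulatedContent += part.text;
}

// Stream end:
//   onStreamComplete?.({ content: accumulatedContent, thinking: accumulatedThinking });
```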
**Frontend Changes**:
- RequestDetailsTab.js: Display thinking and content in separate sections
- Add amber/yellow themed "Thinking Process" section with psychology icon
- Show "Final Response" label when thinking is present
- Use distinct visual styling for thinking (amber bg) vs content (gray bg)
- Only show thinking section when thinking content exists
**Benefits**:
- Users can clearly see the model's reasoning process vs the final answer
- Better debugging for models with thinking capabilities (Claude, o1, etc.)
- Visual distinction makes it easy to identify thinking vs response
Refs: #thinking-content-separation
* fix: map Claude thinking to reasoning_content field
Fix Claude thinking content to be properly captured as reasoning_content
instead of regular content, enabling separate display in Request Details:
**Changes**:
- claude-to-openai.js: Use reasoning_content field for thinking blocks
- thinking start: send { reasoning_content: "" } instead of { content: "```\n```" }
- thinking delta: map to reasoning_content instead of content
- thinking stop: send { reasoning_content: "" } instead of { content: "```\n```" }
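In code, the mapping is roughly as follows (event names follow Claude's SSE stream; the translated shape is illustrative):

```js
// Translate a Claude thinking event into an OpenAI-style delta.
function translateThinkingEvent(event) {
  switch (event.type) {
    case 'content_block_start': // thinking block opens
      return { delta: { reasoning_content: '' } };
    case 'content_block_delta': // event.delta.thinking carries the text
      return { delta: { reasoning_content: event.delta.thinking } };
    case 'content_block_stop':  // thinking block closes
      return { delta: { reasoning_content: '' } };
    default:
      return null;
  }
}
```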
**Why This Matters**:
- Previously, Claude thinking was sent in the `content` field, mixed with the actual response
- Now thinking uses the `reasoning_content` field, matching OpenAI's o1 format
- stream.js can now properly route thinking to accumulatedThinking variable
- Request Details UI will show Claude thinking in separate "Thinking Process" section
**Supported Thinking Formats**:
- OpenAI: delta.reasoning_content → thinking
- Claude: delta.thinking → reasoning_content (now fixed)
- Gemini: part.thought === true → thinking
Refs: #claude-thinking-fix
* feat(observability): capture and display full 4-layer request chain
Capture complete request/response chain in AI Request Details:
- Add providerRequest field (translated request sent to provider)
- Add providerResponse field (raw provider response, streaming indicator)
- Update chatCore.js at all 5 saveRequestDetail() call sites
- Reorganize UI into 4 collapsible sections with Material icons
- Preserve backward compatibility for old records
- Add distinct styling for streaming indicator
* fix(observability): resolve React duplicate key warning in request details table
- Use composite key (detail.id + index) to ensure unique keys
- Prevents React warnings when the database contains duplicate IDs from the old ID generation scheme
* fix(observability): display actual content in streaming request details
Change the providerResponse field for streaming requests from the placeholder
"[Streaming - raw response not captured]" to the actual final content.
This improves the debugging experience by showing the real AI response
in the "Provider Response (Raw)" section instead of a confusing
placeholder message.
Files changed:
- open-sse/handlers/chatCore.js: Save contentObj.content to providerResponse
- src/app/.../RequestDetailsTab.js: Remove special handling for placeholder
* refactor(observability): migrate request details to SQLite for improved concurrency
- Replace LowDB JSON storage with better-sqlite3
- Enable WAL mode for true concurrent read/write support
- Add 5 indexes to accelerate queries (timestamp, provider, model, connection_id, status)
- Perform pagination at the database level to reduce memory footprint
- Maintain 1000 record limit with automatic cleanup of old data
- Ensure API compatibility via re-exports, requiring no caller changes
Performance improvements:
- Concurrent Writes: Lock-free WAL mode prevents data contention
- Query Efficiency: Index-based searches replace full dataset loading
- Data Integrity: Atomic operations prevent file corruption
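Concretely, the migration's setup is plausibly along these lines (better-sqlite3 is the stated dependency; table and column names here are assumptions):

```js
const Database = require('better-sqlite3');

const db = new Database('request-details.db');
db.pragma('journal_mode = WAL'); // readers never block the writer, and vice versa

db.exec(`
  CREATE TABLE IF NOT EXISTS request_details (
    id            TEXT PRIMARY KEY,
    timestamp     INTEGER,
    provider      TEXT,
    model         TEXT,
    connection_id TEXT,
    status        TEXT,
    payload       TEXT
  );
  CREATE INDEX IF NOT EXISTS idx_rd_timestamp  ON request_details(timestamp);
  CREATE INDEX IF NOT EXISTS idx_rd_provider   ON request_details(provider);
  CREATE INDEX IF NOT EXISTS idx_rd_model      ON request_details(model);
  CREATE INDEX IF NOT EXISTS idx_rd_connection ON request_details(connection_id);
  CREATE INDEX IF NOT EXISTS idx_rd_status     ON request_details(status);
`);

// Pagination happens in SQL, so only one page is ever held in memory.
function getPage(page = 1, pageSize = 20) {
  return db
    .prepare('SELECT * FROM request_details ORDER BY timestamp DESC LIMIT ? OFFSET ?')
    .all(pageSize, (page - 1) * pageSize);
}
```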
* fix(observability): resolve pagination statistics display issues
- Fix issue where totalItems=0 showed 'Showing 1 to 0 of 0 results'
- Hide pagination controls when totalItems=0 or totalPages<=1
- Standardize API response fields: pagination.total -> pagination.totalItems
Before: Incorrect stats shown for empty data, and pager visible even for single-page results
After: Stats hidden for empty data, pager hidden when navigation is unnecessary
* feat(observability): display friendly provider names in request details
- Add /api/usage/providers endpoint to dynamically fetch provider list with names
- Replace hardcoded provider options with dynamic loading from database
- Display friendly provider names instead of IDs in both table and detail drawer
- Support custom provider nodes (e.g., OpenAI-compatible) with user-defined names
- Add provider name caching to optimize performance
* fix(observability): use INSERT OR REPLACE for request details to handle streaming updates
* fix(observability): resolve zero-token display issue by ensuring streaming usage capture and fixing key mismatch
* fix(observability): separate TTFT and total latency calculation for streaming requests
* feat(observability): implement SQLite write queue and JSON size limits
- Added in-memory buffer and batch writing for SQLite to prevent lock contention
- Implemented a configurable JSON size limit (1MB) to prevent DB bloat
- Added dashboard UI for observability performance and data management settings
- Integrated graceful shutdown handlers to prevent data loss
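A sketch of the buffered write path described above (names and defaults are illustrative; the limits are configurable per the list above):

```js
const Database = require('better-sqlite3');
const db = new Database('request-details.db');
// Assumes the request_details table created during the SQLite migration.

const buffer = [];
const BATCH_SIZE = 50;
const FLUSH_INTERVAL_MS = 1000;
const MAX_JSON_BYTES = 1024 * 1024; // 1MB cap per record against DB bloat

function enqueue(id, record) {
  const json = JSON.stringify(record);
  if (Buffer.byteLength(json) > MAX_JSON_BYTES) return; // or truncate instead
  buffer.push({ id, json });
  if (buffer.length >= BATCH_SIZE) flush();
}

// One transaction per batch keeps lock contention low.
const insert = db.prepare(
  'INSERT OR REPLACE INTO request_details (id, payload) VALUES (?, ?)'
);
const insertBatch = db.transaction((rows) => {
  for (const row of rows) insert.run(row.id, row.json);
});

function flush() {
  if (buffer.length > 0) insertBatch(buffer.splice(0));
}

setInterval(flush, FLUSH_INTERVAL_MS).unref();
process.on('exit', flush); // graceful shutdown: don't drop buffered rows
```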
* fix(observability): resolve ReferenceError by declaring dbInstance
- Added new observability settings in the dashboard for max records, batch size, flush interval, and max JSON size.
- Introduced `extractRequestConfig` function to capture full request configurations.
- Enhanced error handling by saving detailed request information on failures.
- Updated usage tracking to include new token metrics.
- Modified streaming functions to support detailed content and reasoning tracking.
The previous merge used a sourceFormat check, which broke Cursor when it
sends openai-responses format requests. Now routing uses user-agent detection:
- Droid CLI (user-agent contains 'droid' or 'codex-cli') → passthrough
- Other clients (Cursor, etc.) → translate to Chat Completions format
This fixes the API translation for both clients.
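A sketch of the check (substrings per the rules above; the helper name is illustrative):

```js
// Droid CLI speaks the Responses format natively, so it gets passthrough.
function shouldPassthroughResponses(userAgent = '') {
  const ua = userAgent.toLowerCase();
  return ua.includes('droid') || ua.includes('codex-cli');
}

// Everything else (Cursor, etc.) is translated to Chat Completions format.
```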
Co-authored-by: Hellodebasishsahu <itsyourboydevil@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
- Support multiple OpenAI-compatible providers with custom prefix/baseUrl
- Add provider nodes CRUD (create/read/update/delete)
- URL building: baseUrl + /chat/completions or /responses (see the sketch after this list)
- Model import from /models endpoint
- API key validation via /models
- Usage type safety across all translators
- OAuth token auto-refresh for expired tokens
- Force streaming for Codex/OpenAI models to fix a non-streaming bug
- Strip unsupported params (user, metadata, stream_options, prompt_cache_retention)
- Force response translation from openai-responses to openai format
- Migrate middleware.js to proxy.js for Next.js 16
- Use webpack explicitly in dev/build scripts
- Updated Codex User-Agent
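A sketch of the URL building mentioned above for custom provider nodes (the helper name is an assumption):

```js
// Join a user-supplied baseUrl with the right endpoint path.
function buildUrl(baseUrl, useResponsesApi) {
  const root = baseUrl.replace(/\/+$/, ''); // tolerate a trailing slash
  return useResponsesApi ? `${root}/responses` : `${root}/chat/completions`;
}

// buildUrl('https://api.example.com/v1/', false)
//   → 'https://api.example.com/v1/chat/completions'
```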