- Refactored handleChatCore to include Caveman functionality, adding terse-style system prompts that reduce output token usage.
- Updated APIPageClient to manage Caveman settings, including enabling/disabling and selecting compression levels.
- Adjusted AntigravityExecutor to consolidate function declarations for compatibility with Gemini.
- Removed unnecessary console logs during translator initialization across multiple routes.
- Updated refreshCredentials methods in various executors (Antigravity, Base, Default, Github, Kiro) to accept optional proxyOptions for improved proxy handling.
- Modified token refresh logic to utilize proxy-aware fetch for better network management.
- Enhanced usage retrieval functions to accept proxy options, so usage lookups honor configured proxies.
- Updated ModelSelectModal and ProviderInfoCard components to incorporate kind filtering for improved user experience in model selection.
- Added validation for API keys in the provider validation route, including support for webSearch/webFetch providers.
- Added new image models for GPT 5.2, 5.3, and 5.4, with text-to-image and image-editing capabilities.
- Updated embedding handling to include optional dimensions in requests.
- Introduced support for custom embedding providers, allowing dynamic fetching and validation of custom nodes.
- Improved image generation handling with Codex integration, including progress tracking and error handling.
- Enhanced UI components to support adding custom embeddings and displaying their status.
- Introduced OpenCode Go provider with relevant configurations.
- Enhanced model management by allowing users to add and delete custom models.
- Updated UI components to support model selection for image types.
- Adjusted sidebar visibility to include image media kinds.
- Integrated Google TTS languages from a separate module for better maintainability.
- Updated local device voice fetching to support both macOS and Windows, improving cross-platform compatibility.
- Enhanced dashboard route protection by adding dynamic settings for login requirements and tunnel access.
- Introduced UI elements for managing security settings related to API key requirements and dashboard access via tunnel.
- Added default TTS response example in the media provider page for better user guidance.
- Updated constants to reflect changes in TTS provider configurations.
This commit improves the overall usability and security of the TTS and related dashboard features.
sseToJsonHandler.js unconditionally deleted reasoning_content from all
non-streaming responses (a deletion originally added for Firecrawl SDK
compatibility). This
breaks thinking models (Qwen3.5, Claude extended thinking, etc.) where
the model may use all tokens for reasoning, leaving content empty.
When reasoning_content is stripped in that case, the response appears
completely empty to the client.
Fix: only strip reasoning_content when the response also has non-empty
content, so that reasoning output is preserved when it is the only
useful output.
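A minimal sketch of the guard, assuming the OpenAI-style choices/message response shape (the actual handler code may differ):

```js
// sseToJsonHandler.js (sketch): strip reasoning_content only when the
// message also carries non-empty regular content.
for (const choice of response.choices ?? []) {
  const msg = choice.message;
  if (!msg || msg.reasoning_content === undefined) continue;
  const hasContent = typeof msg.content === 'string' && msg.content.trim().length > 0;
  if (hasContent) {
    // Safe to drop: the client still gets the real answer (Firecrawl SDK compat).
    delete msg.reasoning_content;
  }
  // Otherwise keep reasoning_content so thinking-only responses aren't empty.
}
```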
Co-authored-by: Agent Zero <agent@agent-zero.local>
Add clientDetector utility to identify CLI tools (Claude Code, Gemini CLI,
Antigravity, Codex) from request headers. When the CLI tool and provider
are a native pair, skip all translation — only swap model and Bearer token.
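A rough sketch of the detection idea; the user-agent checks and the native-pair table here are illustrative assumptions, not the shipped utility:

```js
// clientDetector.js (sketch): map request headers to a known CLI tool.
const NATIVE_PAIRS = { // illustrative pairings, not the actual table
  'claude-code': 'claude',
  'gemini-cli': 'gemini',
  'antigravity': 'antigravity',
  'codex': 'codex',
};

function detectClient(headers) { // headers: a Fetch API Headers object
  const ua = (headers.get('user-agent') || '').toLowerCase();
  if (ua.includes('claude-code')) return 'claude-code';
  if (ua.includes('gemini-cli')) return 'gemini-cli';
  if (ua.includes('antigravity')) return 'antigravity';
  if (ua.includes('codex')) return 'codex';
  return null;
}

function isNativePair(client, providerType) {
  return client !== null && NATIVE_PAIRS[client] === providerType;
}
// When isNativePair() is true, the router skips translation and only
// rewrites the model name and the Authorization: Bearer token.
```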
Made-with: Cursor
Some upstream providers (e.g. Antigravity) return non-standard finish_reason
values like 'other' instead of the OpenAI-standard 'tool_calls' when the
model invokes tools. This causes downstream consumers (e.g. OpenClaw) to
fail to execute tool calls, breaking agentic sub-agent workflows.
Changes:
- nonStreamingHandler: post-translation guard that normalizes finish_reason
to 'tool_calls' when message.tool_calls is present
- sseToJsonHandler: accumulate tool_calls from streaming deltas in
parseSSEToOpenAIResponse; extract function_call items from Responses API
output in handleForcedSSEToJson
- openai-responses translator: use toolCallIndex to choose between
'tool_calls' and 'stop' in flush and response.completed events
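A minimal sketch of the first change, the nonStreamingHandler guard described above (an OpenAI-style choices array is assumed):

```js
// nonStreamingHandler (sketch): normalize non-standard finish_reason values
// after translation, so downstream consumers execute tool calls.
function normalizeFinishReason(response) {
  for (const choice of response.choices ?? []) {
    const hasToolCalls =
      Array.isArray(choice.message?.tool_calls) &&
      choice.message.tool_calls.length > 0;
    if (hasToolCalls && choice.finish_reason !== 'tool_calls') {
      // e.g. Antigravity returns 'other' here, breaking agentic workflows
      choice.finish_reason = 'tool_calls';
    }
  }
  return response;
}
```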
Tested: 7 scenarios (non-stream text, single/multiple tool calls, stream
text/tool calls, multi-turn tool conversation, tools present but unused)
Root cause: the Codex/OpenAI Responses API streams multiple alternating reasoning and
message output items. The first message block often has empty output_text; the
visible answer lives in a later message. Previous code used output.find() which
always picked the first (empty) message block.
Fix: walk message items from end and use the last message whose extracted text
is non-empty; fall back to final message if all are empty.
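A sketch of the fix, where `extractText` stands in for whatever text-extraction helper the handler actually uses:

```js
// Sketch: pick the last message item whose extracted text is non-empty.
function pickVisibleMessage(outputItems, extractText) {
  const messages = outputItems.filter((item) => item.type === 'message');
  // Walk from the end: later message blocks hold the visible answer.
  for (let i = messages.length - 1; i >= 0; i--) {
    const text = extractText(messages[i]);
    if (text && text.trim().length > 0) return messages[i];
  }
  // All empty: fall back to the final message block.
  return messages[messages.length - 1] ?? null;
}
```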
Note: Removed debug logging code from original PR #383 to keep implementation clean.
Co-authored-by: lokinh <locnh@uniultra.xyz>
Made-with: Cursor
- Respect Accept: application/json header to return non-streaming JSON
instead of SSE, fixing AI SDK generateObject/generateText compatibility
- Strip markdown code block markers (```json...```) from Claude
non-streaming responses to prevent JSON parse errors
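A minimal sketch of the fence stripping, assuming the whole reply is wrapped in a single fence:

```js
// Sketch: strip a wrapping ```json ... ``` fence from a Claude
// non-streaming text reply so JSON.parse succeeds.
function stripCodeFence(text) {
  const match = text.trim().match(/^```(?:json)?\s*\n?([\s\S]*?)\n?```$/);
  return match ? match[1] : text;
}
```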
Cherry-picked and adapted from PR #290 by @rothnic
https://github.com/decolua/9router/pull/290
Made-with: Cursor
- Centralize proxy management with reusable proxy pools
- Per-connection proxy binding with legacy fallback
- Add strictProxy option: fail hard instead of silently falling back to direct
- Resolve alicode-intl conflict: keep alicode-intl support + proxy support
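A rough sketch of the resolution order under strictProxy; the pool API (`byId`, `legacyDefault`) is hypothetical:

```js
// Sketch: resolve a proxy for a connection, honoring strictProxy.
function resolveProxy(connection, pools, { strictProxy = false } = {}) {
  // Per-connection binding first, then the legacy shared pool.
  const proxy = connection.proxyId
    ? pools.byId(connection.proxyId)
    : pools.legacyDefault();
  if (!proxy && strictProxy) {
    // Fail hard rather than silently falling back to a direct connection.
    throw new Error(`strictProxy: no proxy available for ${connection.id}`);
  }
  return proxy ?? null; // null => direct connection (legacy fallback)
}
```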
Made-with: Cursor
- Tracks endpoints like /v1/chat/completions, /v1/messages, /v1/responses
- New sortable/groupable table in usage dashboard with expandable groups
- Enhanced usage database aggregation by endpoint + model + provider
- Added endpoint tracking to all saveRequestUsage/saveRequestDetail calls
- Maintains backward compatibility with existing data structure
- Add response logging for non-streaming requests (5_res_provider.json, 7_res_client.json)
- Fix extractUsageFromResponse() to check Claude format before OpenAI format
- Prevents format misidentification that caused tokens to show as 0
- Claude uses input_tokens/output_tokens vs OpenAI's prompt_tokens/completion_tokens
Fixes dashboard Details tab showing 0 tokens for Claude requests
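A minimal sketch of the corrected ordering (the return-shape names are assumptions):

```js
// extractUsageFromResponse (sketch): check the Claude shape first, since a
// Claude payload has no prompt_tokens and would otherwise read as 0.
function extractUsageFromResponse(data) {
  const u = data?.usage;
  if (!u) return null;
  if (u.input_tokens !== undefined || u.output_tokens !== undefined) {
    // Claude format
    return { inputTokens: u.input_tokens ?? 0, outputTokens: u.output_tokens ?? 0 };
  }
  // OpenAI format
  return { inputTokens: u.prompt_tokens ?? 0, outputTokens: u.completion_tokens ?? 0 };
}
```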
- Added detailed request logging and latency tracking in handleChatCore.
- Improved error handling for SSE to JSON conversion and JSON parsing in streamToJsonConverter.
- Introduced a safe JSON parsing utility to handle potential parsing errors gracefully in requestDetailsDb.
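The utility is presumably along these lines (a sketch, not the exact code):

```js
// Sketch: never throw on malformed JSON; return a fallback instead.
function safeJsonParse(text, fallback = null) {
  try {
    return JSON.parse(text);
  } catch {
    return fallback; // caller decides how to handle unparseable input
  }
}
```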
Co-authored-by: zx <me@char.moe>
* feat: add AI request details feature with latency tracking
Add comprehensive request history and debugging capability to the Usage dashboard:
**Storage Layer** (usageDb.js):
- Add saveRequestDetail() for storing full request/response details
- Implement FIFO queue with 1000-record limit in request-details.json
- Auto-sanitize sensitive headers (authorization, api-key, cookie, token); see the sketch after this list
- Add getRequestDetails() with pagination and filtering support
- Add getRequestDetailById() for single record lookup
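A minimal sketch of the header sanitization, matching the header list above (the redaction marker is an assumption):

```js
// usageDb.js (sketch): redact sensitive headers before persisting.
const SENSITIVE_HEADERS = ['authorization', 'api-key', 'cookie', 'token'];

function sanitizeHeaders(headers = {}) {
  const out = {};
  for (const [key, value] of Object.entries(headers)) {
    out[key] = SENSITIVE_HEADERS.some((s) => key.toLowerCase().includes(s))
      ? '[REDACTED]'
      : value;
  }
  return out;
}
```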
**Pipeline Integration** (chatCore.js):
- Track request start time and calculate total latency
- Record TTFT (Time To First Token) and total latency for all requests (see the sketch after this list)
- Capture full request details (messages, model, parameters)
- Save response content for non-streaming, mark streaming responses
- Handle error cases with detailed error information
- Async non-blocking saves to avoid impacting request performance
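A rough sketch of the latency bookkeeping; `providerStream`, `forwardToClient`, and the exact saved fields are assumptions:

```js
// chatCore.js (sketch): TTFT and total latency around the provider call.
const startTime = Date.now();
let ttft = null;

for await (const chunk of providerStream) {     // inside an async handler
  if (ttft === null) ttft = Date.now() - startTime; // time to first token
  forwardToClient(chunk);
}

const totalLatency = Date.now() - startTime;
// Fire-and-forget: the save must never block the response pipeline.
void saveRequestDetail({ ttft, totalLatency /* , request/response details */ });
```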
**API Layer** (/api/usage/request-details):
- GET endpoint with pagination (page, pageSize: 1-100)
- Filter by provider, model, connectionId, status, date range
- Returns { details: [...], pagination: {...} } format
**UI Components**:
- Drawer.js: Right slide-out panel with backdrop blur and ESC close
- Pagination.js: Full pagination with page size selector (10/20/50)
- RequestDetailsTab.js: Complete table view with filters and detail drawer
**Dashboard Integration**:
- Add "Details" tab to Usage page (4th tab after Overview/Logger/Limits)
- Table columns: Timestamp, Model, Provider, Input Tokens, Output Tokens, Latency (TTFT/Total), Action
- Provider filter dropdown (9 providers supported)
- Date range filters (start/end datetime)
- Click "Detail" button to view full request/response JSON in slide-out drawer
**Features**:
- Real-time latency monitoring (TTFT & Total)
- Complete request/response inspection for debugging
- Filterable and searchable request history
- Responsive design with mobile-friendly filters
- Data security with automatic header sanitization
- Performance: async saves don't block request pipeline
**Files Created/Modified**:
- src/lib/usageDb.js (modified)
- open-sse/handlers/chatCore.js (modified)
- src/app/api/usage/request-details/route.js (new)
- src/shared/components/Drawer.js (new)
- src/shared/components/Pagination.js (new)
- src/app/(dashboard)/dashboard/usage/components/RequestDetailsTab.js (new)
- src/app/(dashboard)/dashboard/usage/page.js (modified)
Closes: AI Observability Dashboard feature
* feat: enhance request details with full config and streaming content capture
Improve Request Details feature to capture comprehensive request parameters
and actual streaming response content:
**Request Configuration Enhancement** (chatCore.js):
- Add extractRequestConfig() helper function to capture all request parameters (sketched after this list)
- Include temperature controls: temperature, top_p, top_k
- Include token limits: max_tokens, max_completion_tokens
- Include thinking/reasoning modes: thinking, reasoning, enable_thinking
- Include OpenAI parameters: presence_penalty, frequency_penalty, seed, stop,
tools, tool_choice, response_format, n, logprobs, top_logprobs, logit_bias,
user, parallel_tool_calls, prediction, store, metadata
- Apply to all request types: non-streaming, streaming, and error cases
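A minimal sketch of the helper, using exactly the parameter list above; the function body is an assumption:

```js
// chatCore.js (sketch): whitelist-copy known parameters from the body.
const CONFIG_KEYS = [
  'temperature', 'top_p', 'top_k', 'max_tokens', 'max_completion_tokens',
  'thinking', 'reasoning', 'enable_thinking', 'presence_penalty',
  'frequency_penalty', 'seed', 'stop', 'tools', 'tool_choice',
  'response_format', 'n', 'logprobs', 'top_logprobs', 'logit_bias',
  'user', 'parallel_tool_calls', 'prediction', 'store', 'metadata',
];

function extractRequestConfig(body = {}) {
  const config = {};
  for (const key of CONFIG_KEYS) {
    if (body[key] !== undefined) config[key] = body[key];
  }
  return config;
}
```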
**Streaming Content Capture** (chatCore.js & stream.js):
- Add onStreamComplete callback mechanism to stream processors (see the sketch after this list)
- Accumulate content from all formats: OpenAI, Claude, Gemini
- Track content from delta.content, delta.reasoning_content, delta.text,
delta.thinking, and Gemini content.parts
- Save initial record with "[Streaming in progress...]" marker
- Update record with actual content when stream completes
- Include usage tokens when available from stream
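A rough sketch of the accumulation and callback; the delta shapes are the standard OpenAI/Claude/Gemini streaming forms, but the actual stream.js wiring may differ:

```js
// stream.js (sketch): accumulate streamed text across provider formats.
let accumulatedContent = '';
let capturedUsage = null;

function onParsedChunk(chunk) {
  const delta = chunk.choices?.[0]?.delta ?? chunk.delta ?? {};
  if (delta.content) accumulatedContent += delta.content;                    // OpenAI
  if (delta.reasoning_content) accumulatedContent += delta.reasoning_content;
  if (delta.text) accumulatedContent += delta.text;                          // Claude
  if (delta.thinking) accumulatedContent += delta.thinking;
  for (const part of chunk.candidates?.[0]?.content?.parts ?? []) {          // Gemini
    if (part.text) accumulatedContent += part.text;
  }
  if (chunk.usage) capturedUsage = chunk.usage;
}

function onStreamEnd() {
  // Replaces the initial "[Streaming in progress...]" record.
  onStreamComplete?.({ content: accumulatedContent, usage: capturedUsage });
}
```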
**Files Modified**:
- open-sse/handlers/chatCore.js - extractRequestConfig() + streaming capture
- open-sse/utils/stream.js - onStreamComplete callback + content accumulation
**Benefits**:
- View complete request configuration in Request Details (thinking mode, etc.)
- See actual streaming response content instead of placeholder
- Better debugging and observability for AI requests
Refs: #request-details-enhancement
* feat: separate thinking/reasoning content from response content
Improve Request Details to display thinking process separately from final response:
**Backend Changes**:
- stream.js: Capture content and thinking separately in streaming mode
  - Add accumulatedThinking variable alongside accumulatedContent
  - Route delta.content to content, delta.reasoning_content to thinking
  - Support OpenAI (reasoning_content), Claude (thinking), Gemini (part.thought)
  - Update onStreamComplete callback to return { content, thinking } object
- chatCore.js: Update response structure to include thinking field
  - Non-streaming: Extract thinking from reasoning_content field
  - Streaming: Receive { content, thinking } from stream callback
  - Error responses: Include thinking: null
  - Initial streaming save: Include thinking: null
**Frontend Changes**:
- RequestDetailsTab.js: Display thinking and content in separate sections
  - Add amber/yellow themed "Thinking Process" section with psychology icon
  - Show "Final Response" label when thinking is present
  - Use distinct visual styling for thinking (amber bg) vs content (gray bg)
  - Only show thinking section when thinking content exists
**Benefits**:
- Users can clearly see model's reasoning process vs final answer
- Better debugging for models with thinking capabilities (Claude, o1, etc.)
- Visual distinction makes it easy to identify thinking vs response
Refs: #thinking-content-separation
* fix: map Claude thinking to reasoning_content field
Fix Claude thinking content to be properly captured as reasoning_content
instead of regular content, enabling separate display in Request Details:
**Changes**:
- claude-to-openai.js: Use reasoning_content field for thinking blocks (see the sketch below)
  - thinking start: send { reasoning_content: "" } instead of { content: "```\n```" }
  - thinking delta: map to reasoning_content instead of content
  - thinking stop: send { reasoning_content: "" } instead of { content: "```\n```" }
**Why This Matters**:
- Previously Claude thinking was sent as `content` field, mixed with actual response
- Now thinking uses `reasoning_content` field, matching OpenAI's o1 format
- stream.js can now properly route thinking to accumulatedThinking variable
- Request Details UI will show Claude thinking in separate "Thinking Process" section
**Supported Thinking Formats**:
- OpenAI: delta.reasoning_content → thinking
- Claude: delta.thinking → reasoning_content (now fixed)
- Gemini: part.thought === true → thinking
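A minimal sketch of the translator mapping, assuming Claude's standard content_block_* events; only the thinking path is shown:

```js
// claude-to-openai.js (sketch): emit OpenAI-style deltas for Claude
// thinking blocks, using reasoning_content instead of content.
function translateThinkingEvent(event) {
  switch (event.type) {
    case 'content_block_start': // thinking block opens
    case 'content_block_stop':  // thinking block closes
      return { choices: [{ index: 0, delta: { reasoning_content: '' } }] };
    case 'content_block_delta': // streamed thinking text
      return { choices: [{ index: 0, delta: { reasoning_content: event.delta.thinking } }] };
    default:
      return null; // non-thinking events handled elsewhere
  }
}
```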
Refs: #claude-thinking-fix
* feat(observability): capture and display full 4-layer request chain
Capture complete request/response chain in AI Request Details:
- Add providerRequest field (translated request sent to provider)
- Add providerResponse field (raw provider response, streaming indicator)
- Update chatCore.js at all 5 saveRequestDetail() call sites
- Reorganize UI into 4 collapsible sections with Material icons
- Preserve backward compatibility for old records
- Add distinct styling for streaming indicator
* fix(observability): resolve React duplicate key warning in request details table
- Use composite key (detail.id + index) to ensure unique keys
- Prevents React warnings when database contains duplicate IDs from old ID generation
* fix(observability): display actual content in streaming request details
Change providerResponse field for streaming requests from placeholder
"[Streaming - raw response not captured]" to actual final content.
This improves debugging experience by showing the real AI response
in the "Provider Response (Raw)" section instead of a confusing
placeholder message.
Files changed:
- open-sse/handlers/chatCore.js: Save contentObj.content to providerResponse
- src/app/.../RequestDetailsTab.js: Remove special handling for placeholder
* refactor(observability): migrate request details to SQLite for improved concurrency
- Replace LowDB JSON storage with better-sqlite3
- Enable WAL mode for true concurrent read/write support (see the sketch below)
- Add 5 indexes to accelerate queries (timestamp, provider, model, connection_id, status)
- Perform pagination at the database level to reduce memory footprint
- Maintain 1000 record limit with automatic cleanup of old data
- Ensure API compatibility via re-exports, requiring no caller changes
Performance improvements:
- Concurrent Writes: Lock-free WAL mode prevents data contention
- Query Efficiency: Index-based searches replace full dataset loading
- Data Integrity: Atomic operations prevent file corruption
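A rough sketch of the better-sqlite3 setup; the schema, column names, and file path are assumptions, while the WAL pragma, the five indexes, and DB-level pagination mirror the changes above:

```js
// requestDetailsDb (sketch): better-sqlite3 with WAL mode and indexes.
const Database = require('better-sqlite3');

const db = new Database('data/request-details.sqlite');
db.pragma('journal_mode = WAL'); // readers no longer block on the writer

db.exec(`
  CREATE TABLE IF NOT EXISTS request_details (
    id TEXT PRIMARY KEY, timestamp INTEGER, provider TEXT,
    model TEXT, connection_id TEXT, status TEXT, payload TEXT
  );
  CREATE INDEX IF NOT EXISTS idx_rd_timestamp  ON request_details(timestamp);
  CREATE INDEX IF NOT EXISTS idx_rd_provider   ON request_details(provider);
  CREATE INDEX IF NOT EXISTS idx_rd_model      ON request_details(model);
  CREATE INDEX IF NOT EXISTS idx_rd_connection ON request_details(connection_id);
  CREATE INDEX IF NOT EXISTS idx_rd_status     ON request_details(status);
`);

// Pagination at the database level instead of loading everything:
function getPage(page, pageSize) {
  return db
    .prepare('SELECT * FROM request_details ORDER BY timestamp DESC LIMIT ? OFFSET ?')
    .all(pageSize, (page - 1) * pageSize);
}

// Keep only the newest 1000 records:
function cleanup() {
  db.prepare(`DELETE FROM request_details WHERE id NOT IN
    (SELECT id FROM request_details ORDER BY timestamp DESC LIMIT 1000)`).run();
}
```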
* fix(observability): resolve pagination statistics display issues
- Fix issue where totalItems=0 showed 'Showing 1 to 0 of 0 results'
- Hide pagination controls when totalItems=0 or totalPages<=1
- Standardize API response fields: pagination.total -> pagination.totalItems
Before: Incorrect stats shown for empty data, and pager visible even for single-page results
After: Stats hidden for empty data, pager hidden when navigation is unnecessary
* feat(observability): display friendly provider names in request details
- Add /api/usage/providers endpoint to dynamically fetch provider list with names
- Replace hardcoded provider options with dynamic loading from database
- Display friendly provider names instead of IDs in both table and detail drawer
- Support custom provider nodes (e.g., OpenAI-compatible) with user-defined names
- Add provider name caching to optimize performance
* fix(observability): use INSERT OR REPLACE for request details to handle streaming updates
* fix(observability): resolve zero-token display issue by ensuring streaming usage capture and fixing key mismatch
* fix(observability): separate TTFT and total latency calculation for streaming requests
* feat(observability): implement SQLite write queue and JSON size limits
- Added an in-memory buffer and batch writing for SQLite to prevent lock contention (sketched below)
- Enforced a configurable 1MB limit on stored JSON payloads to prevent DB bloat
- Added dashboard UI for observability performance and data management settings
- Integrated graceful shutdown handlers to prevent data loss
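A rough sketch of the write queue on top of a better-sqlite3 handle like the one above; the defaults, table columns, and truncation strategy are assumptions (the real values are dashboard-configurable):

```js
// Sketch: buffered batch writes to avoid SQLite lock contention.
const writeQueue = [];
const BATCH_SIZE = 50;              // assumed default
const FLUSH_INTERVAL_MS = 1000;     // assumed default
const MAX_JSON_SIZE = 1024 * 1024;  // 1MB cap on stored JSON

function enqueueDetail(record) {
  let payload = JSON.stringify(record);
  if (payload.length > MAX_JSON_SIZE) {
    // Truncation strategy is an assumption: drop the oversized bodies.
    payload = JSON.stringify({ ...record, request: '[truncated]', response: '[truncated]' });
  }
  writeQueue.push({ id: record.id, payload });
  if (writeQueue.length >= BATCH_SIZE) flush();
}

function flush() {
  if (writeQueue.length === 0) return;
  const batch = writeQueue.splice(0, writeQueue.length);
  const stmt = db.prepare('INSERT OR REPLACE INTO request_details (id, payload) VALUES (?, ?)');
  db.transaction((rows) => {
    for (const row of rows) stmt.run(row.id, row.payload);
  })(batch);
}

setInterval(flush, FLUSH_INTERVAL_MS).unref();
process.on('beforeExit', flush); // graceful shutdown: don't lose the buffer
```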
* fix(observability): resolve ReferenceError by declaring dbInstance
- Added new observability settings in the dashboard for max records, batch size, flush interval, and max JSON size.
- Introduced `extractRequestConfig` function to capture full request configurations.
- Enhanced error handling by saving detailed request information on failures.
- Updated usage tracking to include new token metrics.
- Modified streaming functions to support detailed content and reasoning tracking.