feat: add STT support, Gemini TTS, and expand usage tracking

- Speech-to-Text: full pipeline with sttCore handler, /v1/audio/transcriptions
  endpoint, sttConfig for OpenAI, Gemini, Groq, Deepgram, AssemblyAI,
  HuggingFace, NVIDIA Parakeet; new 9router-stt skill
- Gemini TTS: add gemini provider with 30 prebuilt voices and TTS_PROVIDER_CONFIG
- Usage: implement GLM (intl/cn) and MiniMax (intl/cn) quota fetchers; refactor
  Gemini CLI usage to use retrieveUserQuota with per-model buckets
- Disabled models: lowdb-backed disabledModelsDb + /api/models/disabled route
- Header search: reusable Zustand store (headerSearchStore) wired into Header
- CLI tools: add Claude Cowork tool card and cowork-settings API
- Providers: introduce mediaPriority sorting in getProvidersByKind, add
  Kimi K2.6, reorder hermes, drop qwen STT kind
- UI: expand media-providers/[kind]/[id] page (+314), enhance OAuthModal,
  ModelSelectModal, ProviderTopology, ProxyPools, ProviderLimits
- Assets: refresh provider PNGs (alicode, byteplus, cloudflare-ai, nvidia,
  ollama, vertex, volcengine-ark) and add aws-polly, fal-ai, jina-ai, recraft,
  runwayml, stability-ai, topaz, black-forest-labs
decolua
2026-05-05 10:32:59 +07:00
parent bfb7d42164
commit d4bc42e1f5
67 changed files with 2930 additions and 234 deletions


@@ -0,0 +1,77 @@
---
name: 9router-stt
description: Speech-to-text via 9Router /v1/audio/transcriptions using OpenAI Whisper / Groq / Gemini / Deepgram / AssemblyAI / NVIDIA / HuggingFace models. Use when the user wants to transcribe audio, convert speech to text, or get subtitles from audio files.
---
# 9Router — Speech-to-Text
Requires `NINEROUTER_URL` (and `NINEROUTER_KEY` if auth enabled). See https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router/SKILL.md for setup.
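A minimal shell setup sketch — the URL below is a placeholder, point it at your own deployment:
```bash
export NINEROUTER_URL="http://localhost:3000"  # placeholder — your 9Router base URL
export NINEROUTER_KEY="sk-..."                 # only needed when auth is enabled
```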
## Discover models
```bash
curl $NINEROUTER_URL/v1/models/stt | jq '.data[].id'
```
`model` = STT model ID (e.g. `openai/whisper-1`, `groq/whisper-large-v3`, `deepgram/nova-3`, `gemini/gemini-2.5-flash`).
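To narrow the list to a single provider, a jq filter sketch (assumes IDs keep the `provider/model` prefix shown above):
```bash
# List only Groq-backed STT models
curl -s "$NINEROUTER_URL/v1/models/stt" | jq -r '.data[].id | select(startswith("groq/"))'
```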
## Endpoint
`POST $NINEROUTER_URL/v1/audio/transcriptions` (OpenAI Whisper-compatible, `multipart/form-data`)
| Field | Required | Notes |
|---|---|---|
| `model` | yes | from `/v1/models/stt` |
| `file` | yes | audio file (mp3, wav, m4a, webm, ogg, flac) |
| `language` | no | ISO-639-1 (e.g. `en`, `vi`) |
| `prompt` | no | hint text to guide transcription |
| `response_format` | no | `json` (default) / `text` / `verbose_json` / `srt` / `vtt` |
| `temperature` | no | 0–1 |
## Examples
```bash
curl -X POST "$NINEROUTER_URL/v1/audio/transcriptions" \
-H "Authorization: Bearer $NINEROUTER_KEY" \
-F "model=openai/whisper-1" \
-F "file=@audio.mp3" \
-F "language=vi"
```
JS (Node 18+, ESM):
```js
import { readFile } from "node:fs/promises";

const form = new FormData();
form.append("model", "groq/whisper-large-v3-turbo");
// Wrap the raw bytes in a Blob so fetch sends proper multipart/form-data
form.append("file", new Blob([await readFile("audio.mp3")]), "audio.mp3");

const r = await fetch(`${process.env.NINEROUTER_URL}/v1/audio/transcriptions`, {
  method: "POST",
  headers: { "Authorization": `Bearer ${process.env.NINEROUTER_KEY}` },
  body: form,
});
if (!r.ok) throw new Error(`transcription failed: ${r.status}`);
const { text } = await r.json();
console.log(text);
```
## Response shape
Default (`response_format=json`):
```json
{ "text": "Xin chào, đây là bản ghi âm." }
```
`verbose_json` adds `language`, `duration`, `segments[]` with timestamps.
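An illustrative `verbose_json` body (trimmed; exact segment fields vary by provider):
```json
{
  "text": "Xin chào, đây là bản ghi âm.",
  "language": "vi",
  "duration": 3.1,
  "segments": [
    { "id": 0, "start": 0.0, "end": 3.1, "text": "Xin chào, đây là bản ghi âm." }
  ]
}
```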
`srt` / `vtt` return subtitle text.
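For subtitles, the same request with only `response_format` changed, writing SRT to a file:
```bash
curl -X POST "$NINEROUTER_URL/v1/audio/transcriptions" \
  -H "Authorization: Bearer $NINEROUTER_KEY" \
  -F "model=openai/whisper-1" \
  -F "file=@audio.mp3" \
  -F "response_format=srt" > audio.srt
```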
## Provider quirks
| Provider | `model` format | Notes |
|---|---|---|
| `openai` | `whisper-1`, `gpt-4o-transcribe`, `gpt-4o-mini-transcribe` | Native OpenAI shape |
| `groq` | `whisper-large-v3`, `whisper-large-v3-turbo`, `distil-whisper-large-v3-en` | Fastest; OpenAI shape |
| `gemini` | `gemini-2.5-flash`, `gemini-2.5-pro`, `gemini-2.5-flash-lite` | Server converts to `generateContent` with audio inline |
| `deepgram` | `nova-3`, `nova-2`, `whisper-large` | Token auth; server adapts response |
| `assemblyai` | `universal-3-pro`, `universal-2` | Async upload+poll handled server-side |
| `nvidia` | `nvidia/parakeet-ctc-1.1b-asr` | NIM endpoint |
| `huggingface` | `openai/whisper-large-v3`, `openai/whisper-small` | HF Inference API |
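Because every provider sits behind the same endpoint, switching is just a `model` change — a sketch comparing a few (assumes these models are enabled on your router):
```bash
# Transcribe the same file with three providers and print each result
for m in openai/whisper-1 groq/whisper-large-v3-turbo deepgram/nova-3; do
  echo "== $m =="
  curl -s -X POST "$NINEROUTER_URL/v1/audio/transcriptions" \
    -H "Authorization: Bearer $NINEROUTER_KEY" \
    -F "model=$m" -F "file=@audio.mp3" -F "response_format=text"
done
```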


@@ -49,6 +49,7 @@ When the user needs a specific capability, fetch that skill's `SKILL.md` from it
| Chat / code-gen | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-chat/SKILL.md |
| Image generation | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-image/SKILL.md |
| Text-to-speech | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-tts/SKILL.md |
| Speech-to-text | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-stt/SKILL.md |
| Embeddings | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-embeddings/SKILL.md |
| Web search | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-web-search/SKILL.md |
| Web fetch (URL → markdown) | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-web-fetch/SKILL.md |


@@ -12,6 +12,7 @@ Drop-in skills for any AI agent (Claude, Cursor, ChatGPT, custom SDK). Just **co
| Chat / code-gen | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-chat/SKILL.md |
| Image generation | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-image/SKILL.md |
| Text-to-speech | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-tts/SKILL.md |
| Speech-to-text | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-stt/SKILL.md |
| Embeddings | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-embeddings/SKILL.md |
| Web search | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-web-search/SKILL.md |
| Web fetch (URL → markdown) | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-web-fetch/SKILL.md |