Mirror of https://github.com/decolua/9router.git (synced 2026-05-08 12:01:28 +00:00)
feat: add STT support, Gemini TTS, and expand usage tracking
- Speech-to-Text: full pipeline with sttCore handler, /v1/audio/transcriptions endpoint, sttConfig for OpenAI, Gemini, Groq, Deepgram, AssemblyAI, HuggingFace, NVIDIA Parakeet; new 9router-stt skill
- Gemini TTS: add gemini provider with 30 prebuilt voices and TTS_PROVIDER_CONFIG
- Usage: implement GLM (intl/cn) and MiniMax (intl/cn) quota fetchers; refactor Gemini CLI usage to use retrieveUserQuota with per-model buckets
- Disabled models: lowdb-backed disabledModelsDb + /api/models/disabled route
- Header search: reusable Zustand store (headerSearchStore) wired into Header
- CLI tools: add Claude Cowork tool card and cowork-settings API
- Providers: introduce mediaPriority sorting in getProvidersByKind, add Kimi K2.6, reorder hermes, drop qwen STT kind
- UI: expand media-providers/[kind]/[id] page (+314), enhance OAuthModal, ModelSelectModal, ProviderTopology, ProxyPools, ProviderLimits
- Assets: refresh provider PNGs (alicode, byteplus, cloudflare-ai, nvidia, ollama, vertex, volcengine-ark) and add aws-polly, fal-ai, jina-ai, recraft, runwayml, stability-ai, topaz, black-forest-labs

skills/9router-stt/SKILL.md
@@ -0,0 +1,77 @@
---
name: 9router-stt
description: Speech-to-text via 9Router /v1/audio/transcriptions using OpenAI Whisper / Groq / Gemini / Deepgram / AssemblyAI / NVIDIA / HuggingFace models. Use when the user wants to transcribe audio, convert speech to text, or get subtitles from audio files.
---

# 9Router — Speech-to-Text

Requires `NINEROUTER_URL` (and `NINEROUTER_KEY` if auth is enabled). See https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router/SKILL.md for setup.

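A minimal Node 18+ sketch of resolving these two variables. `nineRouterConfig` is a hypothetical helper, not part of 9Router; it assumes the `Authorization` header should simply be omitted when `NINEROUTER_KEY` is unset (auth disabled).

```js
// Hypothetical helper (not part of 9Router): resolve the base URL and auth
// headers from environment variables. NINEROUTER_KEY is optional because it
// is only needed when auth is enabled.
function nineRouterConfig(env = process.env) {
  const baseUrl = env.NINEROUTER_URL;
  if (!baseUrl) {
    throw new Error("NINEROUTER_URL is not set");
  }
  const headers = {};
  if (env.NINEROUTER_KEY) {
    headers.Authorization = `Bearer ${env.NINEROUTER_KEY}`;
  }
  // Strip trailing slashes so `${baseUrl}/v1/...` never produces "//v1".
  return { baseUrl: baseUrl.replace(/\/+$/, ""), headers };
}
```

Usage: `const { baseUrl, headers } = nineRouterConfig();`, then pass `headers` to each `fetch` call.
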
## Discover models

```bash
curl $NINEROUTER_URL/v1/models/stt | jq '.data[].id'
```

Pass one of the returned STT model IDs as `model` (e.g. `openai/whisper-1`, `groq/whisper-large-v3`, `deepgram/nova-3`, `gemini/gemini-2.5-flash`).

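The same lookup from Node, as a sketch. `listSttModels` is a hypothetical helper; it assumes the OpenAI-style `{ data: [{ id }] }` body that the `jq '.data[].id'` filter implies, and it sends the auth header in case your instance requires it. `fetchImpl` is injectable only so the function is easy to test.

```js
// Hypothetical helper: list STT model IDs, equivalent to
// `curl $NINEROUTER_URL/v1/models/stt | jq '.data[].id'`.
// Assumes an OpenAI-style response body: { data: [{ id: "..." }, ...] }.
async function listSttModels(baseUrl, headers = {}, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/v1/models/stt`, { headers });
  if (!res.ok) {
    throw new Error(`GET /v1/models/stt failed: ${res.status}`);
  }
  const body = await res.json();
  // Each `id` is a ready-to-use `model` value, e.g. "groq/whisper-large-v3".
  return body.data.map((m) => m.id);
}
```
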
## Endpoint

`POST $NINEROUTER_URL/v1/audio/transcriptions` (OpenAI Whisper compatible, `multipart/form-data`)

| Field | Required | Notes |
|---|---|---|
| `model` | yes | from `/v1/models/stt` |
| `file` | yes | audio file (mp3, wav, m4a, webm, ogg, flac) |
| `language` | no | ISO-639-1 (e.g. `en`, `vi`) |
| `prompt` | no | hint text to guide transcription |
| `response_format` | no | `json` (default) / `text` / `verbose_json` / `srt` / `vtt` |
| `temperature` | no | 0–1 |

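A sketch of assembling the form fields from the table above in Node 18+ or a browser. `buildTranscriptionForm` and its camelCase option names are hypothetical; the checks only mirror the table (the server remains the authority), and sending a filename with an extension is an assumption about how the server recognizes the audio format.

```js
// Hypothetical helper: build the multipart body for
// POST /v1/audio/transcriptions. Client-side checks mirror the field table.
const RESPONSE_FORMATS = new Set(["json", "text", "verbose_json", "srt", "vtt"]);

function buildTranscriptionForm({
  model, file, filename = "audio.mp3", language, prompt, responseFormat, temperature,
}) {
  if (!model) throw new Error("model is required");
  if (!(file instanceof Blob)) throw new Error("file must be a Blob");
  if (responseFormat !== undefined && !RESPONSE_FORMATS.has(responseFormat)) {
    throw new Error(`unsupported response_format: ${responseFormat}`);
  }
  // Also rejects NaN, since NaN fails both comparisons.
  if (temperature !== undefined && !(temperature >= 0 && temperature <= 1)) {
    throw new Error("temperature must be between 0 and 1");
  }
  const form = new FormData();
  form.append("model", model);
  // Assumption: the extension in `filename` helps the server detect the format.
  form.append("file", file, filename);
  if (language) form.append("language", language);
  if (prompt) form.append("prompt", prompt);
  if (responseFormat) form.append("response_format", responseFormat);
  if (temperature !== undefined) form.append("temperature", String(temperature));
  return form;
}
```

Send the result with `fetch(url, { method: "POST", headers, body: form })` and do not set `Content-Type` yourself; `fetch` adds the multipart boundary.
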
## Examples

```bash
curl -X POST "$NINEROUTER_URL/v1/audio/transcriptions" \
  -H "Authorization: Bearer $NINEROUTER_KEY" \
  -F "model=openai/whisper-1" \
  -F "file=@audio.mp3" \
  -F "language=vi"
```

JS (Node):

```js
// Node 18+ (global fetch/FormData/Blob); run as an ES module (e.g. a .mjs file)
// because of the top-level await.
import { readFile } from "node:fs/promises";

const form = new FormData();
form.append("model", "groq/whisper-large-v3-turbo");
// Wrap the file bytes in a Blob; the third argument sets the uploaded filename.
form.append("file", new Blob([await readFile("audio.mp3")]), "audio.mp3");

const r = await fetch(`${process.env.NINEROUTER_URL}/v1/audio/transcriptions`, {
  method: "POST",
  // Don't set Content-Type: fetch adds the multipart boundary for FormData.
  headers: { "Authorization": `Bearer ${process.env.NINEROUTER_KEY}` },
  body: form,
});
if (!r.ok) throw new Error(`Transcription failed: ${r.status} ${await r.text()}`);

const { text } = await r.json();
console.log(text);
```

## Response shape

Default (`response_format=json`), here for a Vietnamese recording ("Hello, this is a recording."):
```json
{ "text": "Xin chào, đây là bản ghi âm." }
```

`verbose_json` adds `language`, `duration`, and `segments[]` with timestamps.
`srt` / `vtt` return subtitle text rather than JSON.

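Because the body type depends on `response_format`, a client has to choose between `res.json()` and `res.text()`. A minimal sketch; `readTranscription` is a hypothetical helper, and treating `text` as a plain-text body is an assumption based on the OpenAI Whisper compatibility of the endpoint.

```js
// Hypothetical helper: decode a /v1/audio/transcriptions response according
// to the response_format that was requested.
async function readTranscription(res, responseFormat = "json") {
  if (!res.ok) {
    throw new Error(`transcription failed: ${res.status} ${await res.text()}`);
  }
  // `json` -> { text }, `verbose_json` -> { text, language, duration, segments }.
  if (responseFormat === "json" || responseFormat === "verbose_json") {
    return res.json();
  }
  // `srt` / `vtt` (and, by assumption, `text`) are returned as raw text.
  return res.text();
}
```
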
## Provider quirks

| Provider | `model` format | Notes |
|---|---|---|
| `openai` | `whisper-1`, `gpt-4o-transcribe`, `gpt-4o-mini-transcribe` | Native OpenAI shape |
| `groq` | `whisper-large-v3`, `whisper-large-v3-turbo`, `distil-whisper-large-v3-en` | Fastest; OpenAI shape |
| `gemini` | `gemini-2.5-flash`, `gemini-2.5-pro`, `gemini-2.5-flash-lite` | Server converts to `generateContent` with audio inline |
| `deepgram` | `nova-3`, `nova-2`, `whisper-large` | Token auth; server adapts response |
| `assemblyai` | `universal-3-pro`, `universal-2` | Async upload+poll handled server-side |
| `nvidia` | `nvidia/parakeet-ctc-1.1b-asr` | NIM endpoint |
| `huggingface` | `openai/whisper-large-v3`, `openai/whisper-small` | HF Inference API |

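Since every provider sits behind the same endpoint and only the `model` value changes, falling back from one provider to another is a loop over model IDs. A sketch assuming JSON responses; `transcribeWithFallback` and the `makeForm` callback are hypothetical names, and the IDs you pass should come from `/v1/models/stt` on your own instance.

```js
// Hypothetical helper: try STT models in order, return the first success.
// `makeForm(model)` must return a FormData with that `model` and the audio file.
async function transcribeWithFallback(baseUrl, headers, makeForm, models, fetchImpl = fetch) {
  const errors = [];
  for (const model of models) {
    let res;
    try {
      // Build the form per attempt so each request carries its own `model`.
      res = await fetchImpl(`${baseUrl}/v1/audio/transcriptions`, {
        method: "POST",
        headers,
        body: makeForm(model),
      });
    } catch (err) {
      // Network-level failure (e.g. connection refused): try the next model.
      errors.push(`${model}: ${err.message}`);
      continue;
    }
    if (res.ok) {
      return { model, result: await res.json() };
    }
    errors.push(`${model}: ${res.status}`);
  }
  throw new Error(`all STT models failed (${errors.join(", ")})`);
}
```

This moves on after any non-2xx status or network error, which suits rate limits (429) and provider outages; a stricter version would stop on a 400, since a malformed request will likely fail on every provider.
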
@@ -49,6 +49,7 @@ When the user needs a specific capability, fetch that skill's `SKILL.md` from it
| Chat / code-gen | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-chat/SKILL.md |
| Image generation | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-image/SKILL.md |
| Text-to-speech | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-tts/SKILL.md |
| Speech-to-text | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-stt/SKILL.md |
| Embeddings | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-embeddings/SKILL.md |
| Web search | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-web-search/SKILL.md |
| Web fetch (URL → markdown) | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-web-fetch/SKILL.md |

@@ -12,6 +12,7 @@ Drop-in skills for any AI agent (Claude, Cursor, ChatGPT, custom SDK). Just **co
| Chat / code-gen | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-chat/SKILL.md |
| Image generation | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-image/SKILL.md |
| Text-to-speech | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-tts/SKILL.md |
| Speech-to-text | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-stt/SKILL.md |
| Embeddings | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-embeddings/SKILL.md |
| Web search | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-web-search/SKILL.md |
| Web fetch (URL → markdown) | https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router-web-fetch/SKILL.md |