| name | description |
|---|---|
| 9router-stt | Speech-to-text via 9Router /v1/audio/transcriptions using OpenAI Whisper / Groq / Gemini / Deepgram / AssemblyAI / NVIDIA / HuggingFace models. Use when the user wants to transcribe audio, convert speech to text, or get subtitles from audio files. |
# 9Router — Speech-to-Text

Requires `NINEROUTER_URL` (and `NINEROUTER_KEY` if auth is enabled). See https://raw.githubusercontent.com/decolua/9router/refs/heads/master/skills/9router/SKILL.md for setup.
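A quick guard for the Node snippets below; a minimal sketch using only the two env vars named above:

```js
// Fail fast if the endpoint isn't configured. NINEROUTER_KEY is only
// needed when auth is enabled on the 9Router instance.
const { NINEROUTER_URL, NINEROUTER_KEY } = process.env;
if (!NINEROUTER_URL) throw new Error("Set NINEROUTER_URL to your 9Router base URL");
```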
## Discover models

```bash
curl $NINEROUTER_URL/v1/models/stt | jq '.data[].id'
```
`model` = the STT model ID (e.g. `openai/whisper-1`, `groq/whisper-large-v3`, `deepgram/nova-3`, `gemini/gemini-2.5-flash`).
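The same lookup from Node; a minimal sketch assuming the `{ data: [ { id } ] }` list shape that the `jq` filter above implies:

```js
// Fetch the STT model list and print the IDs (mirrors the curl + jq above).
// The Authorization header is assumed accepted here, and harmless if auth
// is disabled.
const res = await fetch(`${process.env.NINEROUTER_URL}/v1/models/stt`, {
  headers: { "Authorization": `Bearer ${process.env.NINEROUTER_KEY}` },
});
const { data } = await res.json();
for (const m of data) console.log(m.id);
```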
## Endpoint

`POST $NINEROUTER_URL/v1/audio/transcriptions` (OpenAI Whisper compatible, multipart/form-data)
| Field | Required | Notes |
|---|---|---|
| `model` | yes | from `/v1/models/stt` |
| `file` | yes | audio file (mp3, wav, m4a, webm, ogg, flac) |
| `language` | no | ISO-639-1 (e.g. `en`, `vi`) |
| `prompt` | no | hint text to guide transcription |
| `response_format` | no | `json` (default) / `text` / `verbose_json` / `srt` / `vtt` |
| `temperature` | no | 0–1 |
## Examples

```bash
curl -X POST "$NINEROUTER_URL/v1/audio/transcriptions" \
  -H "Authorization: Bearer $NINEROUTER_KEY" \
  -F "model=openai/whisper-1" \
  -F "file=@audio.mp3" \
  -F "language=vi"
```
JS (Node 18+, which provides the global `fetch`, `FormData`, and `Blob`):

```js
import { readFile } from "node:fs/promises";

const form = new FormData();
form.append("model", "groq/whisper-large-v3-turbo");
form.append("file", new Blob([await readFile("audio.mp3")]), "audio.mp3");

const r = await fetch(`${process.env.NINEROUTER_URL}/v1/audio/transcriptions`, {
  method: "POST",
  headers: { "Authorization": `Bearer ${process.env.NINEROUTER_KEY}` },
  body: form,
});
const { text } = await r.json();
console.log(text);
```
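The optional fields from the table plug into the same form. A sketch requesting SRT subtitles (the `prompt` value here is illustrative):

```js
import { readFile } from "node:fs/promises";

const form = new FormData();
form.append("model", "openai/whisper-1");
form.append("file", new Blob([await readFile("audio.mp3")]), "audio.mp3");
form.append("response_format", "srt"); // also: text / verbose_json / vtt
form.append("language", "en");         // ISO-639-1 hint
form.append("prompt", "Tech podcast about speech recognition"); // guides transcription

const res = await fetch(`${process.env.NINEROUTER_URL}/v1/audio/transcriptions`, {
  method: "POST",
  headers: { "Authorization": `Bearer ${process.env.NINEROUTER_KEY}` },
  body: form,
});
if (!res.ok) throw new Error(`STT failed: ${res.status} ${await res.text()}`);
console.log(await res.text()); // srt/vtt come back as subtitle text, not JSON
```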
## Response shape

Default (`response_format=json`):

```json
{ "text": "Xin chào, đây là bản ghi âm." }
```
`verbose_json` adds `language`, `duration`, and `segments[]` with timestamps.
`srt` / `vtt` return subtitle text.
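For `verbose_json`, the segments can be formatted into a timestamped transcript. A sketch assuming OpenAI-Whisper-style segment fields (`start`/`end` in seconds, `text`), which the OpenAI-compatible endpoint suggests but is worth verifying per provider:

```js
// Turn a verbose_json body into "[start - end] text" lines.
// Segment field names follow OpenAI's Whisper verbose_json shape (assumed).
function formatSegments(body) {
  const header = `language=${body.language} duration=${body.duration}s`;
  const lines = body.segments.map(
    (s) => `[${s.start.toFixed(1)}s - ${s.end.toFixed(1)}s] ${s.text.trim()}`,
  );
  return [header, ...lines].join("\n");
}
```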
## Provider quirks
| Provider | Model format | Notes |
|---|---|---|
| openai | `whisper-1`, `gpt-4o-transcribe`, `gpt-4o-mini-transcribe` | Native OpenAI shape |
| groq | `whisper-large-v3`, `whisper-large-v3-turbo`, `distil-whisper-large-v3-en` | Fastest; OpenAI shape |
| gemini | `gemini-2.5-flash`, `gemini-2.5-pro`, `gemini-2.5-flash-lite` | Server converts to `generateContent` with audio inline |
| deepgram | `nova-3`, `nova-2`, `whisper-large` | Token auth; server adapts response |
| assemblyai | `universal-3-pro`, `universal-2` | Async upload+poll handled server-side |
| nvidia | `nvidia/parakeet-ctc-1.1b-asr` | NIM endpoint |
| huggingface | `openai/whisper-large-v3`, `openai/whisper-small` | HF Inference API |