voicegain/platform
Voicegain Enterprise Speech-to-Text Platform (API, Portal, etc.)
GitHub repository with 33 stars and 21 forks.
Language: HTML
Topics: asr, deep-neural-networks, ivr, mrcp, rtc, speech-to-text, transcription
2026-06-05: 33 stars and 21 forks.
Hold a key, speak, release: AI-polished text appears at your cursor in any app. Open-source voice input for macOS & Windows. (Hold the hotkey to speak; release to get the polished text.)
GitHub repository with 2,150 stars and 169 forks.
Trending score: 2.75; stars gained: +25; forks gained: +1.
Language: HTML
Topics: ai-prompt, asr, dictation, linux, llm, macos
AI Agent learning roadmap and resource collection
GitHub repository with 2,917 stars and 291 forks.
Trending score: 4.01; stars gained: +285; forks gained: +24.
Language: HTML
🪧 Claude Code / Codex skill: generate Xiaohongshu carousels & WeChat 21:9+1:1 cover pairs. Editorial × Swiss visual systems, 28 layouts, 10 themes, single-file HTML → PNG. (Xiaohongshu image posts + WeChat Official Account cover pairs)
GitHub repository with 2,950 stars and 273 forks.
Trending score: 3.95; stars gained: +132; forks gained: +8.
Language: HTML
Topics: agent-skill, ai-agent, anthropic, claude-code, claude-skill, codex
✨ The agentic HTML editor — your local AI agent writes the HTML, you ship it. 🚀 75 Skills × 9 Surfaces (magazine · deck · poster · XHS / tweet · prototype · data report · Hyperframes) 🛡️ Sandboxed preview · 📤 1-click to WeChat / X / Zhihu / HTML / PNG 🔑 Zero API key — Claude Code / Cursor / Codex / Gemini / Copilot / OpenCode / Qwen / Aider.
GitHub repository with 6,147 stars and 600 forks.
Trending score: 3.48; stars gained: +77; forks gained: +10.
Language: HTML
Topics: agent-skills, agentic, ai-agents, ai-design, ai-editor, byok
Programmatic video for coding agents — turn HTML, CSS & data into real MP4s on your laptop. Pluggable render engines, 21 templates, AI soundtrack. Apache-2.0, no per-render fees. An official project by the Open Design team.
GitHub repository with 1,287 stars and 131 forks.
Trending score: 2.77; stars gained: +577; forks gained: +56.
Language: HTML
Topics: ai-agent, apache-2, coding-agent, css, ffmpeg, html
A meta-skill that designs domain-specific agent teams, defines specialized agents, and generates the skills they use.
GitHub repository with 5,091 stars and 686 forks.
Trending score: 2.64; stars gained: +510; forks gained: +40.
Language: HTML
Topics: claude-code, claude-code-plugin, harness, harness-engineering
First foundation ASR built for the real world - 7 atomic acoustic conditions, 54 compound scenarios, 2.6M samples, and up to ~30% gains over SOTA where every other model falls apart. **You'll come back to MEGA-ASR, after the rest fail in the wild. ⭐**
GitHub repository with 957 stars and 61 forks.
Trending score: 2.72; stars gained: +33; forks gained: -1.
Language: Python
Topics: asr, robust
Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
GitHub repository with 16,750 stars and 1,720 forks.
Trending score: 1.93; stars gained: +56; forks gained: +2.
Language: Python
Topics: pytorch, speech-recognition, paraformer, punctuation, speaker-diarization, voice-activity-detection
An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.
GitHub repository with 183 stars and 7 forks.
Trending score: 1.58; stars gained: +40; forks gained: +5.
Language: Python
Topics: asr, llm, sd, sdr, speech-recognition
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, RK NPU, Axera NPU, Ascend NPU, x86_64 servers, websocket server/client, and 12 programming languages.
GitHub repository with 12,740 stars and 1,453 forks.
Trending score: 1.41; stars gained: +29; forks gained: +1.
Language: C++
Topics: asr, onnx, windows, linux, macos, cpp
Turn raw footage into brand-ready, platform-optimized video with one command. Local-first: FFmpeg + WhisperX/MLX + Ollama.
GitHub repository with 51 stars and 12 forks.
Trending score: 0.89; stars gained: +7; forks gained: +1.
Language: JavaScript
Topics: asr, captions, cli, ffmpeg, llm, local-first