Pennantiscrew/SoulX-Transcriber-726
An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.
GitHub repository with 56 stars and 0 forks.
Language: Python
Topics: asr, llm, sd, sdr, speech-recognition
2026-06-04: 56 stars and 0 forks.
First foundation ASR built for the real world - 7 atomic acoustic conditions, 54 compound scenarios, 2.6M samples, and up to ~30% gains over SOTA where every other model falls apart. **You'll come back to MEGA-ASR, after the rest fail in the wild. ⭐**
GitHub repository with 957 stars and 61 forks.
Trending score: 2.72; stars gained: +33; forks gained: -1.
Language: Python
Topics: asr, robust
Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
GitHub repository with 16,750 stars and 1,720 forks.
Trending score: 1.93; stars gained: +56; forks gained: +2.
Language: Python
Topics: pytorch, speech-recognition, paraformer, punctuation, speaker-diarization, voice-activity-detection
An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.
GitHub repository with 184 stars and 7 forks.
Trending score: 1.58; stars gained: +40; forks gained: +5.
Language: Python
Topics: asr, llm, sd, sdr, speech-recognition
Generate degraded speech datasets for noise-robust ASR benchmarking
GitHub repository with 15 stars and 0 forks.
Trending score: 0.50; stars gained: +2; forks gained: +0.
Language: Python
Topics: asr, audio, audiomentations, benchmark, cli, dataset-generation
A training-free orchestration framework for building interactive omni-modal assistants by composing off-the-shelf modality experts, explicit LLM routing, text-centric cross-modal memory, and interruption-aware streaming interaction.
GitHub repository with 11 stars and 2 forks.
Trending score: 0.29; stars gained: +1; forks gained: +0.
Language: Python
Topics: asr, llm, omni, trainingfree, video-audio
The agent that grows with you
GitHub repository with 182,737 stars and 31,332 forks.
Trending score: 5.95; stars gained: +1,867; forks gained: +361.
Language: Python
Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude
Academic Research Skills for Claude Code: research → write → review → revise → finalize
GitHub repository with 27,643 stars and 2,276 forks.
Trending score: 5.52; stars gained: +1,079; forks gained: +89.
Language: Python
Topics: academic-pipeline, academic-writing, ai-research, claude, claude-code, literature-review
Learn it. Build it. Ship it for others.
GitHub repository with 28,771 stars and 4,705 forks.
Trending score: 5.32; stars gained: +1,261; forks gained: +238.
Language: Python
Topics: agents, ai, ai-agents, ai-engineering, computer-vision, course
An opinionated list of Python frameworks, libraries, tools, and resources
GitHub repository with 301,459 stars and 28,045 forks.
Trending score: 4.60; stars gained: +518; forks gained: +24.
Language: Python
Topics: awesome, python, collections, python-frameworks, python-libraries, python-tools
Use claude-code for free in the terminal, VSCode extension or discord like OpenClaw (voice supported)
GitHub repository with 32,561 stars and 4,946 forks.
Trending score: 4.56; stars gained: +467; forks gained: +82.
Language: Python
The agent engineering platform.
GitHub repository with 138,601 stars and 22,962 forks.
Trending score: 4.53; stars gained: +171; forks gained: +31.
Language: Python
Topics: ai, anthropic, gemini, langchain, llm, openai
Hold a key, speak, release: AI-polished text appears at your cursor in any app. Open-source voice input for macOS & Windows. (Hold the hotkey to speak; release to get the polished text.)
GitHub repository with 2,159 stars and 170 forks.
Trending score: 2.75; stars gained: +25; forks gained: +1.
Language: HTML
Topics: ai-prompt, asr, dictation, macos, open-source, speech-to-text
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, RK NPU, Axera NPU, Ascend NPU, x86_64 servers, websocket server/client, and 12 programming languages.
GitHub repository with 12,740 stars and 1,453 forks.
Trending score: 1.41; stars gained: +29; forks gained: +1.
Language: C++
Topics: asr, onnx, windows, linux, macos, cpp
Turn raw footage into brand-ready, platform-optimized video with one command. Local-first: FFmpeg + WhisperX/MLX + Ollama.
GitHub repository with 51 stars and 12 forks.
Trending score: 0.89; stars gained: +7; forks gained: +1.
Language: JavaScript
Topics: asr, captions, cli, ffmpeg, llm, local-first