Ploscha/Awesome-Audio-Generation

Awesome-Audio-Generation is a collection of resources for Text-to-Audio Generation, focusing on ambient sound and music. 🎵 Explore foundational models and contribute your findings to help grow this GitHub community! 🐙

GitHub repository with 13 stars and 1 fork.

Topics: asr, audio-driven-talking-face, audio-generation, awesome, awesome-music-generation, controllable-generation, music-generation, paperlist, speech-driven-talking-face, talking-face-generation

Latest metric snapshot

2026-06-05: 13 stars and 1 fork.

Similar repositories

1. Open-Less/openless

Hold a key, speak, release: AI-polished text appears at your cursor in any app. Open-source voice input for macOS & Windows. (Hold the hotkey and speak; release to get the polished text.)

GitHub repository with 2,139 stars and 165 forks.

Trending score: 2.75; stars gained: +25; forks gained: +1.

Language: HTML

Topics: ai-prompt, asr, dictation, macos, open-source, speech-to-text

2. xzf-thu/Mega-ASR

First foundation ASR built for the real world - 7 atomic acoustic conditions, 54 compound scenarios, 2.6M samples, and up to ~30% gains over SOTA where every other model falls apart. You'll come back to MEGA-ASR, after the rest fail in the wild. ⭐

GitHub repository with 955 stars and 61 forks.

Trending score: 2.72; stars gained: +33; forks gained: -1.

Language: Python

Topics: asr, robust

3. modelscope/FunASR

Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

GitHub repository with 16,750 stars and 1,720 forks.

Trending score: 1.93; stars gained: +56; forks gained: +2.

Language: Python

Topics: pytorch, speech-recognition, paraformer, punctuation, speaker-diarization, voice-activity-detection

4. Soul-AILab/SoulX-Transcriber

An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.

GitHub repository with 182 stars and 7 forks.

Trending score: 1.58; stars gained: +40; forks gained: +5.

Language: Python

Topics: asr, llm, sd, sdr, speech-recognition

5. soniqo/speech-swift

AI speech toolkit for Apple Silicon — ASR, TTS, speech-to-speech, VAD, and diarization powered by MLX and CoreML

GitHub repository with 785 stars and 101 forks.

Trending score: 1.25; stars gained: +2; forks gained: +0.

Language: Swift

Topics: apple-silicon, asr, coreml, ios, macos, mlx

6. BillLucky/echocut

Turn raw footage into brand-ready, platform-optimized video with one command. Local-first: FFmpeg + WhisperX/MLX + Ollama.

GitHub repository with 51 stars and 12 forks.

Trending score: 0.89; stars gained: +7; forks gained: +1.

Language: JavaScript

Topics: asr, captions, cli, ffmpeg, llm, local-first

Trending topic: asr