Ploscha/Awesome-Audio-Generation

Awesome-Audio-Generation is a collection of resources for Text-to-Audio Generation, focusing on ambient sound and music. 🎵 Explore foundational models and contribute your findings to help grow this GitHub community! 🐙

GitHub repository with 13 stars and 1 fork.

Topics: asr, audio-driven-talking-face, audio-generation, awesome, awesome-music-generation, controllable-generation, music-generation, paperlist, speech-driven-talking-face, talking-face-generation

Latest metric snapshot

2026-06-05: 13 stars and 1 fork.

Similar repositories

  1. Open-Less/openless

    Hold a key, speak, release: AI-polished text appears at your cursor in any app. Open-source voice input for macOS & Windows.

    GitHub repository with 2,139 stars and 165 forks.

    Trending score: 2.75; stars gained: +25; forks gained: +1.

    Language: HTML

    Topics: ai-prompt, asr, dictation, macos, open-source, speech-to-text

  2. xzf-thu/Mega-ASR

    First foundation ASR built for the real world - 7 atomic acoustic conditions, 54 compound scenarios, 2.6M samples, and up to ~30% gains over SOTA where every other model falls apart. **You'll come back to MEGA-ASR, after the rest fail in the wild. ⭐**

    GitHub repository with 955 stars and 61 forks.

    Trending score: 2.72; stars gained: +33; forks gained: -1.

    Language: Python

    Topics: asr, robust

  3. modelscope/FunASR

    Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

    GitHub repository with 16,750 stars and 1,720 forks.

    Trending score: 1.93; stars gained: +56; forks gained: +2.

    Language: Python

    Topics: pytorch, speech-recognition, paraformer, punctuation, speaker-diarization, voice-activity-detection

  4. Soul-AILab/SoulX-Transcriber

    An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.

    GitHub repository with 182 stars and 7 forks.

    Trending score: 1.58; stars gained: +40; forks gained: +5.

    Language: Python

    Topics: asr, llm, sd, sdr, speech-recognition

  5. soniqo/speech-swift

    AI speech toolkit for Apple Silicon — ASR, TTS, speech-to-speech, VAD, and diarization powered by MLX and CoreML

    GitHub repository with 785 stars and 101 forks.

    Trending score: 1.25; stars gained: +2; forks gained: +0.

    Language: Swift

    Topics: apple-silicon, asr, coreml, ios, macos, mlx

  6. BillLucky/echocut

    Turn raw footage into brand-ready, platform-optimized video with one command. Local-first: FFmpeg + WhisperX/MLX + Ollama.

    GitHub repository with 51 stars and 12 forks.

    Trending score: 0.89; stars gained: +7; forks gained: +1.

    Language: JavaScript

    Topics: asr, captions, cli, ffmpeg, llm, local-first
