Mattral/Composed-Mixture-of-Experts-Engine

moe-engine is a research-grade infrastructure layer for training large Mixture-of-Experts language models at hyperscale. It is designed around one core constraint: at 10K+ GPUs, nodes die continuously. The system must keep training alive end-to-end — routing correctly, checkpointing durably, and resuming without operator intervention.

GitHub repository with 10 stars and 8 forks.

Language: Python

Topics: distributed-training, fault-tolerance, llm-training, machine-learning, mixture-of-experts, moe, production-infrastructure, pytorch, sparse-training, triton

24h trend summary

Trending score 0.61, freshness score 0.92, stars gained +0, forks gained +0.

Latest metric snapshot

2026-06-15: 10 stars and 8 forks.

Similar repositories

  1. skypilot-org/skypilot

    Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem).

    GitHub repository with 10,158 stars and 1,099 forks.

    Trending score: 3.24; stars gained: +63; forks gained: +6.

    Language: Python

    Topics: cloud-computing, cloud-management, cost-optimization, deep-learning, distributed-training, gpu

  2. The-AI-Alliance/tapestry

    Project Tapestry aims to give every nation and participant frontier AI they can call their own — uniting a global consortium to train a shared frontier model from which partners build and own sovereign models aligned to their national, socio-cultural, and industrial needs.

    GitHub repository with 108 stars and 11 forks.

    Trending score: 2.51; stars gained: +14; forks gained: +3.

    Language: Python

    Topics: ai-alliance, ai-security, consortium-training, cultural-alignment, data-sovereignty, digital-sovereignty

  3. huggingface/pytorch-image-models

    The largest collection of PyTorch image encoders / backbones, including train, eval, inference, and export scripts plus pretrained weights: ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more.

    GitHub repository with 36,846 stars and 5,162 forks.

    Trending score: 0.86; stars gained: +3; forks gained: +0.

    Language: Python

    Topics: pytorch, resnet, pretrained-models, pretrained-weights, distributed-training, mobile-deep-learning

  4. Mattral/Composed-Mixture-of-Experts-Engine

    moe-engine is a research-grade infrastructure layer for training large Mixture-of-Experts language models at hyperscale. It is designed around one core constraint: at 10K+ GPUs, nodes die continuously. The system must keep training alive end-to-end — routing correctly, checkpointing durably, and resuming without operator intervention.

    GitHub repository with 10 stars and 8 forks.

    Trending score: 0.61; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: distributed-training, fault-tolerance, llm-training, machine-learning, mixture-of-experts, moe

  5. AMD-AGI/Primus-Turbo

    A high-performance acceleration library dedicated to large-scale model training on AMD GPUs

    GitHub repository with 64 stars and 21 forks.

    Trending score: 0.20; stars gained: +0; forks gained: -1.

    Language: Python

    Topics: amd-gpu, distributed-training, training, training-at-scale

Trending in Python

  1. harry0703/MoneyPrinterTurbo

    Generate high-definition short videos with one click using large AI models.

    GitHub repository with 88,031 stars and 12,625 forks.

    Trending score: 6.02; stars gained: +1,097; forks gained: +218.

    Language: Python

    Topics: ai, automation, chatgpt, moviepy, python, shortvideo

  2. pewdiepie-archdaemon/odysseus

    Self-hosted AI workspace.

    GitHub repository with 71,427 stars and 9,106 forks.

    Trending score: 5.98; stars gained: +834; forks gained: +140.

    Language: Python

  3. NousResearch/hermes-agent

    The agent that grows with you

    GitHub repository with 194,093 stars and 33,985 forks.

    Trending score: 5.92; stars gained: +753; forks gained: +209.

    Language: Python

    Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude

  4. NVIDIA/SkillSpector

    Security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks.

    GitHub repository with 5,962 stars and 441 forks.

    Trending score: 5.61; stars gained: +874; forks gained: +76.

    Language: Python

  5. rohitg00/ai-engineering-from-scratch

    Learn it. Build it. Ship it for others.

    GitHub repository with 32,676 stars and 5,366 forks.

    Trending score: 5.59; stars gained: +762; forks gained: +135.

    Language: Python

    Topics: agents, ai, ai-agents, ai-engineering, computer-vision, course

  6. Agents365-ai/drawio-skill

    Generate draw.io diagrams from natural language — 6 presets, vision self-check + up to 5-round refinement, codebase-to-diagram, 10,000+ official shapes & 321 AI/LLM brand logos. Exports PNG/SVG/PDF/JPG.

    GitHub repository with 3,445 stars and 240 forks.

    Trending score: 5.51; stars gained: +1,369; forks gained: +113.

    Language: Python

    Topics: agent-skill, agent-skills, architecture-diagram, claude-code, claude-code-skill, claude-skills

Trending topic: distributed-training

  1. skypilot-org/skypilot

    Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem).

    GitHub repository with 10,158 stars and 1,099 forks.

    Trending score: 3.24; stars gained: +63; forks gained: +6.

    Language: Python

    Topics: cloud-computing, cloud-management, cost-optimization, deep-learning, distributed-training, gpu

  2. The-AI-Alliance/tapestry

    Project Tapestry aims to give every nation and participant frontier AI they can call their own — uniting a global consortium to train a shared frontier model from which partners build and own sovereign models aligned to their national, socio-cultural, and industrial needs.

    GitHub repository with 108 stars and 11 forks.

    Trending score: 2.51; stars gained: +14; forks gained: +3.

    Language: Python

    Topics: ai-alliance, ai-security, consortium-training, cultural-alignment, data-sovereignty, digital-sovereignty

  3. huggingface/pytorch-image-models

    The largest collection of PyTorch image encoders / backbones, including train, eval, inference, and export scripts plus pretrained weights: ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more.

    GitHub repository with 36,846 stars and 5,162 forks.

    Trending score: 0.86; stars gained: +3; forks gained: +0.

    Language: Python

    Topics: pytorch, resnet, pretrained-models, pretrained-weights, distributed-training, mobile-deep-learning

  4. Mattral/Composed-Mixture-of-Experts-Engine

    moe-engine is a research-grade infrastructure layer for training large Mixture-of-Experts language models at hyperscale. It is designed around one core constraint: at 10K+ GPUs, nodes die continuously. The system must keep training alive end-to-end — routing correctly, checkpointing durably, and resuming without operator intervention.

    GitHub repository with 10 stars and 8 forks.

    Trending score: 0.61; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: distributed-training, fault-tolerance, llm-training, machine-learning, mixture-of-experts, moe

  5. AMD-AGI/Primus-Turbo

    A high-performance acceleration library dedicated to large-scale model training on AMD GPUs

    GitHub repository with 64 stars and 21 forks.

    Trending score: 0.20; stars gained: +0; forks gained: -1.

    Language: Python

    Topics: amd-gpu, distributed-training, training, training-at-scale

  6. AMD-AGI/Primus-SaFE

    Primus-SaFE (Stability and Fault Endurance)

    GitHub repository with 56 stars and 2 forks.

    Trending score: 0.19; stars gained: +0; forks gained: +0.

    Language: Go

    Topics: training, distributed-training, training-at-scale, training-observability, training-stability