winstonsmith1897/DantinoX
DantinoX: A modular, memory-efficient Transformer implementation in JAX/Flax NNX. Includes Sparse Mixture-of-Experts (MoE), Grouped-Query Attention (GQA), Sliding Window Attention, Gradient Accumulation, and Checkpointing.
GitHub repository with 5 stars and 1 fork.
Language: Python
Topics: attention-mechanism, mixture-of-experts, transformer-architecture, flax, jax, fine, llm, pre-training
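
As a rough illustration of the Sliding Window Attention feature named in the description, the sketch below builds a causal sliding-window mask in JAX. It is not taken from the DantinoX code base: the function name `sliding_window_causal_mask` and the `window` parameter are hypothetical, and the repository may construct its masks differently.

```python
import jax.numpy as jnp


def sliding_window_causal_mask(seq_len: int, window: int) -> jnp.ndarray:
    """Boolean mask of shape (seq_len, seq_len); True means "may attend".

    Hypothetical helper for illustration only, not the DantinoX API.
    Query position q may attend to key position k when k is not in the
    future (k <= q) and lies within the last `window` positions (q - k < window).
    """
    q = jnp.arange(seq_len)[:, None]  # query positions as a column vector
    k = jnp.arange(seq_len)[None, :]  # key positions as a row vector
    return (k <= q) & (q - k < window)


mask = sliding_window_causal_mask(seq_len=4, window=2)
# Each query sees itself and at most one earlier token.
print(mask.astype(jnp.int32))
```

A mask like this is typically applied by setting the attention logits at `False` positions to a large negative value before the softmax, so each token attends to at most `window` recent tokens instead of the full prefix.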