artalis-io/bitnet.c
Minimal, zero-dependency LLM inference in pure C11. CPU-first with NEON/AVX2 SIMD. Flash MoE (pread + LRU expert cache). TurboQuant 3-bit KV compression (8.9x less memory per session). 20+ GGUF quant formats. Compiles to WASM.
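The "Flash MoE (pread + LRU expert cache)" item implies that expert weights are read from the model file on demand instead of being kept resident. The sketch below is not bitnet.c's code. It is a minimal illustration under assumed conventions: the experts are stored back to back in one file, each exactly `expert_bytes` long, starting at `base_offset`. All names (`ExpertCache`, `expert_cache_get`, and so on) are hypothetical. Eviction uses a logical clock and a linear scan to find the least recently used slot. `pread` is POSIX rather than ISO C11.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical on-disk layout: expert tensors stored back to back,
 * each expert_bytes long, starting at base_offset in the file. */
typedef struct {
    int expert_id;      /* -1 marks an empty slot */
    uint64_t last_used; /* logical clock of most recent access; 0 = never */
    unsigned char *data;
} ExpertSlot;

typedef struct {
    int fd;
    off_t base_offset;
    size_t expert_bytes;
    int n_slots;
    ExpertSlot *slots;
    uint64_t clock, hits, misses;
} ExpertCache;

void expert_cache_free(ExpertCache *c) {
    for (int i = 0; c->slots && i < c->n_slots; i++) free(c->slots[i].data);
    free(c->slots);
    c->slots = NULL;
}

int expert_cache_init(ExpertCache *c, int fd, off_t base_offset,
                      size_t expert_bytes, int n_slots) {
    *c = (ExpertCache){ .fd = fd, .base_offset = base_offset,
                        .expert_bytes = expert_bytes, .n_slots = n_slots };
    c->slots = malloc((size_t)n_slots * sizeof *c->slots);
    if (!c->slots) return -1;
    for (int i = 0; i < n_slots; i++) {
        c->slots[i] = (ExpertSlot){ .expert_id = -1, .last_used = 0,
                                    .data = malloc(expert_bytes) };
        if (!c->slots[i].data) { c->n_slots = i; expert_cache_free(c); return -1; }
    }
    return 0;
}

/* Returns the expert's bytes. On a miss, evicts the least recently used
 * slot and fills it with pread. Returns NULL if the read fails. */
const unsigned char *expert_cache_get(ExpertCache *c, int expert_id) {
    assert(expert_id >= 0 && c->n_slots > 0);
    int victim = 0;
    for (int i = 0; i < c->n_slots; i++) {
        if (c->slots[i].expert_id == expert_id) {
            c->slots[i].last_used = ++c->clock;  /* hit: mark most recent */
            c->hits++;
            return c->slots[i].data;
        }
        /* Empty slots have last_used == 0, so they are chosen first. */
        if (c->slots[i].last_used < c->slots[victim].last_used) victim = i;
    }
    c->misses++;
    ExpertSlot *s = &c->slots[victim];
    off_t off = c->base_offset + (off_t)expert_id * (off_t)c->expert_bytes;
    size_t done = 0;
    while (done < c->expert_bytes) {  /* pread may return short counts */
        ssize_t n = pread(c->fd, s->data + done, c->expert_bytes - done,
                          off + (off_t)done);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) {                 /* error or EOF: leave the slot empty */
            s->expert_id = -1;
            s->last_used = 0;
            return NULL;
        }
        done += (size_t)n;
    }
    s->expert_id = expert_id;
    s->last_used = ++c->clock;
    return s->data;
}
```

A linear scan keeps the sketch short and is cheap for a handful of slots. With many slots, a hash map combined with a doubly linked list would make lookup and eviction O(1).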
GitHub repository with 20 stars and 7 forks.
Language: C
Topics: avx2, c, cpu-inference, gguf, inference, kv-cache, llm, moe, neon, quantization