zengxiao-he/tessera
From teacher to tiles — a from-scratch LLM distillation & serving engine: custom Triton/CUDA kernels, FSDP distillation, paged-KV continuous batching, speculative decoding, a Rust gateway, a JAX oracle, and interpretability tooling.
GitHub repository with 181 stars and 1 fork.
Language: Python
Topics: cuda, flash-attention, fsdp, inference-engine, jax, knowledge-distillation, kv-cache, llm, mechanistic-interpretability, ml-systems