openvinotoolkit/nncf

Neural Network Compression Framework for enhanced OpenVINO™ inference

GitHub repository with 1,169 stars and 294 forks.

Language: Python

Topics: quantization, pruning, sparsity, quantization-aware-training, mixed-precision-training, compression, semantic-segmentation, object-detection, classification, nlp

Latest metric snapshot

2026-06-05: 1,169 stars and 294 forks.

Similar repositories

  1. huawei-csl/KVarN

    KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

    GitHub repository with 279 stars and 10 forks.

    Trending score: 1.84; stars gained: +75; forks gained: +4.

    Language: Python

    Topics: agentic-ai, kv-cache, llm, llm-inference, long-context, quantization

  2. RyanCodrai/turbovec

    A vector index built on TurboQuant, written in Rust with Python bindings

    GitHub repository with 4,086 stars and 389 forks.

    Trending score: 1.70; stars gained: +63; forks gained: +9.

    Language: Python

    Topics: ann, avx512, embeddings, faiss, nearest-neighbor, neon

  3. intel/auto-round

    A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

    GitHub repository with 1,436 stars and 135 forks.

    Trending score: 1.06; stars gained: +6; forks gained: +0.

    Language: Python

    Topics: int4, quantization, rounding, transformers, vllm, mxfp4

  4. Mininglamp-AI/cider

    W8A8/W4A8 inference + optimized SDPA on Apple Silicon — unlocking unused INT8 TensorOps in M5 for 1.2–1.9× faster LLM prefill, plus FlashInfer-inspired GQA decode attention for up to 1.6× SDPA speedup, built as MLX custom primitives.

    GitHub repository with 324 stars and 15 forks.

    Trending score: 0.83; stars gained: +6; forks gained: +0.

    Language: Python

    Topics: apple-silicon, metal, mlx, quantization, w4a8, w8a8

  5. pytorch/ao

    PyTorch native quantization and sparsity for training and inference

    GitHub repository with 2,846 stars and 515 forks.

    Trending score: 0.65; stars gained: +3; forks gained: +2.

    Language: Python

    Topics: brrr, dtypes, inference, mx, pytorch, quantization

  6. wanshuiyin/ARIS-in-AI-Offer

    Bilingual (中文+EN) ML / LLM / diffusion / agent interview cheat sheets for AI 秋招 — auto-generated by the ARIS /render-html workflow into single-file HTML, reads anywhere — plus a CV→DBLP-fact-checked academic homepage generator and long-form blogs/surveys 🌱

    GitHub repository with 167 stars and 6 forks.

    Trending score: 0.59; stars gained: +3; forks gained: +0.

    Language: Python

    Topics: ai-interview, aris, autumn-recruiting, cheatsheet, chinese, claude-code

Trending in Python

  1. NousResearch/hermes-agent

    The agent that grows with you

    GitHub repository with 182,659 stars and 31,317 forks.

    Trending score: 5.95; stars gained: +1,867; forks gained: +361.

    Language: Python

    Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude

  2. Imbad0202/academic-research-skills

    Academic Research Skills for Claude Code: research → write → review → revise → finalize

    GitHub repository with 27,643 stars and 2,276 forks.

    Trending score: 5.52; stars gained: +1,079; forks gained: +89.

    Language: Python

    Topics: academic-pipeline, academic-writing, ai-research, claude, claude-code, literature-review

  3. rohitg00/ai-engineering-from-scratch

    Learn it. Build it. Ship it for others.

    GitHub repository with 28,711 stars and 4,695 forks.

    Trending score: 5.32; stars gained: +1,261; forks gained: +238.

    Language: Python

    Topics: agents, ai, ai-agents, ai-engineering, computer-vision, course

  4. vinta/awesome-python

    An opinionated list of Python frameworks, libraries, tools, and resources

    GitHub repository with 301,435 stars and 28,046 forks.

    Trending score: 4.60; stars gained: +518; forks gained: +24.

    Language: Python

    Topics: awesome, collections, python, python-frameworks, python-libraries, python-tools

  5. Alishahryar1/free-claude-code

    Use claude-code for free in the terminal, VSCode extension or discord like OpenClaw (voice supported)

    GitHub repository with 32,540 stars and 4,942 forks.

    Trending score: 4.56; stars gained: +467; forks gained: +82.

    Language: Python

  6. langchain-ai/langchain

    The agent engineering platform.

    GitHub repository with 138,587 stars and 22,961 forks.

    Trending score: 4.53; stars gained: +171; forks gained: +31.

    Language: Python

    Topics: agents, ai, ai-agents, anthropic, chatgpt, deepagents

Trending topic: quantization

  1. huawei-csl/KVarN

    KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

    GitHub repository with 279 stars and 10 forks.

    Trending score: 1.84; stars gained: +75; forks gained: +4.

    Language: Python

    Topics: agentic-ai, kv-cache, llm, llm-inference, long-context, quantization

  2. RyanCodrai/turbovec

    A vector index built on TurboQuant, written in Rust with Python bindings

    GitHub repository with 4,086 stars and 389 forks.

    Trending score: 1.70; stars gained: +63; forks gained: +9.

    Language: Python

    Topics: ann, avx512, embeddings, faiss, nearest-neighbor, neon

  3. timtoole02/Camelid

    Camelid: a Rust-native local inference backend with evidence-gated model compatibility.

    GitHub repository with 53 stars and 10 forks.

    Trending score: 1.25; stars gained: +17; forks gained: +2.

    Language: Rust

    Topics: apple-silicon, gguf, inference, llama, llm, local-first

  4. intel/auto-round

    A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

    GitHub repository with 1,436 stars and 135 forks.

    Trending score: 1.06; stars gained: +6; forks gained: +0.

    Language: Python

    Topics: int4, quantization, rounding, transformers, vllm, mxfp4

  5. Mininglamp-AI/cider

    W8A8/W4A8 inference + optimized SDPA on Apple Silicon — unlocking unused INT8 TensorOps in M5 for 1.2–1.9× faster LLM prefill, plus FlashInfer-inspired GQA decode attention for up to 1.6× SDPA speedup, built as MLX custom primitives.

    GitHub repository with 324 stars and 15 forks.

    Trending score: 0.83; stars gained: +6; forks gained: +0.

    Language: Python

    Topics: apple-silicon, metal, mlx, quantization, w4a8, w8a8

  6. pytorch/ao

    PyTorch native quantization and sparsity for training and inference

    GitHub repository with 2,846 stars and 515 forks.

    Trending score: 0.65; stars gained: +3; forks gained: +2.

    Language: Python

    Topics: brrr, dtypes, inference, mx, pytorch, quantization