Indras-Mirror/llama.cpp-turboq-mtp

Fused TBQ4 Flash Attention + MTP + Shared Tensors for llama.cpp — 82+ tok/s with lossless 4.25 bpv KV cache at 200K context on RTX 4090

GitHub repository with 78 stars and 5 forks.

Language: C++

Topics: cuda, flash-attention, fwht, kv-cache, llama-cpp, mtp, multi-token-prediction, quantization, qwen, rtx-4090

24h trend summary

Trending score 0.08, activity score 0.00, stars gained +0, forks gained +0.

Latest metric snapshot

2026-06-04: 78 stars and 5 forks.

Similar repositories

  1. tenstorrent/tt-metal

    TT-NN operator library and TT-Metalium low-level kernel programming model.

    GitHub repository with 1,493 stars and 480 forks.

    Trending score: 1.66; stars gained: +5; forks gained: +3.

    Language: C++

    Topics: accelerator, ai, cuda, deepseek, gpu, img-gen

  2. SunOner/sunone_aimbot_2

    Aim-bot based on AI for all FPS and TPS games

    GitHub repository with 514 stars and 105 forks.

    Trending score: 1.48; stars gained: +4; forks gained: +1.

    Language: C++

    Topics: ai, ai-aimbot, aimbot, arduino, cpp, cs2

  3. Luce-Org/lucebox-hub

    Fast LLM speculative inference server for consumer hardware.

    GitHub repository with 2,329 stars and 217 forks.

    Trending score: 1.08; stars gained: +9; forks gained: +1.

    Language: C++

    Topics: cuda, cuda-kernels, dflash, kernel, llama-cpp, local-ai

  4. shader-slang/slang

    Making it easier to work with shaders

    GitHub repository with 5,348 stars and 451 forks.

    Trending score: 0.76; stars gained: +5; forks gained: +1.

    Language: C++

    Topics: cuda, d3d12, glsl, hlsl, shaders, vulkan

  5. MrNeRF/LichtFeld-Studio

    Train, inspect, edit, automate, and export 3D Gaussian Splatting scenes from a single native application.

    GitHub repository with 3,161 stars and 344 forks.

    Trending score: 0.76; stars gained: +5; forks gained: +0.

    Language: C++

    Topics: computer-graphics, computer-vision, cuda, gaussian-splatting, optimization

  6. iree-org/iree

    A retargetable MLIR-based machine learning compiler and runtime toolkit.

    GitHub repository with 3,787 stars and 918 forks.

    Trending score: 0.74; stars gained: +3; forks gained: +0.

    Language: C++

    Topics: mlir, vulkan, tensorflow, spirv, cuda, jax

Trending in C++

  1. ggml-org/llama.cpp

    LLM inference in C/C++

    GitHub repository with 114,606 stars and 19,170 forks.

    Trending score: 4.00; stars gained: +164; forks gained: +36.

    Language: C++

    Topics: ggml

  2. duckdb/duckdb

    DuckDB is an analytical in-process SQL database management system

    GitHub repository with 38,604 stars and 3,293 forks.

    Trending score: 2.81; stars gained: +24; forks gained: +2.

    Language: C++

    Topics: analytics, database, embedded-database, olap, sql

  4. ClickHouse/ClickHouse

    ClickHouse® is a real-time analytics database management system

    GitHub repository with 47,812 stars and 8,466 forks.

    Trending score: 2.61; stars gained: +24; forks gained: +4.

    Language: C++

    Topics: ai, analytics, big-data, clickhouse, cloud-native, cpp

  5. tensorflow/tensorflow

    An Open Source Machine Learning Framework for Everyone

    GitHub repository with 195,410 stars and 75,345 forks.

    Trending score: 2.39; stars gained: +31; forks gained: +1.

    Language: C++

    Topics: deep-learning, deep-neural-networks, distributed, machine-learning, ml, neural-network

  6. godotengine/godot

    Godot Engine – Multi-platform 2D and 3D game engine

    GitHub repository with 112,059 stars and 25,554 forks.

    Trending score: 2.37; stars gained: +332; forks gained: +24.

    Language: C++

    Topics: game-engine, godot, godotengine, open-source, multi-platform, gamedev

Trending topic: cuda

  1. sgl-project/sglang

    SGLang is a high-performance serving framework for large language models and multimodal models.

    GitHub repository with 28,872 stars and 6,332 forks.

    Trending score: 2.89; stars gained: +14; forks gained: +31.

    Language: Python

    Topics: attention, blackwell, cuda, deepseek, diffusion, glm

  2. c0deJedi/nbd-vram

    Use your NVIDIA GPU's VRAM as swap space on Linux. Built for laptops with soldered memory and no upgrade path. If you have an RTX card sitting there with 8GB of VRAM and you're getting swapped to SSD, this puts that VRAM to work.

    GitHub repository with 385 stars and 10 forks.

    Trending score: 2.51; stars gained: +220; forks gained: +5.

    Language: Shell

    Topics: cuda, gpu, laptop, linux, memory, nbd

  3. vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    GitHub repository with 81,936 stars and 17,651 forks.

    Trending score: 2.08; stars gained: +131; forks gained: +40.

    Language: Python

    Topics: amd, blackwell, cuda, deepseek, deepseek-v3, gpt

  4. Avarok-Cybersecurity/atlas

    Pure Rust Inference Engine

    GitHub repository with 472 stars and 65 forks.

    Trending score: 1.99; stars gained: +10; forks gained: +1.

    Language: Rust

    Topics: cuda, dgx, dgx-spark, gb10, llm-inference, mamba

  5. gpustack/gpustack

    A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.

    GitHub repository with 5,100 stars and 541 forks.

    Trending score: 1.85; stars gained: +10; forks gained: +2.

    Language: Python

    Topics: ascend, cuda, deepseek, distributed-inference, genai, inference

  6. tenstorrent/tt-metal

    TT-NN operator library and TT-Metalium low-level kernel programming model.

    GitHub repository with 1,493 stars and 480 forks.

    Trending score: 1.66; stars gained: +5; forks gained: +3.

    Language: C++

    Topics: accelerator, ai, cuda, deepseek, gpu, img-gen