novitalabs/pegaflow

High-performance KV cache storage for LLM inference — GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and SGLang.

GitHub repository with 130 stars and 19 forks.

Language: Rust

Topics: inference, kv-cache, llm, vllm

Latest metric snapshot

2026-06-05: 130 stars and 19 forks.

Similar repositories

  1. timtoole02/Camelid

    Camelid: a Rust-native local inference backend with evidence-gated model compatibility.

    GitHub repository with 49 stars and 10 forks.

    Trending score: 1.25; stars gained: +17; forks gained: +2.

    Language: Rust

    Topics: apple-silicon, gguf, inference, llama, llm, local-first

  2. Venkat2811/wombatkv

    Object-storage-native KV cache for LLM inference & RL. Cross-restart, cross-conversation, cross-engine via shared S3 bucket.

    GitHub repository with 12 stars and 1 fork.

    Trending score: 0.33; stars gained: +1; forks gained: +0.

    Language: Rust

    Topics: amd, caching, ds4, inference, kv-cache, llm

  3. inferx-net/inferx

    InferX: Inference as a Service Platform

    GitHub repository with 217 stars and 25 forks.

    Trending score: 0.11; stars gained: +0; forks gained: +0.

    Language: Rust

    Topics: faas, faas-platform, inference, serverless

Trending in Rust

  1. BigPizzaV3/CodexPlusPlus

    An enhanced tool for CodexApp, striving to make Codex better and more comfortable to use.

    GitHub repository with 13,893 stars and 863 forks.

    Trending score: 5.16; stars gained: +916; forks gained: +44.

    Language: Rust

  2. rtk-ai/rtk

    CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies

    GitHub repository with 59,069 stars and 3,636 forks.

    Trending score: 4.96; stars gained: +654; forks gained: +44.

    Language: Rust

    Topics: agentic-coding, ai-coding, anthropic, claude-code, cli, command-line-tool

  3. openai/codex

    Lightweight coding agent that runs in your terminal

    GitHub repository with 88,876 stars and 13,059 forks.

    Trending score: 4.58; stars gained: +326; forks gained: +48.

    Language: Rust

  4. tinyhumansai/openhuman

    Your Personal AI super intelligence. Private, Simple and extremely powerful.

    GitHub repository with 30,862 stars and 2,980 forks.

    Trending score: 4.37; stars gained: +332; forks gained: +50.

    Language: Rust

  5. fallow-rs/fallow

    Codebase intelligence for TypeScript and JavaScript. Free static layer: unused code, duplication, circular deps, complexity hotspots, architecture boundaries. Optional paid runtime layer: hot-path review and cold-path deletion evidence from real production traffic. Rust-native, sub-second, zero-config framework support.

    GitHub repository with 3,092 stars and 95 forks.

    Trending score: 4.05; stars gained: +346; forks gained: +16.

    Language: Rust

    Topics: cli, code-duplication, code-quality, codebase-intelligence, copy-paste-detection, dead-code

  6. aaif-goose/goose

    An open source, extensible AI agent that goes beyond code suggestions: install, execute, edit, and test with any LLM.

    GitHub repository with 46,604 stars and 4,868 forks.

    Trending score: 3.80; stars gained: +152; forks gained: +28.

    Language: Rust

Trending topic: inference

  1. vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    GitHub repository with 81,995 stars and 17,676 forks.

    Trending score: 3.75; stars gained: +79; forks gained: +46.

    Language: Python

    Topics: amd, blackwell, cuda, deepseek, deepseek-v3, gpt

  2. vllm-project/vllm-ascend

    Community maintained hardware plugin for vLLM on Ascend

    GitHub repository with 2,201 stars and 1,350 forks.

    Trending score: 3.25; stars gained: +16; forks gained: +22.

    Language: C++

    Topics: ascend, inference, llm, llm-serving, llmops, mlops

  3. gpustack/gpustack

    A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.

    GitHub repository with 5,107 stars and 542 forks.

    Trending score: 2.51; stars gained: +11; forks gained: +1.

    Language: Python

    Topics: ascend, cuda, deepseek, distributed-inference, genai, high-performance-inference

  4. LMCache/LMCache

    LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

    GitHub repository with 8,422 stars and 1,246 forks.

    Trending score: 2.17; stars gained: +11; forks gained: +6.

    Language: Python

    Topics: amd, cuda, fast, inference, kv-cache, llm

  5. Andyyyy64/whichllm

    Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.

    GitHub repository with 2,773 stars and 159 forks.

    Trending score: 1.88; stars gained: +95; forks gained: +7.

    Language: Python

    Topics: ai, cli, llm, local-llm, command-line-tool, gguf

  6. ggml-org/whisper.cpp

    Port of OpenAI's Whisper model in C/C++

    GitHub repository with 50,474 stars and 5,619 forks.

    Trending score: 1.87; stars gained: +69; forks gained: +8.

    Language: C++

    Topics: openai, speech-to-text, transformer, whisper, inference, speech-recognition