smouj/kimari-local-ai

Local AI for Consumer GPUs: Run powerful LLMs on GTX 1060/1080. No cloud. No subscriptions. Built on llama.cpp + CUDA.

GitHub repository with 7 stars and 1 fork.

Language: Python

Topics: cuda, gguf, llama-cpp, llm-inference, open-webui, openai-compatible-api, openclaw, cli, consumer-gpu, gtx-1060

Latest metric snapshot

2026-06-09: 7 stars and 1 fork.

Similar repositories

  1. LMCache/LMCache

    LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

    GitHub repository with 9,093 stars and 1,322 forks.

    Trending score: 4.67; stars gained: +411; forks gained: +26.

    Language: Python

    Topics: amd, cuda, fast, inference, kv-cache, llm

  2. vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    GitHub repository with 82,931 stars and 18,084 forks.

    Trending score: 4.18; stars gained: +80; forks gained: +23.

    Language: Python

    Topics: amd, blackwell, cuda, deepseek, deepseek-v3, gpt

  3. sgl-project/sglang

    SGLang is a high-performance serving framework for large language models and multimodal models.

    GitHub repository with 29,041 stars and 6,540 forks.

    Trending score: 3.37; stars gained: +33; forks gained: +15.

    Language: Python

    Topics: attention, blackwell, cuda, deepseek, diffusion, glm

  4. NVIDIA/TensorRT-LLM

    TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

    GitHub repository with 13,881 stars and 2,465 forks.

    Trending score: 2.24; stars gained: +7; forks gained: +2.

    Language: Python

    Topics: blackwell, cuda, llm-serving, moe, pytorch

  5. roflcoopter/viseron

    Self-hosted, local only NVR and AI Computer Vision software. With features such as object detection, motion detection, face recognition and more, it gives you the power to keep an eye on your home, office or any other place you want to monitor.

    GitHub repository with 3,231 stars and 396 forks.

    Trending score: 2.21; stars gained: +22; forks gained: +0.

    Language: Python

    Topics: nvr, network-video-capture, network-video-recorder, tensorflow, darknet, yolo

  6. zengxiao-he/tessera

    From teacher to tiles: a from-scratch LLM distillation & serving engine with custom Triton/CUDA kernels, FSDP distillation, paged-KV continuous batching, speculative decoding, a Rust gateway, a JAX oracle, and interpretability tooling.

    GitHub repository with 181 stars and 1 fork.

    Trending score: 2.19; stars gained: +11; forks gained: +0.

    Language: Python

    Topics: cuda, flash-attention, fsdp, inference-engine, jax, knowledge-distillation

Trending in Python

  1. harry0703/MoneyPrinterTurbo

    Generate high-definition short videos with one click using large AI models.

    GitHub repository with 88,031 stars and 12,625 forks.

    Trending score: 6.02; stars gained: +1,097; forks gained: +218.

    Language: Python

    Topics: ai, automation, chatgpt, moviepy, python, shortvideo

  2. pewdiepie-archdaemon/odysseus

    Self-hosted AI workspace.

    GitHub repository with 71,462 stars and 9,113 forks.

    Trending score: 5.98; stars gained: +834; forks gained: +140.

    Language: Python

  3. NousResearch/hermes-agent

    The agent that grows with you

    GitHub repository with 194,128 stars and 33,994 forks.

    Trending score: 5.92; stars gained: +753; forks gained: +209.

    Language: Python

    Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude

  4. NVIDIA/SkillSpector

    Security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks.

    GitHub repository with 5,962 stars and 441 forks.

    Trending score: 5.61; stars gained: +874; forks gained: +76.

    Language: Python

  5. rohitg00/ai-engineering-from-scratch

    Learn it. Build it. Ship it for others.

    GitHub repository with 32,676 stars and 5,366 forks.

    Trending score: 5.59; stars gained: +762; forks gained: +135.

    Language: Python

    Topics: agents, ai, ai-agents, ai-engineering, computer-vision, course

  6. Agents365-ai/drawio-skill

    Generate draw.io diagrams from natural language: 6 presets, vision self-check + up to 5-round refinement, codebase-to-diagram, 10,000+ official shapes & 321 AI/LLM brand logos. Exports PNG/SVG/PDF/JPG.

    GitHub repository with 3,445 stars and 240 forks.

    Trending score: 5.51; stars gained: +1,369; forks gained: +113.

    Language: Python

    Topics: agent-skill, agent-skills, architecture-diagram, claude-code, claude-code-skill, claude-skills

Trending topic: cuda

  1. LMCache/LMCache

    LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

    GitHub repository with 9,093 stars and 1,322 forks.

    Trending score: 4.67; stars gained: +411; forks gained: +26.

    Language: Python

    Topics: amd, cuda, fast, inference, kv-cache, llm

  2. vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    GitHub repository with 82,931 stars and 18,084 forks.

    Trending score: 4.18; stars gained: +80; forks gained: +23.

    Language: Python

    Topics: amd, blackwell, cuda, deepseek, deepseek-v3, gpt

  3. sgl-project/sglang

    SGLang is a high-performance serving framework for large language models and multimodal models.

    GitHub repository with 29,041 stars and 6,540 forks.

    Trending score: 3.37; stars gained: +33; forks gained: +15.

    Language: Python

    Topics: attention, blackwell, cuda, deepseek, diffusion, glm

  4. Luce-Org/lucebox-hub

    Fast LLM speculative inference server for consumer hardware.

    GitHub repository with 2,493 stars and 229 forks.

    Trending score: 2.88; stars gained: +27; forks gained: +6.

    Language: C++

    Topics: cuda, cuda-kernels, dflash, kernel, llama-cpp, local-ai

  5. NVlabs/cuda-oxide

    cuda-oxide is an experimental Rust-to-CUDA compiler that lets you write (SIMT) GPU kernels in safe(ish), idiomatic Rust. It compiles standard Rust code directly to PTX: no DSLs, no foreign language bindings, just Rust.

    GitHub repository with 2,756 stars and 182 forks.

    Trending score: 2.83; stars gained: +25; forks gained: +4.

    Language: Rust

    Topics: async, compiler-backend, cuda, gpu, heterogeneous-computing, high-performance-computing

  6. NVIDIA/TensorRT-LLM

    TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

    GitHub repository with 13,881 stars and 2,465 forks.

    Trending score: 2.24; stars gained: +7; forks gained: +2.

    Language: Python

    Topics: blackwell, cuda, llm-serving, moe, pytorch