NVIDIA/TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

GitHub repository with 13,806 stars and 2,440 forks.

Language: Python

Topics: blackwell, cuda, llm-serving, moe, pytorch

24h trend summary

Trending score 1.18, activity score 0.05, stars gained +16, forks gained +7.

Latest metric snapshot

2026-06-05: 13,806 stars and 2,440 forks.

Similar repositories

  1. vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    GitHub repository with 81,964 stars and 17,668 forks.

    Trending score: 3.75; stars gained: +79; forks gained: +46.

    Language: Python

    Topics: amd, blackwell, cuda, deepseek, deepseek-v3, gpt

  2. lightseekorg/tokenspeed

    TokenSpeed is a speed-of-light LLM inference engine.

    GitHub repository with 1,366 stars and 141 forks.

    Trending score: 1.86; stars gained: +6; forks gained: +2.

    Language: Python

    Topics: blackwell, deepseek, gpt-oss, kimi, lightseek, llm

  3. sgl-project/sglang

    SGLang is a high-performance serving framework for large language models and multimodal models.

    GitHub repository with 28,884 stars and 6,341 forks.

    Trending score: 1.72; stars gained: -55; forks gained: +18.

    Language: Python

    Topics: attention, blackwell, cuda, deepseek, diffusion, glm

  4. NVIDIA/TensorRT-LLM

    TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

    GitHub repository with 13,806 stars and 2,440 forks.

    Trending score: 1.18; stars gained: +16; forks gained: +7.

    Language: Python

    Topics: blackwell, cuda, llm-serving, moe, pytorch

  5. NVIDIA/cudnn-frontend

    cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels.

    GitHub repository with 840 stars and 177 forks.

    Trending score: 0.69; stars gained: +4; forks gained: +2.

    Language: Python

    Topics: attention, blackwell, cuda, cuda-kernels, cuda-toolkit, deep-learning

  6. jpietek/PenguinBurner

    Nvidia ultimate undervolting companion on Linux. Can automatically scan for the most optimal GPU VF curve and generate silent fan curves. Supports MSI Afterburner profile imports and LACT profile exports.

    GitHub repository with 53 stars and 2 forks.

    Trending score: 0.13; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: blackwell, fan-curve, gaming, gpu, linux, nvidia

Trending in Python

  1. NousResearch/hermes-agent

    The agent that grows with you

    GitHub repository with 181,334 stars and 31,114 forks.

    Trending score: 5.95; stars gained: +1,867; forks gained: +361.

    Language: Python

    Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude

  2. chopratejas/headroom

    Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

    GitHub repository with 12,942 stars and 833 forks.

    Trending score: 5.69; stars gained: +2,829; forks gained: +175.

    Language: Python

    Topics: agent, ai, anthropic, claude-code, compression, context-engineering

  3. Imbad0202/academic-research-skills

    Academic Research Skills for Claude Code: research → write → review → revise → finalize

    GitHub repository with 27,327 stars and 2,249 forks.

    Trending score: 5.52; stars gained: +1,079; forks gained: +89.

    Language: Python

    Topics: academic-pipeline, academic-writing, ai-research, claude, claude-code, literature-review

  4. anthropics/financial-services

    GitHub repository with 29,986 stars and 4,219 forks.

    Trending score: 4.88; stars gained: +688; forks gained: +114.

    Language: Python

  5. virgiliojr94/book-to-skill

    Turn any technical book PDF into a Claude Code skill — ready to study, reference, and use while you work.

    GitHub repository with 4,221 stars and 528 forks.

    Trending score: 4.88; stars gained: +476; forks gained: +68.

    Language: Python

  6. vinta/awesome-python

    An opinionated list of Python frameworks, libraries, tools, and resources

    GitHub repository with 301,341 stars and 28,044 forks.

    Trending score: 4.60; stars gained: +518; forks gained: +24.

    Language: Python

    Topics: awesome, python, collections, python-frameworks, python-libraries, python-tools

Trending topic: blackwell

  1. vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    GitHub repository with 81,964 stars and 17,668 forks.

    Trending score: 3.75; stars gained: +79; forks gained: +46.

    Language: Python

    Topics: amd, blackwell, cuda, deepseek, deepseek-v3, gpt

  2. lightseekorg/tokenspeed

    TokenSpeed is a speed-of-light LLM inference engine.

    GitHub repository with 1,366 stars and 141 forks.

    Trending score: 1.86; stars gained: +6; forks gained: +2.

    Language: Python

    Topics: blackwell, deepseek, gpt-oss, kimi, lightseek, llm

  3. sgl-project/sglang

    SGLang is a high-performance serving framework for large language models and multimodal models.

    GitHub repository with 28,884 stars and 6,341 forks.

    Trending score: 1.72; stars gained: -55; forks gained: +18.

    Language: Python

    Topics: attention, blackwell, cuda, deepseek, diffusion, glm

  4. NVIDIA/TensorRT-LLM

    TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

    GitHub repository with 13,806 stars and 2,440 forks.

    Trending score: 1.18; stars gained: +16; forks gained: +7.

    Language: Python

    Topics: blackwell, cuda, llm-serving, moe, pytorch

  5. NVIDIA/cudnn-frontend

    cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels.

    GitHub repository with 840 stars and 177 forks.

    Trending score: 0.69; stars gained: +4; forks gained: +2.

    Language: Python

    Topics: attention, blackwell, cuda, cuda-kernels, cuda-toolkit, deep-learning

  6. jpietek/PenguinBurner

    Nvidia ultimate undervolting companion on Linux. Can automatically scan for the most optimal GPU VF curve and generate silent fan curves. Supports MSI Afterburner profile imports and LACT profile exports.

    GitHub repository with 53 stars and 2 forks.

    Trending score: 0.13; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: blackwell, fan-curve, gaming, gpu, linux, nvidia