ahammadmejbah/Awesome-Datasets-Hub

A curated collection of datasets for Large Language Models (LLMs), covering medical AI, NLP, multimodal learning, instruction tuning, reasoning, code generation, and evaluation benchmarks.

GitHub repository with 138 stars and 39 forks.

Topics: benchmark, benchmarking, deep-learning, deep-neural-networks, deeplearning, genetic-algorithm, llm, llm-evaluation, llm-inference, machine-learning


24h trend summary

Trending score 0.76, freshness score 0.00, stars gained +2, forks gained +0.

Latest metric snapshot

2026-06-15: 138 stars and 39 forks.

Similar repositories

  1. VibeBench/VibeSearchBench

    🔍 The hardest search benchmark in the wild — vague, multi-turn, proactive. 200 long-horizon tasks with persona-driven progressive disclosure, scored by verifiable schema-free knowledge-graph evaluation. No vibes, just triplet F1.

    GitHub repository with 1,008 stars and 63 forks.

    Trending score: 3.33; stars gained: +50; forks gained: +37.

    Language: Python

    Topics: agentic-ai, benchmark, llm, proactive-agent, search, search-agent

  2. Purewhiter/mobilegym

    MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research · Browser-hosted Android Simulator · Verifiable Evaluation · Scalable Online RL Training

    GitHub repository with 621 stars and 98 forks.

    Trending score: 2.50; stars gained: +12; forks gained: +1.

    Language: TypeScript

    Topics: benchmark, mobile-agent, reinforcement-learning, vlm, agents, gym

  3. hogeheer499-commits/strix-halo-guide

    Complete guide to running large language models locally on AMD Strix Halo / Ryzen AI MAX+ 395 with Radeon 8060S (gfx1151) and 96GB/128GB unified memory. Covers BIOS config, Ubuntu/kernel setup, Ollama, llama.cpp Vulkan/RADV, ROCm/HIP, vLLM, and 70B/120B GGUF evidence.

    GitHub repository with 143 stars and 6 forks.

    Trending score: 1.97; stars gained: +9; forks gained: +0.

    Language: Python

    Topics: amd, benchmark, gfx1151, llama-cpp, llm, local-llm

  4. SemiAnalysisAI/InferenceX

    Open Source Continuous Inference Benchmark Research Platform: Kimi K2.6, DeepSeekv4, GLM5 - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 & soon™ TPUv6e/v7/Trainium2/3

    GitHub repository with 1,099 stars and 195 forks.

    Trending score: 1.96; stars gained: +4; forks gained: +1.

    Language: Shell

    Topics: ai, benchmark, llm, pytorch, sglang, vllm

  5. Shiyao-Huang/awesome-agent-evolution

    Open survey and evidence map for AI agent evolution, self-evolving agents, memory, skills, harnesses, benchmarks, and agent-swarm systems.

    GitHub repository with 228 stars and 10 forks.

    Trending score: 1.27; stars gained: +1; forks gained: +1.

    Language: JavaScript

    Topics: agent-evolution, agent-framework, agent-swarm, ai-agent, ai-agents, ai-research

  6. cxcscmu/SkillLearnBench

    SkillLearnBench is the first benchmark for evaluating continual learning methods that automatically generate agent skills.

    GitHub repository with 47 stars and 3 forks.

    Trending score: 1.26; stars gained: +12; forks gained: +0.

    Language: Python

    Topics: agent-skills, automatic, benchmark, continual-learning, skill-generation
