screenpipe/screenleak
Multi-modal benchmark for measuring sensitive-information disclosure in computer-use agents
GitHub repository with 5 stars and 0 forks.
Language: Python
Topics: benchmark, computer-use, computer-use-agent, eval, evaluation
Trending score 0.04, freshness score 0.05, stars gained +0, forks gained +0.
2026-06-02: 5 stars and 0 forks.
🔍 The hardest search benchmark in the wild — vague, multi-turn, proactive. 200 long-horizon tasks with persona-driven progressive disclosure, scored by verifiable schema-free knowledge-graph evaluation. No vibes, just triplet F1.
GitHub repository with 1,008 stars and 63 forks.
Trending score: 3.33; stars gained: +50; forks gained: +37.
Language: Python
Topics: agentic-ai, benchmark, llm, proactive-agent, search, search-agent
Complete guide to running large language models locally on AMD Strix Halo / Ryzen AI MAX+ 395 with Radeon 8060S (gfx1151) and 96GB/128GB unified memory. Covers BIOS config, Ubuntu/kernel setup, Ollama, llama.cpp Vulkan/RADV, ROCm/HIP, vLLM, and 70B/120B GGUF evidence.
GitHub repository with 143 stars and 6 forks.
Trending score: 1.97; stars gained: +9; forks gained: +0.
Language: Python
Topics: amd, benchmark, gfx1151, llama-cpp, llm, local-llm
SkillLearnBench is the first benchmark for evaluating continual learning methods that automatically generate agent skills.
GitHub repository with 47 stars and 3 forks.
Trending score: 1.26; stars gained: +12; forks gained: +0.
Language: Python
Topics: agent-skills, automatic, benchmark, continual-learning, skill-generation
BEHAVIOR-1K: a platform for accelerating Embodied AI research. Join our Discord for support: https://discord.gg/bccR5vGFEx
GitHub repository with 1,518 stars and 205 forks.
Trending score: 1.26; stars gained: +3; forks gained: +0.
Language: Python
Topics: robotics, simulation, benchmark, embodied-ai
Scalable annotation pipeline for action-aligned, fine-grained instructions for vision-language-action models
GitHub repository with 19 stars and 0 forks.
Trending score: 1.24; stars gained: +10; forks gained: +0.
Language: Python
Topics: benchmark, caption, caption-generation, fine-grained, roboitcs, vision-language-action-model
MTEB: Massive Text Embedding Benchmark
GitHub repository with 3,303 stars and 625 forks.
Trending score: 1.14; stars gained: +1; forks gained: +2.
Language: Python
Topics: benchmark, bitext-mining, clustering, information-retrieval, low-resource-nlp, mteb
Generate high-definition short videos with one click using large AI models.
GitHub repository with 88,031 stars and 12,625 forks.
Trending score: 6.02; stars gained: +1,097; forks gained: +218.
Language: Python
Topics: ai, automation, chatgpt, moviepy, python, shortvideo
Self-hosted AI workspace.
GitHub repository with 71,422 stars and 9,105 forks.
Trending score: 5.98; stars gained: +834; forks gained: +140.
Language: Python
The agent that grows with you
GitHub repository with 194,091 stars and 33,984 forks.
Trending score: 5.92; stars gained: +753; forks gained: +209.
Language: Python
Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude
Security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks.
GitHub repository with 5,962 stars and 441 forks.
Trending score: 5.61; stars gained: +874; forks gained: +76.
Language: Python
Learn it. Build it. Ship it for others.
GitHub repository with 32,676 stars and 5,366 forks.
Trending score: 5.59; stars gained: +762; forks gained: +135.
Language: Python
Topics: agents, ai, ai-agents, ai-engineering, computer-vision, course
Generate draw.io diagrams from natural language — 6 presets, vision self-check + up to 5-round refinement, codebase-to-diagram, 10,000+ official shapes & 321 AI/LLM brand logos. Exports PNG/SVG/PDF/JPG.
GitHub repository with 3,445 stars and 240 forks.
Trending score: 5.51; stars gained: +1,369; forks gained: +113.
Language: Python
Topics: agent-skill, agent-skills, architecture-diagram, claude-code, claude-code-skill, claude-skills
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research · Browser-hosted Android Simulator · Verifiable Evaluation · Scalable Online RL Training
GitHub repository with 621 stars and 98 forks.
Trending score: 2.50; stars gained: +12; forks gained: +1.
Language: TypeScript
Topics: benchmark, mobile-agent, reinforcement-learning, vlm, agents, gym
Open Source Continuous Inference Benchmark Research Platform Kimi K2.6, DeepSeekv4, GLM5 - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 & soon™ TPUv6e/v7/Trainium2/3
GitHub repository with 1,099 stars and 195 forks.
Trending score: 1.96; stars gained: +4; forks gained: +1.
Language: Shell
Topics: ai, benchmark, llm, pytorch, sglang, vllm
Open survey and evidence map for AI agent evolution, self-evolving agents, memory, skills, harnesses, benchmarks, and agent-swarm systems.
GitHub repository with 228 stars and 10 forks.
Trending score: 1.27; stars gained: +1; forks gained: +1.
Language: JavaScript
Topics: agent-evolution, agent-framework, agent-swarm, ai-agent, ai-agents, ai-research