SalesforceAIResearch/SCUBA

SCUBA: Salesforce Computer Use Benchmark

GitHub repository with 9 stars and 1 fork.

Language: Python

Topics: benchmark, browser-use-agent, computer-use-agent, crm

Latest metric snapshot

2026-06-04: 9 stars and 1 fork.

Similar repositories

1. VibeBench/VibeSearchBench

🔍 The hardest search benchmark in the wild — vague, multi-turn, proactive. 200 long-horizon tasks with persona-driven progressive disclosure, scored by verifiable schema-free knowledge-graph evaluation. No vibes, just triplet F1.

GitHub repository with 774 stars and 2 forks.

Trending score: 1.88; stars gained: +100; forks gained: +0.

Language: Python

Topics: agentic-ai, benchmark, llm, proactive-agent, search, search-agent
2. StanfordVL/BEHAVIOR-1K

BEHAVIOR-1K: a platform for accelerating Embodied AI research. Join our Discord for support: https://discord.gg/bccR5vGFEx

GitHub repository with 1,502 stars and 205 forks.

Trending score: 1.17; stars gained: +14; forks gained: +0.

Language: Python

Topics: benchmark, embodied-ai, robotics, simulation
3. rollinsio/beyond-test-coverage

Benchmark for the quality of LLM-generated test suites — anti-fragility, rigor, mocking discipline, reuse — scored against human baselines, not coverage. Python, JS/TS, Go.

GitHub repository with 18 stars and 1 fork.

Trending score: 1.04; stars gained: +9; forks gained: +0.

Language: Python

Topics: benchmark, claude, code-quality, llm, mocha, pytest
4. sierra-research/tau2-bench

τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

GitHub repository with 1,273 stars and 328 forks.

Trending score: 0.92; stars gained: +7; forks gained: +1.

Language: Python

Topics: benchmark, llm, ai, language-model-agent, conversational-agents
5. embeddings-benchmark/mteb

MTEB: Massive Text Embedding Benchmark

GitHub repository with 3,288 stars and 617 forks.

Trending score: 0.69; stars gained: +4; forks gained: +1.

Language: Python

Topics: benchmark, clustering, information-retrieval, sentence-transformers, sts, text-embedding
6. embeddings-benchmark/results

Data for the MTEB leaderboard

GitHub repository with 58 stars and 159 forks.

Trending score: 0.57; stars gained: +2; forks gained: +0.

Language: Python

Topics: benchmark, benchmarkresults, clustering, information-retrieval, retrieval, semantic-search

Trending in Python

1. NousResearch/hermes-agent

The agent that grows with you

GitHub repository with 180,881 stars and 31,021 forks.

Trending score: 5.79; stars gained: +1,360; forks gained: +322.

Language: Python

Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude
2. microsoft/SkillOpt

SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts.

GitHub repository with 4,892 stars and 487 forks.

Trending score: 4.55; stars gained: +340; forks gained: +27.

Language: Python

Topics: agent-skills, self-evolving-agents
3. mukul975/Anthropic-Cybersecurity-Skills

754 structured cybersecurity skills for AI agents · Mapped to 5 frameworks: MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND & NIST AI RMF · agentskills.io standard · Works with Claude Code, GitHub Copilot, Codex CLI, Cursor, Gemini CLI & 20+ platforms · 26 security domains · Apache 2.0

GitHub repository with 13,233 stars and 1,551 forks.

Trending score: 4.53; stars gained: +301; forks gained: +38.

Language: Python

Topics: ai-agents, claude-code, cybersecurity, incident-response, mitre-attack, penetration-testing
4. virgiliojr94/book-to-skill

Turn any technical book PDF into a Claude Code skill — ready to study, reference, and use while you work.

GitHub repository with 4,166 stars and 523 forks.

Trending score: 4.43; stars gained: +415; forks gained: +37.

Language: Python
5. anthropics/claude-code

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands.

GitHub repository with 130,154 stars and 21,149 forks.

Trending score: 4.42; stars gained: +277; forks gained: +38.

Language: Python
6. CloakHQ/CloakBrowser

Stealth Chromium that passes every bot detection test. Drop-in Playwright replacement with source-level fingerprint patches. 30/30 tests passed.

GitHub repository with 23,119 stars and 1,836 forks.

Trending score: 4.24; stars gained: +250; forks gained: +17.

Language: Python

Topics: anti-detect, bot-detection, browser-automation, chromium, cloudflare, fingerprint

Trending topic: benchmark

1. Purewhiter/mobilegym

2. VibeBench/VibeSearchBench

3. Ammaar-Alam/minebench

4. StanfordVL/BEHAVIOR-1K

5. rollinsio/beyond-test-coverage

6. sierra-research/tau2-bench