aerosta/rewardhackwatch

Runtime detector for reward hacking and misalignment in LLM agents (89.7% F1 on 5,391 trajectories).

GitHub repository with 12 stars and 0 forks.

Language: Python

Topics: agent-safety, ai-safety, distilbert, fastapi, huggingface, llm-agents, machine-learning, misalignment, nlp, pytorch

24h trend summary

Trending score 0.04, activity score 0.04, stars gained +0, forks gained +0.

Latest metric snapshot

2026-06-05: 12 stars and 0 forks.

Similar repositories

1. Hyperion-GPU/ProofFlow-v0.1

GitHub repository with 110 stars and 8 forks.

Trending score: 0.06; stars gained: +0; forks gained: +0.

Language: Python

Topics: agent-safety, ai-agents, audit, code-review, codex, developer-tools

2. aerosta/rewardhackwatch

Runtime detector for reward hacking and misalignment in LLM agents (89.7% F1 on 5,391 trajectories).

GitHub repository with 12 stars and 0 forks.

Trending score: 0.04; stars gained: +0; forks gained: +0.

Language: Python

Topics: agent-safety, ai-safety, distilbert, fastapi, huggingface, llm-agents

3. azender1/SafeAgent

Execution control layer for AI agents — prevents duplicate or incorrect real-world actions under retries, uncertainty, and stale context.

GitHub repository with 6 stars and 2 forks.

Trending score: 0.04; stars gained: +0; forks gained: +0.

Language: Python

Topics: ai-agents, idempotency, reliability, agent, agent-infrastructure, agent-safety

4. norika1207-lab/afu-brain

OpenClaw-compatible MASL safety gate with public RAG packs for memory-aware AI agents

GitHub repository with 23 stars and 4 forks.

Trending score: 0.00; stars gained: +0; forks gained: +0.

Language: Python

Topics: agent-safety, ai-agents, local-first, masl, open-source, openclaw

Trending in Python

1. NousResearch/hermes-agent

The agent that grows with you

GitHub repository with 181,960 stars and 31,220 forks.

Trending score: 5.95; stars gained: +1,867; forks gained: +361.

Language: Python

Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude

2. chopratejas/headroom

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

GitHub repository with 13,768 stars and 870 forks.

Trending score: 5.69; stars gained: +2,829; forks gained: +175.

Language: Python

Topics: agent, ai, anthropic, compression, context-engineering, context-window

3. Imbad0202/academic-research-skills

Academic Research Skills for Claude Code: research → write → review → revise → finalize

GitHub repository with 27,548 stars and 2,267 forks.

Trending score: 5.52; stars gained: +1,079; forks gained: +89.

Language: Python

Topics: academic-pipeline, academic-writing, ai-research, claude, claude-code, literature-review

4. rohitg00/ai-engineering-from-scratch

Learn it. Build it. Ship it for others.

GitHub repository with 28,622 stars and 4,680 forks.

Trending score: 5.32; stars gained: +1,261; forks gained: +238.

Language: Python

Topics: agents, ai, ai-agents, ai-engineering, computer-vision, course

5. anthropics/financial-services

GitHub repository with 30,060 stars and 4,235 forks.

Trending score: 4.88; stars gained: +688; forks gained: +114.

Language: Python

6. vinta/awesome-python

An opinionated list of Python frameworks, libraries, tools, and resources

GitHub repository with 301,396 stars and 28,042 forks.

Trending score: 4.60; stars gained: +518; forks gained: +24.

Language: Python

Topics: awesome, python, collections, python-frameworks, python-libraries, python-tools

Trending topic: agent-safety

1. oxdeai/oxdeai

2. bridge-mind/BridgeWard

3. jamjet-labs/jamjet

4. Hyperion-GPU/ProofFlow-v0.1

5. aerosta/rewardhackwatch

6. azender1/SafeAgent