aisa-group/decomposing-eval-awareness
Decomposing and measuring evaluation awareness in existing benchmarks and our proposed EvalAwareBench.
GitHub repository with 13 stars and 3 forks.
Language: Python
Topics: ai-alignment, ai-safety, benchmark, evaluation-awareness, llm, situational-awareness