responsibleai/ASSERT
Requirement-driven evaluation harness for AI agents and LLM applications. Generate behavior-specific test cases, run them against any target (hosted models, callable wrappers, OTel-traced agents), and inspect local-first artifacts.
GitHub repository with 87 stars and 13 forks.
Language: Python