Kareem-Rashed/rubric-eval
Independent framework to test, benchmark, and evaluate LLMs & AI agents locally.
GitHub repository with 13 stars and 1 fork.
Language: Python
Topics: agents, ai, anthropic, benchmarking, evals, evaluation, langchain, llm, llm-evaluation, machine-learning