JudgmentLabs/judgeval
The Continuous-Improvement Stack for Agents. Our environment data and evals power agent improvement and monitoring.
GitHub repository with 1,036 stars and 93 forks.
Language: Python
Topics: langchain, langgraph, llama-index, llm, llm-evaluation, llm-observability, open-source, openai, prompt-engineering, agent