yotambraun/Toolscore
A Python framework for evaluating LLM tool-calling behavior, with metrics covering accuracy, efficiency, and correctness.
GitHub repository with 5 stars and 0 forks.
Language: Python
Topics: ai-agents, ai-agents-and-tools, anthropic, function-calling, langchain, llm, llm-evaluation, llm-metrics, metrics, openai
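
To make the description concrete, here is a minimal sketch of what accuracy and efficiency metrics for tool calling might look like. It does not use Toolscore's actual API: the `ToolCall` class and the functions `selection_accuracy`, `argument_accuracy`, and `efficiency` are hypothetical names introduced only for illustration, and the metric definitions are simple assumptions rather than the ones the framework implements.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """Hypothetical record of one tool invocation (not Toolscore's type)."""
    name: str
    args: dict = field(default_factory=dict)


def selection_accuracy(expected, actual):
    """Fraction of expected calls whose tool name matches at the same position."""
    if not expected:
        return 1.0 if not actual else 0.0
    matches = sum(e.name == a.name for e, a in zip(expected, actual))
    return matches / len(expected)


def argument_accuracy(expected, actual):
    """Fraction of expected calls matched on both name and arguments, by position."""
    if not expected:
        return 1.0 if not actual else 0.0
    matches = sum(
        e.name == a.name and e.args == a.args for e, a in zip(expected, actual)
    )
    return matches / len(expected)


def efficiency(expected, actual):
    """Ratio of expected to actual call count, capped at 1.0 (extra calls lower it)."""
    if not actual:
        return 1.0 if not expected else 0.0
    return min(1.0, len(expected) / len(actual))


# Assumed example trace: the model makes one redundant search call.
expected = [
    ToolCall("search", {"q": "weather Paris"}),
    ToolCall("get_forecast", {"city": "Paris"}),
]
actual = [
    ToolCall("search", {"q": "weather Paris"}),
    ToolCall("search", {"q": "Paris forecast"}),
    ToolCall("get_forecast", {"city": "Paris"}),
]

print(selection_accuracy(expected, actual))
print(argument_accuracy(expected, actual))
print(efficiency(expected, actual))
```

Positional matching is the simplest choice here. It penalizes a redundant call twice, once in accuracy and once in efficiency, so a real evaluator would likely align the two sequences more carefully.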