vignesh2027/LLM-Evaluation-Framework
Production-grade LLM evaluation and benchmarking framework for GPT-4, Claude, Gemini, and Mistral, reporting accuracy, latency, cost, hallucination, and reasoning metrics.
GitHub repository with 12 stars and 0 forks.
Language: Python
Topics: accuracy, ai, benchmarking, claude, fastapi, gemini, gpt4, hallucination, large-language-models, latency
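The description names accuracy, latency, and cost among the reported metrics. The snippet below is a minimal sketch of how per-response results could be rolled up into those three numbers. It is not the repository's code or API. It assumes a hypothetical `EvalRecord` schema, exact-match scoring for accuracy, nearest-rank p95 latency, and placeholder per-1k-token prices (not real vendor pricing). Hallucination and reasoning metrics are left out because they depend on grading methods the description does not specify.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalRecord:
    """One model response to one benchmark item (hypothetical schema)."""
    prediction: str
    reference: str
    latency_s: float      # wall-clock time of the API call, in seconds
    input_tokens: int
    output_tokens: int


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records: list[EvalRecord],
              price_in_per_1k: float,
              price_out_per_1k: float) -> dict[str, float]:
    """Aggregate accuracy, latency, and cost over one model's run.

    Prices are assumptions supplied by the caller (USD per 1k tokens).
    """
    if not records:
        raise ValueError("no records to summarize")

    # Exact match after trimming whitespace and lowercasing.
    correct = sum(
        r.prediction.strip().lower() == r.reference.strip().lower()
        for r in records
    )
    latencies = [r.latency_s for r in records]
    # Cost = input tokens * input price + output tokens * output price.
    cost = sum(
        r.input_tokens / 1000 * price_in_per_1k
        + r.output_tokens / 1000 * price_out_per_1k
        for r in records
    )
    return {
        "accuracy": correct / len(records),
        "mean_latency_s": mean(latencies),
        "p95_latency_s": percentile(latencies, 95),
        "total_cost_usd": cost,
    }


# Toy usage with made-up records and hypothetical prices.
records = [
    EvalRecord("Paris", "paris", 0.8, 100, 20),
    EvalRecord("4", "5", 1.2, 50, 5),
    EvalRecord(" Blue ", "blue", 2.0, 80, 10),
]
print(summarize(records, price_in_per_1k=0.01, price_out_per_1k=0.03))
```

Keeping cost as a caller-supplied input, rather than hard-coding model prices, lets the same summary run across GPT-4, Claude, Gemini, and Mistral results as their pricing changes.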