Neal006/memorylens

The open-source benchmark for LLM memory decay. Measure how Naive, RAG, Chunked RAG, Cascading, and SummaryMemory degrade over 100 conversation turns. Ebbinghaus forgetting curves, 5-provider LLM eval, multi-seed CI. No API key needed.

GitHub repository with 6 stars and 2 forks.

Language: Python

Topics: ai-evaluation, benchmarking, chatbot, conversation-memory, ebbinghaus, evaluation, large-language-models, llm, llm-benchmark, llm-memory

24h trend summary

Trending score 0.01, activity score 0.00, stars gained +0, forks gained +0.

Latest metric snapshot

2026-06-05: 6 stars and 2 forks.

Similar repositories

  1. ifixai-ai/iFixAi

    The open-source diagnostic for AI misalignment. 32 tests across fabrication, manipulation, deception, unpredictability, and opacity. Provider-agnostic. Runs against OpenAI, Anthropic, Bedrock, Azure, Gemini, and more. Letter grade in under 5 minutes, content-addressed manifest for bit-identical replay. Built by iMe.

    GitHub repository with 466 stars and 90 forks.

    Trending score: 1.78; stars gained: +6; forks gained: +3.

    Language: Python

    Topics: ai, diagnostic-tool, misalignment, agent-evaluation, ai-alignment, ai-evaluation

  2. hyeonsangjeon/gdpval-realworks

    Benchmark LLMs on real professional tasks, not academic puzzles. YAML-driven experiment pipeline + live React dashboard for GDPVal Gold Subset (220 tasks across 11 industries).

    GitHub repository with 14 stars and 2 forks.

    Trending score: 0.05; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: ai-evaluation, anthropic, azure-openai, benchmark-automation, code-interpreter, dashboard

  3. vishwanathakuthota/openvals

    Open-source AI model evaluation and benchmarking framework for LLMs (OpenAI, Ollama, Claude, Gemini)

    GitHub repository with 10 stars and 6 forks.

    Trending score: 0.03; stars gained: -2; forks gained: +1.

    Language: Python

    Topics: ai-agents, ai-evaluation, ai-evaluation-framework, ai-quality, ai-reliability, ai-safety

  4. Neal006/memorylens

    The open-source benchmark for LLM memory decay. Measure how Naive, RAG, Chunked RAG, Cascading, and SummaryMemory degrade over 100 conversation turns. Ebbinghaus forgetting curves, 5-provider LLM eval, multi-seed CI. No API key needed.

    GitHub repository with 6 stars and 2 forks.

    Trending score: 0.01; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: ai-evaluation, benchmarking, chatbot, conversation-memory, ebbinghaus, evaluation

Trending in Python

  1. NousResearch/hermes-agent

    The agent that grows with you

    GitHub repository with 181,180 stars and 31,085 forks.

    Trending score: 5.95; stars gained: +1,867; forks gained: +361.

    Language: Python

    Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude

  2. chopratejas/headroom

    Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

    GitHub repository with 12,594 stars and 817 forks.

    Trending score: 5.69; stars gained: +2,829; forks gained: +175.

    Language: Python

    Topics: agent, ai, anthropic, claude-code, compression, context-engineering

  3. Imbad0202/academic-research-skills

    Academic Research Skills for Claude Code: research → write → review → revise → finalize

    GitHub repository with 27,254 stars and 2,241 forks.

    Trending score: 5.52; stars gained: +1,079; forks gained: +89.

    Language: Python

    Topics: academic-pipeline, academic-writing, ai-research, claude, claude-code, literature-review

  4. open-webui/open-webui

    User-friendly AI Interface (Supports Ollama, OpenAI API, ...)

    GitHub repository with 140,059 stars and 20,110 forks.

    Trending score: 5.04; stars gained: +317; forks gained: +58.

    Language: Python

    Topics: ollama, ollama-webui, llm, webui, self-hosted, llm-ui

  5. ZhuLinsen/daily_stock_analysis

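    LLM-powered stock analysis system for A-share, H-share, and US markets: multi-source market data + real-time news + LLM decision dashboard + multi-channel push notifications. Runs on a schedule at zero cost, entirely free of charge.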

    GitHub repository with 40,774 stars and 38,952 forks.

    Trending score: 4.88; stars gained: +836; forks gained: +443.

    Language: Python

    Topics: a-stock, ai-agent, aigc, llm, quant, quantitative-finance

  6. anthropics/financial-services

    GitHub repository with 29,986 stars and 4,219 forks.

    Trending score: 4.88; stars gained: +688; forks gained: +114.

    Language: Python

Trending topic: ai-evaluation

  1. ifixai-ai/iFixAi

    The open-source diagnostic for AI misalignment. 32 tests across fabrication, manipulation, deception, unpredictability, and opacity. Provider-agnostic. Runs against OpenAI, Anthropic, Bedrock, Azure, Gemini, and more. Letter grade in under 5 minutes, content-addressed manifest for bit-identical replay. Built by iMe.

    GitHub repository with 466 stars and 90 forks.

    Trending score: 1.78; stars gained: +6; forks gained: +3.

    Language: Python

    Topics: ai, diagnostic-tool, misalignment, agent-evaluation, ai-alignment, ai-evaluation

  2. hyeonsangjeon/gdpval-realworks

    Benchmark LLMs on real professional tasks, not academic puzzles. YAML-driven experiment pipeline + live React dashboard for GDPVal Gold Subset (220 tasks across 11 industries).

    GitHub repository with 14 stars and 2 forks.

    Trending score: 0.05; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: ai-evaluation, anthropic, azure-openai, benchmark-automation, code-interpreter, dashboard

  3. vishwanathakuthota/openvals

    Open-source AI model evaluation and benchmarking framework for LLMs (OpenAI, Ollama, Claude, Gemini)

    GitHub repository with 10 stars and 6 forks.

    Trending score: 0.03; stars gained: -2; forks gained: +1.

    Language: Python

    Topics: ai-agents, ai-evaluation, ai-evaluation-framework, ai-quality, ai-reliability, ai-safety

  4. Neal006/memorylens

    The open-source benchmark for LLM memory decay. Measure how Naive, RAG, Chunked RAG, Cascading, and SummaryMemory degrade over 100 conversation turns. Ebbinghaus forgetting curves, 5-provider LLM eval, multi-seed CI. No API key needed.

    GitHub repository with 6 stars and 2 forks.

    Trending score: 0.01; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: ai-evaluation, benchmarking, chatbot, conversation-memory, ebbinghaus, evaluation