JingbiaoMei/ATM-Bench
ATM-Bench: a benchmark for long-term personalized memory question answering (QA) over roughly four years of multimodal personal data (images, videos, and emails). It features referential queries, evidence-grounded answering, and multi-source reasoning. Paper: "According to Me: Long-Term Personalized Referential Memory QA"
GitHub repository with 45 stars and 2 forks.
Language: Python
Topics: agentic-memory, ai-agent, ai-agents, benchmark, long-term-memory, memory, memory-agent, memory-management, personal-ai, personal-ai-assistant