tongjingqi/Thinking-with-Video
We introduce "Thinking with Video", a new paradigm that leverages video generation for multimodal reasoning. Our VideoThinkBench shows that Sora-2 surpasses GPT-5 by 10% on eyeballing puzzles and reaches 69% accuracy on MMMU.
GitHub repository with 303 stars and 5 forks.
Language: Python
Topics: multimodal-reasoning, sora, sora2, video-generation