youngmin0oh/rcdp-public
Robust contextual dueling bandits with post-serving context, delayed feedback, and adversarial corruption (RLHF / preference learning) — ICML 2026
GitHub repository with 6 stars and 0 forks.
Language: Python
Topics: adversarial-robustness, bandit-algorithms, contextual-bandits, delayed-feedback, dueling-bandits, icml-2026, machine-learning, multi-armed-bandits, online-learning, preference-learning