~/about
Hey, I’m Bobby Cheng.
I work on reinforcement learning for LLMs, building the environments, training pipelines, and evaluation frameworks that let language models learn through interaction. My work spans self-play optimization, multi-agent reasoning benchmarks, and distributed RL infrastructure.
[Email] [X] [GitHub] [Google Scholar]
~/work
- Co-First Author, TextArena [Site] [GitHub] [arXiv] [Blog] [IBM]
- Co-First Author, MEMO: Memory-Augmented Model Context Optimization [Site] [GitHub] [arXiv]
- Builder, UnstableBaselines [GitHub] [Blog]
- Builder, SuperTinyLanguageModels [GitHub] [arXiv]
- Co-Organiser, MindGamesChallenge @ NeurIPS 2025 [Site]
- Area Chair, NeurIPS 2025 Workshop on Multi-Turn Interactions in LLMs