Efficient LLMs and RL & Agentic Systems
Squeeze Evolve: Verifier-free evolutionary test-time scaling with multi-model orchestration
Monishwaran Maheswaran*, Leon Lakhani*, Zhongzhu Zhou, Shijia Yang, Junxiong Wang, Coleman Hooper, Yuezhou Hu, Rishabh Tiwari, Jue Wang, Harman Singh, Qingyang Wu, Yuqing Jian, Ce Zhang, Kurt Keutzer, Tri Dao, Xiaoxia Wu, Ben Athiwaratkun, James Zou, Chenfeng Xu* [Website]

Residual Context Diffusion Language Models
Yuezhou Hu*, Harman Singh*, Monishwaran Maheswaran*, Haocheng Xi, Coleman Richard Charles Hooper, Jintao Zhang, Aditya Tomar, Michael W. Mahoney, Sewon Min, Mehrdad Farajtabar, Kurt Keutzer, Amir Gholami, Chenfeng Xu† [ICML 2026, Website]

When RL Meets Adaptive Speculative Training: A Unified Training-Serving System
Junxiong Wang*†, Fengxiang Bie*†, Jisen Li†, Zhongzhu Zhou†, Zelei Shao†, Yubo Wang†, Yinghui Liu†, Qingyang Wu, Avner May, Sri Yanamandra, Yineng Zhang, Ce Zhang, Tri Dao, Percy Liang, Ben Athiwaratkun, Shuaiwen Leon Song, Chenfeng Xu† (co-lead), Xiaoxia Wu† [ICML 2026, Website]
Aurora is the first speculative system with day-0 support! You can accelerate your LLM starting today! 😎

Beat the long tail: Distribution-Aware Speculative Decoding for RL Training
Zelei Shao, Vikranth Srivatsa, Sanjana Srivastava, Qingyang Wu, Alpay Ariyak, Xiaoxia Wu, Ameen Patel, Jue Wang, Percy Liang, Tri Dao, Ce Zhang, Yiying Zhang, Ben Athiwaratkun, Chenfeng Xu, Junxiong Wang [MLSys 2026]

Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives
Qinsi Wang*, Jinghan Ke*, Masayoshi Tomizuka, Kurt Keutzer, Chenfeng Xu [ICLR 2025]

Flash-KMeans: Fast and Memory-Efficient Exact K-Means
Shuo Yang*, Haocheng Xi*, Yilong Zhao, Muyang Li, Xiaoze Fan, Jintao Zhang, Han Cai, Yujun Lin, Xiuyu Li, Kurt Keutzer, Song Han, Chenfeng Xu† (corresponding author), Ion Stoica [Website]

ThunderAgent: A Fast, Simple, and Robust Program-Aware Agentic Inference System
Hao Kang*, Ziyang Li*, Xinyu Yang*, Weili Xu, Yinfang Chen, Junxiong Wang, Beidi Chen, Tushar Krishna, Chenfeng Xu, Simran Arora [ICML 2026 Spotlight, Website]

CDLM: Consistency Diffusion Language Models For Faster Sampling
Minseo Kim, Chenfeng Xu, Coleman Hooper, Harman Singh, Ben Athiwaratkun, Ce Zhang, Kurt Keutzer, Amir Gholami [MLSys 2026]

Angles Don’t Lie: Unlocking Training-Efficient RL Through the Model’s Own Signals
Qinsi Wang*, Jinghan Ke*, Hancheng Ye, Yueqian Lin, Yuzhe Fu, Jianyi Zhang, Kurt Keutzer, Chenfeng Xu†, Yiran Chen† [NeurIPS 2025 Spotlight]
