2026

Tree Bandits with Multi-fidelity Actions

Peter Chen, Xi Chen

NeurIPS 2026, Under Review

Reward-free Alignment for Conflicting Objectives

Peter Chen, Xiaopeng Li, Xi Chen, Tianyi Lin

Proceedings of the International Conference on Machine Learning (ICML 2026) Oral

2025

Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

Peter Chen, Xiaopeng Li, Ziniu Li, Wotao Yin, Xi Chen, Tianyi Lin

Proceedings of the International Conference on Learning Representations (ICLR 2026)

ComPO: Preference Alignment via Comparison Oracles

Peter Chen, Xi Chen, Wotao Yin, Tianyi Lin

Advances in Neural Information Processing Systems 38 (NeurIPS 2025)

GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators

Jiacheng Guo$^*$, Ling Yang$^*$, Peter Chen$^*$, Qixin Xiao$^*$, Yinjie Wang, Xinzhe Juan, Jiahao Qiu, Ke Shen, Mengdi Wang

arXiv:2512.19682

3D Cell Oversegmentation Correction via Geo-Wasserstein Divergence

Peter Chen, Bryan Chang, Olivia Annette Creasey, Julie Beth Sneddon, Zev Gartner, Yining Liu

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026)

Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO

Peter Chen, Xiaopeng Li, Ziniu Li, Xi Chen, Tianyi Lin

Transactions on Machine Learning Research (TMLR 2026)

Displacement-Sparse Neural Optimal Transport

Peter Chen, Yue Xie, Qingpeng Zhang

arXiv:2502.01889

2024

SICNN: Sparsity-induced Input Convex Neural Network

Peter Chen, Yue Xie, Qingpeng Zhang

NeurIPS 2024 Optimization for Machine Learning
