Lau Luk Peter, Chen
DAMO Academy, Alibaba Group U.S.

I work on reinforcement learning (RL), optimization, and LLM post-training. I am currently in the post-training division at DAMO Academy, working under Wotao Yin.

I did my undergrad at Columbia, advised by Prof. Andrew Blumberg and Prof. Tianyi Lin. I also had a great time at Princeton with Prof. Mengdi Wang's group.


Education
  • Columbia College, Columbia University
    B.A. in Mathematics, Computer Science
    May 2026
Experience
  • DAMO Academy, Alibaba Group U.S.
    Research Scientist Intern
    06/2026 -- 08/2026
  • Princeton University
    Research Intern; Hosted by Mengdi Wang
    02/2025 -- 12/2025
  • HKU Musketeers Foundation Institute of Data Science
    Research Intern; Hosted by Yue Xie, Qingpeng Zhang
    05/2024 -- 08/2024
Teaching & Service
  • TA for Analysis & Optimization (Sp 24/Fa 24/Sp 25/Fa 25/Sp 26)
  • Reviewer for NeurIPS, ICLR, ICML, AAAI, TMLR
Selected Publications
Reward-free Alignment for Conflicting Objectives

Peter Chen, Xiaopeng Li, Xi Chen, Tianyi Lin

Proceedings of the International Conference on Machine Learning (ICML 2026) Spotlight

Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

Peter Chen, Xiaopeng Li, Ziniu Li, Wotao Yin, Xi Chen, Tianyi Lin

Proceedings of the International Conference on Learning Representations (ICLR 2026)

ComPO: Preference Alignment via Comparison Oracles

Peter Chen, Xi Chen, Wotao Yin, Tianyi Lin

Advances in Neural Information Processing Systems 38 (NeurIPS 2025)

3D Cell Oversegmentation Correction via Geo-Wasserstein Divergence

Peter Chen, Bryan Chang, Olivia Annette Creasey, Julie Beth Sneddon, Zev Gartner, Yining Liu

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026)

Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO

Peter Chen, Xiaopeng Li, Ziniu Li, Xi Chen, Tianyi Lin

Transactions on Machine Learning Research (TMLR 2026)
