
Rescheduled: 【SDS Topical Seminar Series】Reinforcement Learning for Reasoning in Large Language Models with One Training Example

May 21, 2025 | Upcoming Events

Dear all,

You are cordially invited to the School of Data Science Topical Seminar on Reinforcement Learning for Reasoning in Large Language Models with One Training Example. Details are as follows:

SDS Topical Seminar Series

Topic: Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Speaker: Simon S. DU, Assistant Professor, Paul G. Allen School of Computer Science & Engineering, University of Washington

Host: Ruoyu SUN, Associate Professor, School of Data Science, CUHK-Shenzhen

Date: 21 May 2025 (Wednesday)

Time: 2:30 PM - 3:30 PM, Beijing Time

Format: Hybrid

Venue: Teaching Complex C 203; live on WeChat Channels

Zoom Link: https://cuhk-edu-cn.zoom.us/j/91843858726?pwd=LugYKuPUBp3eaaCo5Hav8ioygBKd0O.1

Meeting ID: 918 4385 8726, Password: 412121

Language: English

Abstract

We show that reinforcement learning with verifiable reward using one training example (1-shot RLVR) is effective in incentivizing the math reasoning capabilities of large language models (LLMs). Applying RLVR to the base model Qwen2.5-Math-1.5B, we identify a single example that elevates model performance on MATH500 from 36.0% to 73.6%, and improves the average performance across six common mathematical reasoning benchmarks from 17.6% to 35.7%. This result matches the performance obtained using the 1.2k DeepScaleR subset (MATH500: 73.6%, average: 35.9%), which includes the aforementioned example. Similar substantial improvements are observed across various models (Qwen2.5-Math-7B, Llama3.2-3B-Instruct, DeepSeek-R1-Distill-Qwen-1.5B), RL algorithms (GRPO and PPO), and different math examples (many of which yield approximately 30% or greater improvement on MATH500 when employed as a single training example). In addition, we identify some interesting phenomena during 1-shot RLVR, including cross-domain generalization, increased frequency of self-reflection, and sustained test performance improvement even after the training accuracy has saturated, a phenomenon we term post-saturation generalization. Moreover, we verify that the effectiveness of 1-shot RLVR primarily arises from the policy gradient loss, distinguishing it from the "grokking" phenomenon. We also show the critical role of promoting exploration (e.g., by adding entropy loss with an appropriate coefficient) in 1-shot RLVR training. As a bonus, we observe that applying entropy loss alone, without any outcome reward, significantly enhances Qwen2.5-Math-1.5B's performance on MATH500 by 27.4%. These findings can inspire future work on RLVR data efficiency and encourage a re-examination of both recent progress and the underlying mechanisms in RLVR.
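
To make the ingredients named in the abstract more concrete, here is a minimal Python sketch of a verifiable reward, a GRPO-style group-normalized advantage, and an entropy term. It is an illustration under stated assumptions, not the speaker's implementation: the `\boxed{...}` answer format, the function names, and the coefficient `beta` are all hypothetical choices made for this example.

```python
import math
import re
from statistics import mean, pstdev


def verifiable_reward(response: str, gold: str) -> float:
    # Assumption: the final answer appears in the last \boxed{...} span.
    # Reward is 1.0 on an exact string match with the reference, else 0.0.
    answers = re.findall(r"\\boxed\{([^{}]*)\}", response)
    return 1.0 if answers and answers[-1].strip() == gold.strip() else 0.0


def group_advantages(rewards, eps=1e-6):
    # GRPO-style advantage: normalize each reward by the mean and standard
    # deviation of the rewards sampled for the same prompt.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def entropy(probs):
    # Shannon entropy (in nats) of one next-token distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)


def surrogate_loss(logprobs, advantages, entropies, beta=0.01):
    # Policy-gradient term minus an entropy bonus weighted by beta
    # (beta is a hypothetical coefficient; lower loss is better).
    pg = -mean(a * lp for a, lp in zip(advantages, logprobs))
    return pg - beta * mean(entropies)


# One training example, several sampled responses for it.
samples = ["so x = \\boxed{12}", "\\boxed{13}", "no answer", "\\boxed{ 12 }"]
rewards = [verifiable_reward(s, "12") for s in samples]
advantages = group_advantages(rewards)
```

In this sketch, once every sampled response for the single example receives the same reward, all advantages are zero and only the entropy term still contributes to the loss. Whether that corresponds to the post-saturation behaviour described in the abstract is a question for the talk itself.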

Biography

Simon S. Du is an assistant professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. His research interests are broadly in machine learning, including deep learning, representation learning, and reinforcement learning. Prior to starting as faculty, he was a postdoc at the Institute for Advanced Study in Princeton. He completed his Ph.D. in Machine Learning at Carnegie Mellon University. Simon's research has been recognized by a Sloan Research Fellowship, an IEEE AI's 10 to Watch Award, a Schmidt Sciences AI2050 Early Career Fellowship, a Samsung AI Researcher of the Year Award, an Intel Rising Star Faculty Award, an NSF CAREER Award, and a Distinguished Dissertation Award honorable mention from CMU, among others.

         
