I am a (Now() - 04/2021).ceil().ordinal()-year PhD student studying AI and language models at the University of Washington, where I am fortunate to be advised by Prof. Yejin Choi and Prof. Hannaneh Hajishirzi. I am also a student researcher at the Allen Institute for AI (Ai2).

My current research focuses on inspecting massive text corpora, training data attribution, LM pretraining, and scaling laws. During my PhD, I have also worked on commonsense knowledge generation and verification, automated theorem proving, RLHF, and text decoding.

Previously, I received my B.S. in Computer Science from the University of Illinois Urbana-Champaign, where I worked with Prof. Julia Hockenmaier. I have also worked on Facebook’s Natural Language Generation (NLG) team.

My name in Chinese characters is 刘嘉程.

Email: liujc [at] cs [dot] washington [dot] edu

[CV] [Google Scholar] [GitHub] [Twitter] [LinkedIn]


News


Selected Publications

See my full list of publications here.

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens
Jiacheng Liu, Taylor Blanton, Yanai Elazar, Sewon Min, YenSung Chen, Arnavi Chheda-Kothary, Huy Tran, Byron Bischoff, Eric Marsh, Michael Schmitz, Cassidy Trier, Aaron Sarnat, Jenna James, Jon Borchardt, Bailey Kuehl, Evie Cheng, Karen Farley, Sruthi Sreeram, Taira Anderson, David Albright, Carissa Schoenick, Luca Soldaini, Dirk Groeneveld, Rock Yuren Pang, Pang Wei Koh, Noah A. Smith, Sophie Lebrecht, Yejin Choi, Hannaneh Hajishirzi, Ali Farhadi, Jesse Dodge
ACL 2025 System Demonstrations Track
[Arxiv] [Blog] [Web Interface] [Code] [Twitter] [Trailer Video] [Demo Video]

Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Jiacheng Liu, Sewon Min, Luke Zettlemoyer, Yejin Choi, Hannaneh Hajishirzi
COLM 2024 (Oral Spotlight, 2%)
[Arxiv] [Project Page] [Web Interface] [API Endpoint] [Python Package] [Code] [Documentation]

Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
Hamish Ivison, Yizhong Wang, Jiacheng Liu, Zeqiu Wu, Valentina Pyatkin, Nathan Lambert, Noah A Smith, Yejin Choi, Hannaneh Hajishirzi
NeurIPS 2024
[Arxiv] [Code] [Models]

Don’t throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding
Jiacheng Liu, Andrew Cohen, Ramakanth Pasunuru, Yejin Choi, Hannaneh Hajishirzi, Asli Celikyilmaz
COLM 2024
[Arxiv] [Code]

Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements
Jiacheng Liu, Wenya Wang, Dianzhuo Wang, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi
EMNLP 2023 (Main Conference, Oral)
[Arxiv] [Code] [Model] [Demo] [Dataset]

Generated Knowledge Prompting for Commonsense Reasoning
Jiacheng Liu, Alisa Liu, Ximing Lu, Sean Welleck, Peter West, Ronan Le Bras, Yejin Choi, Hannaneh Hajishirzi
ACL 2022 (Main Conference)
[Arxiv] [Code] [Talk] [Poster]