I am a (Now() - 04/2021).ceil().ordinal() year PhD student studying natural language processing (NLP) at the University of Washington. I am fortunate to be advised by Prof. Yejin Choi and Prof. Hanna Hajishirzi. I am also a part-time researcher at the Allen Institute for AI.

My current research focuses on inspecting massive text corpora, training data attribution, LM pretraining, and scaling laws. During my PhD, I have also worked on commonsense knowledge generation and verification, automated theorem proving, RLHF, and text decoding.

Previously, I received a B.S. in Computer Science from the University of Illinois at Urbana-Champaign, where I worked with Prof. Julia Hockenmaier. I also worked on Facebook's Natural Language Generation (NLG) team.

My name in Chinese characters is 刘嘉程.

Email: liujc [at] cs.washington.edu

[CV] [Google Scholar] [GitHub] [Twitter] [LinkedIn]

Research and other blogs: this website and [Zhihu]

Private pilot and other personal life VLOGs: [Bilibili] [YouTube]

Personal: [Facebook]



News

Publications

Preprints

Establishing Task Scaling Laws via Compute-Efficient Model Ladders
Akshita Bhagia*, Jiacheng Liu*, Alexander Wettig, David Heineman, Oyvind Tafjord, Ananya Harsh Jha, Luca Soldaini, Noah A. Smith, Dirk Groeneveld, Pang Wei Koh, Jesse Dodge, Hannaneh Hajishirzi
[Arxiv]

AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
Ximing Lu, Melanie Sclar, Skyler Hallinan, Niloofar Mireshghallah, Jiacheng Liu, Seungju Han, Allyson Ettinger, Liwei Jiang, Khyathi Chandu, Nouha Dziri, Yejin Choi
[Arxiv] [Demo]

Peer-Reviewed Papers

Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
Hamish Ivison, Yizhong Wang, Jiacheng Liu, Zeqiu Wu, Valentina Pyatkin, Nathan Lambert, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi
NeurIPS 2024
[Arxiv] [Code] [Models]

Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Jiacheng Liu, Sewon Min, Luke Zettlemoyer, Yejin Choi, Hannaneh Hajishirzi
COLM 2024 (Oral Spotlight, 2%)
[Arxiv] [Project Page] [Demo]

Don’t throw away your value model! Making PPO even better via Value-Guided Monte-Carlo Tree Search decoding
Jiacheng Liu, Andrew Cohen, Ramakanth Pasunuru, Yejin Choi, Hannaneh Hajishirzi, Asli Celikyilmaz
COLM 2024
[Arxiv] [Code]

Are machines better at complex reasoning? Unveiling human-machine inference gaps in entailment verification
Soumya Sanyal, Tianyi Xiao, Jiacheng Liu, Wenya Wang, Xiang Ren
ACL 2024 (Findings)
[Arxiv] [Model]

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao
ICLR 2024 (Oral); NeurIPS 2023 MATH-AI Workshop
[Arxiv] [Project Page] [Code] [Dataset] [HF Dataset]

Crystal: Introspective Reasoners Reinforced with Self-Feedback
Jiacheng Liu, Ramakanth Pasunuru, Hannaneh Hajishirzi, Yejin Choi, Asli Celikyilmaz
EMNLP 2023 (Main Conference, Oral)
[Arxiv] [Code] [Models: large 3b 11b] [Demo]

Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements
Jiacheng Liu, Wenya Wang, Dianzhuo Wang, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi
EMNLP 2023 (Main Conference, Oral)
[Arxiv] [Code] [Model] [Demo] [Dataset]

Inverse Scaling: When Bigger Isn’t Better
Ian R McKenzie, …, Jiacheng Liu, …, Samuel R Bowman, Ethan Perez
TMLR (2023.10)
[Arxiv]

Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
Albert Qiaochu Jiang, Sean Welleck, Jin Peng Zhou, Timothee Lacroix, Jiacheng Liu, Wenda Li, Mateja Jamnik, Guillaume Lample, Yuhuai Wu
ICLR 2023 (Oral, 5%)
[Arxiv]

Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering
Jiacheng Liu, Skyler Hallinan, Ximing Lu, Pengfei He, Sean Welleck, Hannaneh Hajishirzi, Yejin Choi
EMNLP 2022 (Main Conference)
[Arxiv] [Code/Data] [Models: Policy Value] [Demo]

NaturalProver: Grounded Mathematical Proof Generation with Language Models
Sean Welleck, Jiacheng Liu, Ximing Lu, Hannaneh Hajishirzi, Yejin Choi
NeurIPS 2022
[Arxiv] [Code]

NaturalProver: Grounded Natural Language Proof Generation with Language Models
Sean Welleck, Jiacheng Liu, Ximing Lu, Hannaneh Hajishirzi, Yejin Choi
AITP 2022 (Contributed Talk)
[Talk]

Generated Knowledge Prompting for Commonsense Reasoning
Jiacheng Liu, Alisa Liu, Ximing Lu, Sean Welleck, Peter West, Ronan Le Bras, Yejin Choi, Hannaneh Hajishirzi
ACL 2022 (Main Conference)
[Arxiv] [Code] [Talk] [Poster]

Towards Grounded Natural Language Proof Generation
Sean Welleck, Jiacheng Liu, Jesse Michael Han, Yejin Choi
NeurIPS 2021 MATHAI4ED Workshop (Contributed Talk)
[Talk] [Poster]

NaturalProofs: Mathematical Theorem Proving in Natural Language
Sean Welleck, Jiacheng Liu, Ronan Le Bras, Hannaneh Hajishirzi, Yejin Choi, Kyunghyun Cho
NeurIPS 2021 Datasets and Benchmarks Track (Oral, 1%)
[Arxiv] [Data/Code/Models] [Project Page] [Talk] [Slides]

NaturalProofs: Mathematics meets Natural Language
Sean Welleck, Jiacheng Liu, Ronan Le Bras, Hannaneh Hajishirzi, Yejin Choi, Kyunghyun Cho
AITP 2021 (Contributed Talk)
[Talk] [Slides]

Phrase Grounding by Soft-Label Chain Conditional Random Field
Jiacheng Liu, Julia Hockenmaier
EMNLP-IJCNLP 2019 (Oral)
[Arxiv] [Code] [Slides]

CrossWeigh: Training Named Entity Tagger from Imperfect Annotations
Zihan Wang, Jingbo Shang, Liyuan Liu, Lihao Lu, Jiacheng Liu, Jiawei Han
EMNLP-IJCNLP 2019 (Oral)
[Arxiv] [Code] [Slides]



Posts
