Hi there! My name is Chau (she/her). I am a Computer Science Ph.D. student at the University of Maryland, College Park, where I am advised by Professor Mohit Iyyer in the CLIP Lab. I earned my Master’s degree at UMass Amherst and my Bachelor’s degree at Colgate University, where I was advised by Professor Joel Sommers.
My research examines how large language models and agentic systems generate and reason over long-form narratives. I use narratives both as a stress test for the long-horizon reasoning that capable AI systems will need and as a foundation for machines that assist human writers and augment how they write. Specifically, I am interested in two directions:
⚙︎ May 2026: Interning with the Gemini Post-training team at Google Cloud (primary mentors: Shentao Yang and Wen Ding).
✦ Apr 2026: Two papers accepted to ACL 2026: AnalystBench (Adobe internship work on benchmarking LLMs and coding agents for analytical report generation) to Findings, and Frankentext (on long-form narratives that evade LLM text detection) to the main conference. Looking forward to visiting San Diego this summer!
✎ Mar 2026: Released AutoFiction, a web platform that lets the community read and review full-length novels generated by frontier AI agents. Please check it out!
✦ Mar 2026: Our work on LLMs' unreliability at detecting name PII got accepted to FAccT 2026!
♪ Feb 2026: Gave a talk at Simon Fraser University (CMPT 419: Advanced Method in Natural Language Processing) on Frankentexts and ongoing work on long-form creative text generation.
♪ Nov 2025: Gave a guest lecture at UT Austin (LIN 353D / CS 378: Computational Discourse and Natural Language Generation) on generating narratives that evade detection with Frankentexts.
✦ Oct 2025: Our work on personalized story generation got accepted to IJCNLP-AACL 2025!
✦ Aug 2025: OWL got accepted to EMNLP 2025!
✦ Jun 2025: CLIPPER and BearCubs got accepted to COLM 2025! Excited to be in Montreal this fall.
⚙︎ Jun 2025: Interning with the Document Intelligence lab at Adobe (primary mentor: Varun Manjunatha).
✎ May 2025: Released a preprint on Frankentexts, a new type of LLM narratives generated under extreme constraints, with implications for detecting mixed-authorship AI text and simulating co-writing scenarios.
✔︎ Mar 2025: Got my MS from UMass Amherst!
✎ Feb 2025: Released a preprint on CLIPPER, a synthetic data generation pipeline for long-context narrative reasoning tasks.
➜ Jan 2025: Transferred to UMD College Park with my advisor!
✦ Nov 2024: Presenting Suri at EMNLP 2024 (+ WNU)! See you in Miami.
✎ Nov 2024: Released a Python package for TopicGPT! See the TopicGPT page for more details.
♪ Oct 2024: Gave a guest lecture at Mount Holyoke College (COMSC 341NL - Topics: 'Natural Language Processing') on generating and reasoning over long-form texts.
* denotes equal contribution
AutoFiction: Measuring AI Ability to Execute Long-Horizon Writing Tasks
New!
Chau Minh Pham*, Yapei Chang*, Mohit Iyyer
In progress
[Website]
[BibTeX]
Narrative Generation
@software{pham_chang_iyyer_2026,
author = {Pham, Chau Minh and Chang, Yapei and Iyyer, Mohit},
title = {AutoFiction},
year = {2026},
version = {0.1.0},
url = {https://www.autofiction.ai/},
note = {Web platform}
}
Frankentext: Stitching random text fragments into long-form narratives
Chau Minh Pham, Jenna Russell, Dzung Pham, Mohit Iyyer
ACL 2026
[Paper]
[Code]
[BibTeX]
Narrative Generation
@misc{pham2025frankentextstitchingrandomtext,
title={Frankentext: Stitching random text fragments into long-form narratives},
author={Chau Minh Pham and Jenna Russell and Dzung Pham and Mohit Iyyer},
year={2025},
eprint={2505.18128},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.18128},
}
CLIPPER: Compression enables long-context synthetic data generation
Chau Minh Pham, Yapei Chang, Mohit Iyyer
COLM 2025
[Paper]
[Code]
[HuggingFace]
[Poster]
[BibTeX]
Narrative Reasoning
@misc{pham2025clippercompressionenableslongcontext,
title={CLIPPER: Compression enables long-context synthetic data generation},
author={Chau Minh Pham and Yapei Chang and Mohit Iyyer},
year={2025},
eprint={2502.14854},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.14854},
}
Suri: Multi-constraint Instruction Following for Long-form Text Generation
Chau Minh Pham, Simeng Sun, Mohit Iyyer
EMNLP 2024 (findings)
[Paper]
[Project Page]
[Code]
[Poster]
[BibTeX]
Narrative Generation
@misc{pham2024surimulticonstraintinstructionfollowing,
title={Suri: Multi-constraint Instruction Following for Long-form Text Generation},
author={Chau Minh Pham and Simeng Sun and Mohit Iyyer},
year={2024},
eprint={2406.19371},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2406.19371},
}
TopicGPT: A Prompt-based Topic Modeling Framework
Chau Minh Pham, Alexander Hoyle, Simeng Sun, Philip Resnik, Mohit Iyyer
NAACL 2024
[Paper]
[Project Page]
[Code]
[Poster]
[BibTeX]
Topic Modeling
@inproceedings{pham-etal-2024-topicgpt,
title = "{T}opic{GPT}: A Prompt-based Topic Modeling Framework",
author = "Pham, Chau and
Hoyle, Alexander and
Sun, Simeng and
Resnik, Philip and
Iyyer, Mohit",
editor = "Duh, Kevin and
Gomez, Helena and
Bethard, Steven",
booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
month = jun,
year = "2024",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.naacl-long.164",
pages = "2956--2984",
}
* denotes equal contribution
StoryScope: Investigating idiosyncrasies in AI fiction
Jenna Russell, Rishanth Rajendhran, Chau Minh Pham, Mohit Iyyer, John Wieting
arXiv 2026
[Paper]
[BibTeX]
Narrative Generation
@misc{russell2026storyscopeinvestigatingidiosyncrasies,
title={StoryScope: Investigating idiosyncrasies in AI fiction},
author={Jenna Russell and Rishanth Rajendhran and Chau Minh Pham and Mohit Iyyer and John Wieting},
year={2026},
eprint={2604.03136},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2604.03136},
}
AutoFiction: Measuring AI Ability to Execute Long-Horizon Writing Tasks
Chau Minh Pham*, Yapei Chang*, Mohit Iyyer
In progress
[Website]
[BibTeX]
Narrative Generation
@software{pham_chang_iyyer_2026,
author = {Pham, Chau Minh and Chang, Yapei and Iyyer, Mohit},
title = {AutoFiction},
year = {2026},
version = {0.1.0},
url = {https://www.autofiction.ai/},
note = {Web platform}
}
Frankentext: Stitching random text fragments into long-form narratives
Chau Minh Pham, Jenna Russell, Dzung Pham, Mohit Iyyer
ACL 2026
[Paper]
[Code]
[BibTeX]
Narrative Generation
@misc{pham2025frankentextstitchingrandomtext,
title={Frankentext: Stitching random text fragments into long-form narratives},
author={Chau Minh Pham and Jenna Russell and Dzung Pham and Mohit Iyyer},
year={2025},
eprint={2505.18128},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.18128},
}
Can Large Language Models Really Recognize Your Name?
Dzung Pham, Peter Kairouz, Niloofar Mireshghallah, Eugene Bagdasarian, Chau Minh Pham, Amir Houmansadr
FAccT 2026
[Paper]
[BibTeX]
Privacy & Security
@misc{pham2025largelanguagemodelsreally,
title={Can Large Language Models Really Recognize Your Name?},
author={Dzung Pham and Peter Kairouz and Niloofar Mireshghallah and Eugene Bagdasarian and Chau Minh Pham and Amir Houmansadr},
year={2025},
eprint={2505.14549},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2505.14549},
}
Whose story is it? Personalizing story generation by inferring author styles
Nischal Ashok Kumar, Chau Minh Pham, Mohit Iyyer, Andrew Lan
IJCNLP-AACL 2025
[Paper]
[BibTeX]
Narrative Generation
@misc{kumar2025storyitpersonalizingstory,
title={Whose story is it? Personalizing story generation by inferring author styles},
author={Nischal Ashok Kumar and Chau Minh Pham and Mohit Iyyer and Andrew Lan},
year={2025},
eprint={2502.13028},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.13028},
}
OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature
Alisha Srivastava*, Emir Korukluoglu*, Minh Nhat Le*, Duyen Tran, Chau Minh Pham, Marzena Karpinska, Mohit Iyyer
EMNLP 2025
[Paper]
[BibTeX]
Memorization
@misc{srivastava2025owlprobingcrosslingualrecall,
title={OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature},
author={Alisha Srivastava and Emir Korukluoglu and Minh Nhat Le and Duyen Tran and Chau Minh Pham and Marzena Karpinska and Mohit Iyyer},
year={2025},
eprint={2505.22945},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.22945},
}
CLIPPER: Compression enables long-context synthetic data generation
Chau Minh Pham, Yapei Chang, Mohit Iyyer
COLM 2025
[Paper]
[Code]
[HuggingFace]
[BibTeX]
Narrative Reasoning
@misc{pham2025clippercompressionenableslongcontext,
title={CLIPPER: Compression enables long-context synthetic data generation},
author={Chau Minh Pham and Yapei Chang and Mohit Iyyer},
year={2025},
eprint={2502.14854},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.14854},
}
BEARCUBS: A benchmark for computer-using web agents
Yixiao Song, Katherine Thai, Chau Minh Pham, Yapei Chang, Mazin Nadaf, Mohit Iyyer
COLM 2025
[Paper]
[Website]
[BibTeX]
Agentic AI
@misc{song2025bearcubsbenchmarkcomputerusingweb,
title={BEARCUBS: A benchmark for computer-using web agents},
author={Yixiao Song and Katherine Thai and Chau Minh Pham and Yapei Chang and Mazin Nadaf and Mohit Iyyer},
year={2025},
eprint={2503.07919},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2503.07919},
}
ProxyGPT: Enabling Anonymous Queries in AI Chatbots with (Un)Trustworthy Browser Proxies
Dzung Pham, Jade Sheffey, Chau Minh Pham, Amir Houmansadr
MADWeb 2025
[Paper]
[BibTeX]
Privacy & Security
@misc{pham2024proxygptenablinganonymousqueries,
title={ProxyGPT: Enabling Anonymous Queries in AI Chatbots with (Un)Trustworthy Browser Proxies},
author={Dzung Pham and Jade Sheffey and Chau Minh Pham and Amir Houmansadr},
year={2024},
eprint={2407.08792},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2407.08792},
}
Suri: Multi-constraint Instruction Following for Long-form Text Generation
Chau Minh Pham, Simeng Sun, Mohit Iyyer
EMNLP 2024 (findings); 6th Workshop on Narrative Understanding
[Paper]
[Project Page]
[Code]
[Poster]
[BibTeX]
Narrative Generation
@misc{pham2024surimulticonstraintinstructionfollowing,
title={Suri: Multi-constraint Instruction Following for Long-form Text Generation},
author={Chau Minh Pham and Simeng Sun and Mohit Iyyer},
year={2024},
eprint={2406.19371},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2406.19371},
}
TopicGPT: A Prompt-based Topic Modeling Framework
Chau Minh Pham, Alexander Hoyle, Simeng Sun, Philip Resnik, Mohit Iyyer
NAACL 2024
[Paper]
[Project Page]
[Code]
[Poster]
[BibTeX]
Topic Modeling
@inproceedings{pham-etal-2024-topicgpt,
title = "{T}opic{GPT}: A Prompt-based Topic Modeling Framework",
author = "Pham, Chau and
Hoyle, Alexander and
Sun, Simeng and
Resnik, Philip and
Iyyer, Mohit",
editor = "Duh, Kevin and
Gomez, Helena and
Bethard, Steven",
booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
month = jun,
year = "2024",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.naacl-long.164",
pages = "2956--2984",
}
Emotion analysis and detection during COVID-19
Tiberiu Sosea, Chau Pham, Alexander Tekle, Cornelia Caragea, Junyi Jessy Li
LREC 2022
[Paper]
[Code]
[BibTeX]
Computational Social Science
@inproceedings{sosea-etal-2022-emotion,
title = "Emotion analysis and detection during {COVID}-19",
author = "Sosea, Tiberiu and
Pham, Chau and
Tekle, Alexander and
Caragea, Cornelia and
Li, Junyi Jessy",
editor = "Calzolari, Nicoletta and
B{\'e}chet, Fr{\'e}d{\'e}ric and
Blache, Philippe and
Choukri, Khalid and
Cieri, Christopher and
Declerck, Thierry and
Goggi, Sara and
Isahara, Hitoshi and
Maegaard, Bente and
Mariani, Joseph and
Mazo, H{\'e}l{\`e}ne and
Odijk, Jan and
Piperidis, Stelios",
booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
month = jun,
year = "2022",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://aclanthology.org/2022.lrec-1.750",
pages = "6938--6947",
}
Reassessing the constancy of end-to-end internet latency
Lily Davisson*, Joakim Jakovleski*, Nhiem Ngo*, Chau Pham*, Joel Sommers
TMA 2021
[Paper]
[BibTeX]
Networking Systems
@inproceedings{davisson2021reassessing,
title={Reassessing the constancy of end-to-end internet latency},
author={Davisson, Lily and Jakovleski, Joakim and Ngo, Nhiem and Pham, Chau and Sommers, Joel},
booktitle={IFIP Network Traffic Measurement and Analysis Conference},
year={2021}
}
The prompt report: A systematic survey of prompting techniques
Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, HyoJung Han, Sevien Schulhoff, Pranav Sandeep Dulepet, Saurav Vidyadhara, Dayeon Ki, Sweta Agrawal, Chau Pham, Gerson Kroiz, Feileen Li, Hudson Tao, Ashay Srivastava, Hevander Da Costa, Saloni Gupta, Megan L. Rogers, Inna Goncearenco, Giuseppe Sarli, Igor Galynker, Denis Peskoff, Marine Carpuat, Jules White, Shyamal Anadkat, Alexander Hoyle, Philip Resnik
arXiv 2024
[Paper]
[BibTeX]
Survey / Benchmark
@misc{schulhoff2024promptreportsystematicsurvey,
title={The Prompt Report: A Systematic Survey of Prompting Techniques},
author={Sander Schulhoff and Michael Ilie and Nishant Balepur and Konstantine Kahadze and Amanda Liu and Chenglei Si and Yinheng Li and Aayush Gupta and HyoJung Han and Sevien Schulhoff and Pranav Sandeep Dulepet and Saurav Vidyadhara and Dayeon Ki and Sweta Agrawal and Chau Pham and Gerson Kroiz and Feileen Li and Hudson Tao and Ashay Srivastava and Hevander Da Costa and Saloni Gupta and Megan L. Rogers and Inna Goncearenco and Giuseppe Sarli and Igor Galynker and Denis Peskoff and Marine Carpuat and Jules White and Shyamal Anadkat and Alexander Hoyle and Philip Resnik},
year={2024},
eprint={2406.06608},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2406.06608},
}
Interactive Topic Models with Optimal Transport
Garima Dhanania*, Sheshera Mysore*, Chau Minh Pham, Mohit Iyyer, Hamed Zamani, Andrew McCallum
arXiv 2024
[Paper]
[BibTeX]
Topic Modeling
@misc{dhanania2024interactivetopicmodelsoptimal,
title={Interactive Topic Models with Optimal Transport},
author={Garima Dhanania and Sheshera Mysore and Chau Minh Pham and Mohit Iyyer and Hamed Zamani and Andrew McCallum},
year={2024},
eprint={2406.19928},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2406.19928},
}
AHA!: Facilitating AI Impact Assessment by Generating Examples of Harms
Zana Buçinca, Chau Minh Pham, Maurice Jakesch, Marco Tulio Ribeiro, Alexandra Olteanu, Saleema Amershi
arXiv 2023
[Preprint]
[BibTeX]
AI Safety
@misc{bucinca2023ahafacilitatingaiimpact,
      title={AHA!: Facilitating AI Impact Assessment by Generating Examples of Harms},
      author={Zana Bu{\c{c}}inca and Chau Minh Pham and Maurice Jakesch and Marco Tulio Ribeiro and Alexandra Olteanu and Saleema Amershi},
year={2023},
eprint={2306.03280},
archivePrefix={arXiv},
primaryClass={cs.HC},
url={https://arxiv.org/abs/2306.03280},
}
UMD College Park
TA - CMSC 848O - S25: Seminar on Long-context Language Models
UMass Amherst
Mentees: Aashnna Soni, Supraj Bachawala, Harshith Reddy Takkala, Dhruv Maheshwari
Project: Improving Story Ending Generation
Co-mentors: Yichao Zhou, Yekyung Kim
Mentees: Emir Korukluoglu, Minh Le, Alisha Srivastava, Duyen Tran
Project: Cross-lingual Narrative Memorization; led to OWL at EMNLP 2025
Co-mentors: Marzena Karpinska, Mohit Iyyer
TA - COMPSCI 685 - S24: Advanced Natural Language Processing
TA - COMPSCI 110 - Su23: Foundations of Programming
Colgate University
TA - CS 480A - F21: Natural Language Processing
TA - CS 101 - F19, S20, S21, S22: Introduction to Algorithms and Data Structures
In my free time, I enjoy powerlifting, running, reading, and cooking experimentally.
I am originally from Hanoi, Vietnam 🇻🇳. If you are planning to visit Hanoi, I recommend checking out Nguyen Phan Que Mai's "Read Your Way Through Hanoi" and cháo đậu cà (green pea porridge with marinated fried tofu and fermented Thai eggplants).