Chih-kai Ting, Karl Munson, et al.
AAAI 2023
Building reliable applications that leverage large language models (LLMs) remains a significant challenge. While LLMs offer impressive capabilities across diverse tasks, their outputs are often inaccurate and come with no clear measure of confidence. This uncertainty compounds in flows that chain multiple calls to LLMs and other tools, making it difficult for developers and end users to trust the results. This paper introduces a probabilistic language for programming LLM-based flows. It enables developers to quantify and propagate uncertainty throughout the application's flow, and to experiment with different inference scaling techniques without adding a single line of code beyond the flow's logic. We present an experimental study that demonstrates this capability, and a case study that builds a theorem-proving agent for the Rocq theorem prover.
Amit Dhurandhar, Vijil Vijil, et al.
ICML 2026
Yuanzhe Liu, Ryan Deng, et al.
NeurIPS 2025
Tyler Stennett, Myeongsoo Kim, et al.
ICSE 2025