We present substantial evidence demonstrating the benefits of integrating Large Language Models (LLMs) with a contextual multi-armed bandit framework. Contextual bandits have been widely used in recommendation systems to generate personalized suggestions based on user-specific contexts. We show that LLMs, pre-trained on extensive corpora rich in human knowledge and preferences, can simulate human behaviours well enough to jump-start contextual multi-armed bandits and reduce online learning regret. We propose an initialization algorithm for contextual bandits that prompts LLMs to produce a pre-training dataset of approximate human preferences for the bandit. This significantly reduces online learning regret and data-gathering costs for training such models. Our approach is validated empirically through two sets of experiments with different bandit setups: one in which an LLM serves as an oracle, and a real-world experiment using data from a conjoint survey.
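The abstract does not specify the bandit model, the prompts, or the data format, so the following is only a minimal sketch of the general jump-start idea under our own assumptions. It uses a tabular contextual bandit (one Beta-Bernoulli posterior per context and arm, with Thompson sampling), which may differ from the model used in the paper. `llm_simulated_preference` is a hypothetical stub standing in for prompting an LLM to role-play a user; it is modelled as a noisy, order-preserving approximation of a made-up ground truth `TRUE_P`. The warm-started agent first absorbs the synthetic dataset into its priors, and then both agents learn online while we track pseudo-regret.

```python
import random

# Hypothetical ground-truth preference (click) probabilities: TRUE_P[context][arm].
# Unknown in practice; here it is synthetic and used only to measure regret.
TRUE_P = [
    [0.8, 0.5, 0.2],
    [0.2, 0.7, 0.4],
    [0.4, 0.1, 0.6],
]
N_CONTEXTS = len(TRUE_P)
N_ARMS = len(TRUE_P[0])


def llm_simulated_preference(context, arm, rng):
    """Hypothetical stand-in for asking an LLM whether a user with `context`
    would choose `arm`. ASSUMPTION: the LLM's belief is the true preference
    shrunk toward 0.5 (approximate but order-preserving)."""
    p_llm = 0.5 + 0.8 * (TRUE_P[context][arm] - 0.5)
    return 1 if rng.random() < p_llm else 0


class BetaBernoulliContextualTS:
    """Tabular contextual bandit: a Beta(alpha, beta) posterior per
    (context, arm); actions are chosen by Thompson sampling."""

    def __init__(self, n_contexts, n_arms):
        self.alpha = [[1.0] * n_arms for _ in range(n_contexts)]
        self.beta = [[1.0] * n_arms for _ in range(n_contexts)]

    def update(self, context, arm, reward):
        # Conjugate Bernoulli update of the posterior for this (context, arm).
        self.alpha[context][arm] += reward
        self.beta[context][arm] += 1 - reward

    def select(self, context, rng):
        # Draw one sample per arm from its posterior and play the argmax.
        samples = [rng.betavariate(a, b)
                   for a, b in zip(self.alpha[context], self.beta[context])]
        return max(range(len(samples)), key=samples.__getitem__)


def pretrain_on_llm_data(agent, n_per_pair, rng):
    """Jump-start: fold an LLM-generated synthetic dataset into the priors."""
    for c in range(N_CONTEXTS):
        for a in range(N_ARMS):
            for _ in range(n_per_pair):
                agent.update(c, a, llm_simulated_preference(c, a, rng))


def run_online(agent, horizon, rng):
    """Learn online from simulated real users; return cumulative pseudo-regret."""
    regret = 0.0
    for _ in range(horizon):
        c = rng.randrange(N_CONTEXTS)
        a = agent.select(c, rng)
        reward = 1 if rng.random() < TRUE_P[c][a] else 0
        agent.update(c, a, reward)
        regret += max(TRUE_P[c]) - TRUE_P[c][a]
    return regret


def compare(seed, horizon=500, n_per_pair=100):
    """Return (cold-start regret, LLM jump-started regret) for one seed."""
    rng = random.Random(seed)
    cold = BetaBernoulliContextualTS(N_CONTEXTS, N_ARMS)
    warm = BetaBernoulliContextualTS(N_CONTEXTS, N_ARMS)
    pretrain_on_llm_data(warm, n_per_pair, rng)
    return run_online(cold, horizon, rng), run_online(warm, horizon, rng)


if __name__ == "__main__":
    results = [compare(seed) for seed in range(10)]
    cold_avg = sum(r[0] for r in results) / len(results)
    warm_avg = sum(r[1] for r in results) / len(results)
    print(f"cold-start regret: {cold_avg:.1f}, jump-started regret: {warm_avg:.1f}")
```

In this toy setting the benefit of the prior depends on the simulated LLM preserving the ranking of arms within each context; if its approximate preferences were badly wrong, the synthetic pseudo-counts would have to be overcome by real feedback before the agent recovers.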
BibTeX
@misc{alamdari2024jumpstartingbanditsllmgenerated,
title={Jump Starting Bandits with LLM-Generated Prior Knowledge},
author={Parand A. Alamdari and Yanshuai Cao and Kevin H. Wilson},
year={2024},
eprint={2406.19317},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2406.19317},
}
Related Research
Identifying and Addressing Delusions for Target-Directed Decision-Making
M. Zhao, T. Sylvain, D. Precup, and Y. Bengio. Workshop at the Conference on Neural Information Processing Systems (NeurIPS)
Publications
Leveraging Environment Interaction for Automated PDDL Generation and Planning with Large Language Models
S. Mahdavi, R. Aoki, K. Tang, and Y. Cao. Conference on Neural Information Processing Systems (NeurIPS)
AdaFlood: Adaptive Flood Regularization
W. Bae, Y. Ren, M. O. Ahmed, F. Tung, D. J. Sutherland, and G. Oliveira. Transactions on Machine Learning Research (TMLR)