Rejecting Hallucinated State Targets during Planning

Generative models can be used in planning to propose targets corresponding to states that agents deem either likely or advantageous to experience. However, imperfections, which are common in learned models, lead to infeasible hallucinated targets that can cause delusional behaviors and thus raise safety concerns. This work first categorizes and investigates the properties of several kinds of infeasible targets. We then devise a strategy to reject infeasible targets with a generic target evaluator, which trains alongside the planning agent as an add-on, without changing the behavior or the architecture of the agent (or of the generative model) it is attached to. We highlight that, without proper design, the evaluator can itself produce delusional estimates, rendering the strategy futile. Thus, to learn correct evaluations of infeasible targets, we propose a combination of a learning rule, an architecture, and two assistive hindsight relabeling strategies. Our experiments confirm significant reductions in delusional behaviors and performance improvements for several kinds of existing planning agents.
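The rejection step described above can be sketched in Python as a filter that sits between the target generator and the agent. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`reject_infeasible`, `select_target`), the 0.5 threshold, the resampling loop, and the toy one-dimensional environment are all hypothetical, and the hand-coded `toy_evaluator` merely stands in for the learned evaluator that trains alongside the agent.

```python
from typing import Callable, Hashable, List, Optional, Sequence

# Hypothetical interfaces (assumptions, not from the paper): a proposer maps the
# current state to candidate targets; an evaluator maps (state, target) to an
# estimated probability that the target is feasible (reachable).
Proposer = Callable[[Hashable], Sequence[Hashable]]
Evaluator = Callable[[Hashable, Hashable], float]


def reject_infeasible(state: Hashable, candidates: Sequence[Hashable],
                      evaluator: Evaluator, threshold: float = 0.5) -> List[Hashable]:
    """Keep only candidates whose estimated feasibility reaches `threshold`."""
    return [t for t in candidates if evaluator(state, t) >= threshold]


def select_target(state: Hashable, propose: Proposer, evaluator: Evaluator,
                  threshold: float = 0.5, max_rounds: int = 5) -> Optional[Hashable]:
    """Query the generator until at least one candidate survives rejection.

    Returns the surviving target with the highest feasibility estimate, or None
    if every round was fully rejected (the agent would then need some fallback).
    """
    for _ in range(max_rounds):
        kept = reject_infeasible(state, propose(state), evaluator, threshold)
        if kept:
            return max(kept, key=lambda t: evaluator(state, t))
    return None


# Toy setting (hypothetical): states are cells 0..9 on a line. Cells outside
# that range do not exist, so proposing them counts as a hallucinated target.
def toy_propose(state: int) -> List[int]:
    return [state + 3, state - 20, 42]


def toy_evaluator(state: int, target: int) -> float:
    # Hand-coded stand-in for a learned evaluator: perfectly knows the bounds.
    return 1.0 if 0 <= target <= 9 else 0.0


print(select_target(4, toy_propose, toy_evaluator))  # 7
print(select_target(8, toy_propose, toy_evaluator))  # None (11, -12, 42 all rejected)
```

Because the filter only reads the generator's proposals and returns a subset of them, neither the agent nor the generator has to change, which mirrors the add-on property described in the abstract. How the learned evaluator avoids producing delusional estimates of its own (the learning rule, architecture, and hindsight relabeling) is outside the scope of this sketch.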

BibTeX

@misc{zhao2025rejectinghallucinatedstatetargets,
title={Rejecting Hallucinated State Targets during Planning},
author={Mingde Zhao and Tristan Sylvain and Romain Laroche and Doina Precup and Yoshua Bengio},
year={2025},
eprint={2410.07096},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2410.07096},
}
