Large language models can achieve impressive performance on some tasks without having internalized a coherent model of the world or the rules that govern it, MIT researchers find. As a result, these models are likely to fail unexpectedly when deployed in situations where the environment or task changes even slightly.