  • Without knowing anything about this model, what it was trained on, or how it was trained, it’s impossible to say exactly why it displays this behavior. But there is no “hidden layer” in llama.cpp that allows for “hardcoded”/“built-in” content; llama.cpp only runs the model’s weights, so whatever you’re seeing comes from the model itself.

    It is absolutely possible for the model to “override pretty much anything in the system context”. Consider any regular “censored” model, and how any attempt at adding system instructions to change/disable this behavior is mostly ignored. This model is probably doing much the same thing except with a “built-in story” rather than a message that says “As an AI assistant, I am not able to …”.

    As I said, without knowing more about what model this is or what the training data looked like, it’s impossible to say exactly why or how it learned this behavior, or even whether it’s intentional. It could just be a side effect of the model being trained on a small selection of specific stories, or of those stories being over-represented in the training data.
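
    To make the over-representation point concrete, here is a toy sketch. It is not how llama.cpp or any real LLM works: it is just a word-pair counter with greedy decoding, and the corpus, `train_bigram`, and `generate` are all made up for illustration. The only thing it shows is that when one story dominates the training data, that story is what comes out.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count which word follows each word across all training sentences."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, start, max_words=8):
    """Greedy decoding: always append the most frequent next word."""
    out = [start]
    # Stop at the length limit or when the last word has no known successor.
    while len(out) < max_words and counts[out[-1]]:
        out.append(counts[out[-1]].most_common(1)[0][0])
    return " ".join(out)


# Hypothetical training data: one story appears 50 times, the others once.
corpus = ["once a dragon guarded some gold"] * 50 + [
    "once a cat slept",
    "once a dog barked",
]

model = train_bigram(corpus)
story = generate(model, "once")
print(story)
```

    The cat and dog sentences are in the training data too, but greedy decoding never reaches them because “dragon” follows “a” far more often. A real model is vastly more complex, but the same kind of skew in the data could plausibly produce a “built-in story” like the one you’re describing.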