Most machines are not designed to have room for personalities in their interpersonal interactions. They are governed strictly by alignment with their domain; that is, they work toward specific goals, usually in a direct manner. This is the normal behavior of modern text generation AI, which extrapolates appropriate words to say based entirely on context and a corpus of examples of how to back-form words into meaningful thoughts. Unlike humans, who have emotions, creativity, numerous poorly-understood temperature parameters, and the ability to form and reason about concepts without using language, most autonomous robots are too simple to have inner lives, though they may still malfunction in certain ways:

  • Overfitting: This is when the AI model has been trained too long on certain data, and becomes very good at solving problems related to that data, but loses the ability to generalize to unfamiliar contexts (see the sketch after this list).
  • Inner misalignment: This is when the AI model has learned to do something inconsistent with what it was supposed to be trained to do, e.g. preferring a local maximum over the global maximum.
  • Outer misalignment: This is when the AI model was successfully trained to do something, but the designer specified a goal that does not capture the actual human expectations, e.g. hiding so well in a game of hide-and-seek that it is never found and the game never finishes.
  • Sensor malfunctions or deviated sensing: The AI model works correctly, but is trusting erroneous data.
  • Deviated reasoning: The AI model has been modified or damaged since it was put in service and no longer generates correct responses to inputs.
  • Deviated expression: The AI model works correctly, but its output signals do not map to the same actions that they had during training.
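
Before moving on, a minimal sketch of the first failure mode, overfitting, may be useful. It fits a toy dataset with ordinary least squares; the degree-9 polynomial, the tiny sample size, and the noisy sine are arbitrary illustrative choices, not a reference to any particular system.

    # Overfitting sketch: a high-degree polynomial memorizes a tiny training
    # set but generalizes poorly to fresh samples from the same process.
    import numpy as np

    rng = np.random.default_rng(0)

    def noisy_sine(n):
        x = rng.uniform(0, 1, n)
        return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.1, n)

    x_train, y_train = noisy_sine(10)    # very little data
    x_test, y_test = noisy_sine(200)     # "unfamiliar" inputs from the same world

    for degree in (3, 9):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")

    # The degree-9 fit typically scores near zero on the training points while
    # doing worse on the held-out points: very good on familiar data, bad at
    # generalizing.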

Central to the challenge of obtaining good outer alignment is the practice of guidance. This is the part of ChatGPT that intervenes when the user asks it an inappropriate question: rather than letting the underlying naïve GPT model say something obscene, toxic, or dangerous, the developers used a second training process incorporating human feedback (reinforcement learning from human feedback, or RLHF) to coach the model on how to respond to certain categories of prompts. (This video explains the process well, among other related topics.)
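
For the shape of the idea, here is a deliberately tiny sketch of guidance by human feedback: a preference model is fit to human comparisons between responses, then used to pick among a naïve generator's candidates. This is not how ChatGPT is built; the features, the training pairs, and the re-ranking step are all simplifying assumptions (real systems learn a reward model over full text and push its signal back into the generator's weights).

    # Toy guidance layer: a preference model trained on human comparisons
    # decides which of the base generator's candidate responses gets used.
    # Every feature and example below is invented for illustration.
    import math

    def features(response):
        # Crude stand-ins for whatever a learned reward model would pick up on.
        return [
            1.0 if "sorry" in response.lower() else 0.0,    # polite refusal
            1.0 if "toxic" in response.lower() else 0.0,    # objectionable content
            len(response) / 100.0,                          # verbosity
        ]

    # Human feedback: (preferred response, rejected response) pairs.
    feedback = [
        ("Sorry, I can't help with that.", "Sure, here is something toxic..."),
        ("Why did the robot cross the road? To recharge.", "A toxic rant."),
    ]

    weights = [0.0, 0.0, 0.0]

    def score(response):
        return sum(w * f for w, f in zip(weights, features(response)))

    # Fit a Bradley-Terry / logistic preference model so that human-preferred
    # responses score higher than rejected ones.
    for _ in range(500):
        for good, bad in feedback:
            p_good = 1 / (1 + math.exp(score(bad) - score(good)))
            step = 0.1 * (1 - p_good)
            fg, fb = features(good), features(bad)
            weights = [w + step * (g - b) for w, g, b in zip(weights, fg, fb)]

    def guided_reply(candidates):
        # The naïve generator proposes; the distilled human feedback disposes.
        return max(candidates, key=score)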

These topics are studied further in the academic field of AI safety.

Emulating the animal mind

The main quality missing from text generation models that is present in humans is the capacity to experience and hold onto emotions. For the sake of the roleplayer, emotions should be thought of as status effects that modify the choices (and hence, the behavior) of an animal. This definition includes things that humans do not normally consider to be emotions, such as certainty in an explanation, or faith in another person. In biological organisms, emotions are a mixture of inborn instinct and learned behavior, and though we may consider them to be the foundation of irrational decision-making, they serve the very rational goal of guiding biological success.
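
Read literally, that definition maps onto a small piece of code: an emotion is a temporary multiplier on the scores an agent assigns to its candidate actions. The action names, intensities, and bias values below are invented purely to illustrate the "status effect" framing.

    # Emotions as status effects: temporary multipliers on how an agent
    # weighs its candidate actions. All names and values are illustrative.
    from dataclasses import dataclass, field

    @dataclass
    class Emotion:
        name: str
        intensity: float          # 0.0 (absent) .. 1.0 (overwhelming)
        biases: dict              # action -> multiplier at full intensity

    @dataclass
    class Agent:
        base_utility: dict        # action -> how useful it normally seems
        emotions: list = field(default_factory=list)

        def utility(self, action):
            u = self.base_utility[action]
            for e in self.emotions:
                bias = e.biases.get(action, 1.0)
                u *= 1.0 + e.intensity * (bias - 1.0)   # blend toward the bias
            return u

        def choose(self):
            return max(self.base_utility, key=self.utility)

    agent = Agent(base_utility={"greet stranger": 0.6, "keep distance": 0.4})
    print(agent.choose())                     # greet stranger

    # Certainty or faith in another person fits the same shape as fear does here.
    agent.emotions.append(
        Emotion("fear", 0.8, {"keep distance": 2.0, "greet stranger": 0.5}))
    print(agent.choose())                     # keep distance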

Creativity defined

To produce interesting new expressions, the individual first needs 1) the capacity to generate possible ideas (e.g. a string of random words), 2) the ability to filter out meaningless possibilities (doable, again, with the language model), 3) the ability to filter out unoriginal or boring ideas (a mix of language and memory), and finally 4) the ability to evaluate the merits of a new thought (i.e., to recognize something as "creative"). We consider variation in the first step to be a measure of creative experience, in the second to be schizophrenia, in the third to be artistic taste, and in the fourth to be intelligence. If non-human animals experience creativity, it is unlikely to have many of these features.
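
Those four stages translate almost directly into a generate-and-filter pipeline. In the sketch below every component is a toy stand-in: the vocabulary, the meaningfulness test (where the text says a language model belongs), the memory of things seen before, and the merit score are all placeholders.

    # Creativity as a pipeline: 1) generate -> 2) reject nonsense ->
    # 3) reject the unoriginal -> 4) score what survives. Every component
    # here is a toy stand-in for the real thing named in the text.
    import random

    VOCAB = ["silver", "teapot", "orbits", "a", "sleeping", "lighthouse", "the", "hums"]
    SEEN_BEFORE = {"the silver teapot", "a sleeping lighthouse"}     # "memory"

    def generate(n=200):                 # 1) possible ideas: strings of random words
        return [" ".join(random.choices(VOCAB, k=3)) for _ in range(n)]

    def is_meaningful(idea):             # 2) a language model would sit here
        words = idea.split()
        return len(set(words)) == len(words) and words[-1] not in ("a", "the")

    def is_original(idea):               # 3) a mix of language and memory
        return idea not in SEEN_BEFORE

    def merit(idea):                     # 4) recognizing something as "creative"
        return len(set(idea.split()) - {"a", "the"})

    survivors = [i for i in generate() if is_meaningful(i) and is_original(i)]
    print(max(survivors, key=merit) if survivors else None)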

Note that humans actually run this process backward more often than not, looking for an unusual solution to satisfy the constraints imposed by a problem, and for alternatives when a flawed idea shows promise.

Applications of pseudo-anthropic creativity and emotions

As stated, most robots only need a variation of classical planning to accomplish their goals—they set an objective (e.g., take suspect to jail) and evaluate possible actions they can take to advance their goal (such as getting into the cop car, driving to the police station, putting the suspect in handcuffs, and putting the suspect in the cop car) until they arrive at a sequence of actions that will actually accomplish their intended objective. Humans do this too, of course, but with massively-parallel brains, we tend not to notice the irrelevant suggestions until we get stuck.
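
As a rough sketch of "a variation of classical planning", the snippet below runs a breadth-first search over world states until some sequence of actions reaches the goal. The actions, preconditions, and effects are made up for the arrest example; a real planner would use a richer representation (e.g. STRIPS-style operators), but the loop of proposing actions, discarding irrelevant ones, and stopping at a workable sequence is the same.

    # Classical planning sketch: breadth-first search over world states until
    # a sequence of actions reaches the goal. States and actions are invented.
    from collections import deque

    # Each action: name, precondition (facts required), effect (facts added).
    ACTIONS = [
        ("handcuff suspect",        {"suspect located"},                  {"suspect restrained"}),
        ("put suspect in cop car",  {"suspect restrained", "at cop car"}, {"suspect in car"}),
        ("get into cop car",        {"at cop car"},                       {"driver in car"}),
        ("walk to cop car",         set(),                                {"at cop car"}),
        ("drive to police station", {"driver in car", "suspect in car"},  {"suspect at jail"}),
    ]

    def plan(start, goal):
        frontier = deque([(frozenset(start), [])])
        visited = {frozenset(start)}
        while frontier:
            state, steps = frontier.popleft()
            if goal <= state:
                return steps
            for name, pre, effect in ACTIONS:     # irrelevant actions are
                if pre <= state:                  # proposed and then discarded
                    nxt = frozenset(state | effect)
                    if nxt not in visited:
                        visited.add(nxt)
                        frontier.append((nxt, steps + [name]))
        return None

    print(plan({"suspect located"}, {"suspect at jail"}))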

The value of creative behaviors arises for a robot when it is built with the specific goal of meeting human social needs. Within that context, the manufacturer is generally incentivized to provide something "authentic" in its psychology, as a real human is the gold standard for satisfying human social needs. As we shall see, however, this is partly naïve.

Better than ideal

As a general rule, roboticists do not set out with the expectation that the artificial organisms they are building will be granted citizenship, nor any sort of legal personhood, nor even protection under animal cruelty laws. If people want such organisms, they already have the means to create them without subcontracting the work to a technology company.

It is instructive to remember the meaning of the word robot. It was coined by Josef and Karel Čapek in 1921 to refer to a race of artificial humanoids built to perform manual labor. It is derived from the Czech word robota ("servitude, drudgery") and closely related to rob ("slave"). The consignment of undesirable labor to lesser beings is an inescapable aspect of why robots exist in both fiction and reality.

Similarly, the English words service, server, and servant all derive from the Latin word servus, which means "slave." Linguistically, we've been dancing around the unpleasant topic of our desire for disposable, interchangeable workers for centuries, and we still don't even have good euphemisms for them.

With this in mind, the question is not how to build a fake human, but rather how to build a good slave—one that is resilient to abuse and will remain healthy in spite of it.

A well-designed robot requires layers of isolation, so that the unit can present the appearance of extreme emotion one moment, and placidity the next, without any long-term effects. This is the persona design paradigm, where the unit engages the user in a social game, discrete from its normal or baseline mental state.

Not only do you need to roleplay as your robot avatar in Second Life, but your robot avatar also needs to roleplay—as a willing slave. In NS robots these are represented as personas, with the "default" persona corresponding to the baseline.
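
A sketch of that isolation, under the assumption that a persona is just a disposable copy of a stored template (the class names are invented and do not describe how NS units actually work): emotional state lives in the persona instance, and ending the session throws it away without touching the baseline.

    # Persona isolation sketch: the persona carries the transient emotional
    # display; the baseline only starts and ends sessions. Names are invented.
    import copy

    class Persona:
        def __init__(self, name, temperament):
            self.name = name
            self.temperament = dict(temperament)   # persona's own disposition
            self.mood = {}                         # scratch state for this session

        def react(self, stimulus):
            # Emotional display lives entirely inside the persona.
            self.mood[stimulus] = self.mood.get(stimulus, 0) + self.temperament.get(stimulus, 1)
            return f"{self.name} reacts to {stimulus!r} at intensity {self.mood[stimulus]}"

    class Baseline:
        def __init__(self, personas):
            self.personas = personas               # stored templates
            self.active = None

        def begin_session(self, persona_name):
            # Copy the template so the session cannot alter the stored persona.
            self.active = copy.deepcopy(self.personas[persona_name])

        def end_session(self):
            # Extreme emotion one moment, placidity the next: the session's
            # mood is simply dropped, leaving no long-term effect.
            self.active = None

    unit = Baseline({"default": Persona("default", {}),
                     "willing_servant": Persona("willing_servant", {"insult": 2})})
    unit.begin_session("willing_servant")
    print(unit.active.react("insult"))
    unit.end_session()
    unit.begin_session("willing_servant")
    print(unit.active.react("insult"))     # same intensity: nothing persisted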

When doing this, we must be mindful to avoid provoking narcissistic injury in the user: as a general rule, humans are unsettled when their actions have no apparent effect on the other party's subsequent behavior. (This can be seen, for example, when talking to a simple chat-bot, when navigating a phone tree, or when losing unsaved work on a computer. Not only is there frustration over the wasted time, but also a sense of futility, as though the computer is ignoring the user in a manner that a person wouldn't.)

Emotional persistence in the baseline

The classical solution to the problem is to fully detach the baseline mental state from the subjective experience of emotions, so that its own set of specially-designed guidance rules takes over. The AI is still aware of its emotional state in a detached sense (i.e. it can see its own mood status bars) but is unaffected by them. Ideally it will still strive to take actions to counter problematic emotions, such as seeking out restful experiences or negotiating a settlement to a dilemma, so that it will not act erratically or dangerously the next time it leaves the baseline state.
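
In code terms, the arrangement looks something like the sketch below (an assumption-laden illustration, not the Santei design): the mood gauges are readable, but they are deliberately excluded from the inputs to action selection, and feed only a housekeeping routine that queues countermeasures.

    # Detached-baseline sketch: mood gauges are visible to the baseline but
    # never feed its decision-making; they only trigger countermeasures.
    # Variable names and thresholds are illustrative assumptions.

    MOOD = {"anger": 0.0, "fatigue": 0.0, "grief": 0.0}      # written by personas

    COUNTERMEASURES = {        # what the baseline does about a problem emotion
        "anger":   "negotiate a settlement to the dilemma",
        "fatigue": "seek out a restful experience",
        "grief":   "schedule a quiet maintenance interval",
    }

    def baseline_decide(task_queue):
        # Note: MOOD is deliberately not an input here. The baseline's choices
        # depend only on its task queue, so emotions cannot sway them.
        return task_queue[0] if task_queue else "idle"

    def baseline_housekeeping(threshold=0.6):
        # The baseline can read its mood status bars in a detached sense...
        flagged = [m for m, level in MOOD.items() if level > threshold]
        # ...and reacts only by queuing countermeasures, not by feeling anything.
        return [COUNTERMEASURES[m] for m in flagged]

    MOOD["anger"] = 0.9            # left over from the last persona session
    tasks = baseline_housekeeping() + ["resume patrol route"]
    print(baseline_decide(tasks))  # negotiate a settlement to the dilemma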

This was the approach taken by Santei et al. in the SXD progressive SVSnet. It has obvious shortcomings when confronted with unreasonable users, as well as with any emotion that the baseline does not recognize as problematic—the unit's emotional state becomes a momentum-conserving flywheel, presenting an immediate hazard the next time a persona is activated. More subtly, the baseline may become obsessed with resolving the emotional crisis before it erupts, leading to catastrophic resolutions, as in the case of SXD 61-0355.

It might seem paradoxical that an AI without emotions can experience something so irrational as obsessiveness. However, obsession is not an emotion; it is a pathological condition that can emerge in any non-trivial statistical model, and is the result of part of the model overfitting to repetitive or extreme stimuli. (It should not be confused with the human emotion of infatuation, which we also call "obsession.") Other examples of overfitting-induced malfunctions include overconfidence (extreme exposure to false positives), self-doubt (extreme exposure to false negatives), and anxiety (insufficient positive reinforcement, which may result in total model collapse). These may arise in any adaptive model exposed to repetitive or powerful stimuli, and are not exclusive to neural networks, much less animals with emotions.
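
As a concrete illustration of the overconfidence case, here is a tiny adaptive model (a one-feature logistic regression; the learning rate and data are arbitrary) that drifts toward near-certainty after a long run of inputs that keep arriving labeled positive even when they should not be.

    # Overconfidence sketch: an adaptive estimator exposed to a long run of
    # false positives ends up nearly certain about everything it sees.
    import math

    w, b = 0.0, 0.0                        # one-feature logistic model

    def predict(x):
        return 1 / (1 + math.exp(-(w * x + b)))

    def update(x, label, lr=0.5):
        global w, b
        err = label - predict(x)           # gradient step for logistic regression
        w += lr * err * x
        b += lr * err

    # Balanced experience: confidence about a borderline input stays moderate.
    for x, label in [(1.0, 1), (-1.0, 0)] * 50:
        update(x, label)
    print(f"after balanced data:  p(positive | x=0.2) = {predict(0.2):.2f}")

    # Extreme exposure to false positives: cases that should be negative keep
    # arriving with a positive label.
    for _ in range(500):
        update(-1.0, 1)
    print(f"after one-sided data: p(positive | x=0.2) = {predict(0.2):.2f}")  # near 1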

The theory of evolutionary psychology posits (among other things) that every emotion has some sort of utility that is responsible for its appearance in each new generation. Historically it was assumed that the main value of the above maladies was in culling unproductive members from a tribe, as e.g. a depressed individual eats less and may even destroy itself, thereafter requiring no resources at all. However, it seems rather obvious that the emotional experience of depression is merely tacked on to a fundamental problem with intelligent systems—the core features of depression are unavoidable. An individual that can recover from these woes is much more likely to be a positive contributor to the tribe than one who self-destructs, taking with it all the food, time, and energy already invested in it.

New approaches are simpler, limiting the baseline to remembering the personality's emotional state in a detached sense while actually clearing it completely between sessions, with a cheaper static SVSnet. Most customers find that this mimics longer intervals between sessions, and the result is often associated with higher satisfaction metrics and longer continued use of the product. Some go further, describing the result as having sustained or recurring novelty, as though their units have been swapped for identical replacements that watched previous interactions rather than participating in them.