For each type of gameplay, I will describe a set of technologies that would support it. Although some details of the specific games in each genre are available in the public domain, the limited number of titles makes it difficult to generalize about what works and what doesn't. Throughout this chapter I'll try to indicate alternatives.
13.1 TEACHING CHARACTERS
The observational learning mechanism watches the actions of other characters and the player and tries to replicate them. When it replicates an action, the player can give positive or negative feedback (usually slaps and tickles) to encourage or discourage that action from being carried out again.
13.1.1 REPRESENTING ACTIONS
When a character does something, an action structure can be created to represent it. The action structure consists of the type of action and details of things in the game to act as the object and indirect object, if required.
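The action structure described above might be sketched as follows; the field names and game-object identifiers here are illustrative assumptions, not the book's actual data layout:

```python
# A minimal sketch of an action structure: the action type plus optional
# object and indirect object, both referenced by hypothetical identifiers.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Action:
    action_type: str                 # e.g. "eat", "throw", "give"
    target: Optional[str] = None     # the object acted on, if required
    indirect: Optional[str] = None   # e.g. the recipient of a "give"

# Example: the character gives food to the player
give = Action("give", target="food", indirect="player")
```

Keeping the object slots optional lets the same structure represent both intransitive actions ("sleep") and fully specified ones ("give food to player").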
The context information that is presented is typically fairly narrow. Large amounts of context information can improve performance, but they dramatically reduce the speed of learning. Since the player is responsible for teaching the character, the player wants to see some obvious improvement in a relatively short space of time. This means that learning needs to be as fast as possible without leading to stupid behavior.
A variety of learning mechanisms are possible for the character. To date, the majority of titles have relied on neural networks; of the techniques covered in this book, reinforcement learning would also be a sensible one to try.
For a neural network learning algorithm, there is a blend of two types of supervision: strong supervision from observation and weak supervision from player feedback.
Independent of learning, the network can be used to make decisions for the character by giving the current context as an input and reading the action from the output.
Inevitably, most output actions will be illegal (there may be no such action possible at that time, or no such object or indirect object available), but those that are legal can be carried out.
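Filtering the network's output down to legal actions might look like the following sketch, assuming one output score per action index and a hypothetical set of currently legal action indices:

```python
# Sketch of decision-making from network outputs: keep only the actions
# that are legal in the current game state, then pick the highest-scoring
# one. The scores list and legal_actions set are illustrative assumptions.
def choose_action(scores, legal_actions):
    # Pair each score with its action index, discarding illegal actions.
    candidates = [(s, a) for a, s in enumerate(scores) if a in legal_actions]
    # Tuples compare by score first, so max() picks the best legal action.
    return max(candidates)[1] if candidates else None

# Usage: four action outputs, but only actions 0 and 3 are legal right now
chosen = choose_action([0.2, 0.9, 0.4, 0.7], {0, 3})  # -> 3
```

Returning `None` when nothing is legal gives the caller a clean way to fall back on a default behavior.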
To learn by observation, the character records the actions of other characters or the player. As long as these actions are within its vision, it uses them to learn.
First, the character needs to find a representation for the action it has seen and a representation for the current context. It can then train the neural network with this input–output pattern, either once or repeatedly until the network learns the correct output for the input.
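The repeat-until-learned loop above can be sketched with a single-layer perceptron standing in for the book's neural network; the context encoding and action indices are illustrative assumptions:

```python
# Sketch of observational learning: train on one observed
# (context, action) pair until the network reproduces the correct output.
import random

class TinyNet:
    def __init__(self, n_in, n_out):
        random.seed(1)  # deterministic for the example
        self.w = [[random.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_out)]

    def forward(self, ctx):
        # One linear output score per action index.
        return [sum(wi * x for wi, x in zip(row, ctx)) for row in self.w]

    def predict(self, ctx):
        out = self.forward(ctx)
        return out.index(max(out))  # index of the chosen action

    def train(self, ctx, action, rate=0.1):
        # Strongly supervised update: push the observed action's score
        # toward 1 and all others toward 0 (a simple delta rule).
        out = self.forward(ctx)
        for a, row in enumerate(self.w):
            target = 1.0 if a == action else 0.0
            err = target - out[a]
            for i, x in enumerate(ctx):
                row[i] += rate * err * x

net = TinyNet(n_in=3, n_out=4)
context = [1.0, 0.0, 1.0]   # e.g. [hungry, tired, food visible]
observed_action = 2         # index of the observed action, e.g. "eat"

# Train repeatedly until the network learns the correct output.
while net.predict(context) != observed_action:
    net.train(context, observed_action)
```

Training once per observation, rather than looping to convergence, trades accuracy on any single observation for faster overall responsiveness.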
If a character that is not hungry observes a hungry character eating, then it may learn to associate eating with not being hungry. In other words, your own context information cannot be matched with someone else's actions.
In games where the player does most of the teaching, this problem does not arise. Typically, the player is trying to show the character what to do next. The character’s context information can be used.
When a feedback event arrives from the player (a slap or tickle, for example), there is no way to know exactly which action the player was pleased or angry about. This is the classic “credit assignment problem” in AI: in a series of actions, how do we tell which actions helped and which didn’t?
By keeping a list of several seconds’ worth of input–output pairs, we assume that the user’s feedback is related to a whole series of actions. When feedback arrives, the neural network is trained (using the weakly supervised method) to strengthen or weaken all the input–output pairs over that time.
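The feedback window described above might be sketched like this; the five-second window, the `train` callback signature, and the learning rates are illustrative assumptions:

```python
# Sketch of credit assignment by feedback window: keep a few seconds of
# (context, action) pairs and train on all of them when feedback arrives.
from collections import deque

WINDOW = 5.0       # seconds of history assumed relevant to any feedback
history = deque()  # (timestamp, context, action) triples

def record(context, action, now):
    """Log an executed action and drop entries older than the window."""
    history.append((now, context, action))
    while history and now - history[0][0] > WINDOW:
        history.popleft()

def apply_feedback(train, positive, now):
    """Strengthen (tickle) or weaken (slap) every pair in the window."""
    rate = 0.05 if positive else -0.05
    for t, ctx, action in history:
        if now - t <= WINDOW:
            train(ctx, action, rate)
```

Training every pair in the window with the same rate is the simplest scheme; a variant could scale the rate down for older pairs, on the assumption that feedback refers mostly to recent actions.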