Earlier this week, I came across an article about Chris Crawford’s current project in video games. Like everybody, I’ve heard of Chris mostly because of his famous 1992 CGDC lecture, in which he announced his exit from the mainstream game industry by brandishing a sword and charging out of the room. But I had never wondered toward what new horizons such a singular figure would ride away.
It turns out Chris still loves video games, and his current project is called Siboot. Apparently bored of Hollywood-style productions, he set out to build interactive storytelling software. Not the standard visual novel where the player follows a hand-authored story graph, like a labyrinth: he intends to actually simulate entire characters with which you can interact in non-predetermined ways. Needless to say, AI has not been the strongest selling point of many games today; the explosion of the brainless zombie genre is telling.
Of course, making AI-driven dialogue the core of a whole novel raises many questions. How do you combine non-determinism with standard storytelling practices?¹ How do you interface the novel with the player? How do you simulate character behavior?
I usually consider four main phases in artificial intelligence: the perception phase, the processing phase, the decision phase, and the action phase. At Data-Essential, we do machine learning to discern patterns in vast amounts of data, which corresponds to the processing phase.
So the question that immediately occurred to me was: how can human intent be captured by a computer? Part of the answer lies in the perception phase, and in how accurately you can capture human communication signals. Choose a poor interface, and you may get skewed data from the human communicant, or you may miss valuable information like voice tone or body language². If you look at the description of SWAT, Chris Crawford’s storytelling engine, you notice the input is actually more rigid than free text entry. This narrows the range of messages a human can convey, but, as I shall explain, it eases the processing of those messages by the machine.
Nevertheless, the trickiest part is detecting those patterns and their associated meanings. Nowadays, most Natural Language Processing (NLP) is based on machine learning, using algorithms akin to those of big data. A big difference between more common machine learning and NLP is that NLP also involves recovering how the syntax was used. That is, which word relates to which other word, and how?
But with his rigid script, Chris Crawford took that difficulty away! He forces the user to input sentences in a custom language, made of English words arranged in a custom tree-like syntax. Thus the relations between words are provided by the user themselves.
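As a rough illustration, such an input can be stored directly as a tree of words. The exact syntax of Crawford’s language isn’t documented here, so the verb-rooted structure and the example sentence below are my own assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    children: list = field(default_factory=list)

# Hypothetical input: "Alice warmly greets Bob", with the verb as the
# root and its subject, object and modifier as children.
sentence = Node("greets", [Node("Alice"), Node("Bob"), Node("warmly")])

def words(node):
    """Flatten the tree back into a depth-first word list."""
    return [node.word] + [w for child in node.children for w in words(child)]

print(words(sentence))  # ['greets', 'Alice', 'Bob', 'warmly']
```

Because the player builds this tree explicitly, the machine never has to parse free text: the word relations come for free.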
Now we wish to know what reaction a character typically has in response to a particular line in a particular situation. That is actually a typical big data question. We can begin by looking at many examples of sentences, situations, and the replies made, and build an initial model. From there we can predict what a character should do when receiving a sentence in some context, and feed the data back into a new, improved model of behavior.
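In its crudest form, such a behavior model is just a frequency table over observed (message, situation) pairs. A minimal sketch, with entirely made-up messages and situations (nothing here comes from Siboot itself):

```python
from collections import Counter, defaultdict

# Toy training set: (message, situation) -> observed reply.
examples = [
    (("greet", "friendly"), "greet"),
    (("greet", "friendly"), "smile"),
    (("greet", "friendly"), "greet"),
    (("greet", "hostile"), "ignore"),
    (("insult", "friendly"), "frown"),
    (("insult", "hostile"), "insult"),
]

# Count how often each reply follows each (message, situation) pair.
counts = defaultdict(Counter)
for key, reply in examples:
    counts[key][reply] += 1

def predict(message, situation):
    """Most frequent observed reply for this (message, situation) pair."""
    return counts[(message, situation)].most_common(1)[0][0]

print(predict("greet", "friendly"))  # 'greet'
print(predict("insult", "hostile"))  # 'insult'
```

A real engine would generalize to unseen pairs rather than merely look them up, which is exactly where a distance between sentences becomes necessary.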
However, a data point here corresponds not to a vector of feature values, but to a tree of words. This brings its own difficulties, as trees can differ in size and depth. It is not impossible to deal with; the important step is to define how similar two input trees, or sentences, are. This is measured with a metric, i.e. a mathematical function that computes the “distance” between sentences.
So what metric should we use? By the nature of the restricted input, sentences can be represented as rooted trees. A very common metric for such structures is the edit distance, which counts how many words must be removed, added, or moved to a parent/child node to transform one tree into another. It is fine in some situations, but it puts too much emphasis on structure. In particular, some words have radically different meanings while others are synonyms, yet the edit distance would not tell the difference: it treats every word change as the same distance.
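To make this concrete, here is a minimal sketch of an edit distance between word trees. It is a simplified top-down variant that aligns child lists level by level with the classic dynamic program, not a full algorithm like Zhang–Shasha (in particular it doesn’t handle moving a word to a parent/child node), and the `Node` representation is my own assumption:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    children: list = field(default_factory=list)

def size(t):
    """Number of nodes, i.e. the cost of inserting or deleting a whole subtree."""
    return 1 + sum(size(c) for c in t.children)

def tree_dist(a, b):
    """Relabel cost at the roots plus an optimal alignment of the child lists."""
    return (a.word != b.word) + align(a.children, b.children)

def align(xs, ys):
    # Classic edit-distance dynamic program over the two child sequences,
    # where substituting one subtree for another costs tree_dist.
    n, m = len(xs), len(ys)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + size(xs[i - 1])
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + size(ys[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + size(xs[i - 1]),      # delete subtree
                          D[i][j - 1] + size(ys[j - 1]),      # insert subtree
                          D[i - 1][j - 1] + tree_dist(xs[i - 1], ys[j - 1]))
    return D[n][m]

a = Node("greets", [Node("Alice"), Node("Bob")])
b = Node("greets", [Node("Alice"), Node("Carol")])
print(tree_dist(a, b))  # 1: only one leaf word differs
```

Note how swapping “Bob” for “Carol” costs exactly as much as swapping it for any other word, which is precisely the weakness discussed above.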
A workaround would be to use another metric on words alone, one that quantifies how close to synonyms two words are. The edit distance would then also consider the replacement of one word by another, weighted by the origin and target words. And the deeper in the tree a word sits, the less weight its change should carry. This is to ensure that obscure, deeply nested complements of complements don’t affect the overall meaning too much.
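Both refinements bolt onto the same dynamic program: substitution costs become the word-level distance, and every cost is discounted by a decay factor raised to the node’s depth. The word-distance table and the decay value below are illustrative assumptions, not anything from Crawford’s engine:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    children: list = field(default_factory=list)

# Hypothetical word-level distances: 0.0 = same meaning, 1.0 = unrelated.
WORD_DIST = {frozenset(["greet", "welcome"]): 0.1}

def word_dist(a, b):
    if a == b:
        return 0.0
    return WORD_DIST.get(frozenset([a, b]), 1.0)

DECAY = 0.5  # each level deeper halves the impact of a word change

def size_cost(t, depth):
    """Depth-discounted cost of inserting or deleting a whole subtree."""
    return DECAY ** depth + sum(size_cost(c, depth + 1) for c in t.children)

def weighted_dist(a, b, depth=0):
    root = DECAY ** depth * word_dist(a.word, b.word)
    return root + align(a.children, b.children, depth + 1)

def align(xs, ys, depth):
    # Same dynamic program as the plain tree edit distance,
    # but with depth-weighted substitution and subtree costs.
    n, m = len(xs), len(ys)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + size_cost(xs[i - 1], depth)
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + size_cost(ys[j - 1], depth)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + size_cost(xs[i - 1], depth),
                          D[i][j - 1] + size_cost(ys[j - 1], depth),
                          D[i - 1][j - 1] + weighted_dist(xs[i - 1], ys[j - 1], depth))
    return D[n][m]

# Swapping a near-synonym one level down costs 0.5 * 0.1 = 0.05,
# whereas swapping an unrelated word at the root would cost 1.0.
a = Node("say", [Node("greet")])
b = Node("say", [Node("welcome")])
print(weighted_dist(a, b))
```

Under this metric, swapping synonyms deep in the tree barely moves the distance, while changing the root verb dominates it, which matches the intuition above.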
What’s great with this is that the words-only metric can itself be learned with the usual regression techniques, too. A regular edit distance, this time on characters, can be used as a base metric on words to compute that new metric, and many training examples of synonyms and non-synonyms can be supplied (e.g. from a thesaurus).
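That base metric is the same dynamic program applied to characters instead of subtrees: the classic Levenshtein distance. A minimal sketch, which could then serve as one input feature when regressing a synonym score over thesaurus examples:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic character-level edit distance between two words."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i          # delete all of a's prefix
    for j in range(m + 1):
        D[0][j] = j          # insert all of b's prefix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + 1,                     # delete
                          D[i][j - 1] + 1,                     # insert
                          D[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # substitute
    return D[n][m]

print(levenshtein("kitten", "sitting"))  # 3
```

Character distance alone is a weak signal for synonymy (“greet” and “great” are close in spelling, far in meaning), which is exactly why it should be one feature among several in a learned model rather than the metric itself.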
I’ll admit it sounds a bit easier than it actually is. There has been research on NLP for many years, and many, many more tweaks would be involved. But what I find amazing with this example is that a small restriction on the user input reduces the problem, at least conceptually, fairly easily to common machine learning techniques.
In the end, Chris Crawford was pretty vague about which tools and algorithms he uses in his character engine. But he definitely has the means to do something awesome. Maybe interactive novels with computer-generated story elements will soon be a reality.
The project is intended to be eventually open-sourced. If Chris manages to see it through to the end (despite the little attention his Patreon page has received), count me in to check how he implemented his characters’ AI.
¹A similar problem is being heavily worked on by Hello Games, the studio behind No Man’s Sky: how do you make procedurally generated visuals as consistent as possible with the rules of aesthetics? There is also an interesting lecture by their lead artist at GDC 2015.
²Humans can also be fooled by missing body language or voice tone. Smileys in text-only but rapid communications are a way to circumvent this shortcoming.