Can we predict the emergence of mental illness or
dementia from oral or written language samples?
How can we use these samples to study the development of concepts
throughout history?
Dr. Mariano Sigman attempts to answer
these questions in this fascinating TED talk, which was brought to my attention
by Michelle Lisses Topaz.
Here are excerpts from the transcript:
Can the theory that introspection built up in human history only about
3,000 years ago be examined in a quantitative and objective manner?
The space of words (built with Latent Semantic Analysis, or LSA) is a computer
simulation that contains all words in such a way that the distance between any
two of them is indicative of how closely related they are. So for
instance, the words "dog" and "cat" are
very close together, but the words "grapefruit" and
"logarithm" are very far away. And this has to be true for any two words
within the space.
When two words are related, they tend to appear in the same sentences, in the same paragraphs, in the same documents, more often than would be expected just by pure
chance. This simple method, with some computational tricks that have to do with the fact that this is a very complex and
high-dimensional space, turns out to be quite effective.
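
The co-occurrence idea above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: it counts how often each word appears in each document of a made-up four-sentence corpus and compares words by cosine similarity. It skips the dimensionality-reduction step that LSA adds on top of such counts, and the names `DOCS`, `word_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter

# Toy corpus (hypothetical); each string stands in for one document.
DOCS = [
    "the dog chased the cat",
    "a cat and a dog sleep",
    "grapefruit is a citrus fruit",
    "the logarithm of a number",
]


def word_vectors(docs):
    """Map each word to its vector of per-document counts."""
    counts = [Counter(doc.split()) for doc in docs]
    vocab = {word for c in counts for word in c}
    # Counter returns 0 for words that are absent from a document.
    return {w: [c[w] for c in counts] for w in vocab}


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


VECTORS = word_vectors(DOCS)
print(cosine(VECTORS["dog"], VECTORS["cat"]))
print(cosine(VECTORS["grapefruit"], VECTORS["logarithm"]))
```

In this toy corpus "dog" and "cat" always share a document, so their similarity is close to 1, while "grapefruit" and "logarithm" never do, so theirs is 0. A real space would be built from a very large corpus and reduced to a few hundred dimensions.
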
Words automatically organize into semantic neighborhoods. So you get the fruits, the body parts, the computer parts, the scientific terms and
so on. The algorithm also identifies
that we organize concepts in a hierarchy.
Once we've built the space, the question of the history of introspection, or of the history of any concept which before could seem abstract and somehow
vague, becomes concrete -- becomes amenable to quantitative science.
All that we have to do is take the books, we digitize them, and we take this stream of words as a trajectory and project them into the space, and then we ask whether this trajectory spends
significant time circling closely to the concept of
introspection.
And with this, we could analyze the history of introspection in the ancient Greek tradition, for which we have the best available written
record. So what we did is we took all the books -- we just ordered them by time -- for each book we take the words and we project them to the space, and then we ask for each word how close it is
to introspection, and we just average that. And then we ask whether, as time goes on and
on, these books get closer, and closer and closer to the concept of introspection.
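
The averaging step described above might look like the following sketch. Everything here is assumed for illustration: the two-dimensional `SPACE`, the three made-up "books" and the function name `closeness` are hypothetical, and a real analysis would use a high-dimensional space and full digitized texts.

```python
import math

# Hypothetical 2-D word space; real spaces have hundreds of dimensions.
SPACE = {
    "introspection": [1.0, 0.0],
    "think": [0.9, 0.1],
    "soul": [0.8, 0.3],
    "sword": [0.1, 0.9],
    "ship": [0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closeness(text, concept, space):
    """Mean cosine similarity between a text's known words and a concept."""
    target = space[concept]
    sims = [cosine(space[w], target) for w in text.lower().split() if w in space]
    return sum(sims) / len(sims) if sims else 0.0


# Hypothetical texts, ordered from oldest to newest.
BOOKS = [
    ("older", "sword ship sword"),
    ("middle", "ship think sword"),
    ("newer", "think soul think"),
]

SCORES = [(label, closeness(text, "introspection", SPACE)) for label, text in BOOKS]
for label, score in SCORES:
    print(label, round(score, 3))
```

With these invented inputs the score rises from the older text to the newer one, which is the kind of trend the talk reports for the real corpora.
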
And this is exactly what happens in the
ancient Greek tradition. So you can see that for the oldest books in
the Homeric tradition, there is a small increase with books getting
closer to introspection. But about four centuries before Christ, this starts ramping up very rapidly to an
almost five-fold increase of books getting closer, and closer and closer to the concept of introspection.
We ran this same analysis on the Judeo-Christian tradition, and we got virtually the same pattern.
Can the words we say today tell us
something of where our minds will be in a few days, in a few
months or a few years from now?
We can ask whether monitoring and analyzing
the words we speak, we tweet, we email, we write, can tell us ahead of time whether something
may go wrong with our minds. And with Guillermo Cecchi, who has been my brother in this adventure, we took on this task. And we did so by analyzing the recorded speech
of 34 young people who were at a high risk of developing
schizophrenia.
And so we measured speech at day one, and then we asked whether the properties of
the speech could predict, within a window of almost three years, the future development of psychosis. But despite our hopes, we got failure after failure. There was just not enough information in
semantics to predict the future organization of the
mind. It was good enough to distinguish between a group of
schizophrenics and a control group, a bit like we had done for the ancient texts, but not to predict the future onset of
psychosis.
But then we realized that maybe the most important thing was not so
much what they were saying, but how they were saying it. More specifically, it was not in which semantic neighborhoods the
words were, but how far and fast they jumped from one semantic neighborhood to the other
one. And so we came up with this measure, which we termed semantic coherence, which essentially measures the persistence of
speech within one semantic topic, within one semantic category.
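
One simple way to turn this idea into a number is to average the similarity between consecutive words in a speech sample. The sketch below does that under stated assumptions: the three-dimensional `SPACE` and the function name `coherence` are hypothetical, and the published study may define the measure differently (for example over phrases rather than single words).

```python
import math

# Hypothetical 3-D word space with an "animal" direction and two unrelated ones.
SPACE = {
    "dog": [1.0, 0.1, 0.0],
    "cat": [0.9, 0.2, 0.0],
    "puppy": [0.95, 0.1, 0.05],
    "grapefruit": [0.0, 1.0, 0.1],
    "logarithm": [0.0, 0.1, 1.0],
}


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coherence(words, space):
    """Mean cosine similarity between consecutive known words."""
    vecs = [space[w] for w in words if w in space]
    sims = [cosine(a, b) for a, b in zip(vecs, vecs[1:])]
    return sum(sims) / len(sims) if sims else 0.0


# Speech that stays in one neighborhood vs. speech that jumps between them.
STEADY = coherence("dog cat puppy dog".split(), SPACE)
JUMPY = coherence("dog logarithm grapefruit cat".split(), SPACE)
print(round(STEADY, 3), round(JUMPY, 3))
```

The sample that stays among animal words scores much higher than the one that hops between unrelated neighborhoods. Low values of such a score correspond to the "far and fast" jumps described above.
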
And it
turned out that for this group of 34 people, the
algorithm based on semantic coherence could predict, with 100
percent accuracy, who would develop psychosis and who would not. And this
was something that could not be achieved -- not even
close -- with all the other existing clinical measures.
We may be
seeing in the future a very different form of mental health, based on
objective, quantitative and automated analysis of the
words we write, of the words we say.
Want to read more?
Bedi, G., Carrillo, F., Cecchi, G. A., Slezak,
D. F., Sigman, M., Mota, N. B., ... & Corcoran, C. M. (2015). Automated analysis of
free speech predicts psychosis onset in high-risk youths. npj
Schizophrenia, 1. https://neuro.org.ar/sites/neuro.org.ar/files/Automated%20analysis%20of%20free%20speech%20predicts%20psychosis%20onset%20in.pdf