Monday, May 1, 2017

Personality assessment through computerized linguistic analysis of Facebook messages


Park, G., Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Kosinski, M., Stillwell, D. J., ... & Seligman, M. E. (2015). Automatic personality assessment through social media language. Journal of personality and social psychology, 108(6), 934.  http://www.peggykern.org/uploads/5/6/6/7/56678211/park_2015_-_automatic_personality_assessment_through_social_media_language.pdf

How do psychologists assess personality?  There are three main ways:  an interview (a conversation with the client), questionnaires (which the client fills out about herself and/or which people who know the client fill out about her), and projective tests (in which, for instance, the client is asked to tell stories about pictures that are presented to her; we assume that the way the client tells the stories and the content of the stories, for example the interaction between story characters, reflect various aspects of the client's personality).  It's also possible to assess aspects of personality by simulating different social situations (this is usually done in group assessments).

Is it possible to assess personality using the "electronic signature" that every one of us leaves on the internet? Undoubtedly, the messages we write on social media reflect different aspects of our personality.  It's reasonable to assume that once reliable and valid tests that assess personality by analyzing social media messages become available, they will be used on a massive scale.  These tests will be cheap, easy and very fast.  It will be possible to obtain a person's "personality profile" within a few seconds.  One can imagine such tests being used by psychologists, by corporations that recruit personnel, or by dating sites.  This raises ethical questions, such as assessing people's personalities without their consent, or basing an assessment on things people wrote years earlier without any thought of how their writing might be used in the future.

In this study, Park and his colleagues (among them Prof. Martin Seligman) used language-based assessment (LBA) to analyze Facebook messages. They attempted to predict people's personality traits from the language they use on Facebook.

When personality is assessed in one of the traditional ways, the client knows he is being assessed, and this may influence the way he responds (because he wants to present himself in the best possible way).  By contrast, when a person writes on Facebook he does so in natural social situations and he tends to disclose a lot of information about himself.  The researchers assumed that social media users typically present their true selves and not just idealized versions.  In addition, traditional assessment is done at a specific point in time, whereas assessment based on the way a person writes on Facebook takes into account his writing across years.

The LBA software analyzes people's use of single words, nonword symbols (e.g., emoticons, punctuation), multiword phrases and clusters of semantically related words or topics.
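
To make the kinds of language features listed above more concrete, here is a minimal sketch in Python that counts single words, a few emoticons and punctuation marks, and multiword phrases in a set of status messages.  The tokenizer pattern and the function names are my own assumptions for illustration, not the software used in the study, and the sketch does not cover the topic clusters.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not the study's): a few common
# emoticons, words with an optional apostrophe part, and single punctuation marks.
# It is applied to lowercased text, so ":D" is seen as ":d".
TOKEN_RE = re.compile(r"[:;=][-']?[)(dp]|[a-z0-9]+(?:'[a-z]+)?|[!?.,]")


def tokenize(message):
    """Split one status message into lowercase tokens."""
    return TOKEN_RE.findall(message.lower())


def relative_frequencies(messages, max_n=3):
    """Return each 1- to max_n-token sequence's count divided by the total token count."""
    counts = Counter()
    total_tokens = 0
    for message in messages:
        tokens = tokenize(message)
        total_tokens += len(tokens)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    if total_tokens == 0:
        return {}
    return {feature: count / total_tokens for feature, count in counts.items()}


features = relative_frequencies(["Party tonight :)", "party time"])
```

Dividing by the total number of tokens is one simple way to make users who write a lot comparable with users who write little.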

The authors used the LBA system to construct a predictive model of personality based on a sample of more than 66 thousand Facebook users.  They tested the model on a separate sample of 5,000 Facebook users. The participants were drawn from users of the myPersonality application.  This application allowed users to take a series of psychological measures and share results with friends. The myPersonality application was installed by roughly 4.5 million users between 2007 and 2012. All users agreed to the anonymous use of their survey responses for research purposes. The analytic sample was a subset of myPersonality users who also allowed the application to access their status messages (i.e., brief posts on the user’s main Facebook page). Park et al. limited the analytic sample to users who wrote at least 1,000 words across their status messages, provided their gender and age, and were younger than 65 years of age. They captured every status message written by the study volunteers between January 2009 and November 2011, totaling over 15 million messages. Users wrote an average of 4,107 words across all status messages.
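
The inclusion criteria described above can be written down as a simple filter.  The sketch below is only an illustration: the `User` record and its field names are hypothetical, and counting words by splitting on whitespace is my assumption, since the post does not say how words were counted.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_WORDS = 1000   # at least 1,000 words across all status messages
AGE_LIMIT = 65     # users had to be younger than 65


@dataclass
class User:
    """Hypothetical record for one myPersonality user."""
    gender: Optional[str]
    age: Optional[int]
    status_messages: List[str]


def word_count(user):
    """Total words across all status messages (whitespace split, an assumption)."""
    return sum(len(message.split()) for message in user.status_messages)


def in_analytic_sample(user):
    """True if the user meets the three criteria listed in the text."""
    return (
        user.gender is not None
        and user.age is not None
        and user.age < AGE_LIMIT
        and word_count(user) >= MIN_WORDS
    )


example = User(gender="f", age=30, status_messages=["word " * 1000])
print(in_analytic_sample(example))
```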

All participants completed measures of personality traits as defined by the NEO-PI-R five factor model/BIG5 model (Costa & McCrae, 1992): openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism.

The BIG5 model resulted from factor analyses done by Raymond Cattell and later Costa & McCrae on a very large number of personality traits.  Here are the traits' definitions (from Wikipedia):

· Openness to experience: (inventive/curious vs. consistent/cautious). Appreciation for art, emotion, adventure, unusual ideas, curiosity, and variety of experience. Openness reflects the degree of intellectual curiosity, creativity and a preference for novelty and variety a person has.

· Conscientiousness: (efficient/organized vs. easy-going/careless). A tendency to be organized and dependable, show self-discipline, act dutifully, aim for achievement, and prefer planned rather than spontaneous behavior.

· Extraversion: (outgoing/energetic vs. solitary/reserved). Energy, positive emotions, surgency, assertiveness, sociability and the tendency to seek stimulation in the company of others, and talkativeness.

· Agreeableness: (friendly/compassionate vs. analytical/detached). A tendency to be compassionate and cooperative rather than suspicious and antagonistic towards others. It is also a measure of one's trusting and helpful nature, and whether a person is generally well-tempered or not.

· Neuroticism: (sensitive/nervous vs. secure/confident). The tendency to experience unpleasant emotions easily, such as anger, anxiety, depression, and vulnerability. Neuroticism also refers to the degree of emotional stability and impulse control and is sometimes referred to by its low pole, "emotional stability".

Most personality theorists see traits as the bedrock of personality.  Traits like extraversion and agreeableness describe the most basic differences between people, differences that can be easily identified in human behavior across situations and time.  These traits are so basic that they stand out even in infancy.  Some babies tend to be happy and some tend to be anxious, some are curious about their surroundings and some are much more reserved. The large differences in temperament in the early months of life gradually develop into personality traits.

Now back to Park et al.'s study.

As mentioned above, Park and his colleagues tried to predict the BIG5 traits from the language people use on Facebook.  They found that LBA-based predictions had medium-sized correlations with the results of BIG5 questionnaires. The correlations were 0.43 for openness, 0.37 for conscientiousness, 0.42 for extraversion, and 0.35 for both agreeableness and neuroticism.  The average correlation between LBA predictions and BIG5 questionnaires was 0.38.
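
As a quick check of the arithmetic, the short Python sketch below averages the five correlations reported above.  A plain arithmetic mean is my assumption; the post does not say how the authors computed their average, but the simple mean gives the same rounded figure.

```python
from statistics import mean

# Correlations between LBA predictions and questionnaire scores, as reported above.
reported_correlations = {
    "openness": 0.43,
    "conscientiousness": 0.37,
    "extraversion": 0.42,
    "agreeableness": 0.35,
    "neuroticism": 0.35,
}

# Simple arithmetic mean (an assumption about how the average was taken).
average_r = mean(list(reported_correlations.values()))
print(round(average_r, 2))  # 0.38
```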

Predictions of the BIG5 traits using LBA were stable over a six-month period. Correlations between LBA predictions made at six-month intervals were 0.70 on average.  In comparison, test–retest correlations of BIG5 questionnaires are usually in the range of 0.65 to 0.85.  Thus the stability of LBA predictions is similar to that of questionnaires.

To what extent are LBA predictions of the BIG5 in line with informant reports of the BIG5?  The correlation between self reports and informant reports was 0.32.  The correlation between self reports and LBA predictions was 0.38.  Thus, LBA predictions agreed with self reports somewhat better than informant reports did.  However, the authors note that the correlation between self reports and informant reports in this study was lower than usual.  The correlation between informant reports and LBA predictions was 0.24.

The words, phrases and topics of the messages that had the highest correlation with each of the BIG5 traits were in line with thought, feeling and behavior patterns that are typical of each trait.  In the diagram below we can see the language features that were common to people high in extraversion compared to people low in extraversion (introverts).  Each "word cloud" contains the one hundred words and phrases that had the highest correlations with high and low extraversion.  The size of the words is proportional to the size of the correlation.  The color represents the word's frequency (the redder the word, the more frequent it is).

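[Figure 3 in Park et al. (2015): word clouds of the language associated with high extraversion (left panel) and low extraversion]
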

Aspects of high extraversion are evident in the left panel of Figure 3, including language reflecting positive emotion (e.g., love, :)), enthusiasm (e.g., best, stoked, pumped), and sociability (e.g., party, hanging, dinner with). On the other end, the language of low extraversion (introverts) suggested a more inward focus (e.g., i’ve, i don’t, i should), relatively greater interest in things (vs. people; e.g., computer, book, chemistry), and tentativeness (e.g., probably, suppose, apparently).
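
The idea behind the word clouds, ranking words and phrases by how strongly their use correlates with a trait, can be sketched in a few lines of Python.  This is an illustration under my own assumptions: it uses a plain Pearson correlation on relative frequencies, the function names and the toy data are hypothetical, and the study's actual procedure may include adjustments that are not described in this post.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


def top_correlated_features(user_features, trait_scores, k=100):
    """Return the k features most positively and most negatively correlated with a trait.

    user_features: one dict per user mapping a word or phrase to its relative frequency.
    trait_scores: questionnaire scores for the same users, in the same order.
    """
    vocabulary = set().union(*user_features)
    correlations = []
    for feature in vocabulary:
        freqs = [user.get(feature, 0.0) for user in user_features]
        if len(set(freqs)) < 2:
            continue  # every user has the same frequency, so no correlation exists
        correlations.append((feature, pearson_r(freqs, trait_scores)))
    high = sorted((c for c in correlations if c[1] > 0), key=lambda c: -c[1])[:k]
    low = sorted((c for c in correlations if c[1] < 0), key=lambda c: c[1])[:k]
    return high, low


# Toy data, invented for illustration only.
users = [
    {"party": 0.03, "book": 0.00},
    {"party": 0.02, "book": 0.01},
    {"party": 0.00, "book": 0.03},
]
extraversion_scores = [5, 4, 1]
high, low = top_correlated_features(users, extraversion_scores)
```

In this toy example "party" ends up on the high-extraversion side and "book" on the low-extraversion side; in a word cloud, the size of each word would then be set by the size of its correlation.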

I recommend looking at the word clouds for the other traits as well.  They are amusing.

The authors conclude by saying that they provided evidence that the language in social media can be harnessed to create a valid and reliable measure of personality. This approach is just one example of how social media can extend assessment to many more people—quickly, cheaply, and with low participant burden. Moreover, this illustrates how computational techniques can reveal new layers of psychological richness in language. Combining these techniques with psychological theory may complement existing measures.


A hybrid approach that combines LBAs with other rich nonverbal data sources from social media (e.g., images, preferences, social network characteristics, etc.) would likely improve predictive performance. Kosinski, Stillwell, and Graepel (2013) found that Facebook users’ personality traits and other characteristics could be accurately predicted using only users’ preferences or “likes.” Even models built only on social network behavior, such as message frequency and message response time, have been useful in predicting users’ personalities (Adali & Golbeck, 2014). Provided that each source has some unique contribution to a target trait, models combining multiple sources in addition to language may provide even better assessments.
