This article is part 2 of a series on how Memairy uses artificial intelligence to create the best online journal app experience. We recommend you first read “What is Artificial Intelligence? And what does AI have to do with an online diary?” (Memairy AI Part 1).
In this post, we will explore how Natural Language Processing (NLP) is implemented by Memairy to create a superior and more fulfilling journaling experience. NLP is used in the following 3 major ways:

1. Sentiment Analysis
2. Named Entity Recognition (NER)
3. Topic Modeling

Let’s go into sentiment analysis and briefly describe how it can enhance your journal entries.
Sentiment analysis is a technique used to interpret and classify text by tagging with a certain sentiment, usually positive, negative, or neutral. Sentiments refer to the emotions that the author wishes to express in a particular portion of writing. Sentiments are inherently subjective. Thus, they may depend on the cultural background, values, and beliefs of a person. You and I may not always agree on the sentiment of the same text. Nonetheless, most people will generally consider emotions like happiness, excitement, and amusement as positive and emotions like sadness, fear, and anger as negative.
Humans have evolved to be very adept at inferring the emotional state of others. This ability must have been critical for survival. As a species, we are relatively weak physically compared to other similarly sized animals. We lack formidable claws, fangs, strength, or speed. Nevertheless, we have been highly successful, in large part due to our intelligence, our ability to cooperate, and, just as important, our ability to assess the willingness of others to cooperate with us. The price of failing to understand the emotions of others could be death.
Today, sentiment analysis is as important as ever, but on a much greater scale than the interactions of the small groups of our prehistory. Many uses for sentiment analysis are now possible thanks to the vast amounts of text data available, from online reviews to social media posts. Businesses often use it to better understand their customers’ feelings from product reviews, survey responses, and social media. Sentiment analysis enables businesses to learn what their customers do and do not like, so they can adapt to better meet their customers’ needs.
Memairy uses this technique to determine the sentiment of each diary entry to provide an assessment of the user’s feelings. By adding a more complete context to the entry, a user can better understand and process their emotions. The insights of Sentiment Analysis arm the user with a powerful tool for introspection and self-reflection.
Humans may instinctively understand language, but only computers have the processing power to analyze millions of pieces of text quickly. For computers to do automated sentiment analysis well, they must be trained to comprehend how language can express human feeling.
For computers to perform the complicated task of understanding sentiment, as with many similar artificial intelligence goals, a model is required. Sentiment analysis is a classification model that determines the degree to which text expresses a sentiment. A model in this context means a set of mathematical operations that takes inputs, often called ‘features’ in the jargon of data science, and transforms them into a prediction.
In a greatly simplified procedure to build a model, data is first collected and tagged with the outcomes that we want the machine to learn to predict. You may have heard this process of learning from tagged examples referred to as ‘machine learning’, a branch of artificial intelligence.
The next step is to extract features from the data. These features are the inputs that are used by machine learning algorithms to identify patterns that are key in determining the output.
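As a rough illustration of feature extraction, the sketch below turns a piece of text into word counts, a simple representation often called a bag of words. This is only one possible choice of features and is not necessarily what Memairy uses; the function name `extract_features` is hypothetical.

```python
import re
from collections import Counter


def extract_features(text):
    """Turn raw text into bag-of-words features: lowercase word -> count.

    Hypothetical helper for illustration only.
    """
    # Lowercase the text and keep runs of letters and apostrophes as words.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


features = extract_features("I am so glad, so very glad, I brought you along!")
print(features["glad"])  # 2
print(features["sad"])   # 0 (a Counter returns 0 for words it never saw)
```

Each distinct word becomes one input to the model, and its count is the value of that input.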
Then, there are many sophisticated techniques used to determine the mathematical operations needed for the model to best turn the inputs into the desired output; in the case of sentiment analysis, the output is the sentiment of text.
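To show the three steps together (tagged data, features, and a learned model), here is a minimal sketch that trains a perceptron, one of the simplest learning algorithms, on a handful of made-up tagged sentences. The training sentences, the function names, and the choice of a perceptron are all assumptions for illustration; production sentiment models are far more sophisticated.

```python
import re
from collections import defaultdict

# Hypothetical hand-tagged training data: +1 = positive, -1 = negative.
TRAINING_DATA = [
    ("I had a wonderful and happy day", 1),
    ("What a great comfort this is", 1),
    ("I am so glad and excited", 1),
    ("The news was terrible and sad", -1),
    ("I feel afraid and angry", -1),
    ("Such a sad and awful loss", -1),
]


def tokenize(text):
    """Feature extraction: split lowercase text into words."""
    return re.findall(r"[a-z']+", text.lower())


def train(data, epochs=10):
    """Learn one weight per word with the perceptron update rule."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in data:
            tokens = tokenize(text)
            activation = sum(weights[t] for t in tokens)
            predicted = 1 if activation > 0 else -1
            if predicted != label:
                # Nudge every word in the misclassified text toward its label.
                for t in tokens:
                    weights[t] += label
    return weights


def predict(weights, text):
    """Return +1 (positive) or -1 (negative) for new text."""
    activation = sum(weights.get(t, 0.0) for t in tokenize(text))
    return 1 if activation > 0 else -1


weights = train(TRAINING_DATA)
print(predict(weights, "I am glad and happy"))    # 1
print(predict(weights, "sad and terrible news"))  # -1
```

Each word ends up with a weight, and the model predicts a positive sentiment when the weights of the words in a text sum to more than zero.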
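Now equipped with a sentiment analysis model, we can provide the model with an arbitrary new text and get, as an output, a prediction of the sentiment of that text. The results of sentiment analysis typically have two outputs: a score, which indicates whether the overall emotion is positive or negative, and a magnitude, which indicates how much emotional content the text contains. Some sample values: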
Sentiment | Sample Values |
---|---|
Clearly Positive | “score”: 0.8, “magnitude”: 5.0 |
Moderately Positive | “score”: 0.2, “magnitude”: 6.0 |
Neutral | “score”: 0.0, “magnitude”: 0.0 |
Mixed | “score”: 0.0, “magnitude”: 8.0 |
Moderately Negative | “score”: -0.3, “magnitude”: 7.0 |
Clearly Negative | “score”: -0.7, “magnitude”: 4.0 |
“Neutral” documents have little emotional content, with score and magnitude both around 0.0, whereas “mixed” documents have both positive and negative emotions that cancel each other out, resulting in a score around 0.0 but a higher magnitude.
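One way to read a score and magnitude pair is sketched below. The thresholds (`neutral_band`, `strong_score`, `mixed_magnitude`) are hypothetical values chosen so that the sample values in the table above map to their labels; they are not official cut-offs.

```python
def describe_sentiment(score, magnitude,
                       neutral_band=0.1, strong_score=0.5, mixed_magnitude=1.0):
    """Map a (score, magnitude) pair to a label like those in the table.

    All thresholds are assumptions for illustration.
    """
    if abs(score) < neutral_band:
        # Score near zero: little emotion (neutral) or emotions that cancel (mixed).
        return "Mixed" if magnitude >= mixed_magnitude else "Neutral"
    strength = "Clearly" if abs(score) >= strong_score else "Moderately"
    polarity = "Positive" if score > 0 else "Negative"
    return f"{strength} {polarity}"


print(describe_sentiment(0.8, 5.0))   # Clearly Positive
print(describe_sentiment(0.0, 8.0))   # Mixed
print(describe_sentiment(-0.3, 7.0))  # Moderately Negative
```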
To make this more concrete, we analyzed the sentiment of each diary entry in Anne Frank’s diary. The diary entry with the greatest negative sentiment (-0.5 score, 3.50 magnitude) is from MONDAY, JULY 19, 1943:
Dearest Kitty, North Amsterdam was very heavily bombed on Sunday. There was apparently a great deal of destruction. Entire streets are in ruins, and it will take a while for them to dig out all the bodies. So far there have been two hundred dead and countless wounded; the hospitals are bursting at the seams. We've been told of children searching forlornly in the smoldering ruins for their dead parents. It still makes me shiver to think of the dull, distant drone that signified the approaching destruction.

On the opposite end of the spectrum, the diary entry with the greatest positive sentiment (0.80 score, 3.50 magnitude) is from FRIDAY, JUNE 12, 1942:
I hope I will be able to confide everything to you, as I have never been able to confide in anyone, and I hope you will be a great source of comfort and support. COMMENT ADDED BY ANNE ON SEPTEMBER 28, 1942: So far you truly have been a great source of comfort to me, and so has Kitty, whom I now write to regularly. This way of keeping a diary is much nicer, and now I can hardly wait for those moments when I'm able to write in you. Oh, I'm so glad I brought you along!

Explore the sentiment analysis for all her diary entries.
Sentiment Analysis is one of the more challenging tasks in Natural Language Processing. Even we humans sometimes misinterpret someone’s email or text. I’m sure that this has happened to all of us – an email misread as rude by a coworker or a text not understood correctly by a friend. Add to this the many social media messages that mix text with emojis, and the interpretation of sentiment becomes even more complicated.
Let’s take a closer look at some of the major challenges a machine faces in understanding text and how sentiment analysis overcomes those challenges.

Emojis

To state the obvious, emojis are pictorial representations of facial expressions, objects, and locations, among other things. The use of emojis can help to communicate emotions, reducing the chance that they will be misinterpreted.
One can imagine the difficulties that a computer faces when trying to understand such nuance in text. The cliché is true: “A picture is worth a thousand words.” The most sophisticated sentiment analysis incorporates the dense emotional information contained in emojis.
Fact or Opinion

In all classification problems, including sentiment analysis, defining the categories is a critical component. We need to tag the training data with what we want to model, which demands that we have a good definition of the categories. The model will only be as accurate as our tagging of the training data. In the case of sentiment analysis, we want to identify text as neutral, positive, or negative. So, how do we define what is neutral?
Sentences can be divided into either fact or opinion. Fact sentences do not contain emotional sentiments whereas opinion sentences do. For example, consider the following two sentences:
Now that we have a definition of what neutral sentiment is, we need to define positive and negative sentiment. This is primarily done using a collection of special dictionaries. These dictionaries are compilations of scores manually assigned by people to rank words such as ‘good’ and ‘great’, giving ‘great’ a higher positive sentiment. Sentiment analysis looks up words and phrases in these dictionaries to determine the degree to which they are positive or negative.
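The dictionary lookup described above can be sketched as follows. The tiny `LEXICON` and its scores are invented for illustration, and the way the score (average of the matched words) and magnitude (sum of their absolute values) are computed is an assumption, not a description of any particular sentiment analysis service.

```python
import re

# Hypothetical sentiment dictionary; real ones hold thousands of
# human-scored words. Scores here range from -1 (negative) to 1 (positive).
LEXICON = {
    "good": 0.5,
    "great": 0.8,
    "glad": 0.6,
    "comfort": 0.5,
    "bad": -0.5,
    "terrible": -0.8,
    "dead": -0.7,
    "ruins": -0.6,
}


def lexicon_sentiment(text):
    """Return (score, magnitude) for text using dictionary lookups."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    if not hits:
        # No emotional words found: treat the text as neutral.
        return 0.0, 0.0
    score = sum(hits) / len(hits)           # overall direction of the emotion
    magnitude = sum(abs(h) for h in hits)   # total amount of emotion
    return score, magnitude


print(lexicon_sentiment("The table is made of wood"))  # (0.0, 0.0)
print(lexicon_sentiment("Great food but terrible service"))  # (0.0, 1.6)
```

The second example shows a mixed text: the positive and negative words cancel out in the score but still add up in the magnitude.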
Context

The context of a sentence, the order of the phrases, and the surrounding words and sentences are all critical for understanding the meaning. These two sentences have very different meanings.
A major component of this is the identification of entities in the text, such as ‘dog’ and ‘ball’ in the simple sentences above. The natural language processing technique Named Entity Recognition (NER) is critical here; however, that is a topic for another post.