Artificial Intelligence: Extracting Qualifying Adjectives from Comments on Donald Trump's Facebook Posts Using NLP

Cristóbal V
8 min read · Jan 13, 2021

A Simple Application of Natural Language Processing on Comments from Facebook Posts.

Photo by Markus Spiske on Unsplash

Every day, Artificial Intelligence is woven more deeply into our daily lives. AI algorithms are giving human-like capabilities to our technological devices, and while that might sound a little scary, these advances are improving our knowledge of the world and our comprehension of many things that, just a few years ago, would have been impossible for humans to understand.

There is an amazing subfield connecting many different areas, called Natural Language Processing (NLP). As Wikipedia puts it: "NLP is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language. The result is a computer capable of 'understanding' the contents of documents, including the contextual nuances of the language within them."

In my case, NLP is one of those areas of knowledge that ended up becoming one of my passions. I love learning new algorithms and techniques to research human language from a technological and somewhat "futuristic" point of view.

That's why I decided to write this article and share, in a simple way, how to extract relevant information from comments on Facebook posts using NLP techniques.

To follow the example in this article, you need the following requirements:

Requirements:

You can find all the code on my GitHub.

1. Data Extraction

The first thing I need is the data I want to analyze. For the sake of simplicity, I'll use a tool called Facepager, which helps extract data from Pages through the Facebook API. I will extract the last 3 months of posts shared by Donald Trump on his Facebook Page, along with the comments users left on those posts. I won't explain how to use Facepager because that is out of the scope of this article, but if you want to learn how, I recommend watching the following tutorials on YouTube:

  1. Tutorial: Extracting Facebook Page Posts and Photos.
  2. Tutorial: Extracting Posts by Time Period, Comments, and Replies to Comments.
  3. Link to download Facepager.

Considering the huge number of comments Donald Trump gets on his Facebook posts, I decided to download only the latest post he shared. Using Facepager, I downloaded 5,975 comments from this post (click here to see the post):

Target Post of Donald Trump to extract Comments (Photo by Author)

The extracted data has the following features:

  • The comment message.
  • The comment tags.
  • The comment creation date.
  • The comment attachment type.
  • The number of likes on each comment.

Data Extracted using Facepager (Photo by Author)
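A minimal sketch of how the export can be loaded into pandas. The filename and separator here are assumptions; adjust them to whatever your Facepager export produced:

import pandas as pd

# Hypothetical filename; Facepager can export the fetched nodes as CSV.
# Depending on your export settings the separator may be ',' instead of ';'.
data = pd.read_csv('trump_post_comments.csv', sep=';')
print(data.shape)  # roughly 5975 rows, one per comment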

2. Analyzing the Data

Before starting to use NLP on the data, I analyzed the dataset to answer some interesting questions about it. Some of the questions were:

  • Which are the top tags and attachment types used in comments?
  • Which is the most liked comment?

To answer these questions, I created a simple Python function that counts the frequency of any specific value and returns the 10 most common ones. The parameters of this function are the data frame, the name of the column to analyze, and the number of top values to return. The function returns two lists: the top values and their frequencies:

from collections import Counter

def most_commons(data, column, n_top):
    # Keep only rows where the column has a value
    data = data[data[column].notnull()]
    values = data[column].tolist()
    counts = Counter(values).most_common(n_top)
    # Drop 'nan' entries from both lists so values and frequencies stay aligned
    top_values = [value for value, _ in counts if value != 'nan']
    n_values = [freq for value, freq in counts if value != 'nan']
    return top_values, n_values

# 10 most common tags
tag_vals, tag_freq = most_commons(data, 'tags', 10)
# 10 most common comment attachment types
at_val, at_freq = most_commons(data, 'comment_attachment_type', 10)
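Here is a minimal plotting sketch with matplotlib; the figure size and styling are my own choices, not from the original analysis:

import matplotlib.pyplot as plt

# Simple bar chart of the 10 most common tags
plt.figure(figsize=(10, 5))
plt.bar([str(v) for v in tag_vals], tag_freq)
plt.xticks(rotation=45, ha='right')
plt.title('10 Most Common Tags on Comments')
plt.tight_layout()
plt.show()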

Plotting the results this way, we get the following visualizations:

10 Most Common Tags (Photo by Author)
10 Most Common Attachment Type (Photo by Author)

Finally, a simple pandas manipulation filters out the comment with the most likes.

most_liked_comment = data[data['comment_likes'].isin([data['comment_likes'].max()])]['message'].values
print(str(most_liked_comment))

"You are the worst President in the history of America, Mr. Donald J. Trump …… You are proving to be a worse dictator than the dictators from elsewhere around the world that America has been sanctioning!!! But protests or not, on January 21st this month, you will be Ex-President, and Joe Biden will be the President of the United States of America!!!! Period!!!"

3. Natural Language Processing

So finally we will apply some NLP to dig deeper into the comments on Donald Trump's Facebook! Let's go ahead!

First of all, when using NLP it's necessary to normalize the text of the comments. This means removing punctuation, symbols, and the determiner words of the English language, better known as stopwords. A clear definition of determiners, cited from Wikipedia: "a word, phrase, or affix that occurs together with a noun or noun phrase and serves to express the reference of that noun or noun phrase in the context (e.g. the, a, this, those, your, his)".
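As a quick sanity check, you can peek at the built-in stopword list of spaCy, the library used throughout this article (this assumes the small English model has been installed with "python -m spacy download en_core_web_sm"):

import spacy

nlp = spacy.load('en_core_web_sm')
stopwords = nlp.Defaults.stop_words
print(len(stopwords))          # a few hundred common English words
print(sorted(stopwords)[:10])  # a small sample of the list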

There are several methods to do this, but I will use some tools available in the spaCy library. If you want to read more about this, I highly recommend these links:

For the scope of this article, I will just normalize, tokenize, and remove stopwords from the comments.

I defined a function called normalize_comments that performs these tasks.

from string import punctuation
from unidecode import unidecode

def normalize_comments(data):
    clean_comments = []
    # Reuses the nlp object loaded earlier
    stopwords = nlp.Defaults.stop_words
    comments = data['message'].tolist()
    for comment in comments:
        token_words = comment.split()
        clean_comments.append(" ".join(
            unidecode(word).lower() for word in token_words
            if word not in stopwords and word not in punctuation))
    data['clean_message'] = clean_comments
    # Strip any remaining non-word characters (symbols, emoji, etc.)
    data['clean_message'] = data['clean_message'].str.replace(r"[^\w]", " ", regex=True)
    data = data[['message', 'clean_message', 'tags', 'created_time',
                 'comment_attachment_type', 'comment_likes']]
    return data
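Applying the function to the extracted data adds the clean_message column:

data = normalize_comments(data)
data[['message', 'clean_message']].head()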
Extract of DataFrame with columns message and clean_message (Photo by Author)

As you can see, the column called clean_message holds the text of the comments in lowercase, with all the stickers, punctuation, and stopwords removed.

So now I can answer a new question: which are the most common words used in these comments? Similar to the function explained above, I defined a new one to count the most common words.

from collections import defaultdict

def frecuency_words(data):
    total_words = []
    results = defaultdict(list)
    messages = data['clean_message'].unique().tolist()
    for message in messages:
        for m in message.split():
            # Skip single-character tokens
            if len(m) > 1:
                total_words.append(m)
    words = Counter(total_words)
    results['word'] = list(words.keys())
    results['frequency'] = list(words.values())
    df = pd.DataFrame(results)
    return df
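Calling it on the cleaned data and sorting by frequency gives the top words:

word_freq = frecuency_words(data)
top_20 = word_freq.sort_values('frequency', ascending=False).head(20)
print(top_20)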

I noticed that most of the top words are patriotic, but others relate to heavier concepts like Antifa and violence. It's interesting to see the word "People" in first place!

Top 20 most common words on Comments (Photo by Author)

Finally, I wanted to find the qualifying adjectives closest to the words "Donald Trump", to research which adjectives people most commonly use to describe Donald Trump.

To do this, spaCy has a trained model that can analyze the grammatical tag of each word and its structure within a given text. If you want more information, you can read this article (the concept is called POS tagging).
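As a small illustration of POS tagging, here is a made-up sentence (not a real comment from the dataset):

doc = nlp("Donald Trump is a controversial and powerful president")
for token in doc:
    print(token.text, token.pos_)
# 'controversial' and 'powerful' should come back tagged as ADJ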

So I defined two functions: the first searches for all the messages containing the target words, and the second searches for adjectives close to the position of the target words within the comment structure.

def find_message_by_word(data, words):
    messages = data['message'].str.lower().tolist()
    comments = list()
    for message in messages:
        if any(unidecode(word) in message for word in words):
            comments.append(message)
    return comments
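For example, to collect every comment mentioning the target words (the search terms here are my own choice):

target_words = ['donald trump', 'trump']
trump_comments = find_message_by_word(data, target_words)
print(len(trump_comments))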

For the second function, I had to create a text file listing the most common qualifying adjectives. In simple terms, the function searches for adjectives using the trained spaCy model and then matches the results against the words in the txt file, keeping only qualifying adjectives. The result is a data frame with the most common adjectives that people use to describe Donald Trump, together with their frequencies.

def find_adjs_by_comment(comments, pattern):
    words = []
    for comment in comments:
        doc = nlp(comment)
        for i in range(len(doc)):
            if any(word in doc[i].text for word in pattern):
                # Look for adjectives in a window of tokens around the match
                start = max(i - 10, 0)
                end = min(i + 10, len(doc))
                for x in range(start, end):
                    if doc[x].pos_ == "ADJ":
                        words.append(doc[x].text)
    counts = Counter(words)
    df = pd.DataFrame({'words': list(counts.keys()),
                       'freq': list(counts.values())}).sort_values(['freq'], ascending=False)
    # Keep only the adjectives listed in the hand-made qualifying-adjectives file
    with open('adjectives.txt', 'r') as file:
        adjs = file.readlines()
    filters = [adj.replace("\n", "") for adj in adjs]
    top_adjs = df[df.words.isin(filters)].reset_index(drop=True)
    return top_adjs
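Putting both functions together, using the comments gathered above (adjectives.txt is the hand-made list of qualifying adjectives described earlier):

top_adjs = find_adjs_by_comment(trump_comments, ['trump'])
print(top_adjs.head(10))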
Barplot of Most Common Adjectives used to describe Donald Trump on Facebook (Photo by Author)
Table of Frequency Adjectives

In total, I found 68 different qualifying adjectives used by Facebook users to describe Donald Trump in the comments on his latest Facebook post (6 January 2021). You can see the full graph and table in my GitHub repository.

4. Conclusion

Artificial Intelligence applied to text analysis can retrieve information that is useful for many different purposes. In this case, I just wanted to do a simple analysis applying NLP to Facebook data and show all the future Jedis of Data Science and Artificial Intelligence how to use these techniques to have fun and discover new analyses to do with public information on the web.

Other Articles of Mine!
