HOW TO BECOME YOUTUBE FAMOUS


using word clouds to visualize how key words affect video popularity

Does clickbait work?

I started this project to see whether there was any correlation between key words in video titles and video popularity. As I dug further into the data, the project became more of an exploratory analysis aimed at surfacing the most statistically interesting results. The following series of word cloud visualizations showcases that iterative process.

The data comes from Mitchell J's Trending YouTube Video Statistics datasets, a collection of (up to) 200 trending YouTube videos listed every day in the United States, United Kingdom, Canada, Germany, and France. The datasets are updated every few days and include information such as video title, channel name, number of views, number of comments, and number of likes and dislikes.

For the sake of simplifying the data, I chose to only look at the US dataset, and the data only includes videos featured from November 14, 2017 to December 12, 2017.

What is popular?

Lots of factors can contribute to the success or popularity of a YouTube video: number of views, level of engagement, ratio of likes to dislikes, ratio of views to subscribers, etc.

Let's start by finding out which videos grab the most attention, using the number of views as the determinant for popularity.


titles with < average # of views
titles with > average # of views

What did you notice?

The pair of word clouds above shows the most frequently used words in video titles with a below-average number of views (left) and with an above-average number of views (right).
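As a rough illustration of the split described above, here is a minimal sketch that divides titles at the mean view count and tallies word frequencies for each side. The sample rows, the `tokenize` helper, and the `word_counts_by_average` function are all hypothetical; the original project's code may have worked differently.

```python
import re
from collections import Counter
from statistics import mean

# Hypothetical sample rows of (title, views); not taken from the real dataset.
videos = [
    ("Official Trailer: Winter Movie", 1_200_000),
    ("iPhone X Review", 900_000),
    ("My Christmas Haul", 40_000),
    ("Official Music Video", 60_000),
]

def tokenize(title):
    """Lowercase a title and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9']+", title.lower())

def word_counts_by_average(rows):
    """Return (below, above) word-frequency Counters split at the mean view count."""
    avg = mean(views for _, views in rows)
    below, above = Counter(), Counter()
    for title, views in rows:
        # Titles strictly above the mean go to the "above" cloud.
        target = above if views > avg else below
        target.update(tokenize(title))
    return below, above

below, above = word_counts_by_average(videos)
```

Each resulting `Counter` maps a word to how often it appears, which is the kind of frequency table a word cloud renderer typically takes as input.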

Some words appear in both clouds, and some appear multiple times within one cloud. For example, the word "official" appears many times in the right cloud and also shows up in the left one. This suggests that, for this particular dataset, such "repeat" words are simply very common in video titles and therefore not particularly meaningful. It also makes sense that words like "Christmas" and "iPhone" (referring to the release of the iPhone X) are popular during this period. Overall, there are no striking differences between the two word clouds, so the results are not particularly interesting. In the next set of word clouds, let's filter out non-meaningful words like "official" and "christmas", which we will call our "repeat-offenders."

titles with < average # of views (filtered)
titles with > average # of views (filtered)
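The write-up does not say exactly how the "repeat-offender" list was built. One possible approach, sketched here as an assumption, is to drop every word that shows up in both frequency tables, plus any hand-picked extras. The `drop_repeat_offenders` function and its sample counts are hypothetical.

```python
from collections import Counter

def drop_repeat_offenders(below, above, extra_stopwords=()):
    """Remove words present in both Counters, plus any hand-picked extras."""
    shared = set(below) & set(above)
    blocked = shared | {w.lower() for w in extra_stopwords}
    filtered_below = Counter({w: c for w, c in below.items() if w not in blocked})
    filtered_above = Counter({w: c for w, c in above.items() if w not in blocked})
    return filtered_below, filtered_above

# Hypothetical counts for illustration only.
below = Counter({"official": 3, "christmas": 2, "haul": 1})
above = Counter({"official": 5, "christmas": 1, "trailer": 4})

filtered_below, filtered_above = drop_repeat_offenders(below, above)
```

Removing every shared word is aggressive on a large dataset, where most words eventually appear on both sides; a frequency threshold or a curated list may be a better fit in practice.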

Doing better: tighter bounds

It seems like using the average as the dividing line between "low" and "high" isn't producing very interesting results. Let's tighten the bounds by discarding all video titles in the middle two quartiles. Remember, we only care about the best and the worst.

The following set of word clouds shows only words from titles in the bottom 25% and the top 25% of videos by # of views.

words in the bottom 25th percentile of # of views (filtered)
words in the top 25th percentile of # of views (filtered)
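Keeping only the extremes amounts to cutting at the first and third quartiles. The sketch below uses `statistics.quantiles` from the Python standard library; the `split_extremes` function and the sample rows are hypothetical, and the exact cut points depend on the quantile method chosen.

```python
from statistics import quantiles

def split_extremes(rows):
    """Keep titles at or below the 1st quartile, or at or above the 3rd quartile, of views."""
    q1, _, q3 = quantiles([views for _, views in rows], n=4)
    bottom = [title for title, views in rows if views <= q1]
    top = [title for title, views in rows if views >= q3]
    return bottom, top

# Hypothetical rows of (title, views).
rows = [("a", 10), ("b", 20), ("c", 30), ("d", 40),
        ("e", 50), ("f", 60), ("g", 70), ("h", 80)]

bottom, top = split_extremes(rows)
```

Everything between the two cut points, roughly the middle half of the videos, is left out of both clouds.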

What are people talking about?💬

Another determinant for popularity is how high engagement is. In other words, what is controversial? What are most people talking about?

In the following set of word clouds, we will strictly look at the ratio of comments to views.


titles with < average comments to views ratio
titles with > average comments to views ratio

What did you notice?

The pair of word clouds above shows the most frequently used words in video titles with a below-average comments-to-views ratio (left) and with an above-average comments-to-views ratio (right).

Hm, interesting. Off the bat, the most striking difference is that there seems to be a greater variety of words in this set of word clouds compared to the word clouds above. If you didn't notice, feel free to scroll to the second set of word clouds in the slider above and compare!

What could be the cause of this dramatic difference? It turns out that basing our word clouds strictly on the comments-to-views ratio is misleading. Because views sit in the denominator, a video with few views can have a very high ratio and earn a higher rank, while a video with many views can have a lower ratio and earn a lower rank, even if it attracted far more comments in absolute terms. So while the previous set of word clouds produced interesting results, they do not give an accurate picture of engagement.
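A tiny worked example makes the problem concrete. The numbers below are made up for illustration, and `comment_ratio` is a hypothetical helper.

```python
# Hypothetical rows of (title, views, comments).
videos = [
    ("Small vlog", 1_000, 100),             # ratio = 0.10
    ("NFL Highlights", 2_000_000, 20_000),  # ratio = 0.01
]

def comment_ratio(views, comments):
    """Comments per view; returns 0.0 when a video has no views."""
    return comments / views if views else 0.0

# Rank purely by ratio, highest first.
ranked = sorted(videos, key=lambda v: comment_ratio(v[1], v[2]), reverse=True)
```

Ranked this way, the small vlog comes out ahead even though the highlights video drew 200 times as many comments.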

If we want to take into consideration both a video's number of views and its number of comments, we can combine the two into a single score using a scalar. The following set of word clouds was generated using a 50-50 scalar, where number of views and number of comments were weighted equally.

titles with < average # of comments + views
titles with > average # of comments + views
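The write-up does not spell out how views and comments were put on a common scale before the 50-50 weighting. One plausible reading, shown here purely as an assumption, is to min-max normalize each metric to the range 0 to 1 and then take a weighted sum. The `min_max` and `combined_scores` helpers and the sample numbers are hypothetical.

```python
def min_max(values):
    """Rescale values linearly to [0, 1]; all zeros if every value is identical."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def combined_scores(views, comments, w_views=0.5, w_comments=0.5):
    """Weighted sum of normalized views and normalized comments (assumed scheme)."""
    norm_views = min_max(views)
    norm_comments = min_max(comments)
    return [w_views * v + w_comments * c
            for v, c in zip(norm_views, norm_comments)]

# Hypothetical numbers for three videos.
views = [1_000, 500_000, 2_000_000]
comments = [100, 4_000, 20_000]

scores = combined_scores(views, comments)
```

Without some normalization, raw view counts would swamp raw comment counts and the equal weights would have little effect. The same helper would work for likes by passing like counts in place of comment counts.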

What about now?

While there may not be any striking differences between these two clouds, there is something more interesting to glean from comparing them with the second set of clouds generated above.

For example, if we compare only the right clouds (above average), we can see that video titles containing sports-related words such as "NFL", "Week", "Game", and "Highlights" attracted more comments than titles containing the word "iPhone". Similarly, words that appear large in both right clouds suggest that those video titles not only attracted many views but also had higher levels of engagement.

Doing better: tighter bounds

So far, the last two sets of word clouds have proven to be more promising. But let's try to do better by tightening the bounds and discarding all video titles in the middle two quartiles. Remember, we only care about the best and the worst.

The following set of word clouds shows only words from titles in the bottom 25% and the top 25% of videos by # of comments + views.

words in the bottom 25th percentile of # of comments + views
words in the top 25th percentile of # of comments + views

What do people like?👍

Finally, let's look at the last determinant for popularity: likes! A good indicator of a popular video is whether people find it enjoyable or informative. In the following set of word clouds, we will strictly look at the ratio of likes to views.


titles with < average # likes to views ratio
titles with > average # likes to views ratio

What did you notice?

The pair of word clouds above shows the most frequently used words in video titles with a below-average likes-to-views ratio (left) and with an above-average likes-to-views ratio (right).

Once again, basing our word clouds strictly on a ratio produces very interesting results. But we can do better using a scalar.

Let's take into consideration both a video's number of views and its number of likes. The following set of word clouds was generated using a 50-50 scalar, where number of views and number of likes were weighted equally.

titles with < average # of likes + views
titles with > average # of likes + views

Doing better: tighter bounds

Once again, let's tighten the bounds by discarding all video titles in the middle two quartiles. Remember, we only care about the best and the worst.

The following set of word clouds shows only words from titles in the bottom 25% and the top 25% of videos by # of likes + views.

words in the bottom 25th percentile of # of likes + views
words in the top 25th percentile of # of likes + views