
This Analysis on the Emergence of Clickbait Will Blow You Away

15 mins

Companies like Facebook and Google aren’t merely providing a free online service; they’re competing for your attention because they need it to thrive in today’s economy. Your interactions on these platforms generate invaluable data, without which the machine intelligence that makes them so intuitive and personalized could not exist. By making their services free to maximize their user base, many of these companies adopt business models that depend on how much their services are used, such as data collection and advertising. This is where the attention economy comes in, a concept widely discussed in the fields of psychology, advertising, and economics.

Economist and Nobel Prize winner Herbert Simon was perhaps the first to discuss this concept when he wrote,

“In an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients.”

In other words, as online content becomes increasingly abundant and accessible, our attention becomes the limiting factor in what content gets consumed, and the businesses that understand this are the ones that end up winning. So to compete in today’s complex and dynamic online landscape, it is vital for companies to capture and hold their consumers’ attention.

Visualizing My Music Taste using Machine Learning and Sentiment Analysis

16 mins
Peter Margaritoff's Gradients: Created with the average color of album covers for the top 2,000 songs on Spotify

Intro • The xx

Music is a complicated thing. No one fully understands why we’re drawn towards a certain song, or so frustrated by another. In the recent paper “Musical Preferences are Linked to Cognitive Styles”, Greenberg et al. showed that cognitive style, or how individuals process information, influences musical preference. Their first study linked empathy levels to preferred genres. The second examined the effect of E-S cognitive styles (based on the Empathizing-Systemizing theory) on musical preference. Subjects with a bias towards empathizing preferred music with low arousal (gentle or warm), negative valence (sad or depressing) and emotional depth (poetic or thoughtful). On the other hand, those with a bias for systemizing showed a preference for music with high arousal (strong or thrilling), positive valence (lively) and cerebral depth (complex).

This connection between cognitive processing and musical preference is very interesting to me. Having never truly understood my affinity for certain genres of music, I was inspired to quantify my music taste using data, to really understand the similarities and differences across the music that I listen to. I was also excited to apply concepts from machine learning, such as clustering and unsupervised learning, and from natural language processing, such as sentiment analysis, to this project and see what insights they might produce. I’ve also wanted to explore Spotify’s API for some time now, and this proved to be the perfect opportunity to do so.

And yes, each title in this blog post will be a song from my dataset.

Exploring Survival on the Titanic with Machine Learning

12 mins

In the early morning of 15 April 1912, a British passenger liner sank in the North Atlantic Ocean after colliding with an iceberg. More than 1,500 people died in the sinking, making it one of the deadliest maritime disasters. Since then, the Titanic has become one of the most famous ships in history, her memory kept alive in various forms of pop culture, museums, books and films.


We can use machine learning to explore some interesting questions. How much of a role did a passenger’s socio-economic status play in their chance of survival? Did their name or age make a difference? What about siblings, parents or children? Is one of these factors more significant than the rest? Using decision trees and a random forest model, we can analyze the passenger data from the ship, answer some of these questions and create a classifier that can predict whether a passenger survived the tragedy.
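
To give a feel for how a decision tree picks its questions, here is a minimal sketch of a single splitting step. It assumes Gini impurity as the split criterion (a common choice, not necessarily the one used in the full analysis), and the records, field names and helper functions below are made up for illustration; they are not the real passenger data.

```python
from collections import Counter

# Hypothetical toy records, NOT the real Titanic data.
TOY_PASSENGERS = [
    {"pclass": 1, "sex": "female", "survived": 1},
    {"pclass": 1, "sex": "male", "survived": 0},
    {"pclass": 2, "sex": "female", "survived": 1},
    {"pclass": 2, "sex": "male", "survived": 0},
    {"pclass": 3, "sex": "female", "survived": 1},
    {"pclass": 3, "sex": "male", "survived": 0},
    {"pclass": 1, "sex": "male", "survived": 1},
    {"pclass": 3, "sex": "female", "survived": 0},
]


def gini(rows):
    """Gini impurity of the 'survived' labels in rows (0.0 means pure)."""
    if not rows:
        return 0.0
    counts = Counter(row["survived"] for row in rows)
    return 1.0 - sum((n / len(rows)) ** 2 for n in counts.values())


def split_impurity(rows, feature, value):
    """Weighted Gini impurity after splitting rows on feature == value."""
    left = [r for r in rows if r[feature] == value]
    right = [r for r in rows if r[feature] != value]
    total = len(rows)
    return len(left) / total * gini(left) + len(right) / total * gini(right)


def best_split(rows, features):
    """Return the (feature, value) pair with the lowest weighted impurity."""
    # Sort candidates so that ties are broken deterministically.
    candidates = sorted({(f, r[f]) for f in features for r in rows}, key=str)
    return min(candidates, key=lambda fv: split_impurity(rows, *fv))


if __name__ == "__main__":
    feature, value = best_split(TOY_PASSENGERS, ["pclass", "sex"])
    print(f"Best first question: is {feature} == {value!r}?")
```

A full decision tree repeats this step on each resulting group until the groups are pure or too small to split, and a random forest trains many such trees on resampled data and lets them vote on the prediction.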