
This Analysis on the Emergence of Clickbait Will Blow You Away

15 mins

Companies like Facebook and Google aren’t merely providing a free online service; they’re competing for your attention because they need it to thrive in today’s economy. Your interactions on these platforms generate invaluable data, without which the machine intelligence that makes them so intuitive and personalized would not exist. By making their services free to maximize their user base, many of these companies adopt business models that depend on the usage of their services, such as data collection and advertising. This is where the attention economy comes in, a theory widely discussed in the fields of psychology, advertising, and economics.

Economist and Nobel Prize winner Herbert Simon was perhaps the first to discuss this concept when he wrote,

In an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients.

In other words, as online content becomes increasingly abundant and accessible, our attention becomes the limiting factor in which content is consumed, and the businesses that understand this are the ones that end up winning. So to compete in today’s complex and dynamic online landscape, it is vital for companies to capitalize on their consumers’ attention.


Cambridge Analytica and the Case for Ethical Data Mining

8 mins


It was recently made public that Cambridge Analytica, a political data firm hired by Trump’s 2016 election campaign, gained access to the private information of more than 50 million Facebook users. The data was collected through a personality survey app designed by Aleksandr Kogan, a psychology professor at Cambridge University. The app collected private information not only from its own users but also from their Facebook friends. Using this data, the firm allegedly profiled the personalities of American voters and influenced their behavior.

Facebook, however, claims that this was not a data breach. In a press release announcing Cambridge Analytica’s suspension from Facebook, VP Paul Grewal wrote:

The claim that this is a data breach is completely false. Aleksandr Kogan requested and gained access to information from users who chose to sign up to his app, and everyone involved gave their consent. People knowingly provided their information, no systems were infiltrated, and no passwords or sensitive pieces of information were stolen or hacked.

This raises several questions about the ethics of data mining and our willingness to share information in an increasingly online world. What does this mean for data scientists, whose very roles revolve around the collection and analysis of data? Moving forward in the wake of such events, it is interesting to consider how the reactions of various stakeholders will influence our society and affect how we view data.


Understanding the Data Science Lifecycle

14 mins


Data science is quickly becoming one of the hottest fields in the technology industry. With rapid advancements in computational performance that now allow for the analysis of massive datasets, we can uncover patterns and insights about user behavior and world trends to an unprecedented extent.

With the influx of buzzwords in data science and related fields, a common question I’ve heard from friends is “Data science sounds pretty cool - how do I get started?” What started out as an attempt to explain it to a friend who wanted to get started with Kaggle projects has culminated in this post. I’ll give a brief overview of the seven steps that make up a data science lifecycle: business understanding, data mining, data cleaning, data exploration, feature engineering, predictive modeling, and data visualization. For each step, I will also provide some resources that I’ve found useful in my experience.

As a disclaimer, there are countless interpretations of the lifecycle (and of what data science even is), and this is the understanding that I have built up through my reading and experience so far. Data science is a quickly evolving field, and its terminology is rapidly evolving with it. If there’s something that you strongly disagree with, I’d love to hear about it!


How to Build Your Development Stack in the Cloud from Scratch

11 mins


Over the past couple of weeks, a number of friends approached me to ask how I had set up my website. I told them I’d created the site using Jekyll and hosted it on GitHub Pages, but I had to host my web apps on Heroku because GitHub Pages only serves static content, and I also needed a database so I had to connect the web apps to Firebase…

And it struck me how inefficient my development stack was. I started wondering if I could do any better, and this post is the result of completely recreating that stack from scratch in the cloud. There are a lot of great tutorials out there for each of the topics I’ll be discussing, so this post does not try to replicate them. Instead, it collects resources to give an idea of what a cloud computing setup entails.


Visualizing My Music Taste using Machine Learning and Sentiment Analysis

16 mins
Peter Margaritoff's Gradients: Created with the average color of album covers for the top 2,000 songs on Spotify

Intro • The xx

Music is a complicated thing. No one fully understands why we’re drawn towards a certain song, or so frustrated by another. In the recent paper Musical Preferences are Linked to Cognitive Styles by Greenberg et al., studies showed that cognitive style, or how individuals process information, influences their musical preferences. The first study showed a link between empathy levels and preferred genres. The second studied the effect of E-S cognitive styles (based on the Empathizing-Systemizing theory) on musical preference. Subjects with a bias towards empathizing preferred music with low arousal (gentle or warm), negative valence (sad or depressing) and emotional depth (poetic or thoughtful). On the other hand, those with a bias towards systemizing preferred music with high arousal (strong or thrilling), positive valence (lively) and cerebral depth (complex).

This connection between cognitive processing and musical preference is very interesting to me. Having never truly understood my affinity for certain genres of music, I was inspired to try to quantify my music taste using data, to really understand the similarities and differences across the music that I listen to. I was also excited to apply concepts from machine learning such as clustering and unsupervised learning, and from natural language processing such as sentiment analysis, to this project and see what insights they might produce. I’ve also wanted to explore Spotify’s API for some time now, and this proved to be the perfect opportunity to do so.

And yes, each title in this blog post will be a song from my dataset.
