Strange Loop

Design and Implementation of a Topic Detection System

Ah, the topic detection dream! To automagically sift and discover "things" that people are talking about without human input.

Cortico, a non-profit dedicated to fostering a healthy public sphere, requires the ability to discover the small transient topics latent in public speech (e.g., how do different geographic regions talk about the same event?). This style of topic detection differs from traditional analysis in that the "topics" are much more fine-grained (we call them events) and closely tied to current activity, and thus must be computed relatively quickly.

To solve this, we implemented our event detection method using a word co-occurrence model that performs well on collections of very small documents, such as tweets or sentences uttered on talk radio. We also bucked the trend of employing K-Means, LDA, Non-Negative Matrix Factorization and their ilk, opting instead to discover event clusters with Louvain, a network graph algorithm. The resulting output is intuitive, fast to compute, and produces interesting side models of events that we can apply to generate other delightful results.
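As a rough illustration of the idea (not Cortico's actual implementation, which the abstract does not detail), the sketch below builds a word co-occurrence graph from a handful of made-up short documents and then runs only the first "local moving" phase of Louvain, which greedily moves each word to the neighbouring community with the largest modularity gain. The function names, the sample documents, whitespace tokenization, and the absence of stop-word filtering are all assumptions made for the example. Full Louvain also repeatedly collapses communities into super-nodes and re-runs this phase, and a real system would more likely call a library implementation.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(docs):
    """Weighted word graph: edge weight = number of documents
    in which the two words appear together."""
    graph = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        words = sorted(set(doc.lower().split()))  # naive tokenizer (assumption)
        for a, b in combinations(words, 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def local_moving(graph):
    """First phase of Louvain only: move each node to the neighbouring
    community with the best modularity gain until no node moves."""
    degree = {n: sum(nbrs.values()) for n, nbrs in graph.items()}
    two_m = sum(degree.values())        # twice the total edge weight
    community = {n: n for n in graph}   # every word starts alone
    comm_degree = dict(degree)          # summed degree per community

    moved = True
    while moved:
        moved = False
        for node in sorted(graph):
            current = community[node]
            # Edge weight from this node into each neighbouring community.
            links = defaultdict(float)
            for nbr, w in graph[node].items():
                links[community[nbr]] += w
            # Temporarily take the node out of its community.
            comm_degree[current] -= degree[node]
            best = current
            best_gain = links[current] - comm_degree[current] * degree[node] / two_m
            for comm in sorted(links):
                # Modularity gain of joining comm, up to a constant factor.
                gain = links[comm] - comm_degree[comm] * degree[node] / two_m
                if gain > best_gain + 1e-12:
                    best, best_gain = comm, gain
            comm_degree[best] += degree[node]
            if best != current:
                community[node] = best
                moved = True
    return community


def clusters(community):
    """Group words by community label."""
    groups = defaultdict(set)
    for node, comm in community.items():
        groups[comm].add(node)
    return sorted(groups.values(), key=sorted)


# Hypothetical short documents standing in for tweets or radio sentences.
docs = [
    "city council votes on budget",
    "council budget vote delayed",
    "budget council meeting tonight",
    "storm floods coastal highway",
    "coastal storm closes highway",
    "highway floods after storm",
]
graph = cooccurrence_graph(docs)
community = local_moving(graph)
events = clusters(community)
for event in events:
    print(sorted(event))
```

Because the two groups of sample documents share no words, their vocabularies can never land in the same cluster. Within each group the exact split depends on the greedy move order, so treat the printed clusters as illustrative rather than canonical.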

Wes Chow

Cortico / MIT Media Lab

Wes is the Director of Engineering, Advanced Analytics at Cortico, and a Research Engineer at the Laboratory for Social Machines at the MIT Media Lab. At Cortico, he leads the engineering team developing computational methods for understanding the public sphere and building parts of a social machine to boost positive outcomes. He was previously CTO at Chartbeat, where he worked on large-scale distributed streaming and warehousing systems to support analytics in use by most news publishers in the world.