© 2021 Strange Loop
Exploring a large or unfamiliar codebase can be tricky. Code Navigation features like “jump to definition” and “find all references” let you discover how different pieces of code relate to each other. To power these features, we need to extract lists of symbols from the code, and describe the language-specific rules for how those symbols relate to each other.
It’s difficult to add Code Nav to a large hosted service like GitHub, where we must support hundreds of programming languages, hundreds of millions of repositories, and petabytes of history. At this scale, we have a different set of design constraints than a local IDE. We need our data extraction to be incremental, so that we can reuse previous results for files that haven’t changed in a newly pushed commit, saving both compute and storage costs. And to support cross-repo lookups, it should require zero configuration — repo owners should not have to set up anything manually to activate the feature.
Douglas Creager is a staff engineering manager at GitHub. He manages the Semantic Code team, which applies academic programming language theory to analyze all of the code hosted on GitHub. Our goal is to make it easier for developers and maintainers to understand what their code does, and how other developers use it.