© 2020 Strange Loop
Multi-language tool support for syntax transformation is hard due to heterogeneous representations in syntax and abstract syntax trees (ASTs). Regex-based search-and-replace falls short of recognizing syntax that fundamentally delineates tree data structures. Recent approaches develop new strategies that overcome the limitations of regex matching but remain underdeveloped for easily changing code. Our work goes one step further, focusing on the problem of enabling lightweight program transformation in every language for every programmer. We show that the problem can be decomposed into two parts: (1) a common grammar expresses the central context-free language properties shared by many contemporary languages (e.g., balanced parentheses) and (2) open extension points in the grammar customize syntax handling (e.g., for language-specific comments) with smaller parsers. We introduce Parser Parser Combinators (PPCs), our key mechanism implementing these ideas. PPCs are parser combinators that produce parsers from user-supplied patterns. Generated parsers run directly on program source to match syntax of interest (we don't define or use any AST), thereby lifting syntax rewriting to a modularly-defined parsing problem. We share large-scale results from rewriting code across 12 languages (Go, Rust, Scala, and Elm to name but a few) for the 100 most popular GitHub repositories per language. We show over 50 syntactic changes merged into 40+ of these projects using our tool, and give a demo.
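To make the idea of producing a parser from a user-supplied pattern concrete, here is a minimal Python sketch. It is an illustration only and not the implementation presented in the talk: the `:[name]` hole notation, the function names, and the fixed set of delimiters are all assumptions made for this example. A pattern is compiled into a list of literal and hole parts, and each hole matches only text whose parentheses, brackets, and braces are balanced.

```python
import re

# Assumed delimiter pairs; a real tool would make these language-specific.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())
# Hypothetical hole notation for this sketch: :[name]
HOLE = re.compile(r":\[(\w+)\]")


def compile_pattern(template):
    """Split a template into parts: ('lit', text) or ('hole', name)."""
    parts, pos = [], 0
    for m in HOLE.finditer(template):
        if m.start() > pos:
            parts.append(("lit", template[pos:m.start()]))
        parts.append(("hole", m.group(1)))
        pos = m.end()
    if pos < len(template):
        parts.append(("lit", template[pos:]))
    return parts


def balanced_ends(src, start):
    """Yield every end index where src[start:end] has balanced delimiters."""
    stack, i = [], start
    while True:
        if not stack:
            yield i
        if i >= len(src):
            return
        ch = src[i]
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch in CLOSERS:
            if not stack or stack[-1] != ch:
                return  # a hole may not escape its enclosing delimiters
            stack.pop()
        i += 1


def match_at(parts, src, pos, env):
    """Match parts against src at pos; return (end, bindings) or None."""
    if not parts:
        return pos, env
    kind, value = parts[0]
    if kind == "lit":
        if src.startswith(value, pos):
            return match_at(parts[1:], src, pos + len(value), env)
        return None
    # Hole: try the shortest balanced candidate first, backtrack if needed.
    for end in balanced_ends(src, pos):
        result = match_at(parts[1:], src, end, {**env, value: src[pos:end]})
        if result is not None:
            return result
    return None


def rewrite(src, match_template, rewrite_template):
    """Replace every match of match_template in src using rewrite_template."""
    parts = compile_pattern(match_template)
    out, i = [], 0
    while i < len(src):
        result = match_at(parts, src, i, {})
        if result is not None and result[0] > i:
            end, env = result
            out.append(HOLE.sub(lambda m: env[m.group(1)], rewrite_template))
            i = end
        else:
            out.append(src[i])
            i += 1
    return "".join(out)


source = "x = foo(bar(1, 2), baz(3))"
print(rewrite(source, "foo(:[a], :[b])", "foo(:[b], :[a])"))
# prints: x = foo(baz(3), bar(1, 2))
```

For comparison, the regex `foo\((.*?), (.*?)\)` applied to the same input binds its first group to `bar(1`, because it has no notion of nesting. The sketch handles only balanced delimiters; treating comments or string literals specially would require the kind of language-specific extension points the abstract describes.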
Rijnard is a PhD candidate at Carnegie Mellon University and a part-time software engineer at Sourcegraph. His research interests lie at the intersection of Automated Program Repair, Program Transformation, and Program Analysis, with an emphasis on bringing new advances in these areas to practice. Rijnard is South African by birth and holds a Master's and a Bachelor's degree from Stellenbosch University.