Lightweight and Locality-Aware Composition of Black Box Functions
Functions are the fundamental unit of interoperability in software design: users encapsulate common functionality in libraries and write applications by composing functions from various libraries. Unfortunately, performance is lost at function boundaries due to poor cache utilization and increased memory footprint. Operator fusion is a well-known technique to combat this overhead. Previous solutions fuse operators either by manually rewriting code into fused functions or by relying on heavyweight compilers to perform fusion. The first approach is not scalable; the second requires a semantic understanding of the entire computation during code generation, and the compiler must then perform the full stack of optimizations (vectorization, blocking, pipelining, etc.) after fusion.
In this work, we attempt to identify the minimal ingredients required to fuse computations. We find that, in contrast to previous approaches, which require a semantic understanding of the computation, most opportunities for fusion can be exploited by understanding only the data production and consumption patterns. Exploiting this insight, we add fusion on top of black-box library interfaces by proposing a lightweight enrichment of function headers to expose these data patterns. We implement our approach in a system called Fern, and demonstrate its benefits by showing that Fern is competitive with state-of-the-art, high-performance libraries with manually fused operators, can fuse across library and domain boundaries for unforeseen workloads, and can deliver speedups of up to $5\times$ over unfused code.
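To make the central idea concrete, the following is a minimal sketch, in Python, of fusing two black-box functions using only a description of which input range each one needs to produce a given output range. It is an illustration under stated assumptions and not Fern's actual interface: the `Annotated` class, the `needs` field, and the functions `scale`, `blur3`, `run_unfused`, and `run_fused` are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # half-open index range [lo, hi)


@dataclass
class Annotated:
    """A black-box function plus a hypothetical data-dependence annotation."""
    fn: Callable[[List[float]], List[float]]
    # Maps a requested output range to the input range needed to produce it.
    needs: Callable[[Range], Range]


def scale(xs: List[float]) -> List[float]:
    # Elementwise: output[i] depends only on input[i].
    return [2.0 * x for x in xs]


def blur3(xs: List[float]) -> List[float]:
    # 3-point sum: output[i] depends on input[i], input[i+1], input[i+2].
    return [xs[i] + xs[i + 1] + xs[i + 2] for i in range(len(xs) - 2)]


scale_a = Annotated(scale, lambda r: r)
blur_a = Annotated(blur3, lambda r: (r[0], r[1] + 2))


def run_unfused(stages: List[Annotated], data: List[float]) -> List[float]:
    # Each stage materializes its full intermediate before the next one runs.
    for stage in stages:
        data = stage.fn(data)
    return data


def run_fused(stages: List[Annotated], data: List[float],
              out_len: int, tile: int) -> List[float]:
    out: List[float] = []
    for lo in range(0, out_len, tile):
        needed: Range = (lo, min(lo + tile, out_len))
        # Walk the pipeline backwards to find the slice of the original
        # input required for this output tile. No function body is inspected.
        for stage in reversed(stages):
            needed = stage.needs(needed)
        chunk = data[needed[0]:needed[1]]
        # Run every stage on the small chunk, so intermediates stay tile-sized.
        for stage in stages:
            chunk = stage.fn(chunk)
        out.extend(chunk)
    return out


data = [float(i) for i in range(10)]
pipeline = [scale_a, blur_a]
unfused = run_unfused(pipeline, data)
fused = run_fused(pipeline, data, out_len=8, tile=4)
```

In this sketch the fused driver never looks inside `scale` or `blur3`; it relies only on the declared ranges, and each intermediate is at most a tile plus a two-element halo rather than a full-length array. A real system would additionally have to handle multi-dimensional data, multiple inputs and outputs, and reuse of overlapping halo regions, none of which is modeled here.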