Teaching Computers How to Write Fast Software
The computer revolution has taken us to the point where we use machines for nearly every possible thing. We use them in schools, work environments and our homes; even our cell phones, cameras and cars have computers. Scientists use very high performance computers to collect, analyze and manage large amounts of information. To perform these complex tasks, computers run software or programs written by human experts--it's the part most of us don't see.
National Science Foundation (NSF) supported researchers at Carnegie Mellon University (CMU) have found a way to 'teach' computers to write high performance libraries for certain types of numerical functionality. Libraries are modular segments of code that perform a performance-critical task or function when called by a program. In developing software, programmers call libraries instead of having to rewrite the same code over and over. If a library runs fast, then so will the application that uses it.
The research team, led by Markus Püschel, an associate research professor of electrical and computer engineering at CMU, is the first to demonstrate that all the necessary tasks in writing and optimizing these types of libraries for highest performance can be fully automated.
"It has become extraordinarily difficult to write libraries that run as fast as possible," Püschel says. "Particularly affected are numerical applications, those that require extensive mathematics such as any form of audio-image-video processing, communication, medical imaging, scientific modeling and many others."
As an example, Püschel said that one straight-forward implementation of a mathematical routine may run 10 to 100 times slower than an optimal one, even though both may perform the same number of mathematical operations. "Needless to say, this is not an acceptable loss," he adds.
So what is so difficult about writing fast libraries for numerical functionality? These libraries are written and optimized by hand, a time-consuming task that has to be repeated whenever a new computing platform is released. To make things worse, the complexity of modern computing platforms "puts performance optimization into the realm of 'guru programmers' who do things like restructure the computation to avoid cache misses, use special vector instructions, and parallelize for multiple processor cores," Püschel says.
Automating these and other tasks in library development posed the main challenge that his team took up as part of the SPIRAL project for a domain of functions called linear transforms. The team succeeded due to a number of major ideas and techniques developed by team member Franz Franchetti, an assistant research professor, and recent Ph.D. graduate Yevgen Voronenko who also designed and implemented the library generation system as part of his thesis work.
The result: SPIRAL is the first to demonstrate that the development of highest performance libraries can be fully automated. Given only a short mathematical description of a linear transform, SPIRAL can automatically design and implement a library and perform all the necessary optimizations to achieve very high performance on a state-of-the-art computer. These optimizations include the traditionally difficult problems of parallelization, vector processing and memory hierarchy optimization. Further, benchmarking shows that these computer-generated libraries are as fast as, and sometimes even faster, than their human written counterparts.
Integrated Electronics Corporation (Intel)--one of the leading companies for microprocessors manufacturing--has started to use SPIRAL to generate parts of its commercial libraries. "This may mark the first time that commercial library development is done by a computer," Püschel notes.
SPIRAL combines innovative techniques from different disciplines including tensor algebra, domain-specific languages, symbolic computation and compilers. "The key idea behind SPIRAL is a novel framework that combines several techniques including the use of mathematical, domain-specific languages and rewriting to raise the level of abstraction and 'teach' SPIRAL algorithm knowledge and algorithm optimization," Püschel notes.
"When SPIRAL writes software, different available algorithms are considered and evaluated and undergo sophisticated optimizations so the final code is not only correct, but also runs as fast as possible. And it does, sometimes beating all competitors," Püschel continues. "SPIRAL can, to date, only implement a limited set of functionality, but one has to start somewhere and we believe that SPIRAL's approach can be carried much further. Approaches like SPIRAL may be crucial in dealing with future computing platforms," he adds.
The latest version of the SPIRAL library generation system builds on an earlier prototype jointly developed by the teams of professors Jose Moura, Püschel and Manuela Veloso of CMU, professor David Padua of the University of Illinois at Urbana-Champaign, and professor Jeremy Johnson of Drexel University.