Moore’s law has historically dominated all facets of the computer industry and has generally come to mean both increasing transistor counts and increasing clock rates. The net result has been significantly increased compute performance in every generation. While memory densities have increased with Moore’s law (and, indeed, memory was Gordon Moore’s target technology), there exists a significant disconnect between memory performance and compute performance. Over the past six decades, this disconnect has been termed the von Neumann bottleneck. Today, Moore’s law is moving back to its traditional definition of increasing transistor density and losing the more colloquial pillar of increasing clock rate. This shift can be seen in the transition to multicore architectures, which represents a new era in computation with three key trends:
logic transistors are essentially free and tend to manifest themselves as an increasing number of cores;
the von Neumann bottleneck continues to plague architectures and manifests itself as both a decrease in memory bandwidth per core and an increase in the total number of memory requests the memory system must handle;
to achieve the historic exponential increase in performance, increased concurrency (rather than the no-longer-available increase in clock rate) will be required.
All of these trends combine to create a significant problem for the programmer. In fact, this problem emerged as the most critical challenge facing HPC at Sandia’s 2006 Workshop on Programming Languages for High Performance Computing (HPCWPL). This follow-on workshop examines these issues by addressing the following questions:
What applications are facing critical memory challenges? What do those applications require of the memory system, and how are those demands being met today?
What are the trends in memory architecture, and can more intelligent memory systems impact application performance?
What minimal set of changes or enhancements to the programming model, programming languages, and compilers can be introduced to enable better memory system performance?