Henry Neeman, University of Oklahoma
December 21, 2018
Containers, such as Singularity and Docker, are an
amazing advance in software sustainability. By allowing
software developers to package not only application
software but also other components of the software
stack, including software dependencies, that the
application needs, and with which the application is
well tested, containers make the porting of
applications to new platforms much more straightforward,
convenient and efficient.
In the large-scale research computing world, containers
are a miracle in the near term, but a looming challenge
in the medium to long term.
A little background:
Science, Technology, Engineering and Mathematics (STEM)
research software is mission-critical to virtually
every STEM discipline (and many non-STEM disciplines), with
thousands of software applications available, mostly as
free, open source downloads, alongside a more modest
number of commercial applications.
At my institution, for example, we support about 400
applications and libraries, while national supercomputing
centers support about ten times as many.
Before containers, if you wanted to use an application
package on a particular platform—the combination of
hardware and software stack, including the operating
system and one or more compilers—you had to
explicitly port that application to that platform.
In many cases, such porting is very labor-intensive
because of quirks in every part of the software
stack, from the OS to the application source code. In
some cases, porting can require substantial expertise
about building software in general, or even about the
innards of the specific application you’re porting.
Containers are very helpful with that – so why are they a challenge?
The main issue is that containers give software
developers a straightforward way to put off until
tomorrow an onerous task that they have no incentive to
do today.
And that means that, for each containerized STEM
research software application, it’ll become harder and
harder to take advantage of new hardware features, like
longer vector lengths, new instruction sets, and so on.
In the research software sustainability world, a key
issue is that the organizations that fund research
software application development—primarily national
research funding agencies—are generally far less
concerned with software issues and far more concerned
with advancing STEM research.
As such, far more funding is available to add
new science features to software applications, so that
they can solve more advanced STEM research problems,
than to improve the software itself, or even to port
the software to new platforms.
By containerizing an application, a research software
engineer can reduce the labor cost of porting to a new
platform, including to a new version of a common
operating system or a long-used compiler.
It’s clear why this would be attractive to a research
software engineer, as it helps with prioritizing the
things that their funding is intended to pay for, and
reduces the cost of the things that their funding isn’t
intended to pay for.
Of course, software stacks constantly drift, with both
new features and new bugs continually arising. New
versions of applications typically are only tested on
more recent software stacks, because the call for using
those applications on obsolete software stacks is very
limited.
Because of this, old versions of a software application
become harder and harder to port to the latest platforms,
while newer versions of an application are easier to
port to new platforms (but would similarly be very
difficult to port back to obsolete platforms, if there
were a need for doing that).
Once an application has been containerized, then each
time the research software engineer is faced with the
choice between adding new science features or spending
time porting to a new platform, it’s possible to defer
the latter in favor of the former.
After a year of doing that, it’s still
straightforward to port to a new platform – but the
software is working fine, so why go to all that trouble?
After a second year, the need may be more pressing –
for example, new hardware features may be available –
but the labor cost of porting has gone up, so the
research software engineer may choose to put off the
porting.
Each successive year, the need will be greater, but the
labor cost will be higher, so the disincentive to port
will be stronger.
Bear in mind that old versions of the compiler aren’t
updated to use new instructions – only the few most
recent versions of the compiler support that.
But the more porting is deferred to later, the harder
and more labor intensive that porting will be.
After several years of this, application performance
improvement will be limited to whatever can be derived
naturally via improvements in memory bandwidth,
interconnect speed, and disk speed.
In practice, these improve slowly compared to CPU
speed (see below).
By contrast, if the applications can take advantage of
CPU performance improvements, especially via new
instructions (e.g., AVX, half-precision floating point),
then they can also get substantial speedup from newer
CPUs.
But those new instructions are available in new versions
of the relevant compiler families, not in the older
versions that an older container was built around.
And, as time goes on, those new versions of the compilers
won’t have been tested against older platforms.
So the software drift gets worse and worse over time,
while the disincentive to port to new platforms gets
stronger and stronger.
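To make the instruction-set point concrete, here is a minimal sketch of how one might check which vector extensions a host CPU advertises. It assumes a Linux x86 system, where /proc/cpuinfo carries a "flags" line; the function name, the sample text, and the default list of flags are illustrative choices, not something taken from any particular tool. A binary built inside an older container may use none of the newer extensions even when the host reports them.

```python
def vector_extensions(cpuinfo_text, wanted=("sse4_2", "avx", "avx2", "avx512f")):
    """Return the entries of `wanted` found on the first "flags" line.

    Assumes /proc/cpuinfo-style text as found on Linux x86 systems;
    other architectures label this line differently.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in wanted if flag in present]
    return []  # no "flags" line found


# Hypothetical sample text, shaped like a fragment of /proc/cpuinfo.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse sse2 sse4_2 avx avx2\n"
print(vector_extensions(SAMPLE))  # ['sse4_2', 'avx', 'avx2']
```

On a real Linux host, the same function could be given the contents of /proc/cpuinfo instead of the sample string.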
Some examples of hardware improvement rates:
- Individual CPU speed improved from 53 GFLOPs per
  socket on Intel Nehalem W5590 in 2009 to 1523 GFLOPs
  per socket on Intel Skylake 8180 in 2017, a
  doubling period of roughly 2 years.
- Memory bandwidth went from, for example, ~37 GB/sec
  on dual Intel Nehalem W5580 in 2009 [6,7] to
  ~290 GB/sec in late 2017 on dual AMD EPYC,
  a doubling period of roughly 3 years.
- Disk drive speed improves primarily with the number
  of drives being used or the kind of drives (SSD vs
  spinning hard disk), i.e., with dollars spent.
- Interconnect bandwidth improves slowly, and latency
  even more slowly, for example:
  - Mellanox QDR InfiniBand at 40 Gbps and ~1 microsecond
    MPI point-to-point latency was released in 2008,
  - Mellanox FDR at 56 Gbps and ~1 microsecond latency
    was released in late 2011,
  - Mellanox EDR at 100 Gbps and ~1 microsecond latency
    in early 2015,
  - Intel Omni-Path at 100 Gbps and ~1 microsecond
    latency in 2016,
  - Mellanox HDR at 200 Gbps and ~0.7 microseconds latency
    in 2017.
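The doubling periods quoted above can be checked from the endpoint figures alone. The sketch below assumes steady exponential growth between the two dates, which is a simplification, and the function name is just an illustrative choice. It gives somewhat under 2 years for CPU peak speed and somewhat under 3 years for memory bandwidth, consistent with the rounded values in the list.

```python
import math


def doubling_period_years(start_value, end_value, elapsed_years):
    """Years per doubling, assuming steady exponential growth."""
    doublings = math.log2(end_value / start_value)
    return elapsed_years / doublings


# Endpoint figures taken from the list above (2009 to 2017).
cpu = doubling_period_years(53.0, 1523.0, 2017 - 2009)   # GFLOPs per socket
mem = doubling_period_years(37.0, 290.0, 2017 - 2009)    # GB/sec, dual socket

print(f"CPU peak speed doubles about every {cpu:.1f} years")
print(f"Memory bandwidth doubles about every {mem:.1f} years")
```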
[1] J. Vienne, J. Chen, M. Wasi-ur-Rahman, N. S. Islam,
H. Subramoni and D. K. Panda, 2012: “Performance Analysis
and Evaluation of InfiniBand FDR and 40GigE RoCE on HPC
and Cloud Computing Systems.” Proc. IEEE Hot
Interconnects (HOTI-20). DOI: 10.1109/HOTI.2012.19.
[2] InfiniBand Trade Association, InfiniBand Roadmap.
[3] D. K. Panda, 2018: “Designing Scalable HPC, Deep
Learning, and Cloud Middleware for Exascale Systems.”
HPC Advisory Council Swiss Conference 2018, slide #15.
[4] Wikipedia Omni-Path webpage.
[5] Xenon High Performance Computing, 2017: “XENON
will integrate Mellanox’s new 200Gb/s HDR InfiniBand
interconnect infrastructure as part of their HPC
[6] STREAM numbers on nehalem.
[7] Intel® Xeon® Processor W5580.
[8] AMD EPYC SoC Delivers Exceptional Results on the
STREAM Benchmark on 2P Servers.
[9] Intel® Xeon® Processor W5590.
[10] Intel Xeon Platinum 8180 Processor.