The molecular sciences — including chemistry, materials, biophysics and biochemistry — have a long history of developing software to answer core scientific questions. The field also has a long history of challenges to software sustainability. This blog post discusses some of the software sustainability challenges and the opportunities/possible solutions that the Molecular Sciences Software Institute (MolSSI) is working toward with the molecular sciences software development community. The MolSSI is an NSF-funded project that is a nexus for science, education, and cooperation for the global computational molecular sciences community.
With All Hallows Eve upon us once more, as the souls of the dead come to haunt us, it’s time to recount terrifying tales and scary stories… about software. You might think that research software is safe from such gruesome goings-on but you would be wrong, for there are many undead projects out to devour us. Here’s how to recognise some of this spooky software, along with other pestilent projects, and dispatch them back whence they came.
When asked how they plan to sustain their software, many (naïve) software developers will say that they plan to make it open source. And that’s often their whole plan. There is an assumption that the mere act of exposing the software to the public will create a community who are able and willing to contribute to the support, maintenance, and perhaps the enhancement of the software product. Those who have more experience observing how open source software works will realize that it is very rare for a project to reap significant benefits from the broader community.
Development and use of software are fundamental to numerous areas of scientific research. Many scientists write, modify, and use software to gain insight and prove scientific results. At the same time, formal software engineering techniques and knowledge that are widely adopted in other software development domains are not as commonly used in research software projects. In my experience, research software development approaches are more informal, particularly in the upstream activities of requirements, analysis, and design.
This blog post suggests an expression that can be used to loosely quantify software sustainability, and then proposes that projects that seek sustainability use this formula when making decisions. It’s heavily based on a white paper for the 2019 Collegeville Workshop on Sustainable Scientific Software (CW3S19), which in turn is based on a previous blog post, and it is crossposted on the BSSw and URSSI blogs, as well as my own blog.
Why do open source research software projects appear to have a low rate of success? Is it because we lack appropriate models for sustaining research software development or is it because the community isn’t seeing the results? In “traditional” open source software projects, development is often sustained by creating a community of contributors from different organisations that collectively provide effort towards the ongoing maintenance and feature development of the software.
Research Software Engineers are playing an increasingly critical role in research software development (as described in a previous blog post). A community of RSEs began to form in the UK in 2012, and by January 2019 the European Commission had published a report, Recognising the Importance of Software in Research - Research Software Engineers (RSEs), a UK Example, emphasizing that RSEs are crucial to sustaining research software and research computing. In the US, the people in these roles have begun to build a more formal community with the US Research Software Engineer Association (US-RSE).
(reposted from Better Scientific Software) I’ve been participating in open source software projects since around 1994; and when asked what I’ve learned, I always say, “It’s all about people.” So while I could be writing about all the technical things that are going on in my scientific software projects, let me instead write about people. Emacs’s CC mode My first involvement with open source was when I was a freshman at the University of Stuttgart, in Germany, in 1994.
Do you develop software for your research? Do you have some basic skills but desire more? If so, you might be interested in the URSSI Winter School in Research Software Engineering. As part of the URSSI planning process, we are organizing a pilot 2.5-day workshop on research software engineering skills. This is aimed at early-career researchers, including graduate students and postdocs, who are familiar with the basics such as the Unix shell, version control with Git, and Python programming, and would like to learn more about best practices for developing research software.
(reposted from Chan Zuckerberg Initiative Science Medium) Open source software is a key ingredient of modern science. Hundreds of software packages, libraries, and applications have become essential tools. Whether it’s searching a genome sequence for a disease gene, counting cells in a microscope image, or tracking the evolution of an Ebola outbreak, software is critical to the work scientists do every day — and much of it is built by researchers who volunteer their time and effort to make their tools available and usable by others.
URSSI Community Survey - Initial Results To better understand research software user and developer communities, we conducted a survey of research software users and developers. The focus of the survey was to gather information to help identify how to increase the sustainability of research software. To gather a broad range of perspectives, we distributed the survey to 25,000 NSF and 25,000 NIH PIs whose projects involve research software, as well as mailing lists of interested people such as the WSSSPE email list.
Why Research Software Engineers? At Princeton University, our Research Software Engineering group is nearing its third birthday but many people still ask basic questions about us: What is a Research Software Engineer? What do you do? How do you work with researchers? In an attempt to answer some of these questions I like to begin with an analogy. Most adults know how to cook something. Maybe it’s ramen, tacos, or lasagna.
Summary: One goal of the conceptualization phase of URSSI is to gather as much input from the community as possible about the different facets and pain points of sustainability of research software: from career paths of software developers in academia to citations of software to gaps in existing training and education programs for software engineering. The awareness of the importance of this topic is evident in diverse initiatives and projects around research software sustainability such as WSSSPE (Working towards Sustainable Software for Science: Practice and Experiences) and BSSw (Better Scientific Software), and funding programs like the past NSF SI2, which also funds the conceptualization of URSSI, and its successor, CSSI.
Summary: At the first URSSI community workshop, a small group of participants started to discuss a model for incubating research software projects. Incubation in this context might include a structured program that helps developers plan a new community-based software project, or improve existing projects that need mentorship, strategy, or other resources in order to sustainably grow. To further explore this topic we brought together 16 experienced funders, community managers, developers, researchers, and software users.
Summary: One of the biggest obstacles to making research software sustainable is ensuring appropriate credit and recognition for researchers who develop and maintain such software. We convened 16 experts over two days to identify core issues around software credit and propose concrete steps that a software institute might take to solve them. We identified six core issues directly related to credit (career paths, individual impact, disincentives in the academic credit model, quality versus impact, recognition of software value, lack of funding) and two broader challenges (lack of funding for maintenance and lack of awareness of best practices).
Thanks for comments from: Dan Katz, Suresh Marru, Abby Cabunoc Mayes, Micaela S. Parker, Danielle Robinson, and Nic Weber. Research software is software created by scholarly researchers. Often researchers produce source code that is ‘open’, in alignment with scientific norms. But this code and the research it supports is not always “sustainable”, because it does not have a viable community of developers and users working with it in an ongoing way.
We are witnessing the early stages of a revolution in the computational molecular sciences. Numerous community codes in quantum chemistry, biomolecular simulation, and computational materials science are beginning to adopt modern, collaborative software engineering practices and tools, to the benefit of the broader field. Over their long history, the computational molecular sciences have emerged as an essential partner with experiment in elucidating the structures and mechanisms that control chemical processes, and, in fact, often precede experiment in the knowledge-based design of new systems.
(reposted from Titus’s blog) We are slowly working towards a v2.0 release of sourmash, our software for MinHash and modulo hash analysis of genomic data, and the question of proper authorship is once again on my mind! The question du jour: how should authorship on software papers be decided? Some background - our previous take on authorship Those of you with long memories may recall a hullabaloo in 2015 over this occasioned by the khmer v2.
Containers, such as Singularity and Docker, are an amazing advance in software sustainability. By allowing software developers to package not only application software but also the other components of the software stack that the application needs and has been well tested with, including software dependencies, containers make the porting of applications to new platforms much more straightforward, convenient and efficient. In the large-scale research computing world, containers are a miracle in the near-term, but a looming challenge in the medium- to long-term.
This is a time of great growth at the intersection of software engineering and research software. There is a need to bring together members of these communities to identify common goals and lay out a research agenda to move both communities in a positive direction. To address this, the SE4Science’19 workshop will be held May 28, 2019 in conjunction with the International Conference on Software Engineering (ICSE) in Montreal, Canada. The goal of this workshop is to provide a unique venue for interaction between software engineers and scientists.
(reposted from SSI blog) By Mateusz Kuzak, Maria Cruz, Carsten Thiel, Shoaib Sufi, and Nasir Eisty. This post is part of the WSSSPE6.1 speed blog posts series. Photo courtesy of Lee Cannon. We argue that research software should be treated as a first-class research output, on an equal footing with research data. Research software and research data are both fundamental to contemporary research. However, the recognition of the importance of research software as a valuable research output in its own right is lagging behind that of research data.
Credit and recognition for research software: Current state of practice and outlook
Stephan Druskat, Daniel S. Katz, David Klein, Mark Santcroos, Tobias Schlauch, Liz Sexton-Kennedy, and Anthony Truskinger • December 3, 2018
(reposted from SSI blog) By Stephan Druskat, Daniel S. Katz, David Klein, Mark Santcroos, Tobias Schlauch, Liz Sexton-Kennedy, and Anthony Truskinger. This post is part of the WSSSPE6.1 speed blog posts series. The cruise ship "Columbus" leaving the harbor at Amsterdam during WSSSPE 6.1. Photo by Mark Santcroos. Like the behemoth cruise ship leaving the harbor of Amsterdam that overshadowed our discussion table at WSSSPE 6.1, credit for software is a slowly moving target, and it’s a non-trivial task to ensure that the right people get due credit.
(reposted from GSI blog) Numerous fields are increasingly dependent on geospatial software that is defined to transform geospatial data (i.e. data with geo and/or spatial references) into geospatial information, knowledge, and intelligence. The growing benefits and importance of geospatial software to science and engineering are driven by tremendous needs in fields such as agriculture, ecology, emergency management, environmental engineering and sciences, geography and spatial sciences, geosciences, national security, public health, and social sciences, to name just a few, and are reflected by a massive digital geospatial industry.
A major part of our year-long effort to plan a US research software institute is to understand the diverse challenges and barriers that researchers face when using or developing research software. To better understand these challenges, we are currently in the midst of running a large-scale survey aimed at researchers who develop or use software in academia, government, and other research-focused institutes. If you’re involved in any aspect of research software or know colleagues who are, please take and share the survey:
(reposted from Daniel S. Katz’s blog) This blog post is intended as companion text for a talk I gave at the September 2018 NumFOCUS Project Forum in New York, though I also hope it stands on its own. To address software sustainability, it is important first to understand how the term sustainability is used more generally. It’s most often used in the context of ecology, often specifically in the relationship between humans and the planet.
CiteAs.org links pieces of software to their requested citations. It enables moving from the name of a piece of software, its webpage URL, or a DOI, directly to the machine-readable metadata (e.g., BibTeX, Zotero auto-import) for the citation the author of the software package wants you to use. CiteAs.org is funded by the Digital Science program at the Sloan Foundation (Grant Number 8028), and conceived and developed by Heather Piwowar and Jason Priem at ImpactStory, together with James Howison from the Information School at the University of Texas at Austin.
What do sociologists, ecologists, economists, engineers, anthropologists, geographers, hydrologists, evolutionary biologists, and environmental scientists all have in common? Software! Science at the intersection of humans and the environment increasingly requires collaborative, interdisciplinary work among researchers with varied computing backgrounds to gather insights from highly diverse data at multiple scales. Reliable software is necessary to achieving this synthesis. At the National Socio-Environmental Synthesis Center (SESYNC), we provide cyberinfrastructure support oriented toward helping researchers choose, apply, and develop software to meet the research needs of the 40+ interdisciplinary projects we support at any given time.
When I first started thinking about how we could create a career path for Research Software Engineers (RSEs) in academia, I assumed it would involve persuading university managers to implement a new career path. Quite frankly, I wasn’t looking forward to the interminable bureaucracy that such a change would require me to navigate. Fortunately, a completely different solution quickly gained traction in the UK: the rise of RSE Groups. An RSE Group is a centralized group, based at a university or other research organization, that employs a number of RSEs and then hires them out to researchers across the organization.
It is a truth universally acknowledged that a biologist in possession of data must be in want of a computer to analyze it on. Or, perhaps not. In 2016 as part of our efforts to better understand the needs of users and potential users of CyVerse (NSF-funded cyberinfrastructure for life sciences), we conducted a survey of NSF-funded investigators to determine what was important for them when it comes to analyzing large datasets.
Software has been both ubiquitous and largely neglected in computational science and engineering (CSE) since before the field became a recognized entity. The interest in CSE software for both practitioners and sponsors has primarily been on the scientific insights and advances it enables rather than on its value as a long-lived tool or product. As a result, the culture of CSE, broadly speaking, has a structure and reward system that focuses on the algorithms and the results, but where good quality research software, as well as the time and effort required to produce it, often tend to be marginalized.
Among the many efforts that are underway as part of NSF’s SI2 program, one of the most cross-cutting efforts is the planning for a US Research Software Sustainability Institute (URSSI), which was funded in December 2017. This effort aims to plan an institute that would address challenges around making research software sustainable and robust, and more importantly, improve the sustainability of the researchers who develop such software. Some of our initial discussions, described in detail below, have surfaced problems encountered by specific researchers working on specific software applications, but the solutions conceived of and planned for by URSSI are not aimed at any one domain or discipline of research.
Note: This is reposted from the CANARIE Blog. My previous blog posts have focused on the research software landscape in Canada, but the challenges and opportunities we face are not different from those in other parts of the world. In this post, I provide a brief overview of three international organizations that CANARIE works with as part of our Research Software program. These organizations are very different in their structure and approach to excellence in research software, but as you’ll see, they are all trying to solve common problems.
In 2016, the UK Software Sustainability Institute (SSI) ran a first survey of Research Software Engineers (RSEs): the people who write code in academia. This produced the first insight into the demographics, job satisfaction, and practices of RSEs. To support and broaden this work, the Institute planned to run the survey every year in the UK and an ever-expanding number of countries so that insight and comparison can be made across the globe.