Best Practices for Software Registries and Repositories

Alejandra Gonzalez-Beltran, Alice Allen, Allen Lee, Daniel Garijo, Thomas Morrell, SciCodes Consortium
August 4, 2021

(This post is cross-posted on the SciCodes website, the SSI blog, the ASCL blog, and the FORCE11 blog.)

Software is a fundamental element of the scientific process, and cataloguing scientific software makes that software easier to discover. During 2019 and 2020, the Task Force on Best Practices for Software Registries of the FORCE11 Software Citation Implementation Working Group worked to create Nine Best Practices for Scientific Software Registries and Repositories. In this post, we explain why scientific software registries and repositories are important, why we wanted to create a list of best practices for them, the process we followed, what the best practices include, and what the next steps for this community are.

Why are scientific software registries and repositories important?

Scientific software registries and repositories support identifying and finding software, provide information for software citation, foster long-term preservation and reuse of computational methods, and ultimately, improve research reproducibility and replicability.

Why did we write these guidelines?

Managers of scientific software registries and repositories have been working independently to run their services and provide useful information and tools to users in different communities. The Best Practices for Software Registries Task Force participants had different perspectives representing a heterogeneous set of resources, but came together for the common goal of creating a list of best practices for scientific software registries. These shared practices help to raise awareness of software as a research output, enable credit for software creators, and guide curators working on software catalogues through the steps to consider when setting up their software registries. In the longer term, we hope to improve the interoperability of the software metadata supported by different services.

The goals that we considered for writing the guidelines were:

  • to keep the number of best practices minimal, so that repository managers can adopt them easily
  • to be broadly applicable to most or all of our resources
  • to be descriptive on a meta level, not prescriptive, and focused on what the best practices should do or provide, not on what a suggested policy or element should specifically say

What are the best practices?

Our guidelines, listed below, provide an overview of the key points to take into consideration when creating a software registry. They are:

  • Provide a public scope statement (examples)
  • Provide guidance for users
  • Provide guidance to software contributors (examples)
  • Establish an authorship policy
  • Share your metadata schema (examples)
  • Stipulate conditions of use (examples)
  • State a privacy policy (examples)
  • Provide a retention policy (examples)
  • Disclose your end-of-life policy (examples)

Our pre-print explains each guideline in more detail and gives a longer list of implementations that we found while working on these practices.
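For registry managers who want to track their own adoption, the nine practices listed above can be treated as a simple checklist. The sketch below is only an illustration, not part of the guidelines: the function name `missing_practices`, the dictionary layout, and the `registry.example.org` URLs are all assumptions made for this example.

```python
# The nine best practices, named as in the list above.
BEST_PRACTICES = [
    "Provide a public scope statement",
    "Provide guidance for users",
    "Provide guidance to software contributors",
    "Establish an authorship policy",
    "Share your metadata schema",
    "Stipulate conditions of use",
    "State a privacy policy",
    "Provide a retention policy",
    "Disclose your end-of-life policy",
]


def missing_practices(published):
    """Return the practices with no published policy URL, in list order.

    `published` maps a practice name to the URL where the registry
    documents it (hypothetical layout, assumed for this sketch).
    """
    return [p for p in BEST_PRACTICES if not published.get(p)]


# Hypothetical registry that has documented two of the nine practices.
example_registry = {
    "Provide a public scope statement": "https://registry.example.org/scope",
    "Share your metadata schema": "https://registry.example.org/schema",
}

for practice in missing_practices(example_registry):
    print("Still to document:", practice)
```

A checklist like this says nothing about what each policy should contain, which is consistent with the guidelines being descriptive and not prescriptive.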

What process did we follow to produce the guidelines?

Representatives from numerous software registries and repositories were involved in the FORCE11 Software Citation Implementation Working Group (SCIWG). Alice Allen proposed forming a task force within the SCIWG to write up best practices for registries and repositories. Once the SCIWG co-chairs had accepted the proposal and enough managers of registries and repositories had expressed interest, the Task Force on Best Practices for Software Registries was formed. Initially, we gathered information from members of this Task Force to learn more about each resource and to identify some of our overlapping interests. We then identified potential best practices based on issues we had experienced while running our services, and discussed what each potential practice might include or exclude.

Through iterative deliberations, we determined which of the potential practices were the most broadly applicable. With generous funding from the Alfred P. Sloan Foundation, we hosted a workshop for scientific registries and repositories, part of which was devoted to gathering final consensus around the Best Practices. The workshop included registries that were not part of the Task Force, resulting in a broader set of contributions to the final list.

What are the next steps for the group?

Our goal is to continue our efforts by implementing these practices more uniformly in our own registries and repositories and reducing the burdens of adoption. We have created SciCodes, a consortium of scientific software registries and repositories, which is now defining the next priorities to tackle, such as tracking the impact of good metadata, improving interoperability between registries, and making our metadata more discoverable by search engines and services such as Google Scholar, ORCID, and discipline indexes. We are also sharing tools and ideas in a series of presentations that are recorded and available for viewing on the SciCodes website, so please check them out!
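One common way to make software metadata easier for search engines to read is to publish it as schema.org JSON-LD. The sketch below is a minimal illustration under stated assumptions, not a description of what SciCodes members have implemented: the input record layout, the `to_jsonld` function, and the example values are hypothetical, while `SoftwareSourceCode`, `codeRepository`, `author`, and `license` are existing schema.org terms.

```python
import json


def to_jsonld(record):
    """Map a registry record to schema.org SoftwareSourceCode JSON-LD.

    The keys of `record` (name, description, repository_url, authors,
    license_url) are a hypothetical layout assumed for this sketch.
    """
    doc = {
        "@context": "https://schema.org",
        "@type": "SoftwareSourceCode",
        "name": record["name"],
        "description": record["description"],
        "codeRepository": record["repository_url"],
        "author": [{"@type": "Person", "name": n} for n in record["authors"]],
    }
    # Only emit a license when the registry actually holds one.
    if record.get("license_url"):
        doc["license"] = record["license_url"]
    return json.dumps(doc, indent=2)


# Hypothetical registry entry, for illustration only.
example_record = {
    "name": "example-solver",
    "description": "A placeholder entry used to illustrate the mapping.",
    "repository_url": "https://code.example.org/example-solver",
    "authors": ["A. Researcher", "B. Developer"],
}

print(to_jsonld(example_record))
```

A registry could embed the resulting JSON in each entry's landing page inside a `<script type="application/ld+json">` element so that crawlers can pick it up.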