Mary Shelley (Director of IT, Univ. of Maryland School of Public Health; formerly Associate Director of Synthesis at SESYNC)
September 24, 2018
What do sociologists, ecologists, economists, engineers, anthropologists, geographers, hydrologists, evolutionary biologists, and environmental scientists all have in common? Software! Science at the intersection of humans and the environment increasingly requires collaborative, interdisciplinary work among researchers with varied computing backgrounds to gather insights from highly diverse data at multiple scales. Reliable software is essential to achieving this synthesis. At the National Socio-Environmental Synthesis Center (SESYNC), we provide cyberinfrastructure support oriented toward helping researchers choose, apply, and develop software to meet the research needs of the 40+ interdisciplinary projects we support at any given time. We approach this support from multiple angles.
SESYNC offers a cloud computing environment so that research teams can store data in a central location, develop and run software on our meso-scale cluster (bigger than a desktop but smaller than a supercomputer), and access these resources through a variety of gateways we provide: RStudio Server (the most popular), SSH, JupyterHub, and virtual machines. For many researchers who come to the center, this is their first experience with non-local computing. We have found that a thorough explanation and demonstration of the computing environment, and of how its components fit together, can greatly accelerate users’ ability to write and execute the code they need for their research.
Besides providing statistical and methodological support, our data science team helps users at all levels discover and develop software and design software workflows that meet the needs of their projects. This support can involve troubleshooting code, getting teams up and running with Git, recommending packages to meet research needs, developing custom tools, optimizing code, parallelizing embarrassingly parallel computations, and helping package code for dissemination and reuse. While we offer a variety of tutorials and how-to guides on our cyberhelp webpage, face-to-face interactions and in-depth discussions have proven critical to providing the best support, because the software support researchers actually need often differs from what they think they need.
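For readers unfamiliar with the term, an "embarrassingly parallel" problem is one whose tasks are fully independent of each other, so they can be handed to separate workers with no communication between them. The following is a minimal Python sketch of that idea using only the standard library. It is an illustration, not SESYNC's own code: the function names `simulate_site` and `run_all`, the seed values, and the worker count are all hypothetical.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Iterable, List
import random


def simulate_site(seed: int, n_draws: int = 10_000) -> float:
    """Hypothetical stand-in for one independent unit of work
    (for example, one site, one parameter set, or one bootstrap replicate).
    It returns the mean of n_draws pseudo-random values for the given seed."""
    rng = random.Random(seed)  # private generator, so tasks share no state
    return sum(rng.random() for _ in range(n_draws)) / n_draws


def run_all(seeds: Iterable[int], executor: Executor) -> List[float]:
    """Run simulate_site once per seed on the given executor.
    Executor.map returns results in the same order as the inputs."""
    return list(executor.map(simulate_site, seeds))


if __name__ == "__main__":
    # Assumption: four worker processes; adjust to the cores available.
    with ProcessPoolExecutor(max_workers=4) as pool:
        means = run_all(range(8), pool)
    print(means)
```

Because each task depends only on its own seed, the results are the same whether the tasks run one after another or side by side, which is what makes this class of problem straightforward to scale from a laptop to a cluster.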
Throughout the year, we design and offer a variety of hands-on short courses informed by The Carpentries model of training. Though these courses provide instruction in specific tools, they also help researchers adopt the mindset of software developers, particularly regarding code management, reusability, and sustainability. At our annual Summer Institute, we require participants to attend in teams and to bring their own data so they can apply a subset of the tools and practices we cover to their specific context with the help of our data scientists.
We aim to continuously improve the software and computing support we provide to the thousands of researchers who make up our user community. As we have observed gaps over the years, we have sought or developed solutions to close them. Examples include the provision of RStudio and Shiny servers, the development of the rslurm package for bridging R sessions with a high-performance compute cluster, and a short course on open source geospatial tools. As we look to next steps in software support, we aim to encourage wider adoption of version control, promote the dissemination and cataloging of project code, and facilitate the adoption of metadata standards and versioning for synthetic datasets.
For more information about SESYNC and our cyberinfrastructure, please see in-line links and the following resources: