Data Publishing & Reproducibility
Data and Software Publishing
You may want to consider making your data and code available through a data journal or software journal. Data journals describe datasets in detail and link to them via DOI; the data themselves are usually deposited in an open data repository, and the articles may be peer-reviewed.
Data Journals - examples
- Geoscience Data Journal
- Earth System Science Data
- GigaScience: big data from life & biomedical sciences; open-access, open-data, open peer-review
- Scientific Data (Nature)
- Methods (Nature): lab methods
- Elsevier Research Elements, Data in Brief
- Data Science
- Also see "Data Journals: A survey", which lists 116 data journals from 15 publishers, organized by subject (Candela, L., Castelli, D., Manghi, P. and Tani, A. (2015), Data journals: A survey. J Assn Inf Sci Tec, 66: 1747–1762. doi:10.1002/asi.23358)
Software Journals - examples
- Journal of Open Research Software
- Journal of Open Source Software
- Elsevier Research Elements, SoftwareX
Data and Software Journals
- F1000Research: publish all your findings, including null results, data notes, and more.
- ReScience C: encourages explicit replication of already published research
Methodology and Protocols
- Protocols.io: share science methods, computational workflows, procedures, etc.
- Elsevier Research Elements, MethodsX
- GigaScience Editorial Policies
- Scientific Data (Nature) Policies
- Journal of Open Research Software Policy
- F1000 Research Policies
Moving research from being hidden, unusable, and irreproducible to being fully Findable, Accessible, Interoperable, and Reusable (FAIR) involves several facets. These include, but are not limited to, following data management and sharing best practices.
One facet that can greatly aid in making research FAIR is adopting tools and practices that make it easier for others to re-execute computational analyses. Re-executing an analysis helps to
- Increase the defensibility of conclusions through transparent and open research
- Increase the ability to verify results
- Enable re-use of all or parts of the software and data in new research
Writing better software improves research reproducibility. Practices include
- Organizing data and code appropriately. See the suggested folder structures in Data Organization.
- Using scripting or other automation instead of manual processing to ensure tasks can be repeated.
- Avoiding hard-coding configurations (file paths, parameter values, etc).
- Avoiding absolute file paths. Use relative paths instead for portability.
- Using version control systems (git, SVN, etc)
- Attaching an appropriate license to software. The MIT license is recommended for its simplicity and for maximizing reusability. Choosealicense.com can help you select an appropriate license. For software with commercial potential, consult TechLaunch Arizona before releasing software as open source.
Johns Hopkins University has prepared a 22-minute online tutorial, consisting of 6 modules, that explains the above practices in more detail.
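Two of the practices listed above, avoiding hard-coded configuration and avoiding absolute file paths, can be illustrated with a short Python sketch. This is only one possible approach, not a prescribed method: the file names `config.json` and `data/values.txt`, the `threshold` setting, and all function names are hypothetical and chosen purely for illustration.

```python
import json
import tempfile
from pathlib import Path


def load_config(config_path):
    """Read analysis settings from a JSON file instead of hard-coding them."""
    with open(config_path, encoding="utf-8") as f:
        return json.load(f)


def resolve_path(project_dir, relative_path):
    """Build a path from the project root and a path stored relative to it."""
    return Path(project_dir) / relative_path


def run_analysis(project_dir):
    """Keep the values above the configured threshold (a stand-in analysis)."""
    # 'config.json', 'input_file', and 'threshold' are hypothetical names.
    config = load_config(resolve_path(project_dir, "config.json"))
    input_file = resolve_path(project_dir, config["input_file"])
    text = input_file.read_text(encoding="utf-8")
    values = [float(token) for token in text.split()]
    return [v for v in values if v > config["threshold"]]


if __name__ == "__main__":
    # Self-contained demo: build a throwaway project folder, then analyze it.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data").mkdir()
        (root / "data" / "values.txt").write_text("1.0\n5.5\n3.2\n", encoding="utf-8")
        settings = {"input_file": "data/values.txt", "threshold": 3.0}
        (root / "config.json").write_text(json.dumps(settings), encoding="utf-8")
        print(run_analysis(root))
```

Because every path is resolved against the project folder and every parameter lives in the configuration file, the same script can run unchanged on another machine; in a real project the root might be derived from the script's own location, for example `Path(__file__).resolve().parent`.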
Tools for Enabling Reproducible Analyses
These are listed for informational purposes and do not imply endorsement.
Reproducibility platforms aim to make computational reproducibility easier by helping you bundle all data, software, and dependencies into a portable package that others can execute in one click through an easy-to-use graphical interface.
Most of these tools have the sharing of analyses as a central focus. Some, like Code Ocean, have integrations with certain journals to allow including executable analyses within papers.
|Containers & Virtual Machines||Containers can package together all software dependencies for better portability. Many of the products in the Reproducibility Platforms category above are built on container and virtualization technologies.|
|Packaging and dependency capture||Unlike the categories above, which require manually bundling dependencies, this category of tools aims to make it easier to collect and package software dependencies automatically.|
|Workflow automation||These systems aim to make it easier to link together data workflows in a reproducible fashion. Many tools are specific to certain domains. The ones listed here are general purpose with varying degrees of maturity.|
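To give a rough sense of what automatic dependency capture means, the sketch below records the Python version, operating system, and installed package versions using only the standard library. It is a minimal illustration, not how any particular tool in the table works, and the function names `capture_environment` and `to_requirements` are hypothetical.

```python
import platform
from importlib import metadata


def capture_environment():
    """Collect the interpreter version, OS, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip installs whose metadata records no name
            packages[name] = dist.version
    ordered = dict(sorted(packages.items(), key=lambda item: item[0].lower()))
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": ordered,
    }


def to_requirements(env):
    """Format captured packages as pinned 'name==version' lines."""
    return [f"{name}=={version}" for name, version in env["packages"].items()]


if __name__ == "__main__":
    env = capture_environment()
    print(f"# Python {env['python']} on {env['platform']}")
    print("\n".join(to_requirements(env)))
```

Saving this output alongside an analysis gives others a record of the environment it ran in. Dedicated tools generally go further, for instance by also capturing system libraries and data files.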