Research Data Management
What is a Data Repository?
Data repositories are being developed as a result of federal agencies, journals, and scholarly societies mandating that researchers share the data resulting from their research. Some data repositories have been around for over 50 years, but most have been developed since the early '90s. Repositories have been developed by governments, scholarly societies, institutions and universities. They can be open (all are invited to deposit and use the data) or closed (only members may deposit or use the data).
When selecting a repository, keep in mind that "publishing" datasets is becoming like publishing a research paper: you want your data to be found and cited. Eventually, datasets may become part of the tenure and promotion process (at the U, that conversation has yet to begin). Tools and standards for data citation already exist, such as the Web of Science Data Citation Index, the DataCite metadata schema, and discipline-specific schemas.
Before settling on a repository consider:
- The Hive is the University of Utah's institutional data repository and is free of charge for all U of U researchers.
- The subject(s) the repository will allow in their system.
- Your funding agencies may have a specific repository for your datasets.
- The journals in which you will be publishing may have a specific repository for your datasets or require it be in an open access repository, e.g. PLoS.
- Your scholarly society and colleagues may already be depositing datasets in a repository. Talk to them.
- If you were required to write a data management plan to include with your grant proposal, what did you say about sharing your research data?
- Check out the cost of using the repository. Do you have the funding to cover it? Deposit costs and maintenance fees depend on the repository, and not all repositories charge to deposit research data. If the repository requires membership, then either the researcher or the researcher's institution must belong. The Marriott Library pays for membership in ICPSR (Inter-university Consortium for Political and Social Research), so everyone on campus conducting social science research can use it and deposit datasets.
- Check to see if the repository is able to preserve (not just back up) your datasets. Does it have the technology and policies in place for preservation to ensure your datasets will be maintained for future use?
- Check out the metadata and vocabulary requirements of the repository. The metadata should include enough information about how the project was conducted so that it can be replicated. Your discipline may have already developed a standard vocabulary.
- Check out what file formats are acceptable. Usually file names should include only letters, numbers, dashes ("-"), underscores ("_") and should all be lower-case. The repository may have additional restrictions.
- Check to make sure the datasets receive persistent identifiers (PIDs). A DOI is the most commonly used PID for datasets and publications. ARKs can be deleted, so they are not useful PIDs for datasets. PIDs are used to link datasets with publications.
- Check to see if your datasets can be restricted to specific users if they contain sensitive data. Can the datasets be restricted for a specific time period?
- Does the repository provide information on how to cite data reused by others? If you are going to do all the work of depositing your data you may as well receive credit for it.
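The file-naming guideline in the list above (lower-case letters, numbers, dashes, and underscores only) can be checked before deposit. The following is a minimal Python sketch, not a tool provided by any repository: the function names `is_repository_safe` and `suggest_name` are hypothetical, and allowing a single lower-case extension such as `.csv` is an assumption added for illustration. Your repository may apply different or additional rules.

```python
import re

# Assumed rule: lower-case letters, digits, dashes, and underscores,
# optionally followed by one lower-case extension (e.g. ".csv").
_SAFE_NAME = re.compile(r"[a-z0-9_-]+(\.[a-z0-9]+)?")


def is_repository_safe(filename: str) -> bool:
    """Return True if the whole file name follows the assumed naming rule."""
    return _SAFE_NAME.fullmatch(filename) is not None


def suggest_name(filename: str) -> str:
    """Lower-case the name and replace runs of disallowed characters with '_'."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        # No extension present: treat the whole name as the stem.
        stem, ext = filename, ""
    clean_stem = re.sub(r"[^a-z0-9_-]+", "_", stem.lower()).strip("_")
    clean_ext = re.sub(r"[^a-z0-9]+", "", ext.lower())
    return f"{clean_stem}.{clean_ext}" if clean_ext else clean_stem


if __name__ == "__main__":
    for name in ["Field Notes (2023).CSV", "soil-samples_v2.csv"]:
        status = "ok" if is_repository_safe(name) else "rename to " + suggest_name(name)
        print(f"{name}: {status}")
```

A check like this is only a first pass. Always confirm the final names against the repository's own submission guidelines.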
Repository Selection Tools
The following are databases of data centers and subject-based data repositories.
- NIH Data Sharing Repositories
- The National Library of Medicine has produced a table of NIH-supported data repositories. These repositories do accept data from non-NIH research. There are 45 repositories listed.
- re3data.org
- The Registry of Research Data Repositories was released in December 2012. The registry is a result of a partnership among the Berlin School of Library and Information Science, GFZ German Research Centre for Geosciences and the KIT Library of Karlsruhe Institute of Technology. The initiative is being funded by the German Research Foundation DFG.
- Open Access Directory
- The Open Access Directory of Data Repositories is being maintained by Simmons College. Repositories are listed by discipline.
- Repository Finder
- This pilot project of the Enabling FAIR Data Project, led by the American Geophysical Union (AGU) in partnership with DataCite and the Earth, space and environment sciences community, can help you find an appropriate repository in which to deposit your research data. The tool queries the re3data registry of research data repositories.
- FAIRsharing.org
- FAIRsharing provides access to databases and repositories and also indicates the standards and policies used to develop them.
Listings of data repositories:
- Nature
- This resource is excellent for finding general and discipline specific repositories.
Repository Selection Decision Tree
- Repository Selection Decision Tree: This document walks users through the selection of a data repository for sharing and storage. The final page includes a matrix comparing The Hive, One Utah Open Data, generalist repositories, and ICPSR.
Additional Selection Resources
Helpful charts comparing generalist repositories:
Desirable Characteristics for All Data Repositories
- Unique Persistent Identifiers: Assigns datasets a citable, unique persistent identifier, such as a digital object identifier (DOI) or accession number, to support data discovery, reporting, and research assessment. The identifier points to a persistent landing page that remains accessible even if the dataset is de-accessioned or no longer available.
- Long-Term Sustainability: Has a plan for long-term management of data, including maintaining integrity, authenticity, and availability of datasets; building on a stable technical infrastructure and funding plans; and having contingency plans to ensure data are available and maintained during and after unforeseen events.
- Metadata: Ensures datasets are accompanied by metadata to enable discovery, reuse, and citation of datasets, using schema that are appropriate to, and ideally widely used across, the community(ies) the repository serves. Domain-specific repositories would generally have more detailed metadata than generalist repositories.
- Curation and Quality Assurance: Provides, or has a mechanism for others to provide, expert curation and quality assurance to improve the accuracy and integrity of datasets and metadata.
- Free and Easy Access: Provides broad, equitable, and maximally open access to datasets and their metadata free of charge in a timely manner after submission, consistent with legal and ethical limits required to maintain privacy and confidentiality, Tribal sovereignty, and protection of other sensitive data.
- Broad and Measured Reuse: Makes datasets and their metadata available with broadest possible terms of reuse; and provides the ability to measure attribution, citation, and reuse of data (i.e., through assignment of adequate metadata and unique PIDs).
- Clear Use Guidance: Provides accompanying documentation describing terms of dataset access and use (e.g., particular licenses, need for approval by a data use committee).
- Security and Integrity: Has documented measures in place to meet generally accepted criteria for preventing unauthorized access to, modification of, or release of data, with levels of security that are appropriate to the sensitivity of data.
- Confidentiality: Has documented capabilities for ensuring that administrative, technical, and physical safeguards are employed to comply with applicable confidentiality, risk management, and continuous monitoring requirements for sensitive data.
- Common Format: Allows datasets and metadata downloaded, accessed, or exported from the repository to be in widely used, preferably non-proprietary, formats consistent with those used in the community(ies) the repository serves.
- Provenance: Has mechanisms in place to record the origin, chain of custody, and any modifications to submitted datasets and metadata.
- Retention Policy: Provides documentation on policies for data retention within the repository.
- Last Updated: Jul 30, 2024 11:40 AM
- URL: https://campusguides.lib.utah.edu/researchdatamanagement