
Data Management: Preserving Your Data

Information and resources pertaining to research data management.

Data Repositories

Data Back-Up

  • Have at least three geographically distributed copies:
    • One local copy
    • One cloud-based copy
    • One copy on external hardware
  • Make sure all copies are updated regularly
  • Document clearly how back-ups are stored and when the back-ups are updated
  • Carefully research any external hardware before use to make sure the product is reliable and not prone to failure
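
One way to check that all copies are up to date is to compare checksums. The following is a minimal Python sketch, not a complete back-up tool. The function names `sha256_of` and `stale_copies` are hypothetical, and the sketch assumes every copy is reachable as an ordinary file path (for example, a mounted external drive or a synced cloud folder).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_copies(original: Path, copies: list[Path]) -> list[Path]:
    """Return the copies that are missing or whose content differs from the original."""
    expected = sha256_of(original)
    return [c for c in copies if not c.is_file() or sha256_of(c) != expected]
```

Any path returned by `stale_copies` points to a copy that should be refreshed; an empty list means every copy matches the original byte for byte.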

File Names/File Organization

When naming and organizing your files:

  • Include a version number or date in the filename
  • Do not use any special characters or blank spaces
  • Make sure your filenames are followed by the correct file extension for the file type
  • Use abbreviations or acronyms only if you are sure you will still remember what they stand for a year or more from now
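
The naming rules above can be applied consistently with a small helper. This is a sketch under stated assumptions: the names `build_filename` and `is_safe_filename` are hypothetical, the `description_YYYYMMDD_vNN.ext` pattern is only one possible convention, and "special characters" is taken to mean anything other than ASCII letters, digits, hyphens, and underscores.

```python
import re
from datetime import date

# Assumption: a safe name is letters, digits, hyphens, or underscores,
# followed by a single dot and an alphanumeric extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")


def build_filename(description: str, created: date, version: int, extension: str) -> str:
    """Build a name that carries a date, a version number, and the extension."""
    # Replace each run of disallowed characters (including spaces) with one underscore.
    stem = re.sub(r"[^A-Za-z0-9-]+", "_", description).strip("_")
    if not stem:
        raise ValueError("description has no usable characters")
    ext = extension.lstrip(".").lower()
    return f"{stem}_{created:%Y%m%d}_v{version:02d}.{ext}"


def is_safe_filename(name: str) -> bool:
    """Check an existing name for spaces, special characters, or a missing extension."""
    return SAFE_NAME.fullmatch(name) is not None
```

For example, `build_filename("soil samples (site A)", date(2024, 3, 5), 2, ".CSV")` returns `soil_samples_site_A_20240305_v02.csv`, while `is_safe_filename("my data.csv")` returns `False` because of the blank space.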

File Formats

If possible, save every file in its native format, in an open format, and in a preservation-worthy format. Formats recommended for preservation by the Library of Congress include:

  • Documents: XML-based markup formats with included DTD/schema, XSD/XSL presentation stylesheet(s), and explicitly stated character encoding (examples: NLM Book DTD and EPUB-compliant); Page-layout formats (PDF/UA, PDF/A, PDF)
  • Images: First make sure that images are saved at the highest resolution and bit depth available, with an embedded color profile, uncompressed and unlayered. In order of preference: TIFF, JPEG2000, PNG, JPEG/JFIF, Digital Negative (DNG), BMP, GIF
  • Audio: WAVE file with embedded metadata
  • Video: Free of digital rights management software. The Library of Congress does not currently endorse a specific file format, but the community of practice recommends MP4 or AVI
  • Datasets: Platform-independent, character-based formats (JSON, XML); line-oriented formats (TSV, CSV, fixed-width); platform-independent native formats (.xls, .xlsx)

Document Your Data

Include a README file in the highest level of your folder structure with the following information:

  • The data's purpose
  • A complete inventory of all your content
  • A data dictionary listing and describing all variables
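
A README with these three parts can be drafted automatically and then edited by hand. The sketch below is illustrative only: the function name `build_readme`, the plain-text layout, and the file name `README.txt` are assumptions, not a required format.

```python
from pathlib import Path


def build_readme(root: Path, purpose: str, data_dictionary: dict[str, str]) -> str:
    """Assemble README text: purpose, file inventory, and data dictionary."""
    lines = ["PURPOSE", purpose, "", "INVENTORY"]
    # List every file under root, relative to root, in a stable order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if rel != "README.txt":  # do not list the README itself
            lines.append(f"{rel} ({path.stat().st_size} bytes)")
    lines += ["", "DATA DICTIONARY"]
    for variable, description in data_dictionary.items():
        lines.append(f"{variable}: {description}")
    return "\n".join(lines) + "\n"
```

The returned text can be written to the top-level folder with `(root / "README.txt").write_text(text, encoding="utf-8")`. The purpose statement and variable descriptions still have to be written by someone who knows the data.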

For every research file include a description of:

  • Who created the content
  • What the content is
  • When the content was created
  • Where the content was created
  • How the content was created
  • Why the content was created
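
One possible way to keep this description next to each research file is a small "sidecar" metadata file. The sketch below assumes a JSON sidecar; the class `FileDescription`, the function `write_sidecar`, and the `.meta.json` suffix are hypothetical names chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class FileDescription:
    """The six descriptive fields listed above, one per question."""
    who: str
    what: str
    when: str
    where: str
    how: str
    why: str


def write_sidecar(data_file: Path, description: FileDescription) -> Path:
    """Write the description as JSON next to the data file and return its path."""
    # Assumption: the sidecar is named "<data file name>.meta.json".
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    sidecar.write_text(json.dumps(asdict(description), indent=2), encoding="utf-8")
    return sidecar
```

Because the sidecar is plain JSON, it stays readable without special software, which matches the preference for open, character-based formats.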

Digital Curation Lifecycle Model
