Data Organization

Directory, Folder, and File Naming

One of the first things to decide is how you want to organize your data. There are a number of questions you will want to consider:

  • Are there file naming conventions for your discipline?
  • What directory structure and file naming conventions will you use?
  • How will you handle version control? Record every change to a file, no matter how small
    • Consider version control software, if applicable
    • Discard obsolete versions, but never the raw copy

Keep the following best practices in mind:

  • Be consistent with how you name directories, folders, and files
    • Always include the same information
    • Retain the order of information
  • Be descriptive so others can understand what file names mean
  • Keep track of versions (and be consistent!)
  • Use application-specific codes for file extensions, such as .mov, .tif, .wrl
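
One way to keep names consistent is to generate them rather than type them by hand. The following is a minimal Python sketch of that idea, not a prescribed convention: the function name build_file_name, the field order (project, description, date, version), and the sample values are all assumptions chosen for illustration. Substitute whatever information and order your discipline or project calls for.

```python
import re
from datetime import date


def build_file_name(project: str, description: str, when: date,
                    version: int, extension: str) -> str:
    """Build a file name with the same fields in the same order every time.

    The field order used here (project, description, date, version) is an
    assumption for illustration, not a required convention.
    """
    def slug(text: str) -> str:
        # Lowercase the text and collapse anything that is not a letter
        # or digit into a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    ext = extension.lower().lstrip(".")
    # YYYYMMDD dates sort chronologically; zero-padded versions sort in order.
    return f"{slug(project)}_{slug(description)}_{when:%Y%m%d}_v{version:02d}.{ext}"


print(build_file_name("SoilSurvey", "Site A pH", date(2024, 3, 15), 2, ".TIF"))
# prints: soilsurvey_site-a-ph_20240315_v02.tif
```

Because the date is written as year, month, day and the version number is zero-padded, files named this way sort in a meaningful order in an ordinary directory listing.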

It will be important to track changes in your data files, especially if more than one person is involved in the research.

If you need to revise your naming system, a file renaming application can apply the change to many files at once.
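
For simple cases, a short script can also do the renaming. The sketch below is one possible approach in Python, assuming the change is a prefix swap; the function names plan_renames and apply_renames and the sample file names are hypothetical. It prints the planned changes and only renames files when dry_run is set to False, so the plan can be reviewed first.

```python
import tempfile
from pathlib import Path


def plan_renames(folder: Path, old_prefix: str, new_prefix: str) -> list[tuple[Path, Path]]:
    """Pair each file whose name starts with old_prefix with its new path."""
    plan = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.name.startswith(old_prefix):
            new_name = new_prefix + path.name[len(old_prefix):]
            plan.append((path, path.with_name(new_name)))
    return plan


def apply_renames(plan: list[tuple[Path, Path]], dry_run: bool = True) -> None:
    """Print the planned renames; perform them only when dry_run is False."""
    # Check every target first so nothing is renamed if any name would collide.
    for _, new in plan:
        if new.exists():
            raise FileExistsError(f"refusing to overwrite {new}")
    for old, new in plan:
        print(f"{old.name} -> {new.name}")
        if not dry_run:
            old.rename(new)


if __name__ == "__main__":
    # Demonstration in a throwaway directory; the file names are made up.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for name in ("survey_site-a_v01.txt", "survey_site-b_v01.txt", "notes.txt"):
            (folder / name).touch()
        apply_renames(plan_renames(folder, "survey_", "soilsurvey_"), dry_run=False)
        print(sorted(p.name for p in folder.iterdir()))
```

Run any bulk rename on a copy of the data, or with dry_run left at True, before touching the originals.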

File Formats

The file format is the principal factor in whether others will be able to use your data in the future. Because technology changes continually, you need to plan for software and hardware obsolescence. How will others use your data if the software used to produce it is no longer available? You may want to consider migrating your files to a format with the characteristics listed below while keeping a copy in the original format.

Formats most likely to be accessible in the future include:

  • Non-proprietary, not tied to a specific software product
  • Unencrypted
  • Uncompressed
  • Common, used by the research community
  • Standard representation, such as ASCII or Unicode
  • Open, documented standard

Examples of preferred formats:

  • PDF/A, not Word
  • ASCII, not Excel
  • MPEG-4, not QuickTime
  • TIFF or JPEG2000, not GIF or JPEG
  • XML or RDF, not RDBMS