Organise and name data files using a consistent, practical system, so that data are easier to find and keep track of.
File names should help classify files, uniquely identify files and provide information about the content and status of the file.
Best practice when choosing a file name:
- Keep names short
- Use at least two digits for numbers (e.g. 01, 02, 03 rather than 1, 2, 3), unless the number is a year or already has more than two digits
- Consider how file names will sort in directory listings, e.g. put common elements at the start of the name so related files are grouped together
- Unless your system maintains version histories automatically, include a version number in the file name to indicate revisions or edits, using numbering that distinguishes major from minor revisions
- Decide on a file naming convention at the start of your project – this involves decisions about punctuation, date formats, number of digits and the order of each element
- Use controlled vocabularies within your research discipline to enable effective searching and retrieval of files by different people
- Name files suitably as soon as you create them
Things to avoid in file names:
- Meaningless names, or names that only mean something to you
- Names relating to individuals
- Unnecessary repetition and redundant words
- Spaces; use underscores or hyphens instead
- Non-alphanumeric characters other than underscores and hyphens
- Common words such as 'draft' at the start of the file name
- Characters that perform specific functions in some operating systems, such as the reserved characters / \ : * ? " < > |
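The naming rules above can be checked mechanically before a file is saved. The Python sketch below is one possible approach, not a UK Data Archive or University tool: `make_filename` and `filename_problems` are hypothetical names, and it assumes underscores separate elements, dates are written as YYYY-MM-DD, and only letters, digits, underscores, hyphens and dots are allowed. It also shows why two-digit numbering matters when names are sorted as text.

```python
import re
from datetime import date

# Characters with special meaning on common operating systems (Windows in particular).
RESERVED = set('/\\:*?"<>|')

def filename_problems(name):
    """Hypothetical checker: return the ways `name` breaks the rules above (empty if none)."""
    problems = []
    if " " in name:
        problems.append("contains spaces; use underscores or hyphens")
    bad = RESERVED.intersection(name)
    if bad:
        problems.append("reserved characters: " + "".join(sorted(bad)))
    if not re.fullmatch(r"[A-Za-z0-9_.-]+", name):
        problems.append("use only letters, digits, underscores, hyphens and dots")
    return problems

def make_filename(*elements, when=None, ext=None):
    """Hypothetical builder: join elements with underscores, most general first.

    when -- optional date, written as YYYY-MM-DD so names sort chronologically
    ext  -- optional extension without the leading dot
    """
    name = "_".join(elements)
    if when is not None:
        name += "_" + when.isoformat()
    if ext:
        name += "." + ext
    problems = filename_problems(name)
    if problems:
        raise ValueError(f"{name!r}: " + "; ".join(problems))
    return name

# Zero-padding keeps numbered files in order when they are sorted as text.
print(sorted(["FG1", "FG2", "FG10"]))    # ['FG1', 'FG10', 'FG2']
print(sorted(["FG01", "FG02", "FG10"]))  # ['FG01', 'FG02', 'FG10']
print(make_filename(f"FG{1:02d}", "CONS", when=date(2010, 2, 12)))  # FG01_CONS_2010-02-12
```

A project could extend `filename_problems` with its own agreed rules, such as a fixed order of elements or a controlled list of abbreviations.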
Examples from the UK Data Archive:
- FG1_CONS_2010-02-12 is the file that contains the transcript of the first focus group with consumers, which took place on 12 February 2010
- Int024_AP_2008-06-05 is an interview with participant 024, interviewed by Anne Parsons on 5 June 2008
- BDHSurveyProcedures_00_04.pdf is version 4 of the survey procedures for the British Dental Health Survey
Examples of filenames for scientific data are available from the Centre for Environmental Data Archival.
More detailed advice about file naming is available from JISC.
Tip: to maximise compatibility across platforms, such as Windows and macOS, use the 8.3 naming convention. This limits names to eight characters, followed by a dot and an extension of up to three characters (such as 12345678.txt).
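A quick way to check names against the 8.3 tip is a regular expression. This Python sketch is a simplification: it assumes the name part uses only letters, digits, underscores and hyphens, and accepts extensions of one to three characters; the full 8.3 rules permit some other characters. `is_8_3` is a hypothetical helper name.

```python
import re

# 1-8 name characters, a dot, then a 1-3 character extension.
# Restricted to the "safe" characters recommended above (a simplification of 8.3).
EIGHT_DOT_THREE = re.compile(r"[A-Za-z0-9_-]{1,8}\.[A-Za-z0-9]{1,3}")

def is_8_3(name):
    """Hypothetical helper: True if `name` fits the 8.3 pattern described in the tip."""
    return EIGHT_DOT_THREE.fullmatch(name) is not None

for name in ["12345678.txt", "FG01CONS.doc", "BDHSurveyProcedures_00_04.pdf"]:
    print(name, is_8_3(name))
# 12345678.txt True
# FG01CONS.doc True
# BDHSurveyProcedures_00_04.pdf False
```

As the last line shows, longer descriptive names such as the UK Data Archive examples do not fit the 8.3 limit, so a project has to weigh compatibility against descriptiveness.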
Take the time to plan how files will be structured in folders so they can be located quickly, especially when working in collaboration with others.
Best practice for structuring files and folders:
- Group files within folders so information on a particular topic is located in one place. For example, experimental work could be stored in folders organised by the date of the experiment, or by a key experimental condition
- Structure files logically within folders relating to projects or issues
- Don’t leave unsorted files loose under top-level folders
- Separate current and completed work or versions; e.g. where a document has many versions and multiple contributors, consider a “Current Version” folder
- Structure folders hierarchically by starting with a limited number of high-level folders for broader topics, and create more specific folders within these. It helps to limit folders to three or four levels deep, with no more than ten items in each folder
- Use existing conventions and procedures, from your project team or Research Centre, to structure folders.
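The folder guidelines above (three or four levels deep, no more than ten items per folder, no loose files at the top) can be audited with a short script. This is a sketch using only the Python standard library; `audit_folders` is a hypothetical name, the limits are parameters you can change, and treating any file directly inside the chosen root folder as "unsorted" is an assumption about how to interpret that guideline.

```python
import os
import tempfile
from pathlib import Path

def audit_folders(root, max_depth=4, max_items=10):
    """Hypothetical audit: return warnings for folders that break the guidelines above.

    max_depth -- deepest folder level allowed below `root` (guideline: 3 or 4)
    max_items -- most entries (files plus subfolders) allowed in one folder (guideline: 10)
    Files sitting directly in `root` are reported as unsorted (an assumption).
    """
    root = Path(root)
    warnings = []
    for dirpath, dirnames, filenames in os.walk(root):
        here = Path(dirpath)
        depth = len(here.relative_to(root).parts)  # root itself is depth 0
        if depth > max_depth:
            warnings.append(f"{here}: {depth} levels deep (limit {max_depth})")
        n_items = len(dirnames) + len(filenames)
        if n_items > max_items:
            warnings.append(f"{here}: {n_items} items (limit {max_items})")
        if depth == 0 and filenames:
            warnings.append(f"{here}: {len(filenames)} unsorted file(s) at top level")
    return warnings

# Usage: a throwaway folder tree that is five levels deep and has a loose file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Project", "Data", "Raw", "2015", "June").mkdir(parents=True)
    Path(tmp, "notes.txt").touch()
    for warning in audit_folders(tmp):
        print(warning)
```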
Tip: Assess what data you have regularly to ensure files are not kept pointlessly. Put a reminder in your calendar so you don't forget!
Tip: However you choose to arrange your files, make sure you write down what you've decided in an index file (e.g. Word or text document) that you keep with the files. This only takes a few minutes, but can save hours of searching later.
An example folder structure is available from the UK Data Archive; it separates data files from documentation files, and then organises each according to type and research activity.
The London School of Hygiene and Tropical Medicine also has a sample organisation structure for longitudinal data.
Version control is the management of different versions and drafts of a document, file, record or dataset. It provides an audit trail for the revision and update of draft and final versions. Version control is important for working on collaborative documents with a number of contributors, and for knowing which version of a file is being used or enforced.
Some systems maintain version histories automatically; if yours does not, use the following best practice:
- Use appropriate labels to differentiate between statuses:
- ‘d’ for draft revisions that are still in development e.g. d1, d2, d3
- ‘v’ for versions that are intended to be seen by others e.g. v1, v2, v3
- Agree who will sign off final versions and mark them as 'final'
- Use a 'revision' numbering system:
- Draft versions should be numbered v0-1, v0-2, etc. until the document becomes the first approved version, v1-0
- Final approved versions and major changes should be indicated by whole numbers e.g. v1-0 would be the first major version, v2-0 the second major version.
- Minor changes can be indicated by increasing the figure after the dash; for example, v1-1 indicates a minor change has been made to the first version, and v3-1 a minor change has been made to the third version.
- Although full stops are usually used in numbering, use dashes in electronic file names (e.g. v1-0), because a full stop can be mistaken for the start of the file extension
- Apply version numbers consistently e.g. v1-0, v2-0, v2-1, v3-0, rather than v1, v2, v2.1, v3
- Include the version as part of both the file name, and within the document itself. In the header or footer of a document identify the author, filename, page number and date the document was created/revised
- If you store the same data in different file formats, ensure that the file name and version are the same e.g. ‘SmithB-transcript-v10.doc’, ‘SmithB-transcript-v10.rtf’, ‘SmithB-transcript-v10.pdf’
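The revision numbering scheme above comes down to two rules: a minor change increments the number after the dash, and a major change (including approval of the first final version) moves to the next whole number and resets the minor number to 0. The Python sketch below assumes labels of the form vMAJOR-MINOR, such as v1-3; `parse_version` and `bump` are hypothetical names. Parsing the numbers also lets labels sort numerically, so v10-0 comes after v2-1 rather than before it.

```python
import re

VERSION = re.compile(r"v(\d+)-(\d+)")  # e.g. v1-3 -> major 1, minor 3

def parse_version(label):
    """Hypothetical helper: split a label such as 'v1-3' into (major, minor)."""
    m = VERSION.fullmatch(label)
    if m is None:
        raise ValueError(f"not in vMAJOR-MINOR form: {label!r}")
    return int(m.group(1)), int(m.group(2))

def bump(label, major=False):
    """Hypothetical helper: return the next label after `label`.

    A minor change adds 1 after the dash; a major change (including approval
    of a draft) moves to the next whole number and resets the minor part to 0.
    """
    maj, minor = parse_version(label)
    if major:
        return f"v{maj + 1}-0"
    return f"v{maj}-{minor + 1}"

history = ["v0-1"]                              # first draft
history.append(bump(history[-1]))               # second draft
history.append(bump(history[-1], major=True))   # first approved version
history.append(bump(history[-1]))               # minor amendment
print(history)  # ['v0-1', 'v0-2', 'v1-0', 'v1-1']

# Numeric sorting, unlike plain text sorting, puts v10-0 last.
print(sorted(["v10-0", "v2-1", "v2-0"], key=parse_version))  # ['v2-0', 'v2-1', 'v10-0']

# The same base name and version for every format of the same data.
base = "SmithB-transcript-" + history[-1]
print([f"{base}.{ext}" for ext in ("doc", "rtf", "pdf")])
```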
Examples of file versions from the UK Data Archive:
- date recorded in the file name or embedded within the file: HealthTest_06-04-2008
- version numbering in the file name: BGHSurveyProcedures_v1-3
- version description in the file name or embedded within the file (e.g. draft, final)
Tip: Include a version control table in files to detail each time a change is made; see an example below.
| Version | Date | Description of change | Name |
|---|---|---|---|
| 0-1 | 17/06/2015 | First draft sent to project team | Miss A. Researcher |
| 0-2 | 22/06/2015 | Updates made from project team feedback, including changes to the method, because an alternative method was discovered since the first draft and is better suited to the project | Miss A. Researcher |
| 1-0 | 29/06/2015 | Final version – approved by Research Committee | Miss A. Researcher |
| 1-1 | 16/07/2015 | Minor amendment to section 3 | Miss A. Researcher |
Version control can also be maintained through:
- version control facilities within the software you use (an example is available for Microsoft Word)
- versioning software or features, e.g. in SharePoint or GitHub
- using file sharing services such as Syncplicity
- manual merging of entries or edits by multiple users
To demonstrate the authenticity of data and prevent unauthorised changes to it, follow this best practice from the UK Data Archive:
- keep a single master file of data
- assign responsibility for master files to a single project team member
- give master versions of data files read-only status for general users
- record all changes to master files
- maintain old master files in case later ones contain errors
- archive copies of master files at regular intervals
- develop a formal procedure for the destruction of master files
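Two of the master-file practices above, read-only status and a record of changes, can be partly automated. This Python sketch is an illustration rather than a UK Data Archive procedure: `register_master`, `matches_log` and the CSV log layout are hypothetical. It removes write permission from the file and logs a SHA-256 checksum; if the file's checksum later differs from the logged one, the file has been altered. Permissions set this way can be reversed by the file's owner or an administrator, so they guard mainly against accidental edits.

```python
import csv
import hashlib
import os
import stat
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def register_master(path, log_path, note):
    """Hypothetical helper: make `path` read-only and append its checksum to a CSV log."""
    path = Path(path)
    # Read permission only for owner, group and others on POSIX systems;
    # on Windows, os.chmod can only set the read-only attribute, which this does.
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    new_log = not Path(log_path).exists()
    with open(log_path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_log:
            writer.writerow(["timestamp_utc", "file", "sha256", "note"])
        writer.writerow([datetime.now(timezone.utc).isoformat(timespec="seconds"),
                         path.name, sha256_of(path), note])

def matches_log(path, log_path):
    """Hypothetical helper: True if the file's checksum equals the latest one logged for it."""
    path = Path(path)
    with open(log_path, newline="") as f:
        logged = [row["sha256"] for row in csv.DictReader(f) if row["file"] == path.name]
    return bool(logged) and logged[-1] == sha256_of(path)

# Usage with a throwaway master file.
with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp, "survey_master.csv")
    master.write_bytes(b"id,age\n024,34\n")
    log = Path(tmp, "master_log.csv")
    register_master(master, log, "first master version")
    print(matches_log(master, log))
```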
Data documentation encompasses all the information necessary to discover, interpret, understand and use data. This is important for collaborators, original researchers returning to data, or new users of data. Good documentation is vital for successful data preservation as data can quickly become unusable if key details of the context have been forgotten.
Data documentation should include detailed data description and annotation:
Study level description
- Research aims, objectives, questions
- Why and how the data were created, prepared or digitised (e.g. data collection methodologies, analytical information, classification schemes used, details of how a sample was chosen, assumptions)
- Instruments, measures and secondary data sources used
- Data validation and quality checking procedures
- Data ownership, confidentiality, access and use conditions
Data level description
- Data context (e.g. names and definitions, units of measurement, geographic location and time period)
- Data content and structure (e.g. data format and volume)
- Data alterations or coding (e.g. algorithm or command file used, plus reasons for missing values)
- Weighting and grossing variables
- Data list describing cases, individuals or items studied, for example for logging qualitative interviews
- Audit trail of activities performed when capturing, processing and analysing the data
- Relationships between individual files or entities (e.g. X is later superseded by Y)
All of this extra information is collectively known as metadata.
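A data-level description is often kept as a codebook (also called a data dictionary) saved alongside the data file, with one row per variable. The Python sketch below writes such a codebook as CSV and lists any columns in the data file that the codebook does not describe. The variables, missing-value codes and function names are hypothetical examples, not a required format.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical variables for illustration; replace with your own study's details.
CODEBOOK = [
    {"variable": "id", "label": "Participant identifier", "type": "string",
     "unit": "", "missing_codes": "", "notes": "Zero-padded, e.g. 024"},
    {"variable": "age", "label": "Age at interview", "type": "integer",
     "unit": "years", "missing_codes": "-9 = refused; -8 = not asked", "notes": ""},
    {"variable": "interview_date", "label": "Date of interview", "type": "date",
     "unit": "YYYY-MM-DD", "missing_codes": "", "notes": ""},
]
FIELDS = ["variable", "label", "type", "unit", "missing_codes", "notes"]

def write_codebook(rows, path):
    """Hypothetical helper: write one row per variable to a CSV codebook."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

def undocumented(data_path, codebook_path):
    """Hypothetical helper: columns in the data file's header missing from the codebook."""
    with open(data_path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))
    with open(codebook_path, newline="", encoding="utf-8") as f:
        described = {row["variable"] for row in csv.DictReader(f)}
    return [col for col in header if col not in described]

# Usage: the weighting variable has not been documented yet.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp, "survey_2008.csv")
    data.write_text("id,age,interview_date,weight\n024,34,2008-06-05,1.2\n", encoding="utf-8")
    book = Path(tmp, "survey_2008_codebook.csv")
    write_codebook(CODEBOOK, book)
    print(undocumented(data, book))  # ['weight']
```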
It is important to consider any third-party requirements before describing the data, as there are many metadata standards.
Creating data documentation from the start of a project makes it easier to manage and understand data later in the research lifecycle. Therefore, include procedures for documentation in your data planning.
How to capture data documentation
- Embedding documentation within data or documents:
- statistical e.g. SPSS
variable descriptions and attributes (codes, data type, missing values) can be documented for each variable in 'Variable View' or via syntax; if syntax is used, the documentation is also held in the SPSS command (syntax) file
- databases e.g. Microsoft Access
variable descriptions and attributes can be documented in 'Design View' and relationships between tables and files can be created
- GIS e.g. ArcGIS
shapefiles (layers) and tables can be organised in a geo-database with rich metadata created in ArcCatalog
- spreadsheets e.g. Microsoft Excel
an additional worksheet within the data file can contain data-related documentation
- text documents e.g. Microsoft Word
add documentation to the header or footer
- Supporting documentation accompanying data - in separate ‘read me’ files, final reports for funders before depositing data, supplementary materials underpinning published articles
- Catalogue metadata – a subset of core data documentation providing standardised structured information, usually associated with the data required when depositing in a repository. Metadata are typically used for discovery, providing searchable information that helps users to find existing data, as a bibliographic record for citation, or for online data browsing.
Detailed guidance is available from the UK Data Service for both tabular data and qualitative data, and guidelines are available about documenting qualitative data using NVivo 9.
Tip: for lab-based research, documentation is often recorded in a lab notebook, so make sure this is kept safe. Record the lab notebook page number with the data files, and if possible, scan the page(s) and keep them with the data.
Further advice on documentation for describing images is available from JISC Digital Media.
Not all research data is digital, and hand-drawn sketches, handwritten laboratory notebooks, journals and other materials are at particular risk of loss. Digitisation can help organise and protect non-digital data:
- Anything stored on paper can be scanned fairly easily: find out how to scan directly to your University storage using any of the managed printers on campus
- Take a digital photo, but check the quality of the image to make sure you can use it if you lose the original
- Audio recordings can be turned into digital sound files, or transcribed if only the words are needed. This can be done individually or by employing a professional transcription service
If the data or artefact absolutely cannot be digitised, consider other options for protection, such as a fireproof safe.
Best practice and standards for digitising analogue media are available from JISC Digital Media.