Data guidelines - File formats recommended for the mdw Repository

File formats and software used in research depend on how researchers collect and analyse their data. They are often discipline-specific and established within the respective domain. In general, this leads to a very heterogeneous file format environment.

The use of open formats instead of vendor-specific, proprietary formats is mandatory for long-term digital preservation, as it ensures accessibility and reuse. This means that data may need to be converted from the working format used in research to a format appropriate for preservation.

The mdw Repository preserves deposited data in their incoming formats (open or closed) as master renditions. In addition, an open rendition must be available for preview and indexing and to ensure long-term accessibility of the data, since it can be migrated to current formats over time.

In general, data deposited in the mdw Repository should be

  • open, non-proprietary,
  • unencrypted,
  • uncompressed,
  • interoperable across platforms, and
  • acknowledged by the research community.

Whether a file format supports metadata becomes relevant when the metadata must be embedded in the research data file itself, so that it 'travels' with the file and the data can be better understood by the designated community (self-documentation).

Overview of preferred and accepted file formats

The following section provides an overview of the preferred and accepted formats for long-term preservation in the mdw Repository.
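The preferred/accepted distinction used in the tables below can be expressed as a simple lookup by file extension. The following is a minimal sketch, assuming only a subset of the extensions listed in this guideline; the names `PREFERRED`, `ACCEPTED` and `classify` are hypothetical and not part of any mdw Repository tool. Conditions attached to a format (e.g. "only if created in this format") are not captured by an extension check.

```python
from pathlib import PurePath

# Hypothetical lookup tables, transcribed from a subset of this guideline.
# Conditions such as "only if created in this format" are NOT checked here.
PREFERRED = {".odp", ".svg", ".wav", ".por"}
ACCEPTED = {".ppt", ".pptx", ".eps", ".ai", ".mp3", ".sav", ".psd", ".jpg", ".jpeg"}


def classify(filename: str) -> str:
    """Return 'preferred', 'accepted', or 'unknown' for a file name."""
    suffix = PurePath(filename).suffix.lower()  # case-insensitive match
    if suffix in PREFERRED:
        return "preferred"
    if suffix in ACCEPTED:
        return "accepted"
    return "unknown"
```

A file classified as 'unknown' is not necessarily unsuitable; it only means the extension is not in this illustrative subset.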

Documents & Structured Text

Preferred:

  • Adobe Portable Document Format (PDF/A)
  • OpenDocument Text (.odt)
  • eXtensible Mark-up Language (XML) text (.xml) - according to an appropriate Document Type Definition (DTD) or schema (XSD)
  • Hypertext Mark-up Language (HTML) (.html)
  • plain text data, ASCII (.txt) - UTF-8 encoding

Accepted:

  • MS Word (.doc, .docx)
  • Rich Text Format (.rtf)
  • Adobe Portable Document Format (.pdf) - only if no PDF/A can be produced
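
Since plain text is expected in UTF-8 encoding, it can help to check a file before deposit. The following is a minimal sketch using only the Python standard library; the function names are hypothetical. Pure ASCII content is also valid UTF-8, so it passes this check.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_is_utf8(path: str) -> bool:
    """Read a file in binary mode and test whether its content is valid UTF-8."""
    with open(path, "rb") as handle:
        return is_utf8(handle.read())
```

A successful decode does not prove that the file was written as UTF-8, only that its bytes are valid UTF-8; a failed decode is a reliable sign that re-encoding is needed.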

Presentations

Preferred:

  • OpenDocument Presentation (.odp)

Accepted:

  • MS PowerPoint (.ppt, .pptx)

Digital Raster Images

Preferred:

  • TIFF version 6 uncompressed (.tif)

Accepted:

  • JPEG (.jpeg, .jpg) - only if created in this format
  • TIFF (other versions) (.tif, .tiff) - only if required by specific analysis software as master files
  • Adobe Portable Document Format (PDF/A) (.pdf)
  • standard applicable RAW image format (.raw) - as master files
  • Photoshop files (.psd) - only if required by specific analysis software as master files

Digital Vector Graphics

Preferred:

  • Scalable Vector Graphics format (.svg)

Accepted:

  • Encapsulated PostScript files (.eps)
  • Adobe Illustrator files (.ai) - only if required by specific software as master files

Digital Audio

Preferred:

  • Waveform Audio Format (WAV) (.wav)

Accepted:

  • MPEG-1 Audio Layer 3 (.mp3) - only if created in this format
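
Before depositing a WAV file, its basic properties can be inspected. The following is a minimal sketch using the `wave` module from the Python standard library; the function name `describe_wav` and the dictionary keys are hypothetical. The `wave` module raises `wave.Error` for files it cannot parse, which in practice includes WAV files that do not contain PCM audio.

```python
import wave


def describe_wav(source) -> dict:
    """Read basic header properties from a WAV file.

    `source` may be a file path or a binary file object opened for reading.
    """
    with wave.open(source, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
            "frames": wav.getnframes(),
        }
```

The returned values (channel count, sample width, sample rate) are also useful as technical metadata to accompany the deposit.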

Formats for audio editing project files are still to be defined.

Please contact repo@mdw.ac.at in case other audio formats are required.

Digital Video

Please contact repo@mdw.ac.at for current video formats.

Qualitative data analysis

Software packages for qualitative data analysis in particular tend to use proprietary file formats by default, though most current products can export to open formats (e.g. XML-based data export options). MAXQDA, for example, can export the whole project, including the raw data, coding tree, coded data, and associated data (mainly memos and notes), in open formats. Researchers should therefore deposit the working MAXQDA master project file (closed format) together with open variants of the data in order to ensure long-term accessibility.

Preferred:

  • SPSS portable format (.por)
  • MAXQDA XML export
  • comma-separated values (CSV) file (.csv)
  • OpenDocument Spreadsheet (.ods)

Accepted:

  • proprietary SPSS format (.sav)
  • proprietary MAXQDA format
  • MS Excel (.xls/.xlsx)

Containers / Archives

In general, the following archive types are supported: TAR, GZIP, ZIP.
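
To check which of these archive types a file actually is, regardless of its extension, the Python standard library offers content-based tests. The following is a heuristic sketch; the function name `archive_type` and its return values are hypothetical. Note that `tarfile.is_tarfile` also recognises compressed tarballs (e.g. `.tar.gz`), so those are reported as "tar" here.

```python
import tarfile
import zipfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def archive_type(path: str) -> str:
    """Return 'tar', 'zip', 'gzip', or 'unknown' based on file content."""
    if tarfile.is_tarfile(path):
        return "tar"  # plain or compressed tar archive
    if zipfile.is_zipfile(path):
        return "zip"
    with open(path, "rb") as handle:
        if handle.read(2) == GZIP_MAGIC:
            return "gzip"  # gzip-compressed single file
    return "unknown"
```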

Images of computer systems

Please contact repo@mdw.ac.at for supported ISO images of entire computer systems.