An openly-licensed corpus of small example files, covering a wide range of formats and creation tools.
All items are CC0-licensed unless otherwise stated.
A recent summary of the contents of the repository can be found here.
See http://wiki.curatecamp.org/index.php/Collecting_format_ID_test_files for more information.
See metadata-template.ext.md for a simple per-file metadata template.
As well as pooling example files, we pool format signatures:
- Tika signatures staged here: https://github.com/openplanets/format-corpus-tools/tree/master/tools/fidget/src/main/resources/tika-bl-staging
- Tika signatures later merged here: https://github.com/openplanets/format-corpus-tools/blob/master/tools/fidget/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml
- DROID signatures go here: https://github.com/openplanets/format-corpus-tools/tree/master/tools/fidget/src/main/resources/droid
More details here: http://wiki.curatecamp.org/index.php/Improving_format_ID_coverage