Useful links and materials

From COST-MOBILISE Wiki

General

List of digital preservation initiatives

EOSC Marketplace for Data Storage and Data Archiving; see also Solutions for a sustainable EOSC: A FAIR Lady (olim Iron Lady), report from the EOSC Sustainability Working Group, 2020; PID Architecture for the EOSC, report from the EOSC Executive Board Working Group (WG) Architecture PID Task Force (TF); and the project OCRE

GAIA-X Technical Infrastructure June 2020

Open consultation for the EOSC Strategic Research and Innovation Agenda with EOSC Open Consultation Booklet July 2020

EUDAT CDI members

Principles of Archival of Digital Assets, published by iRODS, 2014 (bit preservation and functional preservation)
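Bit preservation rests on fixity checks: a checksum is recorded at ingest and recomputed later to detect silent corruption. A minimal Python sketch, assuming SHA-256 as the algorithm and a plain dictionary of relative paths to recorded digests as a hypothetical checksum store; `sha256_of` and `verify_fixity` are illustrative names, not part of iRODS:

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large files never sit fully in memory


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(expected: dict[str, str], root: Path) -> list[str]:
    """Recompute digests and report problems.

    `expected` (hypothetical checksum store) maps paths relative to `root`
    to the hex digests recorded at ingest. Returns the paths that are
    missing or whose current digest differs from the recorded one.
    """
    failures = []
    for rel_path, recorded in expected.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != recorded.lower():
            failures.append(rel_path)
    return failures
```

Chunked reading keeps memory use flat even for very large binary objects. A checksum only covers bit preservation; functional preservation (keeping content interpretable, e.g., by migration or emulation) needs other measures.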

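Digitale Bestandserhaltung in der Praxis – Entwicklung eines Preservation-Planning-Konzepts zur Langzeitarchivierung von digitalem Kulturgut am Beispiel der Verbundlösung Berlin-Brandenburg (Digital preservation in practice: developing a preservation planning concept for the long-term archiving of digital cultural heritage, using the Berlin-Brandenburg joint solution as an example), by C. Loose, 2016, FH Potsdam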

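Funktionale Langzeitarchivierung digitaler Objekte – Erfolgsbedingungen des Einsatzes von Emulationsstrategien (Functional long-term archiving of digital objects: conditions for the successful use of emulation strategies), Suchodoletz 2009, Universität Freiburg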

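nestor Handbuch: Eine kleine Enzyklopädie der digitalen Langzeitarchivierung (nestor handbook: a small encyclopedia of digital long-term preservation), version 2.3, 2010, edited by H. Neuroth, A. Oßwald, R. Scheffel, S. Strathmann and K. Huth within the project nestor - Kompetenznetzwerk Langzeitarchivierung und Langzeitverfügbarkeit digitaler Ressourcen für Deutschland (network of expertise in long-term preservation and long-term availability of digital resources for Germany). urn:nbn:de:0008-2010071949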

Best practices for sharing and archiving datasets – Polar data catalogue, 2014

Long-term preservation of biomedical research data, 2018

Scientific collections, 2009: scientific collections also comprise artefacts, technical objects and DNA samples

The FAIR Principles: First Generation Implementation Choices and Challenges (all articles in a single PDF), 2019

FAIR Data and Services in Biodiversity Science and Geoscience, DiSSCo context, Lannom et al. 2019

Provisional Data Management Plan for DiSSCo infrastructure, 2019: "All data that can be linked to collection objects (specimens) are in scope."

DiSSCo Technical Infrastructure, see also DiSSCo Prepare and DiSSCo Knowledge Base

RDA group: Interoperable Data Archiving and Migration Using the RDRI Working Group Recommendations, with iRODS and DVUploader (see https://www.rd-alliance.org/sites/default/files/InteroperableDatasetExchange.RDA2020_0.pdf). The approach uses the BagIt specification complemented with BagIt Profiles and recommends including DataCite metadata in each package.
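The BagIt packaging with DataCite metadata can be sketched in Python with the standard library only. The sketch follows the layout of RFC 8493 (BagIt 1.0): a `bagit.txt` declaration, a `data/` payload directory, a payload manifest and an optional `bag-info.txt`. Placing the DataCite record at `metadata/datacite.xml` is an assumption about the RDRI BagIt profile and should be checked against the profile actually used; `make_bag` is a hypothetical helper, and for production work a maintained library such as bagit-python is preferable.

```python
import hashlib
import shutil
from datetime import date
from pathlib import Path


def _sha256(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def make_bag(payload_files: list[Path], bag_dir: Path, datacite_xml: str) -> Path:
    """Build a minimal BagIt 1.0 bag with a DataCite tag file (hypothetical helper).

    Flat payload layout: assumes unique file names without CR, LF or '%'
    (RFC 8493 requires percent-encoding of those in manifests).
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an existing bag

    manifest, total_bytes = [], 0
    for src in payload_files:
        dest = data_dir / src.name
        shutil.copy2(src, dest)
        total_bytes += dest.stat().st_size
        manifest.append(f"{_sha256(dest)}  data/{src.name}\n")

    def write(rel: str, text: str) -> None:
        (bag_dir / rel).write_text(text, encoding="utf-8")

    write("bagit.txt", "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n")
    write("manifest-sha256.txt", "".join(manifest))
    # Payload-Oxum = total payload bytes "." number of payload files
    write("bag-info.txt",
          f"Bagging-Date: {date.today().isoformat()}\n"
          f"Payload-Oxum: {total_bytes}.{len(payload_files)}\n")

    # ASSUMPTION: DataCite XML stored as a tag file under metadata/; check the target BagIt profile.
    (bag_dir / "metadata").mkdir()
    write("metadata/datacite.xml", datacite_xml)

    # Tag manifest: checksums of the tag files written above.
    tags = ["bagit.txt", "bag-info.txt", "manifest-sha256.txt", "metadata/datacite.xml"]
    write("tagmanifest-sha256.txt", "".join(f"{_sha256(bag_dir / t)}  {t}\n" for t in tags))
    return bag_dir
```

Because every payload and tag file is listed with its checksum, a receiving archive can validate the package on arrival before ingest.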

RDA group Research Data Repository Interoperability WG Final Recommendations with pdf.

RDA group FAIR Data Maturity Model WG

RDA group Assessment of Data Fitness for Use WG

Wikipedia Digital preservation

Neuroth et al. 2014: nestor - Langzeitarchivierung von Forschungsdaten: eine Bestandsaufnahme (Long-term archiving of research data: a stocktaking)

Data complexity (the size and intricacy of data), with the dimensions size, structure, variety and abstraction

Researchers often struggle to find appropriate repositories for the data underlying their publications; see the data repositories recommended by Nature's Scientific Data at https://www.nature.com/sdata/policies/repositories and the data preservation policies stated there

Bähr, T. 2016. Dienstleistungen für die Digitale Langzeitarchivierung (Services for digital long-term preservation)

Standards, ISO norms, standardisation

Table on ISO and DIN norms relevant for archiving

Norm | Title | Purpose / notes
ISO 11506:2017 | Document management applications – Archiving of electronic data – Computer output microform (COM)/Computer output laser disc (COLD) | Applies to different types of electronic data, such as text and two-dimensional graphic data that can be represented as a black-and-white image.
ISO 14641:2018 | Electronic document management – Design and operation of an information system for the preservation of electronic documents – Specifications | Note: not applicable to information systems in which users can substitute or alter documents after capture.
ISO 14721:2012 | Space data and information transfer systems – Open archival information system (OAIS) – Reference model | See Reference Model for an Open Archival Information System (OAIS): OAIS/ISO 14721, version 2012, online: https://public.ccsds.org/pubs/650x0m2.pdf; see also GFBio Overview on ISO Standards for Digital Archives
ISO 15948:2004 | Information technology – Computer graphics and image processing – Portable Network Graphics (PNG): Functional specification | Specifies a datastream and an associated file format, Portable Network Graphics (PNG)
ISO 16363 (TDR) | Space data and information transfer systems – Audit and certification of trustworthy digital repositories | See also GFBio Overview on ISO Standards for Digital Archives
ISO 16919:2014 | Space data and information transfer systems – Requirements for bodies providing audit and certification of candidate trustworthy digital repositories | See also GFBio Overview on ISO Standards for Digital Archives
ISO 19005-1:2005 | Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1) | Specifies how to use the Portable Document Format (PDF) 1.4 for long-term preservation of electronic documents; applicable to documents containing combinations of character, raster and vector data.
ISO 19566-1:2016 | Information technology – JPEG Systems – Part 1: Packaging of information using codestreams and file formats | Describes common elements of a system layer for JPEG standards, referred to as JPEG Systems (example: JPG)
ISO 20614:2017 | Information and documentation – Data exchange protocol for interoperability and preservation (DEPIP) | DEPIP specifies a standardized framework for the various data exchange transactions (covering both data and related metadata) between an archive and its producers and consumers. Interchanges between archives (including archives integrated in organizations, public archives, storage service suppliers) are also considered....
DIN 31644:2012-04 | Information und Dokumentation – Kriterien für vertrauenswürdige digitale Langzeitarchive (Information and documentation – Criteria for trustworthy digital archives)
DIN 31645:2011-11 | Leitfaden zur Informationsübernahme in digitale Langzeitarchive (Information and documentation – Guide to the transfer of information objects into digital long-term archives)



nestor – standardisation at the Deutsche Nationalbibliothek (DNB)

Digital Object Architecture (DOA) with the DONA specification, 2018

SIARD file format and standard eCH-0165 SIARD Format Specification (SIARD = Software-Independent Archival of Relational Databases), 2018. It is a normative description of a file format for the long-term preservation of relational databases; see eCH-0165 SIARD Format Specification
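A SIARD archive is a ZIP container, so its table layout can be inspected with the standard library. A minimal Python sketch, assuming the SIARD 2 layout as we read the specification (database metadata in `header/metadata.xml`, table data in `content/schemaN/tableM/tableM.xml`); `list_siard_tables` is a hypothetical helper, and eCH-0165 remains the authoritative description:

```python
import re
import zipfile

# ASSUMPTION: SIARD 2.x layout as we read the specification (verify against eCH-0165):
# database metadata in header/metadata.xml, table data in content/schemaN/tableM/tableM.xml.
_TABLE_ENTRY = re.compile(r"content/(schema\d+)/(table\d+)/\2\.xml")


def list_siard_tables(siard_file) -> dict[str, list[str]]:
    """Map each schema folder to its table folders, using ZIP entry names only.

    `siard_file` may be a path or a binary file-like object (hypothetical helper).
    """
    tables: dict[str, list[str]] = {}
    with zipfile.ZipFile(siard_file) as zf:
        names = zf.namelist()
        if "header/metadata.xml" not in names:
            raise ValueError("no header/metadata.xml: probably not a SIARD 2 archive")
        for name in names:
            m = _TABLE_ENTRY.fullmatch(name)
            if m:
                tables.setdefault(m.group(1), []).append(m.group(2))
    # Numeric sort so that table10 follows table9 (and schema10 follows schema9).
    return {
        schema: sorted(folders, key=lambda f: int(f[len("table"):]))
        for schema, folders in sorted(tables.items(), key=lambda kv: int(kv[0][len("schema"):]))
    }
```

Only entry names are read here; mapping the folder names back to the original schema and table names would require parsing `header/metadata.xml`.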

Software tools

DBPTK (Database Preservation Toolkit)

SIARD Suite with SIARD Suite GitHub

KEEP Solutions Portugal, tools for preservation

E-ARK with Deliverables, E-ARK AIP pilot specification and E-ARK SIP Specification for Submission Information Packages

eArchiving project services and tools

LOCKSS technology with LOCKSS software

Preserveware – A digital preservation hub, Tools

PERICLES GitHub and the publication PERICLES_AV_Insider_rd_publication.pdf

CORDRA software - Highly configurable software for managing digital objects at scale.

CNRI Technical projects

Open Preservation Foundation Products

Rosetta from ExLibris group

Community standards for data exchange in collection domain

Open question: would including schema definitions (XSD) improve functional long-term preservation?

Data exchange standards, protocols and formats relevant for the collection data domain, overview with emphasis on the GFBio network

FAIRsharing.org

Digital Curation Centre DCC: List of disciplinary metadata standards, see also Metadata Guidance and RDA Metadata Standards Directory

Archive (file) formats and archive files

Electronic file formats, see "Archiving"

PRONOM is a web-based technical registry to support digital preservation services, maintained by The National Archives (UK), the successor of the Public Record Office. It is an operational public file format registry; see PRONOM
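Registries such as PRONOM record, among other things, byte signatures that tools like DROID use to identify file formats. The idea can be sketched in Python with a hypothetical, much-reduced table of well-known leading bytes ("magic numbers"); this is an illustration only, not PRONOM's signature format, and it returns no PRONOM identifiers (PUIDs):

```python
from typing import Optional

# HYPOTHETICAL, much-reduced signature table: leading bytes at offset 0 only.
# Real PRONOM/DROID signatures also use offsets, wildcards and end-of-file patterns.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "Portable Network Graphics (PNG)"),
    (b"%PDF-", "Portable Document Format (PDF)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"PK\x03\x04", "ZIP container (also used by SIARD, DOCX and others)"),
]


def identify(header: bytes) -> Optional[str]:
    """Return the first format whose signature matches the start of `header`."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def identify_file(path) -> Optional[str]:
    """Read only the first bytes of a file; enough for the signatures above."""
    with open(path, "rb") as fh:
        return identify(fh.read(16))
```

A leading-byte match only says which format family a file belongs to: a PDF/A file (ISO 19005) starts with the same bytes as any other PDF, so conformance has to be checked with a validator.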

FACILE – Service de validation de formats (format validation service): check whether your documents are eligible for archiving on the CINES PAC platform

Sustainability of Digital Formats: Planning for Library of Congress Collections: Format Descriptions

File format. A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free and may be either unpublished or open.

List of archive formats

Archive file

Digital File Types - Preservation

Best File Formats for Archiving by Fabian M. Suchanek, 2019

Recommended File Formats for Archiving Research Data

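Media Types (formerly known as MIME types): registration procedures are specified in RFC 6838; RFC 6657 updates the handling of the charset parameter in textual media types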
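When recording the media type of an archived file, the `charset` parameter of textual types matters for later interpretation. A minimal Python sketch that normalises a Content-Type value with the standard library's `email.message.Message`, which handles quoted values and case-insensitive parameter names; `parse_media_type` is a hypothetical helper:

```python
from email.message import Message
from typing import Optional, Tuple


def parse_media_type(value: str) -> Tuple[str, Optional[str]]:
    """Normalise a Content-Type value to (type/subtype, charset).

    Both parts come back lower-cased; charset is None when absent. The stdlib
    email parser takes care of quoted values and case-insensitive parameter names.
    """
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()
```

For a malformed value without exactly one slash, `get_content_type` falls back to `text/plain`, so this normalises values but should not be used as a validator.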

FAIR data archiving and "distributed" data archiving, visions and concepts

LOCKSS: Lessons Learned in Successful Community Collaboration; LOCKSS as a digital library program in the digital preservation field

Archiving in a FAIR way, an Overview of Data Archive Costs

Prompting an EOSC in practice: Final report 2018

Save Archive FEderation SAFE-PLN with MoU

Archiving and long-term storage organisations in Europe with AIPs from the Science Collection domain

The table includes an initial selection of trusted data repositories/data centers whose goals include archiving scientific collection and biodiversity research data (last updated April 2020).

Name, country | Kind of organisation / affiliation with respect to archiving services | General mission, scope | Science collection metadata standards used for AIPs (see GFBio checklist) | Archive formats (see FACILE checklist) | References and pilot studies | AIP PIDs | Contact person in WG4 context (preferably WG4 members) | Notes, certification
CINES, France | national public institution | national e-infrastructure | Dublin Core with extension? | FACILE list for archiving on the CINES PAC platform | pilot description, ICEDIG document | ePIC... | Nicolas Cazenave | archiving together with EUDAT-CDI?
EGI, The Netherlands | a federated (European) e-infrastructure, publicly funded | European and international e-infrastructure | Dublin Core with extension? | various archive formats? | ?
FinBIF, archiving network, Finland | service infrastructure at one natural history museum, publicly funded | national e-infrastructure | Darwin Core | few selected archive formats?, e.g., XML+XSD?, JPEG 1.0?, TIFF? | pilot description, ICEDIG document, Schulman et al. (2021) | HTTP URI... | ?
GBIF data publishers, network of long-term storage and archiving institutions/organizations | an international federated e-infrastructure, funded by member states and by individual participating archiving institutions | international + national + institutional e-infrastructure | archiving done by GBIF data publishers via Darwin Core, see GBIF Darwin Core; alternatively ABCD | few selected archive formats?, e.g., XML+XSD, JPEG 1.0? | | DOI, HTTP URI... | Fabien Caviere | local installation of the IPT (GBIF Integrated Publishing Toolkit) or the BioCASe provider software, generating AIPs for local archiving of published data assets; GBIF nodes may act as data publishers at the national level; GBIF downloads are stored on GBIF servers for 6 months, see https://www.gbif.org/faq?q=DOI
GFBio network of data centers and archiving institutions, Germany | service infrastructure at several natural history museums and other archiving institutions, publicly funded | national + institutional e-infrastructure | ABCD, Darwin Core | few selected archive formats?, e.g., XML+XSD?, JPEG 1.0?, TIFF?, WAV? and others? | data archiving descriptions | HTTP URI, DOI ... | Peter Grobe, Tanja Weibulat | AIPs for archiving published and non-published data assets from the science collection domain; partly together with regional (super)computing centers
GWDG, Germany | institute operated and funded by the University of Göttingen and the Max-Planck-Gesellschaft zur Förderung der Wissenschaften e. V. (GmbH) | international + institutional e-infrastructure | ? | various archive formats? | | ePIC, DOI | Sven Bingert | offers an ePIC service for AIPs from different science domains, with public repositories for scientific data
VIAA, now meemoo, Belgium | Belgian/Flemish institute for archives, publicly funded | national and regional e-infrastructure | Dublin Core with extension? | various archive formats used in the library domain? | ? | | Brecht Declercq | Flemish Institute for Archives
Zenodo, Switzerland | public service operated by CERN (itself funded by its member states) | international e-infrastructure | OAI-PMH and others, see Zenodo metadata formats | various archive formats used in the library domain? | pilot description, ICEDIG document | DOI... | Donat Agosti? | general-purpose open-access repository; AIPs for archiving published data assets

Images of scientific collections, of collection objects and their parts, and of natural science taxa, together with occurrence and descriptive data, are the focus of scientific collections. Other images and information gained in research studies and published in scientific papers may be linked to scientific collection object data. Such data may be stored long-term and even archived, e.g., by the BioImage Archive (see Ellenberg et al. 2018), the BioStudies Archive (see Sarkans et al. 2018) and ArrayExpress.

see also TIB Hannover, https://www.tib.eu/de/publizieren-archivieren/digitale-langzeitarchivierung and https://www.tib.eu/fileadmin/Daten/presse/dokumente/baehr-schwab_bub_2018-11.pdf

Further materials for discussion

https://www.gbif.org/data-processing

FAIR principles

Data on the Web Best Practices

LERU Roadmap for Research Data

ICEDIG Deliverables: https://icedig.eu/content/deliverables

Digitisation infrastructure design for EUDAT/CINES, report 2019: specifies the requirements for adapting CINES and EUDAT services to the long-term storage of large-scale digitised biodiversity data

Digitisation infrastructure design for Zenodo 2019

Design of a collection digitisation dashboard, report 2019, with MIDS (Minimum Information about a Digital Specimen) in Table 8

Digitalisation infrastructure for national open science clouds, report 2019, Finland

California Digital Library with CDL Guidelines for Digital Objects (CDL GDO)

What is a digital object (philosophy)

LIBER Fairness Repositories Report

IANUS IT-Empfehlungen für den nachhaltigen Umgang mit digitalen Daten in den Altertumswissenschaften: Datenbanken (IT recommendations for the sustainable handling of digital data in ancient studies: databases)

Levels of digital preservation of the National Digital Stewardship Alliance (NDSA)

Digital Preservation Handbook of the Digital Preservation Coalition

CETAF Specimen Preview Profile (SPP) with Sourceforce Persistent Collection Objects Identifiers and Best practices for stable URIs and CETAF Specimen URI Tester, see Güntsch et al. (2017) Actionable, long-term stable and semantic web compatible identifiers for access to biological collection objects

UK National Archives: Archive Principles and Practice: an introduction to archives for non-archivists, 2016, see sections 3.5.6 and 3.5.7

GFBio Data archiving

Integrating Institutional Archives with Disciplinary Web Repositories Workshop (iDigBio related, January 2020)

GFBio OAIS standard data pipelines for collection and specimen data

Heuscher, S., Jaermann, S., Keller-Marxer, P. & Moehle, F. 2004. Providing Authentic Long-term Archival Access to Complex Relational Data. Proceedings of the ESA/ESRIN Symposium PV-2004: Ensuring Long-Term Preservation and Adding Value to Scientific and Technical Data, Frascati, Italy, October 2004. ESA WPP, 241–261. See https://arxiv.org/abs/cs/0408054 (with PDF)

CoreTrustSeal Certification Glossary, based on OAIS terms

UUID discussion: Triebel, D., Reichert, W., Bosert, S., Feulner, M., Osieko Okach, D., Slimani, A. & Rambold, G. 2018. A generic workflow for effective sampling of environmental vouchers with UUID assignment and image processing. – Database, 2018 (Article ID bax096), 1–10. (doi.org/10.1093/database/bax096), see https://academic.oup.com/database/article-abstract/doi/10.1093/database/bax096/4797113.
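The cited workflow assigns UUIDs to vouchers; its actual implementation is not reproduced here. A minimal Python sketch of generating random (version 4) UUIDs as defined in RFC 4122 and accepting only their canonical hyphenated form; `new_voucher_id` and `is_canonical_uuid` are hypothetical names:

```python
import uuid


def new_voucher_id() -> str:
    """Random (version 4) UUID in canonical lower-case 8-4-4-4-12 form."""
    return str(uuid.uuid4())


def is_canonical_uuid(value: str) -> bool:
    """True only for the hyphenated 36-character form (either letter case)."""
    try:
        # uuid.UUID also accepts braces, 'urn:uuid:' or no hyphens; comparing with
        # the canonical string rejects those variants so printed labels stay uniform.
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False
```

`uuid.UUID(value).urn` gives the `urn:uuid:` form where a URN is needed. Version 4 UUIDs need no central registry, which suits decentralised field sampling, but on their own they do not resolve to any location.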

Handle System and handle-based systems, e.g., the DOI system with DOI registration agencies such as DataCite

LSID (Life Science Identifier) is a URN specification; see the LSID resolver and LSID in Wikipedia. Issuing authorities include ZooBank. The LSID URN namespace is not registered with the Internet Assigned Numbers Authority (IANA).
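LSIDs follow the pattern `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. A minimal Python parser for that pattern; it checks structure only and does not resolve the identifier, and `parse_lsid` is a hypothetical name:

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(value: str) -> Optional[LSID]:
    """Split an LSID into its parts; return None if the structure does not match."""
    parts = value.split(":")
    # "urn" and "lsid" are case-insensitive; 5 parts without revision, 6 with one.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        return None
    if any(not p for p in parts[2:]):  # no empty authority, namespace, object or revision
        return None
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)
```

Because the parts are colon-delimited, an object identifier containing a colon cannot be represented in this form.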

CETAF stable identifiers (CSI) for specimens, see Güntsch, A., Hyam, R., Hagedorn, G., Chagnoux, S., Röpert, D., Casino A., Droege, G., Glöckler, F., Gödderz, K., Groom, Q., Hoffmann, J., Holleman, A., Kempa, M., Koivula, H., Marhold, K., Nicolson, N., Smith, V. S. & Triebel, D. 2017. Actionable, long-term stable and semantic web compatible identifiers for access to biological collection objects. – Database, 2017, 1–9. (doi.org/10.1093/database/bax003)
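The CETAF identifiers are designed as actionable HTTP URIs that stay stable while serving both humans and machines. A minimal Python sketch of the server-side decision, assuming the HTTP 303 See Other plus content-negotiation pattern common in Linked Data; the base URL, the `/rdf` and `/html` targets and `resolve_specimen` are hypothetical, and q-values are ignored for brevity:

```python
from typing import Tuple

BASE = "https://example.org/specimen/"  # HYPOTHETICAL stable URI prefix
RDF_TYPES = {"application/rdf+xml", "text/turtle"}  # RDF serialisations assumed on offer


def resolve_specimen(specimen_id: str, accept: str) -> Tuple[int, str]:
    """Answer a request for BASE + specimen_id with (status, Location).

    The stable URI itself never changes; only the redirect targets (hypothetical
    /rdf and /html documents) may move. Any listed RDF type wins; q-values ignored.
    """
    wanted = {part.split(";")[0].strip().lower() for part in accept.split(",")}
    target = "rdf" if wanted & RDF_TYPES else "html"
    return 303, f"{BASE}{specimen_id}/{target}"
```

Keeping the stable URI separate from its representations is what lets institutions change web systems without breaking citations of specimens.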

HTTP URI (see RFC3986)

ICEDIG Digital Specimen Repository with Natural Science Identifier (NSId); see https://nsidr.org/#objects/?query=*%3A*&sortFields=/name, issuing authority, e.g., https://www.allianceforbio.org/: "Technically, an NSId is a unique alphanumeric name string registered in the Handle System that acts as an opaque abstract reference to the thing that is identified; in this case, a Digital Specimen. Administration of the Handle System globally is a shared responsibility overseen at its top-level by the DONA Foundation (Geneva). At the sub-global (Europe and other continents) level, DiSSCo is presently (mid-2020) analysing different options for the technical implementation."

DOI; see also Handle Resolver under https://www.handle.net/
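DOIs are handles, so a DOI or any other handle (including ePIC PIDs and the NSIds described above) can be resolved through the Handle proxy's JSON interface. A minimal Python sketch, assuming the `https://hdl.handle.net/api/handles/<handle>` endpoint and its response layout (`responseCode` 1 for success, a `values` list whose entries carry a `type` and `data.value`); the helper names are hypothetical:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PROXY = "https://hdl.handle.net/api/handles/"  # ASSUMED Handle proxy REST endpoint


def handle_api_url(handle: str) -> str:
    """Build the REST URL; keep the prefix/suffix slash, percent-encode the rest."""
    return PROXY + quote(handle, safe="/")


def url_values(record: dict) -> list[str]:
    """Extract the target URL(s) from a Handle proxy JSON record."""
    if record.get("responseCode") != 1:
        raise LookupError(f"handle lookup failed: responseCode={record.get('responseCode')}")
    return [v["data"]["value"] for v in record.get("values", []) if v.get("type") == "URL"]


def resolve(handle: str, timeout: float = 10.0) -> list[str]:
    """Network call: fetch the handle record and return its URL values."""
    with urlopen(handle_api_url(handle), timeout=timeout) as resp:
        return url_values(json.load(resp))
```

For example, `resolve("10.1093/database/bax003")` would return the landing-page URL(s) registered for the DOI of Güntsch et al. (2017); it needs network access, whereas `url_values` can be tested offline on a stored record.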

multilingual European DOI Registration Agency (mEDRA) with DOI Multiple Resolution, a multiple resolution service in which one DOI points to multiple resources and services associated with the DOI, and "Learning objects"

IGSN handle system for geosamples, with IGSN e.V. as registration service and a number of allocating agents of the IGSN

ePIC Persistent Identifiers for eResearch, handle system

DID: Decentralized Identifiers, W3C proposed recommendation
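A DID has the form `did:<method>:<method-specific-id>`. A minimal Python check of that syntax, transcribed from the ABNF in the W3C DID Core specification as we read it (method names in lower-case letters and digits; identifier characters are letters, digits, `.`, `-`, `_` and percent-encodings, with `:` as an internal separator). DID URL paths, queries and fragments are out of scope, and the names are hypothetical:

```python
import re
from typing import NamedTuple, Optional

# idchar = ALPHA / DIGIT / "." / "-" / "_" / pct-encoded
_IDCHAR = r"(?:[A-Za-z0-9._-]|%[0-9A-Fa-f]{2})"
# did = "did:" method-name ":" method-specific-id ; the id must not end with ":"
_DID = re.compile(rf"did:([a-z0-9]+):((?:{_IDCHAR}*:)*{_IDCHAR}+)")


class DID(NamedTuple):
    method: str
    method_specific_id: str


def parse_did(value: str) -> Optional[DID]:
    """Return the method and method-specific id, or None if the syntax is invalid."""
    m = _DID.fullmatch(value)
    return DID(m.group(1), m.group(2)) if m else None
```

For example, `parse_did("did:example:123456789abcdefghi")` yields method `example`; whether a DID actually resolves depends on its method, which this syntax check does not cover.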

A Persistent Identifier (PID) policy for the European Open Science Cloud (EOSC), 2020, doi: 10.2777/926037; Wittenburg is a co-author

PIDINST - Persistent identifiers for instruments. https://datascience.codata.org/articles/10.5334/dsj-2020-018/; (ePIC related)

ARK IDs, Archival Resource Key with N4T Resolver: Keeps names (identifiers) persistent, forwarding (resolving) them to the best known web addresses. Names --> Things: Any kind of name – ARK, DOI, URN, Handle, PMID, PDB, Taxon, GRID, arxiv, ISSN, ... --> Any kind of thing – web pages, data, physical specimens, vocabulary terms, living beings, groups, ...; for authorities see https://n2t.net/e/ark_ids.html; cooperation with DataCite

REFEDS (Research and Education FEDerations) with eduPersons

Arms, W.Y. 1995. Key Concepts in the Architecture of the Digital Library. https://www.dlib.org/dlib/July95/07arms.html

Kahn R. & Wilensky R. 1995. A Framework for Distributed Digital Object Services. http://www.cnri.reston.va.us/home/cstr/arch/k-w.html with CNRI and DOI

Weigel, T., Kindermann, S. and Lautenschlager, M., 2014. Actionable Persistent Identifier Collections. Data Science Journal, 12, pp.191–206. DOI: http://doi.org/10.2481/dsj.12-058

Klump, J et al 2017 Editorial: 20 Years of Persistent Identifiers – Applications and Future Directions. Data Science Journal, 16: 52, pp. 1–7, DOI: https://doi.org/10.5334/dsj-2017-052

T. Weigel, U. Schwardmann, J. Klump, S. Bendoukha & R. Quick. Making data and workflows findable for machines. Data Intelligence 2(2020), 40–46. doi: 10.1162/dint_a_00026

Schwardmann, U., 2020. Digital Objects – FAIR Digital Objects: Which Services Are Required?. Data Science Journal, 19(1), p.15. DOI: http://doi.org/10.5334/dsj-2020-015

Harjes, J., Link, A., Weibulat, T., Triebel, D. & Rambold, G. 2020. FAIR digital objects in environmental and life sciences should comprise workflow operation design data and method information for repeatability of study setups and reproducibility of results, Database, 2020 (Article ID baaa059), 1–20. (doi.org/10.1093/database/baaa059).

Natural Science Identifiers versus CETAF stable identifiers with discussion on IGSN relation

CatRIS - Catalogue of Research Infrastructure Services with ELViS listed

The research data repository of the Environmental Data Initiative (EDI) and LTER initiative: Gries et al. 2020. Change in Pictures: Creating best practices in archiving ecological imagery for reuse; see also https://dilcis.eu/images/Specifications/AIP/DASBOARD_E-ARK_AIP_1_0.pdf

EUROCRIS with The Common European Research Information Format (CERIF). It is the comprehensive information model for the domain of scientific research. It is intended to support interchange of research information between and with CRISs. It is used by OpenAIRE.

Archiving of BLOBs and similar large binary objects, see https://de.wikipedia.org/wiki/Binary_Large_Object

An Overview of End-to-End Entity Resolution for Big Data: https://dl.acm.org/doi/abs/10.1145/3418896 (relation between linked entities and physical objects)

Costs of federal scientific collections

https://iwgsc.nal.usda.gov/economic-analyses-federal-scientific-collections

Schindel, D. E. and the Economic Study Group of the Interagency Working Group on Scientific Collections. 2020. “Economic Analyses of Federal Scientific Collections: Methods for Documenting Costs and Benefits.” Report. Washington, DC: Smithsonian Scholarly Press. https://doi.org/10.5479/si.13241612

2019: Biodiversity_Next Symposium (SI55): "Federated Infrastructures for Sustainable Biodiversity Data Management"

SI55 talks:

There were six talks with published abstracts (see hyperlinks and DOIs); four of them were closely related to WP4. Together they gave a good overview of the landscape of federated repositories in the biodiversity domain.

2020: CETAF Joint ISTC and Digitisation Working Groups Virtual Meeting

COST MOBILISE WG4 talk:



Further materials




Back to Working Group WG4

Back to WG4 Workshop "Data storage and archiving strategies" in Sofia (NMNHS)

Back to WG4 Workshop "Towards a documentation and guideline" in Warsaw

Back to MOBILISE website

see also Definitions of core terms in the data archiving context