Difference between revisions of "Useful links and materials"

From COST-MOBILISE Wiki
(General)
(Archive (file) formats and archive files)
(8 intermediate revisions by the same user not shown)
Line 6: Line 6:
 
[https://irods.org/uploads/2014/07/Principles-of-Archival-of-Digital-Assets.pdf Principles of Archival of Digital Assets], published by [https://irods.org/ iRODS], 2014 (bit preservation and functional preservation)
 
[https://irods.org/uploads/2014/07/Principles-of-Archival-of-Digital-Assets.pdf Principles of Archival of Digital Assets], published by [https://irods.org/ iRODS], 2014 (bit preservation and functional preservation)
  
[https://opus4.kobv.de/opus4-fhpotsdam/frontdoor/index/index/docId/1373 Digitale Bestandserhaltung in der Praxis - Entwicklung eines Preservation-Planning-Konzepts zur Langzeitarchivierung von digitalem Kulturgut am Beispiel der Verbundlösung Berlin-Brandenburg] by C. Loose, 2016, FH Potsdam
+
[https://opus4.kobv.de/opus4-fhpotsdam/frontdoor/index/index/docId/1373 Digitale Bestandserhaltung in der Praxis Entwicklung eines Preservation-Planning-Konzepts zur Langzeitarchivierung von digitalem Kulturgut am Beispiel der Verbundlösung Berlin-Brandenburg] by C. Loose, 2016, FH Potsdam
  
 
[https://d-nb.info/1047826933/34 Funktionale Langzeitarchivierung digitaler Objekte –  Erfolgsbedingungen des Einsatzes von Emulationsstrategien], Suchodoletz  2009, Universität Freiburg  
 
[https://d-nb.info/1047826933/34 Funktionale Langzeitarchivierung digitaler Objekte –  Erfolgsbedingungen des Einsatzes von Emulationsstrategien], Suchodoletz  2009, Universität Freiburg  
  
[http://www.arcticnet.ulaval.ca/docs/PDC_Best_Practices_FULL.pdf Best practices for sharing and archiving datasets - Polar data catalogue], 2014
+
[http://www.arcticnet.ulaval.ca/docs/PDC_Best_Practices_FULL.pdf Best practices for sharing and archiving datasets Polar data catalogue], 2014
  
 
[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6144948/ Long-term preservation of biomedical research data], 2018
 
[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6144948/ Long-term preservation of biomedical research data], 2018
Line 36: Line 36:
 
===Standards, iso norms, standardisation===
 
===Standards, iso norms, standardisation===
  
[https://wiki.dnb.de/pages/viewpage.action?pageId=120327710 Nestor- Standardisation] by DNB
+
[https://wiki.dnb.de/pages/viewpage.action?pageId=120327710 Nestor Standardisation] by DNB
  
[https://www.iso.org/standard/74338.html ISO 14641:2018: Electronic document management -- Design and operation of an information system for the preservation of electronic documents -- Specifications]  
+
[https://www.iso.org/standard/74338.html ISO 14641:2018: Electronic document management Design and operation of an information system for the preservation of electronic documents Specifications]  
  
[https://www.iso.org/standard/68562.html ISO 20614:2017: Information and documentation -- Data exchange protocol for interoperability and preservation (DEPIP)]
+
[https://www.iso.org/standard/68562.html ISO 20614:2017: Information and documentation Data exchange protocol for interoperability and preservation (DEPIP)]
  
 
[https://gfbio.biowikifarm.net/wiki/ISO_Standards_for_Digital_Archives Iso standards for Digital Archives], including OAIS reference model; overview with emphasis on the [https://www.gfbio.org/ GFBio] network
 
[https://gfbio.biowikifarm.net/wiki/ISO_Standards_for_Digital_Archives Iso standards for Digital Archives], including OAIS reference model; overview with emphasis on the [https://www.gfbio.org/ GFBio] network
Line 62: Line 62:
 
[https://www.lockss.org/ LOCKSS technology] with [https://lockss.github.io/ LOCKSS software]
 
[https://www.lockss.org/ LOCKSS technology] with [https://lockss.github.io/ LOCKSS software]
  
[http://www.preserveware.com/browse/tools/ Preserveware -- A  digital preservation hub, Tools]
+
[http://www.preserveware.com/browse/tools/ Preserveware A  digital preservation hub, Tools]
  
 
[https://github.com/pericles-project PERICLES github] and publication under file:///C:/Users/Gast/Downloads/PERICLES_AV_Insider_rd_publication.pdf
 
[https://github.com/pericles-project PERICLES github] and publication under file:///C:/Users/Gast/Downloads/PERICLES_AV_Insider_rd_publication.pdf
Line 75: Line 75:
 
===Archive (file) formats and archive files===
 
===Archive (file) formats and archive files===
  
[https://facile.cines.fr/ FACILE - Service de validation de formats: Vérifier l'éligibilité de vos documents à un archivage sur la plateforme PAC du CINES]  
+
[https://facile.cines.fr/ FACILE Service de validation de formats: Vérifier l'éligibilité de vos documents à un archivage sur la plateforme PAC du CINES]  
  
 
[https://en.wikipedia.org/wiki/File_format File format]. A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free and may be either unpublished or open.
 
[https://en.wikipedia.org/wiki/File_format File format]. A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free and may be either unpublished or open.
Line 84: Line 84:
  
 
[https://suchanek.name/texts/archiving/#TOC1 Best File Formats for Archiving] by Fabian M. Suchanek, 2019
 
[https://suchanek.name/texts/archiving/#TOC1 Best File Formats for Archiving] by Fabian M. Suchanek, 2019
 +
 +
[https://www.ub.tum.de/forschungsdaten-publizieren Recommended File Formats for Archiving Research Data]
  
 
Public Record Office and Nôm 喃 ([https://en.wikipedia.org/wiki/PRONOM PRONOM]) is a web-based technical registry to support digital preservation services. It is an operational public file format registry.
 
Public Record Office and Nôm 喃 ([https://en.wikipedia.org/wiki/PRONOM PRONOM]) is a web-based technical registry to support digital preservation services. It is an operational public file format registry.
Line 197: Line 199:
  
 
COST MOBILISE WG4 talk:
 
COST MOBILISE WG4 talk:
*[https://species-id.net/o/media/1/1b/CETAF-ISTC-2020-COST-MOBILISE-WG4_Archiving_fin.pdf Data archiving strategies in regard to CETAF facilities and planned DiSSCo services - highlighted by COST Mobilise] (Dagmar Triebel)
+
*[https://species-id.net/o/media/1/1b/CETAF-ISTC-2020-COST-MOBILISE-WG4_Archiving_fin.pdf Data archiving strategies in regard to CETAF facilities and planned DiSSCo services highlighted by COST Mobilise] (Dagmar Triebel)
  
  

Revision as of 14:25, 28 July 2020

General

EOSC Marketplace for Data Storage and Data Archiving, see also OCRE

Principles of Archival of Digital Assets, published by iRODS, 2014 (bit preservation and functional preservation)

Digitale Bestandserhaltung in der Praxis – Entwicklung eines Preservation-Planning-Konzepts zur Langzeitarchivierung von digitalem Kulturgut am Beispiel der Verbundlösung Berlin-Brandenburg by C. Loose, 2016, FH Potsdam

Funktionale Langzeitarchivierung digitaler Objekte – Erfolgsbedingungen des Einsatzes von Emulationsstrategien, Suchodoletz 2009, Universität Freiburg

Best practices for sharing and archiving datasets – Polar data catalogue, 2014

Long-term preservation of biomedical research data, 2018

Scientific collections, 2009; these also comprise artefacts, technical objects and DNA samples

The FAIR Principles: First Generation Implementation Choices and Challenges (all articles in a single PDF), 2019

FAIR Data and Services in Biodiversity Science and Geoscience, DiSSCo context, Lannom et al. 2019

Provisional Data Management Plan for DiSSCo infrastructure, 2019: "All data that can be linked to collection objects (specimens) are in scope."

DiSSCo Technical Infrastructure, see also DiSSCo Prepare and DiSSCo Knowledge Base

RDA group Interoperable Data Archiving and Migration Using the RDRI Working Group Recommendations with iRODS and DVUploader, see https://www.rd-alliance.org/sites/default/files/InteroperableDatasetExchange.RDA2020_0.pdf; the approach uses the BagIt specification complemented with BagIt Profiles and recommends including DataCite metadata in each package

RDA group Research Data Repository Interoperability WG Final Recommendations with pdf.

RDA group FAIR Data Maturity Model WG

RDA group Assessment of Data Fitness for Use WG

Researchers often have difficulty finding appropriate data repositories for published data; see the data repositories recommended by Nature under https://www.nature.com/sdata/policies/repositories and the data preservation policies there
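
The BagIt packaging mentioned in the RDA entry above can be illustrated with a minimal sketch. It assumes only the basic bag layout from the BagIt specification (a `bagit.txt` declaration, a `data/` payload directory and a `manifest-sha256.txt` checksum manifest). The function names `make_bag` and `verify_bag` are hypothetical, and the sketch omits BagIt Profiles and the recommended DataCite metadata entirely.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict[str, bytes]) -> None:
    """Write a minimal BagIt-style bag (hypothetical helper, flat file names only)."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        # Payload files live under data/; the manifest records their checksums.
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
    # Bag declaration required by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )


def verify_bag(bag_dir: Path) -> bool:
    """Recompute every payload checksum and compare it with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True
```

Re-running a check like `verify_bag` at intervals is one simple way to detect silent corruption of a stored package (bit preservation); it says nothing about whether the payload formats remain usable (functional preservation).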

Standards, ISO norms, standardisation

Nestor – Standardisation by DNB

ISO 14641:2018: Electronic document management – Design and operation of an information system for the preservation of electronic documents – Specifications

ISO 20614:2017: Information and documentation – Data exchange protocol for interoperability and preservation (DEPIP)

ISO standards for Digital Archives, including OAIS reference model; overview with emphasis on the GFBio network

DOA architecture with DONA Specification, 2018

SIARD file format and standard eCH-0165 SIARD format specification (SIARD = Software-Independent Archival of Relational Databases), 2018. This is a normative description of a file format for the long-term preservation of relational databases; see eCH-0165 SIARD Format Specification

Software tools

DBPTK Database Preservation Toolkit

SIARD Suite with SIARD Suite GitHub

KEEP Solutions Portugal, tools for preservation

E-ARK with Deliverables, E-ARK AIP pilot specification and E-ARK SIP Specification for Submission Information Packages

eArchiving project services and tools

LOCKSS technology with LOCKSS software

Preserveware – A digital preservation hub, Tools

PERICLES github and publication under file:///C:/Users/Gast/Downloads/PERICLES_AV_Insider_rd_publication.pdf

Community standards for data exchange in collection domain

Open question: would including schema definitions (as XSD) help to improve functional long-term preservation?

Data exchange standards, protocols and formats relevant for the collection data domain, overview with emphasis on the GFBio network

FAIRsharing.org

Archive (file) formats and archive files

FACILE – Service de validation de formats: Vérifier l'éligibilité de vos documents à un archivage sur la plateforme PAC du CINES

File format. A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free and may be either unpublished or open.

List of archive formats

Archive file

Best File Formats for Archiving by Fabian M. Suchanek, 2019

Recommended File Formats for Archiving Research Data

Public Record Office and Nôm 喃 (PRONOM) is a web-based technical registry to support digital preservation services. It is an operational public file format registry.
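
To illustrate what a format registry supports in practice, the following is a minimal sketch of signature-based format identification, assuming a small hand-picked set of well-known leading byte signatures. The table `SIGNATURES` and the function `identify_format` are hypothetical names introduced here; they are not part of PRONOM or of any tool listed on this page, and real registries describe formats in far more detail.

```python
from typing import Optional

# Illustrative subset of well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"%PDF-", "PDF"),
    (b"PK\x03\x04", "ZIP container"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
]


def identify_format(header: bytes) -> Optional[str]:
    """Return a format name if the first bytes match a known signature."""
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return None  # Unknown to this tiny table; not proof the file is invalid.
```

A caller would typically pass the first few bytes of a file, for example `identify_format(open(path, "rb").read(16))`. A signature match only suggests a format; validation services such as FACILE go further and check whether a file actually conforms to it.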

FAIR data archiving and "distributed" data archiving, visions and concepts

LOCKSS Lessons Learned in Successful Community Collaboration, LOCKSS as digital library program in the digital preservation field

Archiving in a FAIR way, an Overview of Data Archive Costs

Prompting an EOSC in practice: Final report 2018

Safe Archive FEderation SAFE-PLN with MoU

Archiving and long-term storage organisations in Europe with AIPs from the Science Collection domain

The table includes a first selection of trusted data repositories and data centers that aim to archive scientific collection and biodiversity research data (last changed April 2020).

| name, country | kind of organisation/affiliation with respect to archiving services | general mission, scope | science collection metadata standards used for AIPs (see GFBio checklist) | archive formats (see FACILE checklist) | references and pilot studies | AIP-PIDs | contact person in WG4 context (preferably WG4 members) | notes, certification |
| CINES, France | national public institution | national e-infrastructure | DublinCore with extension? | FACILE - list pour un archivage sur la plateforme PAC du CINES | pilot description, ICEDIG document | ePIC... | Nicolas Cazenave | archiving together with EUDAT-CDI? |
| EGI, The Netherlands | a federated (European) e-Infrastructure, publicly funded | European and international e-infrastructure | DublinCore with extension? | various archive formats? |  |  | ? |  |
| FinBIF, archiving network, Finland | service infrastructure at one natural history museum, publicly funded | national e-infrastructure | Darwin Core | few selected archive formats?, e.g., XML+XSD?, JPEG 1.0?, tiff? | pilot description, ICEDIG document | HTTP URI... | ? |  |
| GBIF data publishers, network of long-term storage and archiving institutions/organizations | an international federated e-Infrastructure, funded by member states and by single participating archiving institutions | international e-infrastructure + national e-infrastructure + institutional e-infrastructure | archiving done by GBIF data publishers via Darwin Core, see GBIF Darwin Core; alternatively ABCD | few selected archive formats?, e.g. XML+XSD, JPEG 1.0? |  | DOI, HTTP URI... | Fabien Caviere | local installation of IPT: GBIF Integrated Publishing Toolkit or BioCASe provider software generating AIPs for local archiving of published data assets; GBIF nodes may act as data publishers on the national level; GBIF downloads are stored on GBIF servers for 6 months, see https://www.gbif.org/faq?q=DOI |
| GFBio network of data centers and archiving institutions, Germany | service infrastructure at several natural history museums and other archiving institutions, publicly funded | national e-infrastructure + institutional e-infrastructure | ABCD, Darwin Core | few selected archive formats?, e.g., XML+XSD?, JPEG 1.0?, tiff?, wav? and? | data archiving descriptions | HTTP URI, DOI ... | Peter Grobe, Tanja Weibulat | AIPs for archiving published and non-published data assets from the science collection domain; partly together with regional (super)computing centers |
| GWDG, Germany | institute operated and funded by the University of Göttingen and the Max-Planck-Gesellschaft zur Förderung der Wissenschaften e. V. (GmbH) | international e-infrastructure + institutional e-infrastructure | ? | various archive formats? |  | ePIC, DOI | Sven Bingert | Offering ePIC service for AIPs for different science domains, with public repositories for scientific data |
| VIAA, now meemoo, Belgium | Belgian/Flemish institute for archives, publicly funded | national and regional e-infrastructure | DublinCore with extension? | various archive formats used in library domain?? |  |  | Brecht Declercq | Flemish Institute for Archives |
| Zenodo, Switzerland | public services operated by CERN (the latter funded by member states) | international e-infrastructure | OAI-PMH and others, see under Zenodo metadata formats | various archive formats used in library domain? | pilot description, ICEDIG document | DOI... | Donat Agosti? | general-purpose open-access repository, AIPs for archiving published data assets |

Scientific collections focus on images of the collections themselves, of collection objects and their parts, and of natural science taxa with occurrence and descriptive data. Other images and information obtained in research studies and published in scientific papers might be linked to scientific collection object data. These data might be stored long term and even archived, e.g., by the BioImage Archive, see Ellenberg et al. (2018), BioStudies Archive, see Sarkans et al. (2018) and ArrayExpress.

Further materials for discussion

https://www.gbif.org/data-processing

FAIR principles

Data on the Web Best Practices

LERU Roadmap for Research Data

ICEDIG Deliverables: https://icedig.eu/content/deliverables

Digitisation infrastructure design for EUDAT / CINES: Report 2019: specifies the requirements for adapting CINES & EUDAT services for long-term storage of large-scale digitised biodiversity data

Digitisation infrastructure design for Zenodo 2019

Design of a collection digitisation dashboard: Report 2019, MIDS in Table 8

Digitalisation infrastructure for national open science clouds Report 2019, Finland

California Digital Library with CDL Guidelines for Digital Objects (CDL GDO)

What is a digital object (philosophy)

LIBER Fairness Repositories Report

IANUS IT-Empfehlungen für den nachhaltigen Umgang mit digitalen Daten in den Altertumswissenschaften: Datenbanken

Levels of digital preservation of the National Digital Stewardship Alliance (NDSA)

Digital Preservation Handbook of the Digital Preservation Coalition

CETAF Specimen Preview Profile (SPP) with SourceForge Persistent Collection Objects Identifiers and Best practices for stable URIs and CETAF Specimen URI Tester, see Güntsch et al. (2017) Actionable, long-term stable and semantic web compatible identifiers for access to biological collection objects

UK National Archives: Archive Principles and Practice: an introduction to archives for non-archivists, 2016, see 3.5.6 and 3.5.7

GFBio Data archiving

Integrating Institutional Archives with Disciplinary Web Repositories Workshop (iDigBio related, January 2020)

GFBio OAIS standard data pipelines for collection and specimen data

ePIC Persistent Identifiers for eResearch, DOI and HTTP URI (see RFC3986)

Heuscher, Stephan & Jaermann, Stephan & Keller-Marxer, Peter & Moehle, Frank. (2004). Providing Authentic Long-term Archival Access to Complex Relational Data. Proceedings of the ESA/ESRIN Symposium PV-2004: Ensuring Long-Term Preservation and Adding Value to Scientific and Technical Data, Frascati, Italy, October 2004. ESA WPP. 241-261. See https://arxiv.org/abs/cs/0408054 with pdf

Core Trust Seal Certification Glossary, based on OAIS terms

2019: Biodiversity_Next Symposium (SI55): "Federated Infrastructures for Sustainable Biodiversity Data Management"

SI55 talks:

2020: CETAF Joint ISTC and Digitisation Working Groups Virtual Meeting

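COST MOBILISE WG4 talk:

Data archiving strategies in regard to CETAF facilities and planned DiSSCo services – highlighted by COST Mobilise (Dagmar Triebel)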



Further materials




Back to Working Group WG4

Back to WG4 Workshop "Data storage and archiving strategies" in Sofia (NMNHS)

Back to WG4 Workshop "Towards a documentation and guideline" in Warsaw

Back to MOBILISE website

see also Definitions of core terms in the data archiving context