Implementing the INGV Data Policy

Authors: G. Puglisi (coordinator), R. Basili, A.G. Chiodetti, A. Cianchi, M. Drudi, C. Freda, M. Locati, M. Pignone, A. Sangianantoni
Date: 24 July 2018

Introduction

The document outlining the principles of the INGV Data Policy was approved and formalized with Presidential Decree no. 200 (DP200) of 26 April 2016.

The definition of an INGV Data Policy responds to the need to handle the multitude of research data that INGV produces, to set out the principles on which a common institutional management of these data is based, and to govern how they are accessed, used, and reused. The INGV Data Policy is based upon the fundamental principle of “Open Access.” In this sense, INGV adopts a policy that enables free, open, full, and prompt access to its research data, complying with the principles established in EU and national regulations, and in accordance with its own institutional purposes, structure, and organization.

Since the first actions undertaken (DP223 of 11 June 2015) to reach full implementation of the document’s inspiring principles, it has been clear that one cannot speak of a “Data Policy” without adequate knowledge of the intangible heritage that INGV holds in the enormous mass of scientific data produced by the research activities of its various Sections. INGV has therefore identified the Data Registry as the key instrument for effective and efficient management of its research data. To develop this Registry, a census of the data produced by all INGV’s Sections was initiated in the second half of 2016 and concluded in July 2017.

This first census found that INGV manages more than 250 types of research data, differing by origin, level of processing, and organization. This diversity is one aspect of INGV’s cultural richness, but it also poses a challenge in terms of how such a heritage is to be managed. Compiling the Registry, with its information on more than 250 types of data, is no simple operation, nor one to be done all at once. It requires adequate planning, so that work can proceed by assessing any incompleteness in the information or difficulties in the procedures, and by carrying out the necessary corrections and additions.

This document, in keeping with DP200, provides the framework of rules and procedures for managing the data, in support of the implementation of the aforementioned INGV Data Policy.

In particular, this document defines the key terms of use for managing INGV’s research data (Chapter 1); illustrates the objectives and strategies for implementing the INGV Data Policy and the activity plan for the starting-up, implementation, and ordinary management of the Data Registry (Chapter 2); defines the procedures for managing the data and the Data Registry (Chapter 3 and Chapter 4, respectively); defines the procedures for attributing the DOIs (Digital Object Identifiers) and for the management of Territorial data (Chapter 5); and lastly defines the Data Management Office whose mission is to promote the Open Science paradigm and manage the Data Registry (Chapter 5).

1. Definitions

For the purposes of applying the Data Policy, the following definitions apply (listed in the alphabetical order of the original Italian terms):

Access

A user’s authorization or right to access the data via an infrastructure associated with a database. Access may be broken down, in relation to the modes of user identification and registration, as follows:

  • Anonymous: in this case, access to the service(s) takes place with no identification or accreditation procedure;
  • Registered: in this case, access to the service(s) requires an identification by means of accreditation procedures which may be automated;
  • Authorized: in this case, access to the service(s) requires specific authorization by the database manager.

Depending on the complexity, access to the data may involve one or more of the functions covered by the abbreviation “AAAI” (Authentication, Authorization, and Accounting Infrastructure): user identification (Authentication), authorization to use the services (Authorization), and accounting for the use of the services (Accounting). This infrastructure must guarantee compliance with personal data protection regulations [1].

Database

Collection of independent data, systematically or methodically arranged, and individually accessible via electronic or other means. The protection of databases does not extend to their content and leaves rights to said content unimpaired [2] [3] [4]. The definition excludes the IT infrastructure that manages the database, including any Services for interaction with data.

Data

(understood here in the meaning of Research Data [5]): individual items or records of any nature (physical or digital), at any level of processing, and however organized, as well as research products even if unpublished, commonly deemed and accepted by the scientific community as necessary for validating scientific discoveries. In this setting, the term “data” refers indifferently to the raw data acquired by the sensor, the physical sample of any nature, or a product obtained from any analysis on the data at any level of processing, whether automatic or manual. The following is a definition of types of data cited in the document:

  • static data: the data, once generated and/or published, is no longer updated;
  • dynamic data: the data may be modified or extended after being published; these data may be further broken down based on the type of variation:
    • data subject to revision: modifiable for the purpose of improving their quality/reliability;
    • continuously produced data: data produced by sensors that generate a time series;
  • monitoring data: data originating from institutional observational systems, acquired or produced recurrently and systematically to characterize and comprehend the physical and/or chemical processes of the Earth system;
  • surveillance data: data processed for purposes of natural risk management, civil protection, or public utility, provided to public or private, national and/or international bodies and/or institutions, in accordance with formal agreements;
  • spatial data [6]: any geographically localized information;
  • open data: which is to say “usable, reusable, or redistributable freely by anyone, also for commercial purposes, subject at most to the request for attribution and sharing in the same manner” [7].

Embargo

Period of time during which one or more data access services are suspended. The embargo period may vary from six months to a maximum of three years, depending on the type of data. In the case of data obtained in the context of national or international initiatives in which the Institute takes official part and that involve different embargo rules, the Institute adopts these rules. The embargo applies only when appropriately justified. The metadata on the Data subject to embargo are in any case publicly available, so that the data can still be searched (Discoverability).
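
The embargo rule above lends itself to a simple automated check. The following is a minimal sketch, assuming the embargo is recorded as a whole number of months together with a free-text justification; the names `Embargo` and `embargo_is_valid` are hypothetical and do not refer to any existing INGV tool.

```python
from dataclasses import dataclass

# Limits stated in the definition: six months to three years.
MIN_EMBARGO_MONTHS = 6
MAX_EMBARGO_MONTHS = 36


@dataclass
class Embargo:
    months: int                   # requested embargo length, in months (assumed unit)
    justification: str            # an embargo applies only when justified
    external_rules: bool = False  # True when an external initiative dictates the period


def embargo_is_valid(embargo: Embargo) -> bool:
    """Check an embargo request against the rule in the Embargo definition."""
    if not embargo.justification.strip():
        return False  # no justification, no embargo
    if embargo.external_rules:
        return True   # the Institute adopts the rules of the external initiative
    return MIN_EMBARGO_MONTHS <= embargo.months <= MAX_EMBARGO_MONTHS
```

In this sketch the metadata are not modelled at all, since they remain publicly searchable whatever the outcome of the check.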

Persistent Identifier

Alphanumeric code whose structure is defined within object management systems (data, books, persons, physical samples, etc.), some of which follow open and internationally shared standards (Open Standard). The adjective “persistent” refers to the guarantee that the association between the object’s identification code and its position on the Web is maintained over the long term. This means that the identifier management system and access to the data remain operational over time.

Istituto Nazionale di Geofisica e Vulcanologia

(National Institute of Geophysics and Volcanology; hereinafter, “INGV”): the public research body that is the producer and controller of the data that are the object of the INGV Data Policy, as well as of this document implementing said policy.

License

Negotiated instrument that defines in detail the rules and conditions to be complied with for using the data. The definition of the contractual content is left to autonomous determination by the contracting parties, which may define the mutual rights and duties at their discretion. All the data published by INGV must be accompanied by a Creative Commons License [8], whose type and version are always to be specified, and with reference always made to the full text of the license. Data classified as “open” cannot be released under Creative Commons licenses that prohibit commercial use or derivative works, i.e., licenses bearing the “NonCommercial” (“NC”) and/or “NoDerivatives” (“ND”) clauses, or under any other clause limiting the possibility of reusing and redistributing the data.
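
The constraint on licenses for open data can be checked mechanically. The following is a minimal sketch, assuming that licenses are recorded with the short identifiers commonly used for Creative Commons licenses (for example “CC BY 4.0”); the function name `is_open_compatible` is hypothetical.

```python
def is_open_compatible(license_id: str) -> bool:
    """Return True if a Creative Commons identifier carries no NC or ND clause.

    Assumption: identifiers look like "CC BY-SA 4.0", "CC-BY-NC-4.0" or "CC0 1.0".
    """
    # Split on spaces and hyphens so that each clause becomes its own token.
    tokens = license_id.upper().replace("-", " ").split()
    if not tokens or tokens[0] not in ("CC", "CC0"):
        return False  # not recognised as a Creative Commons identifier
    return "NC" not in tokens and "ND" not in tokens
```

For example, `is_open_compatible("CC BY 4.0")` is true, while `is_open_compatible("CC BY-NC 4.0")` is false. The sketch does not verify that a version number is present, which the definition above also requires.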

Data Producer

Given that, juridically speaking, the Data Producer is INGV, in this specific document, applying the legal principle of being one and the same (in Italian “immedesimazione organica”), reference made to “Data Producer” is to be understood as being made to the individual research employee or research group that produces the data through their own working activity, with the internal resources (ordinary funds) and external resources (funds originating from projects, conventions, etc.) provided by INGV, including the personnel cost.

Data Registry

Collects the metadata describing the Research Data that are the result of the scientific production of INGV and/or managed and/or published by INGV, regardless of whether these data are static or dynamic, and regardless of the procedures followed for their creation. The Data Registry is publicly accessible through INGV’s institutional Web portal, and use thereof aims at satisfying needs within INGV, but also the needs of outside users.

Data Manager

Is the scientific manager and overseer of the data, representing the Data Producer; is responsible for the scientific quality and the updating of the data, and performs a role of coordinating the research group that produces the data. In the case of static data, the Data Manager is responsible for their integrity.

Database Technical Manager

Is responsible for the Database’s management and operation; has IT knowledge adequate for managing the technological infrastructure that supports the Database; and plays an operative role in the data-related management system. Also provides indications as to the tangible retrieval of data from the database and oversees the monitoring of the various “connectors” (e.g.: web pages, web services) that, in a safe, standardized way, interface the outside users with the content of the database. May coincide with the Data Manager.

Services

Any one of the following operations applicable to a database: data search, display, transfer, transformation, editing, and/or updating. The individual service available for a database may be:

  • Open: the service is freely available and accessible to all, without restrictions;
  • Limited: the service is available, but only under the conditions established or agreed upon by/with the holder of the right to exploit the intellectual property; a particular case is that of data for which a certain type of service (typically the transfer) may be accessed, for a predefined period of time.

Data Owner

According to the regulations in force [9], the Data Owner (data controller) is the Public Administration that originally formed for its own use, or commissioned from another party, the document representing the data, or that has it at its disposal. INGV is therefore the Data Controller regardless of the data’s Level, whether the data were created with its own human and/or instrumental resources, are in any case managed by INGV, or were commissioned for institutional purposes. In complex situations involving institutions other than INGV, an agreement must be executed to clarify the aspects related to control of the data (e.g. agreements for activities carried out under agreement with the Department of Civil Protection [10]). The transfer of data from one information system to another does not modify control of the data [11].

Data Management Office

(Ufficio Gestione Dati – UGD): acts with the purpose of promoting the Open Science paradigm, by managing the Data Registry and guaranteeing the progressive opening of the research data and the improvement of their management in accordance with the regulations in force. The Office collaborates with and supports the Transparency Manager [12].

[1]Regulation (EU) 2016/679 of 27 April 2016, General Data Protection Regulation (GDPR).
[2]Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases.
[3]Legislative Decree no. 169 of 06 May 1999. Implementation of Directive 96/9/EC on the legal protection of databases.
[4]Law no. 633 of 22 April 1941. Protection of copyright and other related rights.
[5]The definition of “data” excludes the information related to natural or legal persons, bodies, or associations, identified or identifiable, even indirectly, through reference to any other information, including a personal identification number. “Data” also excludes the data relating to financial statements, protocol, and document management.
[6]Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 establishing an Infrastructure for Spatial Information in the European Community (INSPIRE). Definition: “any data with a direct or indirect reference to a specific location or geographical area.”
[7]Legislative Decree no. 82 of 07 March 2005. Digital Administration Code (Codice dell’Amministrazione Digitale – CAD). Art. 68, paragraph 3, defines as open-type digital data those data that: 1) are available in accordance with the terms of a license that permits the use thereof by anyone, also for commercial purposes, in disaggregated format; 2) are accessible via information and communication technologies, including public and private telematic networks, in open formats, are suited to automatic use by computer programs, and are provided with the related metadata; 3) are made available free of charge via information and communication technologies, including public and private telematic networks, or are made available at the marginal costs incurred for reproduction and dissemination thereof.
[8]Creative Commons licenses. https://creativecommons.org/licenses/
[9]Legislative Decree no. 82 of 07 March 2005. Digital Administration Code (Codice dell’Amministrazione Digitale – CAD). (with modifications introduced by Legislative Decree no. 179 of 26 August 2016, Art.1, paragraph g). Chapter I, Art.1, paragraph bb.
[10]Attachment 1 to convention “A” of 2017 between the Department of Civil Protection and Istituto Nazionale di Geofisica e Vulcanologia.
[11]Legislative Decree no. 33 of 14 March 2013 and subsequent modifications and supplements. Re-organization of the regulations regarding the right to civic access and the obligations of publicity, transparency, and dissemination of information by public administrations.
[12]Legislative Decree no. 82 of 07 March 2005. Digital Administration Code (Codice dell’Amministrazione Digitale – CAD). Chapter V, Art. 50, Paragraph 3-bis.

2. Operational Context

2.1 Objectives

In keeping with the definitions made in DP200, the INGV Data Policy has the purpose of:

  • Organizing and making accessible the intangible scientific heritage of INGV, in order to contribute to knowledge while complying with the principles of Open Access as a basis of the Open Science paradigm; contributing towards achieving the Open Science paradigm will allow INGV not only to meet its statutory obligations but also to attain full valorisation of its role in both the national and international arena;
  • Promoting INGV’s role in the context of national and international research and of the monitoring and surveillance systems in the area of statutory duties;
  • Capitalizing on the skills of INGV’s research personnel.

The objective of this specific document is to provide a framework of rules and to describe the instruments for managing data, for the purpose of supporting implementation of the INGV Data Policy defined in DP200, and thus the achievement of its objectives.

2.2 Strategy

The implementation strategy for the INGV Data Policy described in this document is based on the development of the Data Registry as a technical and organizational instrument, and on the establishment of the Data Management Office, with strategic and technical/operational tasks.

The Data Registry, according to its definition, collects the metadata that describe the Research Data that are the result of the scientific and technological production of INGV and/or managed and/or published by INGV, regardless of whether these data are static or dynamic, and regardless of the procedures followed for their creation. The development of the Data Registry will not only provide INGV with a system for the institutional identification of its own production of scientific data, but will also allow tracking the provenance of the data introduced into the research life cycle or, more generally, into the world of shared knowledge, which shall be certified using modern information technologies. The entire cycle of scientific research will benefit from the presence of a Data Registry. For example, it will be possible to improve the assessment of the metrics concerning the impact of research results; verifying undergraduate and/or graduate degree theses based on non-original data (published in scientific articles) will be simpler; and it will be possible to cite all the data used precisely and in their current version, while at the same time valorizing both the work of those using the data and that of those producing them. In the first implementation phase, the Data Registry’s structure should be patterned after that of the Data Census conducted between the end of 2016 and the beginning of 2017 and cited in the Introduction. This structure may be modified as the regulations and INGV’s databases evolve, always with the aim of optimizing the flow of information between the Data Management Office and the Data Managers.

The Data Management Office (Ufficio Gestione Dati – UGD) is instituted to respond to the “Professionalism” principle in the “Principles of the INGV Data Policy” document, which enshrines INGV’s commitment to allocating the professional resources necessary to implement the Data Policy. The Data Management Office, in accordance with its definition, acting with the purpose of promoting the Open Science paradigm, manages the Data Registry both in the “Start-up and Trial” phase and in the “Ordinary Management” phase (see paragraph 2.3 below).

For the purposes of the aforementioned strategy, it is important for the establishment and subsequent implementation of the Data Registry to be the result of actions shared among the various components of INGV represented by executive bodies, management bodies (including the Data Management Office), and personnel. In fact, all these components must contribute towards the common objective of recognizing the activity and scientific authoritativeness of those producing the data, and at the same time the institutional responsibilities and duties towards the scientific community and all potential users.

2.3 Activity plan

The delicate phase of starting up and trialling the Data Registry and its subsequent ordinary management are entrusted to the Data Management Office.

2.3.1 Start-up and Trial

This first phase has the dual purpose of beginning to populate the Data Registry with the elements reported in the Census in accordance with a migration prioritization described below, and of testing and validating the rules and procedures defined in Chapter 3 below, while suggesting corrections where necessary. The detailed programming of the Registry population activities, the validation of the procedures, and the adoption of any corrections provided for in the Start-up and Trial phase are reported in Attachment 1.

Considering the high number of censused elements (more than 250), during the start-up phase the Registry will be populated following a scale of priorities that takes account of the maturity of the elements to be inserted, and of institutional needs.

To be transferred into the Registry, each data item declared in the Census must be associated with a type of Service and must therefore indicate the corresponding rules of access. Lacking this information, the data cannot be transferred from the Census to the Registry. Consequently, the first level of prioritization is based on the type of services declared and the corresponding access rules, and on the characteristics that make data more or less accessible, such as the presence of DOIs and/or of operative Landing Pages. The definition of the type of service and the pertinent access rules is the responsibility of the holder of the data, and is implemented by the Data Manager.

The second level of prioritization is based on the consideration of INGV’s strategic and operative objectives and is therefore the responsibility of the Department Directors. The data’s institutional priority may be established on the basis of the characteristics of the data in question, and on its impact, if any, on INGV’s strategic aspects, which is to say on the data’s contribution towards achieving the institutional objectives established in the Three-Year Activity Plan. To establish the institutional data priority, one may, for example, consider whether it must be provided to outside financing bodies on the basis of specific agreements, or estimate the economic value connected with its production, understood as the weight of INGV’s contribution. This value may be assessed in terms of own economic resources (from FOE or from institutional projects), personnel resources, or instrumental resources for its production and preservation over the long term (sustainability).

In brief, for the purposes of prioritization for migration from the Census to the Registry, the Data Management Office will take the following characteristics of the censused data into consideration:

  • Type of service associated with the data and with the related access rules;
  • Existence of a DOI, or other permanent associated identifier, or of an associated operative Landing Page (also in the absence of DOIs);
  • Priority of the data in the context of the institutional activity.
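
The criteria listed above can be read as a filter followed by a sort. The following is a minimal sketch of one possible reading, not the procedure actually adopted by the Data Management Office: the record layout (`CensusEntry`), the field names, and the relative weight given to each criterion are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CensusEntry:
    name: str
    service_type: Optional[str]    # e.g. "open" or "limited"; None if not declared
    access_rule: Optional[str]     # e.g. "anonymous", "registered", "authorized"
    has_persistent_id: bool        # DOI or other persistent identifier
    has_landing_page: bool         # operative Landing Page
    institutional_priority: bool   # as assessed by the Department Directors


def migration_order(entries: List[CensusEntry]) -> List[CensusEntry]:
    """Order census entries for migration to the Registry."""
    # Entries without a declared service type and access rule cannot be transferred.
    eligible = [e for e in entries if e.service_type and e.access_rule]

    def sort_key(entry: CensusEntry):
        # False sorts before True, so entries that meet a criterion come first.
        return (
            not (entry.has_persistent_id or entry.has_landing_page),  # first level
            not entry.institutional_priority,                         # second level
            entry.name,                                               # stable tie-break
        )

    return sorted(eligible, key=sort_key)
```

The tuple used as sort key mirrors the two levels of prioritization described in this section; a different weighting would only require reordering its elements.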

In this phase, while the Registry begins to be populated, the Data Management Office will trial the procedures defined in Chapter 3, in order to verify their validity. This trial, lasting 12 months, will allow the advantages, disadvantages, and problems related to the procedures to be highlighted. After the trial, the Data Management Office will draw up a report that, should criticalities be found in the procedures and/or the rules of operation, suggests solutions and the necessary improvements. Based on this report, the Department Directors will assess whether to apply the solutions and strategies proposed by the Data Management Office, and thus whether to declare the Start-up and Trial concluded and move on to the subsequent phase of the Registry’s ordinary management. Should the Department Directors deem it necessary to extend the trial phase, they shall define the procedures and duration thereof in concert with the Data Management Office.

2.3.2 Ordinary Management

At the end of the Start-up and Trial phase, the Data Management Office will update the procedures, and where applicable the rules, for the Data Registry’s ordinary management based on its report and on the Department Directors’ recommendations.

The new document, containing updated/modified rules and/or procedures, will then be submitted for ratification by the Executive Board. Moreover, the Data Management Office will submit to the Department Directors, and thus to the Executive Board, a document named “Data Management Office: Internal Rules,” which will describe, with a level of detail permitting their effective operativity, the procedures for the Data Registry’s ordinary management. Once approval by all the competent bodies is secured, the phase of the Data Registry’s ordinary management can begin.

In principle, it is established here that the Data Management Office:

  • Will periodically update the Registry, following a timeline agreed upon with the Executive Board;
  • Will draw up, at a yearly frequency, a report containing the statistics of the managed data, an analysis of the effectiveness of applying the Open Science paradigm in INGV’s structures, the impact of evolving EU and national regulations and practices on data management, any criticalities emerging over the course of the year, and the consequent actions to be adopted.

Moreover, based on the yearly reports provided by the Data Management Office, the Executive Board, having heard the Department Directors where necessary, will express its opinion on the need to update the Principles document of the INGV Data Policy and/or the document on the Data Registry’s ordinary management.

3. Data Management

This chapter sets out the rules and procedures adopted by INGV for data management and identifies the institutional bodies that contribute to it.

3.1 Data Access Rules

In accordance with what is stated in DP200, a fundamental principle for INGV is Open Access to scientific information, or guaranteeing access thereto with no additional costs for the final user, or making it available at the marginal costs incurred for reproduction and dissemination [1].

In keeping with this principle, the following rules apply.

A Service, understood as any of the operations applicable to a database that allow the data to be searched, displayed, transferred, transformed, edited, and/or updated, may be Open (freely available and accessible to anyone, without restrictions) or Limited (available, but under the conditions established or agreed upon by/with the holder of the right to exploit the intellectual property); in the latter case the limitation must be defined and justified. In particular:

  • The metadata search service will always be open, in order to give the data the greatest possible visibility (findability);
  • The data display service will be open, where the IT infrastructures make this possible;
  • Data transfer may be subject to limitations for specific types of users and/or for defined periods of time;
  • The data analysis and transformation services may be limited to only the data producers or to appropriately defined restricted user groups;
  • Any services for modifying and/or updating the data contained in the databases must necessarily be limited to authorized personnel and protected with security measures suitable for the purpose of conserving their integrity and guaranteeing total control over the modifications.
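
The rules listed above can be expressed as a small consistency check on the services declared for a database. The following is a minimal sketch, assuming that each service is recorded under a short key and declared as either "open" or "limited"; the service keys and the function name `check_service_policy` are hypothetical.

```python
from typing import Dict, List

ALLOWED_ACCESS = {"open", "limited"}
MUST_BE_OPEN = {"metadata_search"}         # metadata search is always open
MUST_BE_LIMITED = {"editing", "updating"}  # modification is always restricted


def check_service_policy(policy: Dict[str, str]) -> List[str]:
    """Return the list of rule violations found in a service declaration."""
    problems = []
    # A missing metadata search service is treated as a violation too.
    for service in sorted(MUST_BE_OPEN):
        if policy.get(service) != "open":
            problems.append(f"{service} must be open")
    for service, access in sorted(policy.items()):
        if access not in ALLOWED_ACCESS:
            problems.append(f"{service} has unknown access type {access!r}")
        elif service in MUST_BE_LIMITED and access != "limited":
            problems.append(f"{service} must be limited")
    # Display, transfer and transformation may be either open or limited,
    # so they are only checked for a recognised access type.
    return problems
```

An empty list means the declaration is consistent with the rules. The sketch does not check that a limitation has been defined and justified, which still requires human review.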

The type of access to the individual service may be modified, where necessary, upon adequate and approved justification.

Users are classified in accordance with three modes of access: Anonymous, Registered, and Authorized. In the case of registered and authorized users, the access systems must carry out the function of identification, authentication, and authorization for use of the requested services. As a general rule, INGV employees should never have greater limitations than outside users, of course in keeping with any agreements in the event a number of institutions are involved.

Suitably verified failure by users to comply with the access rules or the data use licences shall be raised with the interested parties and, in the case of institutional authorizations, with the relevant body as well. INGV reserves the right to adopt measures limiting access to the data against those (individuals or bodies) that fail to comply with the access rules established in the Data Registry. In more serious cases, INGV reserves the right to take legal actions to protect its intellectual property or to remedy any economic damage or damage to its reputation.

Pending the development of the INGV data portal, the websites that publish the data contained in the Data Registry, managed by the individual producers or by the Sections, must set out all the aforementioned information for the users. In particular, the users shall be informed of the following:

  • the licence associated with the data;
  • the rules for accessing and using the data in a section named “Terms of use of the data”;
  • the “Limits of liability”, informing users of the level of reliability of the data, so as to relieve INGV and the authors of any liability for potential damage derived from improper use of the data by third parties.

3.2 Data Producer Identification Procedures

As concerns compliance with the rules governing the management of the data of Public Administration, and for the purposes of entering the data into the Data Registry, the Producer of the Data must be identified. In so doing, it is appropriate to consider the type or class of the data. For this first phase of implementation of the Data Policy, only two data Classes, named A and B, are adopted, depending on the characteristics and role of the data in question. Class A identifies the data that have a specific role with regard to INGV’s institutional commitments and strategic objectives as established in the Three-Year Activity Plan (PTA). Class B identifies all the other data. The assessment of the data’s class is the responsibility of the Department Directors, as they are the institutional parties tasked with promotion, programming, and verification, as indicated in INGV’s Statute [2]. For the purposes of the data’s migration to the Registry, the Class A data shall be considered a priority.

As concerns the identification of the Data Producer, the procedure then follows different paths, depending on the class attributed to the data.

In the case of data attributed to Class A, the identification of the Data Producer and, where applicable, of its Manager, will take place in accordance with the “top-down” procedure defined as follows.

  • The Department Director(s) define(s) the type of data based on the institutional priority, and at the same time identify(ies) the Section of reference or Sections in the case of several Sections involved in producing the data.
  • The Section Director(s) of reference identify(ies) the personnel involved in collecting and processing the data; in the event of several units of personnel, the Director(s) identify(ies) a Manager.
  • The Department Director(s) validate(s) the decision made by the Section Director(s).
  • The Data Manager initiates the procedure for entering the data into the Registry in accordance with the procedures indicated by the Data Management Office.

In the case of data attributed to Class B, the identification of the Data Producer and, where applicable, of its Manager, will take place in accordance with the “bottom-up” procedure defined as follows.

  • The personnel involved in producing the data proposes the Data Producer to its Section Director(s); in the case of several units of personnel, the Data Producer autonomously identifies a Data Manager; in the case of a Producer of Data involving several Sections, the involved personnel autonomously identifies a Section of reference which should coincide with that of the Manager.
  • The Section Director of reference validates the received proposal, considering the proposing personnel’s contribution to producing the data; in the case of several Sections, the decision is made jointly by the Directors of the involved Sections.
  • The Department Director(s) validate(s) the Section Director’s proposal, considering the data’s scientific and operational aspects.
  • Should some components of the Data Producer belong to other institutions, there must be a formal document regulating the co-control of the data.
  • The Data Manager initiates the procedure for entering the data into the Registry in accordance with the procedures indicated by the Data Management Office.

3.3 Data Suitability Verification Procedures

Once its class is defined, the suitability of the data must be verified before they can be entered into the Registry. The procedure for verifying this suitability is diagrammed in Figure 1. The information necessary for the verification is provided to the Data Management Office by the Data Producer’s Contact (see Chapter 4).

In the case of an unstructured data set for which some technical/scientific assessment is required, the assessment shall be provided by the Department Directors and acquired before the procedure for entering the data into the Registry begins.

Figure 1. Procedure for verifying the data’s suitability for the purposes of their entry into the Data Registry.
[1]Legislative Decree no. 82 of 07 March 2005. Digital Administration Code (Codice dell’Amministrazione Digitale – CAD). Art.1, paragraph 1, letter l-ter.
[2]INGV Board of Directors Decree no. 424 of 15 September 2017, as per Gazzetta Ufficiale no. 27 of 2 February 2018.

4. Data Registry

The development of the Data Registry is a crucial condition for implementing the INGV Data Policy.

The Data Registry, by listing the set of metadata referring to the scientific data produced by INGV, in fact endows INGV with a means for identifying the institutional data and the data related to projects and conventions, or produced by third parties. By way of principle, the Data Registry contains metadata:

  • Concerning institutional activities, regardless of whether or not they are structured in Databases, whatever type of access to services they establish;
  • Concerning projects and conventions, provided they call for Open Access;
  • Produced by third parties, the management and/or publication of which is devolved upon INGV by virtue of a formal agreement, provided they call for Open Access.

The Data Registry aims to improve and simplify the entire cycle of scientific research, and is designed to meet the needs of INGV’s personnel at all levels, as well as the needs of parties that interact or would wish to potentially interact with INGV, and that therefore require simplified and centralized data access tools.

The Data Registry aims to simplify the systems for assessing the metrics related to the impact of the results of the Research.

The Data Registry contributes towards the simplification and verifiability of scientific hypotheses, since it makes it possible to univocally track the data used to process them, thereby allowing them to be reproduced. It will also be possible for researchers to add, to the publications, univocal references to the data used, thereby at the same time fostering recognition of the work done by the persons who contributed towards creating those data.

The Data Registry is open and made accessible through INGV’s web portal. In order to manage and use the Data Registry, each element’s metadata must be collected and kept up-to-date. The Registry’s scientific, technical, and administrative management is entrusted to the Data Management Office.

The Registry’s items are organized in data groups ascribable to a common type of object, for example successive versions of a scientific product, or sub-products or subsets of data generated or simply contained in a given Data Collection. Each group of elements may in turn be linked to other groups, by adopting the typologies of relationships suggested in the OpenAire guidelines [1].

4.1 Metadata

The following is the list of metadata needed to describe each element in the Data Registry (Table 1). This list may be supplemented in light of institutional needs or future recommendations by Agenzia per l’Italia Digitale (AgID) [2]; any such supplements will be made by the Data Management Office, communicated to personnel, and in any case made available on the website of the Data Management Office.

Table 1. Metadata describing each element entered into the Data Registry

Metadata: Description of the metadata
ID: Record identifier.
Data Group ID: Identifier of the group of elements.
Data Group: Full name of the group of elements.
Data Group Acronym: Acronym, if any, associated with the group.
Name: Full name of the element.
Acronym: Acronym, if any, associated with the element.
Type of Element: Type of element (e.g.: “Digital database,” “Digital data not structured in a database,” “Physical samples”).
Description: Brief description of the element.
Persistent Identifiers: Persistent identifiers associated with the element.
Class:
  • Volcanology data
  • Geochemical data (geochemical analyses of rocks, water, and gas)
  • Geodetic data
  • Seismological and infrasonic data (Earth and sea)
  • Data of physical samples (samples and physical parameters of rocks, minerals, and various materials)
  • Data of atmospheric geophysics and Aeronomy
  • Geological data (Earth and sea)
  • Geophysical data (geomagnetic, geoelectric, EM, etc.), Earth and sea
  • Data from Numerical Modelling
  • Remote sensing data
Level:
  • Level 0: raw data or basic data that underwent no level of processing, excluding at most an automatic-type validation (examples: waveforms, GPS data, uncalibrated images, rock samples).
  • Level 1: data products obtained from automatic or semiautomatic procedures (examples: location, magnitude, and focal mechanisms of earthquakes, shakemaps, time series of volcanic tremor amplitude and of GPS station movement).
  • Level 2: data products obtained from research activity and in any case based on non-automatic procedures (examples: crust models, strain maps, source models of earthquakes and of deformations, numerical simulation models of volcanic processes, results of geophysical campaigns, laboratory measurements on samples taken for scientific purposes).
  • Level 3: integrated data products obtained from complex analyses that supplement several Level 2 products, or from analyses that supplement Level 1 or 2 products of different types and/or originating from different communities (examples: hazard maps, catalogues of active faults, volcanic activity reports).
Geographic coverage: This may be indicated with text (examples: World; Europe; Italy; Etna; province of Catania), or with the coordinates of the vertices representing the polygon of geographic coverage, coded with the WKT standard [3].
Data coding formats: Formats in which the data are coded, where possible indicating a reference to the standard for the less-common formats.
Associated metadata: If a metadata model is adopted for the description of the resource on the whole and/or of its content, indicate which (examples: Dublin Core, DCAT, DataCite, RNDT, INSPIRE). Indicate “custom” if these are metadata coded in accordance with an internal, not widespread standard.
Data type: Dynamic data or static data.
Update frequency: In the case of dynamic data, the frequency with which the content is modified is specified here (e.g.: continuously recorded “data streaming”), regardless of the reason leading to the modification.
Purpose of use: Different cases and settings for use of the data (examples: emergency, communication, training, commercial uses).
Data Manager: INGV employee of reference who sees to the scientific and administrative aspects related to the data. In addition to name, surname, and affiliation, the ORCID must be present.
Technical Manager: INGV employee of reference who sees to the technological aspects related to the Database. In addition to name, surname, and affiliation, the ORCID must be present.
Data Producer: Individual employee or group of employees of INGV that produces the data. For each employee, the name, surname, and affiliation must be indicated, the role defined (where possible expressed in accordance with the OpenAire [4] specifications), and the ORCID code provided.
Database organization: Type of organization of the archiving of the data (examples: monitoring network, database, raw data on filesystem or cloud, document archive, archive of physical samples, photographic archive).
Involved INGV Sections: List of Sections and offices involved in creating and managing the data (examples: ONT, RM1, RM2, OV, OE, PA, BO, PI, MI).
URLs: Web address(es) such as the homepage (Landing Page), or pages related to such services as data search and displaying, transferring, transforming, editing, and/or updating the data.
Web Service: Indication of any modes of access to the data via Web service or API (Application Programming Interface) or other procedures that can be automated, with indication of the adopted standard (examples: RESTful, SOAP, CGI). If available, indicate the Web address from which it is possible to access them.
Documentation: Link to the documentation of reference, both scientific and technological in nature. If available, compile with the DOI of the publications, or otherwise with the URL.
Citation: Bibliographic citation of the data.
Keywords: List of keywords identifying the data. Obligatory compilation of a list in English; addition of a list in Italian is optional.
Status: Values admitted: “in planning,” “in development,” “operative.” Indicate “legacy” for data or products no longer managed or updated, but still accessible.
Ownership: Control over the data belongs to INGV, except for cases in which other institutions are involved.
License: Type of Creative Commons license associated with the data and/or the database, since they might differ (the license associated with the container is different from the license associated with the content).
Access to the data: The admitted values are “anonymous,” “registered,” “authorized.” If not applicable, briefly describe any alternative terms of access.
Open Data Class: Class according to the “5 stars” classification [5] that defines the type of Open Data.
Metadata classes: Class according to the metadata classification proposed by Agenzia per l’Italia Digitale (“Levels of the model for metadata” from “Linee Guida Nazionali per la Valorizzazione del Patrimonio Informativo Pubblico 2016”).
RNDT: Indication of the relevance of the data for the purposes of Repertorio Nazionale dei Dati Territoriali (national registry of spatial data).
Projects / initiatives of reference: Project(s) and/or initiative(s) of reference for the indicated data and/or product (examples: INGV-DPC, H2020 Convention – followed by the project name –, EPOS, EMSO, MED-SUV).
Other institutions involved: In the case in which institutions other than INGV have contributed towards creating the data, indicate which, specifying for each the level of contribution (examples: negligible, marginal, substantial).
Links: It is possible to indicate links and the type of relationship in accordance with OpenAire guidelines [6]. It is possible to establish links to other Registry elements, or to elements outside the Registry, such as for example publications, or other Databases making said data available.
Data creation date: Date when the data were created.
Record creation date: Date when the element was entered into the Data Registry.
Date of last update of the record: Date of last update of the information pertaining to the element.
Notes: Any additional notes of use for the purposes of the Data Registry.
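
As an illustration of how a few of the Table 1 entries with admitted values could be checked before submission, the following is a minimal sketch in Python. The class name RegistryRecord, its field names, and the helper bbox_to_wkt are assumptions made for this example; they are not part of the Registry's actual schema, and only a small subset of the metadata is covered.

```python
from dataclasses import dataclass

# Admitted values copied from Table 1.
STATUS_VALUES = {"in planning", "in development", "operative", "legacy"}
DATA_TYPES = {"dynamic", "static"}
LEVELS = {0, 1, 2, 3}


def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT polygon for a longitude/latitude bounding box."""
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # a WKT ring repeats its first vertex at the end
    ]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON(({ring}))"


@dataclass
class RegistryRecord:
    """Hypothetical subset of the Table 1 metadata (not the official schema)."""
    name: str
    level: int
    data_type: str
    status: str
    geographic_coverage: str  # free text or a WKT polygon
    keywords_en: list

    def problems(self):
        """Return a list of human-readable validation problems."""
        found = []
        if self.level not in LEVELS:
            found.append(f"level must be one of {sorted(LEVELS)}")
        if self.data_type not in DATA_TYPES:
            found.append("data type must be 'dynamic' or 'static'")
        if self.status not in STATUS_VALUES:
            found.append(f"unknown status: {self.status!r}")
        if not self.keywords_en:
            found.append("an English keyword list is obligatory")
        return found
```

An empty list returned by problems() means that none of the checked entries violates the admitted values; the remaining metadata would still need to be reviewed by the Data Management Office.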

4.2 Ordinary management of the Data Registry

The Data Registry’s ordinary management involves, at various levels, the following institutional parties:

  • The Data Manager and the Database Technical Manager, acting on behalf of the Data Producer;
  • The Section Director to whom the Data Manager reports;
  • The Department Director, responsible for institutional validation and scientific quality; in the case in which the data involves several Departments, the Department Directors identify one of reference;
  • The Data Management Office, which coordinates and manages the entire procedural process.

4.2.1 Admissibility criteria

For the purposes of their entry in the Data Registry, the data must comply with certain admissibility criteria.

By way of principle, all the data related to institutional activity are entered, whatever type of access to services they involve; the data pertaining to projects and conventions and those produced by third parties, the management and/or publication of which is devolved upon INGV, are entered only if they involve Open Access.

The data must be the result of work by INGV personnel, or of the joint work of personnel from INGV and from one or more other organisations, or be data produced by other institutions that devolve upon INGV the role of making them accessible; if other institutions are involved on various grounds, two conditions must be met:

  • INGV’s employed personnel must have made a non-marginal contribution to the creation of the data, or must deal with the management and/or publication of the data;
  • INGV, in order to prevent possible situations of litigation, must have formal written agreements in place, approved by the relevant bodies of other authorities or institutions that regulate and clearly report the terms established between the parties and, above all, the express acceptance of the entry of these data into the INGV Data Registry; it is specified that this documentation must be signed by the qualified parties, which is to say parties that control the data that are the object of the agreement.

The data must be accessible via the Internet; in the case of physical data (for example, rock samples), the metadata must be accessible, and the terms for accessing the physical object must be established.

For each data item, the type of service, and the related access rules, must be specified in accordance with what is established in the Data Policy’s Principles. In the case of Registered or Authorized Access, the criteria must be defined and justified. In the event of any limitations in the Access Services, as in the case of embargo, these must be specified and appropriately justified. Lacking these specifications and justifications, the access service shall be considered open.

The metadata provided for by the Data Registry must be openly available.

A description must also be available that illustrates the data generation process, clearly identifying those data sources used (if any) of which INGV is not the only controller.

For data classified as static, integrity and invariability over time, relative to their state when entered in the Registry, must be guaranteed, and periodic checks shall be performed by means of validation tools (e.g. hashing, a method for creating and comparing digests of the data). Should the need to modify the data arise, a new element associated with the new version of the data will be created; this new element will be assigned a new Registry identifier and, if the previous version had a DOI, a new DOI as well. Once an element is entered into the Data Registry, it is important that it remain accessible over time, even if later, more evolved versions have since superseded it.
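
The periodic integrity check on static data can be sketched as follows. This is only an illustration: the choice of SHA-256 and the function names are assumptions, since the text above mentions hashing only in general terms.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so that large data files fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, digest_at_entry):
    """True if the file still matches the digest recorded at Registry entry."""
    return sha256_of_file(path) == digest_at_entry
```

The digest would be computed once when the element is entered and stored alongside its metadata; any later mismatch signals that the static data have been altered and that a new element should have been created instead.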

The standards of interoperability must be complied with, both for the coding of the data and in any data access services, indicating whether they coincide with those suggested by Agenzia per l’Italia Digitale (AgID) [7], or whether it is a standard of reference in the scientific sector of reference.

It must be reported whether these are spatial data that can be entered in the national registry of spatial data (National Catalog for Spatial Data, Repertorio Nazionale dei Dati Territoriali – RNDT) [8], with particular reference to the spatial data defined as of “general interest” [9].

Each request for entering data must be accompanied by a sustainability plan agreed upon with the Directors of the involved Sections, clarifying the nature and duration of the financial coverage necessary for the infrastructure hosting the data, and describing whether and how the adopted solutions guarantee both the conservation and the accessibility of the data over the long term.

4.2.2 Procedure for entering elements

The ordinary submission procedure is composed of the following steps (Figure 2):

  1. The Data Producer’s identity is formalized (see Chapter 3.2), indicating for each component the relevance, role, and ORCID identification code, as per the indications of the Ministry of Education, Universities and Research (MIUR) [10] and the Italian National Agency for the Evaluation of Universities and Research Institutes (ANVUR) [11].
  2. Should some persons among the Data Producers belong to another organisation, it is necessary to ensure that there is a formal document that regulates the terms of the collaboration and the exchange of data, which must expressly provide that INGV is entitled to republish the data and to enter them into its own Data Registry;
  3. The Data Producer indicates the Data Manager and, in the case of a Data Collection, the Database Technical Manager;
  4. The Data Manager verifies the admissibility criteria, compiles the metadata (where possible also compiling the metadata associated with the DOI identifier, Chapter 5.2) and, for Level 2 and 3 data, proposes one of the Creative Commons licenses (paragraph 4.2.5);
  5. The Data Manager submits the request via e-mail to the Data Management Office, attaching the material prepared as per the above point;
  6. The Data Management Office verifies the technical admissibility of the request and validates the metadata, interacting if necessary with the Data Manager for any corrections;
  7. The Data Management Office identifies the type of data that are the object of the request and establishes their subsequent path, which may be of two types: complete, which is adopted for the new Databases that are not part of data sets already present in the Registry; or simplified, which is adopted for the individual files or Databases that are part of a data group already present in the Registry (e.g.: a new version, or a subset). The Data Manager is sent a notification of the preliminary acceptance; in the event the request is rejected, an e-mail will be sent to the Data Manager, with the reasons for the inadmissibility;
  8. The Data Manager submits to the Section Director a written request to enter the data into the Registry, attaching the clearance of the Data Management Office, the metadata, and any necessary documentation (example: formal agreements for the exchange of data with other institutions).
  9. The Section Director verifies the reliability of the submitted request and transmits it to the Department Director;
  10. The Department Director assesses the request, also in relation to INGV’s Three-Year Activities Programme; in the case of Level 0 or 1 data, assigns the license as delegate of INGV’s legal representative; and sends the authorization to proceed to the Data Management Office;
  11. The Data Management Office proceeds to assign the Registry identifier and enters the new element into the Data Registry; if the data are neither structured nor structurable in an existing institutional Database, they are archived in Earth-Prints;
  12. The Data Management Office enters the element’s metadata into external metadata Registries, in particular into DataCite’s DOI Registry and, if the Data Manager signals that the data are spatial in type and are of interest for the Repertorio Nazionale dei Dati Territoriali (national registry of spatial data), enters the data into the RNDT Registry;
  13. The Data Management Office proceeds to update the Data Registry’s information on the INGV institutional portal.
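
Step 7 of the procedure above distinguishes between a complete and a simplified path. A minimal sketch of that decision, assuming (hypothetically) that the Registry exposes the set of data group identifiers already present, could look like this:

```python
def submission_path(data_group_id, groups_in_registry):
    """Choose the path of step 7 for a new request.

    'simplified' applies when the element belongs to a data group that is
    already in the Registry (e.g. a new version or a subset); otherwise the
    'complete' path applies.
    """
    if data_group_id in groups_in_registry:
        return "simplified"
    return "complete"
```

The function name and the representation of the Registry as a set of identifiers are illustrative only; the actual decision rests with the Data Management Office.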

Figure 2. Block diagram of the procedure for entering new elements into the Data Registry.

4.2.3 Modifications and supplements to elements

Any requests for modifications to the metadata associated with the elements already present in the Data Registry will be made known by the Data Manager to the Data Management Office, which will assess their admissibility on the basis of consistency with what is already present in the Registry. If the extent of the variations is deemed considerable, the creation of a new element in the Data Registry will be assessed, repeating the submission procedure in part or in whole. The Data Management Office will see to tracking all the modifications made on each element of the Data Registry. Periodically, the Data Management Office will verify the accessibility, integrity, and consistency of the data present in the Data Registry; should inconsistencies be found, the Data Management Office will interact with the Data Manager for the appropriate actions.

4.2.4 Removal of elements

The removal of an element from the Data Registry may take place upon submission of a justified request by the Data Manager to the Data Management Office, which will assess the admissibility thereof. If the request is approved, the element will not disappear from the Data Registry, but will be indicated, along with the reason for removal, as a removed element. Any persistent identifiers (e.g.: DOI) will not be removed, but the corresponding metadata will be appropriately modified to signal that these are removed elements. The Data Manager will also be asked to create a Landing Page explaining the reason for the removal and presenting, where it exists, a link to the element replacing the removed one.
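
The removal rule described above, under which the element and its persistent identifiers stay in place and are only flagged, can be sketched as follows. The class RegistryElement and its fields are hypothetical and do not reproduce the Registry's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegistryElement:
    """Hypothetical, minimal view of a Registry element."""
    registry_id: str
    doi: Optional[str] = None
    removed: bool = False
    removal_reason: Optional[str] = None
    replaced_by: Optional[str] = None  # Registry ID of the replacing element


def mark_removed(element, reason, replaced_by=None):
    """Flag an element as removed without deleting it or its DOI."""
    if not reason:
        raise ValueError("a justified request is required for removal")
    element.removed = True
    element.removal_reason = reason
    element.replaced_by = replaced_by
    # The registry_id and doi fields are deliberately left untouched.
    return element
```

The reason and the optional replacement link stored here are the same pieces of information that the Landing Page of a removed element is expected to present.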

4.2.5 Licences associated with the elements

Since the regulations in force adopt the “open by default” principle [12] according to which “The data […] that administrations publish, by any means, without the express adoption of a license […] are to be understood as issued as open-type data,” INGV, as controller, will place a license [13] on each element of the Data Registry [14]. In accordance with the provisions stated in the INGV Data Policy and with the suggestions made by the guidelines of the European Commission [15], the adopted licenses shall be of the Creative Commons type [16].

For the purpose of supporting Open Science through the publication of “Open-type data” [17], it is established that Level 0 and 1 data are attributed the “Creative Commons Attribution (CC BY)” license [18], by virtue of the principle enshrined in the “Principles of the INGV Data Policy” according to which the owner [19] of the intellectual property of these data is INGV. As regards the version of the license, reference is made to v4.0 at the time of drafting, but any subsequent updates must be taken into consideration [20].

For the Level 2 and 3 data, the Data Manager may, taking account of the regulations in force, suggest one of the Creative Commons licenses at the moment of the request to the Data Management Office, which will assess its admissibility. If the Data Manager proposes a Creative Commons license other than CC BY, a justification for the proposal must be provided, in order to guide the Data Management Office in the admissibility assessment process. If no license is proposed by the Data Manager, the CC BY license shall be automatically attributed. The assigned license must be reported on the Landing Page of the site from which the data are distributed, the characteristics of which are detailed in point 5.1 below.
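
The licensing rules of this paragraph can be summarized in a short sketch. The function name and the license strings are assumptions made for illustration; for Level 2 and 3 data the returned value is only the candidate license, which remains subject to the admissibility assessment of the Data Management Office.

```python
DEFAULT_LICENSE = "CC BY 4.0"


def assign_license(level, proposed=None, justification=None):
    """Return the license (or candidate license) for a Registry element."""
    if level not in (0, 1, 2, 3):
        raise ValueError("level must be 0, 1, 2, or 3")
    if level in (0, 1):
        # Level 0 and 1 data are always attributed CC BY.
        return DEFAULT_LICENSE
    if proposed is None:
        # No proposal from the Data Manager: CC BY is attributed automatically.
        return DEFAULT_LICENSE
    if proposed != DEFAULT_LICENSE and not justification:
        raise ValueError("a license other than CC BY requires a justification")
    return proposed
```

For example, assign_license(2) yields the default CC BY license, while a Level 3 request proposing another Creative Commons license is accepted as a candidate only when a justification accompanies it.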

4.2.6 Persistent identifiers associated with the elements

The data entered into the Data Registry shall, in addition to a Registry identifier, also have a persistent identifier broadly adopted in a scientific setting, such as the DOI code. The DOI Registry Agency used is DataCite, whose metadata scheme is adopted [21]. For details on the procedure for assigning this identifier, reference is made to Chapter 5, “Registries of metadata not managed by INGV.”

4.2.7 Exclusion of liability and terms of use of the data

The Data Management Office, in concert with the Legal Affairs and Litigation Office, will establish, on a case-by-case basis, the procedures and actions for managing the exclusion of liability of INGV and of personnel in connection with any incompleteness and uncertainty of the data present in the Data Registry, the use, even partial, of the data reported in the Data Registry by third parties, and any damage caused to third parties, derived from their use.

[1]OpenAire. Guidelines for Data Archives.
[2]Agenzia per l’Italia Digitale. Linee Guida per i cataloghi dati.
[3]Well-known text, ISO/IEC 13249-3:2016, https://en.wikipedia.org/wiki/Well-known_text
[4]OpenAire. Guidelines for Data Archives.
[5]5 stars Open Data. http://5stardata.info
[6]OpenAire. Guidelines for Data Archives.
[7]Agenzia per l’Italia Digitale (2017). Linee Guida Nazionali per la Valorizzazione del Patrimonio Informativo Pubblico.
[8]Legislative Decree no. 82 of 07 March 2005. Digital Administration Code (Codice dell’Amministrazione Digitale – CAD). Art. 59, paragraph 5.
[9]Decree of the Presidency of the Council of Ministers of 10 November 2011. Art.3, Paragraph 1. List in Attachment 1.
[10]Decree of the Ministry of Education, Universities and Research no. 120 of 07 June 2016.
[11]ANVUR, Project IRIDE.
[12]Legislative Decree no. 82 of 07 March 2005. Digital Administration Code (Codice dell’Amministrazione Digitale – CAD). Art. 52, paragraph 2.
[13]Legislative Decree no. 36 of 24 January 2006. Art.5, Paragraph 1, “[…] The controller of the data adopts, by priority, standard open licenses […].” Art.2, Paragraph h, “standard license for reuse: the contract, or other negotiated instrument, drawn up in electronic form where possible, defining the procedures for reusing the documents of public administrations or of public-law bodies.”
[14]Legislative Decree no. 165 of 30 March 2001, paragraph 2. The parties tasked with placing licenses on the data are public administrations, understood as “all the administrations of the State, including institutes and schools at any level, and educational institutions, companies, and administrations of the State under autonomous system, Regions, Provinces, Municipalities, mountain communities and their consortia and associations, university institutions, independent public housing institutes, chambers of commerce, industry, handicrafts and agriculture and their associations, all national, regional, and local non-economic public authorities, administrations, concerns and bodies of the national health service, the agency for the representation of public administrations in negotiations (Agenzia per la rappresentanza negoziale delle pubbliche amministrazioni – ARAN) and the Agencies pursuant to Legislative Decree no. 300 of 30 July 1999. Until the organic revision of the sector’s regulations, the provisions as per this decree shall continue to apply to the Italian National Olympic Committee (CONI) as well.”
[15]European Commission notice (2014/C 240/01). Guidelines on recommended standard licences, datasets and charging for the reuse of documents.
[16]Creative Commons. https://creativecommons.org/
[17]Legislative Decree no. 82 of 07 March 2005, Art. 68, paragraph 3, letter b.
[18]Creative Commons Attribution 4.0 International (CC BY 4.0). https://creativecommons.org/licenses/by/4.0/
[19]Legislative Decree no. 82 of 07 March 2005, Art. 1, paragraph cc, as modified by Legislative Decree no. 179 of 26 August 2016, Art. 1, paragraph g
[20]Creative Commons Licenses. https://wiki.creativecommons.org/wiki/License_Versions
[21]DataCite. Metadata Scheme. https://schema.datacite.org/

5. Registries of Metadata not Managed by INGV

The data present in the INGV Data Registry can be linked on an institutional level to metadata registries not managed by INGV. Here, the link to two registries in particular is provided for: DataCite, which allows a DOI identifier to be assigned to the elements and manages metadata suited to describing the data produced by scientific research; and the national registry of spatial data (Repertorio Nazionale dei Dati Territoriali – RNDT), which is managed by Agenzia per l’Italia Digitale (AgID) and is placed in the context of the European Infrastructure for Spatial Data (INSPIRE) [1]. Any adoption of new external registries, where the need is expressed on an institutional level, will be assessed in concert between the Data Management Office and the Department Directors.

In order to comply with the principle of “Data entry minimisation” [2], the metadata needed to describe the entry of data into external registries are to be extracted directly from the INGV Data Registry in totally automated fashion, and thus without having to re-enter information already compiled earlier. The update of the metadata thus takes place in cascade; upon updating the metadata in the INGV Data Registry, the metadata associated with the linked data in the external registries are automatically updated.
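
The cascade update described above can be illustrated with a small in-memory sketch. Everything in it is hypothetical: the Registry class, the record keys, and the output field names only loosely recall the kind of properties an external registry such as DataCite requires, and no real external service is contacted.

```python
def to_external_metadata(record):
    """Derive the external description from a Registry record.

    Nothing is typed in again: every value is read from the record.
    """
    return {
        "identifier": record["doi"],
        "title": record["name"],
        "creators": [person["name"] for person in record["data_producer"]],
        "rights": record["license"],
    }


class Registry:
    """In-memory stand-in for the INGV Data Registry and its external copies."""

    def __init__(self):
        self.records = {}
        self.external = {}  # stands in for the external registries

    def upsert(self, record_id, record):
        """Store a record and regenerate its external metadata in cascade."""
        self.records[record_id] = record
        self.external[record_id] = to_external_metadata(record)
```

Because the external description is always regenerated from the Registry record, the two can never drift apart, which is the purpose of the data entry minimisation principle.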

5.1 Landing Page Requirements

The Landing Page is the web page that is associated with the data in both the INGV and the external registries, which is to say the page the users arrive on when consulting these registries. In order to harmonize the information presented on these pages, and for the purposes of improving INGV’s coordinated image to the outside, the Landing Pages must contain the following elements.

  • The INGV logo, which clarifies and identifies the paternity of the data for the external user, and a direct link to the INGV institutional portal. If other institutions are involved in producing the data, they must also be clearly identifiable and a link to the respective institutional portals must be present.
  • The name and a brief description of the element of the Data Registry that is made available, reporting information consistent with that stored in the Registry.
  • A comprehensive description of the data that affords potential users an informed use, clarifying the purposes for which the data were created and, if available, providing a list of the scientific and technical publications of reference, always indicating their DOI code, if there is one; the Data Registry may contain one or more references to these publications, and it is therefore desirable that the publications reported on the Landing Page be the same ones reported in the Data Registry.
  • Direct access to the data or, if it is a question of data with non-free access, a clear explanation of the data access procedures.
  • In the case of data associated with elements in the Data Registry that belong to a group of elements, make available a link to the other Landing Pages that contain the other elements of the group, clarifying the nature of the link existing among the group’s various data. Above all in the case of groups of elements composed of different versions of the same product, it is essential to present the user with a link to the other versions, clarifying what the differences from one version to the other are, and at any rate clarifying what the most up-to-date version is.
  • The bibliographic citation of the data listing the authors, title, and the institutions or bodies that control the data. To indicate INGV, use the wording “Istituto Nazionale di Geofisica e Vulcanologia (INGV).” The authors must coincide with what is reported in the Data Registry; in the case of a large number of persons (for example, more than four), it may also include, in addition to the main scientific managers, a generic reference to the working group. The citation must include, at the end, the DOI code associated with the data, in its form resolvable on the Web, preceded by “https://doi.org/.” The bibliographical citations must be written in English; a possible, additional citation in Italian may also be presented.
  • The list of persons that contributed towards creating the data, clarifying which of them is the Data Manager and, if present, the Technical Manager, consistently with what is reported in the Data Registry; the users must be able to contact the Data Manager and/or the Technical Manager, and it is therefore necessary to publish their e-mail address or, alternatively, to provide an interactive tool for submitting and answering requests.
  • Indication of the Creative Commons (CC) license as reported in the Data Registry, which must be linked to the description web page on the Creative Commons site; any limitations on accessing the data, limitations on use, and limitations of liability on the part of INGV, as well as any notes connected with the legal sphere, must be made clear to the users.

The Data Management Office will support Data Managers in verifying that the Landing Page contains all the necessary elements and, in particular, that the information is consistent with what is reported in the Data Registry.
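
As an illustration only, the citation rules listed above (authors, title, controlling institutions, and the DOI in resolvable form at the end) can be sketched in a few lines of Python. The function name, the punctuation, and the ordering of the fields are assumptions and are not prescribed by this document; the author names and the DOI in the usage note are placeholders.

```python
INGV_NAME = "Istituto Nazionale di Geofisica e Vulcanologia (INGV)"
DOI_RESOLVER = "https://doi.org/"


def build_citation(authors, title, doi, institutions=(INGV_NAME,)):
    """Assemble a citation string that ends with the resolvable DOI.

    ASSUMPTION: field order and punctuation are illustrative; the guidelines
    only require authors, title, institutions, and the final DOI link.
    """
    if not authors:
        raise ValueError("at least one author is required")
    if doi.startswith(DOI_RESOLVER):
        # Accept an already resolvable DOI without doubling the resolver.
        doi = doi[len(DOI_RESOLVER):]
    parts = [", ".join(authors), title, "; ".join(institutions)]
    # Close every part with exactly one period.
    text = " ".join(part.rstrip(".") + "." for part in parts)
    return f"{text} {DOI_RESOLVER}{doi}"
```

For example, `build_citation(["Rossi M.", "Bianchi L."], "Example Dataset", "10.13127/EXAMPLE")` yields a string whose last element is `https://doi.org/10.13127/EXAMPLE`.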

5.2 DOIs

The Data Management Office will, where possible, associate each element of the Data Registry with a persistent DOI, entering the related metadata into the DataCite registry [3].

5.2.1 Guidelines for Assignment

Assignment of a DOI is subject to the following mandatory conditions:

  • a web page to be associated, called the “Landing Page,” must be available and openly accessible;
  • it must be possible to access the data directly from the Landing Page;
  • a Creative Commons license must be associated with the identified data;
  • the list of metadata must be compiled in accordance with the DataCite scheme [4].

The convention executed in 2013 between INGV and CRUI (Conference of Italian University Rectors) provides two DOI prefixes:

  • “10.6092/INGV.IT-”, a sub-code of the CRUI prefix, shared with other institutions, and with explicit reference to INGV;
  • “10.13127”, a neutral prefix for INGV’s exclusive use.

The DOI is structured with a prefix and a suffix; here is an example based on the code with the CRUI prefix:

[Figure: structure of an example DOI with the CRUI prefix, showing prefix and suffix]

The following guidelines for assigning identifiers were drawn up taking into consideration the indications contained in the DOI Handbook [5].

The code “10.6092/INGV.IT-” is to be used for data whose ownership belongs exclusively to INGV and whose geographic coverage is of strictly national interest. DOI codes based on this prefix must comply with the following structure:

10.6092/INGV.IT/<data-group>/<specific identifier>

The prefix “10.13127” is to be used when an anonymous code is desirable: for example, when the data to be identified involve contributions from several institutions or are of international importance, or when “non-speaking” identification codes (a term of the trade meaning that the code has no intelligible form) are requested. DOI codes based on this prefix must comply with the following structure:

10.13127/<data-group>/<specific identifier>

Complex groups of the Registry’s elements may adopt a common base code, followed by a slash (“/”), followed in turn by a different identifier for each element in the data group; there may be more than one sublevel, depending on each specific data group’s complexity.

Groups of data containing different versions of the same element must adopt a constant prefix, adding the version after a period (“.”).

The suffix should not exceed 50 characters in length and may contain the following characters: numbers (0-9), upper-case letters of the English-language alphabet (A-Z), the “minus” sign (“-”), and the “underscore” sign (“_”).
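
The length and character rules above lend themselves to an automatic check. The following is a minimal sketch, not an official INGV tool: the function name is hypothetical, and it assumes that the slash and the period are also acceptable in a suffix, since the preceding paragraphs use them as separators for sublevels and versions.

```python
import re

MAX_SUFFIX_LENGTH = 50
# Characters listed in the guideline: digits, upper-case A-Z, "-" and "_".
# ASSUMPTION: "/" (sublevels) and "." (versions) are accepted as well,
# because the text uses them as separators inside the suffix.
_DISALLOWED = re.compile(r"[^0-9A-Z_\-/.]")


def check_suffix(suffix):
    """Return a list of problems found in a DOI suffix (empty if none)."""
    problems = []
    if not suffix:
        problems.append("suffix is empty")
    if len(suffix) > MAX_SUFFIX_LENGTH:
        problems.append(
            f"suffix has {len(suffix)} characters, more than {MAX_SUFFIX_LENGTH}"
        )
    bad = sorted(set(_DISALLOWED.findall(suffix)))
    if bad:
        problems.append("disallowed characters: " + " ".join(bad))
    return problems
```

A made-up suffix such as `EXAMPLE/DATASET.2` passes the check, while a lower-case suffix is reported because only upper-case letters are listed.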

To disseminate the use of the associated DOI codes, it is recommended that reference to the code always be added in presentations, abstracts, posters, technical reports, social networks, and above all, publications. For a more functional use, it is recommended that the code be presented in the form of a resolvable address (e.g.: https://doi.org/10.13127/xxxx).

5.2.2 Guidelines for Compiling Metadata

The metadata to be associated with the DOI adopt the DataCite schema [6] in its most recent version. This document refers to version 4.1 of the schema (the most recent at the time of writing); since the schema may change with future versions, the Data Management Office will be responsible for announcing any updates on its web page.

The following is a list of available metadata; for each, specification is made of the tag to be used, the number of possible occurrences, the definition in accordance with the DataCite scheme, and an indication on the content. The metadata are subdivided into those with mandatory compilation (Table 2), which is to say the presence of which is obligatory for entering data into the Data Registry, and metadata with optional compilation (Table 3).

The metadata are compiled using the XML format; if possible, the Data Management Office will make appropriate tools available to simplify the process of compiling this file.

Table 2 - Metadata with mandatory compilation.

Tag | Occurr. | Definition according to DataCite (v4.1) | Content
Identifier | 1 | The Identifier is a unique string that identifies a resource. For software, determine whether the identifier is for a specific version of a piece of software (per the FORCE11 Software Citation Principles [7]), or for all versions. | Compiled with the assigned DOI code.
Title | 1 | A name or title by which a resource is known. May be the title of a dataset or the name of a piece of software. | Title in English; any acronym is added in parentheses. The title translated into other languages can be specified using the “TranslatedTitle” attribute.
Publication Year | 1 | The year when the data was or will be made publicly available. In the case of resources such as software or dynamic data where there may be multiple releases in one year, include the Date/dateType/dateInformation property and sub-properties to provide more information about the publication or release date details. | Year of first publication of the data.
Resource Type | 1 | A description of the resource. | For the purposes of the Data Registry, it is usually compiled with “Dataset.” The DataCite scheme leaves the field’s compilation open.
Description | 1-n | All additional information that does not fit in any of the other categories. May be used for technical information. | Text description of the data, clear and concise and in English. Add the “DescriptionType” attribute compiled with “Abstract.” Other types of descriptions can also be added: “Methods,” “SeriesInformation,” “TableOfContents,” “TechnicalInfo.”
Subject | 1-n | Subject, keyword, classification code, or key phrase describing the resource. | Free compilation, taking care to specify the “SubjectScheme” attribute indicating the classification scheme used.
GeoLocation | 1-n | Spatial region or named place where the data was gathered or about which the data is focused. | A “GeoLocationPlace” series and/or a “GeoLocationPolygon” series may be specified.
Publisher | 1 | The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. This property will be used to formulate the citation, so consider the prominence of the role. For software, use Publisher for the code repository. If there is an entity other than a code repository that “holds, archives, publishes, prints, distributes, releases, issues, or produces” the code, use the property Contributor/contributorType/hostingInstitution for the code repository. | Enter the name of the Institution that makes the data available. The field is compiled with “Istituto Nazionale di Geofisica e Vulcanologia (INGV).”
Creator | 1-n | The main researchers involved in producing the data, or the authors of the publication, in priority order. | List the main scientific and/or technological managers, indicating the affiliation and ORCID identifier code for each. In addition to the main managers, a generic reference to the Working Group can also be entered.
Contributor | 1-n | The institution or person responsible for collecting, managing, distributing, or otherwise contributing to the development of the resource. To supply multiple contributors, repeat this property. For software, if there is an alternate entity that “holds, archives, publishes, prints, distributes, releases, issues, or produces” the code, use the contributorType “hostingInstitution” for the code repository. | List the persons that contributed to the data, identifying for each the role carried out, affiliation, and ORCID code. Institutions can also be added. Set the “nameType” attribute as “personal” for persons and “organizational” for institutions. The roles provided for are: ContactPerson, DataCollector, DataCurator, DataManager, Distributor, Editor, HostingInstitution, Other, Producer, ProjectLeader, ProjectManager, ProjectMember, RegistrationAgency, RegistrationAuthority, RelatedPerson, ResearchGroup, RightsHolder, Researcher, Sponsor, Supervisor, and WorkPackageLeader.
Rights | 1 | Any rights information for this resource. | Type of Creative Commons license.
Funding Reference | 1-n | Information about financial support (funding) for the resource being registered. | List of institutions that funded the creation of the data.

Table 3 - Metadata with optional compilation.

Tag | Occurr. | Definition according to DataCite (v4.1) | Content
Date | 0-1 | Different dates relevant to the work. | The “dateType” attribute may contain: Accepted, Available, Copyrighted, Collected, Created, Issued, Submitted, Updated, Valid. If available, compile with relevant dates.
Language | 0-1 | The primary language of the resource. | Compile with the English wording of the language in which the data are publicly available.
Alternate Identifier | 0-n | An identifier or identifiers other than the primary Identifier applied to the resource being registered. This may be any alphanumeric string which is unique within its domain of issue. May be used for local identifiers. Alternate Identifier should be used for another identifier of the same instance (same location, same file). | If the data have relationships, of any nature, with other research products associated with identifiers, this tag can be used to establish a link. See the list of admitted relationships below.
Size | 0-n | Size (e.g. bytes, pages, inches, etc.) or duration (extent), e.g. hours, minutes, days, etc., of a resource. | If the data can be quantified, compile this field.
Format | 0-n | Technical format of the resource. Use file extension or MIME type where possible. | If the data are available in one or more data coding standards, indicate the formats here.
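
To make the XML compilation more concrete, the sketch below builds a partial metadata record with Python’s standard library. It is an illustration under stated assumptions, not the tool the Data Management Office will provide: the function name is hypothetical, the DOI and all values in the usage note are placeholders, only some of the mandatory tags are covered (GeoLocation, Subject, Contributor, and Funding Reference are omitted), and the output is not validated against the DataCite schema.

```python
import xml.etree.ElementTree as ET

DATACITE_NS = "http://datacite.org/schema/kernel-4"


def build_metadata(doi, title, creators, year, abstract, license_name, license_url):
    """Build a partial DataCite-style <resource> element (not schema-validated)."""
    ET.register_namespace("", DATACITE_NS)

    def tag(name):
        # Qualified tag name in ElementTree's {namespace}name notation.
        return f"{{{DATACITE_NS}}}{name}"

    resource = ET.Element(tag("resource"))
    ET.SubElement(resource, tag("identifier"), identifierType="DOI").text = doi

    creators_el = ET.SubElement(resource, tag("creators"))
    for name in creators:
        creator = ET.SubElement(creators_el, tag("creator"))
        ET.SubElement(creator, tag("creatorName")).text = name

    titles = ET.SubElement(resource, tag("titles"))
    ET.SubElement(titles, tag("title")).text = title

    ET.SubElement(resource, tag("publisher")).text = (
        "Istituto Nazionale di Geofisica e Vulcanologia (INGV)"
    )
    ET.SubElement(resource, tag("publicationYear")).text = str(year)
    ET.SubElement(
        resource, tag("resourceType"), resourceTypeGeneral="Dataset"
    ).text = "Dataset"

    descriptions = ET.SubElement(resource, tag("descriptions"))
    ET.SubElement(
        descriptions, tag("description"), descriptionType="Abstract"
    ).text = abstract

    rights_list = ET.SubElement(resource, tag("rightsList"))
    ET.SubElement(rights_list, tag("rights"), rightsURI=license_url).text = license_name
    return resource
```

Calling `ET.tostring(build_metadata(...), encoding="unicode")` with placeholder values such as the DOI `10.13127/EXAMPLE/DATASET.1` returns the record as an XML string that can be inspected before any real submission.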

5.2.3 Relationships with other research products

The DataCite metadata schema allows the DOI to be linked to other digital resources available on the Internet. The “relatedIdentifier” tag tasked with establishing these links can specify, in the “relatedIdentifierType” attribute, one of the following types of identifier: ARK, arXiv, bibcode, DOI, EAN13, EISSN, Handle, IGSN, ISBN, ISSN, ISTC, LISSN, LSID, PMID, PURL, UPC, URL, URN. The type of relationship between the DOI and another digital resource is specified using the “relationType” attribute. Table 4 lists the admitted relationships, in which (A) represents the data being described in the Data Registry that is associated with the DOI, and (B) the digital element that is being linked.

Table 4 – List of types of relationships admitted by the DataCite metadata scheme.

Type of relationship | Description provided by DataCite
IsCitedBy | Indicates that B includes A in a citation
Cites | Indicates that A includes B in a citation
IsSupplementTo | Indicates that A is a supplement to B
IsSupplementedBy | Indicates that B is a supplement to A
IsContinuedBy | Indicates A is continued by the work B
Continues | Indicates A is a continuation of the work B
Describes | Indicates A describes B
IsDescribedBy | Indicates A is described by B
HasMetadata | Indicates resource A has additional metadata B
IsMetadataFor | Indicates additional metadata A for a resource B
HasVersion | Indicates A has a version (B)
IsVersionOf | Indicates A is a version of B
IsNewVersionOf | Indicates A is a new edition of B, where the new edition has been modified or updated
IsPreviousVersionOf | Indicates A is a previous edition of B
IsPartOf | Indicates A is a portion of B; may be used for elements of a series
HasPart | Indicates A includes the part B
IsReferencedBy | Indicates A is used as a source of information by B
References | Indicates B is used as a source of information for A
IsDocumentedBy | Indicates B is documentation about or explaining A
Documents | Indicates A is documentation about B
IsCompiledBy | Indicates B is used to compile or create A
Compiles | Indicates B is the result of a compile or creation event using A
IsVariantFormOf | Indicates A is a variant or different form of B
IsOriginalFormOf | Indicates A is the original form of B
IsIdenticalTo | Indicates that A is identical to B, for use when there is a need to register two separate instances of the same resource
IsReviewedBy | Indicates that A is reviewed by B
Reviews | Indicates that A is a review of B
IsDerivedFrom | Indicates B is a source upon which A is based
IsSourceOf | Indicates A is a source upon which B is based
IsRequiredBy | Indicates A is required by B
Requires | Indicates A requires B
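
Most relationships in Table 4 come in reciprocal pairs, so a link declared on (A) implies a matching link that could be declared on (B). The sketch below encodes those pairs and derives the reciprocal statement. The function name is hypothetical, the DOIs in the usage note are placeholders, and treating “IsIdenticalTo” as its own inverse is an assumption, since the table lists it without a counterpart.

```python
# Reciprocal pairs as they appear in Table 4.
_PAIRS = [
    ("IsCitedBy", "Cites"),
    ("IsSupplementTo", "IsSupplementedBy"),
    ("IsContinuedBy", "Continues"),
    ("Describes", "IsDescribedBy"),
    ("HasMetadata", "IsMetadataFor"),
    ("HasVersion", "IsVersionOf"),
    ("IsNewVersionOf", "IsPreviousVersionOf"),
    ("IsPartOf", "HasPart"),
    ("IsReferencedBy", "References"),
    ("IsDocumentedBy", "Documents"),
    ("IsCompiledBy", "Compiles"),
    ("IsVariantFormOf", "IsOriginalFormOf"),
    ("IsReviewedBy", "Reviews"),
    ("IsDerivedFrom", "IsSourceOf"),
    ("IsRequiredBy", "Requires"),
]

# ASSUMPTION: IsIdenticalTo is symmetric, so it is mapped to itself.
INVERSE = {"IsIdenticalTo": "IsIdenticalTo"}
for first, second in _PAIRS:
    INVERSE[first] = second
    INVERSE[second] = first


def reciprocal_link(source_id, relation_type, target_id):
    """Given 'A relation B', return the matching (B, inverse relation, A)."""
    if relation_type not in INVERSE:
        raise ValueError(f"unknown relationType: {relation_type!r}")
    return (target_id, INVERSE[relation_type], source_id)
```

For instance, if the placeholder `10.13127/EXAMPLE.2` is declared as “IsNewVersionOf” `10.13127/EXAMPLE.1`, the function returns the triple stating that the older record “IsPreviousVersionOf” the newer one, which is the information a Landing Page needs in order to point users to the most up-to-date version.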

5.2.4 Identification of Fragments of Complex Data

To retrieve a portion (fragment or subset) of a set of data associated with a DOI, solutions may be used that avoid the unnecessary assignment of a separate identifier to every possible fragment of the original data. Towards this end, the concept of “fragment identifier” is introduced.

This solution is supported by the DataCite registry, which implemented “Media Fragment Identifiers” (MFIDs), a standard developed by W3C and based on IETF (Internet Engineering Task Force) recommendations, designed to simplify access to data streams such as video or audio. The call is structured as follows:

<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]

Since they are based on the Handle System [8], DOIs can use “Template handles,” which allow an undefined number of parameters to be added to the identifier, inserted after the hash sign (“#”). This solution was taken into consideration by the “Data Citation” working group [9] from the Research Data Alliance (RDA), which recommended it in a dynamic data setting. The technique for extracting data subsets with the aid of parameters is called “data slicing.” In the seismological setting, trials are underway [10] [11] in the context of the European COOPEUS [12], ENVRI [13], and EUDAT [14] projects.
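
The call structure described above can be explored with Python’s standard URL parser. This is only a sketch: the function name is hypothetical, the DOI and the parameter names in the usage note are made up, and the encoding of fragment parameters as key=value pairs joined by “&” is an assumption that this document does not specify.

```python
from urllib.parse import parse_qs, urlsplit


def split_fragment_call(url):
    """Split a call into scheme, hierarchical part, query, and fragment parameters."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "hierarchical_part": "//" + parts.netloc + parts.path,
        "query": parts.query,
        # ASSUMPTION: parameters after "#" are key=value pairs joined by "&".
        "fragment_params": {
            key: values[0] for key, values in parse_qs(parts.fragment).items()
        },
    }
```

Applied to `https://doi.org/10.13127/EXAMPLE#start=2016-01-01&end=2016-12-31`, the function separates the identifier from the two slicing parameters, so that a single DOI can address many subsets without new identifiers being minted.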

5.3 Italian INSPIRE data registry for spatial data

The national registry of spatial data (National Catalog for Spatial Data, Repertorio Nazionale dei Dati Territoriali – RNDT) [15] was identified as the “databank of national interest [16]” defined as the “set of information collected and managed digitally by public administrations, homogenous by type and content, the knowledge of which may be used by public administrations, also for statistical purposes, for the exercise of their functions and in compliance with the responsibilities and regulations in force.”

The content of the RNDT and its modes of constitution and updating were defined by the Committee for technical rules on the spatial data of the public administrations of the Ministry of Public Administration and Innovation, in concert with the Ministry of the Environment, Land and Sea [17]. Based on this regulatory setting, the RNDT is the national catalogue of metadata regarding the spatial data and services relating to them, available at Public Administrations.

The RNDT publishes the metadata produced and conferred by each accredited administration, which, as provided for by the regulations in force, remains fully responsible for the correctness and updating thereof, as well as for the management and updating of the data to which said metadata refer.

5.3.1 Spatial data of general interest

In describing the content of the RNDT, lawmakers define in detail 110 types of “Data of general interest,” and INGV controls some of them, including:

  • Geodetic networks and monographs of geodetic elements: networks of points with known coordinates relating to a system of common geodetic reference, used for the proper sizing and orientation of the topological-cartographic survey of a large land area, and the related monographs.
  • Digital elevation models: Representation of the land morphology in digital format, including DTM, DEM, DSM, DTED-type and similar representations.
  • National seismic network: Stations and networks measuring and recording seismic activity in progress (shifting ground).
  • Seismic hazard maps of reference for the national territory: representations illustrating the ground’s horizontal peak acceleration values (ag) and the spectral values for various return periods (approved with Order of the President of the Council of Ministers no. 3519 of 28 April 2006, Attachment 1b), to be used in the new technical regulations for constructions approved with the Ministerial Decree of 14 January 2008.

INGV, as a Public Administration holding some of the data of general interest, must populate and update the national registry of spatial data (Repertorio Nazionale dei Dati Territoriali – RNDT), so that the metadata can be consulted by all through the RNDT Catalogue and, in turn, the obligations of the INSPIRE Directive are met.

Contribution to the RNDT is also provided for by the “Specifications of the standards for the formats of data and metadata, for their treatment for the purposes of publication (transparency) and reuse (open data), and for the delivery of the software applications,” contained in Attachment 1 of the “Convention between the Civil Protection Department and INGV for the activity of seismic and volcanic surveillance on national territory, technical/scientific consulting, and studies on seismic and volcanic risks [18],” reporting that “to be properly used, all the provided web services and delivered data shall be accompanied by the related metadata describing the data’s properties, characteristics, and history, as well as the description of the individual fields associated with the data tables. These metadata shall be drawn up in compliance with the standards provided for by the national registry of spatial data (Repertorio Nazionale dei Dati Territoriali – RNDT), as per the decree of the President of the Council of Ministers of 10 November 2011.”

The accreditation procedure for Public Administrations required to contribute to the Registry is described in the “Operative guide for accreditation of Public Administrations” [19], available on the RNDT web portal.

5.3.2 Guidelines for Compiling RNDT Metadata

The “RNDT” metadata profile is based on ISO Standards 19115, 19119, and TS 19139, produced by the ISO/TC211 [20] Technical Committee which deals with standards for geographic information. Compliance with the technical rules of the RNDT, in keeping with the ISO standards of reference, ensures simultaneous compliance, with no further obligations, with the European regulations for the implementation of the European INSPIRE Directive [21] as concerns the metadata. The INSPIRE metadata in fact represent a subset of those provided for by the RNDT; therefore, a set of metadata’s compliance with the RNDT profile guarantees its compliance with INSPIRE.

Reference is made to the RNDT web portal [22] for the technical rules for compiling the RNDT metadata for the spatial data that describe in detail the metadata to be associated with the vector data, raster images, and data access services. The Data Management Office will have appropriate IT tools for compiling these metadata in an automated fashion.

[1] Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 establishing an Infrastructure for Spatial Information in the European Community (INSPIRE), http://eur-lex.europa.eu/legal-content/EN/ALL/?uri=celex:32007L0002
[2] Position Statement on Research Information Systems, November 2016, https://www.scienceeurope.org/wp-content/uploads/2016/11/SE_PositionStatement_RIS_WEB.pdf
[3] DataCite. https://www.datacite.org/
[4] DataCite metadata scheme. https://schema.datacite.org/
[5] International DOI Foundation. DOI Handbook. https://www.doi.org/hb.html
[6] DataCite. Metadata scheme. https://schema.datacite.org/
[7] Smith AM, Katz DS, Niemeyer KE, FORCE11 Software Citation Working Group (2016). Software citation principles. PeerJ Computer Science. https://doi.org/10.7717/peerj-cs.86
[8] Handle Registry. https://www.handle.net/
[9] Rauber A., Asmi A., van Uytvanck D., Pröll S. (2015). Data Citation of Evolving Data.
[10] Klump J. and Huber R. (2016). DOI for geoscience data - how early practices shape present perceptions. Earth Science Informatics, 9(1): 123-136. https://doi.org/10.1007/s12145-015-0231-5
[11] Huber R., Asmi A., Buck J., De Luca J.M., Diepenbroek D., Michelini A. (2015). Data citation and digital identifiers for time series data / environmental research infrastructures. Joint COOPEUS/ENVRI/EUDAT PID workshop, Bremen, 25-26 June 2013. https://doi.org/10.6084/m9.figshare.1285728.v1
[12] https://www.coopeus.eu/
[13] https://envri.eu
[14] https://eudat.eu
[15] Legislative Decree no. 82 of 07 March 2005. Digital Administration Code (Codice dell’Amministrazione Digitale – CAD). Art. 59 “Spatial data.”
[16] Legislative Decree no. 82 of 07 March 2005. Digital Administration Code (Codice dell’Amministrazione Digitale – CAD). Art. 60 “Databank of national interest.”
[17] Ministerial Decree of 10 November 2011, Technical rules for defining the content of the national registry of spatial data (Repertorio Nazionale dei Dati Territoriali – RNDT), as well as the modes of initial constitution and updating thereof. http://www.gazzettaufficiale.it/eli/id/2012/02/27/12A01801/sg
[18] http://istituto.ingv.it/images/Convenzioni_DPC/convenzione dpc_Allegato_A_2018.pdf
[19] Guida operativa per l’accreditamento delle Pubbliche Amministrazioni, version 2.0 of 2014, http://www.rndt.gov.it/RNDT/home/images/RNDT_guida_operativa_accreditamento_v2.0_20140725.pdf
[20] ISO Technical Committee on digital geographic information, https://committee.iso.org/home/tc211
[21] Commission Regulation (EC) No 1205/2008 of 3 December 2008 implementing Directive 2007/2/EC of the European Parliament and of the Council as regards metadata.
[22] Repertorio Nazionale dei Dati Territoriali, http://www.rndt.gov.it

6. Data Management Office

The institution of the Data Management Office responds to the “Professionalism” principle of DP200. This principle enshrines INGV’s commitment to allocating the necessary professional resources for the purpose of implementing the Data Policy.

The Data Management Office has the following purposes, closely interconnected with one another:

  • Promoting the Open Science paradigm;
  • Managing the Data Registry;
  • Providing technical/scientific support in drawing up contracts or conventions that involve managing INGV data.

The Data Management Office promotes the Open Science paradigm, guaranteeing the progressive opening of the data and the improvement of their management in keeping with the regulations in force. Towards this end, the Data Management Office promotes personnel training initiatives and advances proposals for changes in the management of INGV’s data. The data processed by the Data Management Office are, exclusively, Research data as defined in Chapter 1 of this document.

The Data Management Office carries out its activity in compliance with what is defined herein, operating within the “Services Centre for the coordination of the activities in support of Research.”

The Data Management Office is instituted by decision of the Executive Board. Its members are appointed by decree of the President, remain in office for a three-year term, and may be reappointed only once.

6.1 Tasks of the Data Management Office

The Data Management Office pursues its purposes in accordance with an Activity Plan that defines the yearly objectives and describes in detail the times, procedures, and human and financial resources needed to carry out the activity. The Activity Plan is submitted for approval to the General Manager and to the Department Directors, for their respective spheres of responsibility.

During the Start-up and Trial phase necessary for instituting the Data Registry, the Data Management Office will proceed to transfer the data present in the Census conducted in the INGV Sections during the 2016-2017 period into the Data Registry. This transfer will take place according to a list of priorities defined at the same time as the approval hereof by the Department Directors, based on the criteria defined in the previous chapters. As indicated in point 2.3 above, at the end of the first year the Data Management Office will draw up a report to be sent to the Department Directors, suggesting solutions for any criticalities and improvements.

The Data Management Office must have instruments for the Registry’s automated management based on principles of transparency, efficiency, and simplification, integrated with INGV management systems. These instruments must allow the INGV Registry’s metadata to be automatically aligned with the external metadata Registries – both those provided for herein such as DataCite’s DOI Registry or the national registry of spatial data (Repertorio Nazionale dei Dati Territoriali – RNDT) and for other metadata registries that may be considered in the future. At the moment of its institution, the Data Management Office shall insert into its Activity Plan the construction of the aforementioned tools for ordinary management.
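
As a purely illustrative sketch of what such an alignment check might involve, the function below compares the copy of a record held in the INGV Registry with the copy held in an external registry and reports the fields that differ. The function name and the field names are hypothetical; how the records would actually be fetched from DataCite or the RNDT is not modelled here.

```python
def find_misaligned_fields(registry_record, external_record, fields):
    """Return {field: (registry_value, external_value)} for fields that differ.

    Both records are plain dictionaries; a field missing from a record is
    compared as None.
    """
    differences = {}
    for field in fields:
        local = registry_record.get(field)
        remote = external_record.get(field)
        if local != remote:
            differences[field] = (local, remote)
    return differences
```

An empty result means the two copies agree on the compared fields; any other result lists the values to be reconciled, which is the kind of output a monitoring report could be built on.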

The Data Management Office must have a web page appropriately advertised on the institutional web portal, collecting and spreading information on its activity.

The Data Management Office will deal with the monitoring of the data listed in the Data Registry, for the purpose of guaranteeing their consistency and compliance with what is declared in the phase of entry into the Data Registry, and towards this purpose may interface directly with the Data Managers and with the Technical Managers of the databases.

6.2 Composition of the Data Management Office

The composition of the Data Management Office takes account of the Guidelines of Agenzia dell’Italia Digitale (AgID) 2017 [1], adapting them to INGV’s specific operating context and to the performance of the assigned tasks. The Data Management Office is therefore composed of a Coordinator, three Scientific Experts, and an IT Expert, whose CVs must be in line with the content of the AgID Guidelines.

The Coordinator performs the following functions:

  • Organizing the activities of the Data Management Office, preparing the relevant Activity Plan and coordinating the human and financial resources assigned to the Office;
  • Preparing the activities reports and any contributions requested from the Data Management Office for drawing up the institutional programming and reporting documents;
  • Performing the role of Open Data Manager (Data Manager provided for by the AgID Guidelines);
  • Representing the Data Management Office before INGV’s guidance and governance bodies and in external relations;
  • Collaborating and coordinating with the Legal Affairs and Litigation Sector as concerns legal and regulatory aspects;
  • Collaborating and coordinating with the “Commission for Open Access to Research Contributions”;
  • Collaborating with and supporting the Transparency Manager [2], in the context of the figures assigned to it by the regulations, in order to reinforce the objectives of transparency and of maximum use of open-type public data.

The Coordinator is selected from among INGV’s research and technology personnel possessing the characteristics of Data Manager as provided for by the AgID Guidelines.

The three Scientific Experts perform the following functions:

  • Aiding the Coordinator in all the matters relating to the scientific management of the data;
  • Verifying the suitability of the data in compliance with what is established in Chapter 3;
  • Following the procedure of entering the data into the Registry and into the RNDT in compliance with what is established in Chapters 4 and 5;
  • Following the assignment of DOIs, in compliance with what is established in Chapter 5;
  • Providing indications on the optimization and harmonization of the standards of the data and metadata used by INGV, in compliance with national and international directives and with the good practices proposed by the scientific community.

The Scientific Experts, one for each Scientific Department, are selected from among INGV’s research and technology personnel belonging to the three departments.

The IT Expert performs the following functions:

  • Aiding the Coordinator in the technical and IT management of the data;
  • Verifying the function of the databases for the purposes of managing the Data Registry;
  • Providing indications for the optimization and harmonization of the INGV databases, in compliance with national and international directives and with the good practices proposed by the scientific community;
  • Interacting with the Information Services Centre.

The IT Expert is selected from among the Technical Managers listed in the Data Registry. During the Start-up and Trial phase, the IT Expert is selected from among the Technical Managers listed in the Census.

[1] The National Guidelines for the Valorization of Public Information Heritage (Linee Guida Nazionali per la Valorizzazione del Patrimonio Informativo Pubblico) (http://lg-patrimonio-pubblico.readthedocs.io) recommend identifying “a clear internal data governance with strategic and specific professional figures,” with the desire to found an “Open Data Working Group (Open Data Team).” In the case of INGV, the Data Management Office also carries out the role of Open Data Working Group. The AgID Guidelines also specify that “for the Open Data Team’s work to be decisive within the Administration, it is important for the team to dialogue with the most political level, both to obtain the necessary ‘impetus’ from it, and to offer proposals and stimuli to political decision-makers.”
[2] Pursuant to Legislative Decree no. 33/2013 and subsequent modifications and supplements.

Annex 1. Start-up and Trial of INGV Data Registry

In order to initiate the trial of the INGV Data Registry, and in keeping with the content of Chapters 2.3 and 6.1 of the Document Implementing the INGV Data Policy (hereinafter the “Implementation Document”), this Attachment defines:

  1. Activity Plan
  2. List of data to be implemented in the Data Registry during the Start-up and Trial phase
  3. Verification on the data considered in the Start-up and Trial phase

Activity Plan

The Data Management Office (Ufficio Gestione Dati – UGD) is constituted with the appointment of its members, by Decree of the President of INGV.

By no later than a month after it is constituted, the Data Management Office defines and develops the instrument for the Registry’s automated management, in compliance with Chapter 6.1 of the “Implementation Document.”

By no later than six months after it is constituted, the Data Management Office proceeds to implement an initial group of elements identified in Table 1 (30-40%) and provides a preliminary report to the Department Directors describing the activities’ progress, defining the verification activities in detail (e.g. type and number of tests to be carried out), and highlighting any criticalities found and the possible solutions.

Within twelve months, the Data Management Office completes the implementation of the elements identified in Table 1, and initiates the verification phase.

Twelve months after it is constituted, the Data Management Office provides a final report on the Data Registry Start-up and Trial phase, containing:

  1. Activities carried out (times, procedures, personnel / sections involved, etc.);
  2. Criticalities found, and solutions adopted;
  3. Proposals for the plan for the Data Registry’s Ordinary management.

List of Data

The Census elements of the INGV data deemed suitable for use in the Data Registry Start-up and Trial phase are reported in Table 1. This is in application of the prioritization criteria defined in Chapter 2.3.1 of the “Implementation Document.”

Verification Activities

The Data Registry verification activities must be aimed at ascertaining the actual functioning of the implemented databases (in compliance with the access rules defined in Table 1), and their function in the INGV data management system (if already implemented). Should there be a link to external databases (as defined in the “Implementation Document”), their interoperability must also be verified.