Open Data Portal Guide
1. Introduction
The Open Data portal of the Istituto Nazionale di Geofisica e Vulcanologia (INGV) makes the institutional "Data Registry" publicly available. The portal collects metadata on Research Data that result from INGV scientific production and/or are managed and/or published by INGV.
The Data Registry is the main tool for implementing the INGV Data Policy. It addresses the need to govern the multiplicity of published research data, to establish shared institutional management practices, and to manage access to data as well as their use and reuse. The INGV Data Policy is based on the fundamental principle of "Open Access". Accordingly, INGV adopts a policy that allows free, open, full and timely access to its research data, respecting the principles established in EU and national legislation and in accordance with its institutional purposes. According to Italian legislation (Legislative Decree 7 March 2005, no. 82, Codice dell’Amministrazione Digitale - CAD, Art. 1, paragraph l-ter), open digital data must:
- be available under the terms of a license that allows their use by anyone, including for commercial purposes, in a disaggregated format;
- be accessible through information and communication technologies, including public and private telematic networks, in open formats, be suitable for automatic use by computer programs, and be provided with the related metadata;
- be made available free of charge or at the marginal costs incurred for their reproduction and dissemination.
The Research Data listed in the Data Registry consist of individual objects or records of a physical or digital nature, at any level of processing, or of organised collections of objects or records. They include the research products needed to validate scientific findings. Here the term "data" refers equally to raw data acquired by a sensor, to physical samples of any kind, and to products obtained by analysing data at any level of automatic or manual processing.
The Data Registry catalogues data published by INGV and is designed to meet the needs of both “experts” and the general public. The Registry was established in 2018 through the publication of the “Data policy implementation document” and is edited by the INGV Data Management Office. For the meaning of the specific terminology adopted in the portal, refer to the “Definitions” section of the “Data policy implementation document”.
The INGV open data portal is a first step towards the progressive adoption of the Open Science paradigm in research. It is published with the intent of triggering a virtuous process, both in the daily activity of researchers and among potential data users, which in the medium/long term aims at a more complete and efficient sharing of the results of public research.
The chart below shows the publication trend of new entries in the Open Data Portal since the official creation of the institutional Data Registry in January 2019. Datasets published before 2019 were part of a pilot operation to test the mechanism for publishing DOIs and related metadata through the DataCite services.
INGV has three scientific departments: Earthquakes, Volcanoes and Environment.
The following chart shows the subdivision of published metadata records by scientific department; some records may be related to more than one department.
2. Identification of data through DOI codes
The scientific method is based on the reproducibility of experimental results. Among the factors that contribute to the reproducibility of a result, the unambiguous identification of data plays an important role. For this reason, the Data Registry treats the unique identification of data as a key element. Identification is closely linked to the data producer and to the quality, reliability and trustworthiness that the producer can guarantee, to the benefit of potential data users. Technically, data identification relies on the DOI (Digital Object Identifier), a code standardised by ISO (International Organization for Standardization) and managed by the IDF (International DOI Foundation), a non-profit organisation created in 1998. A DOI is a so-called “persistent identifier”: it remains unchanged over time, even if the web page it is associated with changes. This allows stable pointing to a digital object available on the web, even if the object is “transferred” from one website to another.
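In practice, persistence is obtained by resolving a DOI through the DOI proxy at https://doi.org/. The proxy redirects to whichever web address is currently registered for the object. The Python sketch below strips common prefixes from a DOI string, checks the basic "10.registrant/suffix" shape and builds the resolver link. The helper names are hypothetical and not part of any INGV tool. Lowercasing relies on DOIs being case-insensitive.

```python
from urllib.parse import quote

RESOLVER = "https://doi.org/"
# Prefixes commonly pasted in front of a bare DOI.
KNOWN_PREFIXES = ("https://doi.org/", "http://doi.org/",
                  "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Return a bare, lowercase DOI such as '10.13127/asmi'."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # DOIs are case-insensitive, so lowercasing does not change the target.
    doi = doi.lower()
    registrant, slash, suffix = doi.partition("/")
    if not (registrant.startswith("10.") and slash and suffix):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def resolver_url(raw: str) -> str:
    """Build the persistent link; '/' stays literal, other reserved characters are escaped."""
    return RESOLVER + quote(normalize_doi(raw), safe="/")


print(resolver_url("doi:10.13127/ASMI"))  # https://doi.org/10.13127/asmi
```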
Besides a persistent link, assigning a DOI to a digital object guarantees its association with a set of metadata, i.e. a list of details describing the data identified by the DOI. In fact, a DOI can only be assigned if a minimum set of metadata is provided, and the required fields vary according to the nature of the object being identified. In addition to data to which INGV has directly assigned DOIs, the Data Registry can also contain data owned or co-owned by INGV whose DOIs were assigned by other organisations that manage web publication autonomously, such as some scientific journals or data archives like PANGAEA, Zenodo or Figshare.
Common metadata, such as the title, the year of publication and the web address from which the data can be downloaded, are always associated with the data in the Registry. Other metadata are:
- a brief description of the data;
- the complete list of authors and collaborators, with an indication of the roles covered by each and their respective affiliations;
- data ownership, i.e. the organisations the data belong to;
- the terms of use of the data expressed through one of the Creative Commons licenses, as required by the Data Policy;
- geographic coverage, i.e. the geographic area the data refer to;
- time coverage, i.e. the time span the data refer to;
- the projects and organisations that funded the data collection, processing and compilation;
- linked data and the relationships with them (e.g. derived data or data sources);
- publications linked to data.
Thanks to the wealth of metadata associated with each DOI, bibliographic citations can be generated automatically, regardless of the bibliographic standard adopted. One tool that exploits this possibility is available at https://citation.crosscite.org. There, citations can be generated from a DOI by choosing among thousands of citation styles used by scientific journals. The data sheet of each dataset on the INGV portal includes a ready-to-use bibliographic citation generated with this tool.
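Tools such as crosscite.org rely on DOI content negotiation. The DOI URL is requested with an HTTP Accept header that asks for a formatted bibliography entry in a given Citation Style Language (CSL) style. The minimal Python sketch below assumes that the registration agency of the DOI (DataCite, for DOIs published by INGV) honours the text/x-bibliography media type. The function names are hypothetical, and fetch_citation needs network access.

```python
import urllib.request


def citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Prepare a content-negotiation request for a formatted citation."""
    url = "https://doi.org/" + doi.strip().lower()
    # text/x-bibliography asks for a ready-made reference string;
    # 'style' is a CSL style identifier such as "apa" or "ieee".
    headers = {"Accept": f"text/x-bibliography; style={style}"}
    return urllib.request.Request(url, headers=headers)


def fetch_citation(doi: str, style: str = "apa") -> str:
    """Send the request (network required) and return the citation text."""
    with urllib.request.urlopen(citation_request(doi, style), timeout=30) as resp:
        return resp.read().decode("utf-8").strip()


# Example (requires network access):
# print(fetch_citation("10.13127/asmi", style="ieee"))
```

Keeping request construction separate from sending lets the request be inspected or reused, for example with a different style, without contacting the resolver.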
3. Consultation via web interface
Data can be consulted:
- by searching with one or more keywords;
- through a geographical search, by drawing an area of interest on a map;
- through the name of one or more persons involved in creating the data.
The first two methods can be combined: a keyword can be entered while also specifying the geographic area in which to search for available data. For example, entering the term "tsunami", drawing a polygon that encloses Turkey and clicking "Confirm" under the map returns a list of data that meet two conditions. They contain the term "tsunami" in the title, the description and/or the keywords, and they refer to the drawn area.
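The same combination may also be reproduced through the CKAN API listed in section 5. The sketch below assumes that the portal runs the ckanext-spatial extension. That extension gives package_search an ext_bbox parameter, ordered as minimum longitude, minimum latitude, maximum longitude, maximum latitude. Note that the web map accepts a polygon, whereas ext_bbox describes only a rectangle. The helper name is hypothetical, and the coordinates for Turkey are a rough, illustrative box.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://data.ingv.it/api/3/action/package_search"


def keyword_area_url(keyword: str, min_lon: float, min_lat: float,
                     max_lon: float, max_lat: float, rows: int = 20) -> str:
    """Combine a keyword with a bounding box (WGS84 degrees) in one query.

    ASSUMPTION: ext_bbox is provided by ckanext-spatial on this portal.
    """
    if not (min_lon < max_lon and min_lat < max_lat):
        raise ValueError("bounding box corners are in the wrong order")
    params = {
        "q": keyword,  # free-text term, as in the web search box
        "ext_bbox": f"{min_lon},{min_lat},{max_lon},{max_lat}",
        "rows": rows,  # maximum number of results returned
    }
    return SEARCH_URL + "?" + urlencode(params)


# Rough box around Turkey (illustrative values only).
print(keyword_area_url("tsunami", 26, 36, 45, 42))
```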
Searching by a person's name returns the data for which that person is among the main authors or has contributed to some extent.
4. Advanced search
In addition to a simple keyword search, data can be searched by specifying more precisely where a word should be looked for. To use the advanced search, enter a specific prefix before the search term. The prefix tells the search engine which field to look in.
The following prefixes are available:
- title, to search for the word inside the title. Example: title:*etna*
- creator, to search for a name, a surname or an ORCID code among the list of authors. Warning: the surname must be written in all uppercase; the name must be written with the first letter uppercase and the other letters lowercase. Examples: creator:*LOCATI*, creator:*Mario*, creator:*0000-0003-2185-3267*
- identifier, to search for a DOI code. Warning: the DOI code must be written in all lowercase. Example: identifier:*10.13127/asmi*
- issued, to search for data published on a certain date; the date format is YYYY-MM-DD. Examples: issued:2017-06-30 for a precise date; issued:2017-* for anything published in a certain year; issued:[2018-01-01 TO 2018-07-31] for a period of time.
- publisher_name, to search for a publisher. Warning: the search term is case sensitive. Example: publisher_name:Zenodo
- notes, to search inside the data description. Example: notes:centuries
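The case rules above are easy to get wrong when queries are built programmatically. The Python sketch below builds prefixed search terms that follow them; the helper names are hypothetical. Joining terms with AND is an assumption that relies on the Solr query syntax CKAN search normally uses. This guide does not document it.

```python
from datetime import date
from typing import Optional


def creator_term(surname: str = "", name: str = "") -> str:
    """Surnames in all capitals, single-word names with an initial capital."""
    if surname:
        return f"creator:*{surname.strip().upper()}*"
    if name:
        return f"creator:*{name.strip().capitalize()}*"
    raise ValueError("give a surname or a name")


def identifier_term(doi: str) -> str:
    """DOI codes must be searched in lowercase."""
    return f"identifier:*{doi.strip().lower()}*"


def issued_term(start: str, end: Optional[str] = None) -> str:
    """A single YYYY-MM-DD date, or an inclusive [start TO end] range."""
    date.fromisoformat(start)  # raises ValueError if the date is invalid
    if end is None:
        return f"issued:{start}"
    date.fromisoformat(end)
    return f"issued:[{start} TO {end}]"


def issued_year(year: int) -> str:
    """Anything published during the given year."""
    return f"issued:{year:04d}-*"


def combine(*terms: str) -> str:
    """ASSUMPTION: the search engine accepts Solr-style AND between terms."""
    return " AND ".join(terms)


print(combine(creator_term(surname="Locati"), issued_year(2017)))
# creator:*LOCATI* AND issued:2017-*
```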
5. Other methods of consultation
In addition to on-screen consultation, the web page of each dataset provides downloadable files containing its metadata encoded according to the most widely used standards, such as:
- DCAT-AP_IT, https://dati.gov.it/dcat-ap-it-v10-profilo-italiano-dcat-ap-0;
- INSPIRE ISO 19115/19139 as per RNDT recommendations, https://geodati.gov.it/geoportale/manuale-rndt;
- DataCite, https://schema.datacite.org/;
- NASA DIF, https://earthdata.nasa.gov/esdis/eso/standards-and-references/directory-interchange-format-dif-standard;
- Schema.org, https://schema.org/;
- OAI-DC, https://guidelines.openaire.eu/en/latest/literature/use_of_oai_dc.
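As an example of machine-readable use, an OAI-DC file is Dublin Core XML and can be read with Python's standard library. In the sketch below, the helper name and the sample record (title, creators and the 10.0000 DOI) are invented for illustration. Real files downloaded from the portal may contain more elements.

```python
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"


def read_oai_dc(xml_text: str) -> dict:
    """Group the text of every dc:* element by element name."""
    root = ET.fromstring(xml_text)
    record: dict = {}
    for elem in root.iter():
        if elem.tag.startswith(DC_NS):
            name = elem.tag[len(DC_NS):]
            record.setdefault(name, []).append((elem.text or "").strip())
    return record


# Hypothetical record, shaped like an oai_dc:dc document.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example dataset</dc:title>
  <dc:creator>ROSSI, Mario</dc:creator>
  <dc:creator>BIANCHI, Anna</dc:creator>
  <dc:identifier>https://doi.org/10.0000/example</dc:identifier>
</oai_dc:dc>"""

print(read_oai_dc(SAMPLE)["creator"])  # ['ROSSI, Mario', 'BIANCHI, Anna']
```

Values are collected into lists because Dublin Core elements such as dc:creator may repeat.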
The following web services are available for automated access to the Data Registry contents:
- Catalogue Service for the Web (CSW; https://www.ogc.org/standards/cat) of the Open Geospatial Consortium (OGC; https://ogc.org);
- CKAN 2.8 API (Application Programming Interface; https://docs.ckan.org/en/2.8/api/) at https://data.ingv.it/api/3/;
- OAI-PMH, accessible through the services offered by DataCite at https://oai.datacite.org/oai.
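A minimal Python sketch of automated access follows. It assumes the standard CKAN action API layout, with URLs of the form /api/3/action/<name> and JSON replies of the form {"success": ..., "result": ...}, and the standard OAI-PMH request parameters. This guide states neither whether the section 4 field prefixes work in the API's q parameter nor which DataCite OAI-PMH set holds INGV records, so the example does not assume them. The helper names are hypothetical, and the network call is left out so that the helpers can be tried offline.

```python
import json
from urllib.parse import urlencode

CKAN_ACTION = "https://data.ingv.it/api/3/action/"
OAI_ENDPOINT = "https://oai.datacite.org/oai"


def ckan_url(action: str, **params) -> str:
    """URL for a CKAN action such as package_search or package_show."""
    query = urlencode(params)
    return CKAN_ACTION + action + ("?" + query if query else "")


def parse_package_search(body: str) -> tuple[int, list[str]]:
    """Return the total hit count and the titles in one page of results."""
    reply = json.loads(body)
    if not reply.get("success"):
        raise RuntimeError(reply.get("error"))
    result = reply["result"]
    return result["count"], [pkg.get("title", "") for pkg in result["results"]]


def oai_pmh_url(verb: str, **params) -> str:
    """OAI-PMH request URL, e.g. verb=ListRecords with metadataPrefix=oai_dc."""
    return OAI_ENDPOINT + "?" + urlencode({"verb": verb, **params})


# The JSON body would normally come from urllib.request.urlopen(ckan_url(...)).
print(ckan_url("package_search", rows=5))
print(oai_pmh_url("ListRecords", metadataPrefix="oai_dc"))
```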
6. Reporting of errors or incompleteness
Although everyone involved in managing the portal strives to provide content that is as complete, up to date and correct as possible, some inaccuracies may nevertheless occur during data entry. To report errors or incorrect information, please write to ufficiogestionedati@ingv.it.
7. License associated with metadata
All metadata published by INGV on its Open Data Portal are available without restriction under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.