Implementing Data.gov
and Geodata.gov.gr with FOSS4G



Dr. Angelos Tzotsos

OSGeo Charter Member

pycsw Developer


FOSSCOMM 2015


Outline

  • Introduction
  • CKAN
  • pycsw
  • Data.gov
  • Geodata.gov.gr
  • Demo

Introduction

Introduction

  • Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.
  • The goals of the open data movement are similar to those of other "Open" movements such as open source, open hardware, open content, and open access.

Introduction

  • Data.gov is the home of the US government's open data.
  • You can find Federal, state and local data, tools, and resources to conduct research, build apps, design data visualizations, and more.
  • The Data.gov team works at the U.S. General Services Administration, and the whole project is open source.

Introduction

  • Geodata.gov.gr provides open geospatial data and services for Greece, serving as a national open data catalogue, an INSPIRE-conformant Spatial Data Infrastructure, and a powerful foundation for value-added services built on open data.
  • Operating since 2010, geodata.gov.gr was one of the first open data catalogues in the world, contributing to the national and international open government agenda.
  • It is designed, developed, and maintained by IMIS/Athena RC, with the aim to provide a focal point for the aggregation, search, provision and portrayal of open geospatial information.

CKAN

CKAN: An abbreviation for Comprehensive Knowledge Archive Network

An open source web platform for publishing and sharing data, with an impressive deployment history.


pycsw

pycsw is an OGC CSW server implementation written in Python.

pycsw is an Open Source project released under the MIT license.

pycsw

pycsw is certified OGC Compliant, and is an OGC Reference Implementation

This product conforms to the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web], Revision 2.0.2. OGC, OGC®, and CERTIFIED OGC COMPLIANT are trademarks or registered trademarks of the Open Geospatial Consortium, Inc. in the United States and other countries.

pycsw has recently graduated from OSGeo incubation.

OSGeo Project

CKAN

CKAN

  • CKAN is a powerful data management system that makes data accessible by providing tools to streamline publishing, sharing, finding and using open data.
  • CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available.

CKAN Features

  • Publish and find datasets
  • Store and manage data
  • Federated nodes
  • Harvesting
  • Metadata Editing/Management
  • APIs and Extensions

Publish Data

Search and Discovery

Metadata

Visualization


ckanext-spatial

  • A spatial field on the default CKAN dataset schema that uses PostGIS as the backend, supports spatial queries and displays the dataset extent on the frontend
  • Harvesters to import geospatial metadata into CKAN from other sources in ISO 19139 and other formats
  • Commands to support the CSW standard using pycsw
  • Plugins to preview spatial formats such as GeoJSON
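As an illustration of the spatial field and spatial search described above, the sketch below builds a CKAN Action API package_search URL restricted by a bounding box, plus the GeoJSON value a dataset would carry in its "spatial" extra. It assumes the target site (catalog.data.gov is used only as an example) enables the spatial_query plugin of ckanext-spatial, which reads the ext_bbox parameter as minx,miny,maxx,maxy. The code only constructs the request; it does not contact the server.

```python
import json
from urllib.parse import urlencode

# Assumed CKAN base URL; any CKAN 2.x site with ckanext-spatial should behave similarly.
CKAN_URL = "https://catalog.data.gov"


def bbox_to_geojson(minx, miny, maxx, maxy):
    """Return a GeoJSON Polygon for a lon/lat bounding box (closed ring)."""
    ring = [
        [minx, miny],
        [maxx, miny],
        [maxx, maxy],
        [minx, maxy],
        [minx, miny],  # GeoJSON rings must end where they start
    ]
    return {"type": "Polygon", "coordinates": [ring]}


def spatial_search_url(query, minx, miny, maxx, maxy, rows=10):
    """Build a package_search URL filtered by a bounding box via ext_bbox."""
    params = {
        "q": query,
        "ext_bbox": f"{minx},{miny},{maxx},{maxy}",
        "rows": rows,
    }
    return f"{CKAN_URL}/api/3/action/package_search?{urlencode(params)}"


# A dataset's extent is stored as a GeoJSON string in the "spatial" extra
spatial_extra = {
    "key": "spatial",
    "value": json.dumps(bbox_to_geojson(-136.8, 35.3, -101.4, 51.6)),
}
url = spatial_search_url("thermal springs", -136.8, 35.3, -101.4, 51.6)
print(url)
```

The same bounding box is reused in the OWSLib example at the end of this talk, so the CKAN search and the CSW search can be compared side by side.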

Geospatial

pycsw

What is Metadata?

Metadata is often described as “data about data”, or the who, what, where, and when.

In the geospatial world, for each dataset we maintain, we should record information about the data such as:


  • general description
  • location
  • usage restrictions
  • projection
  • technical contact
  • time period
  • date created
  • date modified
  • version

Metadata Standards

  • Dublin Core: established a core/common group of 15 metadata elements
  • FGDC CSDGM: approved by the U.S. Federal Geographic Data Committee originally in 1994 and composed of Sections, Compound Elements, Data Elements
  • ISO 19115: the International Organization for Standardization (ISO) Technical Committee 211 (TC211) created this in 2003; it is composed of more than 400 “Core”, “Mandatory”, and “Optional” elements
  • ISO 19139: The XML implementation schema for ISO 19115 specifying the metadata record format
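To make the Dublin Core model concrete, here is a minimal sketch that builds a csw:Record, the Dublin Core-based record that CSW 2.0.2 services return, using Python's standard ElementTree. All values (identifier, title, abstract, date, extent) are hypothetical, and the extent is written in CRS84 (longitude/latitude) axis order by assumption. An ISO 19139 record carries the same kind of information in a much larger document.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by CSW 2.0.2 Dublin Core records
NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "ows": "http://www.opengis.net/ows",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Return a qualified name in ElementTree's {uri}tag notation."""
    return f"{{{NS[prefix]}}}{tag}"


def make_record(identifier, title, abstract, modified, bbox):
    """Build a minimal csw:Record; bbox is (minx, miny, maxx, maxy) in lon/lat."""
    rec = ET.Element(q("csw", "Record"))
    ET.SubElement(rec, q("dc", "identifier")).text = identifier
    ET.SubElement(rec, q("dc", "title")).text = title
    ET.SubElement(rec, q("dc", "type")).text = "dataset"
    ET.SubElement(rec, q("dct", "abstract")).text = abstract
    ET.SubElement(rec, q("dct", "modified")).text = modified
    # Location: a bounding box in longitude/latitude order (CRS84)
    box = ET.SubElement(rec, q("ows", "BoundingBox"),
                        crs="urn:ogc:def:crs:OGC:1.3:CRS84")
    ET.SubElement(box, q("ows", "LowerCorner")).text = f"{bbox[0]} {bbox[1]}"
    ET.SubElement(box, q("ows", "UpperCorner")).text = f"{bbox[2]} {bbox[3]}"
    return rec


# Hypothetical sample record
record = make_record(
    "urn:example:thermal-springs",
    "Thermal springs of Oregon",
    "Illustrative abstract describing the dataset.",
    "2015-11-07",
    (-124.6, 41.9, -116.5, 46.3),
)
xml = ET.tostring(record, encoding="unicode")
print(xml)
```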

OGC CSW Specification

The Open Geospatial Consortium (OGC) OpenGIS Catalogue Service Implementation Specification, currently at version 2.0.2, is a standard for discovering and retrieving spatial data and metadata.

Catalogue Services for the Web (CSW) is the HTTP protocol binding of the Catalogue Service Implementation Specification that allows for publishing and searching of metadata.

CSW Operations

  • GetCapabilities (mandatory) - allows clients to retrieve information describing the service instance
  • DescribeRecord (mandatory) - allows a client to discover elements of the information model supported by the target catalogue service
  • GetRecords (mandatory) - get metadata records
  • GetRecordById (optional) - get metadata records by ID
  • GetDomain (optional) - obtain runtime information about the range of values of a metadata record element or request parameter
  • Harvest (optional) - references remote data that the catalogue fetches and then inserts or updates as records
  • Transaction (optional) - defines an interface for creating, modifying and deleting catalogue records

Example Requests
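The mandatory operations listed above can be exercised with plain HTTP GET requests using key-value-pair (KVP) encoding. The sketch below only builds the URLs; the endpoint is the data.gov CSW used in the OWSLib demo at the end of this talk, while the CQL constraint and the record identifier are illustrative assumptions. What a given server returns depends on the constraint languages and element sets it supports.

```python
from urllib.parse import urlencode

# Endpoint from the OWSLib demo in this talk; any CSW 2.0.2 server should work.
CSW_ENDPOINT = "http://catalog.data.gov/csw-all"


def csw_url(request, **params):
    """Build a CSW 2.0.2 key-value-pair (KVP) GET request URL."""
    query = {"service": "CSW", "request": request}
    if request != "GetCapabilities":
        # GetCapabilities negotiates versions; other operations state one explicitly
        query["version"] = "2.0.2"
    query.update(params)
    return f"{CSW_ENDPOINT}?{urlencode(query)}"


# GetCapabilities: describe the service instance
caps = csw_url("GetCapabilities")

# DescribeRecord: the information model behind csw:Record
describe = csw_url("DescribeRecord", typeName="csw:Record")

# GetRecords: free-text search expressed as a CQL constraint
records = csw_url(
    "GetRecords",
    typeNames="csw:Record",
    elementSetName="brief",
    resultType="results",
    constraintLanguage="CQL_TEXT",
    constraint_language_version="1.1.0",
    constraint="csw:AnyText like '%springs%'",
)

# GetRecordById: fetch one full record (hypothetical identifier)
by_id = csw_url("GetRecordById", id="example-record-id", elementSetName="full")

for url in (caps, describe, records, by_id):
    print(url)
```

Pasting any of the printed URLs into a browser is a quick way to inspect the raw XML responses before moving to a client library such as OWSLib.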

pycsw

  • pycsw fully implements the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web]
  • pycsw allows for the publishing and discovery of geospatial metadata

Features

  • Harvesting support for WMS, WFS, WCS, WPS, WAF, CSW, SOS
  • Implements ISO Metadata Application Profile 1.0.0
  • Implements FGDC CSDGM Application Profile for CSW 2.0
  • Implements INSPIRE Discovery Services 3.0
  • Supports ISO, Dublin Core, DIF, FGDC and Atom metadata models
  • Standalone or embedded deployment (CGI or WSGI)
  • Transactional capabilities (CSW-T)
  • Flexible repository configuration (SQLite, PostgreSQL, PostGIS, MySQL)
  • Federated catalogue distributed searching
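Harvesting and CSW-T go together in pycsw: a Harvest request asks the catalogue to fetch a remote service's metadata and store it as records, so transactions must be enabled in the server configuration. The sketch below builds the XML body such a request might carry, to be POSTed to the CSW endpoint. It assumes a hypothetical WMS at example.org and assumes http://www.opengis.net/wms is the resource type pycsw expects for WMS; check the pycsw harvesting documentation for the exact resource type URIs.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
ET.register_namespace("csw", CSW_NS)


def harvest_request(source_url, resource_type, resource_format="application/xml"):
    """Build a CSW 2.0.2 Harvest request body as an XML string."""
    root = ET.Element(f"{{{CSW_NS}}}Harvest", service="CSW", version="2.0.2")
    # Source: the remote resource whose metadata should be harvested
    ET.SubElement(root, f"{{{CSW_NS}}}Source").text = source_url
    # ResourceType: tells the catalogue how to interpret the source (here a WMS)
    ET.SubElement(root, f"{{{CSW_NS}}}ResourceType").text = resource_type
    ET.SubElement(root, f"{{{CSW_NS}}}ResourceFormat").text = resource_format
    return ET.tostring(root, encoding="unicode")


# Hypothetical WMS endpoint
body = harvest_request("http://example.org/wms", "http://www.opengis.net/wms")
print(body)
```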

More features...

  • Simple configuration
  • Extensible plugin architecture (profiles, repositories/backends)
  • Seamless integration with Python environments (e.g. GeoNode, Open Data Catalog)
  • Includes commandline utility to administer the metadata repository
  • Implements the Search/Retrieval via URL (SRU) search protocol
  • Implements OpenSearch
  • Realtime XML Schema validation
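"Simple configuration" refers to a single INI-style file (default.cfg in the pycsw 1.x series) that covers the server, the transaction manager, the service metadata and the repository. The sketch below assembles a minimal one with Python's configparser. Section and key names follow the pycsw 1.x sample configuration as we understand it, and every value (paths, URL, database location, title) is a placeholder assumption.

```python
import configparser
import io

# Minimal pycsw 1.x-style configuration; all values are placeholders.
config = configparser.ConfigParser()
config["server"] = {
    "home": "/var/www/pycsw",
    "url": "http://localhost/pycsw/csw.py",
    "mimetype": "application/xml; charset=UTF-8",
    "encoding": "UTF-8",
    "language": "en-US",
    "maxrecords": "10",
}
config["manager"] = {
    "transactions": "false",     # set to true to allow CSW-T and Harvest
    "allowed_ips": "127.0.0.1",  # clients permitted to run transactions
}
config["metadata:main"] = {
    "identification_title": "pycsw Geospatial Catalogue",
}
config["repository"] = {
    # SQLAlchemy-style connection string; PostgreSQL/PostGIS or MySQL also possible
    "database": "sqlite:////var/www/pycsw/records.db",
    "table": "records",
}

buffer = io.StringIO()
config.write(buffer)
print(buffer.getvalue())
```

Switching backends (see "Flexible repository configuration" above) should only require changing the database connection string in the [repository] section.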

Standards Support

  • OGC CSW 2.0.2
  • OGC CSW 3.0.0
  • OGC Filter 1.1.0
  • OGC OWS Common 1.0.0
  • OGC OpenSearch Geo/Time
  • OGC GML 3.1.1
  • OGC SFSQL 1.2.1
  • Dublin Core 1.1
  • SOAP 1.2
  • ISO 19115 2003
  • ISO 19139 2007
  • ISO 19119 2005
  • NASA DIF 9.7
  • FGDC CSDGM 1998
  • SRU 1.1
  • A9 OpenSearch 1.1

Data.gov

History

  • 2009: First version based on Obama's Memorandum on Transparency and Open Government
  • 2011: CKAN begins investigating pycsw as the default CSW component of ckanext-spatial
  • 2012: Development started for Data.gov 2.0 as Open Source
  • 2013: OKFN implemented the first prototype as a CKAN 2.0 extension
  • 2013: GSA takes over the extension development and reaches production state
  • 2013: CKAN drops internal CSW implementation in favour of pycsw within CKAN Spatial extension
  • late 2013: pycsw implements new features (Full Text Search for PostgreSQL, repository filtering, connection pooling)
  • early 2014: pycsw 1.8.0 is released and deployed on data.gov
  • mid 2014: pycsw implements new features (targeted for the 1.10 release)
  • since 2014: pycsw and CKAN are used and maintained in production

Architecture/Components


Data.gov CKAN theme


Spatial Search


Spatial Datasets


Spatial Datasets Preview


ISO 19115 Metadata


CSW Interface


Installation/Configuration

  • Automated process with Ansible
  • Automated packaging to RPMs
  • Database, Front-end and Harvester clusters
  • CentOS 6

Geodata.gov.gr

PublicaMundi

Scalable and Reusable Open Geospatial Data

EU FP7 Project (STREP/ICT)

Goals

Research and develop methodologies, as well as scalable, reusable tools, to facilitate the following for open geospatial data:

  • publication
  • discovery
  • reuse


Free and Open Source Software (FOSS)


  • PublicaMundi development is based exclusively on the OSGeo stack
  • Based on CKAN open data catalogue
  • PublicaMundi spatially extends CKAN using OGC standards
  • Source code and issue tracker hosted on GitHub


OGC standards and INSPIRE

  • Discovery Services
  • View Services
  • Download Services
  • Processing Services

Earth Observation Big Data

  • Integration with rasdaman
  • Integration with ZOO WPS
  • Raster processing services based on GRASS GIS, Orfeo ToolBox, SAGA GIS
  • WCPS and WPS support

System Architecture


Contributions

  • OGC OpenSearch Geo/Time: First implementation of the new OpenSearch specification through pycsw
  • OGC WPS 2.0: First implementation of the new specification through ZOO Project
  • OGC WCST 1.0: New specification driven by the developments of PublicaMundi and rasdaman
  • OGC CSW 3.0.0: First reference implementation of the new specification through pycsw
  • GeoDCAT: PublicaMundi-funded contributions to the new specification

Integration Environment

  • Beta deployment of software to labs.geodata.gov.gr
  • The servers of the project were installed in the data center of the Greek Ministry of Education
  • The integration environment of PublicaMundi is deployed on top of the Synnefo cloud stack, within a number of virtual machines

VM clusters

The software components of PublicaMundi are initially deployed into 8 virtual clusters, with provision for spinning up more virtual machines in each cluster if necessary.

  • Database cluster
  • CKAN cluster
  • GeoServer cluster
  • Rasdaman cluster
  • ZOO cluster
  • Proxy/Analytics cluster
  • Tiles/Caching cluster
  • Storage cluster

GeoServer cluster

Monitoring

Deployment

PublicaMundi uses Ansible playbooks to deploy software to the integration environment, starting from empty Debian 7 virtual machines on which only networking and SSH root access are preconfigured by Synnefo.

Geodata.gov.gr theme

Publishing Workflow

  • Added support for INSPIRE metadata
  • Added support for Geospatial datasets (raster, vector)

Metadata Editor

Administrators Dashboard

Vector support

Raster support

Full CSW support

Mapping API

Data API

MapClient

Consortium


Athena IMIS
Rasdaman
Geolabs
GET

Demo

How to Access Data

QGIS MetaSearch plugin

OWSLib



>>> from owslib.csw import CatalogueServiceWeb
>>> from owslib.fes import PropertyIsLike
>>> from owslib.fes import BBox
>>> csw = CatalogueServiceWeb('http://catalog.data.gov/csw-all')
>>> csw.identification.title
>>> csw.getrecords2()
>>> csw.results
{'matches': 432392, 'nextrecord': 11, 'returned': 10}
>>> q = PropertyIsLike('csw:AnyText', 'oregon thermal springs')
>>> csw.getrecords2(constraints=[q])  # freetext
>>> csw.results
{'matches': 8, 'nextrecord': 0, 'returned': 8}
>>> for key, value in csw.records.items():  # print titles of matched records
...     print(value.title)
... 
>>> bbox=BBox([-136.8, 35.3, -101.4, 51.6])
>>> csw.getrecords2(constraints=[q, bbox])  # freetext OR spatial
>>> csw.results
{'matches': 14, 'nextrecord': 11, 'returned': 10}
                

Thank you

Questions?