Assisting the Semanticization of Data with Vocabularies, Languages, and Tools



Semantic Web researchers at the École des Mines de Saint-Étienne, France, work on making Semantic Web formalisms and technologies more accessible to companies and to the Web of Things. This presentation gives an overview of three of our recent contributions toward this goal.

  1. MINES Saint-Étienne leads a Specialist Task Force (STF) financed by the European Telecommunications Standards Institute (ETSI), with the goal of consolidating the Smart Anything REFerence (SAREF) standard ontology and its community of industrial users, based on the experience of the EUREKA ITEA 12004 SEAS project (3 years, 15 M€, 35 partners). The SEAS ontology is modular and versioned. It is built on top of core reference ontology patterns, which are instantiated to create the SEAS ontology itself and give it a homogeneous and predictable structure for modelling and describing any kind of engineering-related data, information, or systems. Ontology patterns are like design patterns in object-oriented programming: they describe structural, logical, or naming best practices that one can consider when building an ontology.

  2. SPARQL-Generate is an extension of SPARQL 1.1 for querying not only RDF datasets but also documents in arbitrary formats. It offers a simple template-based option to generate RDF graphs from documents, and presents the following advantages: a) anyone familiar with SPARQL can easily learn SPARQL-Generate; b) SPARQL-Generate leverages the expressivity of SPARQL 1.1: Aggregates, Solution Sequences and Modifiers, SPARQL functions and their extension mechanism; c) it integrates seamlessly with existing standards for consuming Semantic Web data, such as SPARQL or Semantic Web programming frameworks. One can use its implementation, released under the Apache 2.0 license, to generate RDF from web documents in XML, JSON, CSV, HTML, CBOR, and plain text with regular expressions.

  3. The Linked Datatypes initiative (LINDT) aims at enabling lightweight descriptions of useful knowledge on the Web of Data, using simple RDF literals empowered by RDF Datatypes. The flagship Datatype is cdt:ucum, which can be used to describe measurements with any unit defined in the Unified Code for Units of Measure (UCUM): a code system intended to include all units of measures being contemporarily used in international science, engineering, and business. Unlike approaches that rely on existing vocabularies for quantities and units of measure (e.g., QUDT, OM, ...), SPARQL queries can leverage the native SPARQL operators (=, <, etc.) to compare UCUM literals, and arithmetic functions (+, -, *, /) to manipulate quantity value literals.
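
The idea behind item 3 can be illustrated with a minimal sketch. It is not the LINDT implementation: it assumes a lexical form made of a number, a space, and a unit code (for example "1.5 km"), and it uses a small hypothetical conversion table limited to a few length units, where a real implementation would cover all of UCUM. The names TO_METRES, parse_length, less_than, and add are invented for this illustration. The point is only that once a literal carries its unit, comparison and arithmetic can be defined on the literal itself.

```python
from fractions import Fraction

# Hypothetical conversion table (factor to metres) for a few length units.
# Assumption: a real implementation would support every UCUM unit.
TO_METRES = {
    "mm": Fraction(1, 1000),
    "cm": Fraction(1, 100),
    "m": Fraction(1),
    "km": Fraction(1000),
}


def parse_length(lexical: str) -> Fraction:
    """Parse an assumed 'value unit' lexical form and return the length in metres."""
    value, unit = lexical.split()
    return Fraction(value) * TO_METRES[unit]


def less_than(a: str, b: str) -> bool:
    """Compare two length literals after normalizing both to metres."""
    return parse_length(a) < parse_length(b)


def add(a: str, b: str) -> Fraction:
    """Add two length literals and return the sum in metres."""
    return parse_length(a) + parse_length(b)


if __name__ == "__main__":
    # Literals with different units compare by value, not by string.
    print(parse_length("1.5 km") == parse_length("1500 m"))
    print(less_than("80 cm", "1 m"))
    print(add("1 km", "500 m"))
```

Exact fractions are used instead of floats so that equality between values written in different units does not depend on rounding.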


Maxime Lefrançois has been Associate Professor in the Connected-Intelligence team at the École des Mines de Saint-Étienne, France, since 2017. He prepared his Ph.D. at INRIA Sophia-Antipolis on knowledge representation for the Meaning-Text linguistic theory. Between 2014 and 2017, he was a post-doctoral researcher at École des Mines de Saint-Étienne, and was involved in several bilateral, national, and European projects, including the ITEA2 SEAS project. In that context he organized a three-day knowledge engineering workshop with 45 participants, which initiated the development of the SEAS ontology: a modular and versioned ontology built on top of the OGC and W3C SOSA/SSN standard, consisting of simple ontology patterns that can be instantiated for different engineering-related verticals. Maxime is a co-editor of the SOSA/SSN standard, and currently works on integrating the SEAS proposals into the ETSI SmartM2M SAREF European standard ontology. He is the initiator and main developer of the SPARQL-Generate language and the Linked Datatypes initiative. He has experience in organizing workshops and tutorials at international events: co-organizer of an ESWC 2018 tutorial "from heterogeneous data to RDF graphs and back", co-chair of an ISWC 2018 workshop on Semantic Sensor Networks, and co-chair of the Workshops and Tutorials at IOT 2018.