Semi-automatic RDFization of heterogeneous object description



A major challenge in cyber-physical environments is the increasing heterogeneity that complicates access to devices. This challenge can be addressed by hingin, a linked-data-compatible, property-graph-based platform promoted by Orange. Besides representing cyber-physical environments at the system level, hingin can provide uniform access to the heterogeneous devices in them. A necessary condition for hingin to meet this goal, however, is to have a description of these devices. The heterogeneity of the devices carries over to their descriptions, which mainly exhibit syntactic heterogeneity, arising from varying data formats, and semantic heterogeneity, arising from varying vocabularies. Consequently, feeding these descriptions into hingin remains a challenge. To tackle it, standards and technologies originally conceived for the Semantic Web can be used. More specifically, RDF may be used to handle syntactic heterogeneity by acting as a lingua franca, as it is independent of data formats.

RDF may also be used to resolve semantic heterogeneity: vocabularies and ontologies fix the interpretation of terms and thereby eliminate their ambiguity. However, transforming existing object descriptions to RDF is itself challenging. Mapping languages can be used to encode the transformation.

However, their use remains complex even with the intervention of human experts, as it involves manually examining the data elements of a device description and searching for ontology terms to which they can be mapped.
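To make the two kinds of heterogeneity and the manual mapping effort concrete, the following minimal Python sketch describes the same device once in JSON and once in XML, with different field names, and converts both to identical triples through a hand-written mapping table. It is only an illustration: the `example.org` IRIs, the field names, and the `MAPPING` table are hypothetical and are not taken from hingin or from any real ontology, and triples are represented as plain tuples rather than with an RDF library or a mapping language.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical vocabulary and device IRIs, used for illustration only.
EX = "http://example.org/vocab#"
DEVICE = "http://example.org/device/lamp1"

# The same device described in two formats (syntactic heterogeneity)
# with two different sets of field names (semantic heterogeneity).
json_description = '{"brightness": "80", "maker": "Acme"}'
xml_description = (
    "<device><luminosity>80</luminosity>"
    "<manufacturer>Acme</manufacturer></device>"
)

# Hand-written mapping: source field name -> shared ontology property.
# Writing this table by hand is the effort the approach aims to reduce.
MAPPING = {
    "brightness": EX + "hasBrightness",
    "luminosity": EX + "hasBrightness",
    "maker": EX + "hasManufacturer",
    "manufacturer": EX + "hasManufacturer",
}


def parse_json(text):
    """Read a flat JSON object into a {field: value} dict of strings."""
    return {key: str(value) for key, value in json.loads(text).items()}


def parse_xml(text):
    """Read the child elements of the XML root into a {field: value} dict."""
    return {child.tag: (child.text or "") for child in ET.fromstring(text)}


def to_triples(fields):
    """Map known fields to (subject, predicate, object) triples."""
    return {
        (DEVICE, MAPPING[field], value)
        for field, value in fields.items()
        if field in MAPPING
    }


triples_from_json = to_triples(parse_json(json_description))
triples_from_xml = to_triples(parse_xml(xml_description))
```

Once both descriptions are expressed as triples over the same properties, the two sets are equal, which is the sense in which RDF acts as a format-independent lingua franca here.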

Thus, in this work, our aim is to provide a semi-automatic and generic approach to facilitate the generation of RDF from heterogeneous device descriptions. We chose a semi-automatic approach to compensate for the lack of semantics in device descriptions and for potential imprecision in the final transformation. Moreover, our approach is generic in that it is independent of hingin and can interoperate with any other platform via its RESTful API.

Our approach takes as input the raw description extracted from sources such as object manuals, keywords that describe the schema elements of that description, and a set of ontologies. It outputs candidate mappings for transforming the object description to RDF. To generate these mapping rules, the approach first identifies ontology entities that can be used to model individual schema elements of the input data by calculating a similarity score between them. If the score exceeds a threshold specified by the human expert, the ontology entity is considered a suitable mapping for the schema element. From these correspondences, the mapping rules are generated. Finally, the human expert chooses, modifies, and refines one of the mappings, which is then used to transform the original data to RDF.
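The matching step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the text does not specify the similarity measure, so the sketch substitutes the string similarity of Python's standard `difflib.SequenceMatcher`, and the schema elements, keywords, ontology IRIs, and labels are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1].

    Assumption: a stand-in for whatever similarity score the approach
    actually computes between schema elements and ontology entities.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_mappings(schema_keywords, ontology_labels, threshold):
    """Return, per schema element, the ontology entities scoring above threshold.

    schema_keywords: {schema element name: [descriptive keywords]}
    ontology_labels: {ontology entity IRI: human-readable label}
    threshold:       value chosen by the human expert
    """
    result = {}
    for element, keywords in schema_keywords.items():
        scored = []
        for iri, label in ontology_labels.items():
            # Use the best score over the element name and its keywords.
            score = max(similarity(term, label) for term in [element, *keywords])
            if score > threshold:
                scored.append((iri, score))
        # Best candidates first, so the expert sees the strongest matches on top.
        result[element] = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return result


# Hypothetical input: abbreviated schema elements with expert-provided keywords.
schema_keywords = {"temp": ["temperature"], "hum": ["humidity"]}
ontology_labels = {
    "http://example.org/onto#Temperature": "temperature",
    "http://example.org/onto#Humidity": "humidity",
    "http://example.org/onto#Pressure": "pressure",
}

candidates = candidate_mappings(schema_keywords, ontology_labels, threshold=0.8)
```

In this sketch the abbreviated element names alone would not pass the threshold; it is the keywords that lift the matching entity above it, which suggests why keywords are part of the input. The expert would then review the retained candidates before any mapping rule is generated from them.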