Bridging to communities: identities and authorities

The text below is an abridged excerpt from our article on semantic interoperability, published in 2017.

This page contains a brief outline of its final contents.

Identities and vocabularies

In semantic annotation practice, it is common to encounter situations in which an abstract observable (such as an individual animal, a plant, or a material object such as a delimited volume of matter) must be identified by a “species”, such as a taxonomic or chemical one. For such situations, k.IM recognizes specific types of universals that we call identities. An identity can be bound to an observable concept so that using that identity type becomes mandatory whenever the observable is further specialized:

namespace biology using physical;

agent Individual
  is physical:SelfAssertedBody
  requires identity Species;

In this case, the set of possible identities may be very large or even infinite. Since it is of course impractical to expect that ontologies can list all possible identities, this presents a problem when reasoning must compare concepts at two separate endpoints, as the identity used at one may not be known at the other. Having users create concepts for identities whenever a new one is needed would break interoperability, and the alternative – adding them to the shared worldview on an as-needed basis – would make the worldview prohibitively difficult to coordinate and maintain.

The k.LAB authorities: endorsing and bridging vocabularies within ontologies

In such situations, we use authorities to link authoritative terminologies and ontologies. In k.LAB, authorities are software components that translate terms provided by authoritative terminologies, maintained by standard-defining organizations such as IUPAC for chemical nomenclature, into logical axioms that can be inserted into the namespaces provided in the worldview to create stable concepts that are available at all points of use. Authorities are identified in k.IM by names bound to a specific identity in a worldview:

namespace biology;

abstract identity Species
  is Taxonomy
  defines authority GBIF.SPECIES;

This statement binds the GBIF.SPECIES authority to the biology:Species identity, requiring that any concrete biology:Individual is identified using it (by the definition of Individual given earlier, each Individual must in turn adopt a biology:Species identity). For example, a spatial coverage (e.g., a raster GIS dataset) describing the counted occurrences of honeybee individuals (Apis mellifera) per square kilometer could be annotated as follows:

model raster("data/bees.tif")
  as count biology:Individual identified as "1341976" by GBIF.SPECIES per km2;

Code 1341976 in the GBIF catalogue is the identifier for the Apis mellifera species, tracking its unchanging taxonomic identity through any changes in nomenclature that may have occurred over time. For increased readability, the model statement above can also be written with a concept declaration that makes the identity explicit for a reader:

agent HoneybeeIndividual
  is biology:Individual identified as "1341976" by GBIF.SPECIES;

model raster("data/bees.tif")
  as count HoneybeeIndividual per km2;

In such situations, the user-defined concept (HoneybeeIndividual) functions as an alias for the GBIF honeybee concept, so that independent uses of the concept will not produce ambiguity, even if different specifications like the above are given and different concept names are used in them. The two specifications above are functionally identical and compile to the same OWL axioms.

Within the GBIF.SPECIES authority, producing logical axioms for the GBIF code 1341976 entails verifying that the code is a valid species identifier. Any other case, such as a non-existent code or a code of a different rank (e.g., a family code), results in a parsing error reported to the user. This mechanism guarantees the ability to reason across namespaces and allows full interoperability of taxonomic names when used at independent and uncoordinated endpoints. Multiple sub-authorities (such as GBIF.FAMILY, GBIF.CLASS, etc.) allow binding different classes of identifiers managed by the same organization.

The GBIF web-accessible catalog service provides codes that identify species and other taxonomic names in a stable and reliable way. It also provides metadata, such as labels, common names and broader terms, that are automatically linked to each concept created, allowing full specification of the identity and automated documentation of the resulting informational artifacts.
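The validation and aliasing behavior described above can be illustrated with a minimal Python sketch. It is not the k.LAB implementation: the `ToyAuthority` class, the `Concept` type and the in-memory catalog are all hypothetical stand-ins for the real GBIF-backed authority, and the family entry is a made-up placeholder rather than a real GBIF code. The sketch only shows the idea that a concept's identity depends on the authority and the code, never on the local name a user gives it, and that a code of the wrong rank is rejected.

```python
from dataclasses import dataclass, field


class AuthorityError(ValueError):
    """Raised when a code cannot be turned into a concept."""


@dataclass(frozen=True)
class Concept:
    """Stable concept: equality depends only on authority and code."""
    authority: str
    code: str
    label: str = field(default="", compare=False)  # metadata only


class ToyAuthority:
    """Hypothetical stand-in for an authority view such as GBIF.SPECIES."""

    def __init__(self, name, rank, catalog):
        self.name = name        # view name, e.g. "GBIF.SPECIES"
        self.rank = rank        # the only rank this view accepts
        self.catalog = catalog  # code -> {"rank": ..., "label": ...}

    def resolve(self, code):
        """Validate a code and return the concept it identifies."""
        entry = self.catalog.get(code)
        if entry is None:
            raise AuthorityError(f"{self.name}: unknown code {code!r}")
        if entry["rank"] != self.rank:
            raise AuthorityError(
                f"{self.name}: code {code!r} has rank {entry['rank']!r}, "
                f"expected {self.rank!r}"
            )
        return Concept(self.name, code, entry["label"])


# Assumption: a tiny offline catalog replaces the GBIF web service.
# Only the honeybee code comes from the text; the family entry is invented.
CATALOG = {
    "1341976": {"rank": "species", "label": "Apis mellifera"},
    "FAMILY-PLACEHOLDER": {"rank": "family", "label": "made-up family entry"},
}

GBIF_SPECIES = ToyAuthority("GBIF.SPECIES", "species", CATALOG)

# Two endpoints choose different local names for the same identity.
aliases = {
    "HoneybeeIndividual": GBIF_SPECIES.resolve("1341976"),
    "Bee": GBIF_SPECIES.resolve("1341976"),
}
```

In this sketch `aliases["HoneybeeIndividual"] == aliases["Bee"]` holds because both names resolve to the same authority and code, while `GBIF_SPECIES.resolve("FAMILY-PLACEHOLDER")` raises `AuthorityError`.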

Authorities built into k.LAB and endorsed in the im worldview at the time of this writing. Each authority uses an external service or vocabulary and can provide one or more views, each of which bridges to a specific type of identity. The concepts produced by authorities carry the URIs of the original concepts as metadata, when those are provided by the corresponding authority.

Authority | Views | Description
GBIF | GBIF.SPECIES, GBIF.CLASS, GBIF.PHYLUM, GBIF.GENUS, GBIF.ORDER, GBIF.FAMILY, GBIF.KINGDOM | Enables direct use in k.IM of the GBIF codes for the specific classes of identities handled by each view. These codes track taxonomic identities of different rank throughout any changes in terminology and nomenclature that may have occurred through time. Each code is validated using GBIF’s web services and metadata are added to resulting concepts, including relationships to parent classifications. GBIF search services are used to provide search facilities built into the k.LAB software, to ease locating and using the GBIF identifiers in k.IM.
IUPAC | IUPAC | Enables direct use of InChI strings in k.IM, fully specifying molecular composition and structure for any chemical compound. The authority incorporates the excellent software support provided by IUPAC and related academic projects, so that the InChI strings can be located from within k.LAB, and validated. The resulting concepts’ metadata may include other information and produce molecular drawings for the k.IM user to check.
SOIL | SOIL.WRB | Enables direct use in k.IM of the World Reference Base classification of soils, including bridging to the online vocabulary hosted by FAO, parsing and validation of complex soil taxonomies expressed as WRB classifiers.
AGROVOC | AGROVOC.CROP, AGROVOC.PROCESS, AGROVOC.SPECIES | Enables the direct use in k.IM of URIs or URI fragments from the AGROVOC vocabulary maintained by FAO. At the time of this writing, the three views listed enable access to terms related to crop types, agricultural processes and “commonsense” species identifiers used in agriculture, less specific and not interoperable with the precise taxonomies used in GBIF.

In addition to the identities managed by GBIF, representing the full taxonomic hierarchy from kingdom to variety, k.LAB provides authorities that recognize and interpret: (i) chemical identities (using the InChI naming conventions); (ii) soil taxa according to the World Reference Base nomenclature; and (iii) several classes of agricultural terms provided in AGROVOC (see table). In most cases, authorities provide both validation of identifiers and search facilities, building on services provided by the managing institutions. For example, if a user refers to a chemical compound using a wrongly formatted InChI string, an informative error is reported. In contrast, a correct string can be translated by the IUPAC authority into a molecular diagram for the user to check.

Availability of a specific authority within a worldview is equivalent to an endorsement of that authority in it. Authorities, complemented with search tools and validation such as those provided in k.LAB, bring consistency and a sound annotation discipline to a usage landscape characterized by widespread redundancy and inconsistency.

“Bridging” authorities, while not yet attempted, might also be designed to accept terms from one authority and translate them into the axioms of another authority covering the same domain. For example, SOIL.USDA may in the future complement the existing SOIL.WRB authority as an alternative source of soil taxonomy identifiers, producing axioms compatible with the latter. This would enable transparent mediation of competing vocabularies and further expand opportunities for interoperability and reuse of existing annotated data.
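To make the idea of rejecting a wrongly formatted InChI string with an informative error more concrete, here is a minimal Python sketch. It is only a shallow, assumed syntactic check (the `InChI=1/` or `InChI=1S/` prefix followed by non-empty, slash-separated layers) and does not reproduce the chemical validation that the IUPAC software performs; the function name `check_inchi` and the error wording are hypothetical.

```python
import re

# Prefix "InChI=1/" (non-standard) or "InChI=1S/" (standard), followed by
# one or more non-empty layers separated by "/". This is a shape check only.
_INCHI_SHAPE = re.compile(r"InChI=1S?/[^\s/]+(/[^\s/]+)*")


def check_inchi(text):
    """Return the string unchanged if it looks like an InChI, else raise."""
    if not text.startswith("InChI="):
        raise ValueError(
            f"not an InChI string (missing 'InChI=' prefix): {text!r}"
        )
    if _INCHI_SHAPE.fullmatch(text) is None:
        raise ValueError(
            f"malformed InChI string (expected 'InChI=1S/<layer>/...'): {text!r}"
        )
    return text


# Water, written as a standard InChI string.
WATER = check_inchi("InChI=1S/H2O/h1H2")
```

A call such as `check_inchi("1S/H2O/h1H2")` raises a `ValueError` that names the missing prefix, which is the kind of feedback the text describes. A real authority would go further and confirm that the layers describe a valid structure.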