Data and models: a semantic view
This section will contain the official, up-to-date documentation for the part of the k.IM language used to write data and models, including examples and “cookbook”-style recipes. Like its twin section dedicated to specifying knowledge in k.IM, it will be structured into several sub-pages.
This page contains a brief outline of its final contents, which will be completed in 2018.
- Instead of merely “writing model code,” a semantics-first workflow defines strategies to observe individual concepts, and these strategies may contain code to compute the observations. Seen this way, data and models are two versions of the same thing: data represent “evidence” (still subject to choices of scale and to assumptions), while models represent hypotheses. Both can be applied to a particular context to build observations of the observable within it, so both data and models can be called definitions of observations.
- Contextualization occurs when a definition of concept X (e.g., elevation) is applied to a context (e.g., a particular region of Earth at a specific time and resolution) to produce an observation (a dataset) of that concept. This applies to qualities (such as elevation) as well as to all other types of observables. For example, observing a process produces qualities that may change over time, if the process affects them. Over that time, observing a process may also create new subjects, events, and relationships, each of which is contextualized (see below) after creation using models from the semantic web.
- From a philosophical perspective, data and models can be seen as extensional and intensional definitions of observations, respectively: data enumerate the observed values, while models state the rule that produces them.
- Observables of different types allow different kinds of models. Most broadly, countable observables (subjects, events, and relationships) admit instantiators, which create the objects, and contextualizers, which apply to each individual object and explain it by modeling its qualities and behaviors. This clear distinction is one of the most powerful aspects of semantic modeling: an instantiator can be chosen to create, say, all the households in a village, and each household can still be modeled independently, using different models chosen according to its location, income, or other observations the instantiator created for it. Applied across an entire problem area, this paradigm lets modelers write simple, short models independently and combine them into very sophisticated applications.
- While semantic specifications in k.IM are declarative and made within the shared worldview, procedural code can be run as actions linked to specific phases of the lifecycle of each observation (e.g., initialization, time transitions). These actions can be embedded in k.IM using a powerful derivative of the Groovy programming language, pre-processed so that semantic concepts and their artifacts can be referenced naturally. The integration with Groovy allows modelers to specify anything from simple data transformations to very complex behaviors and interactions in a few lines of code.
- Examples of writing code in k.IM
- Code from external programs can be reused within components: archive files that contain binary code and knowledge and expose potentially complex external models (such as climate or hydrological models) to the k.IM language through simple function calls. Components are written using the k.LAB API, which lets developers write models without having to handle input, output, visualization, etc., streamlining what are usually the most time-consuming tasks. Components can be hosted on k.LAB servers and are synchronized transparently when needed, with no user intervention. Components developed and maintained by the partnership provide integration with complex models such as hydrological models, a revision of LPJ-GUESS for vegetation and agricultural dynamics, the WEKA machine learning library, and many others.
- Examples of extensions in components and APIs
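The idea that data and models are both “definitions of observations” that become observations only when applied to a context can be sketched in a few lines. The following is a minimal Python illustration, not k.IM syntax and not the k.LAB API: the classes `Context`, `Observation`, `DataDefinition`, and `ModelDefinition` and their `contextualize` method are hypothetical names chosen for this sketch, and a context is simplified to a list of cell identifiers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Context:
    """Hypothetical context: a region reduced to a list of cell identifiers."""
    cells: List[str]


@dataclass
class Observation:
    """The result of contextualizing a definition: one value per cell."""
    observable: str
    values: Dict[str, float]


class DataDefinition:
    """Evidence: stored values for the observable, looked up for each cell."""

    def __init__(self, observable: str, records: Dict[str, float]) -> None:
        self.observable = observable
        self.records = records

    def contextualize(self, context: Context) -> Observation:
        # Only the cells that belong to the context are kept.
        return Observation(self.observable, {c: self.records[c] for c in context.cells})


class ModelDefinition:
    """Hypothesis: a rule that computes the observable for each cell."""

    def __init__(self, observable: str, rule: Callable[[str], float]) -> None:
        self.observable = observable
        self.rule = rule

    def contextualize(self, context: Context) -> Observation:
        # The rule is evaluated for every cell of the context.
        return Observation(self.observable, {c: self.rule(c) for c in context.cells})


# Both definitions of "elevation" are used the same way on the same context.
region = Context(cells=["a", "b"])
measured = DataDefinition("elevation", {"a": 120.0, "b": 340.0, "c": 15.0})
modelled = ModelDefinition("elevation", lambda cell: 100.0 if cell == "a" else 300.0)
observations = [d.contextualize(region) for d in (measured, modelled)]
```

The point of the design is that the caller does not need to know which kind of definition it holds: both expose the same operation, and the context decides which part of the evidence (or which evaluations of the rule) end up in the observation.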
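The split between instantiators and contextualizers in the household example above can be illustrated with a small Python sketch. Everything here is hypothetical and for illustration only: `Household`, `instantiate_households`, the two water-use contextualizers, and the income threshold are invented names and numbers, not k.IM constructs or real model parameters. The instantiator only creates the objects; a separate step then picks a model for each object based on that object's own attributes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Household:
    name: str
    income: float
    location: str
    qualities: Dict[str, float] = field(default_factory=dict)


def instantiate_households(census: List[Tuple[str, float, str]]) -> List[Household]:
    """Instantiator: creates one Household object per census record."""
    return [Household(name, income, location) for name, income, location in census]


def water_use_basic(h: Household) -> None:
    """Hypothetical contextualizer for low-income households: a flat value."""
    h.qualities["water_use"] = 50.0


def water_use_by_income(h: Household) -> None:
    """Hypothetical contextualizer for higher-income households: scales with income."""
    h.qualities["water_use"] = 50.0 + h.income / 500


def choose_contextualizer(h: Household) -> Callable[[Household], None]:
    """Select a model for each object independently, using its own attributes."""
    return water_use_by_income if h.income > 20_000 else water_use_basic


census = [("h1", 12_000.0, "river"), ("h2", 45_000.0, "hill"), ("h3", 8_000.0, "hill")]
households = instantiate_households(census)
for household in households:
    choose_contextualizer(household)(household)
```

The selection rule here uses only income; location or any other quality set by the instantiator could be used in exactly the same way, and new contextualizers can be added without touching the instantiator.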
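Actions linked to lifecycle phases, as described above for k.IM's Groovy-based code, follow a familiar hook pattern: code is registered for a phase and the runtime calls it at the right moment. The sketch below reproduces only that structure in Python; it is not k.IM or Groovy syntax, and the class `ObservationLifecycle`, the phase names `"init"` and `"transition"`, and the storage example are hypothetical.

```python
from typing import Callable, Dict, List

# An action receives the observation's state and the current time step.
Action = Callable[[dict, int], None]


class ObservationLifecycle:
    """Hypothetical runner that calls registered actions at each lifecycle phase."""

    def __init__(self) -> None:
        self.actions: Dict[str, List[Action]] = {"init": [], "transition": []}

    def on(self, phase: str) -> Callable[[Action], Action]:
        """Decorator that registers an action for the given phase."""
        def register(action: Action) -> Action:
            self.actions[phase].append(action)
            return action
        return register

    def run(self, state: dict, steps: int) -> dict:
        # Initialization actions run once, at time step 0.
        for action in self.actions["init"]:
            action(state, 0)
        # Transition actions run at every subsequent time step.
        for t in range(1, steps + 1):
            for action in self.actions["transition"]:
                action(state, t)
        return state


lifecycle = ObservationLifecycle()


@lifecycle.on("init")
def set_initial_storage(state: dict, t: int) -> None:
    state["storage"] = 100.0
    state["history"] = []


@lifecycle.on("transition")
def withdraw(state: dict, t: int) -> None:
    # Withdraw a fixed 10 units per time step and record the new value.
    state["storage"] -= 10.0
    state["history"].append((t, state["storage"]))


final = lifecycle.run({}, steps=2)
```

Keeping the procedural code in small, phase-specific actions leaves the declarative part of the specification untouched, which is the separation the k.IM design aims for.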