using snakelog #1

Open · wants to merge 1 commit into base: main
2 changes: 1 addition & 1 deletion Makefile
@@ -4,7 +4,7 @@ RUN = poetry run
# Tests
# ----------------------------------------
test:
-	$(RUN) python -m unittest discover -p 'test_*.py'
+	$(RUN) python -m unittest tests/test_*.py

tests/models/%.py: tests/inputs/%.yaml
$(RUN) gen-python $< > $@.tmp && mv $@.tmp $@
51 changes: 36 additions & 15 deletions docs/basics.md
@@ -1,30 +1,40 @@
- # Basics
+ # How it works

## Overview

The linkml-dl wrapper works by executing the following steps:

- 1. The schema is compiled to Souffle DL problem (see generated schema.dl file)
- 2. Any embedded logic program in the schema is also added
- 3. Data is converted to generic triple-like tuples (see `*.facts`)
- 4. Souffle is executed
- 5. Inferred validation results turned into objects
- 6. TODO: other inferred facts are incorporated back into objects
+ - The schema is compiled to a Soufflé Datalog program (see the generated `schema.dl` file)
+ - Any embedded logic program in the schema is also added
+ - Data is converted to generic triple-like tuples (see `*.facts`)
+ - Soufflé is executed
+ - Inferred facts are collected:
+     - validation results are collected into a results object
+     - inferred facts are incorporated into a new copy of the input object

- ## Compilation
+ ## Compilation of schemas to Datalog

Assuming input like this:

```yaml
classes:
  Person:
    class_uri: schema:Person
    attributes:
-     age:
+     age_in_years:
        range: integer
        maximum_value: 999
```

The generated souffle program will look like this:

```prolog
.decl Person(i: symbol)
.decl Person_asserted(i: identifier)
.output Person
Person_asserted(i) :- triple(i, RDF_TYPE, "http://schema.org/Person").
Person(i) :- Person_asserted(i).

.decl Person_age_in_years_asserted(i: identifier, v: value)
.decl Person_age_in_years(i: identifier, v: value)
.output Person_age_in_years
```

@@ -50,26 +60,37 @@ validation_result(

(Note that most users never need to see these programs, but if you want to write advanced rules it is useful to understand their structure.)

- ## Facts
+ ## Conversion of data to Facts

- The linkml data file (which can be JSON, YAML, RDF, or TSV) is converted to a triple-like model following the souffle spec:
+ The LinkML data file is converted to a triple-like model following the Soufflé spec:

```prolog
.decl triple(s:symbol, p:symbol, o:symbol)
.decl literal_number(s:symbol, o:number)
.decl literal_symbol(s:symbol, o:symbol)
```

Under the hood, this is a two-step process:

1. convert the data to RDF using the [standard rdflib dumper](https://linkml.io/linkml/data/rdf.html)
2. convert each triple to the tuples above
    - each triple is mapped to a `triple/3` fact
    - if the object is a literal:
        - it is serialized as a JSON string
        - an additional fact is added mapping this to a Soufflé number or symbol

Every slot-value assignment is turned into a triple. If the value is a literal/atom then an additional fact is added mapping the node to the number or symbol value.
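The literal-handling convention above can be sketched in a few lines of Python (an illustrative sketch only; `triples_to_facts` is a hypothetical helper, not part of linkml-datalog):

```python
import json

# Hypothetical sketch of the triple-to-fact mapping described above.
# Each input triple is (subject, predicate, object, is_literal).
def triples_to_facts(triples):
    triple_facts = []   # rows for the triple/3 relation
    number_facts = []   # rows for literal_number/2
    symbol_facts = []   # rows for literal_symbol/2
    for s, p, o, is_literal in triples:
        if is_literal:
            # literals are serialized as JSON strings so they stay valid symbols
            key = json.dumps(o)
            triple_facts.append((s, p, key))
            if isinstance(o, (int, float)):
                number_facts.append((key, o))
            else:
                symbol_facts.append((key, o))
        else:
            triple_facts.append((s, p, o))
    return triple_facts, number_facts, symbol_facts

triples, numbers, symbols = triples_to_facts([
    ("p:akira", "rdf:type", "http://schema.org/Person", False),
    ("p:akira", "ex:age_in_years", 33, True),
])
```

Here the integer literal `33` becomes the symbol `"33"` in the `triple/3` row, with a companion `literal_number` fact carrying the numeric value.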

## Execution

`linkml_datalog.engines.datalog_engine` will do this compilation, translate your data to relational facts, and then wrap calls to Soufflé.

- ## Parsing
+ Note that Soufflé needs to be installed and available on the command line for this to work.

Generated programs and facts will be placed in a temporary working directory, unless `-d` is passed.

- The engine will then read back all `validation_result` facts and translate these to the LinkML validation data model (influenced by SHACL)
+ ## Parsing

- ## Inferences
+ The engine will then read back all `validation_result` facts and translate these to the LinkML validation data model,
+ and will walk the input object, reading off any new inferences.

- Currently other inferred facts are not read back in, but in future a new data object will be created.
62 changes: 60 additions & 2 deletions docs/index.md
@@ -1,11 +1,69 @@
# linkml-datalog

- Validation and inference over LinkML instance data using souffle
+ Validation and inference over LinkML instance data using Soufflé


![souffle logo](https://souffle-lang.github.io/img/logo-2x.png)
![linkml logo](https://avatars.github.com/u/79337873?s=200&v=4)

- ## Caveats

=== "schema"

    ```yaml
    Person:
      attributes:
        name:
          identifier: true
        friend_of:
          multivalued: true
          range: Person
          symmetric: true
        in_network_of:
          range: Person
          multivalued: true
          annotations:
            transitive_closure_of: friend_of
    ```

=== "data"

    ```yaml
    persons:
      - name: p:akira
        friend_of: [p:bill]
      - name: p:bill
        friend_of: [p:carrie]
      - name: p:carrie
        friend_of:
    ```

=== "output"

    ```yaml
    persons:
      - name: p:akira
        friend_of:
          - p:bill
        in_network_of:
          - p:akira
          - p:bill
          - p:carrie
      - name: p:bill
        friend_of:
          - p:carrie
          - p:akira
        in_network_of:
          - p:akira
          - p:bill
          - p:carrie
      - name: p:carrie
        friend_of:
          - p:bill
        in_network_of:
          - p:akira
          - p:bill
          - p:carrie
    ```
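The inferred `in_network_of` values in the output above are just the transitive closure of the symmetric `friend_of` relation; a few lines of Python (an illustrative sketch, independent of the actual engine) reproduce them:

```python
def transitive_closure(pairs):
    """Iterate to a fixed point over a set of (x, y) edges."""
    closure = set(pairs)
    while True:
        new = {(x, w) for x, y in closure for z, w in closure if y == z}
        if new <= closure:
            return closure
        closure |= new

# friend_of edges from the data, plus their flips (symmetric: true)
friend_of = {("p:akira", "p:bill"), ("p:bill", "p:carrie")}
friend_of |= {(y, x) for x, y in friend_of}

in_network_of = transitive_closure(friend_of)
akira_network = sorted(y for x, y in in_network_of if x == "p:akira")
# akira_network == ["p:akira", "p:bill", "p:carrie"]
```

Note that akira ends up in her own network: symmetry plus transitivity yields a cycle back to the starting node, matching the output shown.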

__Caveats__

This is currently experimental/alpha software!
21 changes: 19 additions & 2 deletions docs/install.md
@@ -1,11 +1,28 @@
# Installation

- This project requires [souffle](https://souffle-lang.github.io/)
+ ## Install Soufflé

- After installing souffle:
+ This project requires [Soufflé](https://souffle-lang.github.io/), a fast engine
+ for executing [datalog](https://en.wikipedia.org/wiki/Datalog) programs.

* [Install Souffle](https://souffle-lang.github.io/install)

Make sure the `souffle` executable is available on your command line:

```bash
souffle --help
```

## Install LinkML-Datalog

Python 3.9 is required.

Install in the standard way from PyPI:

```bash
pip install linkml-datalog
```

## Docker

Docker containers will be provided in the future.
22 changes: 21 additions & 1 deletion docs/translations.md
@@ -1,6 +1,6 @@
# Translations of LinkML to Datalog

- LinkML is primarily a data modeling language in the mold of
+ LinkML is primarily a data modeling language in the vein of
JSON-Schema, UML, or a shape language like SHACL. The core is
deliberately simple and does not have complex semantics.

@@ -68,6 +68,26 @@ This will generate

```prolog
sibling_of(i,j) :- sibling_of(j,i).
```
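The intended effect of this symmetry rule can be checked with a tiny Python sketch (illustrative only, not the engine's implementation):

```python
def symmetric_closure(pairs):
    # sibling_of(i,j) :- sibling_of(j,i).  -- add the flipped edge for every edge
    return set(pairs) | {(j, i) for i, j in pairs}

sibling_of = symmetric_closure({("p:akira", "p:bill")})
# sibling_of now also contains ("p:bill", "p:akira")
```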

## Transitive closures

LinkML 1.2 will introduce [transitive_form_of](https://w3id.org/linkml/transitive_form_of),
to declare that one slot (e.g. `ancestor_of`) is the transitive form of another slot (e.g. `parent_of`).

For now, you can get the same semantics from an annotation:

```yaml
ancestor_of:
annotations:
transitive_closure_of: parent_of
```

This will generate

```prolog
ancestor_of(i,j) :- parent_of(i,j).
ancestor_of(i,j) :- parent_of(i,z), ancestor_of(z,j).
```
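The fixed-point semantics of a transitive closure like this can be checked with a small Python sketch (illustrative only; the `parent_of` facts here are made up):

```python
def ancestors(parent_of):
    """Compute ancestor_of as the transitive closure of parent_of."""
    ancestor_of = set(parent_of)      # base case: every parent is an ancestor
    changed = True
    while changed:                    # iterate until no new facts are derived
        changed = False
        for i, z in parent_of:
            for z2, j in list(ancestor_of):
                # recursive case: i is a parent of z, and z is an ancestor of j
                if z == z2 and (i, j) not in ancestor_of:
                    ancestor_of.add((i, j))
                    changed = True
    return ancestor_of

# hypothetical facts: alice is bob's parent, bob is carol's parent
ancestor_of = ancestors({("alice", "bob"), ("bob", "carol")})
# ancestor_of includes ("alice", "carol")
```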

## Association classes

Compilation to datalog will also handle associative classes (e.g. reified statements). This is very useful when we want to be able to model