Conjure

A declarative approach to conjuring RDF datasets from RDF datasets using SPARQL, with caching of repeated operations. Conjure enables blazing-fast data profiling and analysis on the basis of RDF dataset catalogs.

Concept

  • First, one provides a specification of an abstract collection of input records (RDF resources) that carry references to (RDF) datasets. Using DCAT dataset descriptions is the recommended and natively supported approach.
  • Then, one provides a job specification whose execution maps each input record to an output dataset.
  • The output is a data catalog with provenance information referring to the result dataset files.
```
dataSources = dataSourceAssembler.assemble()
dataSources.map(processor)
```

The processor is a plain Spring bean. Thus, the exact same Conjure configuration can be used in arbitrary environments, whether single-threaded, multi-threaded, or cluster setups such as native Java or Apache Spark.
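The assemble-then-map shape above can be illustrated with plain Java. Note that all names below (`ConjureSketch`, `process`, the string "data sources") are illustrative stand-ins, not Conjure's actual API; real processors operate on RDF datasets, but strings suffice to show why a processor that is a pure function runs unchanged in any execution environment:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative sketch only: these names stand in for Conjure's actual API.
public class ConjureSketch {

    // Stand-in for the processor Spring bean: a pure function, which is why
    // the same definition can run single-threaded, multi-threaded, or on Spark.
    static final Function<String, String> processor = ds -> ds + " -> profiled";

    // Stand-in for dataSources.map(processor)
    static List<String> process(List<String> dataSources) {
        return dataSources.stream().map(processor).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for dataSourceAssembler.assemble()
        List<String> dataSources = List.of("dcat:dataset1", "dcat:dataset2");
        process(dataSources).forEach(System.out::println);
    }
}
```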

Building dist packages

The Spark bundle suitable for spark-submit, together with its wrapping as a Debian package, can be built with

```
mvn -P dist package
```

The Spark Debian package places a runnable jar-with-dependencies file under /usr/share/lib/conjure-spark-cli.
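After installing the Debian package, the jar can be submitted to Spark in the usual way. This is a hypothetical invocation: the exact jar file name under /usr/share/lib/conjure-spark-cli and any job arguments may differ per version, so check the installed directory first.

```shell
# Placeholder jar name: list /usr/share/lib/conjure-spark-cli to find the actual file.
spark-submit \
  --master 'local[*]' \
  /usr/share/lib/conjure-spark-cli/conjure-spark-cli-jar-with-dependencies.jar
```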

Spring Boot

Custom contexts can be specified via the JVM option -Dspring.main.sources, e.g. -Dspring.main.sources=ctx1.groovy,ctx2.groovy
