Declarative Logic Programming. Michael Kifer
Читать онлайн книгу.use a similar syntax, where a list of variables appears after the keyword
:find
and the body is given after the :where
keyword. Therefore, Query 1 can be defined as follows (the three dots after the arguments to :find
will return a collection):
In this fashion, Queries 2–4 can be written in Datomic as follows:
Aggregates are handled in Datomic by applying aggregate functions to variables. For instance, Query 5 is written as follows:
1.8.5 Flora-2 and Ergo
Flora-2 [Kifer 2015, Yang et al. 2003] is a knowledge-representation and reasoning system that supports Datalog as a subset of its various reasoning paradigms, such as the ones described in Section 1.4. The bouquet of these extensions is unique to Flora-2 and many were pioneered by that system. The list of these advanced features includes well-founded negation (Section 1.4.1), aggregation (Section 1.4.3), object-oriented extensions based on F-logic and defeasible reasoning (Section 1.4.6), HiLog (Section 1.4.7), logical updates (Section 1.4.8), certain styles of functional programming, explicit logical quantifiers, and more. Flora-2 uses XSB as an underlying inference engine. In particular, it relies on XSB’s top-down evaluation strategy, but also brings in new elements such as dynamic reordering of subgoals. Ergo is a commercial variant of Flora-2 offered by Coherent Knowledge, LLC. It includes additional language extensions and optimizations as well as development tools and connectors to various external data sources (for example, SPARQL endpoints, tabular data and JSON). Our running example of the fragment of the Genealogy Database, expressed in Flora-2, is shown below. To illustrate the object-oriented features of this system, we modify the database schema to use frame-based rather than predicate-based representation. (If we used predicate representation then Flora-2 rules would be similar to XSB’s with minor syntactic differences.)
These declarations specify the types of the attributes for classes Dissertation
and Person
(=>
means “has type”). In this application, these declarations are not required, but they make the schema of the database explicit and easier to grasp.
Since the object-oriented schema above is quite different from the original relational schema in Section 1.8.1, the next pair of rules specifies the “bridge” between the two schemes:
We recall that the symbol ->
means “has value” and thus the first rule above says that, given a tuple 〈d, c, t, g, u, y, ar〉 in table dissertation
and a tuple 〈adv, d〉 in advised
, one can derive that d is an object in class Dissertation
such that: its attribute cid
has value c, the attribute advisor
has a value adv, title
has value t, and so on. Note that an attribute can have multiple values and so, if dissertation d was advised by several advisors, the advisor
attribute for d will be multi-valued.
The next batch of statements contains the actual queries for our running example. It starts with three auxiliary rules that define additional view-methods for classes Dissertation
and Person
. The first rule, for example, says that if there is a Dissertation
-object (denoted by a name-less variable ?
) that has a candidate with Id ?P
(i.e., the attribute cid
has value ?P
) and an advisor ?A
then that dissertation’s advisor is also an advisor for ?P
. The next two rules recursively define the method ancAdvisor
in class Person
.
The last rule is interesting because it appears in the form of a fact, but is really a shorthand for a rule like
but count{…}
is an evaluable aggregate function that can be placed directly as an argument to q5
. This last query provides a glimpse of how logical and functional styles can be mixed in Flora-2.
1.8.6 Experiments
In this section, we report experimental results on two of the four modern systems we have discussed: XSB and LogicBlox. The systems that we have chosen for experiments are representative of two schools of evaluation, top-down and bottom-up. Each school has its own challenges, and we will illustrate those challenges via experiments, and show techniques to tackle them.
An issue pertinent to both schools of evaluation is indexing. During an evaluation, a system needs to retrieve facts for a predicate given some bound arguments. XSB, by default, generates an index on the first argument of each predicate. However, as we will illustrate, that index may not be enough. In such cases, XSB allows the user to specify additional indexing directives. LogicBlox automatically generates relevant indexes, and the user cannot affect the system’s choices. We will illustrate the importance of indexing in XSB via experiments.
An issue pertinent only to top-down evaluation is repeated calls to subgoals, as described in Section 1.5.2.1. We will show how the user specifies tabling directives, and the difference in query evaluation time with and without tabling in XSB.
An issue pertinent only to bottom-up evaluation is that, in its basic form, it does not take the particular query into account. Many methods exist to transform the rules so that the query is taken into account, most notably the Magic Sets transformation [Bancilhon et al. 1986]. We will use demand transformation instead [Tekle and Liu 2010], since it is a simpler method and its space complexity can be exponentially less (in the number of arguments of predicates) than Magic Sets. We will illustrate the effect of demand transformation via experiments in the LogicBlox system.
Another impact on performance for a system is the inclusion of database features such as atomicity and durability. These require various mechanisms, including locks and writes to disk, which impact performance negatively. LogicBlox has various such features, but XSB does not, and the experiment results should be evaluated in this light as well.
Optimization of Datalog queries by transforming rules and queries has been widely studied. These include specialization and recursion conversion [Tekle et al. 2008]; the latter is illustrated for relevant queries for both systems.
We now show our experiments for each query. We performed the experiments on a 2.8 GHz Intel Core i5 with 8 GB RAM, using XSB 3.7 and LogicBlox 4.3.10. We performed each experiment 10 times and took the average times of the runs.
Loading Input Data. Input data load times are important to the viability of a system, but it is a task that need be performed only once, and does not change with respect to a given query. For this reason, we separate loading input data from query answering in our experiments. In order to load the input data into XSB, we run the following query:
and