User's guide to CHR Grammars (CHRGs)

Version 1.2, 2016

Henning Christiansen
Research group PLIS: Programming, Logic and Intelligent Systems
Department of People and Technology, Roskilde University, Denmark

© Henning Christiansen 2002, 2010, 2011, 2016

Version 1.2 adapted to changes in SWI Prolog 7; expected to run with earlier versions of SWI. It has not been tested with SICStus revently.
Version 1.1, small updates July 19, 2011.
Version 1.0, released December 3, 2010.

This document explains how to use CHRG which is a grammar notation implemented on top of the language of Constraint Handling Rules, CHR. You need to have a recent version of SICSTUS Prolog 4 or SWI Prolog to run the CHRG system.

To read this users' guide, you need to be familiar with Prolog and some knowledge of CHR may also be useful.

Please report any bugs, unclear points, etc. to the author.

Contents

1 Introduction
2 Getting started: An introductory example
3 Syntax and semantics of the CHRG notation
   3.1 Syntax of CHRG source files
   3.2 Syntax of CHRG rules
   3.3 Semantics of CHRG rules
4 Options
5 Predicates
6 Abduction in CHRG
       Compacting abductive solutions
7 Assumption Grammars in CHRG
8 A few notes on efficiency
9 Cautions and error messages
10 Compatibility with version 0

1 Introduction

CHR Grammars (CHRGs) are defined basically as a layer of syntactic sugar on top of Constraint Handling Rules (CHR), analogously to how Definite Clause Grammars (DCGs) are put on top of Prolog, thus filling the indicated corner in the following commutating diagram.

CHRG has become a surprisingly powerful grammar formalism and language analysis system with the following main features. CHRGs work bottom-up and present the following features:

The following paper is the central reference on CHR grammars: Find also other related papers in Henning Christiansen's publication list.

To list of contents

2 Getting started: An introductory example

You need source and example files for the CHRG system in your working directory., get these files here CHRG_source_and_examples.zip.

The following assumes that you have installed and have started a recent version of SICSTUS Prolog 4 or SWI Prolog.

You will need to compile the CHRG source files to use the CHRG system. The simplest way to do this is to do to do that from your own source file, as shown in the following file included with the system with the name simpleCHRG.

:- compile(chrg).

:- chrg_symbols np/0, verb/0, sentence/0.

np, verb, np ::> sentence.
[peter] ::> np.
[mary] ::> np.
[likes] ::> verb.

end_of_CHRG_source.

Assuming that you have a running Prolog session, you can load this file as follows.

?- compile(simpleCHRG).

When the file is loaded, these declarations and rules are first translated into CHR dittos and in turn compiled as any other file containing a mix of Prolog and CHR. (Notice that the file chrg loads the CHR library and the libraries for lists and terms.)

You can also load CHRG from the Prolog prompt as follows, if you prefer, before loading your own CHRG files.

?- compile(chrg).

The source file shown above declares three grammar symbols, for example, np/0 declares a grammar symbol of arity 0 that may be used in the rules that follows. Grammar symbols may have attributes, therefore the arity; the full syntax and more examples is given later.

The first rule says that a sequence np,verb,np is understood as a sentence. Operationally, it means that if a sequence of adjacent np,verb,np is observed within the underlying constraint store, then a representation of sentence is added. The three subsequent rules apply a notation for terminal symbols using square brackets similarly as in DCG. The lefthand side - referred to as the head of the rule - may consist of arbitrary sequences of terminal and nonterminal symbols. Notice that the purposes of lefthand and righthand sides of a rule are opposite to DCG and most other grammar notations; this convention is chosen as to keep syntax analogous to CHR and also to indicate the bottom-up nature of CHRG. In order to execute the parser, call the predicate parse with a list of Prolog atoms in the following way.

?- parse([peter,likes,mary]).

A listing of the text is printed out with a numbering of word boundaries (optional; can be switched off).

<0> peter <1> likes <2> mary <3> 

The result from the parsing process is given as a set of constraints.

all(0,3),
np(0,1),
verb(1,2),
np(2,3),
sentence(0,3),
token(0,1,peter),
token(1,2,likes),
token(2,3,mary) ? 

The two numerical arguments of each constraint refer to word boundaries, and the constraint sentence(0,3) should be understood as "A sentence has been recognized between word boundaries 0 and 3". Token constraints such as token(0,1,peter) represent terminal symbols; the square brackets around terminal symbols in the CHRG rules are nothing but syntactic sugar for token symbols.

The all(0,3) is not interesting in itself; it holds the start and end marker for the entire string and is used for implementing the special all grammar symbol (section 3.3) and the accept and acceptc predicates (section 5).

It may be instructive to take a look at the CHR declarations plus rules produced temporarily by CHRG when a source file is read in:

:- chr_constraint np/2, verb/2, sentence/2, token/3.

np(N0,N1), verb(N1,N2), np(N2,N3) ==> sentence(N0,N3).
token(peter,N0,N1) ==> np(N0,N1).
token(mary, N0,N1) ==> np(N0,N1).
token(likes,N0,N1) ==> verb(N0,N1).

So what the CHRG system basically does is to add two extra arguments to grammar symbols and replace the arrow symbol with its CHR equivalent. You can have the system to print out these gererated rules during compilation; see section 4.

To list of contents

3 Syntax and semantics of the CHRG notation

A CHRG source file can mix the syntax described here freely with CHR and Prolog, so that you may, e.g., define your own auxiliary constraints and predicates and use them within the CHRG rules. You may, thus, also need to refer to doucumentation for SICSTUS Prolog 4 or SWI Prolog.

We describe syntax by means of BNF-like notation and occationally by example. The semantics is described informally and in terms of examples; a formal (and less redable) description can be found in the scientific paper referred to in the beginning of this document.

Typewriter font is used for text to be literally part of a source text whereas italic is used for parts to be replaced by actual text defined elsewhere, i.e., the two fonts characterize terminals, resp., nonterminals in the (meta-) grammar for CHR grammars. Square brackets in roman font [ ] (hopefully distinguishable from square brackets in typewriter font [ ]) denotes optional component.

   3.1 Syntax of CHRG source files
   3.2 Syntax of CHRG rules
   3.3 Semantics of CHRG rules

To list of contents

3.1 Syntax of CHRG source files

The general shape of a CHRG source file is as follows.

:- chrg_symbols spec, .., spec.
[:- chrg_abducibles spec, .., spec.]
rule .
..
rule .
end_of_CHRG_source.

Each spec is of the form Prolog-atom/arity similarly to specs in Prolog or CHR; notice that the mode and type information is not available for declarations of chr symbols and abducibles. The declarations of chrg_symbols and chrg_abducibles define the vocabulary to be used in CHRG rules; recall that you can intermix with arbitrary CHR and Prolog declarations and definitions. You may also write rules in CHR about declared abducibles and even about declared grammar symbols, when you have found out how they are translated into CHR constraints.

For enhanced readability, the CHRG syntax includes but singular and plural synonymes for these declarations, i.e., :-chrg_symbol, :-chrg_symbols, :-chrg_abducible, :-chrg_abducibles and even :-chr_constraint, :-chr_constraints.

When CHRG rules are read in, the translation into CHR rules pays different attention to grammar symbols and the other two categories. Grammar symbols are extended with two extra arguments referring to word boundaries and the other two are applied as they are. Constraints behave as in CHR and you can also write CHR rules about them (and you can even use grammar symbols in CHR rule provided that you add manually two extra arguments). Abducibles behave basically like constraints except that a few extra facilities are added; this is explained in section 6.

As indicated, only the grammar_symbols declaration is required, and the three different sorts of declarations can be given in any order.

The following grammar symbols are implicitly declared by the system and must not appear in declarations provided by the user.

A CHRG source file may contain directives providing various options, some inherited from CHR and some specific to CHRG, see section 4.

Notice that the The purpose of the end marker end_of_CHRG_source is essential; if you leave it out, the system may not observe this and the compilation of your source file will be a strange half-baked thing.

To list of contents To start of section 3

3.2 Syntax of CHRG rules

A grammar rule has the following general form:

[ left-context -\ ] core-of-head [ /- right-context ] arrow [ guard | ] body

All that comes before the arrow is called the head of the rule. A guard left out is considered equivalent to true. The guard may be any test that satisfied the requirements as a gourad in a CHR rule; see documentation for SICSTUS Prolog 4 or SWI Prolog.

The core of the head is defined by the following grammar, where grammar-symbol and constraint refer to terms built from declared symbols and with a number of attributes as indicated by the declaration (the roman font vertical bar | denotes alternatives):

core::=
   grammar-symbol |
   core:(Prolog-var, Prolog-var) |
   optional( grammar-symbol ) | optional( grammar-symbol , substitution ) |
   { any-callable-Prolog-term } |
   [ Prolog-term [ , .., Prolog-term ] ] |
   core , core |    core $$ core

First alternative is a usual grammar symbol; the colon notation in the second alternative makes it possible to unify the word boundaries normally hidden for the user (see section 3.3) to Prolog variables; a variable may be used here as well (will be bound to a comma-pair during execution). The two variants with "optional" are used for grammar symbols that are optional elements in the core-of-head; a possible substitution can be used for assigning variables occurring in the optional part; only activated when the rule applies with no match for the optional core part; see about substitution with the "where" notation below. The next borrows notation from DCG in that nongrammatical symbols (here constraints only) are sourrounded by curly brackets. Next, square brackets for sequences of terminal symbols; comma, the usual sequencing operator; and finally $$ for parallel matching (described below).
NB: The core of a rule must contain at least one grammar symbol. The colon operator should only be applied to instances of core that represent a meaningful sequence of grammar symbols.

A core must contain at least one non-optional grammar symbol. Optional parts are implementated by compiling different versions of the rule, one with and one without the optional part; the assigment is executed at compile time similar to the "where" notation (below). Optional parts may not be applied in context parts.

Example:

optional(a(X),X=no), optional(b), c ::> d(X)
is an abbreviation for the following four rules:
a(X), b, c ::> d(X)
b, c ::> d(no)
a(X), c ::> d(X)
c ::> d(no)

Parantheses can be used as in Prolog in order to group things. The $$ operator has precedence number slightly higher that comma which means that (a,b$$c,d) is understood as ((a,b)$$(c,d)).

Notice that grammar symbols include gaps, "..." and "n...m" for integers 0≤n<m. It is possible to have values of n and m to be determined dynamically referring attributes of other grammar symbols in the head but it is not checked that the actual values are feasible so lower-level error messages may occationally occur; the CHRG system will issue a warning when a rule with dynamic gap bounds are read in.

There is a restriction on the use of gaps in the core of a head so that the core must be bounded defined in the following way. This ensures that the core matches a specific interval of word boundaries when applied (and thus meaningful boundaries for whatever the body might add).
NB; The colon operator should only be applied to instances of core that represent bounded sequences of grammar symbols.

  • A core of the head of a rule is bounded if it is both left and right bounded.
  • A core A1,..,An is left bounded (right bounded) if A1 (An) is not a gap.
  • A core A$$B is left bounded (right bounded) if at least one of A and B is left bounded (right bounded).

The left and right context is either of the same shape as a core or of the form ( core ; .. ; core ) (no restrictions to gaps apply). The use of semicolon in left and right context in a rule is syntactic sugar for the set of rules that results from an unfolding of the semicolon structure.
Example:

(a ; b) -\ c /- (d ; e) ::> f
is an abbreviation for the following four rules:
a -\ c /- d ::> f
b -\ c /- d ::> f
a -\ c /- e ::> f
b -\ c /- e ::> f

A body is in principle any callable Prolog term, but we apply some syntactic conventions as to keep a correspondence with the syntax used in the head.

body::=
   grammar-symbol |
   body:(Prolog-var, Prolog-var) |
   ( body , .., body ) |
   ( body ; ..; body ) |
   ( test ->body ;     ..;     [ test -> ] body ) |
   { any-callable-Prolog-term } |
true | false | fail | term=term | dif(term,term)
test ::= anything that Prolog's conditional notation accepts as a test.

Gaps and the parallel match operator are not allowed in the body.

Notice that when the colon operator is applied in the head of a rule, it typically reads some boundaries from the constraint store, whereas when used in the body, they typically set boundaries (over-riding the default; to be explained); a variable may be used here as bould should be bound to a comma-pair of boundaries during execution (otherwise a failure may occur).

Warning: SICStus Prolog has a syntactic peculiarity which makes it read rules having disjunctions (semicolon) or conditionals in a wrong way. For example, SICStus reads the rule
a(X), b(Y) <:> X > Y -> ab(X) ; ba(Y).
as if it was written
a(X), b(Y) <:> X > Y -> ab(X) | ba(Y).
which gives no sense. To get things right in SICStus, you must add a trivial guard as follows.
a(X), b(Y) <:> true | X > Y -> ab(X) ; ba(Y).
SWI get things right.

As it appears, a few standard built-in Prolog predicates can be written without curly brackets; these predicates are rarely confused with grammar symbols and it looks nicer without brackets when such a single goal comprise the body. Special operators to be used in the body are available for Assumptions Grammars, see section 7.

The arrow is one of ::> in which case the rule is called a propagation rule and <:> in which case the rule is either a simplification or a simpagation rule. The latter two are distinguished by the simpagation rule having one or more symbols prefixed by an exclamation mark "!" (examples and explanation will follow).

Constraints can be used in both head and body of a rule, as in the following example:

:- chr_constraint h/1.
[a] ::> {h(a)}, aa.
[b], {h(X)} ::> bb(X).

According to the semantics described below, the first rule when applied will produce a hypothesis h(a) together with nonterminal symbol aa. This hypothesis makes it possible for the second rule to apply for terminal [b] thus producing nonterminal symbol bb(a).

Finally, it is possible to attach some extra components to a rule, some of which are inherited from CHR, so that where we indicated a rule in the grammar for a source file, the full syntax is as follows:

[ name @@ ] rule [ gpragma pragma ] [where substitution]

The optional name and pragma parts are exactly as in CHR with the same syntax and meaning and not described here; grammar symbols and constraints in the head can be followed by "# Prolog-variable" as in CHR; see documentation of SICSTUS Prolog 4 or SWI Prolog.
Notice that CHR's symbols @ and pragma have been replaced in the CHRG syntax by @@ and gpragma due to subtle implementation problems.

A "where substitution'' part serves to simplify the writing of complicated rules, e.g., with complicated attribute expressions or other parts of the rule that could be separated out.
Example:

a(A) -\ B /- ..., q(X,Y)  ::>  {C}, sentence_in_funny_context(A,Z)

where A = ugly(st(r,u,c(t,u,r(e)))),
      B = (np, verb, np),
      C = (append(X,Y,Z), write(Z))

The meaning is that any occurrence in the rule of A, B, and C is replaced by the indicated term. The transformation is purely syntactical and executed when CHRG reads the rule from a source file.

The where notation can also be applied for Prolog and CHR rules in a CHRG source file.

To list of contents To start of section 3

3.3 Semantics of CHRG rules

We give here an informal overview of the meaning of CHRG rules and facilities. We leave out abducibles and assumptions and refer to the respective sections below.

Encoding of grammar symbols in the constraint store

A string of terminal symbols to be parsed is represented as a set of constraints, where the actual position in the string is represented by boundary numbers. When, for example, the parsing predicate is called as parse([the,chinese,dragon]), the symbol "the" is will span from position 0 to position 1, "chinese" from 1 to 2, and "dragon" from 2 to 3. When grammar rules produce new occurrences of grammar symbols, their boundaries will correspond to the minimum and maximum of boundaries for the "reduced" grammar symbols. When referring to adjacent grammar symbols, it means that end boundary for the first one coincides with start boundary for the second one. Similarly, we may talk about positions "before" and "after" some given position with the usual meaning.

Grammar rules

Grammar rules work bottom-up by recognizing occurrences of grammar symbols in the store, and if a grammar symbol is applied in one rule application, and is not removed from the constraint store, it may also take part in other rule applications.

A propagation rules such as

a, b  ::>  c

means that adjacent occurrences of grammar symbols "a" and "b" will give rise to a an occurrence of "c" spanning over the combined boundaries of "a" and "b", i.e., the start boundary of "c" is the same as the start boundary of "a", and its end boundary is the same as the end boundary of "b". As the rule is a propagation rule, the occurrences of "a" and "b" stays in the store.

A simplification rule

a, b  <:>  c

works in a similar way, except that the occurrences of a" and "b", to which the rule is applied, are remioved from the constraint store.

A simpagation rule is similar to a simplification rule, except that one or more grammar symbols in its head are prefixed by a "!" symbol, meaning that such a grammar symbol is not removed, but stays in the store. For example:

a, !b  <:>  c

This, when this rule is applied, "a" will be removed, "b" stays, and a "c" is added. The head of a rule may also include constraints which, then serve as conditions for the rule to apply. For example:

{con}, a, b  ::>  c

When con is in the store (without any relation to boundaries), the rules applies as a,b::>c; without con, the rule cannot apply. Boundaries for "a", "b", and "c" are calculated as if {con} was not there.

Grammar rules may also contain guards, which has the seme meaning as in CHR. For example:

a(X), b(Y)  ::>  X is Y+Z, Z > 17 | c(Z)

The rule can only apply when the actual occurrences of "a" and "b" have attribute values whose sum is greater that 17.

When a rule is applied, the contents of the body of the rule (its right hand side) is executed by adding and grammar symbols and executiong the code within curly bracket. For example:

a, b  <:>  c, {write(hello)}.

Grammar symbols "a" and "b" are replaced by "c" as above, and the Prolog goal write(hello) is executed.

A grammar rule may contain more that one grammar symbol, for example:

a, b  <:>  c, {write(hello)}, d.

It works similarly as the previous rule, with both "c" and "d" added having that same boundaries, i.e., both span the same substring as "a,b". Grammar symbols "a" and "b" are replaced by "c" as above, and the Prolog goal write(hello) is executed.

Terminal symbols are treated as any other grammar symbols. Consider, for example, the following two rules.

[the], n  <:>  np.
[the,chinese,dragon], blabla <:> chinese_dragon_statement.

These rules works as expected; notice that the subsequence [the,chinese,dragon] will be treated as three different, adjacent grammar symbols.

Left and right context parts in rule heads

A unique feature of CHRGs are their context-sensitive rule. Any rule can be equipped with a left and a right context, that describe patterns that must be present to the left or right of the items that are recognized. Consider, for example, the following four rules.

(1)        a        ::>  b
(2)  c1 -\ a        ::>  b
(3)        a /- c2  ::>  b
(4)  c1 -\ a /- c2  ::>  b

Each one of them may, from an occurrence of an "a" produce a "b" spanning over the same boundaries. For rule (2), however, this can only happen when the is a c1 immediately before the "a"; for rule (3), a c2 is required immediately after the "a"; for rule (4), both are required.

In the case of simplification (or simpagation) rule, the context parts are not removed from the store, only the core in between the special markers.

Together with the gaps described in the following, these context parts are quite flexible; we show more examples below, combining the two.

Gaps in rule heads

A gap does not represent any specific grammar symbol, but is used to indicate the relative positioning of grammar symbols. Consider this example.

a, ..., b <:> ab.

This rule can apply when an "a" and a "b" occurs, with the "a" being zero or more positions before the "b". The boundaries for the new ab are found from "a" and "b". In case of a simplification rule as shown in the example, only the "a" and the "b" are removed from the store. Whatever grammar symbols may be in between them, and perhaps overlapping with them, are not affected.

There are three different kinds of gaps. The one shown above, "...", representing a distance of any size, "...n" represents a distance of exactly n, and "n...m" any distance of sized from (and including) n to (and including) m. Distances are counted such that two adjacent grammar symbols have distance zero. The following rule shows a combination of all.

a, ..., b, ...7, c, 2...1000, d <:> abcd.

The following examples show gaps applied with context parts.

(1)  a, ... -\ b           <:> c
(2)  a, ... -\ b /- ...5, d <:> c

Rule (1) means that a "b" can change into a "c" provided that there is an "a" somewhere before the "b"; rule (2) requires, furthermore, that the "b" is followed by a "d" exactly 5 positions to the right. Rules like this may be useful to formalize conditions that apply in descriptions of biologocal sequences data such as DNA, RNA and proteins.

Parallel matching

As the text can be read in alternative ways (e.g., when propagation rules are used) and these "readings" may be present in the constraint store at the same time, it gives good sense to introduce rules that require more that one pattern to match the string to be recognized. For example:

a,...,b $$ c ::> d.

When an "a" is followed by a "b" in a certain distance, and the same boundaries also covers a "c", we produce a "d". In case of simplification rules, all alternative grammar symbols matched in the head will be removed. The following is an example of a simpagation rule with parallel matching.

![what], ... $$ sentence <:> question.

Thus, a sentence starting with [what] is converted into a question without removing the [what].

Notice that parallel matching, gaps and left-right contexts can be combined in arbitrary ways. However, a rule can only have one left and one right context part; these cannot be separately specified for parallel alternatives (which anyhow would not add any new expressive power).

Referring to word boundaries in rules

Word boundaries can be made explicit in rules; in the head it will assign or check values and in the body, add explicit boundaries. We illustrate this by some examples.

(1)  a:(0,_) <:> b.
(2)  a:(X,Y) <:> {X10 is X+10, Y10 is Y+10}, b:(X10,Y10).
(3)  a:Range <:> {auxiliary_predicate(Range,RangeNew)}, b:RangeNew.

The first rule will apply to an occurence of a if it appears at the beginning of the string only. Rule (2) will replace any occurrence of a moved 10 positions to the right (which will be a weird thing to do). Rule (3) unifies the variable of Range with a comma structure giving the boundaries and use an addition predicate, expected to be programmed in Prolog, to set new boundaries for of b.

The following example shows a useful application using also parallel matching and gaps.

a:R1, ... $$ ...,a:R2 <:> overlap(R1,R2) | a.
overlap((A1,B1),(A2,B2)):- A1 < B2, A2 < B1.

Using the auxiliary predicate overlap defined as shown in the guard, the grammar rule will join any two overlapping occurrences of a into a new a. By recursion, the rule reduces the occurrences of as to those that correspond to maximal substrings that can be reached by jumping between overlaps of the original as.

Other constructs for the body

The section on Syntax of CHRG above indicates also that conditionals and disjunction can be used in rule bodies; here are some examples.

(1)  a(X), b(Y) <:> true | X > Y -> ab(X) ; ba(Y).
(2)  a(X), b(Y) <:> true | ab(X) ; ba(Y).

As noticed above, the trivial guard in nedded in SICStus Prolog, but can be avoided in SWI. Rule (1) will deterministically determin, based on the values of X and Y which of ab(X) and ba(Y) that will replace a(X),b(Y); rule (2) sets of a choice point, so that firstly a(X),b(Y) is replaced by ab(X), and under backtracking, this may be changed into ba(Y).

To list of contents To start of section 3

4 Options

The following can be set in your CHRG source files.
DirectiveDefault PurposeRestrictions
:- chrg_option(show_text,{on,off}). on Print out tokenlist given to the parse predicate with numbered word boundaries.
Note: Each token is printed when actually called, so it may be repeated under backtracking.
Can be switched on and off any time.
:- chrg_option(show_rules,{on,off}). off Whenever a CHRG rule is read from a source file, print its translation into CHR. Useful for debugging and for understanding CHRG.
Note: You will also see a lot of rules generated automatically by the system that serves al sorts of internal purposes.
Can be switched on and off any time.
:- chrg_option(lr_optimize,{on,off}). off Optimize translation into CHR by adding passive pragmas to all but leftmost symbols of core and possible right context. NB: For grammars of propagation rules with only right contexts, the semantics is not changed. When left and right context or simplification or simpagation rules are used, there are subtle cases where a rule is not applied. Can be switched on and off any time; in source files, will be in action for rules following.
NB: Works at CHRG compile time, so if you want to change already compiled rules, compile your file once again.
:- chrg_option(compact_abduction,{on,off}). off When on, abductive CHRGs tend to produce abductive solutions with fewer elements before others by continually trying to unify new abducible with old ones, however with all solutions generated on backtracking. See below for details, Can be switched on and off any time; in source files, will be in action for rules following.
NB: Works at CHRG compile time, so if you want to change allready compiled rules, compile your file once again.
:- chrg_option(verbose_AG,{on,off}). off When on, answers including unsatisfied expectations are allowed; all auxiliary constraints and unused/unsatisfied assumptions are given as part of the answer.
When off, unsatisfied expectations are not allowed in final state and all auxiliaries are removed before answer is printed. If optionCGHR(show_text,on), backtracking needed to get rid of answers with unsatisfied expectations can be observed.
Can be switched on and off any time.

The CHRG system does not reset the options when a file is recompiled, which may occasionally lead to other results than you expect. If you are switching options on and off in your source files and recompile, it may be useful to include the following directive in the start of the file.

:- set_default_chrg_options.

It may also be called (without the ":-") from the command line.

To list of contents

5 Predicates

The CHRG system offers a collection of useful auxiliary predicates which are listed below. Predicates related to abduction and assumption grammars are described in the respective sections.

PredicateMeaning Remarks
parse(list-of-tokens) Converts a list of tokens into a sequence of call of token constraints. Its use described in section 2.
See also option show_text in section 4.
accept(grammar-symbol) If possible, unifies grammar-symbol with a grammar symbol in the constraint store that covers the entire string; otherwise it fails. Intended to be called following a call to parse.
acceptc(grammar-symbol) Similar to accept but it also removes all constraints in the constraint store. Useful to avoid having the constraint store printed out as part of the answer.

To list of contents

6 Abduction in CHRG

The background for abduction in CHRG is described in the journal paper referenced in the beginning if thus document. If you are not familiar with abduction in logic programming, you are adviced to follow the references in this paper. We give here a brief introduction and describe the facilities in the CHRG system that support abduction in language interpretation.

Abducible predicates works more or less the same way as CHR constraints in that they can be used within heads and bodies of CHR rules as well as CHRG rules; in the latter, withing curly brackets. However, using an abducibles' declaration exemplified as follows provides some additional facilities.

abducibles h1/1, h2/1.

This makes predicates h1, h1_, h2, and h2_ available as constraints. Those ending with an underline are negated version so that h1_(7) represents an approximation to the logical negation of h1(7). It is not allowed to declare an abducible predicate whose name ends with a underscore.

The system generates automatically CHR rules that prevent, say, h1_(fido) and h1(fido) to exist in a constraint store at the same time. However, the usual axiom for logical negation, stating that we must have either h1_(fido) or h1(fido) is not supported.

CAUTION:
When doing abduction with CHRG, you should ensure that the grammar is syntactically unambiguous - otherwise abducibles for different interpretations of the text are cluttered up in the constraint store.

Integrity constraints are rules that describe general properties of abducibles. They are conveniently written. Assume for example that h1 and h2 describe two incompatible properties; this can be defined by a CHR rule as follows:

h1(X), h2(X) ==> fail.

This integrity constraint will effectively prevent incompatible pairs of hypotheses to be create so that other choices are tried instead.

The body of an integrity constraint may contain other abducibles and built-in Prolog predicates, quite often a unification as in the following which is relevant when family relations are learned:

father_of(X,Y), father(X,Z) ==> X=Z

I.e., any given value for X can have at most one father.

If the set of abducible atoms is finite for a given abducible predicate, say {h(a),h(b)} for predicate h, this can be expressed by the following integrity constraint.

h(X) ==> (X=a ; X=b).

Notice: In the methodology we have devised for abduction, abducibles have no place in the head of a grammar rule. However, we do not prohibit this in case someone could find new ways to extend our methods by somehow merging grammar rules and integrity constraints.

Abducibles declared in this way can also be used in Prolog predicates defined in your CHRG soiuce files.
Example:

p(X):- h1(X), q(X,Y), h2(Y).
p(X):- h2_(X).

Predicate p can now be used in grammar rule for hypotheses that must be satisfied, and the combined semantics of CHRG, CHR, and Prolog will ensure that the two different ways to derive p(X) are tried out on backtracking.

For an example of abduction in CHRG, see the file abductiveGarfield.

Important notice: You can use the CHRG system for abductive logic programming with integrity constraints. Declare abducible predicates, add integrity constraints as CHR rules and predicate definitions in Prolog (keeping the above mentioned restriction in mind), and forget everything about the grammar rule notation.
(But you still need to end your file with end_of_CHRG_source.)

Compacting abductive solutions

The final state may include abducible atoms with variables with the meaning that any ground assignment to such variables (not conflicting with integrity constraints) represents a solution to the abductive problem. Consider as an example the following set of abducible atoms returned as answer H={h(A), h(B), h(C)}. This set may be separated into different classes io identical and different abducibles in several ways (some perhaps excluded by integrity constraints):

All different.
A=B, C different.
A=C, B different.
B=C, C different.
A=B=C.

Thus H may contain solutions such as {h(a), h(b), h(c)} and {h(d)} that both may be minimal, but in some cases there may be reasons to prefer the one with the fewest elements.

The CHRG system provides an option

:- chrg_option(compact_abduction,on).

which dynamically tries to unifiy as many abducibles as possible, trying out alternatives on backtracking. Alternatively, instead of using this option, you may use the predicate compact_abducibles in order to have the smaller solutions produced, but as a post-parsing phase, as follows:

?- parse([a,neat,little,story]), compact_abducibles.

(The combined designer and implementer of CHRG has forgotten why he added this facility.)

Most earlier approaches to abductive logic programming are inherently based on the principle that we called compaction above. In our own publications on abduction, we have argued strongly against taking this principle as default, as we find it intuitively wrong. Furthermore, it may often lead to combinatorial explosions. - We have included compaction for compatibility and comparison only.

To list of contents

7 Assumption Grammars in CHRG

For a background on Assumption Grammars, see the paper:

Dahl, V., Tarau, P., Li, R.,
Assumption grammars for processing natural language. Proc. Fourteenth Int'l Conf. on Logic Programming. pp. 256-270, MIT Press, 1997.

The present version of Assumption Grammars is described in more detail in the journal paper referenced in the beginning of this document..

The CHRG notation is extended with the prefix operators +, *, -, =+, =*, =- which can be used in the body of CHRG rules; these operators should not be placed inside curly brackets but be placed at the same level as grammar symbols. ((This for implementation reasons: The CHRG compiler needs to manipulate some of these operators and by convention, it does not analyze what is inside curly brackets.))

The operators can prefix arbitrary hypotheses that need not be seperately declared. (In fact if a constraint, grammar symbol, or abducible h/1 is declared, it will be unrelated to h/1 used with the Assumption Grammar operators). Their meaning is the following where h is some arbitrary function symbol; arity can be arbirary.

+h(a)Assert linear assumption h(a) for available for subsequent text
Linear means "can be used once".
*h(a)Assert intuitionistic assumption for subsequent text
Intuitionistic means "can be used any number of times"
-h(X)So-called expectation: consume/apply an assumption and unify its argument with X.
=+h(a), =*h(X), =-h(X)as above but textual order of assertion and application/consumption can be arbitrary.

In many examples, the argument to operators with a star or plus are ground but that need not be the case. Operators including "=" are called time-less, the others sequential.

An expectation is satisfied whenever it has been matched with a corresponding assumption given by by a suitable one of +, *, =+, or =*. For a parse to succeed, all expectations must be satisfied. To ensure this, the system may backtrack if it reaches a final state that includes unsatisfied expectations; if you have :- chrg_option(show_text,on) (which is default) you will see this backtracking.

You can decide to accept final states with unsatisfied expectations using optionCGHR(verbose_AG,on); see section 4.

CAUTION:
When using Assumption Grammars in CHRG, you must ensure that the grammar is syntactically unambiguous - otherwise assumptions for different interpretations of the text are cluttered up in the constraint store.

For an example of an Assumption Grammar in CHRG, see the file simpleAG.

To list of contents

8 A few notes on efficiency

The inherent time complexity of a given grammar of CHRG is highly dependent on the grammar.

For locally unambigous grammars (no node ever involved in more than one syntax tree apart from subtree relations) that do not use extra constraints or abducibles, complexity is linear.

This may also be obtained by using simplification rules only and-or using :- chrg_option(lr_optimize,on).

For arbitrary grammar without attributes: Cubic.

For grammars with high degree of local ambiguity and attributes, the complexity can be very high. The following grammar is a worst case:

[a] ::> a(0)
a(T1), a(T2) ::> a(t(T1,T2))

This is due to the explosive number of different parsetrees that can be constructed - so give it a string of more that 30 a's and you'll never get the result.

Finally, it should be mentioned that grammars with an extensive use abduction with compaction, or assumptions very often can run into combinatorial explosions. If your application depends heavily on assumptions, and you experience scale-up problems, you want to know that there implementation of assumptions are a bit coarse and that there is lot of room for improvements.

To list of contents

9 Cautions and error messages

The CHRG system gives to a large extent meaningful error messages when declarations and rules do not satisfy the various restrictions described (or implicitly assumed) above. Some errors are not identified by CHRG but by the underlying CHR system and some even later at run time.

General advice:
If CHRG stops reading your file with an incomprehensive error message, try to set
:- show_rules(on).
and try again. This may indicate how far you got in the file before things went wrong.

Another general advice:
In case CHRG, or the underlying CHR system, stops compilation due to some error, the system is not good at cleaning up, so if you experience strange things when trying to compile you source file again, close down Prolog and the CHRG system and start it again.

The following things are not checked and may give confusing behaviour:

To list of contents

10 Compatibility with version 0

The syntax for rules is unchanged, but declarations of various sorts have been changed into a style that is more in line with the current CHR syntax. A few facilities have been removed, which anyhow was not that useful.

It should not take long time to adapt a program written for version 0 to version 1, and we have added gaps with a fixed length.

A bug has been fixed, some subtle cases of rules with gaps could not be compiled correctly in version 0.

To list of contents