author | James Cheney <jcheney@inf.ed.ac.uk> |

Thu, 09 Aug 2012 15:05:56 +0100 | |

changeset 4323 | 8efc0cc1e9e0 |

parent 4322 | d55c694e9994 |

child 4324 | 2e942c753bd4 |

* Draft of overview section

model/prov-constraints.html |

--- a/model/prov-constraints.html Thu Aug 09 14:55:52 2012 +0100 +++ b/model/prov-constraints.html Thu Aug 09 15:05:56 2012 +0100 @@ -784,9 +784,264 @@ </div> <div class="note"> - James to add prose + In progress, outline to be filled in </div> + <p> + This section collects common concepts and operations that are used + throughout the specification, and relates them to background + terminology and ideas from logic [[Logic]], constraint programming + [[CHR]], and database constraints [[DBCONSTRAINTS]]. This section + does not attempt to provide a complete introduction to these topics, + but it is provided in order to aid readers familiar with one or more + of these topics in understanding the specification, and to clarify + some of the motivations for choices in the specification to all readers. + </p> + + + <h2>Constants, Variables and Placeholders</h2> + <p> + PROV statements involve identifiers (URIs), literals, + placeholders, and attribute lists. However, in order to specify + constraints over PROV instances, we also need <em>variables</em> + that represent unknown identifiers, literals, or placeholders. + These variables are similar to those in first-order + logic [[Logic]]. A variable is a symbol that can be replaced by + other symbols, including either other variables or constant + identifiers, literals, or placeholders. In a few special cases, we + also use variables for unknown attribute lists. + </p> + + <p>Several definitions and inferences conclude by saying that some + objects exist such that some other formulas hold. Such an inference + introduces fresh <em>existential variables</em> into the instance. + Such a variable denotes a fixed object that exists, but its exact + identity is unknown, so we often refer to such variables as + <em>existential variables</em>. We do not allow existential + variables that stand for unknown attribute lists. </p> + + <p>In particular, many + occurrences of the placeholder symbol <span class="name">-</span> stand for unknown + objects; these are handled by expanding them to existential + variables. Some placeholders, however, indicate the absence of an + object, rather than an unknown object. In other words, the + placeholder is overloaded, with different meanings in different + places. + </p> + <p>An expression is called a <em>term</em> if it is either a + constant identifier, literal, placeholder, or variable, and write + <span class="math">t</span> to denote an arbitrary term. +</p> + + + <h2>Substitution</h2> +<p>A substitution is a function that maps variables to terms. Concretely, since we only + need to consider substitutions of finite sets of variables, we can + write substitutions as <span class="math">[x_1 = t_1,...,x_n=t_n]</span>. A substitution + <span class="math">S = [x_1 = t_1,...,x_n=t_n]</span> + can be <em>applied</em> to a term as follows. +<ol><li> + If the term is a variable <span class="math">x_i</span>, one of the variables in the + domain of <span class="math">S</span>, then <span class="math">S(x) = t_i</span>. + </li> + <li> + </li> + </ol> +</p> + + + + + + <h2>Formulas</h2> + <p> + For the purpose of constraint checking, we view PROV statements + (possibly involving existential variables) as + <dfn>formulas</dfn>. An instance can be viewed as "theory" in + logic, that is, a set of formulas all thought to describe the same + situation. The set can also be thought of a single, large formula: + the conjunction of all of the atomic formulas. + </p> +<p>The atomic constraints considered in this specification can be + viewed as atomic formulas:</p> +<ul> + <li>Uniqueness constraints...</li> + <li>Ordering constraints... + </li> + <li>Typing constraints... + </li> + <li>Impossibility constraints</li> + </ul> + <p> Similarly, the definitions, inferences, and constraint rules in this + specification can also be viewed as logical formulas.</p> +<ul> + <li> + A definition of the form "A IF AND ONLY IF there + exists y1...ym such that B1 and ... and Bk" + can be thought of as a formula "\forall x1,....,xn. A <==> \exists y1...ym such that B1 and ... and Bn", where x1...xn are the + free variables of the formula. +</li> +<li>An inference of the form "IF A THEN there + exists y1...ym such that B1 and ... and Bk" can be thought of as a formula "\forall x1,....,xn. A ==> \exists y1...ym such that B1 and ... and Bn", where x1...xn are the + free variables of the formula. +</li> +<li>A uniqueness, ordering, or typing constraint of the form "IF A THEN C" can be viewed as a formula + "forall x1...xn. A ==> C". </li> +<li>A constraint of the form "IF A THEN INVALID" can be viewed as a formula + "forall x1...xn. A ==> false". </li> + </ul> + + + <h2>Satisfying definitions, inferences, and constraints</h2> + <p> + In logic, a formula's meaning is defined by saying when it is + <em>satisfied</em> in a situation. Likewise, we can view + definitions, inferences, and constraints as being satisfied or not + satisfied in a PROV instance. + </p> + +<h2>Merging</h2> + + <p>Merging is an operation that takes two terms and compares them to + see if they are equal, or can be made equal by substituting an + existential variable with another term. This operation is a special + case of <em>unification</em>, a common operation in logical + reasoning, including logic programming and automated reasoning. +</p> + + + <h2>Applying definitions, inferences, and constraints</h2> +<p>Formulas can also be interpreted as having computational + content. That is, if an instance does not satisfy a formula, we can + often <em>apply</em> the formula to the instance to produce another + instance that does satisfy the formula. Definitions, inferences, + and uniqueness constraints can be applied to instances: +</p> + <ul> + <li> + A definition of the form "A IF AND ONLY IF there + exists y1...ym such that B1 and ... and Bk" + can be applied by searching for an occurrence of <span class="math">A</span> and replacing + it with <span class="math"> B1 and ... and Bk</span>, generating fresh existential + variables y1...ym. The formula <span class="math">A</span> is redundant, and can be + removed from the instance. +</li> +<li>An inference of the form "IF A THEN there + exists y1...ym such that B1 and ... and Bk" +</li> +<li>A uniqueness constraint of the form "IF A THEN t = t'" can be + applied by searching for an occurrence of <span class="math">A</span> in the instance, and + if one is found, merging the terms <span class="math">t</span> and <span class="math">t'</span>. </li> + + </ul> + + <p>The process of applying definitions, inferences, and constraints + to a PROV instance until all of them are satisfied is similar to + what is sometimes + called <em>chasing</em> [[DBCONSTRAINTS]] or <em>saturation</em> + [[CHR]]. We call this process <em>normalization</em>. + </p> + + + <h2>Termination</h2> + <p> + It is possible to define sets of formulas such that normalization + does not terminate, that is, such that there is an infinite sequence + of legal application steps. This is obviously something we want to + avoid. There is a great deal of work on termination of the chase in + databases, or of sets of constraint handling rules. This + specification ensures termination of normalization + </p> + <p> + There is an important subtlety that is essential to guarantee + termination. This specification draws a distinction between knowing + that an identifier has type 'entity', 'activity', or 'agent', and having + an explicit entity(id), activity(id), or agent(id) statement in the instance. + For example, focusing on entity, it is a stronger statement to say + that entity(id) holds in the instance, because we can apply + entity-generation-invalidation-inference to infer generation and + invalidationevents for the entity. In contrast, if we only know + that 'entity' in typeOf(id), this does not imply that entity(id) + holds, so we cannot necessarily assume that + </p> + <p> + This distinction (for both entities and activities) is essential to + ensure termination of the inferences. If we strengthened the type + inference constraints to inferences that add new entity statements + to the instance, then we could keep generating new entities and + activities in an unbounded chain into the past (or future). The + design adopted here assumes that instances explicitly declare the + entities and activities that are explicitly declared in the instance + can be inferred to have start and end events, but not those that are + indirectly referenced in other relations. + </p> + + <h2>Checking ordering, typing, and impossibility constraints</h2> + <p> + The ordering, typing, and impossibility constraints are checked + rather than applied. This means that they do not generate new + formulas expressible in PROV, but they do generate basic constraints + that might or might not be consistent with each other. Checking + such constraints follows a saturation strategy similar to that for + </p> + <ol> + <li>For ordering constraints, we check by generating all of the + precedes and strictly-precedes relationships specified by the rules. These can be thought + of as a directed graph whose nodes are terms, and whose edges are + precedes or strictly-precedes relationships. The constraints are + inconsistent if there is a cycle in the graph that includes a + strictly-precedes edge, and consistent otherwise. </li> + <li>For typing constraints... + </li> + <li> + For impossibility constraints... + </li> + </ol> + <p> + A normalized instance that satisfies all of the checked constraints + is called <a>valid</a>. Validity can be, but is not required to be, + checked by normalizing and then checking constraints. Any other + algorithm that provides equivalent behavior (that is, accepts the + same valid instances and rejects the same invalid instances) is allowed. + In particular, the checked constraints and the + applied definitions, inferences and uniqueness constraints do not + interfere with one another, so it is also possible to mix checking + and application. This may be desirable in order to detect + invalidity more quickly. + </p> + + + + <h2>Equivalence and Isomorphism</h2> + <p> Given two normal forms, a natural question is whether they contain + the same information, that is, whether they are equivalent (if so, + then the original instances are also equivalent.) By analogy with + logic, if we consider normalized PROV instances with existential + variables to represent sets of possible situations, then two normal + forms may describe the same situation but differ in inessential + details such as the order of statements or of elemennts of + attribute-value lists. To remedy this, we can easily consider + instances to be equvalent up to reordering of attributes. However, + instances can also be equvalent if they differ only in choice of + names of existential variables. Because of this, the appropriate + notion of equivalence of normal forms is <em>isomorphism</em>. Two + instances <span class="math">I1</span> and <span class="math">I2</span> are isomorphic if there is an invertible + function on existential variables <span class="math">h</span> such that <span class="math">h(I1) = I2</span>. + This is similar to the notion of equivalence used in [[RDF]], where + blank nodes play an analogous role to existential variables. </p> + +<p>Equivalence can be checked by normalizing instances, checking that + both instances are valid, then + testing whether the two normal forms are isomorphic. (It is + technically possible for two invalid normal forms to be isomorphic, + but to be condered equivalent, the two instances must also be + valid.) +As with validity, the algorithm suggested by this specification is + just one of many possible ways to implement equivalence checking; it + is not required that implementations compute normal forms + explicitly, only that their determinations of equivalence match + those obtained by the algorithm in this specification. + </p> </section> @@ -1105,6 +1360,7 @@ </section> + <section id="inferences"> <h2>Definitions and Inferences</h2> @@ -1635,7 +1891,10 @@ <p> <span class='conditional'>IF</span> <span class="name">activity(a,t1,t2,_attrs)</span> <span - class="conditional">THEN</span> there exist <span class="name">_start</span>, <span class="name">_e1</span>, <span class="name">_end</span>, + class="conditional">THEN</span> there exist <span + class="name">_start</span>, <span class="name">_e1</span>, <span + class="name">_a1</span>, <span class="name">_end</span>, <span + class="name">_a2</span>, and <span class="name">_e2</span> such that <span class="name">wasStartedBy(_start; a,_e1,_a1,t1,[])</span> and <span class="name">wasEndedBy(_end; a,_e2,_a2,t2,[])</span>.