## 20.7. 20.7 Branching queries

With the help of regular queries we can select the nodes of a graph that are reached from the root by a path the labels of which fit to a given regular pattern. A natural generalization is to add more conditions that the nodes of the path leading to the node have to satisfy. For example, we might require that the node can be reached by a label sequence from a node with a given label. Or, that a node with a given label can be reached from another node by a path with a given label sequence. We can take more of these conditions, or use their negation or composition. To check whether the required conditions hold, we have to step not only forward according to the direction of the edges, but sometimes also backward. In the following, we will give the description of the language of branching queries, and introduce the forward-backward indexes. The forward-backward index which is safe and exact with respect to all branching queries is called FB-index. Just like the 1-index, this is also usually too large, therefore we often use an FB()-index instead, which is exact if the length of successive forward steps is at most , the length of successive backward steps is at most , and the depth of the composition of conditions is at most . In practice, values , and are usually small. In case of queries for which the value of one of these parameters is greater than the corresponding value of the index, a checking step must be added, that is, we evaluate the query on the index, and only keep those nodes of the resulted index nodes that satisfy the query.

If there is a directed edge from node to node , then this can be denoted by or . If node can be reached from node by a directed path, then we can denote that by or . (Until now we used . instead of /, so // represents the regular expression _* or * in short.)

From now on, a label sequence is a sequence in which separators are the forward signs (/, //) and the backward signs (, ). A sequence of nodes fit to a label sequence if the relation of successive nodes is determined by the corresponding separator, and the labels of the nodes come according to the label sequence.

There are only forward signs in forward label sequences, and only backward signs in backward label sequences.

Branching queries are defined by the following grammar.

In branching queries, a condition on a node with a given label holds if there exists a label sequence that fits to the condition. For example, the root// / [ // and not / ]/ query seeks nodes with label such that the node can be reached from the root in such a way that the labels of the last two nodes are and , furthermore, there exists a parent of the node with label whose label is , and among the descendants of the node with label there is one with label , but it has no children with label that has a parent with label .

If we omit all conditions written between signs [ ] from a branching query, then we get the main query corresponding to the branching query. In our previous example, this is the query root// / / . The main query always corresponds to a forward label sequence.

A directed graph can be assigned naturally to branching queries. Assign nodes with the same label to the label sequence of the query, in case of separators / and , connect the successive nodes with a directed edge according to the separator, and in case of separators // and , draw the directed edge and label it with label // or . Finally, the logic connectives are assigned to the starting edge of the corresponding condition as a label. Thus, it might happen that an edge has two labels, for example // and “and”. Note that the graph obtained cannot contain a directed cycle because of the definition of the grammar.

A simple degree of complexity of the query can be defined using the tree obtained. Assign to the nodes of the main query and to the nodes from which there is a directed path to a node of the main query. Then assign to the nodes that can be reached from the nodes with sign on a directed path and have no sign yet. Assign to the nodes from which a node with sign can be reached and have no sign yet. Assign to the nodes that can be reached from nodes with sign and have no sign yet, etc. Assign to the nodes that can be reached from nodes with sign and have no sign yet, then assign to the nodes from which nodes with sign can be reached and have no sign yet. The value of the greatest sign in the query is called the depth of the tree. The depth of the tree shows how many times the direction changes during the evaluation of the query, that is, we have to seek children or parents according to the direction of the edges. The same query could have been given in different ways by composing the conditions differently, but it can be proved that the value defined above does not depend on that, that is why the complexity of a query was not defined as the number of conditions composed.

The 1-index assigns the nodes into classes according to incoming paths, using bisimulations. The concept of stability used for computations was descendant-stability. A set of the nodes of a graph is descendant-stable with respect to a set of nodes if or , where is the set of nodes that can be reached by edges from . A partition is stable if any two elements of the partition are descendant-stable with respect to each other. The 1-index is the coarsest descendant-stable partition that assigns nodes with the same label to same classes, which can be computed using the `PT` -algorithm. In case of branching queries, we also have to go backwards on directed edges, so we will need the concept of ancestor-stability as well. A set of nodes of a graph is ancestor-stable with respect to a set of the nodes if or , where denotes the nodes from which a node of can be reached.

Definition 20.41 The FB-index is the coarsest refinement of the basic partition that is ancestor-stable and descendant-stable.

Note that if the direction of the edges of the graph is reversed, then an ancestor-stable partition becomes a descendant-stable partition and vice versa, therefore the `PT` -algorithm and its improvements can be used to compute the coarsest ancestor-stable partition. We will use this in the following algorithm. We start with classes of nodes with the same label, compute the 1-index corresponding to this partition, then reverse the direction of the edges, and refine this by computing the 1-index corresponding to this. When the algorithm stops, we get a refinement of the initial partition that is ancestor-stable and descendant-stable at the same time. This way we obtain the coarsest such partition. The proof of this is left to the Reader.

`FB-Index-Creator(` `)`

```  1
`WHILE`

changes   3
`DO`

Compute the 1-index.   4
Reverse the direction               of edges, and compute the 1-index.  5
`RETURN`

```

The following corollary follows simply from the two stabilities.

Corollary 20.42 The FB-index is safe and exact with respect to branching queries.

The complexity of the algorithm can be computed from the complexity of the `PT` -algorithm. Since is always the refinement of the previous partition, in the worst case refinement is done one by one, that is, we always take one element of a class and create a new class consisting of that element. So in the worst case, the cycle is repeated times. Therefore, the cost of the algorithm is at most .

The partition gained by executing the cycle only once is called the F+B-index, the partition obtained by repeating the cycle twice is the F+B+F+B-index, etc.

The following proposition can be proved easily.

Claim 20.43 The F+B+F+B+ +F+B-index, where F+B appears times, is safe and exact with respect to the branching queries of depth at most .

Nodes of the same class according to the FB-index cannot be distinguished by branching queries. This restriction is usually too strong, therefore the size of the FB-index is usually much smaller than the size of the original graph. Very long branching queries are seldom used in practice, so we only require local equivalence, similarly to the A()-index, but now we will describe it with two parameters depending on what we want to restrict: the length of the directed paths or the length of the paths with reversed direction. We can also restrict the depth of the query. We can introduce the FB()-index, with which such restricted branching queries can be evaluated exactly. We can also evaluate branching queries that do not satisfy the restrictions, but then the result must be checked.

`FB(` `)-Index-Creator(` `)`

```  1
`FOR`

`TO`

3
`DO`

`Naive-Approximation(`

`)`

Compute the A()-index.   4

`Naive-Approximation(`

`)`

Reverse the direction                     of the edges, and compute the A()-index.  5
`RETURN`

```

The cost of the algorithm, based on the computation cost of the A()-index, is at most , which is much better than the computation cost of the FB-index, and the index graph obtained is also usually much smaller.

The following proposition obviously holds for the index obtained.

Claim 20.44 The FB()-index is safe and exact for the branching queries in which the length of forward-sequences is at most , the length of backward-sequences is at most , and the depth of the tree corresponding to the query is at most .

As a special case we get that the FB()-index is the FB-index, the FB()-index is the F+B+ +F+B-index, where F+B appears times, the FB()-index is the 1-index, and the FB()-index is the A()-index.

Exercises

20.7-1 Prove that the algorithm `FB-Index-Creator` produces the coarsest ancestor-stable and descendant-stable refinement of the basic partition.

20.7-2 Prove Proposition 20.44.