Skip to content

Pattern Matching Algorithm

dwightguth edited this page Nov 29, 2018 · 3 revisions

The LLVM Backend pattern matching algorithm is based on the algorithm from Maranget's paper Compiling Pattern Matching to Good Decision Trees.

However, it has been extended with the follow-up concepts:

  • Function and Pattern decision tree nodes
  • List/Map/Set patterns
  • Alias Patterns
  • Matching on string, float, boolean, fixed precision and arbitrary precision integer literals
  • Variable bindings
  • Matching modulo sort injections

Here we briefly explain at a high level how the algorithm deals with each of these concepts:

Function nodes

A Function decision tree node is fairly straightforward: it takes zero or more occurrences as arguments, and creates a new occurrence by calling the specified function on the terms represented by those occurrences.

Pattern Nodes

Here we introduce a new concept, called a "bound" pattern. A bound pattern is a pattern, but with all its variables bound to specific occurrences. A pattern decision tree node takes a bound pattern as argument and canstructs a new term representing the value of that pattern substituted with the current value of the variables represented by those occurrences. You can think of it like an expression, but it has been canonicalized into a format that is independent of which row in the pattern matching matrix it is referred to in.

List patterns

In K, the basic structure of a list pattern is zero or more patterns matching elements at the front of the list, followed by zero or one variable of the sort of the list, followed by zero or more patterns matching elements at the end of the list.

Below we define the values of the functions in the paper for list patterns:

  • Signature(List) = Union_i=0..infinity{List(I)}
  • arity(List(i)) = i
  • H(L(hd . tl)) = {List(|hd| + |tl|)}
  • H(L(hd _ tl)) = Union_i=0..(|hd| + |tl|){List(i)}
  • S(List(c), P -> A): 
    p_1j            Row j
    L(hd . tl)      if c = |hd| + |tl| then hd ++ tl ++ p_2j..p_nj else no row
    L(hd _ tl)      if c >= |hd| + |tl| then hd ++ (++_i=1..(c-|hd|-|tl|){_}) ++ tl ++ p_2j..p_nj else no row
    
    Furthermore, o1 . 1 .. o1 . c represent the c elements of the list
    
  • D(P):
    p_1j            Row j
    L(hd . tl)      no row
    L(hd _ tl)      hd ++ (++_i=1..(LD(p_1)-|hd|-|tl|){_}) ++ tl ++ p_2j..p_nj
    
    Where LD(col) = List(LH(col) + LT(col))
    LH(col) = max(|col_L_hd|) 
    LT(col) = max(|col_L_tl|))
    (ie the length of the longest list prefix plus the longest list suffix)
    
  • Finally, we make two amendments to the definition of CC when p_1 contains list patterns:
    1. CC(o, P -> A) = Function(o' = length(o_1))(Switch_o'(L)) where o' is a newly created unique occurrence.
    2. A_D = CC(o1 . 1 .. o1 . LD(p_1), D(P->A)) where o1 . 1 .. o1 . LH(p_1) are the first LH(p_1) elements of the list, and o1 . LH(p_1)+1 .. o1 . LD(p_1) are the last LT(p_1) elements of the list
  • One final note: the paper does not deal with variable bindings, but if a list pattern L(hd P tl) exists, P is bound to o1[|hd|..|o1|-|hd|-|tl|]

Map patterns

To be continued

Variables

A bound variable is like a wildcard except that it inhroduces a variable binding, ie, an entry in a multimap B contained within the clause for the row mapping the name of the variable to its occurrence. We also annotate each row with info on the variables used in the rhs of the row V_r, whether the row has a side condition V_s, and if it does, what variables it uses V_sv.

Then we modify the right hand side of the equation in the paper describing leaf nodes as follows:

  • First we compute the list of all variable names bound to more than one occurrence BN. Then we define
    CC2 = CC(o, P_12..P_mn -> A_2..A_m)
    L_11 = Leaf(a1, B[V_r][0])
    L_12 = Function(o11' = (BN_11 = BN_12))(Switch_o11'(false:CC2;true:L_11))
    ...
    L_1i = Function(o1i' = (BN_1(i-1) = BN_1i))(Switch_o1i'(false:CC2;true:L_1(i-1)))
    ...
    L_1|BN_1| = ...
    L_21 = L_1|BN_1|
    L_22 = Function(o21' = (BN_21 = BN_22))(Switch_o21'(false:CC2;true:L_21))
    ...
    L_|BN||BN_(|BN|)| = L_penultimate = ...
    L = if V_s exists then Function(o_final' = V_s(B[V_sv][0]))(Switch_(o_final')(0:CC2, 1:L_penultimate)) else L_penultimate
    
    (ie, we check that all the occurrences of the variable are equal, and the side condition evaluates to true, and if anything fails, we proceed trying to match the remainder of the matnix, otherwise we pass the values of the variables to the rhs.)
Clone this wiki locally