Evaluating Expressions, part 4 – Pattern Equations

In part 4 of our series implementing programming language constructs with actors, we extend our pattern matching behaviors to support pattern equations. These are true equations that express relationships between patterns. They form the basis for introducing LET and IF expressions.

The grammar for our extended language is shown below. Changes from part 3 are highlighted.

expr  ::= <term> ',' <expr>
        | <term>;
term  ::= <const>
        | <ident>
        | <term> '(' <expr>? ')'
        | 'LET' <eqtn> 'IN' <expr>
        | 'IF' <eqtn> <expr> {'ELIF' <eqtn> <expr>}* {'ELSE' <expr>}?
        | 'CASE' <expr> 'OF' {<ptrn> ':' <expr>}+ 'END'
        | '(' <expr>? ')';
eqtn ::= <ptrn> '=' <ptrn>
        | <ident> '(' <ptrn>? ')' '=' <expr>;
ptrn  ::= <pterm> ',' <ptrn>
        | <pterm>;
pterm ::= '_'
        | <const>
        | <ident>
        | '

Equality

The notion of equality can be a slippery one [1]. In many computer languages, equality comes in several “flavors” (identity, equivalences of various kinds, etc.) Equality is also confused with assignment, which interferes with referential transparency.

In our language we would like to represent true equations. These equations establish equality. When used in a conditional context, they do not evaluate to a Boolean TRUE or FALSE, but rather make a conditional assertion of equality.

Like true mathematical equations, our equations should be symmetric. In general, the two sides of an equation should be interchangeable without changing the meaning of the equation. Consider the equation a=42. It seems reasonable that 42=a should be equivalent. However, the usual lambda calculus trick is to define LET a=42 IN (a, a) as equivalent to (\a.(a, a))(42). If we consider the reversed equation LET 42=a IN (a, a) we get the nonsensical (\42.(a, a))(a).

The problem is that the left side of the equation is treated as a pattern while the right side is treated as a value, as discussed in part 2 of this series. If we want symmetric equations, we are left with two options. Either expect values on both sides, or expect patterns on both sides. If we expect values on both sides, our equations would test equality (presumably with a Boolean result), since variables would be replaced by their values. Instead we expect patterns on both sides, so our equations assert equality, possibly binding variables in the process.

LET and IF

A LET expression establishes an equality (usually binding one or more variables in the process) and then evaluates the IN expression in the environment extended by the new bindings.

# first draft
LET let_expr_beh(eqtn, expr) = \(cust, #eval, env).[
	SEND (k_env, #unify, env) TO eqtn
	CREATE k_env WITH \env'.[
		CASE env' OF
		? : [ SEND ? TO cust ]
		_ : [ SEND (cust, #eval, env') TO expr ]
		END
	]
]

When let_expr_beh receives an #eval request, it sends a #unify message to the equation eqtn. The customer in that message k_env receives the extended environment resulting from the unification of the equation. If the extended environment env’ is ? (the undefined value) then the undefined result value is sent to the customer cust. Otherwise, an #eval message (with the extended environment) is sent to the expression expr. The original customer cust eventually receives the result of evaluating the expression in the environment extended by unifying the equation.

An IF expression makes a conditional attempt to establish an equality. If the equation can be successfully unified, the expression is evaluated in the extended environment (just like LET). If unification fails, then the ELSE expression is evaluated in the original environment.

LET if_expr_beh(eqtn, expr, else) = \(cust, #eval, env).[
	SEND (k_env, #unify, env) TO eqtn
	CREATE k_env WITH \env'.[
		CASE env' OF
		? : [ SEND (cust, #eval, env) TO else ]
		_ : [ SEND (cust, #eval, env') TO expr ]
		END
	]
]

The if_expr_beh is nearly identical to let_expr_beh. The only difference is the handling of the ELSE clause. If the equation eqtn can’t be unified, an #eval message is sent to the alternative expression else, but the environment used is the original environment env. The alternative expression can be another if_expr_beh (representing an ELIF clause), or it can be any arbitrary expression (representing an ELSE clause).

Given the similarity between if_expr_beh and let_expr_beh we can refactor to remove duplication by observing that the a LET is equivalent to an IF where the ELSE clause is ? the undefined value.

# refactored
CREATE undefined WITH \(cust, req).[ SEND ? TO cust ]
LET let_expr_beh(eqtn, expr) = \(cust, #eval, env).[
	BECOME if_expr_beh(eqtn, expr, undefined)
	SEND (cust, #eval, env) TO SELF
]

The singleton actor undefined responds to all requests by sending ? the undefined value to the customer cust. With that definition in place, the let_expr_beh is refactored to simply transform itself into the equivalent if_expr_beh and reprocess the original message.

Unification

Equation expressions respond to #unify messages by sending the customer one of two things. Either an environment extended by any bindings introduced while matching patterns. Or ? the undefined value, if unification fails. Each pattern contains private data needed to perform the unification. We can’t reach inside the patterns, so we delegate the process to the patterns themselves.

LET eqtn_beh(left_ptrn, right_ptrn) = \(cust, #unify, env).[
	SEND (cust, #eq, right_ptrn, env) TO left_ptrn
]

When eqtn_beh receives a #unify message it sends an #eq message to the left pattern, including the right pattern as part of the message.

Pattern #eq

Unification of equations requires extending the protocol of messages understood by patterns to include an #eq message. For constant patterns, this extension is easy to handle. A constant pattern simply matches its value against the other pattern.

LET const_ptrn_beh(value) = \(cust, req).[
	CASE req OF
	(#match, $value, env) : [ SEND env TO cust ]
	(#eq, right, env) : [
		SEND (cust, #match, value, env) TO right
	]
	_ : [ SEND ? TO cust ]
	END
]

When const_ptrn_beh receives an #eq message is sends its value in a #match message to the right pattern.

A value pattern behaves in a similar way. It first evaluates its expression in the current environment, producing a value. That value is then handled as if it was the value of a constant pattern.

LET value_ptrn_beh(expr) = \(cust, cmd, arg, env).[
	SEND (k_val, #eval, env) TO expr
	CREATE k_val WITH \value.[
		SEND (cust, cmd, arg, env)
		TO NEW const_ptrn_beh(value)
	]
]

When value_ptrn_beh receives a command with an argument and an environment (both #match,value,env and #eq,right,env qualify), it sends an #eval message to the expression expr. The customer in that message k_val receives the value of the expression and creates a new constant pattern actor with the value. The original message is then sent to the new actor.

An identifier pattern has a little more work to do. An identifier usually introduces a binding into the environment, but it needs a value to bind to. So we need to extend the pattern protocol further to account for bindings.

LET ident_ptrn_beh(ident) = \(cust, req).[
	CASE req OF
	(#match, value, env) : [
		CREATE env' WITH env_beh(ident, value, env)
		SEND env' TO cust
	]
	(#eq, right, env) : [
		SEND (cust, #bind, SELF, env) TO right
	]
	_ : [ SEND ? TO cust ]
	END
]

When ident_ptrn_beh receives an #eq message it sends a #bind message to the right pattern.

The wildcard pattern behaves like an identifier pattern at this point in the process.

CREATE any_ptrn WITH \(cust, req).[
	CASE req OF
	(#match, _, env) : [ SEND env TO cust ]
	(#eq, right, env) : [
		SEND (cust, #bind, SELF, env) TO right
	]
	_ : [ SEND ? TO cust ]
	END
]

Pattern #bind

A #bind message tells a pattern that an identifier is available to bind, if the pattern has a value to bind to. A constant pattern has a value, so it can simply turn around and match its value to the identifier.

LET const_ptrn_beh(value) = \(cust, req).[
	CASE req OF
	(#match, $value, env) : [ SEND env TO cust ]
	(#eq, right, env) : [
		SEND (cust, #match, value, env) TO right
	]
	(#bind, left, env) : [
		SEND (cust, #match, value, env) TO left
	]
	_ : [ SEND ? TO cust ]
	END
]

When const_ptrn_beh receives a #bind message it sends its value in a #match message to the left (identifier) pattern. As we’ve already seen, the identifier pattern handles #match messages by extending the environment with a new binding, which is exactly what we want.

Value patterns, as shown above, create constant patterns to do the work, (and #bind,left,env matches the expected protocol) so there is nothing new to add for #bind.

Identifier patterns and the wildcard pattern are an interesting case. If there are identifiers on both sides of an equation, it might be reasonable to interpret that as aliasing. However, to keep things simple we will discourage that practice by making the result “undefined”. Again, this requires no further changes, since any “unknown” request results in sending ? the undefined value to the customer.

Pair Pattern

The previous sections show how to handle all of the patterns from part 2. Now we will extend the pair pattern from part 3. The first step is to add support for the #eq message. We use a strategy similar to the identifier pattern, extending the pattern protocol to account for pairs.

LET pair_ptrn_beh(head_ptrn, tail_ptrn) = \(cust, req).[
	CASE req OF
	(#match, (h, t), env) : [
		SEND (k_head, #match, h, env) TO head_ptrn
		CREATE k_head WITH \env'.[
			CASE env' OF
			? : [ SEND ? TO cust ]
			_ : [ SEND (cust, #match, t, env') TO tail_ptrn ]
			END
		]
	]
	(#eq, right, env) : [
		SEND (cust, #pair, SELF, env) TO right
	]
	_ : [ SEND ? TO cust ]
	END
]

When pair_ptrn_beh receives an #eq message it sends a #pair message to the right pattern.

Pattern #pair

A #pair message tells a pattern that the other pattern is a pair pattern. The receiving pattern must decide how to handle matching the pair pattern. Once again, a constant pattern can simply turn the tables and provide a value back to the pair pattern.

LET const_ptrn_beh(value) = \(cust, req).[
	CASE req OF
	(#match, $value, env) : [ SEND env TO cust ]
	(#eq, right, env) : [
		SEND (cust, #match, value, env) TO right
	]
	(#bind, left, env) : [
		SEND (cust, #match, value, env) TO left
	]
	(#pair, pair, env) : [
		SEND (cust, #match, value, env) TO pair
	]
	_ : [ SEND ? TO cust ]
	END
]

An identifier or wildcard pattern provides an opportunity for binding, so it also delegates back to the pair pattern, but with a #bind message.

LET ident_ptrn_beh(ident) = \(cust, req).[
	CASE req OF
	(#match, value, env) : [
		CREATE env' WITH env_beh(ident, value, env)
		SEND env' TO cust
	]
	(#eq, right, env) : [
		SEND (cust, #bind, SELF, env) TO right
	]
	(#pair, pair, env) : [
		SEND (cust, #bind, SELF, env) TO pair
	]
	_ : [ SEND ? TO cust ]
	END
]
CREATE any_ptrn WITH \(cust, req).[
	CASE req OF
	(#match, _, env) : [ SEND env TO cust ]
	(#eq, right, env) : [
		SEND (cust, #bind, SELF, env) TO right
	]
	(#pair, pair, env) : [
		SEND (cust, #bind, SELF, env) TO pair
	]
	_ : [ SEND ? TO cust ]
	END
]

This brings us right back to the pair pattern, to handle the #bind message. The pair pattern must produce a value to bind to the identifier, so it asks each sub-pattern for a value and combines them into a pair.

LET pair_ptrn_beh(head_ptrn, tail_ptrn) = \(cust, req).[
	CASE req OF
	(#match, (h, t), env) : [
		SEND (k_head, #match, h, env) TO head_ptrn
		CREATE k_head WITH \env'.[
			CASE env' OF
			? : [ SEND ? TO cust ]
			_ : [ SEND (cust, #match, t, env') TO tail_ptrn ]
			END
		]
	]
	(#eq, right, env) : [
		SEND (cust, #pair, SELF, env) TO right
	]
	(#bind, left, env) : [
		CREATE call WITH call_pair_beh(head_ptrn, tail_ptrn)
		SEND (k_pair, #value, NIL, env) TO call
		CREATE k_pair WITH pair_bind_beh(cust, left, env)
	]
	_ : [ SEND ? TO cust ]
	END
]
LET pair_bind_beh(cust, left, env) = \msg.[
	CASE msg OF
	(?, _) : [ SEND ? TO cust ]
	(_, ?) : [ SEND ? TO cust ]
	(h, t) : [ SEND (cust, #match, (h, t), env) TO left ]
	_ : [ SEND ? TO cust ]
	END
]

We will re-use the “call” behavior from part 3 to send a #value request to each sub-pattern. The resulting pair of values is received by a dynamically created actor k_pair, whose behavior is defined by code_bind_beh. If either component of the pair-value is undefined, that means one of the sub-patterns could not produce a value, so the result is also undefined. If both sub-patterns produced values, the pair-value is sent to the left pattern to complete the binding (by #matching the pair-value).

Pattern #value

A #value message requests that a pattern produce a value. Only a constant pattern can produce a value. But, remember that a value pattern will BECOME a constant pattern with the value of its expression. So a value pattern produces a value indirectly.

LET const_ptrn_beh(value) = \(cust, req).[
	CASE req OF
	(#match, $value, env) : [ SEND env TO cust ]
	(#eq, right, env) : [
		SEND (cust, #match, value, env) TO right
	]
	(#bind, left, env) : [
		SEND (cust, #match, value, env) TO left
	]
	(#pair, pair, env) : [
		SEND (cust, #match, value, env) TO pair
	]
	(#value, _, env) : [
		SEND value TO cust
	]
	_ : [ SEND ? TO cust ]
	END
]

When const_ptrn_beh receives a #value message, it simply sends its value to the customer cust. Note that the middle argument in the #value message ignored. It is a place-holder used to keep the message structure consistent with the general form expected by value_ptrn_beh.

Pattern #both

We must deal with one final case. When there is a pair pattern on both sides of an equation, we want to recursively match the sub-patterns of each pair pattern, combining any bindings created by the sub-pattern matches. Once again, this requires extending the protocol of our pattern behaviors. This time we introduce a #both message, which indicates that the other pattern is a pair, and provides access directly to the sub-patterns.

LET pair_ptrn_beh(head_ptrn, tail_ptrn) = \(cust, req).[
	CASE req OF
	(#match, (h, t), env) : [
		SEND (k_head, #match, h, env) TO head_ptrn
		CREATE k_head WITH \env'.[
			CASE env' OF
			? : [ SEND ? TO cust ]
			_ : [ SEND (cust, #match, t, env') TO tail_ptrn ]
			END
		]
	]
	(#eq, right, env) : [
		SEND (cust, #pair, SELF, env) TO right
	]
	(#bind, left, env) : [
		CREATE call WITH call_pair_beh(head_ptrn, tail_ptrn)
		SEND (k_pair, #value, NIL, env) TO call
		CREATE k_pair WITH pair_bind_beh(cust, left, env)
	]
	(#pair, pair, env) : [
		SEND (cust, #both, (head_ptrn, tail_ptrn), env)
		TO pair
	]
	(#both, (h_ptrn, t_ptrn), env) : [
		SEND (k_head, #eq, h_ptrn, env) TO head_ptrn
		CREATE k_head WITH \env'.[
			CASE env' OF
			? : [ SEND ? TO cust ]
			_ : [
				SEND (cust, #eq, t_ptrn, env')
				TO tail_ptrn
			]
			END
		]
	]
	_ : [ SEND ? TO cust ]
	END
]

When pair_ptrn_beh receives a #pair message, it sends a #both message to the other pair pattern. The pair of patterns held by this pattern is sent as the argument.

When pair_ptrn_beh receives a #both message, it sends a new #eq message to unify head_ptrn with h_ptrn, the head of the other pattern. The customer of this message k_head receives the extended environment env’. If the extended environment is undefined, the head match fails, to undefined is send to the customer. Otherwise, an #eq message is sent to unify tail_ptrn with t_ptrn, the tail of the other pattern. The further extended environment is sent to the original customer cust.

As you can see, handling pair patterns add significant complexity to the matching algorithm. The combinations of cases greatly increases, leading to several extensions to the protocol of patterns. However, it should be noted that pairs allow the construction of arbitrarily complex structures. All of the other patterns are simply leaves on the tree. Pairs are the branches that give it shape, so some additional complexity is to be expected.

Application Equation

So far, we have considered the unification of pattern equations. Our extended grammar adds one more equation form, the application equation. We will often use this kind of equation to define function abstractions. Consider the equation in this LET expression:

LET dec(x) = sub(x, 1) IN ...

This equation asserts an equality between two applications. It says that the application dec(x) is equal to the application sub(x, 1). Normally when we see this kind of equation, we have some context that tells us sub is already known, thus dec is the thing being defined. Without this contextual knowledge, we are back in the variable aliasing situation we explicitly avoided before. As a practical compromise, we will adopt the convention of putting the application being defined on the left side of the equation and the defining expression on the right. This breaks the symmetry of being able to reverse the two sides of an equation, but it gives us a way to resolve the ambiguity in a sensible way.

We implement this kind of equation by a source-to-source transformation, rewriting the equation as follows:

<ident> '(' <ptrn> ')' '=' <expr>

is rewritten as

<ident> '=' '\' <ptrn> '.' <expr>

This creates a constant pattern on the right side of the equation, since a lambda abstraction is a constant. After the transformation we have a regular pattern equation which is handled by the behaviors we have already seen. In the case of our example equation, the rewritten code is:

LET dec = \x.(sub(x, 1)) IN ...

Summary

We have explored the implementation of pattern equations using actors. These are true equations which establish, rather than test, equality. These equations form the basis for IF and LET expressions, representing conditional and unconditional assertions of equality. Unification of equations required several extensions to the protocol of pattern-matching actors. The resulting message flow is an excellent example of distributing responsibilities among a cooperating group of actors. Each pattern performs matching operations based on it own private data. The message contents and the state of dynamically-created continuations accumulate the results. Finally, we added a bit of syntactic sugar in the form of a source-to-source transform rewriting equations that define applications.

This completes our pure functional expression language. We’ve taken a few non-traditional branches in exploring this solution space. We used pattern matching to generalize parameter substitution in part 2. We used pairs (instead of Currying) to generalize multi-valued arguments in part 3. And we used pattern equations to generalize conditional definitions in part 4. In part 5 we will explore the implications of supporting recursive references.

References

[1]
H. Baker. Equal Rights for Functional Objects or, The More Things Change, The More They Are the Same. http://home.pipeline.com/~hbaker1/ObjectIdentity.html, 1992.


Tags: , , , , , , , , , ,
This entry was posted on Sunday, October 10th, 2010 at 10:29 pm and is filed under Uncategorized. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

6 Responses to “Evaluating Expressions, part 4 – Pattern Equations”

  1. Evaluating Expressions, part 2 – Conditional Special Form

    […] building block introducing pattern matching.  We will have more to say about pattern matching in part 4.  This simple case evaluates an expression and then attempts to match the resulting value against […]

  2. Robert Wilson

    thanks for the post

  3. livelybrowsers

    Thanks for good stuff

  4. roclafamilia

    Helpful blog, bookmarked the website with hopes to read more!

  5. Evaluating Expressions, part 5 – Recursion

    […] part 4 we introduced the LET/IN expression. This allowed us to extend our lexical environment with new […]

  6. Evaluating Expressions, part 3 – Pairs and Parallelism

    […] part 4 we will extend our pattern matching behaviors to support pattern equations. These are true […]

Leave a Reply

Your comment