Finger Tree: A Functional Value Object

A Finger Tree is a data-structure that supports amortized O(1) additional and removal of elements from either end [1]. It also can support a large number of common sequence operations, including concatenation, very efficiently. Our implementation is based on the Hinze-Paterson structure [2], simplified for use as a Deque. It is possible to implement a Finger Tree in Humus using pure functions, like most traditional implementations. However, the implementation shown here uses an object-oriented approach instead.

This will be the first blog post that doesn’t use actors in the implementation. Since we are working with value objects, there is no mutable state, thus no need for actors. However, Finger Trees do provide a safe immutable value type which can be passed in messages and safely shared among actors, so they are quite useful in an actor context. On another level, as previously discussed, actors can be used to evaluate arbitrarily complex/recursive functions (like those used to implement Finger Trees) without blocking the rest of the system.

Value Objects

An object combines data with methods that operate on that data. Generally, objects have mutable state, which may be changed by methods of the object. A value object is an immutable object. Object methods may access the object state, but do not change it. Object methods are associated with each instance, so the same method name may select a different implementation on a different object. This is known as polymorphism. Humus does not have any special syntax for object methods calls. Instead, we make method dispatch explicit by defining a function that makes a method call on our value-objects.

LET vo_call(self, req) = (
	LET (vo, _) = $self
	IN vo(self, req)
)

Our value-object representation is a tuple with two or more elements, where the first element is a function that dispatches object methods. Our objects are polymorphic, but they are not encapsulated. Their state is not protected from direct access. However, this will suffice for our purposes.

The vo_call function extracts the dispatch function vo from the object self. This function is applied to the object and the original request req. The result of this function call is the value returned.

Fingers

Before we can build a Finger Tree, we need to define the Fingers which represent prefix and suffix segments of the tree. There are four types of fingers, storing one, two, three or four elements respectively. Let’s consider the dispatch function for a one-finger value object.

LET 1F = \(self, req).(
	LET ($1F, a) = $self IN
	CASE req OF
	(#put, x) : (2F, a, x)
	(#push, x) : (2F, x, a)
	END
)

First, the dispatch function ensures that the object passed as self has the correct structure for a one-finger. Specifically, a tuple starting with the dispatch function 1F and holding a data element. If the structure is correct, then the request req is matched with the supported operations, #put and #push. The result of #put is a new two-finger holding the original element a and the new element x. Similarly, #push creates a two-finger, but with the new element before the original.

Fingers can not be empty, so the one-finger does not support any operations which remove elements. The two-finger, on the other hand, supports the full Deque protocol.

LET 2F = \(self, req).(
	LET ($2F, a, b) = $self IN
	CASE req OF
	(#put, x) : (3F, a, b, x)
	(#pop) : (a, (1F, b))
	(#push, x) : (3F, x, a, b)
	(#pull) : (b, (1F, a))
	END
)

The #put and #push operations create a new three-finger, with the new element at the end/beginning respectively. The #pop and #pull operations return the first/last element respectively, and a new one-finger holding the one element remaining.

The three-finger and four-finger object types follow the same pattern. Note that the four-finger does not support operations which add elements, because it is largest finger type.

LET 3F = \(self, req).(
	LET ($3F, a, b, c) = $self IN
	CASE req OF
	(#put, x) : (4F, a, b, c, x)
	(#pop) : (a, (2F, b, c))
	(#push, x) : (4F, x, a, b, c)
	(#pull) : (c, (2F, a, b))
	END
)

LET 4F = \(self, req).(
	LET ($4F, a, b, c, d) = $self IN
	CASE req OF
	(#pop) : (a, (3F, b, c, d))
	(#pull) : (d, (3F, a, b, c))
	END
)

Finger Trees

There are three types of Finger Trees, empty, singleton and full. Empty is, of course, the simplest case.

LET 0T = \(self, req).(
	CASE req OF
	(#put, x) : (ST, x)
	(#push, x) : (ST, x)
	END
)
LET EMPTY = $(0T, NIL)

The empty tree has no data and, since it is immutable, a single EMPTY instance can be shared by all. Figure 1 shows a Finger Tree initialized with a reference to EMPTY.

Figure 1 - Empty Tree

The empty tree, naturally, does not support removal of elements. Addition of an element creates a new singleton tree which holds the element. Figure 2 shows the singleton tree holding the symbol A after a #put.

Figure 2 - Singleton Tree

A singleton tree stores a single element and supports the full Deque protocol.

LET ST = \(self, req).(
	LET ($ST, v) = $self IN
	CASE req OF
	(#put, x) : (FT, (1F, v), (EMPTY), (1F, x))
	(#pop) : (v, (EMPTY))
	(#push, x) : (FT, (1F, x), (EMPTY), (1F, v))
	(#pull) : (v, (EMPTY))
	END
)

Unlike a one-finger object, elements can be removed from a singleton tree, resulting in the empty tree. Also, when an element is added, a full tree is created instead of a two-finger. Figure 3 shows the full tree holding the symbols A and B after a #put on the singleton tree from Figure 2.

Figure 3 - A-B Finger Tree

A full tree consists of a prefix-finger, a suffix-finger and a middle tree. Figure 3 shows a minimal full tree, with a one-finger prefix, a one-finger suffix, and an empty middle tree. The full-tree implementation is a little more complex, so we’ll start by considering only the #put operation, which returns the Finger Tree that results from adding the an element at the end of the sequence.

LET FT = \(self, req).(
	LET ($FT, p, q, r) = $self IN
	CASE req OF
	(#put, x) : (
		IF $r = ($4F, a, b, c, d) (
			LET q' = $vo_call(q, #put, (3F, a, b, c))
			IN (FT, p, q', (2F, d, x))
		) ELSE (
			(FT, p, q, vo_call(r, #put, x))
		)
	)
	...
	END
)

The #put operation has to handle two cases based on whether the suffix-finger r is full or not. If the suffix-finger has less than four elements, then there is room to put another element into the finger. This is done by calling #put on the finger. Figure 4 shows the result of doing this three times, resulting in a full four-finger suffix.

Figure 4 - A-E Finger Tree

When the suffix-finger is full, and we #put another element, we gather the first three elements of the suffix into a three-finger and #put it into the middle tree, and a new two-finger suffix is created from the fourth element and the new element. Since the middle tree is currently empty in this case, it responds to the #put by creating a singleton tree, as shown in Figure 5.

Figure 5 - A-F Finger Tree

After #put-ing two more elements, the suffix-finger is full again, as shown in Figure 6.

Figure 6 - A-H Finger Tree

Figure 7 shows the restructuring that results from #put-ing the ninth element into the tree. Notice that the elements managed at the second-level of depth in the tree are fingers, not just data elements. We saw this first in Figure 6, where a singleton referred to a three-finger value. Now, when the next three-finger propagates into the middle tree, the singleton is replaced by a full tree. However, in trees at this level, the elements are themselves fingers, so we have a one-finger prefix and a one-finger suffix, both referring to three-finger values.

Figure 7 - A-I Finger Tree

Two more elements #put again re-fill the top-level suffix-finger, as shown in Figure 8.

Figure 8 - A-K Finger Tree

Element twelve once again propagates a three-finger into the middle tree, but this time it simply makes a two-finger suffix at the next level in the tree, as shown in Figure 9.

Figure 9 - A-L Finger Tree

It takes twenty elements #put before two levels of suffix-fingers are filled, as shown in Figure 10. Notice how little work is done to continue expanding the sequence held by the tree. Two-thirds of the elements #put are handled directly by the top-level suffix-finger. One out of every three elements #put causes a three-finger to propagate into the middle tree. Each level of the tree manages larger and larger sub-structures, further reducing the work required to keep the tree properly organized.

Figure 10 - A-T Finger Tree

Figure 11 shows how this pattern continues to the third level of depth, with the #put of element twenty-one. When the second-level suffix is full, a three-finger is created as before, but this time the elements are themselves three-fingers. The previously empty second-level middle tree is replaced by a singleton tree holding this compound finger value.

Figure 11 - A-U Finger Tree

We have shown how the Finger Tree structure can be built up by incremental addition of elements. Now let’s consider element removal. Specifically, we will focus on the #pop operation, which returns a tuple consisting of the item previously at the beginning of the sequence, and the new Finger Tree resulting from the removal.

LET FT = \(self, req).(
	LET ($FT, p, q, r) = $self IN
	CASE req OF
	...
	(#pop) : (
		IF $p = ($1F, a) (
			IF $q = ($EMPTY) (
				IF $r = ($1F, x) (
					(a, (ST, x))
				) ELSE (
					LET (x, r') = $vo_call(r, #pop)
					IN (a, (FT, (1F, x), q, r'))
				)
			) ELSE (
				LET (p', q') = $vo_call(q, #pop)
				IN (a, (FT, p', q', r))
			)
		) ELSE (
			LET (x, p') = $vo_call(p, #pop)
			IN (x, (FT, p', q, r))
		)
	)
	...
	END
)

There are four cases to consider. If the tree is minimal (one-finger prefix, empty middle, one-finger suffix), then #pop produces a singleton tree holding the remaining element.

If there is a one-finger prefix, an empty middle, and more than one element in the suffix, then one element is #pop-ed from the suffix and used to create a new one-finger prefix.

If there is a one-finger prefix, and the middle is not empty, then one element is #pop-ed from the middle and becomes the new prefix.

Finally, if there is more than one element in the prefix, then one element is #pop-ed from the prefix and the prefix is updated. This is the least amount of work, and is the case that accounts for two-thirds of the #pop operations.

Figure 12 - B-U Finger Tree after "pop"

Figure 12 shows the result of a #pop operation on the Finger Tree from Figure 11. Notice how only a few internal nodes had to change to accomplish a relatively deep restructuring. As was true with element addition, element removal usually requires very few operations, with restructuring required more and more rarely as you go deeper into the tree.

The #push and #pull operations, which round out the rest of the Deque protocol, are mirror images of #put and #pop. If you’ve understood the code so far, the rest should be straight-forward.

LET FT = \(self, req).(
	LET ($FT, p, q, r) = $self IN
	CASE req OF
	...
	(#push, x) : (
		IF $p = ($4F, a, b, c, d) (
			LET q' = $vo_call(q, #push, (3F, b, c, d))
			IN (FT, (2F, x, a), q', r)
		) ELSE (
			(FT, vo_call(p, #push, x), q, r)
		)
	)
	(#pull) : (
		IF $r = ($1F, a) (
			IF $q = ($EMPTY) (
				IF $p = ($1F, x) (
					(a, (ST, x))
				) ELSE (
					LET (x, p') = $vo_call(p, #pull)
					IN (a, (FT, p', q, (1F, x)))
				)
			) ELSE (
				LET (r', q') = $vo_call(q, #pull)
				IN (a, (FT, p, q', r'))
			)
		) ELSE (
			LET (x, r') = $vo_call(r, #pull)
			IN (x, (FT, p, q, r'))
		)
	)
	END
)

The last three code fragments should be combined into a single FT definition, giving you the dispatch function for a full Finger Tree value-object.

Conclusion

We have shown how to implement a Finger Tree using a transparent representation of functional value-objects. This versatile and efficient immutable data-structure can be safely shared among collaborating actors. Operations on the tree produce a new tree that shares as much structure and data as possible with the old tree, without disturbing the old structure. This allows effortless versioning, since all previous versions remain available as long as there are references to them. Unreferenced versions are automatically reclaimed by garbage-collection.

References

[1]
C. Okasaki. Purely Functional Data Structures. Cambridge University Press, 1998.
[2]
R. Hinze, R. Paterson. Finger Trees: A Simple General-purpose Data Structure. Journal of Functional Programming, 16(2):197–217, 2006.
This entry was posted in Uncategorized and tagged , , , , , , , . Bookmark the permalink. Post a comment or leave a trackback: Trackback URL.

8 Comments

  1. Durgesh
    Posted 2011-10-15 at 9:35 am | Permalink

    This article helped me a lot about the concept of Finger Trees…
    Thanx a lot!

  2. Dale Schumacher
    Posted 2011-10-15 at 10:32 am | Permalink

    I’m glad it was helpful to you. Other explanations that I found didn’t build out an example far enough to make the nested type-structure clear, at least to me, so I wanted to take this example further.

  3. Durgesh
    Posted 2011-10-16 at 10:48 pm | Permalink

    Please give some examples of ‘#push’ operation.

  4. Dale Schumacher
    Posted 2011-10-18 at 10:58 am | Permalink

    I’m not sure how clear this will be without pictures, but I’ll give it a try. Consider a finger-tree built up from 8 #push operations. It would look like this:

    (FT, (4F, 8, 7, 6, 5), (ST, (3F, 4, 3, 2)), (1F, 1))

    If I (#push, 9) then the 4-finger prefix overflows, so the prefix becomes a 2-finger, and a 3-finger is pushed into the singleton-tree in the middle. This makes the middle a full finger-tree with a 1-finger prefix and a 1-finger suffix (each containing a 3-finger), like this:

    (FT, (2F, 9, 8), (FT, (1F, (3F, 7, 6, 5)), EMPTY, (1F, (3F, 4, 3, 2))), (1F, 1))

    Basically, the #push operation is a mirror image of #put, adding elements to the beginning of the prefix rather than the end of the suffix.

  5. Durgesh
    Posted 2011-10-19 at 12:29 am | Permalink

    Thanx for giving illustration for ‘#push’ operation…

    For the 4 cases of ‘#pop’ I can’t understand difference between ‘element’ & ‘finger’.
    Please provide some help to #pop ‘D’
    (after #pop-ed A,B, & C)

  6. Durgesh
    Posted 2011-10-19 at 12:33 am | Permalink

    Can u tell me which programming language u’ve used to illustrate these examples…?

  7. Dale Schumacher
    Posted 2011-10-19 at 8:04 am | Permalink

    A finger holds between 1 and 4 elements. At the top level elements are just the data you provide. At the second level, the elements are fingers of elements. At the third level, they are fingers of fingers of elements, and so on.

    If we start with the tree from Figure 12 and pop “B” and “C”, we have:

    (FT, (1F, #D), (FT, (3F, (3F, #E, #F, #G), (3F, #H, #I, #J), (3F, #K, #L, #M)), EMPTY, (2F, (3F, #N, #O, #P), (3F, #Q, #R, #S))), (2F, #T, #U))

    When we #pop “D”, we are in case three, where there is a one-finger prefix, and the middle is not empty. So one element is #pop-ed from the second level and becomes the new top-level prefix. Since, at the second level, “elements” are “fingers”, the element #pop-ed is the 3-finger (3F, #E, #F, #G). The resulting finger-tree is:

    (FT, (3F, #E, #F, #G), (FT, (2F, (3F, #H, #I, #J), (3F, #K, #L, #M)), EMPTY, (2F, (3F, #N, #O, #P), (3F, #Q, #R, #S))), (2F, #T, #U))

    The implementation language (mentioned in the first paragraph) is Humus.

  8. Durgesh
    Posted 2011-10-19 at 9:54 am | Permalink

    Thank u Sir!
    It’s your kind help to me to illustrate these concepts.
    I’m Assistant Professor in an Engineering College in Varanasi (India).
    I want to know about u, if u don’t mind…
    Once again thanx a lot !!!

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*