Algebraic datatypes in Java

Being in bed with a little fever, I thought it might be a good time for a little easy work on functional programming in Java.

Haskell envy

One of the things functional programmers often use is algebraic datatypes. For instance, in Haskell you would define a binary tree as such:

type tree = Empty
          | Leaf of int
          | Node of tree * tree
  

This says that a tree is either the Empty tree, or a Leaf that carries an integer value, or a Node that carries two subtrees.

The next thing is to define functions, such as tree height; the nice thing is that the functions are then defined by cases. It’s kind of fun:

height :: Tree -> Int
height Empty = 0
height (Leaf n) = 1
height (Node l r) = 1 + max (height l) (height r)    
  

This says that an Empty tree has height 0, a Leaf has height 1, and a Node has height that is the maximum height of its subtrees, plus 1. Now we could have fun defining all sorts of other functions, like the contour (the list of the integers of all leafs), search for a particular value, addition of elements, and so on.

Additionally, when working in languages like Haskell, the compiler makes sure that when we define a function by cases, the program will not compile unless we remembered to define all cases.

This sort of data structures have all sort of applications; an obvious one is representing syntax trees, that is the output of a parser. Another one is representing the syntax rules themselves.

We can do it!

What’s to do if you have to work in a simpler language like Java? Do we have to give up the fun of algebraic data types? Of course not! It’s possible to define such structures in Java, even though it will take one or two orders of magnitude more lines of code.

Step zero: define the structure

public abstract class Tree {}

public class EmptyTree extends Tree {}

public class Leaf extends Tree {
    public Leaf(int value) {
    }
}

public class Node extends Tree {
    public Node(Tree left, Tree right) {
    }
}    

Now we can create our trees with the following deliciously clumsy syntax:

Tree tree = new Node(
    new Node(new Leaf(0), new EmptyTree()), 
    new EmptyTree());    

We could make it slightly less verbose with static functions.

Step one: define height

The easy way to add a function is to define it abstractly in Tree. The compiler then forces us to define it in all Tree subclasses (in all cases). First a test:

@Test
public void heightOfSimpleTree() throws Exception {     
    Tree tree = new Node(
        new Node(new Leaf(0), new EmptyTree()),
        new EmptyTree());
    assertEquals(3, tree.height());
}

Then the implementation:

public abstract class Tree {
    public abstract int height();
}


public class EmptyTree extends Tree {
    public int height() {
        return 0;
    }
}    


public class Leaf extends Tree {
    public Leaf(int value) {
    }

    public int height() {
        return 1;
    }
}


public class Node extends Tree {
    private final Tree left;
    private final Tree right;

    public Node(Tree left, Tree right) {
        this.left = left;
        this.right = right;
    }

    public int height() {
        return 1+ Math.max(left.height(), right.height());
    }
}

(I told you this was going to be verbose. You were warned :-)

Step two: respect the OCP, son!

Now we could go on defining all kinds of functions on trees this way; but this would mean that for every new function, we need to change all of our classes. That won’t do; we don’t want to keep changing and adding to the same old files. We want to close the Tree classes for modification, but keep them open for extension; this is the open/closed principle. What can we do?

A pattern to the rescue. Have you ever wondered where it’s reasonable to use Visitor? Well, this is it. As usual, we start with a test:

@Test
public void heightWithVisitor() throws Exception {
    Tree tree = new Node(
        new Node(new Leaf(0), new EmptyTree()),
        new EmptyTree());
    Visitor height = new Height();
    assertEquals(3, tree.accept(height));
}

The implementation of visitor can be:

public abstract class Tree {
    public abstract int accept(Visitor visitor);
}

public class EmptyTree extends Tree {
    public int accept(Visitor visitor) {
        return visitor.visitEmptyTree();
    }
}

public class Leaf extends Tree {
    private final int value;

    public Leaf(int value) {
        this.value = value;
    }

    public int accept(Visitor visitor) {
        return visitor.visitLeaf(value);
    }
}

public class Node extends Tree {
    private final Tree left;
    private final Tree right;

    public Node(Tree left, Tree right) {
        this.left = left;
        this.right = right;
    }

    public int accept(Visitor visitor) {
        return visitor.visitNode(left, right);
    }
}

Note that each tree subcase provides the visitor with all the information they contain. This way we avoid getters, which is nice. The definition of Height is very similar to the Haskell example:

public class Height implements Visitor {    
    public int visitEmptyTree() {
        return 0;
    }

    public int visitLeaf(int value) {
        return 1;
    }

    public int visitNode(Tree left, Tree right) {
        return 1 + Math.max(left.accept(this), right.accept(this));
    }
}

The nice thing is that now the Height function lives in its own little object. We were able to decouple it from the Tree classes.

We still can’t forget to define a case, as the compiler will force us to implement all three methods of Visitor.

Step three: generics

Of course, the big problem with this Visitor implementation is that it only works for functions that return integers. We can solve this with Java generics:

public interface Visitor<T> {
    T visitEmptyTree();
    T visitLeaf(int value);
    T visitNode(Tree left, Tree right);
}

public abstract class Tree {
    public abstract <T> T accept(Visitor<T> visitor);
}

public class EmptyTree extends Tree {
    public <T> T accept(Visitor<T> visitor) {
        return visitor.visitEmptyTree();
    }
}
// etc.

The other way to solve the genericity problem would be to define the Height visitor as a stateful object, that has a int result() method that is not part of the Visitor interface. But we’ll leave that to some other time.

A delightful little book on this very subject is A Little Java, A Few Patterns by Matthias Felleisen and Daniel P. Friedman. In fact I think I learned the meaning of Visitor by reading this book! The authors wrote a number of books on Lisp and Scheme in a similar vein; the first one was The Little Lisper and it takes a very unusual approach to teaching.

Hope it comes in handy!

12 Responses to “Algebraic datatypes in Java”

  1. Giorgio Sironi Says:

    Finally a good implementation of Visitor, thanks.
    Of course the trade-off is between closing against new operations over a set of classes (Visitor) vs. closing against the introduction of new classes like a TernaryNode or NAryNode (method in the interface). How would you decide between the two?

  2. Erik Says:

    // Scala version
    abstract sealed class Tree
    case object Empty extends Tree
    case class Leaf(value: Int) extends Tree
    case class Node(left: Tree, right: Tree) extends Tree

    val mytree = Node(Node(Leaf(1), Leaf(2)), Leaf(3))

    def height(tree: Tree): Int = tree match {
    case Empty => 0
    case Leaf(n) =>1
    case Node(l, r) => 1 + math.max(height(l), height(r))
    }

  3. matteo Says:

    @Erik: yes, Scala implements algebraic datatypes directly, at the cost of defining “case classes” as a different kind of class, together with particular ways to construct terms and functions. But as the Java example shows, you could implement them with ordinary classes.

    This is what I dislike about Scala: they want to have every language feature, and they add new, different concepts to the language to support them. What ever happened to the good old principle of defining a language around few concepts applied uniformly?

    The attitude that simplicity is not important is what makes working with Java Enterprise (and most Java frameworks) a pain. This is what makes C++ and Perl impractical engineering tools (because they require inordinate amounts of training to learn to use well.)

    I’m afraid Scala joins C++ and Perl in the class of languages where every developer only knows and uses a subset of the language, and no two developers use the same subset. This would make it a poor engineering tool.

  4. Erik Says:

    Case classes aren’t a ‘different kind’ of class; they are simply classes that will give you some stuff for free, like an extractor (for matching), a builder (so you won’t have to call new), #equals, etc. These are all implementable in standard Scala though, so there’s nothing ‘different’ about case classes. They’re ‘just’ syntactic sugar in that sense, but you can see for yourself what an enormous difference it makes.

    Scala is precisely about having a small language (e.g. no special constructs for enums, static methods, etc.) and applying OO/FP principles uniformly (e.g. the collections library). I don’t mean this in a pedantic way, but from your comment I suspect you might have looked at Scala somewhat superficially. (Sorry if this makes me sound like a douche.)

    I could imagine how it might make sense to agree on a subset of Scala, but that may depend a lot on your problem domain. Some problems are more amenable to a FP approach, some are more suitable for OO.

    It also puzzles me that someone would call Java ‘simple’. I suppose it depends on your definition of simple, and it may be down to taste as well, but Java certainly isn’t simple in any way that is meaningful to me.

    Anyway, thanks for sharing your thoughts!

  5. matteo Says:

    Erik,

    thank you for your sensible response. It’s true that I looked at Scala very superficially. It might well be that you’re right and I’m wrong. I will pospone judgment to when/if I get to work with Scala.

    I will answer about Java being simple: why, yes, it looks to me it’s a comparatively simple language. It’s much simpler than C++, and much safer than C. There are dark corners that most programmers don’t know, but the subset of what is well known is mostly the same for most people. I think it’s simple enough for the purposes of application programmers, which are the primary audience for this language.

  6. Erik Says:

    Hi Matteo, if you ever get round to working with Scala, I’d be interested to know how you fare. If you’re an XP aficionado as well as a sufferer from Haskell envy, you may well be chuffed to bits with it. Anyway, get well soon, and sorry about hijacking your comments. ;)

  7. Andrea Mariottini Says:

    Hi Matteo, with my team I came to the same solution in a situation where we have to turn “in memory filters” into database queries (in our case Hibernate queries). In this case the visitor is a generic and the getResult method returns a Query object but it could be implemented to return an sql string. This leave us the possibility to change Hibernate with Jdbc or whatever else, if needed.
    The choise seemed very natural and produced elegant code, but thinking to OCP the situation is reverted with respect to your example: it is more likely we need to add new filters rather than add new database implementations, so we’ll have to modify the visitor code violating OCP.
    However I prefer to not depend on the database layer so I choosed to close the code with respect to the database layer and let it open with respect to the need of new filters.
    I would like to have your opinion on this.

  8. matteo Says:

    @Andrea: if I understand correctly, what you have looks more like a builder than a visitor. With “builder” the intent is to map an abstract tree to something else, in a way that lets you change how you define the “something else”.

  9. Andrea Mariottini Says:

    So we wrote a builder implemented as a visitor!

    Off topic: I noted all posts of your blog have minutes = hour!

  10. matteo Says:

    Andrea: fixed, thanks! :-)

  11. Cláudia Says:

    Maybe you should take a look at the Tom language:
    http://tom.loria.fr
    Tom is embedded in Java and offers constructs for FP in OO. In Tom, we have two ways to define your ADT:
    1 – Define the Java classes you want to use and handwrite a mapping from an ADT to it by using constructs provided by Tom;
    2 – Write a BNF grammar of your ADT and use the Gom tool to generate both java classes and mapping.
    After that, we can apply pattern matching over the algebraic terms inside java methods. For instance, your example in Tom+Gom would be:

    import visitor.visitor.types.*;
    public class Visitor {
    %gom {
    module Visitor
    imports int
    abstract syntax
    Tree = EmptyTree()
    | Leaf(value:int)
    | Node(left: Tree, right: Tree)
    }

    public static void main(String[] args) {
    Visitor v = new Visitor();
    Tree tree = `Node(Node(Leaf(0),EmptyTree()),EmptyTree());
    int result = v.height(tree);
    System.out.println(result);
    }

    public int height(Tree t) {
    %match(t) {
    EmptyTree() -> { return 0; }
    Leaf(x) -> { return 1; }
    Node(l,r) -> { return 1 + Math.max(height(`l), height(`r)); }
    }
    return -1;
    }
    }

    Well, I don’t know if it’s interesting for you (yes, I saw your post is a little old now). Anyway, here are my five cents. =)

  12. Victor Nazarov Says:

    Here is my library [adt4j](https://github.com/sviperll/adt4j) that allows you to use visitor pattern like presented in the above examples, but spend yourself from writing an excess of boilerplate code required by Java language. Adt4j generates “sealed” subclasses with many convinience methods.

Leave a Reply