Sunday, October 26, 2008

Sorting in Groovy vs. Java

When you switch from Java to Groovy, one of the things you notice right away is how Closures affect things like sorting. As an example, let's pretend we have Things:

public class Thing {
public int i1;
public String val;
public Thing(String s, int i) {
val = s;
i1 = i;
}
// for easy list printing
public String toString() {
return val +"-"+i1;
}
}

(Let's ignore the public members and all; this is just to make the code shorter.)

Now let's make a list of Things in Java

List<Thing> l = new ArrayList<Thing>();
l.add(new Thing("hello", 3));
l.add(new Thing("there", 1));
l.add(new Thing("apple", 5));
l.add(new Thing("orange", 10));

and in Groovy

l = [new Thing("hello", 3),
new Thing("there", 1),
new Thing("apple", 5),
new Thing("orange", 10)];

The default way to sort in Java is fairly straightforward, though of course it doesn’t work here:

Collections.sort(l);

This won’t even compile, since our Thing doesn’t implement Comparable. Groovy actually executes the equivalent call with no problem (though the order it sorts in is anyone’s guess). Running this snippet works:

println(l)
l.sort()
println "After sort:"
println(l)

and prints the following result:

[hello-3, there-1, apple-5, orange-10]
After sort:
[apple-5, orange-10, hello-3, there-1]

But as you can see, there is no way of knowing the sort order. That is where implementing Comparable kicks in. I won’t go into that, since it is basic Java, and if we defined Thing as Comparable it would work as expected in Groovy as well.
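For completeness, here is a minimal sketch of what "defining Thing as Comparable" could look like. The class name ComparableThing and the choice to order by the int field are assumptions made for illustration, not part of the original Thing; Integer.compare needs Java 7 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ComparableThingDemo {
    // Hypothetical variant of Thing that defines its own natural ordering.
    static class ComparableThing implements Comparable<ComparableThing> {
        final int i1;
        final String val;

        ComparableThing(String s, int i) {
            val = s;
            i1 = i;
        }

        // Natural ordering (an assumption for this sketch): ascending by the int field.
        @Override
        public int compareTo(ComparableThing other) {
            return Integer.compare(i1, other.i1);
        }

        // for easy list printing
        @Override
        public String toString() {
            return val + "-" + i1;
        }
    }

    // Builds the same four items used in the post and sorts them.
    static List<ComparableThing> sortedSample() {
        List<ComparableThing> l = new ArrayList<ComparableThing>();
        l.add(new ComparableThing("hello", 3));
        l.add(new ComparableThing("there", 1));
        l.add(new ComparableThing("apple", 5));
        l.add(new ComparableThing("orange", 10));
        Collections.sort(l); // compiles now that the element type is Comparable
        return l;
    }

    public static void main(String[] args) {
        System.out.println(sortedSample()); // [there-1, hello-3, apple-5, orange-10]
    }
}
```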
Java also allows us to pass a Comparator to the sort method, and this works without changing Thing:

System.out.println(l);
Collections.sort(l, new Comparator<Thing> () {
public int compare(Thing o1, Thing o2) {
return o1.i1 - o2.i1; // sort by the int value
}
});
System.out.println("After Sorting:");
System.out.println(l);

Which will yield:

[hello-3, there-1, apple-5, orange-10]
After Sorting:
[there-1, hello-3, apple-5, orange-10]

Groovy has a similar option:

println(l)
l.sort([compare:{a,b-> Math.abs(a.i1) - Math.abs(b.i1) } ] as Comparator)
println "After sort:"
println(l)

Note the syntax for defining the Comparator on the fly on the second line. We are creating a map whose compare entry holds a Closure and then converting it to a Comparator object. Very cool and powerful on the one hand, but on the other kind of wacky for someone just out of Java. (Perhaps a topic for another post.)

You can also do this with plain Closure syntax. (This is the way you would normally do it for examples like these; the Comparator object is better for cases where you need to define a Comparator and pass it around.) Here is the Closure way:

l.sort { o1, o2 ->
o1.i1 - o2.i1; // sort by the int value
}

So far we have been sorting by passing in a Comparator, and for that Groovy just offers alternative (better?) syntax over Java. Groovy does, however, offer another way to use the sort function. There is a version of sort that takes a much simpler Closure, one that doesn’t require us to remember exactly how Java’s Comparator works, i.e. which sign of return value sorts which way.

The GDK Docs describe the function as:

sort
public List sort(Closure closure)

Sorts this Collection using the given closure as a comparator. The closure is passed each item from the collection, and is assumed to return a comparable value (i.e. an int).
Parameters:
closure - a Closure used as a comparator.
Returns:
a newly created sorted List

So, you pass a Closure to sort that turns the Object you want into a Comparable object. In our example if we want to sort our Things based on their int values you can just say:

l.sort { it.i1 }

Or, for the String, just return it:

l.sort { it.val }

This works great for sorting the list in ascending order. Sorting in reverse order this way takes some maneuvering. For an int it is still fairly straightforward:

l.sort { -1 * it.i1 }

I couldn’t find a similarly neat trick for Strings, though, so you may need to use the Comparator or just call reverse on the returned list:

l.sort{ it.val }.reverse()
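For comparison, later Java versions (8 and up, so not available when this was written) gained a similar key-based style through Comparator.comparing and friends. The following is only a sketch of how the two Groovy one-liners above might look there; the class KeySortDemo and its method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class KeySortDemo {
    // Same shape as the Thing class from the post.
    static class Thing {
        public int i1;
        public String val;

        Thing(String s, int i) {
            val = s;
            i1 = i;
        }

        // for easy list printing
        @Override
        public String toString() {
            return val + "-" + i1;
        }
    }

    static List<Thing> sample() {
        List<Thing> l = new ArrayList<>();
        l.add(new Thing("hello", 3));
        l.add(new Thing("there", 1));
        l.add(new Thing("apple", 5));
        l.add(new Thing("orange", 10));
        return l;
    }

    // Rough counterpart of: l.sort { it.i1 }
    static List<Thing> byInt() {
        List<Thing> l = sample();
        l.sort(Comparator.comparingInt(t -> t.i1));
        return l;
    }

    // Rough counterpart of: l.sort { it.val }.reverse()
    static List<Thing> byValDescending() {
        List<Thing> l = sample();
        // The explicit (Thing t) is needed so the chained reversed() call type-checks.
        l.sort(Comparator.comparing((Thing t) -> t.val).reversed());
        return l;
    }

    public static void main(String[] args) {
        System.out.println(byInt());           // [there-1, hello-3, apple-5, orange-10]
        System.out.println(byValDescending()); // [there-1, orange-10, hello-3, apple-5]
    }
}
```

The reversed() call covers the String case without any sign tricks, since it flips whatever ordering the key produces.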

As you can see, Closures can really shorten the syntax and let you do more with less code.

One downside of the whole Closure thing is that you can’t always tell what kind of Closure a function expects to receive. If you use an editor with command completion, you can only see that a function wants a Closure; you have no way of knowing how many parameters will be passed to the Closure or what the function expects the Closure to return. In this case the Javadocs explained what is needed, but docs are sometimes harder to find or read than a method signature that makes it obvious. Part of this is a consequence of using a dynamic language and not just a Closure issue.

Another open technical question: I am not sure how Groovy distinguishes between a Closure acting as a Comparator and the simpler Closure that returns a Comparable value. My current guess is that the Closure version of sort checks how many parameters the passed-in Closure expects to receive and calls it accordingly, because straight method dispatching would send them both to the same function. I will have to check into that.
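To make that guess concrete, here is a small Java sketch of dispatching on parameter count. Everything in it (the Callback interface, key, pair, and this sort method) is hypothetical and written only to illustrate the idea; it is not Groovy's implementation. Groovy's Closure class does expose getMaximumNumberOfParameters(), which would make such a check possible, but whether sort relies on it is still the guess described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

class AritySortSketch {
    // Hypothetical stand-in for a Closure: it knows how many parameters it declares.
    interface Callback {
        int parameterCount();
        Object call(Object... args);
    }

    // Wraps a one-parameter function (a "sort key" callback).
    static Callback key(final Function<Object, Object> f) {
        return new Callback() {
            public int parameterCount() { return 1; }
            public Object call(Object... args) { return f.apply(args[0]); }
        };
    }

    // Wraps a two-parameter function (a Comparator-style callback).
    static Callback pair(final BiFunction<Object, Object, Integer> f) {
        return new Callback() {
            public int parameterCount() { return 2; }
            public Object call(Object... args) { return f.apply(args[0], args[1]); }
        };
    }

    // One entry point; the callback's parameter count decides how it is used.
    @SuppressWarnings("unchecked")
    static <T> List<T> sort(List<T> items, final Callback c) {
        List<T> copy = new ArrayList<T>(items);
        Comparator<T> cmp;
        if (c.parameterCount() == 1) {
            // One parameter: call it per item and compare the returned keys.
            cmp = (a, b) -> ((Comparable<Object>) c.call(a)).compareTo(c.call(b));
        } else {
            // Two parameters: treat the result as a Comparator-style int.
            cmp = (a, b) -> (Integer) c.call(a, b);
        }
        Collections.sort(copy, cmp);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "banana");
        // Key style: sort by length.
        System.out.println(sort(words, key(s -> ((String) s).length())));
        // Comparator style: reverse alphabetical.
        System.out.println(sort(words, pair((a, b) -> ((String) b).compareTo((String) a))));
    }
}
```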

Tuesday, October 7, 2008

Great tool to analyze a JVM memory dump

In the old days of Java, when you hit an OutOfMemoryError, you had to begin a very painful exercise of trying to determine its source. This was difficult, since you had to reproduce the problem and try to isolate its cause.

Now we have heap dumps. There are a number of ways to get a heap dump out of a JVM, but at a bare minimum you should have the Sun JVM's great flag on at all times:
-XX:+HeapDumpOnOutOfMemoryError

This causes the JVM to dump a snapshot of the current heap when an OutOfMemoryError occurs.
The flag causes no runtime overhead, since it only comes into play when the OOM happens: the JVM then checks whether it is set and, if so, dumps the heap.
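If you want to confirm from inside a running application that the flag is actually on, one possible check is to read it through the HotSpot diagnostic MXBean. This is a sketch that assumes a HotSpot-based JVM on Java 7 or later (the com.sun.management classes are not part of the standard API on every JVM); the class name HeapDumpFlagCheck is made up for the example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class HeapDumpFlagCheck {
    // Returns the current value of the flag as a string: "true" or "false".
    static String heapDumpOnOomSetting() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("HeapDumpOnOutOfMemoryError").getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = " + heapDumpOnOomSetting());
    }
}
```

Started with -XX:+HeapDumpOnOutOfMemoryError on the command line, this should report true; without the flag it should report false.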

There are a number of tools available to examine a heap dump.
The most basic comes with the JDK itself (starting with JDK 6): jhat.
Its usage is fairly straightforward (don't forget the -J parameter to enlarge the heap of jhat itself), but it's fairly basic.
Its UI is a basic HTML one.


It has a query language, which is useful. It is based on JavaScript for fancier queries, but I had a hard time with the syntax whenever I tried to do anything fancy. Its documentation is really lacking in that regard.
However, since jhat is open source it's relatively easy to write queries in Java and compile them into jhat.

Another option is a profiler like YourKit.
It can open hprof files and its UI is pretty good as well (definitely better than jhat's).
It's commercial, so that of course is a huge drawback. In addition, it is fairly slow at opening large dumps.

The best option is a relatively new tool, and it's the reason I decided to write this post: everyone needs to know about it. Actually, it has been around for a bit, but it has recently been added to Eclipse.
I am talking about Eclipse's Memory Analyzer Tool (MAT). It used to be known as SAP Memory Analyzer. The new version is like its SAP predecessor, but it seems slightly more polished and ready for prime time.
It opens dumps fairly quickly and uses less memory too. (It does create lots of index files on disk, but that's not a big deal.)

It also has a query language, and while it's similar to jhat's (they both call it OQL, though this one is not JavaScript-based), it just seems more straightforward to use. Plus, the built-in queries seem more useful and plentiful. I haven't tried adding queries to MAT, though.

You can download a 32-bit version, or a 64-bit version if you want to open larger heaps, here.