Tuesday, November 18, 2008

Axis and Xerces can cause Perm problems

Away from the Groovy for a bit and back to my regular Java day job.

We have a Web Service that can occasionally return a large data structure. The server end runs inside a Tomcat and is served by Axis 1.4. Our client runs inside Tomcat in JBoss and also uses Axis 1.4 for its client code.

Recently, we experienced a situation that after the client called the web service to receive a large object a few times, we had a Out Of Memory (OOM) in the Perm of our JBoss (which as a reminder was the client side).

Since, we of course had the  crucial JVM flag (-XX:+HeapDumpOnOutOfMemoryError) on, so we got a memory dump to analyze.

Generally, analyzing perm issues is very difficult, but using Memory Analyzer Tool (did I mention how great that tool is?) we did find a suspiciously large object:

symboltable_full

You can see here that there was an instance of

org.apache.xerces.util.SymbolTable

that took up close to 225 MB.

A little drill down into the object showed that most of the entries in the SymbolTable (well since there were more then a million entries I can’t say that I really checked most of the entries but most of the entries I did check) were related to Soap. You can see one entry here:

symboltable_entry_zoom

You can see that the value in the entry is ns921375:severity which is a Soap wrapper type tag.

So, that object definitely seemed related to our OOM. The only problem was tying it to the Perm. Strings are suspects for perm problems, since in Sun’s JVMs interned strings get stored in the Perm space. The jmap command can give info on perm space used by a running jvm but unfortunately that functionality is only working on the Solaris version of the jvm. So I needed another way to try to tie the object to the Perm space and as the say: Use the Source, Luke

So, from looking at the Xerces code, what I learned is that Xerces uses the SymbolTable as a mechanism to reuse Strings. When the parser gets a new String it can pass it into the SymbolTable to get the canonical instance.

This makes sense in an XML Parser. A lot of tags will probably appear multiple times in an XML file and so a lot of memory can be saved if the parser can reuse the String objects. Also, (and possibly more importantly)  if the same String is continuously reused, then the parser can rely on the much faster == for comparing objects rather then calling equals.

The use of the term canonical instance was on purpose. The javadoc for String describes the intern function as:

public String intern()

Returns a canonical representation for the string object.

A pool of strings, initially empty, is maintained privately by the class String.



When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.




So, calling intern on a String basically does what I described the SymbolTable does.



The Xerces code in fact refers to this and says that:


The symbol table performs the same task as String.intern() with the following differences:


  • A new string object does not need to be created in order to retrieve a unique reference. Symbols can be added by using a series of characters in a character array.


  • Users of the symbol table can provide their own symbol hashing implementation. For example, a simple string hashing algorithm may fail to produce a balanced set of hashcodes for symbols that are mostly unique. Strings with similar leading characters are especially prone to this poor hashing behavior.



All this sounds good. The exact details of the reasons are less important to me but the point is someone there thought about intern and decided that it wasn’t good enough.



But a little digging into the Xerces source code turned up the following tidbit in the code for SymbolTable. SymbolTable has an inner class of type Entry which is where the Strings are put into to be stored in the SymbolTable. Entry’s constructor looks like this:



public Entry(String symbol, Entry next) {
this.symbol = symbol.intern();
characters = new char[symbol.length()];
symbol.getChars(0, characters.length, characters, 0);
this.next = next;
}


What we see here is that every Entry that gets created calls intern on the String as well.  After building a uniqueness mechanism in SymbolTable, they still relied on intern and as I mentioned before, intern Strings are stored in the perm space.



So, the bottom line here is that as large XML files with lots of unique Strings are parsed, the SymbolTable will slowly fill up your perm space as well. 



Soap envelopes are full of tags like ns921375:severity which are going to be unique and so will fill up your perm.



Final Note: We changed our soap function to return the object as an attachment and it seemed to solve the problem. 

2 comments:

Moti Karmona said...

There is nothing like a good out-of-memory-triage-war-story :)

yt said...

As long as the author hasn't pulled out all his hair out getting to the end.
Well actually that might make it a better story.