Tip of the day: analyzing extremely big heap dumps

Applications that run with multi-hundred-GB heaps are rare, but they do exist. However, most existing tools cannot analyze heap dumps bigger than ~150 GB. That’s because internally, those tools assign each object in the dump an integer id, with a maximum value of 2^31, or slightly more than 2 billion. Statistically, an average object in a typical heap dump takes 70-80 bytes, and 2^31 objects at that size add up to roughly 150 GB, hence the limitation. JXRay, however, can do significantly better: it can analyze dumps of up to 512 GB with any number of objects. How do we achieve that?
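
To see where that ~150 GB ceiling comes from, here is a back-of-the-envelope calculation in Java (the 75-byte figure is simply the midpoint of the 70-80 byte range above):

    public class HeapDumpLimit {
        public static void main(String[] args) {
            long maxObjects = 1L << 31;  // 2^31: the largest id a signed 32-bit int can hold
            long avgObjectSize = 75;     // bytes: midpoint of the typical 70-80 byte range
            long maxDumpBytes = maxObjects * avgObjectSize;
            // 2^31 ids * 75 bytes/object is about 161 * 10^9 bytes, i.e. ~150 GiB:
            // the practical ceiling for tools with a single signed 32-bit id space.
            System.out.printf("Max analyzable dump: ~%d GiB%n", maxDumpBytes >> 30);
        }
    }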

First, the maximum number of objects that JXRay can index is 2^32, or about 4 billion. More precisely, JXRay indexes ordinary instances and arrays separately, so it can handle up to 2^31 ordinary objects plus up to 2^31 arrays.
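
One way to picture this split (a minimal sketch of the idea, not JXRay’s actual internals) is to run two independent 31-bit id counters and tag each id with the kind of object it refers to:

    // Sketch only: two independent 31-bit id spaces, distinguished by a tag bit.
    // This illustrates how indexing instances and arrays separately doubles the
    // total capacity to ~2^32 objects; it is not JXRay's real implementation.
    public class SplitIdSpace {
        static final long ARRAY_TAG = 1L << 31; // set on ids of arrays, clear on instances

        private int nextInstanceId = 0; // 0 .. 2^31-1 for ordinary instances
        private int nextArrayId = 0;    // 0 .. 2^31-1 for arrays

        long registerInstance() {
            if (nextInstanceId < 0) throw new IllegalStateException("instance id space exhausted");
            return nextInstanceId++;
        }

        long registerArray() {
            if (nextArrayId < 0) throw new IllegalStateException("array id space exhausted");
            return ARRAY_TAG | nextArrayId++;
        }

        static boolean isArray(long id) {
            return (id & ARRAY_TAG) != 0;
        }
    }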

Second, when JXRay hits the above limits, it doesn’t stop (unlike the other tools). Instead, it issues a warning, skips the remainder of the heap dump during the read phase, and then analyzes only the objects it read before reaching the limit. When you open a report generated from such an incomplete analysis, you see a clear warning at the top explaining the situation and indicating how many objects were skipped. In our experience, analyzing even a part of the dump is much better than nothing, and in most cases it yields useful results. For example, seeing just a fraction of problematic objects such as boxed numbers, and where they come from, is usually sufficient to optimize all of them away.
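
The read phase then amounts to “index while there is room, otherwise count and skip.” A minimal sketch of that loop (all type and method names here are made up for illustration, not JXRay’s API):

    import java.util.Iterator;

    // Hypothetical sketch of "index until the id space is full, then skip and count".
    public class PartialReadSketch {
        interface ObjectIndex {
            boolean hasRoom();       // false once an id space (instances or arrays) is full
            void add(Object record); // assigns the record its integer id for later analysis
        }

        static long readWithLimit(Iterator<Object> dump, ObjectIndex index) {
            long skipped = 0;
            while (dump.hasNext()) {
                Object record = dump.next();
                if (index.hasRoom()) {
                    index.add(record); // this object will make it into the report
                } else {
                    if (skipped == 0) {
                        System.err.println("WARNING: object id limit reached; skipping the rest of the dump");
                    }
                    skipped++;         // surfaced in the warning at the top of the report
                }
            }
            return skipped;
        }
    }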

All this, along with the fact that JXRay can easily run on big, “headless” machines in a data center, makes it the most powerful tool for analyzing extremely big heap dumps. Just make sure that when you run JXRay, you give it an adequate heap (using the -J-Xmx flag): around 1.5 times the size of the heap dump, or even more if possible.
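
If you prefer not to eyeball the 1.5x figure, a tiny helper like this (hypothetical, not part of JXRay) turns a dump file’s size into a suggested -J-Xmx value:

    import java.io.File;

    // Hypothetical helper: suggest a -J-Xmx value of ~1.5x the dump size,
    // rounded up to the next whole gigabyte. Not part of JXRay itself.
    public class SuggestXmx {
        public static void main(String[] args) {
            long dumpBytes = new File(args[0]).length();
            long suggestedGb = (long) Math.ceil(dumpBytes * 1.5 / (1L << 30));
            System.out.println("Suggested flag: -J-Xmx" + suggestedGb + "g");
        }
    }

For a 200 GB dump this prints -J-Xmx300g. Happy analyzing!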
