
Java: Analyze Performance Issue

This is a sequel to my previous post on performance troubleshooting, which covered collecting profiling data without interrupting your running application.

Looking at the profiling data, it was found that garbage collection was the performance bottleneck.

The following parameters were added to the server to collect garbage collection data:

-Xloggc:gc.log -XX:+PrintGCDetails
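
For illustration, assuming the server is started from a launch script, the logging flags can simply be appended to the existing java command line. The heap sizes, the class name, and -XX:+PrintGCTimeStamps below are additions for this sketch only; the timestamps make it easier to correlate pauses with wall-clock time:

java -server -Xms512m -Xmx512m -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps <class>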

After investigating the data, it was found that garbage collection triggered by RMI was the cause of the performance issue.

For useful tools for performance troubleshooting, read here.

As per the Sun GC documentation, RMI’s distributed garbage collection (DGC) algorithm depends on the timeliness of local garbage collection (GC) activity to detect changes in the local reachability states of live remote references, and thus ultimately to permit collection of remote objects that become garbage. Local GC activity that results from normal application execution is often but not necessarily sufficient for effective DGC; a local GC implementation does not generally know that the detection of particular objects becoming unreachable in a timely fashion is desired for some reason, such as to collect garbage in another Java virtual machine (VM). Therefore, an RMI implementation may take steps to stimulate local GC to detect unreachable objects sooner than it otherwise would.

The local GC that RMI initiates is a Full GC, and the default interval is 60000 ms. These Full GCs can be postponed or effectively disabled by setting the following options:

-Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFFE
-Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFFE
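
For context on where the DGC activity comes from, here is a minimal sketch of an RMI server (the class and binding names are made up for illustration): exporting any remote object pulls in RMI’s distributed GC, after which the intervals above apply. The properties can also be set programmatically, but only before any RMI class is initialized, so passing them as -D options on the command line is the safer choice.

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class RmiGcDemo {

    // A trivial remote interface, purely for illustration.
    public interface Echo extends Remote {
        String echo(String msg) throws RemoteException;
    }

    public static class EchoImpl implements Echo {
        public String echo(String msg) {
            return msg;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical: set the DGC intervals before any RMI class is used.
        // On a real server these would normally be -D options on the command line.
        System.setProperty("sun.rmi.dgc.server.gcInterval", "1800000");
        System.setProperty("sun.rmi.dgc.client.gcInterval", "1800000");

        // Exporting a remote object and binding it in the registry activates
        // RMI's distributed GC, which periodically stimulates local (Full) GCs
        // at the interval configured above.
        Echo stub = (Echo) UnicastRemoteObject.exportObject(new EchoImpl(), 0);
        Registry registry = LocateRegistry.createRegistry(Registry.REGISTRY_PORT);
        registry.rebind("echo", stub);

        System.out.println("Echo service exported; DGC is now active.");
    }
}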

To resolve this, the following parameters were added:

java -server -Dsun.rmi.dgc.server.gcInterval=1800000 -Dsun.rmi.dgc.client.gcInterval=1800000 -Xms512m -Xmx512m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC <class>

-XX:+UseParNewGC: Enables the parallel copying collector for the young generation. Use it with the concurrent collector or with the default mark-sweep-compact collector.

-XX:+UseConcMarkSweepGC: Enables concurrent collection of the old generation.

References:

Tuning Garbage Collection Outline

Tuning Garbage Collection with the 5.0 Java™ Virtual Machine

How Garbage Collection Works

HP-UX Java HotSpot Tools and Commands

Java HotSpot VM Options

Tuning the Java Runtime System

GC Portal

When Java RMI and default garbage collection can make your server pause

Garbage Collection of Remote Objects

Fine-tuning Java garbage collection performance

VMFlags

HP-UX 11i Knowledge-on-Demand: performance optimization best practices

Improving Java Application Performance and Scalability by Reducing Garbage Collection Times and Sizing Memory

HP-UX Java Memory Management




3 Comment(s)

  1. William Louth | Jun 23, 2008 | Reply

    I think you have the classification wrong here. GC is rarely the actual problem but a symptom of another problem, and changing to other collectors and more exotic VM arguments is, the majority of the time, masking an issue with a short-term temporary band-aid.

    Kind regards,

    William

  2. admin | Jun 23, 2008 | Reply

    Actually I do agree with you. When this problem happened, I went through the code (which was actually written 8 years back and recently revamped partially), checked the OS patches and Java patches, collected performance data, etc., and modified certain parts of the code which I suspected were the cause of the problem. Unfortunately we still could not resolve it.
    Reading through the Sun documentation on GC suggested that we could use the concurrent GC for best responsiveness, and that RMI is doing a full distributed GC every minute by default. By tuning those parameters I at least resolved the issue, though I may not have addressed the real problem, which “maybe” is in the code.

    The strange thing is that this problem only happened over time.

    Our system is processing 20 to 30 transactions per second, which means approx. 2 million transactions in a day, and the problem happens only after a day…

  3. J.T. Wenting | Jun 24, 2008 | Reply

    That’s not actually strange at all.
    Somewhere in the code there’s a place where you’re losing some object references in such a way that they become inaccessible but don’t become eligible for garbage collection.
    It takes time for that pool of objects to eat up enough memory to become a real problem.
    At 2 million transactions a day, if you’re losing 100 bytes per transaction (not a lot), you’re still losing 200 million bytes a day (or close to 200 megabytes), which is a lot.
    Those 100 bytes per transaction could be a single reference left somewhere in a static Collection that never gets released or cleaned out.
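
    To illustrate the kind of leak described here, a hypothetical sketch (the class and field names are made up): a per-transaction entry added to a static collection and never removed stays strongly reachable and accumulates exactly as described.

    import java.util.HashMap;
    import java.util.Map;

    public class TransactionAudit {

        // Static cache that only ever grows: an entry is added per transaction
        // but nothing removes it, so the objects stay strongly reachable and
        // can never be garbage collected.
        private static final Map<String, byte[]> AUDIT_TRAIL = new HashMap<String, byte[]>();

        public static void record(String transactionId) {
            // Roughly 100 bytes retained per transaction; at 2 million
            // transactions a day that is close to 200 MB of unreclaimable heap.
            AUDIT_TRAIL.put(transactionId, new byte[100]);
        }
    }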
