Dave Landers

Dave’s thoughts (such as they are)

Leaks We Plugged

In this post is a dump of most of the leaks that we have found and plugged for our upcoming 10.2 release. I present it here not so you can see exactly what was leaking or whatever, but rather to give me a format where I can make some notes about what we encountered, and how we fixed things. Hopefully, that might prove useful to someone.

Note that many of these leaks were never released, but were only present in internal revisions. But the ones that did make it out are being back-ported to patches for 10.0 and 9.2.

I’ve also kind-of-categorized them into a few groupings, where similar patterns have emerged. Some did not fit clearly into just one category, but the breakdown should help other leak-hunters in their quests.

Clean Up After Yourself!

This class of problems is caused by creating references to objects without clearing them when you are done with them. The Thread and ThreadLocal issues I discussed in earlier posts generally fall under this category.

Timers not released: If an app created a Timer (via commonj.timers.TimerManager), and the timer did not stop before the application undeployed, it was not cleaned up (and thus leaked). The culprit was a misused WeakHashMap. The Timers were stored like this:

WeakHashMap timers = new WeakHashMap();
...
timers.put(theTimer, theTimer);

The map was just being used as a Set, but the insertion of the timer into the map value ensured that the same value in the keys were never unreferenced, thus loosing the entire benefit of the Weak keys in the first place (see the “Implementation note” in the WeakHashMap javadoc). The fix was simple:

timers.put(theTimer, null);

WSEE Runtime MBeans: WSEE (the web services stack) was creating MBeans to hold the configuration of handlers. They were not cleared when the application was undeployed, which in itself represents a minor leak. Unfortunately, the MBeans also (indirectly) held a reference to the Application’s ClassLoader - turning a small problem into a rather large one.

GroupSpace use of Beehive: GroupSpace was calling Beehive’s beginContext without a matching endContext. Behind the scenes, Beehive created some ThreadLocals, but it does the right thing and removes them when (if) you call endContext. This was fixed by balancing the context usage within a request.

Portal Framework service manager: - Internal class in the System ClassLoader has a Map that was holding references of services to be used. One of the services was loaded by the Application’s ClassLoader. This was fixed by unregistering services when the application was undeployed. Lesson: Keep track of what you are using and let it go when it is no longer needed.

Lesson: The Garbage Collector is not your Mom - you have to clean up after yourself around here.

Hold Me Tight!

Where the above references were necessary but simply not cleaned up, this grouping is more caches that are holding unnecessary or extra information.

Log Buffer: WLS Logging keeps a buffer that holds the last 50 or so log messages for efficient display by console. But the implementation was holding the application-scoped logger objects (rather than simply holding the text it needed to display). The fix was to copy information from the application-scoped objects to generic system-scoped objects for the buffer.

Resource initialization optimization: The ResourceBase class that is the base class for our Entitlement resources, was keeping a collection of previously created objects. It then used data from these objects to initialize fields in new instances referring to the same resource. But our Entitlement resource objects were application-scoped, and thus this collection caused leaks. We fixed this by overriding the base object with our own intermediate implementation. We kept the caches, but wrapped the objects in Weak and Soft references. This is not an ideal solution, so we have planned to revisit the whole design in a future release.

BeanELResolver cache: The javax.el.BeanELResolver (used when EL is used in a JSP) caches bean properties in a static Map. Since these bean objects are generally webapp-scoped, there is a leak. We fixed it the same way Glassfish fixed it by clearing the map on undeploy. Since BeanELResolver does not have a public method to clear the map or uncache entries, this had to be done by reflection of the private map field (ugh). The interesting bit is that BeanELResolver does have methods to provide the close/clear function, but they are private (and never referenced).

Commons Logging in System Classpath: Commons Logging holds a map of loggers, which it may load via the application or webapp ClassLoader. Thus simply having the commons logging jar in your System ClassPath will trigger a leak (if logging is used from an application - as it is with Beehive). For us, the jar was added to the system classloader inadvertently (via jar manifest entries), so the fix was to simply remove it. However, we really can’t do anything about it if a customer decides to stick commons logging in their classpath.

Glassfish JAXB: The Glassfish implementation of JAXB caches some reflection lookups. It has a WeakHashMap with Class objects as keys. Good try, however the values in that map are Constructor objects, which themselves hold an instance to the Class (which therefore means the keys are strongly held, and the Map’s weak keys can never be released). We fixed this in our local version by wrapping the Constructor in a WeakReference. We filed the ssue with Sun, and they adopted our fix in version 2.1.6.

LeaseManager holding a TimerManager: The LeaseManager (a new internal class) was holding a static reference to an app-scoped commonj.timers.TimerManager. This one was fun because it required tripping a race condition in order to trigger the leak. Also, in addition to being a leak, this was an error because the static reference tied the LeaseManager to a single deployed instance of a single application, so it would not work properly with multiple applications. We fixed this by looking up the TimerManager as needed (from JNDI, like you’re supposed to do). Yet another example of premature optimization (i.e. presuming that the JNDI lookup was too slow).

Lesson: You gotta know when to hold ‘em; Know when to fold ‘em.

The Ties that Bind

I’ve covered the Thread and ThreadLocal issues previously, but here are the details to complete the list.

ThreadLocals in Portal Framework: Some of the Framework JDBC code was using ThreadLocal but not cleaning up at end of request. This was fixed by simply removing the ThreadLocal (in a finally block) at the end of the request

new Threads in JSP compiler: See the previous post. It turned out that the thread was unnecessary: a leftover from when the code was ported to the server from a different environment. So it was just removed.

ThreadLocals in Content Management: Content was using ThreadLocal and not removing them after a request. The ThreadLocals were added as a (possibly unnecessary?) performance optimization, and the fix was simply to remove them.

Lesson: Don’t let your code get all tied up in knots - unravel those threads.

Final Thoughts

If you’ve taken the time to look at these issues, you can see that the problems spanned across multiple subsystems: WLP, WLS, WLW, Apache, and Sun. It was quite a project, and quite a fun challenge to track all these down.

Next, I’ll outline how we intend to keep this from happening again, and show you some tricks for your own memory leak testing.

Technorati Tags: , , ,

No comments yet. Be the first.

Leave a reply