Dave Landers

Dave’s thoughts (such as they are)

Archive for the 'Java' Category

Scheduling for Software Summit

The 3-each-session scheduling at Colorado Software Summit pays off again.

I looked at the schedule, and picked (based on content) the sessions I pretty much want to see this year. I ended up with 18. There are only 20 time blocks.

But because of the way they do the scheduling, I was able to fit in all 18 of my preferences into the 20 time slots. Just try that at other conferences!

Here’s how I do my schedule, in case you care:

Since the iCalendar format schedule is now also available, I subscribed to that (I use Apple’s iCal). Then, I created a new “myCSS” calendar and copied all the sessions I might go to (that’s 18 sessions at 3 times each). Some of the time blocks only had one session I wanted, so I deleted all the other instances of that session. That opened up yet more blocks. Lather, rinse, repeat. In the end I only had about 3 or 4 time slots to manually resolve. It probably took me less than 10 minutes total (not counting reading all the session abstracts to choose the 18 in the first place).

Thanks, Wayne & Peggy!

Technorati Tags: , , ,

Comments are off for this post

Colorado Software Summit – Detailed Schedule Posted

The full, detailed, daily schedule for Colorado Software Summit has just been posted.

If you are not familiar with the conference, you should take a look. One look at the full schedule and you will see that (like every year), they’ve got some world-class developers talking about some hot topics – REST, JavaScript, JSF, Linux, iPhone and Andriod development, scalability, Spring, and lots more.

In fact, this might just be the first conference to feature iPhone development since Apple has only this week lifted their NDA.

You should also note that, unlike a lot of other conferences, this one features every session three (3) times each. That’s quite a load on the speakers (I speak here from personal experience), but is a boon for attendees (again, personal experience). While there is much more content here than you can hope to take in during the week, the 3-times-each deal makes it pretty certain that you’ll be able to get to your top 10 or 15 sessions.

Some other things that make this conference different and better than the rest:

  • When you check in, you will get a CD with every single slide for every single session. So you can plan your schedule based on more than just the abstracts.
  • After you get home, you’ll receive a CD containing every single slide (again – since many speakers tweak things at the last minute), plus any example source code used in the sessions.
  • Every speaker is presenting two topics, so if you like one session, you might want to attend their other one.
  • This conference is a real community. You will (if you so choose) be able to hang out, eat lunch, etc. with the speakers (and many other smart developers). Networking, in-depth conversations, and lots of joking around are an integral part of Software Summit. If you don’t leave having made a couple of new friends, you haven’t done it right.
  • The food is absolutely fabulous.
  • The venue in the heart of Colorado’s Rocky mountains (at a major ski resort) is spectacular.

Technorati Tags: , , , , ,

1 comment

Leak Testing

So, here we are. We have found lots of memory leaks, and drastically improved the redeploy experience for WLP 10.2 (and the fixes are being backported to previous releases).

However, this is not the first time we’ve been here. But it is hopefully the last.

We are putting in place a test environment to keep these leaks plugged. Actually, to be more precise, we’re adding some instrumentation to our integration test suite that drives tests via the IDE. So this test will behave just like a developer – although it’s a rather anti-social and uninteresting developer :).

Note that this technique is not restricted to WLP or redeployment testing, but hopefully you can find it useful for other situations.

Counting ClassLoader Instances

The basis of this technique is that we know that the leaks we are interested in happen to always hold on to the application or webapp ClassLoader. Further, we also know that these ClassLoaders are instances of GenericClassLoader (for the app) or ChangeAwareClassLoader (for the webapp).

So, our instrumentation relies on being able to count the instances of these classes. Fortunately, JRockit can do this easily, without any extra messing about with the startup settings or whatnot.

JRockit comes with this handy jrcmd command. It lets you interact with a running JRockit JVM, like for example the WLS server. You can use it to do all sorts of interesting things:

  • find out the JVM’s command line (how it was started)
  • print thread dumps
  • change the verbosity of the JVM on the fly (things like garbage collector stats)
  • trigger a garbage collection
  • print a GC report
  • print diagnostics about memory usage

All sorts of cool things – and none of these require you to change anything about your running process. No command line switches or anything messy like that.

So, we are looking for a count of the instances of our ClassLoader objects. We do that by running:

jrcmd <wls server pid> heap_diagnostics

(You get the wls server pid by running just jrcmd – it will print the process ids of running JRockit instances.)

If you run that, you’ll see tons and tons of output – including a listing of every class loaded by the system and how many instances there are. So all we need to do is search for the class we are interested in:

jrcmd <wls server pid> heap_diagnostics | grep '%.*GenericClassLoader

That will grep for the one line we are interested in (turns out it contains a percent sign; why is not important here). We’ll get output something like this:

     0.0% 10k      150    +10k weblogic/utils/classloaders/GenericClassLoader

In the above, we see that there are 150 instances of the GenericClassLoader.

To get a more accurate count, we also want to trigger a full garbage collection and finalization before counting anything. We can do that with jrcmd also:

jrcmd <wls server pid> runfinalization

Using this information, here’s what we are doing for WLP testing:

  1. Start the server, with the application undeployed.
  2. Run finalization and count the GenericClassLoader and ChangeAwareClassLoader instances. This gives us a baseline for the number of instances used by the server itself.
  3. Deploy the application and use it (i.e. wander through the portal in a browser)
  4. Count instances again.
  5. Redeploy the application, use it, and recount.
  6. Repeat the redeploy step several times.
  7. Undeploy the application.
  8. Count instances.

All these counts (and additionally the [re-]deploy times) are all being recorded and analyzed. The most important metric is that the first and last counts are the same – after the application is undeployed, the ClassLoaders used by the application should be gone. We are collecting the intermediate results because we expect they might be useful in failure analysis.

The one last step is that we are running the JVM instrumented for YourKit, and configured to dump a memory snapshot on exit. So if the ClassLoader counts don’t look right, a developer will have the memory snapshot for analysis, without having to rerun the whole test suite.

Hopefully some of this information can be useful if you have leaks you are hunting.

Technorati Tags: , , , , ,

Comments are off for this post

Leaks We Plugged

In this post is a dump of most of the leaks that we have found and plugged for our upcoming 10.2 release. I present it here not so you can see exactly what was leaking or whatever, but rather to give me a format where I can make some notes about what we encountered, and how we fixed things. Hopefully, that might prove useful to someone.

Note that many of these leaks were never released, but were only present in internal revisions. But the ones that did make it out are being back-ported to patches for 10.0 and 9.2.

I’ve also kind-of-categorized them into a few groupings, where similar patterns have emerged. Some did not fit clearly into just one category, but the breakdown should help other leak-hunters in their quests.

Clean Up After Yourself!

This class of problems is caused by creating references to objects without clearing them when you are done with them. The Thread and ThreadLocal issues I discussed in earlier posts generally fall under this category.

Timers not released: If an app created a Timer (via commonj.timers.TimerManager), and the timer did not stop before the application undeployed, it was not cleaned up (and thus leaked). The culprit was a misused WeakHashMap. The Timers were stored like this:

WeakHashMap timers = new WeakHashMap();
...
timers.put(theTimer, theTimer); 

The map was just being used as a Set, but the insertion of the timer into the map value ensured that the same value in the keys were never unreferenced, thus loosing the entire benefit of the Weak keys in the first place (see the “Implementation note” in the WeakHashMap javadoc). The fix was simple:

timers.put(theTimer, null);

WSEE Runtime MBeans: WSEE (the web services stack) was creating MBeans to hold the configuration of handlers. They were not cleared when the application was undeployed, which in itself represents a minor leak. Unfortunately, the MBeans also (indirectly) held a reference to the Application’s ClassLoader – turning a small problem into a rather large one.

GroupSpace use of Beehive: GroupSpace was calling Beehive’s beginContext without a matching endContext. Behind the scenes, Beehive created some ThreadLocals, but it does the right thing and removes them when (if) you call endContext. This was fixed by balancing the context usage within a request.

Portal Framework service manager: – Internal class in the System ClassLoader has a Map that was holding references of services to be used. One of the services was loaded by the Application’s ClassLoader. This was fixed by unregistering services when the application was undeployed. Lesson: Keep track of what you are using and let it go when it is no longer needed.

Lesson: The Garbage Collector is not your Mom – you have to clean up after yourself around here.

Hold Me Tight!

Where the above references were necessary but simply not cleaned up, this grouping is more caches that are holding unnecessary or extra information.

Log Buffer: WLS Logging keeps a buffer that holds the last 50 or so log messages for efficient display by console. But the implementation was holding the application-scoped logger objects (rather than simply holding the text it needed to display). The fix was to copy information from the application-scoped objects to generic system-scoped objects for the buffer.

Resource initialization optimization: The ResourceBase class that is the base class for our Entitlement resources, was keeping a collection of previously created objects. It then used data from these objects to initialize fields in new instances referring to the same resource. But our Entitlement resource objects were application-scoped, and thus this collection caused leaks. We fixed this by overriding the base object with our own intermediate implementation. We kept the caches, but wrapped the objects in Weak and Soft references. This is not an ideal solution, so we have planned to revisit the whole design in a future release.

BeanELResolver cache: The javax.el.BeanELResolver (used when EL is used in a JSP) caches bean properties in a static Map. Since these bean objects are generally webapp-scoped, there is a leak. We fixed it the same way Glassfish fixed it by clearing the map on undeploy. Since BeanELResolver does not have a public method to clear the map or uncache entries, this had to be done by reflection of the private map field (ugh). The interesting bit is that BeanELResolver does have methods to provide the close/clear function, but they are private (and never referenced).

Commons Logging in System Classpath: Commons Logging holds a map of loggers, which it may load via the application or webapp ClassLoader. Thus simply having the commons logging jar in your System ClassPath will trigger a leak (if logging is used from an application – as it is with Beehive). For us, the jar was added to the system classloader inadvertently (via jar manifest entries), so the fix was to simply remove it. However, we really can’t do anything about it if a customer decides to stick commons logging in their classpath.

Glassfish JAXB: The Glassfish implementation of JAXB caches some reflection lookups. It has a WeakHashMap with Class objects as keys. Good try, however the values in that map are Constructor objects, which themselves hold an instance to the Class (which therefore means the keys are strongly held, and the Map’s weak keys can never be released). We fixed this in our local version by wrapping the Constructor in a WeakReference. We filed the ssue with Sun, and they adopted our fix in version 2.1.6.

LeaseManager holding a TimerManager: The LeaseManager (a new internal class) was holding a static reference to an app-scoped commonj.timers.TimerManager. This one was fun because it required tripping a race condition in order to trigger the leak. Also, in addition to being a leak, this was an error because the static reference tied the LeaseManager to a single deployed instance of a single application, so it would not work properly with multiple applications. We fixed this by looking up the TimerManager as needed (from JNDI, like you’re supposed to do). Yet another example of premature optimization (i.e. presuming that the JNDI lookup was too slow).

Lesson: You gotta know when to hold ’em; Know when to fold ’em.

The Ties that Bind

I’ve covered the Thread and ThreadLocal issues previously, but here are the details to complete the list.

ThreadLocals in Portal Framework: Some of the Framework JDBC code was using ThreadLocal but not cleaning up at end of request. This was fixed by simply removing the ThreadLocal (in a finally block) at the end of the request

new Threads in JSP compiler: See the previous post. It turned out that the thread was unnecessary: a leftover from when the code was ported to the server from a different environment. So it was just removed.

ThreadLocals in Content Management: Content was using ThreadLocal and not removing them after a request. The ThreadLocals were added as a (possibly unnecessary?) performance optimization, and the fix was simply to remove them.

Lesson: Don’t let your code get all tied up in knots – unravel those threads.

Final Thoughts

If you’ve taken the time to look at these issues, you can see that the problems spanned across multiple subsystems: WLP, WLS, WLW, Apache, and Sun. It was quite a project, and quite a fun challenge to track all these down.

Next, I’ll outline how we intend to keep this from happening again, and show you some tricks for your own memory leak testing.

Technorati Tags: , , ,

Comments are off for this post

LeakPatterns: ThreadLocal

Another common source of leaks we found was related to the use of ThreadLocal. A ThreadLocal is a way to attach some data to a Thread, so you can get at it later. They are really useful for a couple of cases.

First, you can cache some value (presumably expensive to compute) and reuse it later. Since the value is associated with the Thread, it is thread-safe (baring any issues with the attached objects themselves). This makes the ThreadLocal an important tool in performance optimizations.

Another good use of ThreadLocal is to hold some context-related information. For example, maybe you need some data that is on the HTTP Request (like URLs, attributes, query parameters) in order to make an authorization decision for access to certain data. Normally, this would mean that all that data (or the Request itself) would need to appear on the authorizer API – and on any API that called that one – and on any API calling those APIs, etc. Or, you could toss the Request data into a ThreadLocal and retrieve it in the authorizer. It becomes context for the call to the authorizer. This is another really useful technique for loosely tying subsystems together.

But in a server environment (like WebLogic), the Threads are usually pooled and reused for request after request. The ThreadLocal is associated with a Thread, and the Thread is associated with a Request – so that is good. Except that the Thread is only temporarily associated with the Request.

So, if ThreadLocal objects are not managed correctly (such that their life-cycle matches that of the Request), you can get all sorts of evil things happening. First, if you were using ThreadLocal for performance, it just might not work since one user session might be handled by many different Threads. Worse, you might leak data from one user to another (when a user’s Request reuses a Thread containing another user’s data).

And in the cases I saw, if that ThreadLocal holds objects that otherwise should be garbage collected, you will get a leak.

The solution is to ensure that your ThreadLocal usages are properly managed. The ThreadLocal should have the same life cycle as the Request, and you should use ThreadLocal.remove() — preferably in a proper finally block – to clean up at the end of your Request.

Technorati Tags: , , , ,

4 comments

« Previous PageNext Page »