Wednesday, 28 November 2012

WSE2: Trace Memory 'Leak'

Recently I had to deal with a production server that was chewing through memory at an astounding rate; The web service applications (asp and .net) would start up and idle around the 100MB mark which was acceptable, but after an indeterminate period of time the memory would suddenly grow out to 1.25gig (Server 2003 32bit) and users would start getting out of memory errors, timeouts, or generally degraded performance.

This wasn't exacly a new thing - the server had always used a lot of memory. But why was it suddenly a huge problem?

I thought it was a memory leak in one of my applications - we had just deployed a new feature that handled binary files in memory, perhaps that was leaking? Perhaps someone had dug out an old badly-written form that hadn't been used for years.

I was completely at a loss, and the usual fault finding methods were not cutting it, so I decided to delve into the mystical world of debugging via dump files.

These tools and tutorials set me up, and within a few hours I was away.
http://blogs.msdn.com/b/tess/archive/2008/02/04/net-debugging-demos-information-and-setup-instructions.aspx
http://blogs.msdn.com/b/tess/archive/2008/02/11/net-debugging-demos-lab-2-crash-review.aspx
http://blogs.msdn.com/b/tess/archive/2008/02/20/net-debugging-demos-lab-3-memory-review.aspx
http://msdn.microsoft.com/en-us/windows/hardware/gg463009.aspx
http://msdn.microsoft.com/en-us/library/bb190764.aspx


After a solid 8 hours of investigation I discovered that there were 30,000+ instances of XML SOAP requests in memory (plus all their strings and encapsulating objects - 1.2 million tiny xml fragments). This is for a job management system and the XML data was all job requests/responses. No reason why these couldn't be in memory, but looking through them some of these requests were going back to the beginning of the year and our staff don't need to refer back to jobs that often.

Armed with this information and some handly debug data pointing the calling assembly of the XML fragments, I discovered two large .webinfo files in the web services directory. These are generated when WSE2 has trace enabled, and they are well-formed XML files.

My theory here is as follows.
When a request comes in WSE has to log it, so it opens the .webinfo file and creates an XML object in memory. This is manipulated, validated, and written back down to file system. When this file gets large enough it starts draining the resources available to the other applications on the server.

The solution, naturally, is to turn tracing OFF. And the problem magically went away.

Now it is interesting to note that the documentation for the trace element/feature mentions that
"Once an application is deployed to production, tracing should not be enabled. Doing so, may eventually deplete the available disk space on the computer running the application."
Considering that hard drive space is cheap this might not be a concern, but if you take into consideration that the log file needs to be manipulated in memory the scope of the choice changes dramatically.

Special thanks to Tess Ferrandez for her excellent tutorials - you helped me save the day!

No comments:

Post a Comment