Debugging connection pool leak in Apache HTTP Client

I recently hit an issue using the Apache HTTP Client connection pooling library where, after a while, threads would block when trying to open connections. It didn't take much to figure out that the connection pool was being exhausted: a quick thread dump revealed that all the threads were waiting for a connection from the pool. The question was: which method was not releasing its connection?

In the Apache HC library a connection is released back to the pool when the stream returned by response.getEntity().getContent() is closed (provided the entity isn't null). This is what EntityUtils.consume does: it checks that the entity and its content stream are non-null, and if so closes the stream.

I had a couple of strategies for identifying the culprit. First I looked at a heap dump to see if I could identify the last URL each response had accessed, but this proved problematic: perhaps because the old responses had already been garbage collected, there were no viable attributes to inspect on the objects that were lying around. The second approach was to switch on some logging, both in my code and in the HC library. What I wanted to see was my code being called, a connection being obtained, and then the connection being released.

This approach worked perfectly, as the following code blocks illustrate. First, on line 5 of log4j.properties, debug logging is turned on for the connection manager. Then, in the test client code there are two methods: one that properly closes the connection, and one that doesn't. In this example I call the good method twice, then the bad one, then the good one again. Finally, in the log output, note Get connection followed by Release connection repeated twice (the two good calls), then two consecutive Get connection messages: the bad call followed by the good one. It is clear from this that the getBad call is leaking connections.

In my more complex, multi-threaded code it was equally easy, given that I could group the messages by thread name and then follow each group through to identify the faulty method. In practice, though, this is best diagnosed from the sequential unit test logs, since they exercise every method.

And just to round off: my leak was in a method that called POST, processed the response, and then called another POST. I was cleaning up the second POST's response, but not the first, because I was reusing the response variable.
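To make that failure mode concrete, here is a toy sketch of the same mistake. The Pool and Response classes are stand-ins invented for illustration, not the Apache HC API; consume() plays the role of EntityUtils.consume.

```java
// Toy connection pool to illustrate the variable-reuse leak; NOT the Apache HC API.
public class LeakDemo {

	static class Pool {
		int leased = 0;
		Response execute() { leased++; return new Response(this); }
	}

	static class Response {
		private final Pool pool;
		private boolean released = false;
		Response(Pool pool) { this.pool = pool; }
		// Analogous to EntityUtils.consume(response.getEntity()).
		void consume() {
			if (!released) { released = true; pool.leased--; }
		}
	}

	// Mirrors the buggy method: the first response is never consumed,
	// because the variable is reassigned to the second one.
	static int leakedConnections() {
		Pool pool = new Pool();
		Response response = pool.execute();   // first POST
		// ... process the first response ...
		response = pool.execute();            // second POST overwrites the reference
		response.consume();                   // only the second response is released
		return pool.leased;                   // one connection is still leased: leaked
	}

	public static void main(String[] args) {
		System.out.println("leaked connections: " + leakedConnections());
	}
}
```

The fix is simply to consume the first response before reassigning the variable.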

log4j.properties

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %c.%M %x - %m%n
log4j.rootLogger=INFO, stdout
log4j.logger.org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager=DEBUG

ServiceClient.java

public class ServiceClient {
	
	public static void main(String[] args) {
		ServiceClient client = new ServiceClient();
		String url = "http://localhost/";
		if (args.length > 0) url = args[0];
		client.getGood(url);
		client.getGood(url);
		client.getBad(url);
		client.getGood(url);
	}
	
	private HttpClient client;
	
	public ServiceClient() {
		this(2);
	}
	
	public ServiceClient(final int poolSize) {
		// Set up the multithreaded http client.
		SchemeRegistry schemeRegistry = new SchemeRegistry();
		schemeRegistry.register(
		         new Scheme("http", 80, PlainSocketFactory.getSocketFactory()));
		schemeRegistry.register(
		         new Scheme("https", 443, SSLSocketFactory.getSocketFactory()));

		ThreadSafeClientConnManager cm = new ThreadSafeClientConnManager(schemeRegistry);
		cm.setDefaultMaxPerRoute(poolSize);
		cm.setMaxTotal(200);
		
		HttpParams params = new BasicHttpParams();
		params.setParameter(CoreConnectionPNames.SO_TIMEOUT, 60000);
		params.setParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, 60000);
		client = new DefaultHttpClient(cm, params);
	}
	
	public String getGood(String url) {
		HttpGet get = new HttpGet(url);
		HttpResponse response;
		try {
			response = client.execute(get);
			if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK) {
				EntityUtils.consume(response.getEntity());
				return null;
			}
			
			Reader is = new InputStreamReader(response.getEntity().getContent());
			StringBuilder buf = new StringBuilder();
			
			while (is.ready()) { // simplification for this demo; real code should read until -1
				char[] b = new char[1024];
				int c = is.read(b);
				if (c > 0) buf.append(b, 0, c);
			}
			// Closing the content stream releases the connection back to the pool.
			is.close();
			
			return buf.toString();
		} catch (ClientProtocolException e) {
			e.printStackTrace();
			get.abort(); // aborting discards the connection rather than leaking it
			return null;
		} catch (IOException e) {
			e.printStackTrace();
			get.abort();
			return null;
		}
	}

	
	public String getBad(String url) {
		HttpGet get = new HttpGet(url);
		HttpResponse response;
		try {
			response = client.execute(get);
			if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK) {
				EntityUtils.consume(response.getEntity());
				return null;
			}
			
			Reader is = new InputStreamReader(response.getEntity().getContent());
			StringBuilder buf = new StringBuilder();
			
			while (is.ready()) { // simplification for this demo; real code should read until -1
				char[] b = new char[1024];
				int c = is.read(b);
				if (c > 0) buf.append(b, 0, c);
			}
			// BUG (deliberate): the content stream is never closed here, so the
			// connection is never released back to the pool.
			
			return buf.toString();
		} catch (ClientProtocolException e) {
			e.printStackTrace();
			get.abort();
			return null;
		} catch (IOException e) {
			e.printStackTrace();
			get.abort();
			return null;
		}
	}
}

Log output:

2011-06-17 21:59:07 [main] INFO  org.nigelsim.examples.httpleak.ServiceClient.getGood  - Getting http://www.abc.net.au
2011-06-17 21:59:07 [main] DEBUG org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager.getConnection  - Get connection: HttpRoute[{}->http://www.abc.net.au], timeout = 60000
2011-06-17 21:59:07 [main] DEBUG org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager.releaseConnection  - Released connection is reusable.
2011-06-17 21:59:07 [main] INFO  org.nigelsim.examples.httpleak.ServiceClient.getGood  - Getting http://www.abc.net.au
2011-06-17 21:59:07 [main] DEBUG org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager.getConnection  - Get connection: HttpRoute[{}->http://www.abc.net.au], timeout = 60000
2011-06-17 21:59:08 [main] DEBUG org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager.releaseConnection  - Released connection is reusable.
2011-06-17 21:59:08 [main] INFO  org.nigelsim.examples.httpleak.ServiceClient.getBad  - Getting http://www.abc.net.au
2011-06-17 21:59:08 [main] DEBUG org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager.getConnection  - Get connection: HttpRoute[{}->http://www.abc.net.au], timeout = 60000
2011-06-17 21:59:08 [main] INFO  org.nigelsim.examples.httpleak.ServiceClient.getGood  - Getting http://www.abc.net.au
2011-06-17 21:59:08 [main] DEBUG org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager.getConnection  - Get connection: HttpRoute[{}->http://www.abc.net.au], timeout = 60000
2011-06-17 21:59:08 [main] DEBUG org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager.releaseConnection  - Released connection is reusable.

Posted in java

Spring @Autowired – Use interfaces!

Here's a little lesson that I had to relearn today: when using Spring, use interfaces.

The premise: I had a DAO bean configured with Spring, which was @Autowired into my controller (or, in this instance, a test case). Because I only intended to have a single implementation of this class, and because this was the first iteration of the project, I autowired the concrete DAO class. However, when I tried to inject it into the test case I got an exception:

Caused by: org.springframework.beans.factory.NoSuchBeanDefinitionException: 
    No matching bean of type [userservices.jpa.TrackerEntryDao] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}

This didn't make any sense, but after I tried to extract the bean from the context manually it all became clear. The DAO class is annotated with @Transactional, which causes the bean to be proxied (so the transaction can be managed). This means the bean passed into the @Autowired field is no longer an instance of the concrete class. But if we autowire by interface, the proxy also implements that interface, and the match succeeds.
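The behaviour can be reproduced with a plain JDK dynamic proxy, which is what Spring uses for interface-based proxies. The DAO names below are illustrative, not from the original project:

```java
import java.lang.reflect.Proxy;

public class ProxyDemo {

	interface TrackerEntryDao { String find(); }

	static class JpaTrackerEntryDao implements TrackerEntryDao {
		public String find() { return "entry"; }
	}

	// Wrap the target the way an interface-based transactional proxy would.
	static TrackerEntryDao proxied() {
		TrackerEntryDao target = new JpaTrackerEntryDao();
		return (TrackerEntryDao) Proxy.newProxyInstance(
				ProxyDemo.class.getClassLoader(),
				new Class<?>[] { TrackerEntryDao.class },
				(p, method, args) -> method.invoke(target, args));
	}

	public static void main(String[] args) {
		Object proxy = proxied();
		// The proxy satisfies the interface, but it is NOT a JpaTrackerEntryDao,
		// so autowiring by the concrete type finds no matching bean.
		System.out.println(proxy instanceof TrackerEntryDao);    // true
		System.out.println(proxy instanceof JpaTrackerEntryDao); // false
	}
}
```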

Posted in Spring

Preauth in Spring Security 3.x

Sometimes in a webapp a filter/app/container other than Spring is responsible for authenticating the user and setting the user principal, leaving only the authorization to the Spring webapp. A portlet container is a typical example. There are a few examples floating around showing how to do this in Spring 2.x, but it appears some things (packages, etc.) have changed for Spring 3.x, so here is how to make it work. Use the following in your applicationContext.xml:

<beans xmlns="http://www.springframework.org/schema/beans"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:sec="http://www.springframework.org/schema/security"
  xsi:schemaLocation="
  http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
  http://www.springframework.org/schema/security http://www.springframework.org/schema/security/spring-security.xsd
  ">

  <!-- Set up the security to be passed through from a higher level container.
       NOTE: the class attributes below went missing from the original listing;
       these are assumed values for Spring Security 3.x. -->
  <sec:http entry-point-ref="preAuthenticatedProcessingFilterEntryPoint">
    <sec:intercept-url pattern="/**" access="ROLE_STAFF" />
  </sec:http>

  <bean id="preAuthenticatedProcessingFilterEntryPoint"
    class="org.springframework.security.web.authentication.Http403ForbiddenEntryPoint" />

  <sec:authentication-manager>
    <sec:authentication-provider ref="preAuthenticatedProcessingFilter" />
  </sec:authentication-manager>

  <bean id="preAuthenticatedProcessingFilter"
    class="org.springframework.security.web.authentication.preauth.PreAuthenticatedAuthenticationProvider">
    <property name="preAuthenticatedUserDetailsService" ref="preAuthenticatedUserDetailsService" />
  </bean>

  <bean id="preAuthenticatedUserDetailsService"
    class="org.springframework.security.web.authentication.preauth.PreAuthenticatedGrantedAuthoritiesUserDetailsService" />

</beans>
Posted in java, Spring

App reviews: APN/3G/GPRS/EDGE kill switch

On a smartphone like an Android device, a lot of programs keep running in the background even when the screen is off: email and calendar sync, Google Latitude, and so on. So it is important to be able to kill the data connection for these services when it isn't needed, to save battery power and money.

Of course you could just use flight mode, but that rather defeats the purpose of a phone, doesn't it?

The two apps under consideration are APNdroid and Toggle Data. Both are free from the Android Market.

APNdroid works by changing the names of your APN entries, invalidating them and making cellular data connections unavailable. The interface is an app which you can place as a shortcut on your home screen. Run it once and it disables the connection, placing a message in your notification bar. Run it a second time and it enables the connection again.

Toggle Data is similar in interface, again an app shortcut, but it uses a toast (on-screen message) to indicate enabled or disabled. Toggle Data also uses a different approach to disabling the data connection, perhaps the same one flight mode uses?

With both apps you need to make sure you enable your data connection again before removing them, else you might be in a little bit of trouble.

For the record, I've stuck with Toggle Data because I prefer the toast to the notification (I have enough notifications as it is), and I have the feeling its method of disabling the connection is cleaner (though I can't really back that up).

Posted in android

Lenny missing out

As a long time Debian user I have really come to appreciate the Debian repository system, with its stable, testing, unstable and experimental points in the release cycle to choose from. Typically I run testing, with occasional packages from unstable and experimental. Put another way, I want to use the newest "stable" releases of software that will be in the next proper release of Debian.

However, as the package freeze for the next Debian release, Lenny, has been in place for some time, it has prevented new packages from making their way into testing. I'm not sure whether Lenny is actually overdue, but I am sure it will put Debian behind the times with three very important pieces of desktop software: OpenOffice.org, the kernel, and network management.

OpenOffice.org (OOo) in Lenny is going to be fixed at 2.4, missing OpenOffice.org 3.0 by only some weeks. The hassle is that OOo 3.0 is significantly nicer to use than 2.4. Firstly, it supports a multi-page view in Writer, meaning I can take advantage of a larger desktop screen. Secondly, the multi-head presenter view actually works in Impress, putting it back in the ring against PowerPoint and friends. It is also faster and smoother to use.

The kernel will be 2.6.26, while 2.6.28 has been in the wild for a few months now and supports devices like the Atheros wireless cards found in MacBooks and many other laptops. This is significant because the previous way to get these cards working was MadWifi, which uses a closed-source HAL object file, which isn't in keeping with the free and open Debian ethos.

And finally, the network manager in Lenny will be 0.6.6. Version 0.7 is in experimental; it supports GSM modems and has better VPN support, which matters a great deal to travelling business users.

All of these newer packages will be available in the next release of Ubuntu, and are already in the latest Fedora, which leaves Debian behind the curve in this market. Of course people could, like me, use a mix of release packages, but a line should be drawn somewhere to prevent packages in a frozen-for-release repository from getting so out of date. Given that Debian packages have become very good and reliable, it would be better to unfreeze a release after a month to allow newer, tested software in, assuming this doesn't affect the "blockers" which have so far prevented the release.

In summary, experimental isn’t as scary as it sounds, and perhaps we need a different philosophy on the release cycle in this rapidly changing world of software.

Posted in debian, linux

Projects for the new year

These are some project ideas I've been sitting on all year and have not yet started, but hope to in the new year.

1. Python CMS framework in the vein of Drupal, based on Repoze BFG

Drupal is quite a popular community-building framework, with a good plugin system and a data model which seems to work well. But I don't think PHP is the way of the future, at least for me. I use Python for many other projects, and I understand how to debug it, unit test it, etc. And BFG brings the really nice framework features from Zope and Plone, which makes it an ideal starting point. I've got a project plan for this which I'll reveal when I get around to cutting some code.

2. Google Gadget for Bugzilla

We use Bugzilla for service and software task tracking. It would be really swell to have a quick list of outstanding tasks, and the ability to quickly add new ones. Google Gadgets work on all platforms now, so I think they are the way to go. I'd prefer this over another Firefox plugin too, just because it fits more naturally at this level of things.

3. Signing of public pages which you endorse.

Wikipedia isn't the only place "anonymous" people post their ideas and understandings, but it'll do for this example. There can be issues with trusting online sources, but what if I could sign a version of a page (the content, not the entire layout)? I might be an expert in the field, for instance. Then someone browsing the page can see that I endorse it. If they don't know who I am, they can also see what else I endorse and find out more about me. Then, through social networking, their friends who trust their judgement, perhaps just on a particular subject, can judge the quality of a post through the GPG/PGP web of trust.

Posted in programming, technology

ZSI -> CXF: Parameters coming in as NULL

Recounting a strange little compatibility issue I had between ZSI 2.0 and CXF 2.0.x. I was using CXF as the server, running from Maven using Jetty, and ZSI as the client. The parameters from ZSI were arriving at the service implementation as null. With web services logging turned on I could see that the SOAP packet was arriving OK and looked good.

A little bit of digging round the web turned up this message.

The issue was that ZSI wasn't explicitly namespacing the elements, so CXF was not seeing them. The solution was to add elementFormDefault="qualified" to the schema in my WSDL definition, and rebuild the ZSI stubs.

<wsdl:definitions xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    ...>
  <wsdl:types>
    <!-- elementFormDefault belongs on the embedded schema element -->
    <xsd:schema elementFormDefault="qualified"
        attributeFormDefault="unqualified">
    ...
Posted in java, python, Work

Trac taking a hammering

At work we have one VM which hosts all our project management software: Git, SVN, Trac and Bugzilla. Recently it has been taking a hammering and essentially crashing. It was running out of RAM and swapping like crazy. A little investigation uncovered some Trac 0.10 CGI processes that were each using >500MB of memory.

Using the ps ewww -p <pid> command to look at the environment variables of the processes, we determined that each of them was serving the same kind of request: SVN changesets. And essentially all of them were being hit by spiders.

Essentially the spiders were managing to create a large number of overlapping changeset requests, which chewed up all the RAM on the VM. Increasing the amount of RAM would only offset the problem. So we now just block the spiders from the changeset areas of our Trac instances (robots.txt):

User-agent: *
Request-rate: 1/5
Disallow: /projects/data-activities/changeset
...

In the weeks since then we've had no issues. Besides, who wants their gaffes showing up on Google 😉

Interestingly, we've not seen the same thing happen with Trac 0.11, so perhaps the issue has been resolved.

Posted in HPC, python, Work

Zope3 Component Architecture (CA) style Adapters for Java

After programming for Zope3/Plone for the past year I've come to really admire the flexibility and elegance that their implementation of the adapter pattern gives us. And, after Martin Aspeli put the call out almost a year ago without it being answered, I thought it was time to give it a go. How hard could it be?

So what I've developed is a very simple, dependency-free, Maven-available library which should greatly improve the flexibility of your code.

The project page is here, and the repository is on GitHub.

Posted in java, plone, programming

From the desk in 60 seconds

Quick roundup of what's what:
1) Wireless in UK hotels.

It's bad. I went to Edinburgh, Manchester and London recently and stayed in some OK 3- and 4-star hotels which all advertised wifi/internet access. Here's the rundown:
 a) Grassmarket hotel in Edinburgh: Free wireless only available in the pub/common rooms
 b) Fountain Court – Harris in Edinburgh: Really fast and reliable free wireless in the room
 c) The Palace Hotel in Manchester: Fast, expensive wired access in the room.
 d) St Mark Hotel, Earls Court in London: Didn’t actually advertise wifi, and didn’t have it 🙂

2) Inheriting templates with chameleon.zpt (1.0a1) and repoze.bfg (0.3.8)

This is only noteworthy because it isn't documented. I'm assuming the reader knows how to do METAL/TAL in Zope.

Create a master.pt in the usual way, using <html metal:define-macro="master"> and metal:define-slot="main".

Create a template .pt with <html metal:use-macro="main.macros['master']"> and metal:fill-slot="main".
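Spelled out, the two templates might look something like this (a minimal sketch; only the metal/tal attributes matter):

```xml
<!-- templates/master.pt (sketch) -->
<html xmlns:metal="http://xml.zope.org/namespaces/metal"
      metal:define-macro="master">
  <body>
    <div metal:define-slot="main">default content</div>
  </body>
</html>

<!-- templates/mytemplate.pt (sketch) -->
<html xmlns:metal="http://xml.zope.org/namespaces/metal"
      xmlns:tal="http://xml.zope.org/namespaces/tal"
      metal:use-macro="main.macros['master']">
  <body>
    <div metal:fill-slot="main">
      Project: <span tal:content="project">name</span>
    </div>
  </body>
</html>
```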

The trick is to manually load the master template and pass it to render_template_to_response. In the Django PT renderer they scan the templates dir and load all the templates, making this unnecessary.


from repoze.bfg.chameleon_zpt import render_template_to_response, get_template

def my_view(context, request):
    main = get_template('templates/master.pt')
    return render_template_to_response('templates/mytemplate.pt', project='test', main=main)

3) Text to speech hotkey in Gnome

First, install festival and xclip.
Next, get the script off the gentoo wiki and make it executable.
Now fire up gconf-editor and browse to apps/metacity/keybinding_commands.
Find a spare command and put in the path to the script.
Finally, browse to apps/metacity/global_keybindings, find the corresponding run_command_# key and enter the hotkey you want.

Now test it out. It should work on selected text, i.e. no need to copy it explicitly. (Gnome gets notified when you change things in gconf, so there is no need to restart any services.)
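The gconf-editor steps can also be done from a terminal with gconftool-2. The command slot number, script path and hotkey below are examples, not values from the steps above; pick whatever slot is free on your system.

```shell
# Point a spare metacity command slot at the text-to-speech script
# (slot 1 and the script path are illustrative).
gconftool-2 --type string --set \
  /apps/metacity/keybinding_commands/command_1 "$HOME/bin/speak-selection.sh"

# Bind the matching run_command key to a hotkey (Ctrl+Alt+S here).
gconftool-2 --type string --set \
  /apps/metacity/global_keybindings/run_command_1 "<Control><Alt>s"
```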

Posted in programming, python, Work