Earlier this year I started a big Java development project with my team. I had done mostly Rails development for a long time, so when I came back to web applications in Java I had some expectations. I expected Tomcat to treat user sessions in much the same way as Rails does, but it doesn’t.
A bit of background. In a RESTful web application, all session state is kept by the client, and nothing at all is saved in server-side sessions. In a well-written web application you might want to save just a few bytes on the server; ideally, no more than the id of the authenticated user. Poorly written applications and frameworks, on the other hand, tend to save lots of data in server-side sessions.
In a Rails application, there are essentially three ways to store session data:
a) in encrypted cookies, which are entirely client-side;
b) in server-side files;
c) in a database table.
In a single-server application you might use a) for maximum efficiency, or b) to avoid the risk of someone tampering with the cookies (not easy to do, since it requires stealing the encryption key). In a multi-server application you might choose a) for maximum efficiency, or c) for added security.
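To make the tampering point concrete, here is a minimal Java sketch of a tamper-evident cookie value. It is a simplification under stated assumptions: it only signs the payload with HMAC-SHA256 (using the JDK’s `javax.crypto.Mac`) instead of encrypting it, and the `payload.signature` format and the `SignedCookie` class are hypothetical, not what Rails actually uses. Anyone can read the payload, but changing it without the server-side key produces a signature mismatch, so the cookie is rejected.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.Optional;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/**
 * Hypothetical sketch of a tamper-evident cookie value of the form "payload.signature".
 * Simplification: the payload is signed with HMAC-SHA256, not encrypted,
 * and this is not the format Rails actually uses.
 */
final class SignedCookie {
    private static final String ALGORITHM = "HmacSHA256";
    private static final Base64.Encoder ENCODER = Base64.getUrlEncoder().withoutPadding();

    private SignedCookie() {}

    /** Encodes the payload and appends an HMAC computed with the server-side key. */
    static String sign(String payload, byte[] key) {
        String data = ENCODER.encodeToString(payload.getBytes(StandardCharsets.UTF_8));
        // '.' never appears in URL-safe Base64, so it is a safe separator.
        return data + "." + mac(data, key);
    }

    /** Returns the payload, or empty if the value was not signed with this key. */
    static Optional<String> verify(String cookieValue, byte[] key) {
        int dot = cookieValue.indexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        String data = cookieValue.substring(0, dot);
        byte[] expected = mac(data, key).getBytes(StandardCharsets.US_ASCII);
        byte[] actual = cookieValue.substring(dot + 1).getBytes(StandardCharsets.US_ASCII);
        // Constant-time comparison: timing does not reveal how many bytes matched.
        if (!MessageDigest.isEqual(expected, actual)) {
            return Optional.empty();
        }
        return Optional.of(new String(Base64.getUrlDecoder().decode(data), StandardCharsets.UTF_8));
    }

    private static String mac(String data, byte[] key) {
        try {
            Mac hmac = Mac.getInstance(ALGORITHM);
            hmac.init(new SecretKeySpec(key, ALGORITHM));
            return ENCODER.encodeToString(hmac.doFinal(data.getBytes(StandardCharsets.US_ASCII)));
        } catch (GeneralSecurityException e) {
            // HmacSHA256 is required on every Java platform, so this indicates a misconfigured key.
            throw new IllegalStateException(e);
        }
    }
}
```

The key never leaves the server, which is why forging a cookie requires stealing it.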
Let me be clear about this: server-side sessions in a database table are extremely efficient. Most web applications make dozens of database queries per page view; adding one or two simple queries to fetch the session data will *not* add significant overhead.
This was my point of view when I came back to Java from Rails. We are now writing an application that will run on Tomcat, and I expected that Tomcat, being an established product, would offer similar functionality; it does not.
When it comes to session management, Tomcat offers three options:
i) The standard session manager, which keeps sessions in the Java heap.
ii) The “persistent” session manager, which keeps sessions in the Java heap and may swap “inactive” sessions out to files or to a database.
iii) The “clustering” session manager, which keeps sessions in the Java heap and keeps them synchronized with other Tomcat instances using a custom protocol over TCP. To cater to poorly written applications that save lots of data in sessions, this manager has a sophisticated algorithm for transmitting “deltas” of session data instead of full session contents.
At first I thought I could use option ii) to replicate the functionality of Rails option c). It turns out that this is not possible: the “persistent” session manager insists on saving sessions to its store in batches, not individually. This means there is no guarantee that users will keep their session if they click quickly from one page to another and the next request reaches a different server before the session has been saved.
I briefly considered option iii), but it requires opening new TCP ports. For my application, this would mean complex firewall adjustments, since our servers are in different geographic locations. I would feel foolish asking the people in Operations to open this many holes in their firewalls, and I would not bet a dime on the security implications. Besides, the whole concept seems a hugely complicated way to do something that is best left to specialized services like memcached. The fact that the whole thing is optimized for poorly written applications makes me like it even less.
In addition, all three options store session data in the Java heap. This is an incredibly bad idea. Memory is the most important resource for a Java application. You don’t want to waste Java heap on session data that will be used at best once every few minutes, and at worst never again. Doing this means that memory usage grows linearly with the number of sessions, and most of that memory sits unused most of the time. As a consequence, we would be forced to cap the number of concurrent users.
A RESTful application, on the other hand, does not spend memory on sessions, so its memory consumption depends on the number of concurrent *requests*, not sessions. The application only allocates the memory it needs to fulfill the requests it is currently serving, and no more. As a result, a RESTful application can handle many more concurrent requests, all other things being equal.
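A back-of-the-envelope model makes the difference concrete. Let $s$ be the heap retained per session, $r$ the heap a request needs while it is being served, $\lambda$ the rate at which new sessions are created, $T$ the session timeout, $\rho$ the request rate and $t$ the time to serve one request. By Little’s law, the number of live objects is their arrival rate times their lifetime, so roughly:

$$M_{\text{heap sessions}} \approx \lambda T s + \rho t r, \qquad M_{\text{REST}} \approx \rho t r$$

With illustrative numbers (assumptions, not measurements): 1000 new sessions per minute, Tomcat’s default 30-minute timeout, and 20 KB per session give

$$\lambda T s = 1000\,\tfrac{\text{sessions}}{\text{min}} \times 30\,\text{min} \times 20\,\text{KB} = 600{,}000\,\text{KB} \approx 600\,\text{MB}$$

of heap held by sessions, while 100 requests per second served in 0.2 s each means only $\rho t = 20$ requests in flight at any moment. The session term grows with the number of visitors and the timeout, not with the actual load.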
I can imagine that the reasoning behind the Tomcat session managers is that keeping sessions in memory is “more efficient”. For most applications, this reasoning is wrong. For a scalable (and robust) application, the efficiency that matters most is memory efficiency; adding a few milliseconds of latency per request does not matter. Tomcat reminds me of what this very good article calls “1975 programming”: adding complications when the simpler algorithm would be much better.
In conclusion, how will I solve the problem with Tomcat? I could write my own session manager, but that means working with complex interfaces that I’m not confident I can understand correctly in a short time. I could use one of the several third-party session managers backed by alternative stores, but I don’t really need to save milliseconds of latency, I don’t want to install new services, and, most importantly, I don’t know how reliable these things are.
The solution I chose is to roll my own session management in the application. I will apply to every request a filter that does the following:
- Look for a session cookie. If it’s missing, redirect to the authentication page; otherwise, take the session id from the cookie and use it to fetch the session data from the database.
- If the session data is not found or has expired, redirect to the authentication page.
- Otherwise, read the id of the authenticated user from the session data.
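The decision the filter makes can be sketched independently of the Servlet API, which keeps it easy to test. This is a hypothetical sketch: `SessionRow`, `SessionStore` and `SessionCheck` are illustrative names, and the store interface stands in for the single primary-key query against the session table.

```java
import java.time.Instant;
import java.util.Optional;

/** Hypothetical row of the session table: who is logged in, and until when. */
record SessionRow(long userId, Instant expiresAt) {}

/** Hypothetical data-access interface; in the application this would be one SELECT by primary key. */
@FunctionalInterface
interface SessionStore {
    Optional<SessionRow> find(String sessionId);
}

final class SessionCheck {
    private SessionCheck() {}

    /**
     * Returns the authenticated user's id, or empty if the request must be
     * redirected to the authentication page.
     */
    static Optional<Long> authenticatedUser(String sessionCookie, SessionStore store, Instant now) {
        if (sessionCookie == null || sessionCookie.isEmpty()) {
            return Optional.empty();                           // no session cookie
        }
        return store.find(sessionCookie)                       // empty if the id is unknown
                .filter(row -> now.isBefore(row.expiresAt()))  // drop expired sessions
                .map(SessionRow::userId);
    }
}
```

In the actual filter, `doFilter` would read the cookie from `HttpServletRequest.getCookies()`, call `authenticatedUser`, and then either continue with `chain.doFilter(request, response)` or call `HttpServletResponse.sendRedirect` with the URL of the authentication page.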
The authentication page asks the user for credentials; if they are correct, it:
- generates a session id using a secure random number generator;
- saves the session id in a cookie;
- saves the session id, together with the user id, in the database.
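The login side can be sketched the same way. Again, this is hypothetical: `SessionIssuer`, the cookie name `SID`, the 16-byte id length and the in-memory map standing in for the database table are all assumptions. The id comes from `java.security.SecureRandom`, and the cookie is marked `Secure` and `HttpOnly`, so it is only sent over HTTPS and cannot be read from JavaScript.

```java
import java.security.SecureRandom;
import java.time.Duration;
import java.time.Instant;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical row of the session table: who is logged in, and until when. */
record SessionRow(long userId, Instant expiresAt) {}

final class SessionIssuer {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int ID_BYTES = 16; // 128 bits of randomness (an assumed, common choice)

    // In-memory stand-in for the session table; the real application would INSERT a row instead.
    private final Map<String, SessionRow> table = new ConcurrentHashMap<>();

    /** Call only after the credentials have been verified. Returns a Set-Cookie header value. */
    String startSession(long userId, Duration ttl, Instant now) {
        String sessionId = newSessionId();
        table.put(sessionId, new SessionRow(userId, now.plus(ttl)));
        return "SID=" + sessionId + "; Max-Age=" + ttl.getSeconds() + "; Path=/; Secure; HttpOnly";
    }

    SessionRow lookup(String sessionId) {
        return table.get(sessionId);
    }

    /** Generates an unguessable id: random bytes from SecureRandom, URL-safe Base64 without padding. */
    static String newSessionId() {
        byte[] bytes = new byte[ID_BYTES];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }
}
```

With the Servlet API you would more likely build a `javax.servlet.http.Cookie`, call `setSecure(true)`, `setHttpOnly(true)` and `setMaxAge` (in seconds) on it, and add it with `response.addCookie(cookie)`.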
It’s simple and very efficient; not much more is needed. This scheme has a nice property: it only creates a session for users who supplied valid credentials. This avoids generating lots of useless sessions for requests coming from search engines. It’s also more secure: with far fewer live sessions, and ids drawn from a secure random generator, it is much harder for a sophisticated attacker to guess an existing session id. This is better than what most application servers do :-)
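The security argument can be quantified with a simple model, assuming session ids are uniformly random $b$-bit values and the attacker guesses blindly. If $N$ sessions are live, each guess hits one of them with probability

$$P(\text{hit}) = \frac{N}{2^{b}}, \qquad \text{e.g. } \frac{10^{6}}{2^{128}} \approx \frac{10^{6}}{3.4 \times 10^{38}} \approx 2.9 \times 10^{-33}$$

where the million live sessions is a hypothetical figure. Creating sessions only for authenticated users keeps $N$ small, and a secure random generator keeps the ids uniformly distributed; both push this probability down.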
Another nice property is that session data never changes. The only thing stored in it is the user id, and there is neither a way nor a reason for it to change.