Tuesday, March 18, 2008

The computer and the network

Social networks are interesting. They represent the evolution of the computer and network.

First there were PCs. Then came local networks with printer sharing, file sharing and e-mail. Originally the networks existed to extend the capabilities of the PC; they were called "computer networks", i.e. computers linked together. This changed over time as the network became more important. Instead of the network being useful to the PC, the PC's purpose became to enable the network.

This continued as the networks extended out to the Internet and Web by the late 1990s.

Today we are seeing the next step in the evolution. The underlying network is becoming less important and the social network more important. Instead of social networking being an application that runs on the Internet, the Internet's purpose is becoming to enable social networking.

Social networking sites like Facebook are interesting because the network on its own is just inanimate technology. Instead of connecting computers to other computers, social networks are about enabling people to connect with other people. It becomes less about the technology and more about people meeting and interacting.

Sunday, March 02, 2008

Application performance testing and optimization

I've been assigned to do performance testing for an upcoming application release. We made some architectural and database changes, so we want to be sure our hardware estimation process is still valid. We also want to try it out on the newer Sun T Series servers.

Along with measuring performance I can identify areas for performance improvement and optimize where possible.

Application performance is a bit like navigating through water of unknown depth. With a canoe you can paddle happily along on a shallow river. With software the canoe corresponds to one developer or tester clicking along through screens with very small data sets.

Moderate load is like navigating a 40-foot yacht. The yacht has more draft, so if the water is very shallow you'll run aground. The 40-foot boat is roughly 5-10 developers using the application at the same time with a modest-sized data set.

Heavy load is the aircraft carrier. The water needs to be very deep to handle a vessel that large. Heavy load is when you simulate large numbers of simultaneous users against a large data set. The load and the data set size should match what you plan to have in production, on the same hardware.

With software, as with the waterway, you don't really know how it performs until you test it under load. You can't tell by looking at it. Taking a canoe or a small pleasure craft through a harbour does not tell you whether the water is deep enough for a massive freighter.
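
To make the "freighter" test concrete, here's a minimal sketch of the kind of load driver that simulates simultaneous users. The user count, the request count and the hitApplication() method are all made up for illustration; in a real test the body of hitApplication() would exercise an actual screen flow or service call against the deployed application.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class LoadDriver {

    // Numbers are invented -- scale them to match what you expect in production.
    private static final int USERS = 200;
    private static final int REQUESTS_PER_USER = 50;

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(USERS);
        final AtomicLong totalMillis = new AtomicLong();

        long start = System.currentTimeMillis();
        for (int i = 0; i < USERS; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    for (int r = 0; r < REQUESTS_PER_USER; r++) {
                        long t0 = System.currentTimeMillis();
                        hitApplication(); // invented: one user action against the app
                        totalMillis.addAndGet(System.currentTimeMillis() - t0);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);

        long requests = (long) USERS * REQUESTS_PER_USER;
        System.out.println("Requests:          " + requests);
        System.out.println("Avg response (ms): " + totalMillis.get() / requests);
        System.out.println("Elapsed (ms):      " + (System.currentTimeMillis() - start));
    }

    // Placeholder for whatever exercises the application (HTTP call, EJB call, etc.)
    private static void hitApplication() {
        // call the system under test here
    }
}

Running something like this against a production-sized data set is what separates the canoe trip from the aircraft carrier.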

Optimizing performance is fairly straightforward, especially at the start. It's an 80/20 situation: 80% of the resources are consumed by 20% of the features. Using a profiler like JProbe, it is easy to find the hot spots. Typically optimizing the small number of trouble spots will dramatically improve performance, and then you're done.

This can be frustrating for the programmer, because the same optimization patterns could be applied throughout the code base, but the other areas don't consume enough resources to justify the refactoring effort.

That said, after the first iterations, once the biggest resource hogs have been dealt with, some of the problem areas that were previously overshadowed now make up the 80 in the 80/20 rule, and they can be optimized in turn.

In most applications, the biggest performance issues are around the database. They can be caused by inefficient queries that the DBMS cannot execute quickly, or by a poor indexing strategy (or no indexes!) on tables with large data sets.
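
To illustrate the indexing point, here's a rough sketch with invented table, column and connection names: a lookup that filters a large ORDERS table by CUSTOMER_ID will do a full table scan on every call until that column is indexed.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class IndexExample {
    public static void main(String[] args) throws Exception {
        Class.forName("oracle.jdbc.OracleDriver");
        // Invented connection details for illustration.
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:APPDB", "app_user", "secret");

        // Without an index on CUSTOMER_ID, this lookup forces a full table
        // scan of a large ORDERS table on every call.
        PreparedStatement ps = conn.prepareStatement(
                "SELECT ORDER_ID, ORDER_DATE, TOTAL FROM ORDERS WHERE CUSTOMER_ID = ?");
        ps.setLong(1, 12345L);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            System.out.println(rs.getLong("ORDER_ID"));
        }
        rs.close();
        ps.close();

        // One-time DDL (normally in the DBA's migration scripts, not app code):
        // index the column used in the WHERE clause so the query above can use
        // an index range scan instead of a full scan.
        Statement ddl = conn.createStatement();
        ddl.executeUpdate("CREATE INDEX IDX_ORDERS_CUSTOMER ON ORDERS (CUSTOMER_ID)");
        ddl.close();
        conn.close();
    }
}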

A good free tool for DB analysis is Oracle SQL Developer. I find you can learn a lot just by copying and pasting application queries into SQL Developer. In addition to seeing the execution time, you can get the explain plan in an excellent graphical view.

Any intermediate-level or higher professional software developer should be aware of database execution plans and how to interpret and optimize them. Even at the junior level, a programmer should understand how indexes affect query performance on large data sets.
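
For what it's worth, you don't need the GUI to look at a plan; SQL Developer just presents it nicely. Here's a rough JDBC sketch, using Oracle's EXPLAIN PLAN and DBMS_XPLAN with the same made-up ORDERS query as above, that pulls the plan as text:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ExplainPlanExample {
    public static void main(String[] args) throws Exception {
        Class.forName("oracle.jdbc.OracleDriver");
        // Invented connection details for illustration.
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:APPDB", "app_user", "secret");
        Statement stmt = conn.createStatement();

        // Populate PLAN_TABLE with the optimizer's plan for the query
        // (the query itself is not executed).
        stmt.execute("EXPLAIN PLAN FOR "
                + "SELECT ORDER_ID, ORDER_DATE, TOTAL FROM ORDERS WHERE CUSTOMER_ID = 12345");

        // DBMS_XPLAN.DISPLAY formats PLAN_TABLE as readable text, one row per line.
        ResultSet rs = stmt.executeQuery("SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}

A TABLE ACCESS FULL on a big table where you expected an INDEX RANGE SCAN is usually the first thing a plan like this points out.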

One of the eternal performance headaches with EJB is static data from the DB that is requested often but changes infrequently or never. If a trip to the DB is required every single time, those redundant queries consume a lot of resources and really hurt system performance and responsiveness.

While caching is the obvious answer to this, EJB and caching basically don't go together out of the box. We've had good success, though, using ehcache to work around this shortcoming of EJB. I recommend ehcache based on my experience with it on this project.
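
As an illustration of the approach (a sketch, not our actual code), here's a small read-through cache for static reference data using the net.sf.ehcache API. The class and the loadCountriesFromDb() lookup are invented; the point is the get-then-put pattern in front of the DB:

import java.util.ArrayList;
import java.util.List;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class ReferenceDataCache {

    private final Cache cache;

    public ReferenceDataCache() {
        CacheManager manager = CacheManager.create();
        // Programmatic config: up to 1000 in-memory entries, no disk overflow,
        // not eternal, entries live for an hour after being loaded.
        manager.addCache(new Cache("referenceData", 1000, false, false, 3600, 3600));
        this.cache = manager.getCache("referenceData");
    }

    public List getCountries() {
        Element hit = cache.get("countries");
        if (hit != null) {
            // Served from memory -- no trip to the DB.
            return (List) hit.getObjectValue();
        }
        // First request (or expired entry): load once from the DB and cache it.
        ArrayList countries = loadCountriesFromDb();
        cache.put(new Element("countries", countries));
        return countries;
    }

    // Placeholder for the real lookup through the EJB/DB layer.
    private ArrayList loadCountriesFromDb() {
        return new ArrayList();
    }
}

After the first call, every subsequent request is served from memory until the entry expires, so the repeated DB trips simply go away.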