Sunday, December 07, 2008

User Stories Applied

I just finished reading the book User Stories Applied. It was interesting. A co-worker lent me the book and suggested I read it. I can't remember the last time I read a work-related book on my own. It's been a long time. I guess my interests outside of work are more, well, "interesting" than studying up further on what I do during the daytime to get paid.

It's good for me to get more of a grasp on what all this agile stuff I hear about all the time is about. I still don't really get it. As I read through User Stories Applied I kept thinking to myself: did you actually build real software this way? The author, though, is careful to consistently back up his talk with examples from real projects. So it apparently has been used on real projects and isn't just some academic thought experiment.

I guess it's hard to get comfortable with something that is such a departure from the projects I've always worked on. Still, over the years people have attempted to introduce some of the concepts of agile into software project management. I think that's the thing with agile among established companies. We want to pick and choose the more interesting and useful aspects without going whole hog and having, say, 2 programmers to a workstation.

For example iterative development with feedback is good. We've all known from hard experience for a long time that the "late and large" integration at the very end of a software project is a bad way to go. So we try to build earlier and keep it in a usable working state where people outside the team can use it and give feedback. Then again, we were already using milestones back at PRIOR on the CMS SB3 project in 1998.

One thing that really struck me was that pretty much every chapter kept going on about this individual called the "customer". The customer writes the user stories (NOT the programmers!). The customer writes the acceptance tests (hmmm). The customer provides feedback between iterations and determines which stories are to be done in each sprint. That level of customer involvement is certainly a departure from the past. I kept reading this and thinking, who is this customer? I don't recall seeing him around very much in my 12 years in the cubes.

So that's something that's important and different. Agile isn't some internal fad or something the programmers do on their own that is opaque to everyone else. The customers have to be more involved. That requires organizational change. If the organizational change doesn't happen, and it ends up being the programmers writing the stories and the acceptance tests and planning the sprints, then agile probably won't be very successful.

It was a good book. I did learn a lot more about agile. It's part of the Kent Beck Signature Series. I think I've grasped enough for now and I'm not planning to read the rest of that Beck series. I've got some other, non-work books lined up to read.


I remember back at TUNS in the software engineering course. The prof did a series on 4GLs and code generators. He showed us a demo of some TI system called IAF or something like that. After you set up all of the screens and described the data and workflows it set about generating C code at a rate of about 100,000 lines an hour.

The prof made an interesting observation. IAF was very good at the traditional integrated order entry - inventory - shipping problem for mid to large manufacturing companies. That one specific scenario. However you wouldn't build a compiler with IAF. You wouldn't build IAF itself using IAF.

I think the same observation applies to user stories. For example I wouldn't specify the emergency shutdown software for a nuclear power plant using user stories. I wouldn't design air traffic control software with user stories. I think user stories and agile may be useful in certain categories of software development, and inappropriate for others.

Software development is a large and diverse space. Internal IT department. Systems integrators. Shrink wrap software installed at a remote site. Web apps running off one big live website. Within this large space there are probably some situations where user stories and agile work well.

Friday, November 21, 2008

Colour printing in the office

We put in some new printers a little while ago at the office. Usually that's nothing special. Stuff doesn't work for a couple of hours while they get the IP and printer name and such set up, then it's business as usual.

But it was a bit different this time. We now have colour printing in the office for the first time. And it's great. I like to print code and it's great being able to get the same printout as I see on my screen. So I'm really enjoying it.

It makes sense to have colour printing. Our monitors are in colour, PowerPoints are in colour, so you would logically expect to print in the same colour as you get on the screen. Plus we've had colour printing in our cheap home printers as completely standard for like 10 years now.

So why has it taken so long for colour printing to arrive in the office? Some history might be useful here. Back around 1997-1998 at PRIOR Data there was this Y2K project for the military. PRIOR was providing office space and some services and staff to a group who were doing this huge Y2K readiness project for the military, checking the readiness of all kinds of stuff like elevator systems and whatever else they could think of that might fail.

They were mostly downstairs from the main PRIOR office on Spring Garden Road. As part of Y2K they were producing a lot of documentation of course. And there was a rumour they had this special printer called 'Howe' that was a big HP machine that cost $27,000. And it had double sided printing.

It made sense for them given their project and the amount of documents they had to produce. The rest of us upstairs would ask the sysadmin once in a while about double sided printing. The response was usually something like "that's really expensive; we would need to install special hardware kits on our printers; we would need to install special drivers on everyone's PCs". There was a perception that double sided printing was an exotic thing that was too difficult and expensive to make widely available.

But by 2001 when I joined Core it was just there, completely standard, even on the cheap Lexmark office printers. Double sided printing was just something that came out on the next generation of printers, and it didn't cost any more to get double sided when buying new printers [although you saved a fortune on printer paper], so that was that. And we've had double sided without thinking about it for years now.

For a long time colour printing was where double sided was around 1997-1998. People were suspicious that the cost per sheet was too high. It was perceived as unnecessary. Plus colour printing was a status symbol, a perk of the important and powerful who had access to it.

But I think going forward when businesses go to replace their existing workhorse office printers they will find that colour is no longer an exotic expensive option just for the executive level. It will just be standard and the cost per page will be pretty much the same as the legacy dinosaur black and white printers. The employees will be very happy with it. Colour printing at work is great.

Tuesday, September 09, 2008

The zen of software maintenance

We're kind of in a middle phase of our current software project. In the last few weeks we've been focused on fixing as many defects as we can and getting it as stable as possible. Thus new feature work has been neglected a bit.

So pretty much each day was: come to work and knock off as many trouble tickets as we could. Now some might find this distressing, focusing on fixing things over new design. But I found I kind of liked it. There's something good about hardening a system, making it better and more stable and enjoyable to work with.

In fact sometimes I could convince myself that it wouldn't be bad to do software maintenance all the time. There are some nice things about maintenance. First of all the objectives are generally clear. The defect is known and reproducible. Fix it. That's nice. Plus you know when you're done. If it goes from not working properly to working properly then you're on the right track. There's a certain satisfaction in going from non-working to working which I find pleasant.

Maintenance is also generally easier. No one is looking for grand sweeping architectural visions or elegant designs. You just modify the design that was done, the decisions that were made, so that the execution is corrected. Rewrites are costly and generally frowned upon [as they generally should be]. So it's less taxing in some ways.

It's a bit strange for me because in the past I was always focused on being involved with new applications, new code, new systems, new technologies and tools, big new fresh things. But now I'm seeing enjoyment in caretaking the big things other people may have built in the past.

I guess I'm getting older and probably mellowing out some. Based on my health holding up, and my finances, I'd say I have 10-30 productive years left. Maybe I'm at least thinking about transitioning to new types of tasks for the upcoming phases in my career.

Now that I've picked up some JSP in the last few months, it is possible that with my background in Ada, Oracle Forms/Reports, and more recently Java, there might be enough legacy code around that I could work with these technologies for the rest of my career and not have to learn any big new sweeping technology or paradigm. Especially with Java, which hasn't peaked yet and where major new blocks of tomorrow's legacy code are still being created today.

But why just think of it in terms of code to write? There may be other roles available that I'm a good match for that are a good transition from where I've come from the last decade+. It's at least worth thinking about.

Friday, August 15, 2008

Translating JSP pages

I've been doing some JSP work the last few months. We use Struts. It's not too bad I've found.

One thing Struts does pretty well is support translation to different languages through the use of keys [aka tokens] into a properties file. This generally works well for the programmer and we generally don't have to worry a whole lot about translating the user interface screens to different languages.

A challenge can occur when developing a new screen or adding a section to an existing screen. The developer wants to focus on getting the JSP, JavaScript and HTML right. He will typically just hard code the labels right on the JSP and leave it until things are working to cycle back and convert the hard coded labels in the GUI to proper keys which have been added to, or already exist in, the .properties file.

The problem is that under tight deadlines the developer can forget to do this or miss a label or two, especially when there is conditional processing and everything doesn't necessarily appear every time. This can be a big problem because if the test team doesn't specifically test for this then the application can be in the field before the problem is detected, forcing an expensive patch into the field.

One thing the developer can do to deal with this is to not use the true labels when hard coding them during screen development. What I use is the prefix "zz". So Name, Address, Number become zzName, zzAddress, zzNumber in the hard coded development stage GUI. That way it's obvious looking at the screen that I'm not using the tokenized labels and it's a reminder to fix it before shipping. The programmer can be sure everything is converted by doing a search in Eclipse on "zz" in the code base; it will be obvious if anything was missed.
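
To make that concrete, here's a minimal before and after sketch [the key name and the standard Struts bean:message tag are just illustration, not our actual code]. Development stage, hard coded with the zz prefix:

<td>zzName</td>

Before shipping, converted to a proper key:

<td><bean:message key="device.label.name"/></td>

And the matching entry in the .properties file:

device.label.name=Name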

For the test team, this can be a somewhat difficult problem to detect. One test the team can run is to open up the tokens .properties file and prepend "zz" to every single label - this is pretty fast to do using a macro or script. Then restart the app and go through all the screens, including the dropdown lists. If you see anything appear on the GUI without the zz prefix you know it's a raw, untokenized GUI element, a defect. This is important to test especially in screens where some elements only appear under certain conditions, or when the screen content is generated dynamically. Also error messages are a common source of untokenized GUI labels.
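
If you want to script that zz-prefix pass in Java rather than an editor macro, something like this would do it [the file names are made up; this writes a prefixed copy rather than clobbering the original]:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.Properties;

// Prepend "zz" to every value in the tokens .properties file so any
// label showing up in the GUI without the prefix must be untokenized.
public class ZzPrefixer {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new FileInputStream("ApplicationResources.properties"));
        for (Map.Entry<Object, Object> entry : props.entrySet()) {
            entry.setValue("zz" + entry.getValue());
        }
        // deploy this copy to the test app in place of the original
        props.store(new FileOutputStream("ApplicationResources.zz.properties"),
                "zz-prefixed labels for untokenized-label testing");
    }
}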

Monday, July 14, 2008

technical debt

Technical debt is an interesting term we seem to hear a bit more lately.

I think anyone with experience in software development knows instinctively at least what it is. It's good that people are finally attaching a label to it. It's an important start.

Sometimes a team will seem to make incredible progress on early releases of some product, then seem to slow down later. There can be a number of reasons for this slowdown. One of them is productizing. Another is that with more users people find more bugs, and the team has to backtrack to address them, impeding progress on new features in the process.

But I think what often slows a team down later on is technical debt. In the early releases the team may have used unsound practices such as poor to no source documentation, poor to no unit testing, copy and paste code, heavily intertwined modules, and death march project schedules. These shortcuts can yield seemingly rapid progress on a small team with a small code base over a short period of time.

However that progress comes at a price. It's like an irresponsible man who lives beyond his means and uses credit cards to fund a flashy lifestyle. He seems prosperous to an outside observer but actually he's heading for a crash, and what he has been doing is unsustainable. Technical debt is very similar to that type of financial debt. Much of the early progress is illusory, financed through technical debt. The result is an unmaintainable code structure that will require costly and time consuming maintenance and rewrites.

There are some hazards around technical debt, especially when a new team is asked to take over code originally developed by a different team. If the original team is too willing to part with the original code base, that may mean the code has a lot of technical debt which is coming due and the original team wants to hand it to someone else to deal with. Also, if the later team seems to make less progress than the original team, that may be because the new team is being badly slowed by the burden of servicing the technical debt the original team ran up.


One good thing about technical debt is that we have now defined this "problem with no name" and put a name to it. From there it would be good to take the next step and devise ways to measure the technical debt in a code base: in an automated way, assign some debt level to the code and estimate the costs of servicing it. I think there may be some interesting and valuable original research for graduate students in this area.

Sunday, June 29, 2008

Query results as SQL function arguments

I kind of discovered this by accident. I was going through a manual multi-step process to convert some numeric IP addresses in the database to a text dotted quad format, for input into a command line load testing script. I was finding it tedious.

The thing is I had SQL functions available to convert numeric to dotted quad and dotted quad to numeric IP addresses. In the database, the device table stores the IP address as numeric.

The device table has a unique index on serial number. I was using serial number to identify the devices. What I wanted to do was for a given serial number get the IP address in dotted quad format.

Originally I used a two-step approach. First get the numeric IP address for the device serial number, in this example "serial1000":

select ipaddress from device where serial_number = 'serial1000';

Then copy that numeric IP address query result into a second query.

select numberToDottedQuad( result_from_above_query ) from dual;

That worked well enough, but all of the copying and pasting was slow, manual and tedious. I wanted to combine them into a single query so I only had to enter the serial number and could get the dotted quad IP from that.

I couldn't figure out how to get the original query result to be the argument to the numberToDottedQuad() function. I tried a bit then gave up. This is what the DB wouldn't allow; it's a syntax error:

select numberToDottedQuad(select ipaddress from device where serial_number = 'serial1000') from dual;

After a while though it kind of just came to me. Use the same style of syntax as insert .. select statements. Nest the inner query in parentheses. That was the aha moment and I set back to it. This worked:

select numberToDottedQuad((select ipaddress from device where serial_number = 'serial1000')) from dual;

The trick is that you have to put parentheses around the inner query so the SQL engine evaluates it first as a scalar subquery, then feeds the result as the input to the outer function. Which makes sense. Note the inner query has to return exactly one row, which is guaranteed here by the unique index on serial number. This was nice because essentially the same syntax can be used for both Oracle and SQL Server [except the dual part but that's trivial to take out].

If I had created a separate stored function to run the first query and feed the result into the second function, it would have meant separate code for Oracle and SQL Server. This way I didn't have to create a stored function [the database does the work in plain SQL] and I can use the same SQL query on Oracle and SQL Server.

Thursday, June 12, 2008

Struts and JSP

I've spent the last couple of weeks doing something that I'd avoided for the last four years. I developed some Web pages in Java using Struts and JSP.

You have to know the history on it. Back at Core Networks in 2004 there was this core of around 5 Java developers who had been with the company for a little over a year on a next generation product. Additionally there were around 12 existing pre-Java team members who were working on various legacy products like CoreOS and point solutions like CoreMeter and others.

In 2004 we had this important business opportunity around a big Java/J2EE project. So the old guard joined together with the newer Java developers to create a unified team. Some of the people who joined the Java team then had to create the GUI pages. One of the new Java people had some stuff set up using this exciting technology we'd heard of called Struts.

Well the Struts/JSP web pages were a disaster for the new developers. Previously we used PHP to create pages for our GUIs and it took around 1 developer day to create a basic page with data and a form. We used SOAP calls both to obtain the data to display and to invoke the change actions when the user clicked Submit on the HTML forms. It worked pretty well. The PHP UI developers and SOAP API backend developers could work independently and in parallel.

Anyway the switch to Struts was horrible for the new Java developers. It was now taking a week or more to create each page in the GUI! The output was flaky and nobody seemed able to understand how it worked or how to get anything that was broken fixed. I would go by the printer for something of my own and see these e-mail printouts with the title "Struts hell".


But on this project with a new company I had to do some web pages, so I delved into Struts for the first time with some trepidation. I started off by reading the Struts Survival Guide, which was very good, especially chapter 2. From there I was able to grasp enough about MVC to get started and understand that Struts is generally not concerned with the presentation aspect of things.

There were some rough parts and it was hard at times but I was able to get the pages to come up. It took a while due to the learning curve. Besides the stuff you need to learn around Struts like Actions, Forms, execute(), validate(), and struts-config.xml, there is so much more you have to at least partially grasp, like JSP, HTML and JavaScript. We aren't using JSTL a whole lot yet in the existing code, which is a bit surprising, but at least I didn't have to grasp that as well.

I'd forgotten how enjoyable JavaScript is. Ahhh, scripting. It's so nice at times to be running the code right inside the browser and JavaScript is great to work with. I can understand why Google is so up on it.

So all in all it was a positive first real experience with Struts. It was quite satisfying when I entered some stuff with known bad data and the page redisplayed properly with the error message and the original input data preserved.

I can't imagine how things were so messed up back in 2004 at Core that it was so painful, but I'm glad I missed that. Historically I always thought I liked being in the back end server side processing away from the user interface. However after doing some GUI work it was kind of good in a way. We had a nice separation of concerns on this project where I only had to get the JSP and navigation working and another guy who wasn't interested in the GUI side did the validation and business processing part. That back end part now seems less interesting than it might have in the past.

Saturday, May 24, 2008

Problems with auto increment and foreign keys

A while ago we came across a puzzling problem. We had a situation at a customer site where under some seemingly random circumstances some database records were not getting cleaned up as they should.

It was hard to reproduce the problem either on the customer side or in our own test sites. After some investigation we realized the problem. We had two tables, which I'll call "site" and "site_control". When a "site" is created the associated "site_control" is created at the same time. Both tables had integer id key fields which were both based on an auto increment number (which is a sequence in Oracle) starting at 1000.

Typically the id fields in the site and associated site_control records are the same because of the auto increment. The problem occurred because under certain unusual but valid circumstances they can get out of sync, such as when a site record is created but no site_control.

The defect was in our code and it had been in the code undetected for several years. When a site is to be deleted the code is supposed to call deleteSite(int siteId). Deleting a site also deletes site_control as designed due to foreign key referential integrity.

What we were doing was incorrectly passing the site_control id, calling deleteSite(siteControlId). Although incorrect, this still works most of the time since the site and site_control ids were usually the same. When they got out of sync, the problems occurred. The fix was pretty simple, just call deleteSite() using the correct siteId which we were holding.

This demonstrates one of the real problems with integer auto increment key fields. If the key for the wrong table is used in the code then the DB selects, inserts, updates, and deletes may still work or seem to work in terms of foreign key checks and such. However the code is wrong and it can lead to very difficult to diagnose errors and problems after the code ships.

If there are bugs in the code then we definitely want to find them as soon as possible, certainly before the code ships. The use of auto increment with possibly shared ids is dangerous as it can allow incorrect code to pass testing and thus for defects to escape detection until the code is in the field and the bugs become extremely expensive to address.

There are a few strategies to counter this. One approach which I like is to use GUIDs instead of numeric keys. That way every key is unique not just within tables but across all tables. So if the code is using the key from the wrong table then the foreign key constraint will fail and it will be obvious that there is a problem.

Another approach is to "stagger" your starting points for the auto increment ids. That is a good approach too because if there is an error it will very rarely just happen to work, since the parent and child tables virtually never have the same id values. Again if there is a code issue where it is using the id from the wrong table then it will fail immediately in developer or QA testing and the code defect will be fixed before the product ships.

So in your schema definition use an offset; even 50 would be plenty. So the site table ids would start at 1000, site_control at 1050, table_c at 1100, table_d at 1150, etc.
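
In Oracle terms the staggered sequences could look something like this [sequence names made up for illustration]:

create sequence site_seq start with 1000 increment by 1;
create sequence site_control_seq start with 1050 increment by 1;
create sequence table_c_seq start with 1100 increment by 1;
create sequence table_d_seq start with 1150 increment by 1;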

Friday, May 02, 2008

Save as PDF

A little while back I had to create a PDF version of an install guide for a maintenance patch release I was putting together.

I hadn't generated a PDF before. Historically the tech writers just took care of it in some mysterious process. Due to some company reorganization it now fell to the developer who was creating the code patch to also put together the PDF patch install guide.

The document was already done and ready to convert from Word. I asked a couple of developer coworkers about how to save as PDF. One of them pointed me in the direction of PDF 995.

It turns out this PDF 995 thing is a very handy utility. Install is fast and easy. It works as a printer driver, of all things. So when you click Print, PDF 995 comes up as a "Printer" in your printer list. That's very clever.

It works fine and it generates a nice clean glossy PDF just like you expect. Then you click Save As to save the PDF on your file system and you're done. It's very easy to use.

So +1 for PDF 995, a well designed product.

Wednesday, April 16, 2008

Changing jobs

I'm doing something that many people in high tech do. I'm leaving my job to join a new company. I've been with SupportSoft and its acquired predecessor Core Networks since 2001. I'm joining Research In Motion later this month.

For me a job switch is a big move. I've only done one before, going from xwave to Core Networks in 2001 after 3 years at xwave and its acquired predecessor PRIOR Data Sciences. So many acquisitions in tech. Some people jump all the time, around every two years on average. But that's not my style and I'm personally a little bit suspicious of serial jumpers.

It's pretty wacky how networking and chance events can lead to new things. Last Christmas break I went to an annual road hockey game. There I met a guy who I knew over the years and had worked with at PRIOR in the late 1990s. He's director level now. He was finishing up at his job after several years and was looking for a new one. We agreed to keep in touch over LinkedIn when he found a new job in case he was hiring additional people. As it turns out he landed at RIM in the first part of the year and that led to me getting interviewed. I don't normally go to road hockey games expecting a new job to come of it but it just shows what can happen.

I don't have anything bad to say about SupportSoft. The local job market has been pretty good since around 2004 so if I was unhappy then I could have chosen to leave at any time. This just looks like a promising opportunity which I believe is in my best interest to take.

Every job has its ups and downs. Good things and little annoyances. I think ex-employees mostly look back well on previous employers, even if they grumbled a bit when they worked there. I look back kindly on my time with PRIOR and xwave and I'll look back well on my seven years with Core and SupportSoft.

Saturday, April 05, 2008

Java GUI coming of age

Somewhat quietly the Java JDK GUI has improved. Especially since Java 5.

I'd kind of given up on Swing over the years. Everyone has bad memories of traditional Java GUIs. Battleship gray, clunky. Control-C didn't work on Windows; instead it used Unix semantics like Control-Insert for copy and paste. Select a piece of text and right click the mouse. Nothing happens.

If there was one thing that made Eclipse, it was developing SWT as a much superior end user experience for Java GUIs. With native widgets Eclipse was so much better. Suddenly Ctrl-C works properly on Windows. Select a piece of text and right click the mouse and the context menu comes up. Finally the GUI just looks and works the way we expect. At the time it came out SWT was the obvious and superior choice.

I have to give Sun credit. They didn't give up on JFC. They obviously worked hard to improve it. Now in Java 1.5 they pass the "10 foot test" for the first time. That is, standing 10 feet away from the computer, you can't tell that it's a Java GUI. It looks and acts much smoother now.



One of my favorite programs that I use most every day is Oracle SQL Developer. It uses the JDK GUI and it is just fine to work with. It looks and feels great. The standard keyboard and mouse actions all work the way you expect. It's plenty fast.

For the first time developers can consider using Swing for serious Java GUI applications.

Tuesday, March 18, 2008

The computer and the network

Social networks are interesting. They represent the evolution of the computer and network.

First there were PCs. Then came local networks with printer and file sharing and e-mail. At first the networks were there to extend the capabilities of the PC. They were called "computer networks", i.e. computers linked together. This changed over time and the network became more important. Instead of the network being useful to the PC, the PC's purpose was to enable a network.

This continued as the networks extended out to the Internet and Web by the late 1990s.

Today we are seeing the next step in the evolution. The underlying network is becoming less important. The social network is becoming more important. Instead of social networking being an application of the Internet network, the network's purpose is becoming to enable social networking.

Social networking like Facebook is interesting because the network on its own is just inanimate technology. Instead of connecting computers to other computers, social networks are about enabling people to connect with other people. It becomes less about the technology and more about people meeting and interacting.

Sunday, March 02, 2008

Application performance testing and optimization

I've been assigned to do performance testing for an upcoming application release. We made some architectural and database changes so we want to be sure our hardware estimation process is still valid. Also we want to try it out on the newer Sun T Series servers.

Along with measuring performance I can identify areas for performance improvement and optimize where possible.

Application performance is a bit like navigating through water of unknown depth. With a canoe you can paddle happily along on a shallow river. With software the canoe corresponds to one developer or tester clicking along through screens with very small data sets.

Moderate load is like navigating a 40 foot yacht. With the yacht there's more draft so if the water is very shallow then you'll run aground. The 40 foot boat would be like around 5-10 developers using the application at the same time with a modest size data set.

Heavy load is like the aircraft carrier. You need to be very deep to be able to handle this very large boat. Heavy load is when you simulate large numbers of simultaneous users and have a large data set. The load and data set size should be the same as what you plan to use in production using the same hardware.

With software, as with the waterway, you don't really know how it performs until you test it under load. You can't tell by looking at it. Taking a canoe or a small pleasure craft through a harbour does not tell you if the water is deep enough for a massive freighter.

Optimizing performance is fairly straightforward, especially at the start. It is an 80/20 situation: 80% of the resources are consumed by 20% of the features. So using tools like JProbe it is easy to find the hot spots. Typically optimizing the small number of trouble spots will dramatically improve performance and then you're done.

This can be frustrating to the programmer because the same optimization patterns can be applied throughout the code base but the other areas don't use enough resources to justify the investment required to refactor for performance.

Although after the first iterations, when the biggest resource hogs are dealt with, some of the other problem areas that were masked by the original hot spots become the new 80 in the 80/20 rule, and they can then be optimized in turn.

In most applications, the biggest performance issues are around the database. This can be caused by inefficient queries that the DBMS cannot execute quickly. It can also be caused by a poor indexing strategy (or no indexes!) on tables which have large data sets.

A good free tool for DB analysis is Oracle SQL Developer. I find you can learn a lot by just copying and pasting application queries into SQL Developer. In addition to seeing the execution time you can also get the explain plan in an excellent graphical view.

Any intermediate level or higher professional software developer should be aware of database execution plans and how to interpret them and optimize them. Even at the junior level a programmer should understand how indexes impact query performance in large datasets.
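
If you're not in a GUI tool, you can get the plan in plain SQL too. A minimal Oracle sketch, reusing the device table from the June post [dbms_xplan ships with Oracle 9i and later]:

explain plan for
select ipaddress from device where serial_number = 'serial1000';

select * from table(dbms_xplan.display);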

One of the many eternal performance headaches with EJB is static data from the DB that is requested often but changes infrequently or never. If a trip to the DB is required every single time then this redundant querying will consume a lot of resources and really slow down system performance and responsiveness.

While caching is the apparent answer to this, EJB and caching basically don't go together. We've had good success though using ehcache to deal with this shortcoming of EJB. I recommend ehcache based on my experience with it on this project.
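
As a rough sketch of the read-through pattern we use [class and cache names are made up; this assumes the ehcache 1.x API with a cache region defined in ehcache.xml]:

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

// Check the cache first and only fall through to the DB on a miss,
// so static data is fetched once instead of on every request.
public class StaticDataCache {
    private static final CacheManager MANAGER = CacheManager.create();

    public Object lookup(String key) {
        Cache cache = MANAGER.getCache("staticData"); // region from ehcache.xml
        Element element = cache.get(key);
        if (element == null) {
            Object value = loadFromDatabase(key); // the expensive DB trip
            cache.put(new Element(key, value));
            return value;
        }
        return element.getObjectValue();
    }

    private Object loadFromDatabase(String key) {
        // stand-in for the real EJB/DAO call
        return "value for " + key;
    }
}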

Wednesday, February 20, 2008

End of the Line for ComputerWorld Canada?

I just got my ComputerWorld Canada last week. It seems like it's been a while since the previous one. I noticed they have gotten thinner the last few months.

I think the January 2008 issue is the slimmest I've seen. It was only 18 pages. While the content is still good, the seeming gaps between issues and the slimming are a concern. I wonder if they will cease publication soon.

It would be too bad if that happens. I've been reading ComputerWorld Canada and its merged predecessors like InfoWorld Canada for as long as I've been in tech full time. They are useful for getting an idea of what's going on at a high level outside of your own company and projects. Although ComputerWorld Canada has always been more focused on corporate IT departments than on my roles in systems integration and development at an ISV.

I was getting electronic delivery for a while. Around the middle of last year I switched back to delivery of the printed issues and I've been reading them when they arrive.

The tech magazines have had it tough in recent years probably due to the rise of the Internet. It would be unfortunate if ComputerWorld Canada goes away but I guess I could find a site or feed with similar content easily enough on the Web.

Thursday, February 07, 2008

The cost of bugs in the field

It's definitely true that the cost of dealing with software bugs that make it into the field is orders of magnitude larger than finding and fixing them pre production.

After our recent reorg I'm now responsible for maintenance on earlier releases of the flagship product out of our office. On Monday I got a trouble ticket about a customer who was experiencing problems after upgrading between minor releases. They wisely trial the upgrades in their lab before going live.

The customer helpfully provided a detailed description of the symptoms, the log file excerpts, as well as a packet capture. The packet capture turned out to be particularly helpful. In the wire trace I could see we were unexpectedly returning an HTTP 500 response in a certain common valid configuration. Embedded in the HTTP 500 response was a stack trace generated by Tomcat. The stack trace pointed directly to the issue. It was a bug in our code introduced by the previous maintenance developer in the minor release.

Looking at the code diff from revision history against the stack trace, it was obvious what the error was. The code fix was just a couple of lines: add a null check the original developer missed and it would be good again. However getting that change "done" on a system installed at a customer site is a tremendous amount of effort.

First I had to set up my development environment for the earlier code base including Eclipse, the application server, database, Perforce, Tomcat and all of that. Task switching between releases is tedious and that burned pretty much a day. Reproducing the issue also required more than one Tomcat instance, so I couldn't just run everything off my own PC. It took a while to get a separate Tomcat up with the correct Tomcat version and the maintenance version of the application source code. Altogether it took more than a day just to get set up and reproduce the issue.

After reproducing the issue the actual code change only took a few minutes to implement. Then I had to install the fixed code and verify it was now working properly.

All done, right? With shipped code, far from it. Then I had to package up a new release using the official procedures. Then assemble a patch to upload to the customer, along with patch install instructions I had to write. The code fix also has to be merged to the other later releases that need it. The Wiki sites tracking all of this had to be updated.

In addition to my own time, which was several days for about a 3 line code change, there was the support rep in my company who had to manage the ticket and communicate with the customer, and who also had to update his own running site for that release. Additionally the customer lost a lot of time diagnosing this issue and now they have to lose more time doing the upgrade.

All in all more than one person week has been consumed by a code error that was 5 minutes work to correct. That's what happens when code bugs go into the wild.


Compare this to the cost of finding and fixing it earlier. If it had been detected by the original developer or a peer during code review it would have been about 5 minutes to fix.

If the developer had found it in his testing it would have cost around an hour to do the fix, rebuild, redeploy and rerun the test.

If the test team had found it, it would have been about half a day to do the fix, do another baseline build, update the ticket tracking system, have the testers rerun their test, and close the ticket.

So at every stage it gets progressively more expensive to fix serious bugs. That's why it's so important to find the bugs before they get into production.

Sunday, January 27, 2008

Sun T series servers

We've been measuring how our server software performs on Sun T series servers against the older V series.

The results with the T series have been very encouraging. For comparable servers the T series performance is around double the throughput of the V series. We're very pleased with this. This is good news because comparable T series cost less than V series and power consumption is lower. With Sun SPARC binary compatibility all the applications still run the same.

I'm very impressed with the T series. One component of our application runs on Tomcat. I wasn't sure how a busy Tomcat would run on a T1000 under a load test. It ran really well. I guess the JVM threading implementation does a good job of utilizing the T series multicore architecture.

The Sun T series is a very interesting architecture. Just one physical CPU with a clock speed in the modest range of 1-1.2 GHz. The multicore architecture, with many cores and independent execution threads on the single chip, is remarkable.

You can now try out advanced Sun technology like the T series servers for free using Sun's Try and Buy program.

I've worked with Sun hardware over the last decade and I've always been a fan. I hope Sun and Solaris can continue to innovate, prosper and stay around.

Friday, January 11, 2008

Private domain name registration

I bought a domain name from Yahoo domains recently. It was for a site I was helping my son set up. It's a basic setup. It just uses Blogger to post content. It's working out OK so far.

I didn't realize that you can now buy a domain name from within Blogger; knowing that, I might have done things differently. Still, Yahoo domains has a good user interface and it was interesting and not difficult to get it all set up.

One thing that was interesting about domains that I noticed from Yahoo and other providers I checked out like GoDaddy is this private registration option. When you register a domain you have to provide whois information about the domain owner. The whois information includes address, phone number, and e-mail. Registrars like Yahoo warn you that this is public information and spammers and other undesirables will be able to find you when it is posted.

The registrars have a service called private domain registration. They offer to substitute their own information in the whois so that you do not have to provide your own personal data. This is an extra cost service. For Yahoo, private registration costs almost as much as the domain name itself, so it's not really cheap.

I decided not to purchase private registration. First of all I realized that my personal information is public anyway. My phone number is listed and I can be looked up in canada411.ca and elsewhere easily enough. So there's nothing in the whois which isn't already knowable.

The other reason I didn't purchase it is that it is unnecessary. With Yahoo the whois stuff is just a form you fill in. It seems you can enter whatever you want into this form and just manually obscure your identity that way. There doesn't seem to be anything preventing this.

So my tip: save your money and pass on the private registration option, since you can just type whatever you want into the whois form anyway; you don't need to purchase separate private registration.