A while ago we came across a puzzling problem. We had a situation at a customer site where under some seemingly random circumstances some database records were not getting cleaned up as they should.
It was hard to reproduce the problem either on the customer side or in our own test sites. After some investigation we realized the problem. We had two tables, which I'll call "site" and "site_control". When a "site" is created the associated "site_control" is created at the same time. Both tables had integer id key fields which were both based on an auto increment number (which is a sequence in Oracle) starting at 1000.
Typically the id fields in the site and associated site_control records are the same because of the auto increment. The problem occurred when in certain unusual circumstances they can get out of sync in valid ways where a site record is created but not site_control.
The defect was in our code and it had been in the code undetected for several years. When a site is to be deleted the code is supposed to call deleteSite(int siteId). Deleting a site also deletes site_control as designed due to foreign key referential integrity.
What we were doing was incorrectly passing the site_control id, calling deleteSite(siteControlId). Although incorrect, this still works most of the time since the site and site_control id's were the same. When they got out of sync, the problems occurred. The fix was pretty simple, just call deleteSite() using the correct siteId which we were holding.
This demonstrates one of the real problems with integer auto increment key fields. If the key for the wrong table is used in the code then the DB selects, inserts, updates, and deletes, may still work or seem to work in terms of foreign key checks and such. However the code is wrong and it can lead to very difficult to diagnose errors and problems after the code ships.
If there are bugs in the code then we definitely want to find them as soon as possible, certainly before the code ships. The use of auto increment with possibly shared ids is dangerous as it can allow incorrect code to pass testing and thus for defects to escape detection until the code is in the field and the bugs become extremely expensive to address.
There are a few strategies to counter this. One approach which I like is to use GUID instead of numeric keys. That way every key is unique not just within tables but across all tables. So if the code is using they key from the wrong table then the foreign key constraint will fail and it will be obvious that there is a problem.
Another approach is to "stagger" your starting point with the auto increment ids. That is a good approach too because if there is an error that it will very rarely just happen to work since the parent and child tables virtually never have the same id values. Again if there is a code issue where it is using the id from the wrong table then it will fail immediately in developer or QA testing and the code defect will be fixed before the product ships.
So in your schema definition use an offset, even 50 would be plenty. So the site table ids would start at 1000, site_control at 1050, table_c at 1100, table_d 1150, etc.
No comments:
Post a Comment