Testing Has a Point of Diminishing Returns

I don’t imagine it’s a stunning revelation to say testing has a point of diminishing returns (which isn’t to say testing is not vital, even central to software development). This is what I’ve come up with based on my experience – hopefully someone out there can point me to more official research:

Note that there is a geometric progression here. The first few defects do not take very long to find. Then it gets harder and harder to find more defects. Bugs 1-10 might take you an hour. The 100th bug by itself might take you 10 hours.

Where exactly the point of diminishing returns lands on this curve depends on a bunch of things of course. First and foremost would be if there are any lives at stake. For example, if you’re testing body armor for the US military, you would think it would be quite a ways up and to the right. (Evidently though – it’s not.) In software, medical applications that could result in someone’s life being saved might qualify.

Next might be money issues, such as banking applications. You wouldn’t want to go live with a beta version of software that, for example, manages bank accounts, account transfers, and the like. (These people know that now.)

Most Web applications on the other hand are a little to the left of the far-right edge of the chart. No one is going to live or die if someone can or cannot post a MySpace comment. Ecommerce applications such as SoftSlate Commerce do have to deal with money, and so – especially those parts of it involved in payment transactions and the like – should rightfully take special attention.

But besides how mission-critical the code in question is, there are other factors:

  • Would defects in the code be easy to recover from? (Some defects come out of left field and take forever to fix, but most of the time, the developer has a gut feeling about what the nature of the defects would be and can make a judgement.)
  • Is the code isolated enough that it would not have cascading effects? (An argument for modular development.)
  • What is the client’s tolerance for risk on a human level? (Well, it matters!)
  • Would additional testing delay deploying the feature beyond an important date?
  • Would additional testing cause the feature to suffer from marketing amnesia? (In which case, maybe it shouldn’t have been developed in the first place!)
  • Are all hands going to be available if a defect is discovered? (There is a tendency to want to deploy big projects in the middle of the night, when there are as few users as possible. While that makes sense sometimes, we prefer to launch big features at the beginning of a regular workday, when everybody is around, alert, and ready to help. Definitely not on Friday at 5:30pm!)
  • Is the feature being launched with a lot of ballyhoo? (Prefer “soft launches” if you don’t have as much time to test.)

You might think it’s professional negligence to say all this about testing – that one should always strive for perfection, zero defects, in which case the above factors shouldn’t matter. But our time and our clients’ money are too valuable to waste on testing to an extreme. Yes, our clients deserve excellent software, but they also deserve us to be smart about how we achieve it. For example, a couple of the most cost-effective testing techniques we use regularly are parallel ops and automated functional testing. Those are good ways to catch more defects without going beyond the point of diminishing returns. With the time left over, we can make the software better in much more certain, tangible ways.

Posted in Deep Thoughts, Programming, Web Ops | 9 Comments

Automated Functional Testing with JMeter

JMeter bills itself more as a load testing tool, but we’ve used it extensively for functional testing as well. For us, functional testing amounts to pretending you are a (single) user hitting the application with various requests on the way to performing a task. For example, in an ecommerce application, that means adding a product to your cart, going through checkout, and placing an order. In other words, making sure the thing works the way it’s supposed to.

Testing of course is vital, if not central, to development. That said, there is a point of diminishing returns where it takes up more time than it’s worth. The line is crossed at different points depending on the situation: how mission critical the function is, what would happen if something goes wrong, the client’s tolerance for risk and instability, etc. We rarely do unit testing, which I know will cause conniptions in some. But we have to do functional testing. Someone has to sit down and see if the thing works! JMeter allows us to automate part of this process, making our testing more reliable and less time-consuming.

We have a “place an order” functional test set up to run on an hourly basis for every client we work with. We run it not only during development, but continuously on their live sites. The test script initiates HTTP requests for the client’s website, adds an item to its cart, and goes through the checkout process. Some clients have it skip the last step of checkout so an order is not actually placed, other clients have it actually place a small order. The script is tailored for each client based on their specific configuration (for example, how they have configured the flow of screens during checkout).

In addition, we have expanded the hourly script for some clients to test custom functions we’ve developed for them. In one case, the script logs in as an administrative user and goes through 90% of the functions we developed for them to assist in processing the order. For them this includes creating FedEx shipping labels, sending out emails, changing the order’s status, canceling the order, charging refunds, and so on.

In this light, the script operates as a continuous regression test of the system, not to mention a monitor that the site is up in the first place. If anything is introduced that breaks something, there’s a decent chance the script will detect it and email us. It also serves to let us know if any of the services we integrate with are down, such as FedEx’s web services, or our Authorize.net integration.

Our Basic, No-Frills Test Script

So I thought I’d share our generic SoftSlate Commerce JMeter functional test script. Feel free to run it. You will be placing a test order on our demo store, though, so please don’t go crazy and DoS us. (This is a functional test, not a load test.)

The best way to view it is to download JMeter, unzip it, and run

bin/jmeter &

on unix or

bin/jmeter.bat

on Windows. A GUI will pop up, and you can then open up the above .jmx file using File -> Open to look at it. The .jmx file is simply an XML file, which you can also edit directly in any other editor.

I will leave it up to you to explore the script and point you to the JMeter user manual for reference. But I should just point out a couple things. First, to run the script, it’s useful to highlight the “View Results Tree” so you can watch the action as it unfolds:

Second, note the two “Assertions” in the script. One checks to make sure the footer.jsp file was included in every request. That tells you the response got through to the bottom of the page, which is a good thing. The other tests to make sure no errors were displayed during the course of the test, also a good thing:

Automate Me Please

Now the fun part is to attach this test to a script and run it with a cron so you can rest easier that your website is up and running at all hours (or at least know when it wasn’t). I don’t have directions for doing this on a Windows platform, sorry. 90% of our clients are on some flavor of Linux. With that said, here’s a real basic shell script that will invoke the JMeter test script:

#!/bin/sh
# The trailing backslashes keep this a single command.
/usr/local/jmeter/bin/jmeter-dt -n \
    -t /usr/local/jmeter/scripts/softslate.com.jmx \
    -l /usr/local/jmeter/scripts/softslate.comLog.jtl

Essentially, you put the above into a file, make it executable, and put that file into /etc/cron.hourly, and you’re off and running with an hourly, automated functional test. For all the options, check out Non-GUI mode in the JMeter docs. In short, -n runs JMeter without the GUI (that is, as a plain command), -t points to the .jmx script, and -l points to the log file.

You might notice I tweaked the default jmeter command and named my version jmeter-dt. The only reason I did this was that by default the jmeter command’s memory parameters are too low for us. Here is a diff of those relevant changes:

[dtobey@centos5 bin]$ diff jmeter jmeter-dt
< HEAP="-Xms256m -Xmx256m"
---
> HEAP="-Xms16m -Xmx32m"
49c42
< NEW="-XX:NewSize=128m -XX:MaxNewSize=128m"
---
> NEW="-XX:NewSize=8m -XX:MaxNewSize=8m"
78c71
< PERM="-XX:PermSize=64m -XX:MaxPermSize=64m"
---
> PERM="-XX:PermSize=16m -XX:MaxPermSize=16m"
87c82
< java $JVM_ARGS $ARGS -jar `dirname $0`/ApacheJMeter.jar "$@"
---
> /usr/java/j2sdk/bin/java $ARGS -jar `dirname $0`/ApacheJMeter.jar "$@"

One more missing piece is email notification of all failures. Fortunately JMeter has a thing called a Mailer Visualizer. Setup is pretty easy; here is a screenshot:

(The above .jmx file has this in there, but just disabled.)

If I have one complaint about the Mailer Visualizer, it’s that it only sends you the URL of the page that failed, not anything about the Assertion that failed. Alas, JMeter is all open source, so I should just dive into the code and add the feature. Call it laziness, I don’t mind.

I hope that was helpful. Thanks for reading!

Update – Getting JMeter to Deal with HTTPS Requests

One thing I just noticed is I uncommented a couple lines in bin/jmeter.properties related to SSL. You may need to try this in order to get JMeter to handle making HTTPS requests. Here is a diff of the relevant changes:

[dtobey@centos5 bin]$ diff jmeter.properties jmeter.properties.orig
57c57
< ssl.provider=com.sun.net.ssl.internal.ssl.Provider
---
> #ssl.provider=com.sun.net.ssl.internal.ssl.Provider
63c63
< javax.net.ssl.keyStore=/usr/java/jdk1.5.0_02/jre/lib/security/cacerts
---
> #javax.net.ssl.keyStore=/usr/java/jdk1.5.0_02/jre/lib/security/cacerts
66c66
< javax.net.ssl.keyStorePassword=changeit
---
> #javax.net.ssl.keyStorePassword=changeit

Enjoy!

Posted in How To's, Programming, SoftSlate Commerce, Web Ops | 4 Comments

Apache mod_cache in the Real World

I thought I’d share our experiences implementing Apache’s mod_cache. We wanted to implement caching of product and category pages for the SoftSlate Commerce Java shopping cart application of one of our clients. The product and category pages of an ecommerce storefront don’t often change so they are good candidates for caching. If Apache can serve them directly it saves Tomcat from having to deal with them and things go much more smoothly under super-heavy load. We were already using Hibernate’s second-level cache to cache the database interaction, and believe me that helps tremendously but we wanted even faster responses. At times we have made static .html files manually from key pages like the home page and some key category pages. mod_cache seemed like a better way.

Typically we deploy SoftSlate Commerce under Tomcat where Apache serves the requests initially and hands them off to Tomcat using mod_jk or mod_proxy. Obviously having Apache in the mix in this way is a prerequisite for using mod_cache.

Here’s the config we ended up with:

<IfModule mod_cache.c>

# 300 = 5 minutes
CacheDefaultExpire 300

# With CacheIgnoreNoLastMod On set, we don't need to
# define expires headers for the pages to be cached by the
# server. And we don't want to because we'll want to control
# the cache on the server. We don't want browsers to cache.
CacheIgnoreNoLastMod On

# Ignore the query string - newsletter links have tracking
# info attached to them. We want to ignore those parameters.
# Take care if this is a store that has sorting enabled on
# category pages - this will also ignore the sorting parameters!
CacheIgnoreQueryString On

# Do not store Set-Cookie headers with the cache or you'll
# get session poisoning!!!
CacheIgnoreHeaders Set-Cookie
<IfModule mod_disk_cache.c>

# Must be writable by apache
CacheRoot /var/local/apache_cache
CacheEnable disk /product
CacheEnable disk /category
CacheDirLevels 1
CacheDirLength 1
</IfModule>
</IfModule>

Apache: Please Cache This, Browsers: Please Don’t

The Apache Cache Guide was a great help but it left open a lot of questions. First off, we wanted complete control over the cache from the Tomcat application on the server. We wanted to be able to signal Apache to refresh a certain page when critical information changed. And for this reason we did not want browsers to ever cache the pages. We wanted complete control.

As it turns out, CacheIgnoreNoLastMod On helped us with this, combined with not defining an Expires header for the pages. Normally mod_cache will not store a page unless the response carries something like a Last-Modified, ETag, or Expires header, and an Expires header is the usual way to tell it how long to cache the page. The problem is browsers also look at the Expires header and will cache the page themselves based on it. CacheIgnoreNoLastMod On tells mod_cache to cache the pages even when there is no Last-Modified header (and, in our case, no Expires header either), and CacheDefaultExpire then decides how long they stay in the cache. So we tell mod_cache, yes, cache this page, but we are not telling the browser to cache it. This is what we wanted because we wanted to maintain control of the cache on the server, within our Tomcat/SoftSlate Commerce code.

OK, Now, Apache, Refresh This Page Please

So, how to signal mod_cache to delete the page from the cache and refresh it? Well it turns out by default mod_cache does this any time it receives a request with headers like this:

cache-control: max-age=0

or these:

cache-control: no-cache
pragma: no-cache

The first example is what Firefox sends when you press CTRL-R to reload the page. The second is what Java’s HttpURLConnection class will send when you do setUseCaches(false). As far as I can tell they are equivalent. They have the effect of telling Apache to clear the page out of its cache and refresh it.

Yes, it’s true: Apache will refresh the page’s cache each time any user hits reload in his browser. And I know IE at least allows you to see the refreshed version with every click. As an aside, you ought to know there is a way to tell mod_cache to ignore the above headers, and serve the content from the cache always:

CacheIgnoreCacheControl On

But leaving this at the default Off value has the advantage of serving as our method of signaling Apache to refresh the cache for a given page from our Java code:

// Create the url based on SoftSlate's SEO settings
String code = "categoryCode";
String urlString = baseForm.getSettings().getValue("customerURL")
+ "/Category.do?code=" + code;
AppLinkTag apt = new AppLinkTag();
urlString = apt.createSEOURL(urlString, baseForm.getSettings());

// Send the request, but don't wait for the reply. useCaches=false
// to trigger Apache to refresh its cache
URL url = new URL(urlString);
HttpURLConnection con = (HttpURLConnection) url.openConnection();
con.setUseCaches(false);
con.setDoOutput(false);
con.setDoInput(true);
con.getInputStream();

There is our friend setUseCaches(false), which sets the headers telling Apache to refresh the page. I am no expert with HttpURLConnection but the above code seems to behave the way we want it to. It sends the request off without waiting for a response. So the application is not hung up in any way making the connection. Apache gets the request and knows to clear its cache and send the request along to Tomcat, and cache the new result. Danger: be careful that you don’t trigger this request recursively and end up with a nasty infinite loop of requests that trigger requests that trigger more requests!
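
For what it’s worth, here is a stripped-down, standalone sketch of the same refresh request with no SoftSlate classes involved. The class name CacheRefresher and the timeout values are made up for illustration. As far as I understand it, the call still has to wait for Apache to start answering, so the sketch adds connect and read timeouts to keep a slow server from tying up the calling thread. It also sets the no-cache headers explicitly rather than relying only on setUseCaches(false).

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Sketch only: CacheRefresher and its timeout values are hypothetical.
final class CacheRefresher {

    /** Asks the caching front end to refetch a page; returns the HTTP status, or -1 on failure. */
    static int refresh(String pageUrl) {
        HttpURLConnection con = null;
        try {
            con = (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
            con.setUseCaches(false);
            // Set the request headers explicitly as well, so the intent is visible.
            con.setRequestProperty("Cache-Control", "no-cache");
            con.setRequestProperty("Pragma", "no-cache");
            // Give up rather than hang if the server is slow or down.
            con.setConnectTimeout(2000);
            con.setReadTimeout(5000);
            // Sends the request and reads the status line; the body is never read.
            return con.getResponseCode();
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            // Bad URL, non-HTTP URL, or network trouble: report failure, don't throw.
            return -1;
        } finally {
            if (con != null) {
                con.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Pass the page URL on the command line; nothing is requested otherwise.
        if (args.length == 1) {
            System.out.println("status: " + refresh(args[0]));
        }
    }
}
```

If the application really must not wait at all, one option would be to hand this call to a background thread, but that is beyond what we did here.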

Bitten by the Cookie

Now you may be wondering about this in our configuration:

CacheIgnoreHeaders Set-Cookie

We started out without this configuration in place and boy did it bite us. It’s surprising to me mod_cache does not have this in place by default. What this is doing is telling Apache not to store the Set-Cookie header in the cache with the page’s content. You definitely do not want to do that in most typical web applications where session identifiers are defined via cookies! Doing so meant that if a user happened to request a page that had expired from the cache as the first hit of his session, his session ID was being stored with the cached page along with the page’s content. Anyone else hitting the page then gets the same cookie! Needless to say mayhem ensues as people get assigned the same session identifier. So, please, please consider adding this little line. I’m not sure when you wouldn’t want to add it. I mean really, how often would you want to send the exact same cookie to everyone?

Metrics, Please?

The last thing we wanted to do of course was to find out how many pages are being cached and how many times Apache is requesting it from Tomcat. (You kind of have to know how the thing is working for you.) First we added the Age header to our Apache logging configuration:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Age}o\"" combined

The above outputs “-” for the Age header if the page is not being served from the cache. Otherwise it outputs the number of seconds old the page is. With that in mind, a little grepping will show you how many cache hits there are for a particular day:

grep '21/Jul' access_log | grep -v '"-"$'

And a little more grepping will tell you how many misses there were:

grep '21/Jul' access_log | grep 'GET /category' | grep '"-"$'

In our case, 84% of requests to our first category page we tried it on were served from the cache. Pretty good, and it sure beats writing static files manually!
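
If you would rather get the ratio in one pass than compare two grep counts, the same check can be sketched in Java. This is only an illustration: the class CacheHitCounter and the sample log lines are invented, and it assumes the Age value is the last quoted field on each line, as in the LogFormat above.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: CacheHitCounter is a hypothetical helper, not part of any library.
final class CacheHitCounter {

    /** True if the last quoted field (the Age header) is a number rather than "-". */
    static boolean isCacheHit(String logLine) {
        String line = logLine.trim();
        if (!line.endsWith("\"")) {
            return false;
        }
        int open = line.lastIndexOf('"', line.length() - 2);
        if (open < 0) {
            return false;
        }
        String age = line.substring(open + 1, line.length() - 1);
        return !age.isEmpty() && age.chars().allMatch(c -> c >= '0' && c <= '9');
    }

    /** Fraction of GET requests under pathPrefix that were served from the cache. */
    static double hitRatio(List<String> lines, String pathPrefix) {
        long hits = 0;
        long total = 0;
        for (String line : lines) {
            if (!line.contains("\"GET " + pathPrefix)) {
                continue; // not a request we are measuring
            }
            total++;
            if (isCacheHit(line)) {
                hits++;
            }
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        // Invented sample lines in the format above.
        List<String> lines = Arrays.asList(
            "10.0.0.1 - - [21/Jul:10:00:00] \"GET /category/shirts HTTP/1.1\" 200 5120 \"-\" \"Mozilla/5.0\" \"42\"",
            "10.0.0.2 - - [21/Jul:10:00:05] \"GET /category/shirts HTTP/1.1\" 200 5120 \"-\" \"Mozilla/5.0\" \"-\"");
        System.out.println("hit ratio: " + hitRatio(lines, "/category"));
    }
}
```

Note that the first sample line has “-” for the referer but is still a hit, which is why only the last field is checked.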

Update – Caching the Home Page

After writing up the above, I had a bear of a time trying to figure out how to cache the home page so thought I would share how I did it in the end. First there is the issue that mod_cache’s regular configuration is not very easy to use when it comes to just caching the home page and not the entire website. When you do this, it caches the entire site:

CacheEnable disk /

Since we only want to cache the home page, category pages, and product pages, we have to fall back on using the no-cache environment variable, then unset that variable for the specific paths we want to cache:

CacheEnable disk /
SetEnv no-cache

# Cache the home page
<LocationMatch "^/$">
 UnsetEnv no-cache
</LocationMatch>

# Cache category pages
<Location /category>
 UnsetEnv no-cache
</Location>

# Cache product pages
<Location /product>
 UnsetEnv no-cache
</Location>

OK, so that’s in place now, but one problem – the home page was still not being cached. (The way I tell is by installing the Web Developer Firefox plug-in, and looking at the response headers. No Age header, no caching.)

Turns out Dr. Google tells me there is an issue with mod_cache and Apache’s DirectoryIndex directive. We had this in place in our configuration, to add index.jsp to the default list of files invoked on a request to a directory (such as the home page, or /):

DirectoryIndex index.html index.htm index.jsp index.php

I’m a little fuzzy but I believe the issue is Apache might cache index.jsp if you tell it to, but mod_cache would store the cache under the index.jsp path, rather than /, which is what we really wanted cached. I tried replacing the above with a RewriteRule directive but that had the effect of Apache spitting out the raw contents of index.jsp. It was not forwarding the request to Tomcat. The other wrinkle here was we’re using mod_jk, but the JkMount directives are apparently processed before RewriteRule. Alas, the eventual solution was to add a JkMount directive for / itself:

JkMount / ajp13

Bingo, now we have home-page caching, which is a good thing because about 17% of our client’s website hits are to their home page.

Soooooo, to summarize, here’s what we really ended up with for our overall mod_cache configuration:

# Comment out DirectoryIndex index.jsp!
# DirectoryIndex index.html index.htm index.jsp index.php

# URL patterns that Apache should hand off to Tomcat - add / so Apache
# forwards the home page to Tomcat (who already knows to use index.jsp).
JkMount / ajp13

...

<IfModule mod_cache.c>

# 300 = 5 minutes
CacheDefaultExpire 300

# With CacheIgnoreNoLastMod On set, we don't need to define expires headers
# for the pages to be cached by the server. And we don't want to because we'll
# want to control the cache on the server. We don't want browsers to cache.
CacheIgnoreNoLastMod On

# Ignore the query string - newsletter links have tracking info attached to them.
# We want to ignore those parameters. Take care if this is a store that has sorting
# enabled on category pages - this will also ignore the sorting parameters!
CacheIgnoreQueryString On

# Do not store Set-Cookie headers with the cache or you'll get session poisoning!!!
CacheIgnoreHeaders Set-Cookie
<IfModule mod_disk_cache.c>

# Must be writable by apache
CacheRoot /var/local/apache_cache
CacheDirLevels 1
CacheDirLength 1
CacheEnable disk /
SetEnv no-cache
<LocationMatch "^/$">
UnsetEnv no-cache
</LocationMatch>
<Location /category>
UnsetEnv no-cache
</Location>
<Location /product>
UnsetEnv no-cache
</Location>
</IfModule>
</IfModule>

Posted in How To's, SoftSlate Commerce, Web Ops | 3 Comments

Dave’s Lazy Testing Technique

I often find there’s an underlying assumption that when software breaks it’s because it wasn’t tested enough. Granted, there’s a lot of crappy, buggy software, but finding any one particular bug doesn’t signify that the app needed to be tested more. Testing is a huge expense and it has diminishing returns. To find 90% of the defects might take x amount of effort. To find 99% would take 10x. So you do have to draw the line and just push it out the door at some point. It’s actually as much a business decision about the tolerance you have for risk and the consequences of something going bad, as it is a technical decision.

What if there was a way to eliminate testing almost altogether, and yet know with virtual certainty there are no bugs? Best of all possible worlds right? In some circumstances, there is. I’ve been at this for 10 years, and I don’t know why it took me so long to find this technique. It doesn’t apply in every situation, but it does more often than you might think. I’ll explain by way of two real-world examples.

The New Foreign Key

We help develop a custom SoftSlate Commerce Java ecommerce website that records the customer’s selected shipping method along with each order. We wanted to move from storing the shipping method in a simple text field, to storing it with a foreign key pointing to a separate database table. Nice, useful refactoring. It occurred to me, why not deploy the project, but still use the text field for a little while afterwards? Populate the foreign key, but have the code still use the original field. We did this and after a period of time I was able to run a simple query to see if the new table was being populated and modified correctly. Sure enough, it revealed a bug, where the foreign key was not getting updated when it should have been. So I fixed that and waited. A couple weeks later I ran the same query. Everything matched up. Now I have a tremendous amount of confidence that I can use the foreign key and it will be 100% accurate. I used live usage and data to test my code in a manner that was extremely safe. Deploying now became a matter of updating the code to use my new foreign key instead of the clunky text field.

The New Initialization Function

In a very similar way, we needed to optimize the way a particular object was being initialized. The object represented a rental schedule, but it doesn’t really matter what it was used for. Essentially any bean whose state has to be initialized would qualify. Again, our plan was to leave the old way of initializing it alone, and add the new way. Any client code using the object would still use the old way, but in parallel, we’d initialize the schedule the new way as well, right alongside it. Simply add a toString() method to the object and write both objects’ string representations to the log. Let it run for a couple weeks, or however long you like until you’re satisfied, and then go back and check the logs. If the toString() output matches, you can be highly confident the new technique is working exactly as the old one was.

Possibly the commonality here is we were refactoring or optimizing existing code. In general terms, when you are replacing an existing function with a new, improved version that does the same thing, you should be able to use this technique. You do need to have a way to check the old technique against the new technique, either via logs or with a database query. Essentially you are using the live, running application to create your test cases and you are performing your assertions against these real-life test cases. Not only is it an extremely robust test, it doesn’t take any effort on your part. You let the application do your testing for you. Again, I don’t know why it didn’t occur to me years ago.
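
To make the idea concrete, here is a minimal sketch of the side-by-side pattern. It is not our actual code: ParallelRun is a made-up helper, and it collects mismatches in memory where a real application would write them to its log. Callers only ever see the old result. The new code runs alongside, its string representation is compared with the old one, and any failure in the new code is swallowed so live traffic is never affected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Sketch only: ParallelRun is a hypothetical helper for illustration.
final class ParallelRun<I, O> {
    private final Function<I, O> oldWay;
    private final Function<I, O> newWay;
    private final List<String> mismatches = Collections.synchronizedList(new ArrayList<String>());

    ParallelRun(Function<I, O> oldWay, Function<I, O> newWay) {
        this.oldWay = oldWay;
        this.newWay = newWay;
    }

    /** Always returns the old result; the new result is only compared and recorded. */
    O apply(I input) {
        O trusted = oldWay.apply(input);
        try {
            O candidate = newWay.apply(input);
            // Compare string representations, as with the toString() approach above.
            if (!String.valueOf(trusted).equals(String.valueOf(candidate))) {
                mismatches.add("input=" + input + " old=" + trusted + " new=" + candidate);
            }
        } catch (RuntimeException e) {
            // A bug in the new code must never break the live request.
            mismatches.add("input=" + input + " new way threw " + e);
        }
        return trusted;
    }

    /** Snapshot of every disagreement seen so far. */
    List<String> mismatches() {
        synchronized (mismatches) {
            return new ArrayList<>(mismatches);
        }
    }

    public static void main(String[] args) {
        // Toy stand-ins for the old and new initialization code.
        ParallelRun<Integer, String> run = new ParallelRun<>(
                days -> "schedule[days=" + days + "]",
                days -> "schedule[days=" + Math.max(days, 1) + "]");
        run.apply(7);
        run.apply(0); // the two versions disagree on this input
        for (String mismatch : run.mismatches()) {
            System.out.println(mismatch);
        }
    }
}
```

Once the mismatch list stays empty for as long as you care to wait, switching callers over to the new way is the only change left to deploy.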

Anyone ever heard of this technique and if so, what’s it called? If not, I hereby dub it “Dave’s Lazy Testing Method”.

Posted in Deep Thoughts, How To's, Programming, Web Ops | 6 Comments

Phone Number Formatting Fun

In my spare time, I enjoy researching how best to format phone numbers. That should give you some indication of the depths of the problems in my social life.

Seriously, how many different ways are there to write down a phone number? If truth be told, I don’t even know how I do it most of the time.

  • 800-555-1234 (50.8%)
  • 8005551234 (28.3%)
  • 800.555.1234 (5.1%)
  • 800 555 1234 (4.3%)
  • (800) 555-1234 (3.9%)

seem to be the most common ways.

The percentages are from the dataset of public website users for a small project I just had normalizing the phone numbers for one of my clients. In SoftSlate Commerce, we just provide one input box and the user can enter the number however he likes. And you’d be surprised at how some people do it:

  • 1-800-555-1234 (0.18%)
  • 18005551234 (0.16%)

and even

  • 800+555+1234 (0.01%)
  • 800*555*1234 (0.004%)

And this is just US/Canadian style numbers.

On a website, then, it would make sense to enforce a format by making the user type it in the same way as everyone else. Just have three input boxes:

But now you’ve got some annoying usability issues. If the user doesn’t know about tabbing to different form fields, you’re now asking him to pick up his mouse, point to the first input, fill it in, pick up his mouse, point to the second input, fill it in, pick up his mouse, point to the third, and fill it in. Surely that’s going to take easily twice as long as a single input box.

No problem you say, just add a little magic Javascript. When he’s finished entering the first three digits, automatically hop him over to the next input box, and ditto when he’s done entering the second box.

That’s great for users who don’t know about tabbing, but it sucks for users who do. I’m in this category and now whenever I’m faced with the three little input boxes, I hesitate. Is it going to hop me over to the next one automatically, or am I ok to tab? Heaven forbid it hops me over and I tab at the same time and skip right over the second box! Remember it’s the little things in life that contribute to high blood pressure and can shorten your life significantly (not really).

So now, I’m back on the one-input box bandwagon. Let them enter it however they wish. As this post explains, it’s better usability:

Be reasonable; are we so afraid of regular expressions that we can’t strip extraneous characters from a single input field? Let the users type their telephone numbers in whatever they please. We can use a little quick programming to filter out what we don’t need.

I discovered with this latest project that you can format the numbers after the fact using the following algorithm, and it correctly formats 99.6% of the numbers:

  1. Strip all non-digits.
  2. If the result is 10 digits, you’re set.
  3. If the result is 11 digits with an initial 1, strip the initial 1 and you’re set.
  4. If the result is anything else, go with what they entered originally because who knows what else is in there (probably an extension number or a little note like “(Home)”).
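
The four steps above can be sketched in a few lines of Java. The class and method names here are made up for illustration, and the sketch stops at the ten bare digits. How you then display them (dashes, dots, parentheses) is a separate choice.

```java
// Sketch only: PhoneNormalizer is a hypothetical name, not SoftSlate code.
final class PhoneNormalizer {

    /** Returns ten bare digits for US/Canadian-style input, otherwise the input unchanged. */
    static String normalize(String entered) {
        // Step 1: strip all non-digits with a plain loop.
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < entered.length(); i++) {
            char c = entered.charAt(i);
            if (c >= '0' && c <= '9') {
                digits.append(c);
            }
        }
        // Step 2: exactly 10 digits and we're set.
        if (digits.length() == 10) {
            return digits.toString();
        }
        // Step 3: 11 digits with an initial 1, so drop the 1.
        if (digits.length() == 11 && digits.charAt(0) == '1') {
            return digits.substring(1);
        }
        // Step 4: anything else, keep exactly what the user typed.
        return entered;
    }

    public static void main(String[] args) {
        String[] samples = {
            "800-555-1234", "(800) 555-1234", "1-800-555-1234",
            "800*555*1234", "800-555-1234 x12 (Home)"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + normalize(sample));
        }
    }
}
```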

No fuss, no muss, no fancy Perl regular expressions. Yeah, it’s not 100%. You have a better idea? Let’s hear it.

Posted in Deep Thoughts, Programming, UI Stuff | Leave a comment

Google Suffers from Home Page-itis

I had to look it up. It was Henry David Thoreau who said, “Simplify, simplify, simplify.” I always wondered why he said it three times. Would it have been simpler to say it just once?

Google broke the rule in a big way last week with their Background Image Fiasco. It was a warm, hot blast of not-simple blown right in their users’ faces. We all know about the ensuing rage from said users. This was tempered a tiny bit by a blog from someone trying to argue it wasn’t so bad – while the comments on that very blog seemed to confirm no, it really was horrible.

Here were my thoughts, moment by moment, when I first encountered a striking image of sand dunes in the background of google.com:

1. Moment one: WTF is this?
2. Moment two: Did I do this somehow accidentally myself, or are they doing this to me?
3. Moment three: This must be like one of their Doodles. If I hover over their logo, will they explain it to me?
4. Moment four: (Hovering.) No, they are not explaining it to me.
5. Moment five: How do I get rid of it? All I see is this link for “change background image” and I barely see that because of the background image.
6. Moment six: Google wouldn’t have done this, would they? Is this a hack from some third party playing a joke?
7. Moment seven: No way I’m going to click that “change background image” link. This whole thing is sketchy. I wouldn’t be surprised if this is spam.
8. Moment eight: What was I searching for again? Damn you, Google, why did you have to waste my time?

As you can see, I’m somewhat paranoid. (But shouldn’t I be?) I think I ended up doing my search in my Google toolbar out of fear that someone had hijacked google.com. I never clicked “change background image” and thankfully I didn’t have to because they took it off for me, and for everyone else. Later on I read a couple articles about it. It dawned on me that the epidemic that has infected website after website across the internet for years and years had now set in for real at google.com: I call it Home Page-itis.

Home Page-itis is the irrepressible urge to add crap you think is cool but no one else really cares about to your home page. I can’t count how many times I’ve seen it. In Google’s case, they started out with a world-famous design. Practically the epitome of “Simplify, simplify, simplify.” One input box, two buttons, that’s it. (Although I still have a beef about the “I’m Feeling Lucky” button. I mean really, who uses that?) But then came Home Page-itis. First it was the Doodles, which are somewhat cool, but, quite honestly, not worth the time. Then came Sidewiki. Sidewiki! I still refuse to learn what Sidewiki is. Something about making a comment on a web page that everyone else sees. Spare me. I do not come to Google to make comments on web pages that everyone else sees.

Then, the worst new feature of them all: they added animation to the links across the top to Google News, GMail, etc. The first time I came across that, I was literally lost. I wanted Google News, and the link was gone. I literally stared at my screen. Where did the link go? What is wrong with my computer? Am I going to have to do a Google search for Google News now? Why, oh, why, do I have to spend time thinking about this? Then I picked up my mouse, not even sure what I was going to do. Check my email or something, and presto! the links appeared. I clicked the News link in utter confusion as to what just happened. This went on for days before, after having to read an article about the feature, I learned that the links appear when you move your mouse on the page. I also learned they did this to make it look like their home page was … wait for it … simpler! Too bad it didn’t actually have the effect of making it simpler.

My guess is the people who run Google think that magically-appearing links are cool. My guess is they are wrong.

It’s not just Google of course. I see it all the time. More often, Home Page-itis takes the form of little sunbursts that link to announcements of something no one cares about. I think the scenario goes something like this:

Marketing person 1: Hey, did you see that great write up we got in Marketing Magazine?
Marketing person 2: Yeah! Hey, we should put that on our home page somewhere.
Marketing person 1: It’s not on our home page?! Oh my gosh, we have to get it up there now!

Whenever you add something to your home page, you should ask yourself, what am I going to take off the home page to make up for it? But that’s not nearly as fun as just adding more and more stuff. The result is that more and more content and headlines and links get on the home page. More often than not, the content is seen as very important, so it’s emphasized with big red letters and flashing animated GIFs.

The problem is, as each new link, or thumbnail, or product announcement, is emphasized on the home page, every other item on the home page is in turn deemphasized, to the point that it all becomes a mind-numbing, meaningless pile of clutter like GoDaddy.com. Which is what Google’s background images amounted to: mind-numbing clutter.

Users only have so much attention to give to your home page. It’s not nearly as much attention as you think. So please, please, be careful what you put there. And Google, yes, that goes for you, too.

Posted in Deep Thoughts, Rants, UI Stuff

In Defense of the Legacy Application

It was an exciting time. At the small web development firm I worked at, we got the go-ahead to rewrite from scratch one of our client’s websites. The task was to replace a Content Management System written in Cold Fusion that we inherited, with a new application of our own creation. Of course with Cold Fusion being *so* 1990s we aimed to use Java, Struts, Hibernate, and JSPs. It was one of those precious moments of my career when I was running with an open field in front of me and could really create something special soup-to-nuts. This new application would be both powerful and simple; elegant and flexible; able to toast bread and brew coffee.

Three months later when I stood back to see what we had done, I saw a complicated jumble of new technologies splattered with just as many ugly compromises and hidden pitfalls as the original.

It’s the easiest thing in the world to look at a legacy application, especially one that you did not create, and argue for it to be replaced. Most often, developers do this with a bit of attitude for good measure. (“Who would want to run that piece of crap technology?”) The egotism is interesting in these situations. At any given point in time most developers are going to have strong preferences about how to develop an application. Sometimes they can be outright ideological about the technology. So it is only natural that when presented with a legacy application that they didn’t write, they’ll demand a replacement written the “right” way.

Problem is, there is one thing the legacy application does that is often overlooked: it works. Granted, many legacy applications don’t work very well, and some don’t work at all in some areas. But if it is actually being used, it is working in some capacity and providing some benefit.

Part of what makes it work is the effort it took to develop features and fix bugs that are now taken for granted. Any application goes through a maturation process once users actually begin to use it. There is a huge gulf, most times, between the way an application works when first developed and the way the users actually need it or want it to work. No amount of up-front analysis can anticipate it fully. Once in the real world, bugs happen and adjustments have to be made. The amazing thing is how easy it is to forget all this effort, or even to count it against the application under the assumption that bugs won’t exist with a replacement. The exact same bugs may not exist, but other ones certainly will. The new application will have to start over with the process of maturing and stabilizing. What makes this worse is that users may have bad memories of the bugs, which gives the legacy application a bad reputation in their minds. Developers, too, will have bad memories of having to fix the bugs, especially if they inherited the application. Yet a bug that has been found and subsequently fixed should count in *favor* of the application, not against it, because it contributes to its maturity.

Another psychological factor at play is the siren call of new technologies, and the assumption that they will make life far easier than they really will. New technologies are often over-hyped and even if they are not, many folks think they will help dramatically. Improvements in technology are sometimes important, and can lead to some critical efficiencies. But in the world of software development, they are almost always only marginal improvements. Granted, over time they can add up. But to replace a reasonably modern legacy application with a brand new technology is often not worth it. Perl was one of the first languages used in Web applications. I can remember using it to do all sorts of crazy text parsing during one of my first professional projects, which was to convert a slew of data about books from my company’s proprietary format, to HTML. Looking back at that code, on the one hand, it is a bit of a mess, as many procedural programs become. But despite this, strangely, I am impressed by how easy it is simply to follow what happens from start to finish.

For developers, there should be a golden rule about replacing legacy applications: you aren’t allowed to say it should be replaced until you understand it. For one thing, you’re going to have to understand all its functions anyway, if you hope to replace them. Second, in going through the process of reviewing the legacy code, you are probably going to be surprised by how much it does. Your impressions of the application are almost certainly more simplified than the reality of it. And your estimates about how hard it will be to replace are almost certainly too low.

I’m making this argument for legacy applications with the full understanding that it’s fruitless. This is far too emotional an issue most times. Humans being what they are, the urge to throw something away and start from scratch is too strong. And the one thing that you can say about replacing a legacy application is that the developers who write the replacement are going to own it and be invested in it. And that’s certainly important. At least until they leave and the next developers come along. Give the new folks a few months. Just wait. Sooner or later, they’ll start arguing for the replacement to be replaced.

Posted in Deep Thoughts, Programming

I Want to Invent a Term

The term I want to invent is “Feature Amnesia”. I don’t know if I’m the true inventor, but I don’t see it anywhere after Googling, so I hereby call dibs. I’ve never invented a term before, so this is exciting.

You might know its more famous cousin, Feature Creep. I would guess Feature Creep is so common most software developers with more than a couple years of experience have probably seen it. When you start with a few features and as you go along discover you really, really need this, that, and the other feature too, you’ve got Feature Creep. If it ends up making the whole project late, bloated, and unenjoyable, you definitely have it.

Feature Amnesia is in some ways the opposite of Feature Creep. It happens when the people who ask for a feature forget about it. Most often, other priorities supplant it in their minds and it goes on the back burner, or no burner at all. If you’re lucky, it happens before you actually spend any time on it. If you’re not, it happens just as you deliver it.

I first became aware of the phenomenon while working at a company that sold consumer products over their website. The company was run and controlled by folks with a marketing background. (You could tell this by both the number of marketing people on staff and the size of their cubicles.) One day during a large meeting the marketing department explained their plan to target a certain audience in a new way, which would require some functional changes to the website. In the meeting we were talking about the idea and what would be required for IT to implement it. The project was presented to us as: (1) being very critical, (2) having support from the highest levels in the company, and (3) one we would need to pay special, urgent attention to in order to meet a very important deadline.

Sitting there I couldn’t help but think of the last meeting we’d had in the same conference room, with many of the same staff, to talk about another marketing idea. It was presented to us as (1) being very critical, (2) having support from the highest levels in the company, and (3) one we would need to pay special, urgent attention to in order to meet a very important deadline.

In fact, we were just finishing up that project and would be deploying it to the website soon. But there was no talk of it at this new meeting, no mention of its urgency. What had happened to that project?

Later I discovered it had died of Feature Amnesia.

In short, by the time the development of the feature was finished, the marketing staff had moved on. They had, almost literally, forgotten about it. It took repeated prompting to trigger the memory, at which point the response was, “Oh, yeah, we don’t need that anymore.”

Granted you have to have a pretty dysfunctional situation to see Feature Amnesia in full effect. In this case, the company’s management were entirely marketing focused, and really interested in trying all sorts of new, fresh ideas. That in and of itself is not a bad thing. Of course the problem was there was no follow-through. There really was a short memory for these ideas. And I think the atmosphere was one where the marketing staff were rewarded more for coming up with an interesting idea, than actually seeing the idea through to fruition.

If truth be told, there are plenty of times when developers can encourage Feature Amnesia and turn it to their advantage, so it’s not always the marketing staff who are at fault. Plenty of times developers stonewall or even ignore requests they don’t like for one reason or another, hoping that the person asking will forget about it in time. This happened plenty at the company mentioned above, and indeed, it may have created a climate where follow-through was seen as unimportant or fruitless. Of course one would hope each feature would get a fair shake, with misgivings aired in the open rather than ignored and forgotten in some sort of passive-aggressive dance. Sometimes Feature Amnesia is honest; sometimes it’s willful.

Posted in Deep Thoughts, Rants

Software Is Complicated

It may seem obvious that software is complicated. But whether you’re a programmer or not, it’s probably more complicated than you think it is. I recently reread Frederick Brooks’ The Mythical Man-Month, now over 30 years old but still remarkably relevant. Brooks warned about how easy it is to think software is not as complex as it really is, and the field of software development hasn’t changed much in this respect.

Imagining a piece of software and what it can do, and making it a reality, are of course two very different things. For programmers, Brooks warned about the tendency to think a piece of programming will be easy to do when thought about in the abstract—in his words, the tendency to assume “all will go well”. If a feature is easy to imagine, our instincts tell us, it can’t be too hard to implement. This tendency affects even the most experienced and cynical developers. You want to think it’ll be easy, everyone else wants you to tell them it’s easy, you can’t think of any concrete problems, and so you underestimate the complexity.

Non-technical folks struggle even more to grasp software’s complexity. In addition to suffering from the same instincts to believe “all will go well”, they don’t have the benefit of experience that comes with grappling with software first-hand. An entertaining thread on Slashdot lists a number of techniques techies have devised to explain software’s complexity to non-techies. My favorite is still the car analogy. Say a typical car has 10,000 parts that must fit together and work in harmony for the machine to run. SoftSlate Commerce, the Java shopping cart application I developed over the course of the last four years, has well over 50,000 lines of code, each of which must fit together and work in harmony for the application to run. Further, the vast majority of those lines of code are unique, whereas in a car, many basic parts such as nuts and bolts are duplicates. (In contrast, duplicate lines of software code are usually abstracted into a subroutine, eliminating the duplication.) I will grant there are plenty of issues involved in getting a car made in the physical world. But as a measure of complexity alone, I think the analogy is useful.

In addition, I suspect that for non-technical people, using popular desktop applications colors their impressions of software in general. If you’re asking for an enhancement to a custom application you use in your business, and your main experience with software is Microsoft Word, or Firefox, and so on, you’ll think of analogies to what you’re asking for in those applications. In your experience, the feature doesn’t seem too complicated, because you use something like it in these other applications and it works flawlessly (most of the time). But you may not realize that tens of thousands of hours were spent writing the applications you’re familiar with, hundreds of millions of dollars were spent on their budgets, and, yes, even the simplest features to use are extremely complicated.

Another insight Brooks touches on is the fact that as a software program gets larger, its complexity gets larger exponentially. The first 1000 lines of code may take two days to write. Lines 99,000 to 100,000 will surely take longer, since it’s likely they will affect a number of the program’s different aspects and introduce side effects that have to be anticipated and tested against. Certainly I’ve felt this effect and all its pain first hand. In one customization of SoftSlate Commerce, quite a number of custom features related to the core order processing have been added over the years. With each new requirement, we must now take into account how it might affect every single other feature, to ensure we don’t break something.
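A rough way to put numbers on this is to count how many pairs of features could possibly affect each other. The sketch below assumes, pessimistically, that any feature might interact with any other; the function names are made up for illustration. Strictly speaking this pairwise count grows quadratically rather than exponentially, but it is enough to show why the hundredth feature costs more to verify than the tenth.

```python
def potential_interactions(feature_count: int) -> int:
    """Number of distinct feature pairs that could affect each other."""
    return feature_count * (feature_count - 1) // 2


def new_checks_for_next_feature(existing_features: int) -> int:
    """Pairs a newly added feature forms with the features already present."""
    return (potential_interactions(existing_features + 1)
            - potential_interactions(existing_features))


if __name__ == "__main__":
    # Print: feature count, total pairs, extra pairs added by one more feature.
    for count in (5, 10, 50, 100):
        print(count, potential_interactions(count), new_checks_for_next_feature(count))
```

With 10 features there are 45 pairs to think about; with 100 there are 4,950, and the next feature alone adds 100 more.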

So software is complicated. So what do we do? There are a number of good things you can do to reduce the impact of software’s complexity, but nothing you can do to eliminate it. First and foremost is to employ modularization. If you could divide a software program into independent modules, you could eliminate the exponential effect described above, where new features take longer and longer to implement. In the Java world, the term “decoupling” identifies the same concept. If you could sit around writing completely independent pieces of code and then use a framework to put them together, your job would certainly be much simpler.

Problem is, you can never, ever achieve pure modular separation without sacrificing control. It’s just not possible; no framework is up to the task. In my early career, I worked a lot with the Miva Merchant shopping cart application, and I learned how to write Miva Merchant “modules”. Their framework was reasonably good, with a decent API for writing modules of different types, such as payment processors, shipping processors, and so on. Problem is, it wasn’t enough. Too often you wanted to hook into this screen over here to display something, and then this piece over there to process some new user input, and so on. A sort of uber-module came along called Open UI, which, when installed over Miva Merchant, provided you all sorts of these hook points to make more granular modules. Sure enough, all sorts of “Open UI” modules came along. The opposite problem soon surfaced, however, where the modules attempting to modify the same areas of the core application would interfere with each other. One module would say, “I want to add a new input box here”, and another module would say “No, I want to display a fancy table here instead”. It made the process simpler to an extent, but then a new category of particularly hellish and complicated disasters came along.
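To make that kind of collision concrete, here is a toy sketch of a hook registry. It is not the Miva Merchant or Open UI API; the class, the hook point names, and the module names are all hypothetical. It assumes each module either appends output at a hook point or replaces it outright, and it flags any hook point where a replacement collides with another module’s claim.

```python
from collections import defaultdict


class HookRegistry:
    """Toy registry: modules claim named hook points in the core application."""

    def __init__(self):
        # hook point name -> list of (module name, mode) claims
        self._claims = defaultdict(list)

    def register(self, hook_point, module, mode):
        """mode is 'append' (add output) or 'replace' (take over the hook point)."""
        if mode not in ("append", "replace"):
            raise ValueError(f"unknown mode: {mode}")
        self._claims[hook_point].append((module, mode))

    def conflicts(self):
        """Hook points where a 'replace' claim collides with any other claim."""
        found = {}
        for hook_point, claims in self._claims.items():
            has_replace = any(mode == "replace" for _, mode in claims)
            if has_replace and len(claims) > 1:
                found[hook_point] = [module for module, _ in claims]
        return found


registry = HookRegistry()
registry.register("basket_screen.footer", "gift_message", "append")   # "add an input box here"
registry.register("basket_screen.footer", "fancy_table", "replace")   # "no, show my table instead"
registry.register("checkout.header", "promo_banner", "append")

print(registry.conflicts())
```

A registry like this can at least detect the clash. Deciding which module wins is still a judgment call that no framework can make for you.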

One modern answer to modularization, Aspect Oriented Programming, I fear falls into the same category: it helps, but it doesn’t solve the problem by any means. In Java using Spring 2.0 or AspectJ, you can use an AOP framework and write “aspects” that can hook into the beginning and end of every method the application runs. It’s a great tool for modularization, but don’t believe for a second you won’t run into cases where one aspect interferes with another.
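Here is a small illustration of two aspects stepping on each other. It uses Python decorators as a stand-in for around-advice, not Spring or AspectJ syntax, and the auditing aspect, caching aspect, and `order_total` function are all invented for the example. Each aspect is reasonable on its own; the behavior changes depending only on which one ends up on the outside.

```python
import functools

audit_log = []


def audited(func):
    """'Aspect' that records every call that reaches the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args):
        audit_log.append((func.__name__, args))
        return func(*args)
    return wrapper


def cached(func):
    """'Aspect' that returns a stored result for arguments seen before."""
    results = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in results:
            results[args] = func(*args)
        return results[args]
    return wrapper


def order_total(order_id):
    return order_id * 10  # placeholder for a real lookup


# The same two aspects, applied in opposite orders.
audit_outside = audited(cached(order_total))
cache_outside = cached(audited(order_total))

audit_outside(7)
audit_outside(7)
calls_seen_audit_outside = len(audit_log)  # both calls pass through the audit wrapper

audit_log.clear()
cache_outside(7)
cache_outside(7)
calls_seen_cache_outside = len(audit_log)  # the second call is answered by the cache first

print(calls_seen_audit_outside, calls_seen_cache_outside)
```

With auditing on the outside, both calls are logged. With caching on the outside, the second call never reaches the audit wrapper, so the log silently misses it. Neither aspect has a bug; they just interfere.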

At this point you might say, screw it, I don’t want complicated software, so let’s just write software that has fewer features. Ah, yes, now you have hit on a true solution to the problem, but I’m not sure how satisfying it is: avoid complicated applications by refusing to develop features for them. Sure enough, someone’s already put this concept into practice, the web development firm 37signals, who claim their products “do less than the competition—intentionally.”

I have to admit it’s an intriguing approach. It’s true that too much software is “too complicated”, even if all of its features were deemed necessary at one point in time. Remember that the tendency is for the users asking for the features, and the programmers actually developing them, both to underestimate the complexity of software. When that’s the case, inevitably features that have no business being attempted are forced into an application. If folks would stop and really appreciate the complexity of what they’re contemplating, often common sense would convince them to just move on and try something else. Or to explore ways to work *with* the software, instead of asking the software to work *for* them.

Of course from my perspective refusing to develop features is a form of professional cop-out, and it certainly doesn’t help if your particular feature really is essential to your business, regardless of how much more complicated it makes the software. Joel Spolsky’s essay on simplicity is a good rebuttal of the 37signals credo. Would that we could all live in a world where reasonable users had reasonable needs, and asked programmers for reasonable features that made their applications only reasonably complicated. Alas, I don’t live in that world, not yet anyway. Maybe once I buy that boat, sell all my other belongings, and live the simple life on the open seas I will.

Posted in Deep Thoughts, Programming