Apr. 16th, 2008

johnridley
We bought a lot of hardware last year in preparation for this year. It paid off.
The big thing was that we got our own NetApp filers, two of them configured as a metro cluster (in separate buildings, mirroring).

Previously we were sharing a metro cluster with a bunch of Windows boxes running Citrix and 10,000+ apps, using CIFS shares, which are much harder on the NetApp CPUs than the NFS we use. Those boxes hit peak utilization at exactly the same time we did, which meant we were hugely I/O bound almost the whole time we were getting crunched.

Last year we struggled with a peak day of 420,000 inbound tax returns. This year we ALMOST made it through a peak day of 680,000 returns without implementing ANY of the performance boosts we could have. Last year we had 8 inbound queue processors, shut down user-generated reports, and took several other measures to try to cut utilization, and we still had a 4 hour backlog at one point on the busiest day.

This year we were running a 2 to 4 SECOND backlog (as good as immediate) with the default config, except that during the peak 2 hours I had to start 2 more inbound queue processors when the backlog climbed steadily to 75 seconds. Once the extra processes were started, the delay dropped back to 2 seconds within a minute. Full services were maintained all day without breaking a sweat, and from the data we gathered, I think we could handle at least 3, possibly 4 or 5 times the load we saw this year without significant impact, just by tweaking a few things. We could do even more than that if we had to by doing some quick rearranging and possibly suspending a few non-priority services.
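For anyone who wants the "start 2 more processors at 75 seconds" call written down, here is a rough Python sketch of that rule. This is not our actual tooling; I made the decision by hand. The function name, the `max_processors` cap, and the starting count in the example are all made up for illustration. Only the 75 second threshold and the step of 2 come from what happened on the day.

```python
def processors_needed(backlog_seconds, running,
                      high_water=75.0, step=2, max_processors=16):
    """Return how many inbound queue processors should be running.

    high_water and step mirror the manual decision described above
    (75 s backlog, 2 extra processors). max_processors is a
    hypothetical safety cap, not a real limit from our systems.
    """
    if backlog_seconds >= high_water:
        # Backlog has reached the threshold: add processors, up to the cap.
        return min(running + step, max_processors)
    # Backlog is healthy: leave the current count alone.
    return running


# Example with an arbitrary starting count of 8 processors.
print(processors_needed(3.0, running=8))
print(processors_needed(75.0, running=8))
```

The sketch only ever scales up. That matches what I did: the extra processors stayed on for the rest of the peak rather than being shut down as soon as the backlog dropped.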

The important bit is that I added a TON of instrumentation this year and gathered a lot of data, and I intentionally kept the processing systems turned down so that we would hit their limits and we would really know what the capabilities were in real world situations. Simulating April 15 has turned out to be very difficult, but now we have real numbers.
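To give a flavor of what I mean by instrumentation, here is a minimal Python sketch of one way to measure queue backlog in seconds: track when each item was enqueued and report the age of the oldest one still waiting. This is an illustration under my own assumptions, not the code we run. The `BacklogMeter` class and its method names are hypothetical, and it assumes items are processed in the order they arrive.

```python
from collections import deque


class BacklogMeter:
    """Tracks enqueue times and reports the age of the oldest pending item."""

    def __init__(self):
        # Enqueue timestamps in seconds, oldest first (FIFO assumed).
        self._pending = deque()

    def enqueued(self, now):
        """Record that an item arrived at time `now`."""
        self._pending.append(now)

    def processed(self):
        """Record that the oldest pending item has been handled."""
        if self._pending:
            self._pending.popleft()

    def backlog_seconds(self, now):
        """Age of the oldest item still waiting, or 0.0 if the queue is empty."""
        if not self._pending:
            return 0.0
        return now - self._pending[0]


meter = BacklogMeter()
meter.enqueued(100.0)
meter.enqueued(101.0)
print(meter.backlog_seconds(104.0))  # oldest item arrived at 100.0
```

Passing the clock in as `now` instead of calling the system clock inside the class makes it easy to replay recorded timestamps afterward.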

I haven't heard of any huge meltdowns among our competitors, but I did get a little rumbling that one or two of them were really struggling to keep up. Nothing like what Intuit did last year though.

As I told a coworker last week, it's a good sign when it's April 10 and I'm working on next year's enhancements because I've got nothing better to do.

April 15 is always a pain because I am committed to watching the systems until at least 2AM, just in case something happens. The only thing that happened this year was the usual idiotic accountants who decided to pull some spectacularly stupid things on a few hundred returns, transmit them, and immediately go home so we couldn't contact them to have them fix it.
