MySQL has a nifty little clause called limit. When used in select statements you can limit the number of rows that come back in the result set. It has an optional parameter, offset, that lets you skip X number of rows in the result.
I love MySQL, not just because of its cost, or the fact that it is open source, or the entrepreneurial drive behind it, but because it works. I finally got sick of MS SQL Server few years ago when our application had so many deadlock problems and a consultant from Microsoft told us after a week of going through our app that SQL Server was not meant to be used in the “this fashion”. I will spare you the details of what “this fashion” is but suffice to say it was not “dumb code” or bad architecture. It was things that should’ve just worked.
Any how, so I use MySQL for all my applications now, and I write plenty of hardcore applications. The amount of work and number of users I can support with a single machine is mind boggling for those used to setting up server clusters and farms to make things responsive in MS SQL or Oracle environments.
Limit is one of those MySQL clauses that I use a lot. Most often I use it to setup paging, instead of retrieving a complete result set and then looping in code to skip to the required page, like many Grid Views do by default, I just retrieve the specific set of rows in my page and have MySQL do the heavy lifting for me.
Another use of Limit is when i want to throttle processing. For example, one of my systems sends out email notifications for our work flow. We have thousands of active users on our system and thus end up with thousands of notification emails going out every day. Since our servers do so much, I throttle the emails, sending a batch every X number of minutes. There is a bunch of logic to process email priorities and retries. To throttle the emails, I just grab the top X number of emails in each queue, but using the Limit Clause.
If the email succeeds, then I mark the message as successful. If it fails then I retry it once, and then retry it twice, before finally deeming it as failed. This all worked well for a long time, till all of the sudden we started to realize that some important emails were not being sent out. Many of our emails are very time critical per the nature of the business. I started to look into the queue and realized that some other programmers were taking the liberty of putting regular notification messages in the highest priority queue. So I initially thought that we were just clogged with too many emails in the queue. I moved the messages to a different queue and everything ran well for a day or two, and then we started to miss important emails again.
Further investigation shows that it was a comedy of errors that results in this scenario. As I said, we try to send our emails couple of times before we mark it as failed. The general logic is like this:
Setup the Mail subject, recipient and carbon copies.
Try to send the message.
If success then mark the message as sent.
If Exception then mark the message as retry, or failed, if it had been retried multiple times.
All looks good to me. But why is the email system getting clogged. Turns out that somebody set up their email address in the interface by copying/pasting from another application. When he pasted, a Carriage Return character came along for the ride. So the email address was saved with a CR at the end of it.
When trying to setup the email in the code above, we setup the subject and recipients outside of the Try/Catch. So every time we setup the recipient, we had an exception because his was not a valid email address. We do a sanity check on email addresses, but a trailing CR is not one of them.
Since the exception triggers before our Try/Catch block, the email message is not marked for failure. The external calling function then catches the exception, logs it and moves on with life. But the message remains in the queue as if it has never been processed before.
Since we throttle the email processing, using the Limit clause, we only get X number of messages to process per run. When we had the first email for this person, his message kept coming back every time, and it was never processed due to the issue above, so we only got to process X-1 messages per run. When he got another message in the queue, we ended up processing X-2 per run. Every time we got X messages to process, two of them were his, they failed without being marked as so, and thus took two spots in the queue.
Eventually he had enough messages queued that all what we were getting every time was his messages. So we never processed any more emails from that queue. When dealing with the immediate problem, I initially thought that we simply got so busy we needed a bigger queue, bigger fire hose if we will. So I upped the X dramatically and emails started to flow again.
Once the immediate problem was resolved I moved on with life. But that night, I woke up in the middle of the night with a nagging feeling. I said that the calling function was catching the exception that we failed to catch in the mail send routine and logging the exception. I saw the messages in the logs but did not realize what was going on. I just assumed that indeed we had people who put in a bad email address and forgot about it. But at night, that log file kept showing up in my dreams. I could not go to sleep and let it go.
So, I got up, fired up my computer and started going through everything with a fine comb again. And that’s where I realized what’s going on. It was not really the fault of the “limit” clause, which had served us very well thus far in scheduling our resources. It was a combination of things that got together at the “perfect time” and caused a perfect storm. Or, a Murphy’s Law moment as I like to call them.
We missed few emails, but we had managed to use a band-aid for the immediate problem, and then we had the real fix in within 12 hours. Now I sanitize the email addresses more vigilantly and we put the whole email setup in the inner Try/Catch block. If the message fails for any reason, we mark it as so, and do not let it clog our system.
This sort of thing happens all the time. In the worlds of Speculating, dumb people call it a Six Sigma event, or more recently A Black Swan event. What they try to say is that, per their analysis, such a thing would happen only once every million years. They were so unlucky that this once happened on their watch and thus their Hedge Fund blew up, or their CDS portfolio crashed.
Truth of the matter is, in programming, much like in speculating, there are no Black Swans. There are no Six Sigma Events. These things happen all the time. You can either understand that, take responsibility and manage your risk. Or you can destroy other people’s fortunes and lose Other People’s Money because you refused to manage risk and instead blame the occurrence on a once in a million years event.
Most people that read Taleb’s books, The Black Swan, and Fooled By Randomness, completely and utterly miss the points of his books. I am sure he finds it humorous that so many people waste their time reading his books yet completely missing the point. Most people believe Taleb is talking about Black Swan Events. They do not realize that he is telling us that all those events that we mismanaged risk for were NOT BLACK SWANS. His point is that we let ordinary, and even frequent, events crash us completely due to lack of proper risk management. He is not describing Black Swans… He is pointing out that these events are indeed not black swans.