Regarding "Seek Out Problems and Iterate", it's a bit of an understatement how i...

Dumblydorr · on Aug 19, 2019

Can you explain the 3pm on Tuesdays issue? My sister works for LLS and she said their servers get very slow at a precise time every Tuesday. Not saying it's the same bug, but what was the solution in your specific case?

codetrotter · on Aug 19, 2019

The next sentence suggested that the cause of the problem in this probably hypothetical situation might be “a cronjob kicking off and hogging disk IO for a few minutes”.

So in that case, I guess you could run the job at a lower priority and see if that helps, or run it more often so it doesn’t have to catch up all at once one time per week, or rewrite it so that it performs I/O in smaller chunks and sleeps for a little while in between reading or writing each chunk. Basically, do something so that you no longer have this one huge job consuming all of the I/O bandwidth for several minutes every week.
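For what it's worth, the "smaller chunks plus sleeps" idea could look roughly like this in Python. This is only a sketch: the function name `throttled_copy`, the 1 MiB chunk size and the 50 ms pause are made-up values for illustration, not anything from the original job.

```python
import time


def throttled_copy(src_path, dst_path, chunk_size=1024 * 1024, pause_seconds=0.05):
    """Copy src_path to dst_path in small chunks, sleeping between chunks.

    chunk_size and pause_seconds are illustrative defaults (assumptions);
    tune them against the disk and the workload you are trying to protect.
    Returns the number of bytes copied.
    """
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # end of file reached
            dst.write(chunk)
            total += len(chunk)
            # Yield the disk to other processes before the next chunk.
            time.sleep(pause_seconds)
    return total
```

The job takes longer overall, but it never saturates the disk for minutes at a stretch, which is the whole point.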

steventhedev · on Aug 19, 2019

I can't get into too much detail, but there were increased failure rates during a few jobs. In one case, we added ionice. In another it was a matter of adding a missing index to the DB (the query was doing a full table scan instead of looking only at records from the last week).

There was one periodic job that we moved from the production server to work off the daily backups instead of the live server.
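To illustrate the missing-index case in general terms (this is not the actual schema or query; the `records` table, its columns and the index name are all hypothetical), here is a small sketch using Python's built-in `sqlite3` module. It asks SQLite for the query plan before and after adding an index on the date column:

```python
import sqlite3

# In-memory database with a hypothetical table; names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, created_at TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO records (created_at, payload) VALUES (?, ?)",
    [(f"2019-08-{day:02d}", "x") for day in range(1, 29)],
)

# "Records from the last week" expressed as a range filter on created_at.
query = "SELECT id FROM records WHERE created_at >= ?"


def plan(connection, sql, params):
    """Return SQLite's query plan description for sql as a single string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)


before = plan(conn, query, ("2019-08-12",))
conn.execute("CREATE INDEX idx_records_created_at ON records (created_at)")
after = plan(conn, query, ("2019-08-12",))

print("before:", before)  # a full scan of the table
print("after: ", after)   # a search that uses idx_records_created_at
```

The exact wording of the plan differs between SQLite versions, and other databases have their own `EXPLAIN` syntax, but the check is the same idea: if the plan for a "last week only" query shows a scan of the whole table, an index on the filtered column is probably missing.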

tempguy9999 · on Aug 19, 2019

Database doing some housekeeping or a backup; a virus scan; perhaps an automated check for Windows updates (Patch Tuesday is the second Tuesday of each month, so probably not that); a completely separate task fighting the DB or another application layer your sister uses. Something else. Anything else.

It's not something anyone can diagnose from what you say. It could be anything, even weirdness such as a hardware fault kicked off by something else (office cleaner plugging something in?) causing a power spike and RF interference that affects the network and causes mass packet drops and retries (OK, unlikely, but not impossible; I've heard of such things).