This month the T-SQL Tuesday invitation is from Rick Krueger (B|T), who want us to dust out the closet and tell how how we escaped from the jungles of Borneo, swam across the vast Indian ocean and saved the tiger with nothing but a few clips and rubber band a.k.a Rube Goldberg Machine.
While working for a Big blue box electronic retailer, we had to support a few SQL 2000 instances. And we had to figure out a way to accurately determine if certain jobs had finished or were still executing. There were other jobs that needed to be put on hold until the job of interest was executing.
The constraints placed were:
- Conclusively determine if the job has started execution.
- Without the use of undocumented features we should be able to determine the current execution state of the job.
- Once a job’s current state is determined, we should wait until the state changes.
- Depending on the execution state, we should decide either to wait for a further delta minutes and check the state or determine if the job finished successfully.
- Return the final result to the calling code block.
This is how I went about it:
- Wait for a delay of 1 minute before even checking.
- Checking if we actually had a running job, was done in a roundabout way. The last step_name of the previous execution of the job always contains ‘(Job outcome)’ in sysjobhistory table. Because of this, we can pick an instance_id of this job that has an instance_id greater than the MAX(instance_id) when the step_name was ‘(Job outcome)’.
- I had to BCP the results of sp_help_job to a file because I kept getting ‘Error Message:Msg 8164, Level 16, State 1, Procedure InsertIntoSp, Line 10’ – “An INSERT EXEC statement cannot be nested.”
- Use BULK INSERT to load data into a temp table.
- Check if the execution status of the job was 4 i.e. waiting for next scheduled execution.
- If the status was not 4, then wait for a minute and then repeat steps 1-5.
- If the status was 4 then move to next stage.
- If we find that the job completed successfully then validate with sysjobhistory and check if the job really succeeded. Didn’t want false positivies.
- If we did not find a job executing on Step 2., return a non-zero value.
This whole thing, was put in a store procedure and then called from a job step for a long running job. For example, like a database backup job. And additional logic was written around the return value to write proper error numbers to application log. These event-ids where then picked up by monitoring applications and alerts where raised.
I have mixed feelings about this solution. Although, it worked without a problem (I am secretly proud that I was able to write something like this); I would have loved to just upgrade the servers to SQL 2008 and be done with it.