Posted on May 12th, 2008
Over the weekend I ran into a problem with two users' interactive sessions taking up 100% of the system resources. Usually the 400 is very good at managing runaway processes, but these particular jobs caused the system to stop responding to all user requests, even new sign-on sessions. The console was still operational, and I was able to pull up WRKACTJOB and see the two jobs that were killing the system. I wanted to view the job logs before I killed the processes, but as soon as I selected option 5 to view one of the jobs, my console froze too. I was dead in the water. After 20 minutes of waiting for the console to do something, I caved in and performed a hard shutdown by holding the power button. That left me with 30 damaged data queues, which I had to recreate manually once the system came back up.
Lesson learned: put the jobs killing the system on hold before trying to diagnose them.
Posted on March 28th, 2008
I wanted to be more proactive when a job on our System i crashed, so I started looking into options for sending an email notification and an SMS text message whenever something needed attention. After searching for a bit I came across a program on code400.com that was very close to what I wanted. It is an RPGLE program that uses the QUSLJOB API to list every job in *MSGW status and then sends an email with the details of the job's error using SNDDST. After trying it out, I made a few modifications to better fit my needs. The program could only send to one email address, but I wanted it to go to an email distribution list on our Exchange server and also send a text message to my phone for after-hours support. The program also reported EVERY job in *MSGW, including printer alignment messages. With my very basic knowledge of RPG I was able to add multiple email address variables and exclude any job running in the QSPL subsystem.
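For anyone who doesn't read RPG, the filtering logic described above can be sketched in a few lines of Python. This is not the code400.com program and it doesn't call QUSLJOB or SNDDST. The `Job` record, its field names, the status string "MSGW", and both function names are made up for illustration. It only shows the two changes I made: skip anything in QSPL, and send each alert to more than one address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical stand-in for one entry returned by a job listing."""
    name: str
    user: str
    number: str
    subsystem: str
    status: str  # assumed to hold values such as "MSGW" or "RUN"


def jobs_needing_attention(jobs, excluded_subsystems=("QSPL",)):
    """Keep jobs waiting on a message, skipping excluded subsystems."""
    return [
        job for job in jobs
        if job.status == "MSGW" and job.subsystem not in excluded_subsystems
    ]


def build_notifications(jobs, recipients):
    """Pair every flagged job with every recipient address."""
    return [
        (
            recipient,
            f"Job {job.number}/{job.user}/{job.name} in {job.subsystem} "
            "is waiting on a message",
        )
        for job in jobs
        for recipient in recipients
    ]
```

Passing the subsystem list as a parameter means other noisy subsystems could be excluded later without touching the filter itself.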
Once the program ran to my liking, I wrote a simple CL program that calls it every 5 minutes and added the CL as an autostart job entry in our ITDEPT subsystem. I've had this configuration running for the past 6 months, and it's a great way to stay on top of any jobs that may be holding up QBATCH. Critical system jobs are usually fixed before end users even know something was broken.
Download the code here. The two variables for email are 'emailaddress' and 'emailaddress1'. Simply change the variables to the email addresses you would like the messages sent to and compile.
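The CL wrapper itself is nothing more than a loop: run the check, wait 5 minutes, repeat. Here is a rough Python equivalent of that idea, not the CL I actually use. The name `run_monitor` and its parameters are hypothetical, and `check` stands in for whatever does the real work of finding and reporting stuck jobs.

```python
import time


def run_monitor(check, interval_seconds=300, sleep=time.sleep, max_cycles=None):
    """Call check() repeatedly, pausing interval_seconds between calls.

    max_cycles=None loops forever, the way an autostart job would.
    The sleep function is injectable so the loop can be tried out
    without actually waiting.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        check()
        cycles += 1
        sleep(interval_seconds)
    return cycles
```

A dry run such as `run_monitor(lambda: print("checking"), max_cycles=2, sleep=lambda s: None)` runs the check twice and returns immediately.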