Designing command-line applications for scheduled execution
Command-line applications are still running among us. They are the backbone of large corporations. They are the faithful workers that convert raw data into beautiful reports while we are asleep. They are the tireless gatekeepers that constantly check whether our mission-critical servers are alive and notify us when they detect any anomalies.
Part of my job is to ensure that the scheduled jobs in my department run smoothly and, when they don't, to run them manually to get the necessary output for the day and make sure that they continue to run automatically on subsequent days. These scheduled jobs consist of running command-line applications hourly, daily or monthly.
To anticipate future needs, I spent some time coming up with some guidelines on how I will design command-line applications meant for scheduled execution.
Make the command-line application non-interactive
A non-interactive command-line application does not ask for inputs after it is executed. Any information that such applications need will be made known at the point of execution.
For instance, that command-line application that churns out electronic statements for all customers of the bank will know which database to extract data from and which month to base the statements on, when it is executed by the scheduler.
Well, perhaps we can write another program to interact with interactive command-line applications, but why make things so difficult?
Keeping the command-line application non-interactive reduces the programming effort required.
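As a minimal sketch of this idea in Python, the standard argparse module can take everything the application needs from the command line. The option names --database and --month are assumptions made up for illustration; they are not taken from any real statement generator.

```python
import argparse


def parse_args(argv=None):
    """Read every input from the command line; never prompt the user."""
    parser = argparse.ArgumentParser(
        description="Generate electronic statements (illustrative sketch)."
    )
    # Hypothetical options: which database to read and which month to cover.
    parser.add_argument("--database", required=True,
                        help="database to extract data from")
    parser.add_argument("--month", required=True,
                        help="statement month in YYYY-MM format")
    return parser.parse_args(argv)


# The scheduler would supply the same values on the command line.
args = parse_args(["--database", "statements.db", "--month", "2012-05"])
print(args.database, args.month)
```

Because both options are required, a run that is missing either one fails immediately with a usage message instead of waiting for input that will never come.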
Do the work, record the outcome, terminate
Our command-line application should follow three major steps in its life cycle.
Doing the work
The work performed by our command-line application should be made up of code that anticipates error conditions, so that it can perform as much work as possible. For instance, that same command-line application should continue to generate statements for the other customers even when it has failed to generate the electronic statement for one of them.
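One way to sketch this in Python is to wrap the work for each customer in its own try block and collect the failures instead of stopping at the first one. The names generate_all and demo_generate are hypothetical, and demo_generate merely stands in for the real statement generation.

```python
def generate_all(customers, generate_statement):
    """Try every customer; collect failures instead of stopping early."""
    failures = []
    for customer in customers:
        try:
            generate_statement(customer)
        except Exception as error:
            # One bad customer must not stop the rest of the run.
            failures.append((customer, str(error)))
    return failures


def demo_generate(customer):
    # Hypothetical stand-in for the real work; it fails for one customer.
    if customer == "bob":
        raise ValueError("no transactions found")


failures = generate_all(["alice", "bob", "carol"], demo_generate)
print(failures)
```

The returned list is what the later steps need: it can be written to the log and used to decide the exit value.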
In the event that crucial resources cannot be accessed, the application should also try to access them a couple of times before it declares its execution run a failure.
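A simple retry helper along these lines might look as follows. This is only a sketch: the function name with_retries, the default of three attempts and the five-second pause are all assumptions, and a real application would pick values that suit the resource in question.

```python
import time


def with_retries(action, attempts=3, delay_seconds=5.0):
    """Call action() up to `attempts` times before giving up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError as error:  # e.g. an unreachable file share or socket
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)  # wait a little before retrying
    # Every attempt failed: let the caller declare the run a failure.
    raise last_error
```

Only OSError is retried here because it covers most file and network problems in Python; errors that a retry cannot fix, such as a programming mistake, are allowed to surface immediately.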
The probability of a network failure is much higher than that of a local hard disk failure. Hence, our command-line application should also avoid writing any output directly across the network. Any output should be generated on the local hard disk first before it is copied to another location on the network. We certainly do not want to waste a night's worth of effort just because that shared drive was not reachable for a couple of hours in the night.
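The write-locally-then-copy idea can be sketched in Python as below. The function name write_then_copy and its parameters are hypothetical; remote_dir stands for whatever network location the output finally needs to reach.

```python
import shutil
from pathlib import Path


def write_then_copy(content, filename, local_dir, remote_dir):
    """Write the output locally first, then copy it to the network location."""
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    local_path = local_dir / filename
    # Step 1: the expensive result is safely on the local disk.
    local_path.write_text(content, encoding="utf-8")
    # Step 2: copying may fail if the share is unreachable, but the local
    # file survives, so only the copy needs to be repeated later.
    shutil.copy2(local_path, Path(remote_dir) / filename)
    return local_path
```

If the copy raises an error, the night's work is still sitting in local_dir and can be copied again once the shared drive is back.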
Recording the outcome
The application should record the outcomes of its execution steps in a file located close to its executing binary, preferably a text file. This is to facilitate debugging when a failed execution is detected by first-level support. We should not need to open up the code to know what went wrong with the previous run of our command-line application. A text file can be opened quickly, and we do not even have to write another application to read its contents.
Each outcome trace should be accompanied by a when, a what and a why. A timestamp tells us when the application encountered a particular error. A what highlights the point in the code execution at which our command-line application encountered the error. A why gives us the reason our command-line application failed.
An example of an outcome trace is as follows:
[2012-06-03 03:33:33] Could not access the database > Invalid user name / password combination
From the above outcome trace, we know that on 3rd June 2012 at around 3:33:33am (when), our command-line application was trying to access the database but could not gain access (what). The reason was that the database could not authenticate our command-line application based on the username and password that were supplied (why). Hmm, perhaps somebody has changed the password to the database. Let's give our database administrator a call to resolve this issue right away.
Should we log successful outcomes? Yes, because we need to know whether our application has run according to schedule. However, we may not want to provide a why in these outcome traces, to avoid cluttering our log files.
How about those one or two failed attempts to upload the generated files to the FTP server before a successful upload? Log them, as they can be valuable signals for us to decide whether we should upgrade our system infrastructure.
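The trace format described in this section could be produced with a couple of small Python functions. This is a sketch only: the names format_trace and append_trace are made up for illustration, and the why is optional so that successful outcomes can be logged without one.

```python
from datetime import datetime


def format_trace(what, why=None, when=None):
    """Build one trace line: [when] what > why."""
    when = when or datetime.now()
    line = f"[{when.strftime('%Y-%m-%d %H:%M:%S')}] {what}"
    if why:  # successful outcomes can leave the why out
        line += f" > {why}"
    return line


def append_trace(log_path, what, why=None, when=None):
    """Append one trace line to a plain text log file."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(format_trace(what, why, when) + "\n")


print(format_trace("Could not access the database",
                   "Invalid user name / password combination",
                   datetime(2012, 6, 3, 3, 33, 33)))
```

Opening the file in append mode for every line keeps earlier traces intact even if the application dies halfway through a run.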
Terminating
Our command-line application should always terminate at some point in time so that it doesn't hog system resources and prevent other scheduled applications from running properly. Also, by convention, our application should return a non-zero value when it does not run successfully. What is a successful run? An application is considered to have run successfully only if it does not encounter any errors in any of its execution steps. Can't get the electronic statement for just one customer? Sorry, we have to return a value other than zero.
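In Python, the exit value is set with sys.exit. The sketch below assumes that the work is expressed as a list of callables; the function names run and main, and the placeholder steps, are hypothetical.

```python
import sys


def run(steps):
    """Run every step and return the process exit value."""
    errors = 0
    for step in steps:
        try:
            step()
        except Exception:
            errors += 1  # keep going, but remember that something failed
    # Zero only when every single step succeeded.
    return 0 if errors == 0 else 1


def main():
    # Hypothetical steps; a real application would generate statements here.
    steps = [lambda: None, lambda: None]
    sys.exit(run(steps))
```

A scheduler or a calling shell script can then check the exit value of main to tell a clean run from one that needs attention.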
Ability to terminate when a previous run was unsuccessful
Earlier, I mentioned that our command-line application should try to perform as much work as possible. However, this guideline depends on the nature of the work to be done.
For command-line applications like our electronic statement generator, the PDF reports for different customers are independent of one another, so a failure for one customer does not affect the others.
However, there are applications whose action steps are incremental in nature. An example would be an application that synchronizes the data set of system A to system B. Because both systems contain terabytes of data, it is not efficient to perform a full sync between them. For efficiency, the changes made to system A are recorded in a separate file for our application to read and apply to system B.
For such applications, if we miss the run for an earlier date, we want to prevent any runs for later dates from going ahead. Inconsistencies between the data sets of both systems may occur if we apply updates from a later date when updates from a previous date were not applied.
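One possible way to enforce this, assuming the application runs once a day, is to keep the date of the last successfully applied update in a small state file and to refuse any run that does not directly follow it. The function names and the state file are hypothetical.

```python
from datetime import date, timedelta
from pathlib import Path


def may_apply(run_date, state_file):
    """Allow a run only if it directly follows the last applied date."""
    state_file = Path(state_file)
    if not state_file.exists():
        return True  # very first run: nothing earlier could have been missed
    last_applied = date.fromisoformat(
        state_file.read_text(encoding="utf-8").strip()
    )
    return run_date == last_applied + timedelta(days=1)


def mark_applied(run_date, state_file):
    """Record that the updates for run_date were applied successfully."""
    Path(state_file).write_text(run_date.isoformat(), encoding="utf-8")
```

The application would call may_apply before touching system B, terminate with a non-zero value if it returns False, and call mark_applied only after the updates have gone through.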
Ability to resume failed runs
There are many factors that could prevent our application from running successfully. The system on which our application is running could encounter technical faults. The hard disk could fail, the network cable could come loose, or somebody in the server room could accidentally pull out the power cable and forget to push the power button of that cold black box after reconnecting it. The system could even restart itself automatically while our application is running.
As such, our application should always be ready to resume failed runs upon manual intervention by the user.
Our application should, as much as possible, save information needed to resume failed runs. Our application could choose to write such information as text to file, save and load such information as objects to file or even use SQLite to manage such information.
At the very least, it should accept optional command-line arguments from the user so as to rerun a failed schedule.
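As a sketch of the text-file option, the application could append each customer ID to a checkpoint file as soon as that customer's statement is done, and skip those IDs when it is run again. The function names are hypothetical, and the sketch assumes that customer IDs contain no whitespace.

```python
from pathlib import Path


def load_done(checkpoint):
    """Return the customer IDs that an earlier run already completed."""
    checkpoint = Path(checkpoint)
    if not checkpoint.exists():
        return set()
    return set(checkpoint.read_text(encoding="utf-8").split())


def process_pending(customer_ids, checkpoint, generate_statement):
    """Process only the customers that are not yet in the checkpoint file."""
    done = load_done(checkpoint)
    for customer_id in customer_ids:
        if customer_id in done:
            continue  # finished before the previous run was interrupted
        generate_statement(customer_id)
        # Record progress immediately so that a crash loses one item at most.
        with Path(checkpoint).open("a", encoding="utf-8") as handle:
            handle.write(customer_id + "\n")
```

Rerunning the same command after a crash then picks up where the previous run stopped, and deleting the checkpoint file forces a full rerun.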
Include a --help option
Because of our due diligence in designing our command-line application before we put it into production, we have not encountered any situation in which our application failed to run for the past four months. Out of the blue, one customer calls in to complain that he cannot see his electronic statement for the month of May when he logs into our internet portal. Now how do we use our command-line application manually? To help us remember how to use our command-line application, it is best to embed some help information within the application and make it available via a --help option.
We may choose to place such information in a separate document. However, imagine that we have to connect remotely to a server without any graphical user interface and suddenly realize that we do not remember the syntax for generating the electronic statement by month and customer ID. We would have to switch to our file explorer to read the separate document.
Or perhaps we have a new hire in the company who needs to learn the business-as-usual tasks on the job. Now, this seems to be a good opportunity for our new boy (or girl) to learn how to support our command-line application. Did I create some documentation in a Word document on the five-terabyte shared drive? Wait, I am pretty sure that I created it a few days after I deployed the command-line application. But where is it now?
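With Python's argparse module, a --help option comes for free once the arguments are declared with help texts. The sketch below is only an illustration: the program name statements and the options --month and --customer-id are assumptions based on the scenario above.

```python
import argparse


def build_parser():
    """Declare the options once; argparse derives --help from them."""
    parser = argparse.ArgumentParser(
        prog="statements",  # hypothetical program name
        description="Generate electronic statements.",
        epilog="Example: statements --month 2012-05 --customer-id 1234",
    )
    parser.add_argument("--month", required=True,
                        help="statement month in YYYY-MM format")
    parser.add_argument("--customer-id",
                        help="generate for this customer only (default: all)")
    return parser


# Running the program with --help prints this text and exits.
print(build_parser().format_help())
```

Since the help text is generated from the same declarations that parse the arguments, it cannot drift out of date the way a separate document can.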
Silence means all went well
Although we log all kinds of outcomes to a log file, it doesn't mean that we have to output the same set of information to standard output. We should avoid writing too much to standard output. Why? There is simply no point in doing so when we have that information recorded in the log file. Furthermore, writing a lot to standard output can slow down our command-line application. Printing one line to standard output just before our command-line application terminates is good enough:
Electronic statement for John was generated successfully.
or
Errors encountered. Please check the log file for more details.
All of that applies to manual executions of our application. However, since our command-line application runs automatically most of the time, we should program it to send us an email when it encounters failures. We should avoid making our application send us email for successful runs because we tend to filter out information that appears too often - one lesson that I learnt from 100 Things Every Designer Needs to Know About People. Receiving an email every day from our application saying that it has run successfully will condition us to believe that it will run successfully forever. Furthermore, if we have to support 99 other applications which are equally "talkative", our mailbox will be flooded.
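A rough Python sketch of the "email only on failure" rule is shown below. It only builds the message; the function name build_failure_email, the application name and the addresses are placeholders invented for illustration.

```python
from email.message import EmailMessage


def build_failure_email(app_name, failures, sender, recipient):
    """Return an email describing the failures, or None when all went well."""
    if not failures:
        return None  # silence means all went well
    message = EmailMessage()
    message["Subject"] = f"[FAILED] {app_name}: {len(failures)} error(s)"
    message["From"] = sender
    message["To"] = recipient
    message.set_content("\n".join(failures))
    return message


email = build_failure_email(
    "statement-generator",                       # hypothetical name
    ["Could not access the database"],
    "scheduler@example.com", "support@example.com",
)
print(email["Subject"])
```

When the function returns a message, it could be handed to the send_message method of smtplib.SMTP, using whatever mail server the department provides.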
Do one thing and do it well
This is part of a quote that I always follow to make my programming life easier. Always keep our command-line applications from doing more than one main task. If we need to delete electronic statements that are more than a year old from our file storage, we should create a separate command-line application to do that, instead of putting the deletion logic into the electronic statement generator itself.
Trying to make our command-line applications do too many things will make maintenance more difficult. We certainly should not have to wade through code that deletes statements in order to find the place in our statement generator where a new column for our electronic statements should be added.
In fact, by focusing on the one thing that our command-line application does, we tend to do it well too.