Task Parallel Computing

There are two primary parallel computing techniques: task parallel and data parallel. Some key questions to consider when deciding whether tasks can run in parallel:
  • Are there dependencies between tasks?
  • Does the result of the operations change if you change the execution order?
  • Will there be contention for data or some other resource?
  • Will we get a return on investment for the overhead of configuring and running in parallel?
  • Do we have a significant number of tasks to run? Ideally, we should have more tasks than cores.
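Several of the questions above can be checked empirically. The following is a minimal Python sketch, not part of BatchProcessor. It runs the same tasks serially and then on a thread pool, compares the results, and times both runs. The helper names (`run_serial`, `run_parallel`, `measure`) are hypothetical. The sketch assumes each task is a zero-argument callable that is safe to run twice.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def run_serial(tasks):
    """Run each zero-argument callable one after another."""
    return [task() for task in tasks]


def run_parallel(tasks, workers):
    """Run the callables on a thread pool; map() yields results in submission order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda task: task(), tasks))


def measure(tasks, workers=None):
    """Time the same tasks serially and in parallel and compare their results.

    A mismatch suggests hidden dependencies or shared state between tasks;
    a match does not prove the tasks are independent.
    """
    workers = workers or os.cpu_count() or 1
    if len(tasks) <= workers:
        # Fewer tasks than workers: some cores sit idle.
        print(f"Only {len(tasks)} tasks for {workers} workers; "
              "parallelism may not pay for its overhead.")

    start = time.perf_counter()
    serial = run_serial(tasks)
    serial_time = time.perf_counter() - start

    start = time.perf_counter()
    parallel = run_parallel(tasks, workers)
    parallel_time = time.perf_counter() - start

    return serial == parallel, serial_time, parallel_time
```

If the parallel time is not clearly lower than the serial time, the setup overhead is probably eating the gain.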

Batch processing

I created a BatchProcessor solution containing a console application called BPRun. BPRun runs the setup jobs, then the parallel tasks, and finally the teardown jobs. Each group of jobs is listed in its own configuration file. I use similar code in production to run about 20 DOS .bat jobs, and running them in parallel has saved a significant amount of time.
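The three phases can be sketched in Python. This is an illustration of the flow only, not the BPRun source. The sketch assumes each job is a shell command (for example a .bat file path) and that success is signalled by exit code 0. `run_command`, `run_batch`, and the `runner` parameter are hypothetical names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_command(command):
    """Run one shell command (e.g. a .bat file on Windows) and return its exit code."""
    completed = subprocess.run(command, shell=True)
    return completed.returncode


def run_batch(setup, tasks, teardown, workers=4, runner=run_command):
    """Setup serially, main tasks in parallel, teardown serially.

    Exit codes are collected; this sketch does not stop on failures.
    """
    results = {}
    # Setup jobs run one after another, in file order.
    results["setup"] = [runner(cmd) for cmd in setup]
    # Main tasks share a fixed-size pool; leaving the with-block waits for all of them.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results["tasks"] = list(pool.map(runner, tasks))
    # Teardown starts only after every parallel task has finished.
    results["teardown"] = [runner(cmd) for cmd in teardown]
    return results
```

Threads are enough here because each job is a separate OS process. The threads mostly wait on those processes rather than compute.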

Setup and Teardown

The setup and teardown portions of the application run serially: the items in each configuration file are executed one after another. Lines prefixed with // are treated as comments.
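Reading such a configuration file might look like the Python sketch below. It assumes one command per line, that only whole-line // comments are supported, and that blank lines are ignored. Those rules are assumptions, not confirmed behavior of BatchProcessor. `read_jobs` and `load_jobs` are hypothetical names.

```python
def read_jobs(lines):
    """Return the commands from config lines, skipping blanks and // comment lines."""
    jobs = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("//"):
            continue  # blank line or comment
        jobs.append(line)
    return jobs


def load_jobs(path):
    """Read a configuration file from disk and return its commands in file order."""
    with open(path, encoding="utf-8") as f:
        return read_jobs(f)
```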

Task Parallel

The task parallel section executes the items in its configuration file concurrently. As each task finishes, the scheduler picks up the next job and runs it in the freed slot. The code that does this is in the corresponding Processor class.
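The "pick up the next job when one finishes" behavior is a worker pool pulling from a shared queue. This Python sketch makes that loop explicit. It is not the Processor class; `run_pool`, `workers`, and `run` are hypothetical names.

```python
import queue
import threading


def run_pool(jobs, workers, run):
    """Run jobs with a fixed number of workers.

    Each worker takes the next job from a shared queue as soon as it
    finishes its current one. Results are returned in the original job order.
    """
    pending = queue.Queue()
    for index, job in enumerate(jobs):
        pending.put((index, job))
    results = [None] * len(jobs)

    def worker():
        while True:
            try:
                index, job = pending.get_nowait()
            except queue.Empty:
                return  # all jobs were queued up front, so empty means done
            results[index] = run(job)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A short job frees its worker early, so that worker starts on the next item instead of waiting for a whole batch to finish.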

Tasks with dependencies

What if a task has a dependency? You can chain the tasks: have the first process or task call the second, dependent task when it completes. For instance, a job that does some data prep could, on completion, call a task that does the analysis.
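Chaining can be sketched in Python by wrapping dependent steps into a single task, which the scheduler then treats as one job. The sketch assumes each step returns an exit code and that a nonzero code should stop the chain. The post does not specify failure handling, so that rule is an assumption. `chain` is a hypothetical helper.

```python
def chain(*steps):
    """Wrap dependent steps into one task that runs them in order.

    Assumption: each step is a zero-argument callable returning an exit code,
    and a nonzero code means the remaining (dependent) steps are skipped.
    """
    def chained():
        codes = []
        for step in steps:
            code = step()
            codes.append(code)
            if code != 0:
                break  # a failed step's dependents do not run
        return codes
    return chained
```

For example, `chain(prep, analysis)` can be placed in the parallel task list next to independent jobs. The analysis never starts before the prep finishes, while unrelated jobs still run alongside the chain.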

Keeping Busy

On *nix systems, the sleep command pauses for a given number of seconds. However, sleeping doesn't place any load on the machine. I wanted the tasks to actually load the machine so that we could see it really being taxed. To that end, I created a small console application called Busy, which takes the number of seconds to stay busy as a parameter.
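The idea behind Busy can be sketched in Python as a busy-wait loop: spin until a deadline passes instead of sleeping. This is not the Busy source. `busy` and `main` are hypothetical names, and the usage message is illustrative. Each Busy run would be its own process, so several instances launched by the batch runner can load several cores at once.

```python
import sys
import time


def busy(seconds):
    """Keep one core occupied for roughly `seconds` seconds.

    Returns the number of loop iterations, which shows work was actually done.
    """
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def main(argv):
    """Command-line entry point: expects one argument, the number of seconds."""
    if len(argv) != 2:
        print("usage: busy SECONDS", file=sys.stderr)
        return 2
    busy(float(argv[1]))
    return 0
```

As a script, this would be wired up with `sys.exit(main(sys.argv))`. A task-manager or `top` window should then show one core pinned for the requested time.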

Download

You can get the source code for the Batch Processor and the Busy application here: https://github.com/k0emt/BatchProcessor

Video

See a video tour of the code for Busy and the Batch Processor along with demos of the applications here:
