There are two primary parallel computing techniques: task parallel and data parallel. Some key questions to ask when deciding whether tasks can be run in parallel:
- Are there dependencies between tasks?
- Does the result of the operations change if you change the execution order?
- Will there be contention for data or some other resource?
- Will we get a return on investment for the overhead of configuring and running in parallel?
- Do we have a significant number of tasks to run? We should have more tasks to run than we have cores.
Batch processing
I created a BatchProcessor solution with a console application called BPRun. This project runs a setup portion, then the parallel tasks, and then the teardown jobs. The tasks to run are specified in corresponding configuration files. I use code similar to this in production for batch processing ~20 DOS .bat jobs. Being able to run jobs in parallel has resulted in significant time savings.
Setup and Teardown
The setup and teardown portions of the application are run in a serial fashion. The items in the configuration files are executed one after another. Comments may be placed in the file with // prefixing.
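The serial phase described above can be sketched roughly as follows. This is a Python illustration, not the actual BPRun code. The names `load_jobs` and `run_serially` are hypothetical, and the sketch assumes each configuration line is either one shell command or a whole-line // comment.

```python
import subprocess


def load_jobs(lines):
    """Return the runnable lines, skipping blanks and // comment lines.

    Assumption: a comment occupies a whole line that starts with //.
    """
    jobs = []
    for raw in lines:
        line = raw.strip()
        if line and not line.startswith("//"):
            jobs.append(line)
    return jobs


def run_command(cmd):
    """Run one command through the shell and return its exit code."""
    return subprocess.run(cmd, shell=True).returncode


def run_serially(jobs, runner=run_command):
    """Run jobs one after another; each finishes before the next starts."""
    return [runner(job) for job in jobs]
```

Passing the `runner` as a parameter keeps the sketch easy to try without launching real jobs; for example, `run_serially(load_jobs(open("setup.cfg")))` would execute a hypothetical `setup.cfg` top to bottom.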
Task Parallel
The task parallel section executes the items in the corresponding configuration file. As a task finishes, the scheduler picks up and runs the next job on that process. The code that does this is in a corresponding Processor class.
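A minimal sketch of this scheduling behavior, assuming a fixed-size worker pool stands in for the scheduler. It is written in Python for illustration and is not the Processor class from the project. `run_parallel` and `run_command` are hypothetical names, and the default of one worker per core is an assumption.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_command(cmd):
    """Run one command through the shell and return its exit code."""
    return subprocess.run(cmd, shell=True).returncode


def run_parallel(jobs, runner=run_command, max_workers=None):
    """Run jobs on a pool of workers.

    When a worker finishes its job, the pool hands it the next
    pending job. Results come back in the same order as `jobs`.
    """
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, jobs))
```

Threads are enough here because each worker mostly waits on an external process. With more jobs than workers, the pool stays full until the queue drains.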
Tasks with dependencies
What if you have a task that has a dependency? You can chain them. That is, have the first process or task call the second or dependent task upon completion. For instance, you may have a job that does some data prep and then chain it to call a task that does analysis.
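One way to picture chaining is to wrap the dependent steps into a single job, so the scheduler sees one unit of work. The sketch below is illustrative Python. `chain` is a hypothetical helper, and stopping at the first non-zero exit code is an assumption rather than something the original code is known to do.

```python
def chain(*steps):
    """Combine steps into one job that runs them in order.

    Each step is a callable returning an exit code. A later step
    runs only if every earlier step returned 0 (assumed policy).
    """
    def run_chain():
        codes = []
        for step in steps:
            code = step()
            codes.append(code)
            if code != 0:
                break  # do not start dependents of a failed step
        return codes
    return run_chain
```

The chained job can then be scheduled alongside independent jobs, and the data prep is guaranteed to finish before the analysis starts.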
Keeping Busy
On *nix systems we have the sleep command, which pauses for a given number of seconds. However, sleeping doesn’t place any load on the machine. I wanted the tasks to put a real load on the machine so that we could see them taxing it. To that end I created a small console application called Busy, which takes the number of seconds to stay busy as a parameter.
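The difference from sleep can be sketched with a spin loop that keeps one core working until a deadline passes. This is a Python illustration of the idea only, not the Busy source, and `stay_busy` is a hypothetical name.

```python
import time


def stay_busy(seconds):
    """Spin on the CPU for roughly `seconds`; return the loop count.

    Unlike time.sleep, the loop never yields, so one core stays
    loaded for the whole duration.
    """
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations
```

Calling `stay_busy(5)` from several parallel jobs at once should show up as load on several cores in a system monitor.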
Download
You can get the source code for the Batch Processor and the Busy application here:
https://github.com/k0emt/BatchProcessor
Video
See a video tour of the code for Busy and the Batch Processor along with demos of the applications here: