Data Parallel Computing

Data decomposition parallel computing works by having multiple “processors” operate on a shared data object, each handling its own portion of the data.

the toy problem

To learn more about data-decomposition parallel computing, I created a toy application built from a basic stopwatch class, a serial computing class and a parallel computing class. The application performs row-based or column-based operations on an 8K by 8K array.

Since I needed to time runs, I created a basic stopwatch class. It can start the stopwatch, check the elapsed time and stop it.
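A minimal sketch of what such a stopwatch might look like, assuming System.nanoTime() as the clock. The class and method names are hypothetical and are not taken from the original source.

```java
// Hypothetical sketch; names are assumptions, not the original code.
final class Stopwatch {
    private long startNanos;
    private long stopNanos;
    private boolean running;

    /** Starts (or restarts) timing. */
    void start() {
        startNanos = System.nanoTime();
        running = true;
    }

    /** Nanoseconds since start(); the value is frozen once stop() has been called. */
    long elapsedNanos() {
        return (running ? System.nanoTime() : stopNanos) - startNanos;
    }

    /** Stops timing and returns the final elapsed time in nanoseconds. */
    long stop() {
        if (running) {
            stopNanos = System.nanoTime();
            running = false;
        }
        return stopNanos - startNanos;
    }
}
```

System.nanoTime() is used rather than System.currentTimeMillis() because it is intended for measuring elapsed time and is not affected by wall-clock adjustments.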

the serial code

I created a serial computing class that initializes an 8K by 8K array of integers and has methods for computing across rows and computing down columns. I created these two methods so that I could see potential differences in working with the array via row or column.
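A rough sketch of such a serial class follows. The post does not say which operation is applied to each row or column, so a simple sum is assumed here, and every name is hypothetical.

```java
// Hypothetical sketch: names, fill pattern and the "sum" operation are assumptions.
final class SerialCompute {
    final int size;
    final int[][] data;

    SerialCompute(int size) {
        this.size = size;
        data = new int[size][size];
        for (int r = 0; r < size; r++) {
            for (int c = 0; c < size; c++) {
                data[r][c] = r + c; // deterministic fill so runs are reproducible
            }
        }
    }

    /** Sums each row; the inner loop walks a single int[] from start to end. */
    long[] computeRows() {
        long[] sums = new long[size];
        for (int r = 0; r < size; r++) {
            long s = 0;
            for (int c = 0; c < size; c++) s += data[r][c];
            sums[r] = s;
        }
        return sums;
    }

    /** Sums each column; the inner loop touches a different row array on every step. */
    long[] computeColumns() {
        long[] sums = new long[size];
        for (int c = 0; c < size; c++) {
            long s = 0;
            for (int r = 0; r < size; r++) s += data[r][c];
            sums[c] = s;
        }
        return sums;
    }
}
```

With size set to 8192, the int data alone takes 256 MB, so the JVM heap limit may need raising (for example with -Xmx) before a full-size run.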

The serial class is the baseline against which the parallel class is compared.

I wanted to make sure that the results from the serial and parallel calculations were the same. Therefore, I added methods to the serial class that can be used to check that the parallel results match.
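One way such a check could look, assuming both classes produce an array of per-row or per-column results. The helper name is hypothetical.

```java
// Hypothetical helper; assumes results are exposed as long[] arrays.
final class ResultCheck {
    /** Returns the index of the first differing element, or -1 if the arrays match. */
    static int firstMismatch(long[] expected, long[] actual) {
        int common = Math.min(expected.length, actual.length);
        for (int i = 0; i < common; i++) {
            if (expected[i] != actual[i]) return i;
        }
        // Same prefix: the arrays match only if the lengths also agree.
        return expected.length == actual.length ? -1 : common;
    }
}
```

Returning the index rather than a plain boolean makes it easier to see which partition produced a wrong value.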

the parallel code

Because the implementation language was Java, the parallel class extends the Thread class. A Thread does its work in a run() method that takes no arguments. To cope with this, I use a private constructor that records which method to run. The parallel methods create new instances, passing arguments that say whether the row or column method should be called and give the start and end values of the domain. These arguments set internal variables, which are then used within the run() method.
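The pattern described above might be sketched as follows. All names are hypothetical, the per-row and per-column operation is assumed to be a sum, and the static factory methods stand in for whatever the real class uses to create its workers.

```java
// Hypothetical sketch of a Thread subclass configured through a private constructor.
final class ParallelCompute extends Thread {
    private final int[][] data;
    private final long[] out;
    private final boolean byRow;
    private final int startIndex;
    private final int endIndex; // exclusive

    private ParallelCompute(int[][] data, long[] out, boolean byRow, int startIndex, int endIndex) {
        this.data = data;
        this.out = out;
        this.byRow = byRow;
        this.startIndex = startIndex;
        this.endIndex = endIndex;
    }

    /** Worker restricted to rows [start, end). */
    static ParallelCompute forRows(int[][] data, long[] out, int start, int end) {
        return new ParallelCompute(data, out, true, start, end);
    }

    /** Worker restricted to columns [start, end). */
    static ParallelCompute forColumns(int[][] data, long[] out, int start, int end) {
        return new ParallelCompute(data, out, false, start, end);
    }

    /** run() takes no arguments, so it dispatches on the fields set by the constructor. */
    @Override
    public void run() {
        if (byRow) computeRowsDomainLimited(startIndex, endIndex);
        else computeColumnsDomainLimited(startIndex, endIndex);
    }

    private void computeRowsDomainLimited(int startRow, int endRow) {
        for (int r = startRow; r < endRow; r++) {
            long s = 0;
            for (int c = 0; c < data[r].length; c++) s += data[r][c];
            out[r] = s;
        }
    }

    private void computeColumnsDomainLimited(int startCol, int endCol) {
        for (int c = startCol; c < endCol; c++) {
            long s = 0;
            for (int r = 0; r < data.length; r++) s += data[r][c];
            out[c] = s;
        }
    }
}
```

Each worker writes only to its own slice of the output array, so no locking is needed, and the results are safely visible to the caller once join() returns.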

Pseudo-code for the parallel computation:

computeRowsDomainLimited(int startRow, int endRow)

partSize = size / numThreads
remainder = size % numThreads

for o = 0 to numThreads - 1
    start = o * partSize
    end = start + partSize

computeRowsDomainLimited(end, end + remainder)    (covers the leftover rows, if any)

Set up threads with the above ranges
Then run them
Then join/wait for all to finish
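The partitioning step in the pseudo-code above could be written in Java roughly as below. This is a sketch under two assumptions: each range starts at o * partSize, and the leftover rows are folded into the last thread's range rather than handled by a separate call. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the range splitting and the start/join sequence.
final class Partitioner {
    /** Returns {start, end} pairs (end exclusive) that together cover 0..size. */
    static List<int[]> partition(int size, int threads) {
        int partSize = size / threads;
        int remainder = size % threads;
        List<int[]> ranges = new ArrayList<>();
        for (int o = 0; o < threads; o++) {
            int start = o * partSize;
            int end = start + partSize;
            if (o == threads - 1) end += remainder; // assumption: last thread takes the leftover
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }

    /** Starts every worker, then blocks until all of them have finished. */
    static void runAll(List<? extends Thread> workers) throws InterruptedException {
        for (Thread t : workers) t.start();
        for (Thread t : workers) t.join();
    }
}
```

For example, partition(10, 3) yields the ranges 0 to 3, 3 to 6 and 6 to 10. An 8K array divides evenly for thread counts that are powers of two, so the remainder only matters for counts such as 3, 5, 6 or 7.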



The development machine was powered by an Intel I7-720 with 6 GB of RAM. The baseline serial methods and the parallel methods, at varying partition/thread counts, were each run 10 times. The best run time of each is used in the calculations.
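A best-of-N measurement like this might be sketched as below. The helper names are hypothetical. The speedup ratio is also an assumption, since the post does not spell out which calculations the best times feed into.

```java
// Hypothetical sketch of best-of-N timing; names and the speedup metric are assumptions.
final class Benchmark {
    /** Runs the task the given number of times and returns the fastest run in nanoseconds. */
    static long bestOf(int runs, Runnable task) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - t0);
        }
        return best;
    }

    /** Ratio of the serial baseline time to the parallel time; above 1.0 means faster. */
    static double speedup(long serialNanos, long parallelNanos) {
        return (double) serialNanos / parallelNanos;
    }
}
```

Taking the best of several runs tends to filter out one-off noise such as JIT warm-up and interference from other processes.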



The code seems to scale and perform well for the row-based computation up to 8 processing threads. Beyond that, performance drops off a bit and then gradually tapers. I think this corresponds to the machine having 8 logical cores: past 8 threads, the overhead of the additional threads is greater than any potential benefit.

However, the performance of the column-based computation seems to peak at around 4 or 5 threads. This could be because the CPU has 4 physical cores with Hyper-Threading. Or it may be that the overhead of accessing non-contiguous memory is the determining factor.

Source Code

Source code for the demo application that utilizes the stopwatch class and both the serial and parallel computing classes can be found here:

In the future I will implement the same computations using OpenCL.

Shim it!

The lock on my front door was tough to turn. I was afraid to turn the key too hard for fear of snapping it off. I figured out that lifting the handle up allowed the bolt to seat properly. I noticed where the bolt was rubbing against the striker plate and the cut-out in the door frame. I bored out the opening in the wood slightly. That gave a minimal improvement, but it was still not satisfactory. I tried moving the striker plate. I didn’t want to force it too much, since the screw holes would have to be filled and re-drilled for any major adjustment. I even tried moving the locking mechanism to alter the angle of entry into the striker plate. I did notice that the whole door seemed a bit out of square with the frame.

Then it occurred to me to place a shim under one of the hinges on the door to adjust the whole door.  This worked!  The whole door is now square in the frame and it has a better seal with the gasket.  No daylight.


Have you been tweaking, altering and adjusting parts of your system when a shim could be placed in the right spot and make the whole system work properly? Perhaps a strategically used adapter pattern would make the whole application easier to code? Maybe some small tweak in the UI could lead to a more enjoyable experience throughout the whole system?

Hot on the edges, raw in the middle

Use the microwave at 100% to cook or reheat and you can end up with something that is hot on the edges or the outside, yet raw in the middle.

However, if you use the manual settings, such as a 60% power level with more time, you will find that the food is more evenly cooked. Or better yet, consult the manual and use the correct settings for the food type and quantity.

In software development, breathing time and thinking time are good.

Going 100% full bore gets you done. However, once you have time to think about the system, you may experience remorse. To extend the microwave analogy: perhaps you have a nice shiny UI but rotten, messy code under the hood. With an appropriate “cooking temperature” you’d have both a nice UI and good code under the hood.