Wednesday, March 02, 2011

Parallel, Concurrent, and/or Distributed Computing?

First, there's a bit of confusion (or at least a lack of clarity) in the computing space around the terms "concurrent computing", "parallel computing", and "distributed computing". They do overlap, but there are meaningful distinctions.

Concurrent computing describes how multiple workloads are operated on at the same time. For example, your computer may be burning a DVD, browsing the web, and running a virus check all at once. All these workloads are running concurrently - making visible progress, from the user's perspective, during any given second. But underneath, the workloads may be time-multiplexed sequentially on the same core in the same processor in the same system, or genuinely running at the same time on multiple cores, multiple processors, or multiple systems.
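A minimal sketch of this idea, using Python threads purely for illustration (the workload names are hypothetical stand-ins for DVD burning, web browsing, and virus scanning). Note that CPython's interpreter lock actually time-multiplexes these threads on one core - which is exactly the "concurrent but not necessarily parallel" case described above:

```python
import threading

# Three hypothetical workloads, each making progress in small steps.
progress = {"burn": 0, "browse": 0, "scan": 0}

def work(name, steps):
    # Each call advances one workload; each thread touches only its own key.
    for _ in range(steps):
        progress[name] += 1  # one small unit of visible progress

threads = [threading.Thread(target=work, args=(n, 1000)) for n in progress]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All three workloads finish; the OS/interpreter interleaved their steps.
print(progress)
```

From the user's perspective all three tasks advanced "at the same time", even though underneath they may have shared a single core.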

Parallel computing is a subset of concurrent computing, where the multiple workloads really are operating on different hardware resources at the same time. The different resources could be different cores or different processors or different systems.

Distributed computing is a subset of parallel computing where the execution takes place on computer systems that are distributed. This is usually motivated by the distributed nature of the location of data, where the computation can take place close to where data is located.

Why should we care about the similarities and differences?

Highly parallel processors such as the modern GPUs are providing the same general-purpose computation capabilities to run business-critical applications at only 1/20th the power and 1/10th the cost.

Development environments for parallel computing have become more accessible. For example, development environments for the GPUs (CUDA and OpenCL) are based on the C/C++ development environment, which allows GPUs’ computational capabilities to be easily accessed by those seeking acceleration solutions for compute-intensive applications.

Distributed computing is easier with availability of Linux and Hadoop, both excellent examples of free and open source software collaboration. Hadoop, a software framework, enables reliable, efficient, and scalable distributed manipulation of large amounts of data (think in TBs and PBs). Hadoop runs well on Linux as a production platform, which can be widely installed on commodity hardware.

The increased capabilities, flexibility, and accessibility of GPUs and Hadoop are fueling the next level parallel and distributed computing innovations to solve compute intensive and/or data intensive problems faced with both increasing throughput and reducing latency requirements.

What new opportunities do you see?

No comments: