Parallel Databases/Parallel Processing in Databases

Parallel processing is very useful in database management. Parallelizing database operations speeds up database activity and gives faster responses, because more transactions are carried out per second. In a parallel database, the processing tasks are divided among processors. In sequential processing a query is executed as a single task, while in parallel processing it is divided into a number of smaller tasks that are assigned to different processors.

For example, suppose tasks 1, 2 and 3 are to be run against a database. In sequential processing all three are assigned to a single processor, so each task must wait until the previous one has finished. In parallel processing each task runs on its own processor, so this waiting time disappears and less time elapses overall. Parallel processing therefore involves a multiprocessor system in which the work is spread over several processors.
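The three-task example can be sketched in Python as follows. This is only an illustration under stated assumptions: run_task is a hypothetical placeholder that sleeps instead of doing real database work, and threads stand in for separate processors.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_task(task_id, duration=0.1):
    """Hypothetical database task: sleeps to simulate work, then reports back."""
    time.sleep(duration)
    return f"task {task_id} done"

def run_sequential(task_ids):
    # Single worker: each task waits for the previous one to finish.
    return [run_task(t) for t in task_ids]

def run_parallel(task_ids):
    # One worker per task: no task waits for another.
    with ThreadPoolExecutor(max_workers=len(task_ids)) as pool:
        return list(pool.map(run_task, task_ids))

def timed(fn, *args):
    """Return the function result together with the elapsed wall-clock time."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

tasks = [1, 2, 3]
seq_result, seq_time = timed(run_sequential, tasks)
par_result, par_time = timed(run_parallel, tasks)
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

Both runs return the same results, but the sequential run takes roughly the sum of the task durations while the parallel run takes roughly the duration of the longest task.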

Key Concepts – Parallel Databases Processing

Parallel processing in databases requires a hardware architecture that allows multiple processors to share access to data and storage. Parallel servers can also be used to give multiple users access to a single database. A master node distributes a stream of transactions to different nodes, each with its own CPU. It also provides concurrent access to the data and protects data integrity.
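As a rough sketch of how a master node might hand out a stream of transactions, the snippet below assigns them to nodes in round-robin order. The round-robin policy, the distribute function and the node names are assumptions made for illustration; a real system may balance load in other ways.

```python
from itertools import cycle

def distribute(transactions, node_names):
    """Assign each transaction to a node in round-robin order (assumed policy)."""
    assignments = {name: [] for name in node_names}
    for txn, node in zip(transactions, cycle(node_names)):
        assignments[node].append(txn)
    return assignments

# Hypothetical transaction stream and node names.
plan = distribute(["t1", "t2", "t3", "t4", "t5"], ["node_a", "node_b"])
print(plan)
# {'node_a': ['t1', 't3', 't5'], 'node_b': ['t2', 't4']}
```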

In parallel processing, operations are performed simultaneously on parallel machines. These machines can be classified as follows:

Coarse grain parallel machine - Contains a small number of powerful processors
Massively parallel/fine grain machine - Uses thousands of small processors

Coarse grain parallel machines are commonly used in many organizations today. A system that runs large transactions against a database can improve both its response time and its throughput by using parallel processing. Note that a parallel system contains various components such as memory, disks and processors, all of which can be shared with the help of an interconnection network.

Performance parameters of Parallel Databases Processing

Speed-Up

Speed-up is the extent to which more hardware can perform the same task in less time than the original system. It can be measured using the following equation:

Speed-up = Original Processing Time / Parallel Processing Time

Where,

Original Processing Time is the elapsed time spent on the task by the original (old) system.

Parallel Processing Time is the elapsed time spent on the same task by the larger, parallel system.
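A small worked example of the speed-up equation, using made-up timings: the function name speed_up and the figures of 120 and 30 seconds are illustrative assumptions, not measurements.

```python
def speed_up(original_time, parallel_time):
    """Speed-up = Original Processing Time / Parallel Processing Time."""
    if parallel_time <= 0:
        raise ValueError("parallel_time must be positive")
    return original_time / parallel_time

# Hypothetical timings: 120 s on the original system, 30 s on the parallel one.
print(speed_up(120.0, 30.0))  # 4.0
```

A value of 4.0 means the parallel system finishes the same task four times faster than the original system.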

Scale-Up

Scale-up is the capability of a larger parallel database system to handle a proportionally larger workload in the same time as the smaller original system. If transaction volumes grow and the system has good scale-up, the response time can be kept constant by adding nodes/hardware for parallel processing.

It can be measured with the help of the following equation:

Scale-up = Parallel Processing Volume / Original Processing Volume

where,

Parallel Processing Volume is the transaction volume processed in a given time on the larger, parallel system.

Original Processing Volume is the transaction volume processed in the same amount of time on the smaller, original system.
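The scale-up equation can be worked through in the same way. This sketch assumes that the original system is the smaller one and the parallel system is the larger one; the function name scale_up and the transaction counts are hypothetical.

```python
def scale_up(parallel_volume, original_volume):
    """Scale-up = Parallel Processing Volume / Original Processing Volume."""
    if original_volume <= 0:
        raise ValueError("original_volume must be positive")
    return parallel_volume / original_volume

# Hypothetical volumes processed in the same time window:
# 1000 transactions on the original system, 3000 on the parallel system.
print(scale_up(3000, 1000))  # 3.0
```

A value of 3.0 means the parallel system handles three times the transaction volume of the original system in the same amount of time.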

Locking

Locking is a way to synchronize tasks, and several locking techniques are used to provide the synchronization that parallel processing requires. In many parallel database systems, messaging and locking between nodes are handled by a Distributed Lock Manager (DLM). The DLM acts as an external locking facility at the operating system level. It coordinates resource sharing among the nodes and also coordinates any modifications to the resources of the parallel database.
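The idea of locking a resource before modifying it can be sketched with a toy lock manager. This is a single-process illustration only: SimpleLockManager, the account name and the deposit function are hypothetical, and a real distributed lock manager coordinates locks across separate nodes rather than across threads.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

class SimpleLockManager:
    """Toy stand-in for a lock manager: one lock per named resource."""

    def __init__(self):
        self._guard = threading.Lock()              # protects the lock table itself
        self._locks = defaultdict(threading.Lock)   # resource name -> lock

    def lock_for(self, resource):
        with self._guard:
            return self._locks[resource]

manager = SimpleLockManager()
balances = {"account_1": 0}

def deposit(resource, amount):
    # Only one task at a time may modify a given resource.
    with manager.lock_for(resource):
        current = balances[resource]
        balances[resource] = current + amount

# Four workers submit 100 concurrent deposits of 1 to the same resource.
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(100):
        pool.submit(deposit, "account_1", 1)

print(balances["account_1"])  # 100
```

Because every update to account_1 happens while its lock is held, no deposit is lost even though the tasks run concurrently.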

Synchronization

Synchronization is the coordination of concurrent tasks in parallel processing. To maximize scale-up and speed-up, synchronization must be kept to a minimum; a successful parallel processing system needs as little of it as possible. Synchronization relies on communication between nodes, so without high-speed communication its overhead can become very expensive. The amount of synchronization also depends on the amount of resources and on the number of users and tasks. If too much time is spent synchronizing resources, the benefits of a parallel database are diminished.
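The effect of synchronization overhead on speed-up can be illustrated with a deliberately simplified model. The model is an assumption made for this sketch, not a formula from any particular system: the work divides evenly across nodes, and each additional node adds a fixed synchronization cost.

```python
def parallel_time(work_time, nodes, sync_cost_per_node=0.0):
    """Assumed model: evenly divided work plus a fixed sync cost per extra node."""
    return work_time / nodes + sync_cost_per_node * (nodes - 1)

def speed_up(original_time, parallel_processing_time):
    return original_time / parallel_processing_time

work = 100.0  # hypothetical time in seconds on a single node
for sync_cost in (0.0, 5.0):
    elapsed = parallel_time(work, nodes=4, sync_cost_per_node=sync_cost)
    print(f"sync cost {sync_cost}: time {elapsed}, speed-up {speed_up(work, elapsed)}")
# sync cost 0.0: time 25.0, speed-up 4.0
# sync cost 5.0: time 40.0, speed-up 2.5
```

With no synchronization cost, four nodes give a speed-up of 4.0; with 5 seconds of synchronization per extra node, the same four nodes give only 2.5.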

Messaging and communication

A parallel database processing system must communicate quickly and efficiently between its nodes. Communication with the DLM therefore needs low latency and high bandwidth. Latency is the delay between starting an operation and receiving its response.
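A first-order estimate shows why both latency and bandwidth matter. The sketch below assumes that the time to deliver a message is the fixed latency plus the message size divided by the bandwidth; the function message_time and the link figures are hypothetical.

```python
def message_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated delivery time: fixed latency plus size divided by bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + size_bytes / bandwidth_bytes_per_s

# Hypothetical link: 1 ms latency, 100 MB/s bandwidth.
LATENCY_S = 0.001
BANDWIDTH = 100_000_000

lock_request = message_time(200, LATENCY_S, BANDWIDTH)           # small control message
block_transfer = message_time(50_000_000, LATENCY_S, BANDWIDTH)  # large data transfer

print(f"lock request:   {lock_request:.6f} s")
print(f"block transfer: {block_transfer:.6f} s")
```

Under these assumptions, the time for a small lock request is almost entirely latency, while the time for a large transfer is almost entirely determined by bandwidth.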

Benefits – Parallel Databases

High performance and efficiency
High flexibility in allocating and de-allocating instances
Less time-consuming
Can serve more users at the same time
