The rise of MPP platforms — Comparing SMP to MPP Architecture

My motivation for writing this is to explain the major differences between SMP and MPP platforms. I will also cover their appropriate use cases, pros, and cons.

Symmetric Multi-Processor (SMP) Architecture

“Symmetric multiprocessing (SMP) involves a multiprocessor computer hardware and software architecture where two or more identical processors are connected to a single, shared main memory, have full access to all input and output devices, and are controlled by a single operating system instance that treats all processors equally, reserving none for special purposes.”

Reference: http://shahfaisalmuhammed.blogspot.com/2015/12/SMP-versus-MPP-architecture-.html

Let’s start with some history of analytical databases

Relational databases (e.g., SQL Server, Oracle, DB2) were used both as Online Transaction Processing (OLTP) databases (to support applications) and as Online Analytical Processing (OLAP) databases (for analytical use cases).

By the way, a key difference between OLAP and OLTP databases is the data model. In OLAP we build a denormalized model (facts and dimensions) in a star or snowflake schema. The OLTP data model is normalized (to at least 3NF) using reference, transaction, and bridge tables.
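Here is a minimal sketch of the star-schema idea using Python's built-in sqlite3 module. The tables (`dim_product`, `fact_sales`) and their rows are hypothetical. SQLite appears here only because it ships with Python; it is not an analytical database. The fact table holds measurable events (sales) plus a foreign key to a dimension table that holds descriptive attributes. A typical OLAP query joins the two and aggregates.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension table: descriptive attributes, one row per product.
cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT)")
# Fact table: one row per sale, a foreign key to the dimension, and a numeric measure.
cur.execute(
    "CREATE TABLE fact_sales ("
    "sale_id INTEGER PRIMARY KEY, "
    "product_id INTEGER REFERENCES dim_product(product_id), "
    "amount REAL)"
)

cur.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Books"), (2, "Games")])
cur.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 15.0), (3, 2, 40.0)],
)

# Typical OLAP query: join the fact table to a dimension, then aggregate the measure.
rows = cur.execute(
    """
    SELECT d.category, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_product AS d ON f.product_id = d.product_id
    GROUP BY d.category
    ORDER BY d.category
    """
).fetchall()
print(rows)  # [('Books', 25.0), ('Games', 40.0)]
```

A snowflake schema would further normalize the dimension, for example by moving `category` into its own table.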

In those days, organizations that wanted to improve the performance of their analytical databases had only one option: scale up by adding more CPU, RAM, and disk I/O capacity to the server.

Characteristics of SMP architecture

  1. In SMP every processor shares a single copy of the operating system.
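The sketch below is a rough illustration of the shared-memory model from the SMP definition above. Several Python threads run inside one process, and every thread reads the same in-memory list directly, with no copying and no message passing. The data and the four-way split are hypothetical. CPython's global interpreter lock prevents threads from executing Python bytecode truly in parallel, so this demonstrates the memory model, not an SMP speedup.

```python
import threading

# Hypothetical data: one in-memory list that every worker can see.
shared_data = list(range(1, 101))
results = {}
lock = threading.Lock()

def worker(worker_id, start, stop):
    # Each worker reads its slice straight from shared memory: no copy, no message.
    partial = sum(shared_data[start:stop])
    # Writes to the shared dict are guarded by a lock.
    with lock:
        results[worker_id] = partial

# Four workers, each taking 25 consecutive elements of the shared list.
threads = [threading.Thread(target=worker, args=(i, i * 25, (i + 1) * 25)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(results.values())
print(total)  # 5050
```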

Benefits of SMP architecture

  1. Network speed: Since all the components sit in the same server, there is no network hop between them and therefore no inter-node network latency

What are the issues with this SMP architecture?

  1. Performance: Even with the best data model, most SMP relational databases struggled to scale with data growth. There is only so much CPU, RAM, and I/O capacity you can add to a single server, and that ceiling limits performance

When to use an SMP Database

  1. Well suited as a database for monolithic applications

Massively Parallel Processing (MPP) Database Architecture

An MPP database is optimized for parallel processing: many operations are performed at the same time by many processing units.

MPP (massively parallel processing) is the coordinated processing of a program by multiple processors working on different parts of the program. Each processor has its own operating system and memory.

Imagine I wanted to count the number of pages in a book. I can do this quickly by splitting the work by chapter and assigning each chapter to a node. Each node counts the pages in its chapter and sends the result to a parent node, which adds up the counts. All the chapter counts run in parallel. This shows the power of the approach: we can scale our processing power by adding more nodes to the cluster.
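The book analogy maps onto a simple scatter/gather pattern. Below is a minimal sketch that assumes a hypothetical `book` in which each chapter is a list of pages. A thread pool stands in for the nodes, and the final `sum` plays the parent node. In a real MPP system each worker would be a separate machine with its own memory.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical book: each chapter is a list of pages.
book = {
    "Chapter 1": ["page"] * 12,
    "Chapter 2": ["page"] * 30,
    "Chapter 3": ["page"] * 18,
}

def count_pages(chapter_pages):
    # Work done by one "node": count only the pages it was assigned.
    return len(chapter_pages)

# Scatter: each chapter goes to its own worker; map returns results in input order.
with ThreadPoolExecutor(max_workers=len(book)) as pool:
    partial_counts = list(pool.map(count_pages, book.values()))

# Gather: the "parent node" accumulates the partial counts.
total_pages = sum(partial_counts)
print(partial_counts, total_pages)  # [12, 30, 18] 60
```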

Characteristics of MPP Architecture

  1. MPP uses a shared-nothing architecture.

  2. In MPP, each processor works on a different part of the task.

  3. Each processor has its own set of disks.

  4. Each node is responsible for processing only the rows on its own disks.

  5. Scaling is straightforward: you add more nodes.

  6. Data is horizontally partitioned across the nodes, often with heavy compression.

  7. MPP processors communicate with each other using some form of messaging interface.

  8. In MPP, each processor uses its own operating system (OS) and memory.
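Characteristics 1, 4, 6, and 7 can be illustrated together with a single-process simulation. Everything in it is hypothetical: the `Node` class, the three-node cluster, the rows, and the `customer_id % NUM_NODES` distribution rule. Many MPP databases let you choose a distribution key, but each product has its own rules. Each node aggregates only its own rows. The coordinator then merges the small partial results, which is the only "messaging" that happens.

```python
from collections import Counter

NUM_NODES = 3  # hypothetical cluster size

class Node:
    """A shared-nothing node: it owns its rows and only ever reads its own storage."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.rows = []  # stands in for this node's private disk

    def local_sum_by_region(self):
        # Partial aggregate computed over this node's local rows only.
        partial = Counter()
        for _customer_id, region, amount in self.rows:
            partial[region] += amount
        return partial  # the "message" sent back to the coordinator

nodes = [Node(i) for i in range(NUM_NODES)]

# Hypothetical rows: (customer_id, region, amount).
rows = [
    (101, "EU", 20), (102, "US", 35), (103, "EU", 5),
    (104, "US", 10), (105, "APAC", 50), (106, "EU", 15),
]

# Horizontal partitioning: the distribution key decides which node stores each row.
for row in rows:
    nodes[row[0] % NUM_NODES].rows.append(row)

# Each node works in isolation; the coordinator only merges the partial results.
result = Counter()
for node in nodes:
    result.update(node.local_sum_by_region())

print(dict(sorted(result.items())))  # {'APAC': 50, 'EU': 40, 'US': 45}
```

Adding a node here would only mean changing `NUM_NODES` and redistributing the rows. That redistribution step is one reason the choice of distribution key matters in practice.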

Advantages of MPP Architecture

  1. Performance: Ideally, the speed of computation grows close to linearly with the number of nodes. The more nodes you have, the faster it is to perform aggregations and computations on the entire dataset.

Disadvantages of MPP

  1. Network speed: Since all the nodes are connected by a network fabric, data exchanged between nodes incurs some latency, though it is usually minimal
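The advantage and the disadvantage listed above pull against each other, and a toy cost model makes the trade-off visible. The model rests on simplifying assumptions and measures no real product. It assumes the work splits perfectly evenly across nodes and that the network exchange adds a fixed overhead (500 ms here, a made-up figure). Under those assumptions, a large scan runs almost ten times faster on ten nodes. A very small query gets slower, because the network cost dominates.

```python
def mpp_query_time_ms(total_work_ms, nodes, network_overhead_ms):
    # Toy model: the work splits evenly across nodes,
    # then a fixed cost is paid to move partial results over the network.
    return total_work_ms / nodes + network_overhead_ms

def single_server_time_ms(total_work_ms):
    # Same work on one server: no split, and no network hop between nodes.
    return total_work_ms

NETWORK_OVERHEAD_MS = 500  # hypothetical fixed cost of the network exchange
NODES = 10                 # hypothetical cluster size

for label, work_ms in [("large scan", 600_000), ("small lookup", 200)]:
    print(label, single_server_time_ms(work_ms), mpp_query_time_ms(work_ms, NODES, NETWORK_OVERHEAD_MS))
# large scan 600000 60500.0
# small lookup 200 520.0
```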

Use Case

The best use case is a modern, large-scale data warehouse for analytics.

Examples of MPP databases: Snowflake, Azure Synapse, Netezza, Teradata, Redshift

I will delve into the various differences in architecture between some of the most popular MPP databases in a later post.

I am passionate about empowering and encouraging people of color in the data analytics career path

AfroInfoTech
