The rise of MPP platforms — Comparing SMP to MPP Architecture

Characteristics of SMP:
  1. In SMP (symmetric multiprocessing), every processor shares a single copy of the operating system.
  2. SMP is a tightly coupled multiprocessor architecture.
  3. SMP scales up: it grows by buying a bigger system.
  4. In SMP, resources such as the bus, memory, and the I/O system are shared by all processors.

Advantages of SMP:
  1. Network speed: Since all the components sit in the same server, no network hop is involved between them, so communication latency is negligible.
  2. Enforcement of constraints: Primary key, foreign key, and unique constraints are easy to maintain when all the data sits on the same server.
  3. Seamless integration of components: Because the CPU, memory, and I/O are tightly coupled within a single server, there are fewer points of failure between components.
  4. Data consistency: We can implement the ACID properties of relational databases in full. Data written to the database is validated in real time for validity and integrity. That is also why database triggers are practical to implement.
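
The constraint and trigger points above can be sketched with Python's built-in sqlite3 module. SQLite is only a stand-in for a single-server relational database here, and the customers/orders schema, the trigger name, and the try_execute helper are hypothetical names chosen for illustration.

```python
import sqlite3

# SQLite stands in for a single-server database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
    -- Trigger: reject non-positive amounts at write time.
    CREATE TRIGGER orders_amount_check
    BEFORE INSERT ON orders
    WHEN NEW.amount <= 0
    BEGIN
        SELECT RAISE(ABORT, 'amount must be positive');
    END;
""")

def try_execute(sql):
    """Run one statement in its own transaction; return 'ok' or the error text."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return "ok"
    except sqlite3.Error as exc:
        return str(exc)

results = [
    try_execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')"),
    try_execute("INSERT INTO customers (id, email) VALUES (2, 'a@example.com')"),  # unique violation
    try_execute("INSERT INTO orders (customer_id, amount) VALUES (1, 25.0)"),
    try_execute("INSERT INTO orders (customer_id, amount) VALUES (99, 25.0)"),     # no such customer
    try_execute("INSERT INTO orders (customer_id, amount) VALUES (1, -5.0)"),      # trigger rejects
]
customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Every check runs against one local copy of the data, so a bad write is rejected before it is committed. Only the first and third statements succeed, leaving one customer and one order.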

Disadvantages of SMP:
  1. Performance: Even with the best data model, most SMP relational databases struggle to scale with data growth. There is only so much CPU you can add to a single server, and that ceiling limits performance.
  2. Scalability: We can only scale up in a limited fashion. Any data size of more than 15 TB is at risk of major performance issues. It is also hard to find a single piece of hardware that can host a petabyte of storage.
  3. Cost: Scaling up means buying faster processors and more memory, so the system becomes more expensive as it grows.
  4. Single point of failure: If one processor fails, the entire server can become unusable, which makes maintenance difficult.
  5. Elasticity: Adding more storage usually means bringing the entire server down and doing a lot of work to reconfigure the hardware and software.

When to use SMP:
  1. The database serves a monolithic application.
  2. Maintaining data integrity and constraints is a necessity.
  3. The data size is less than 4 TB.
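
Characteristics of MPP:
  1. MPP (massively parallel processing) uses a shared-nothing architecture: each node has its own CPU, memory, and storage.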
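
A minimal sketch of what shared-nothing means for data placement, assuming rows are assigned to nodes by hashing a distribution key. The node count, the customer_id column, and the node_for and distribute helpers are hypothetical, and real MPP engines use their own hashing schemes.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # assumed cluster size, for illustration only

def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution key to a node using a stable hash (CRC32)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes

def distribute(rows, key_column, num_nodes=NUM_NODES):
    """Split rows into per-node partitions; each row lives on exactly one node."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[node_for(row[key_column], num_nodes)].append(row)
    return partitions

rows = [{"customer_id": f"c{i}", "amount": i * 10} for i in range(12)]
partitions = distribute(rows, "customer_id")
```

Each partition stands for the private storage of one node. No row appears in two partitions, and a node never needs another node's memory or disk to read its own rows.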

Advantages of MPP:
  1. Performance: Computation speed grows close to linearly with the number of nodes. The more nodes you have, the faster it is to perform aggregations and computations on the entire dataset.
  2. Scalability: We can scale out far beyond the limits of a single server. By adding more nodes to our architecture, we can scale out our MPP database to store and process larger volumes of data.
  3. Cost: We don’t need to buy the most expensive hardware to accomplish the task. Since we are adding more nodes, it is easier to handle more data with less expensive hardware.
  4. Single point of failure: If one node fails, the other nodes can keep functioning and support database activity while the failed node is being repaired.
  5. Elasticity: More nodes can be added to the database without making the entire cluster unavailable.
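
The performance point above rests on a scatter-gather pattern: each node aggregates only its own rows, and a coordinator merges the small partial results. The sketch below simulates four nodes with threads. The function names are hypothetical, the partitions are assumed to be non-empty, and threads only illustrate the pattern, so this is not a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

def local_aggregate(partition):
    """Runs on one node: aggregate only the rows that node owns."""
    return sum(partition), len(partition)

def cluster_average(partitions):
    """Coordinator: scatter the work to all nodes, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(local_aggregate, partitions))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

values = list(range(1, 101))
partitions = [values[i::4] for i in range(4)]  # 4 simulated nodes, 25 values each
average = cluster_average(partitions)
```

Only one (sum, count) pair per node travels to the coordinator, which is why adding nodes can shorten the scan without moving the raw data.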

Disadvantages of MPP:
  1. Network speed: All the nodes are connected by a network fabric, which introduces some latency, though it is usually small.
  2. Enforcement of constraints: Primary key, foreign key, and unique constraints are typically not enforced, because each node holds only its own slice of the data. Validating integrity across the whole dataset is not easy.
  3. Seamless integration of components: A multi-node configuration has more components and network links, so network or system failures are more likely than on a single server.
  4. Data consistency: In MPP we trade immediate consistency for partition tolerance. We therefore cannot guarantee that all nodes have processed the same data at the same moment, for example when network issues occur.
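
To see why unique constraints are hard to enforce across nodes, consider this sketch. It assumes a table distributed by order_id (routed with a simple modulo as a stand-in for hash distribution) while uniqueness is wanted on email. The Node class, the route function, and the column names are hypothetical.

```python
from collections import Counter

NUM_NODES = 3  # assumed cluster size, for illustration only

class Node:
    """One shared-nothing node: it can only see its own rows."""
    def __init__(self):
        self.rows = []

    def insert(self, row, unique_column):
        # Local check only: rows held by other nodes are invisible from here.
        if any(r[unique_column] == row[unique_column] for r in self.rows):
            raise ValueError(f"duplicate {unique_column}: {row[unique_column]}")
        self.rows.append(row)

def route(order_id, num_nodes=NUM_NODES):
    """Pick the owning node from the distribution key (modulo as a simplification)."""
    return order_id % num_nodes

def global_duplicates(nodes, column):
    """Cross-node check: needs the column values from every node."""
    counts = Counter(r[column] for n in nodes for r in n.rows)
    return [value for value, c in counts.items() if c > 1]

nodes = [Node() for _ in range(NUM_NODES)]
rows = [
    {"order_id": 1, "email": "a@example.com"},
    {"order_id": 2, "email": "a@example.com"},  # same email, lands on another node
]
for row in rows:
    nodes[route(row["order_id"])].insert(row, "email")

duplicates = global_duplicates(nodes, "email")
```

Both inserts pass their local check because the two rows land on different nodes. The duplicate only shows up once values are gathered from every node, which is the extra cross-node work a single server never has to do.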

AfroInfoTech
