OLTP vs OLAP
Let’s say you decide to build a Facebook clone. You and your roommate grind away for a few weeks to get the application up and running. Everything looks great, you’ve got over 100 users (including your crush from the introduction to biology course). Then one day, the power goes out. You start the application again, only to find out all of the data is gone. You do some research and find out that storing the user data in a Python dictionary probably wasn’t the best idea. So, you make use of a relational database to persist your data. Fast forward a few months, and your user base has grown exponentially. You reach out to your friend who’s majoring in marketing about potentially placing some ads on your website. The friend in question goes on about the fact that you should select the advertisements based on the users’ demographics. As the saying goes, there’s no point of placing an ad for prune juice if all of your users are poor college students. Simple enough, you tell yourself, I’ll just run a SQL query to get the average age. A minute passes… Then another, and another… You’re just about to give up when you receive an angry complaint from a user about how it’s taking them forever to upload a picture of their cat. You decide that the simplest thing to do is get a bigger server. Fast forward another few months, and you’re trying to perform more sophisticated queries (some of which take hours), populating multiple dashboards, and the like. Your website’s speed has been reduced to a crawl, and you’re at risk of losing all your users.
Fortunately, your boy Martin Fowler, comes to give a talk at your school. You approach him afterwards, and ask for his advice.
No, padwan, you’re going about it all wrong…
Most EnterpriseApplications store persistent data with a database. This database supports operational updates of the application’s state, and also various reports used for decision support and analysis. The operational needs and the reporting needs are, however, often quite different — with different requirements from a schema and different data access patterns. When this happens it’s often a wise idea to separate the reporting needs into a reporting database, which takes a copy of the essential operational data but represents it in a different schema.
Now, I understand why we’d need a separate database, that is, queries run against the reporting database don’t add to the load on the operational database, but why a different schema?
The infallable sensei replies:
In practice, storage is cheap, but the analyst’s time is expensive.
In the example above, the operational database, which houses the data used by the Facebook clone, is an example of a OLTP system, whereas the reporting database is an example of a OLAP system.
Online Transaction Processing
Online Transaction Processing (OLTP) is characterized by a large number of ACID compliant CRUD (Create, Read, Update, Delete) transactions.
The performance of an OLTP system is measured in terms of the number of transactions per second.
OLTP systems can be used to store things like purchases, bookings, or, like in our example, user posts.
If you’ve ever taken an introduction to databases course, chances are you were designing databases to support OLTP transactions. This is because, in practice, unlike OLAP systems, OLTP database schemas are in 3rd Normal Form (3NF).
Row oriented databases are well suited for OLTP systems. Suppose that a customer ordered a pair of socks from Amazon. We’d need to create a new entry in our Orders table. Assuming that we were using a hard drive for secondary storage, because the data is stored in a row oriented database, we could consecutively write every column of the row (i.e. order number, quantity, etc…) to disk. Thus, avoiding lifting the head to write the data in another sector, which would be the case in a column oriented database.
Online Analytical Processing
Online Analytical Processing (or, OLAP, for short) is characterized by a relatively low volume of read only transactions.
We quantify an OLAP system’s performance by query response times.
The data within OLAP systems is derived from OLTP systems. In other words, it’s not uncommon to create new tables by querying the OLTP database at predefined intervals.
OLAP database store their data using star schemas. Without going into too much detail, star schemas are not in 3rd Normal Form (3NF) because the main concern in OLAP systems is making it simple (i.e. require less joins) to analyze the data.
Columnar databases are well suited for OLAP systems. Suppose an analyst wanted to determine the total quantity of products sold in the month of May. Unlike in a row oriented database, which would require us pass over the order records in their entirety, the needle of the hard drive could quickly scan all the timestamps (since they’re stored on disk consecutively) and select the ones with a month equal to May. Similarly, the needle could stictly read the values in the quantity column adding them up as it goes.
Let’s bring it all together, we discussed 4 core differences between OLTP & OLAP systems:
- OLTP: Simple CRUD
- OLAP: Complex aggregations
- OLTP: transactions / unit of time
- OLAP: unit of time / number of records
- OLTP: Row oriented
- OLAP: Column oriented
- OLTP: Entity model in 3NF
- OLAP: Star schema
- Martin Fowler’s ReportingDatabase post