The 4 Types of Data Scales
Many BI tools treat all numbers the same way.
At first glance, this might seem efficient. After all, numbers are numbers, right? However, it can lead to perplexing and downright nonsensical operations. For instance, a tool might let you apply a SUM aggregation to fields like user_id or phone_number. Not only does this make little sense, it also invites errors and shouldn't even be an option.
So, why does this happen?
The problem is how we classify numbers in data analysis.
Numbers aren't all created equal; they fall into four distinct categories known as data scales. Once you understand these four types, you may realize that you have made wrong calculations before by applying an operation to a kind of number that doesn't support it.
In this post, we’ll go through these four data scales. We’ll see why treating all numbers the same leads to errors. And we’ll learn how to handle each type correctly to make your data work for you.
Before going further, here's a quick table to sum things up.
Data Scale | Definition | Ordering | Equal Distance | True Zero Point |
---|---|---|---|---|
Nominal | Numbers used as labels. Cannot be ranked or compared. Although they are numbers, they should be treated more like strings. Examples: phone numbers, postal codes, SSNs, user IDs | ❌ No | ❌ No | N/A |
Ordinal | Numbers used to order categorical values. Can be ranked (ordered). Intervals between the ranks are not equal or measurable. Examples: education level, customer rating, pain scale | ✅ Yes | ❌ No | N/A |
Interval | Used as measures. Intervals between values are equal, so differences can be measured. No true zero point, so values cannot be compared using ratios. Examples: temperatures (°F, °C), test scores (SAT, IELTS), credit scores, pH levels, time of day, day of month | ✅ Yes | ✅ Yes | ❌ No |
Ratio | Used as measures. Have a true zero point. Can be compared using ratios (proportions). Examples: heights, weights, Kelvin (K) temperature, population, money, time duration | ✅ Yes | ✅ Yes | ✅ Yes |
1/ Nominal scale
Think about phone numbers (e.g. 555-2368).
They’re definitely numbers, but it would be absurd to sum them, sort them, or do any other arithmetic on them. They’re merely labels: categorical values that happen to be represented as numbers in your database records.
This is the first type of data scale: the nominal scale. I like to call them 'fake numbers', because they cannot be ranked or compared mathematically. They are used as labels, and are better thought of as "text that happens to be made of digits". In fact, a common recommendation is to store phone numbers as 'varchar' in your database.
Other examples of the nominal scale include randomly generated IDs, Social Security Numbers (SSN), or numbers representing categorical values like gender, color, or marital status.
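To make the "fake numbers" idea concrete, here is a minimal Python sketch. The phone numbers are made-up sample values, and the variable names are only for illustration. It shows one practical reason for storing such values as text (an integer silently drops a leading zero) and the kind of operations that do make sense for labels: equality checks and counting.

```python
from collections import Counter

# Hypothetical sample data: phone numbers kept as strings, not integers.
phone_numbers = ["0205552368", "0205552368", "5550100"]

# Storing a phone number as an integer silently drops the leading zero.
as_int = int(phone_numbers[0])
leading_zero_lost = str(as_int) != phone_numbers[0]

# Operations that are meaningful for labels: counting and finding the mode.
counts = Counter(phone_numbers)
most_common_number, occurrences = counts.most_common(1)[0]
distinct_numbers = len(counts)

print(leading_zero_lost, most_common_number, occurrences, distinct_numbers)
```

Nothing here adds, averages, or ranks the values, which is exactly the point.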
2/ Ordinal scale
Now, think about a survey response where the options are typically "very dissatisfied," "dissatisfied," "neutral," "satisfied," and "very satisfied," which usually map to a number from 1 to 5.
In this case, it's clear that 1 < 2 < 3 < 4 < 5. However, can we say that "2 – 1 = 3 – 2"?
In other words, are the intervals between them equal?
If you say yes, think again. There's no basis for claiming that going from "neutral" to "satisfied" is the same as moving from "satisfied" to "very satisfied". In fact, any distance calculation between these points is entirely arbitrary.
This introduces the second type of data scale: the ordinal scale. Ordinal-scale numbers are used to denote ranking (ordering) between categorical values. They can be ranked, but the intervals between them are neither equal nor measurable.
Other examples of ordinal scales include education levels (e.g., Grade 1, Grade 2, Grade 3), product ratings, and pain intensity levels.
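One way to see why the distances are arbitrary is to try two different number codings that both keep the labels in the same order. The sketch below is only an illustration: the team names, the responses, and the second coding (1, 2, 3, 4, 10) are invented for the example. The mean flips depending on the coding, while the median, which relies only on order, does not.

```python
from statistics import mean, median

LEVELS = ["very dissatisfied", "dissatisfied", "neutral",
          "satisfied", "very satisfied"]

# Two codings that both preserve the order of the labels.
coding_a = {label: i + 1 for i, label in enumerate(LEVELS)}  # 1, 2, 3, 4, 5
coding_b = dict(zip(LEVELS, [1, 2, 3, 4, 10]))               # equally valid order

# Hypothetical survey responses for two teams.
team_x = ["satisfied", "satisfied", "satisfied"]
team_y = ["neutral", "neutral", "very satisfied"]

def summarize(responses, coding):
    """Return (mean, median) of the responses under the given coding."""
    scores = [coding[r] for r in responses]
    return mean(scores), median(scores)

mean_x_a, median_x_a = summarize(team_x, coding_a)
mean_y_a, median_y_a = summarize(team_y, coding_a)
mean_x_b, median_x_b = summarize(team_x, coding_b)
mean_y_b, median_y_b = summarize(team_y, coding_b)

# The mean-based ranking depends on the arbitrary coding...
x_wins_by_mean_a = mean_x_a > mean_y_a
x_wins_by_mean_b = mean_x_b > mean_y_b

# ...while the median-based ranking does not.
x_wins_by_median_a = median_x_a > median_y_a
x_wins_by_median_b = median_x_b > median_y_b

print(x_wins_by_mean_a, x_wins_by_mean_b)      # True False
print(x_wins_by_median_a, x_wins_by_median_b)  # True True
```

If a conclusion changes when you relabel the categories without changing their order, it was never supported by the data in the first place.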
3/ Interval scale
Let’s take Celsius temperature as an example.
Can I say that "60 °C – 50 °C = 50 °C – 40 °C" (which equals 10 °C)? Yes, absolutely.
However, can I say that 60 °C is twice as hot as 30 °C? Not quite. Why? Because Celsius temperature does not have a true zero point. In other words, 0 °C does not represent a complete absence of energy or heat.
Other examples of interval-scale numbers include Fahrenheit temperatures (°F), time of day (1 am, 2 am, 3 am), and days of the month (Day 1, Day 2, etc.).
Interval-scale numbers are numbers that can be ordered with equal intervals between them but lack a true zero point. Hence, they can be compared using differences (subtraction) but not ratios.
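A quick way to check this is to convert the same temperatures into other units. The short sketch below uses the standard Celsius to Fahrenheit and Celsius to Kelvin formulas; the function names are just illustrative. Equal steps stay equal in every unit, but the "twice as hot" ratio changes with the unit, which is a sign that it was never meaningful.

```python
import math

def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32

def celsius_to_kelvin(c: float) -> float:
    return c + 273.15

# Differences: equal steps in Celsius are still equal steps in Fahrenheit.
equal_steps_c = (60 - 50) == (50 - 40)
equal_steps_f = math.isclose(
    celsius_to_fahrenheit(60) - celsius_to_fahrenheit(50),
    celsius_to_fahrenheit(50) - celsius_to_fahrenheit(40),
)

# Ratios: the answer depends entirely on which unit you picked.
ratio_c = 60 / 30
ratio_f = celsius_to_fahrenheit(60) / celsius_to_fahrenheit(30)
ratio_k = celsius_to_kelvin(60) / celsius_to_kelvin(30)

print(f"C: {ratio_c:.2f}, F: {ratio_f:.2f}, K: {ratio_k:.2f}")
# prints "C: 2.00, F: 1.63, K: 1.10"
```

Only the Kelvin figure compares the temperatures against a true zero point, and it says 60 °C is roughly 10% hotter than 30 °C, not twice as hot.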
This brings us to the last type of data scale…
4/ Ratio scale
Well, you guessed it. Ratio-scale numbers are the most familiar type of numbers you deal with as a data analyst every day: money, transaction counts, time durations, and population sizes.
These numbers have a true zero point. For instance, a population of zero means no one, $0 indicates no money, and 0 K (Kelvin) is absolute zero, the lowest possible temperature.
You can compare them, subtract them, or calculate ratios between them ($1M is twice as much as $500K).
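As a small sanity check, ratios on a ratio scale should not change when you switch units. The sketch below is illustrative only (the `ratio` helper and the sample amounts are made up for the example): it compares money in dollars and cents, and durations in seconds and minutes.

```python
import math

def ratio(a: float, b: float) -> float:
    """Return a / b; only meaningful when both values are on a ratio scale."""
    if b == 0:
        raise ZeroDivisionError("cannot take a ratio against a zero baseline")
    return a / b

# Money: the same two amounts in dollars and in cents.
usd_ratio = ratio(1_000_000, 500_000)
cents_ratio = ratio(1_000_000 * 100, 500_000 * 100)

# Duration: the same two durations in seconds and in minutes.
seconds_ratio = ratio(90, 30)
minutes_ratio = ratio(90 / 60, 30 / 60)

print(usd_ratio, cents_ratio)        # 2.0 2.0
print(seconds_ratio, minutes_ratio)  # 3.0 3.0
```

Because zero means "nothing" in every one of these units, the proportion survives the conversion.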
What does this mean for me as a data analyst?
It's simple: Don't perform silly calculations without knowing what data scale you're dealing with.
Don't say one phone number is bigger than another, or that red is smaller than blue. And don't try to sort them.
Don't subtract two satisfaction survey ratings. Or sum them.
Don't say your IELTS score is 2x better than your brother's.
The only time you can go wild is when you absolutely know you're working with ratio-scale numbers.
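If you want to enforce these rules in code rather than by memory, one option is to tag each field with its scale and reject aggregations that don't fit. The following is a minimal sketch, not how any particular BI tool works: the `Scale` enum, the `ALLOWED` table, and the `aggregate` function are all hypothetical names, and the table reflects one common convention (for example, allowing a mean but not a sum on interval data).

```python
from enum import Enum
from statistics import mean, median_low, mode

class Scale(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"

# Assumed convention: each scale allows everything the previous one does, plus more.
ALLOWED = {
    Scale.NOMINAL: {"count", "mode"},
    Scale.ORDINAL: {"count", "mode", "min", "max", "median"},
    Scale.INTERVAL: {"count", "mode", "min", "max", "median", "mean"},
    Scale.RATIO: {"count", "mode", "min", "max", "median", "mean", "sum"},
}

AGGREGATORS = {
    "count": len,
    "mode": mode,
    "min": min,
    "max": max,
    "median": median_low,  # returns an actual data point, so it is safe for ordinal data
    "mean": mean,
    "sum": sum,
}

def aggregate(values, scale: Scale, op: str):
    """Apply `op` to `values` only if it is meaningful for `scale`."""
    if op not in ALLOWED[scale]:
        raise ValueError(f"{op!r} is not meaningful on a {scale.value} scale")
    return AGGREGATORS[op](values)

# Ratio scale: summing order amounts is fine.
total_revenue = aggregate([20, 5], Scale.RATIO, "sum")

# Ordinal scale: the median rating is fine.
median_rating = aggregate([1, 3, 3, 5], Scale.ORDINAL, "median")

# Nominal scale: summing user IDs is rejected.
try:
    aggregate([101, 102, 103], Scale.NOMINAL, "sum")
    sum_of_ids_rejected = False
except ValueError:
    sum_of_ids_rejected = True

print(total_revenue, median_rating, sum_of_ids_rejected)
```

The table is deliberately conservative; you can loosen it for your own data, but it is better to make that choice explicitly than to let every numeric column accept every operation.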
Happy analyzing!