A more readable alternative to SQL’s CASE expression
Abstract
Introducing SQL’s function decode
as a more readable and maybe more performant alternative to the case-expression, usable when the conditions all have a certain structure.
For a certain kind of case-when-then-expression, you can use the SQL function decode
as an alternative that I find more readable and that I prefer whenever I can.
Granted, the case
expression is much more flexible than decode
and can thus be used under many more circumstances, but it’s exactly the limits of decode
for which I immediately valued it once I first heard of decode
. The restrictions of decode
mean that whenever I see decode
being used, I know exactly what awaits me reading the coming code and, more importantly, what cannot await me. Whenever I see decode
, I can be certain that it’s that particular use case.
At my workplace, I’m the code owner of much of our SQL code and I thus review a lot of SQL code that was written by other people. After I learned about SQL’s decode
function, I taught it to my colleagues and am always grateful when they use it.
As a data engineer, part of my job is to tidy up raw/unprocessed/"dirty" data.
Example#
I review many merge requests that contain SQL code like this:
select
case
when is_active = 'true' then true
when is_active = 'false' then false
else null
end as is_active
from some_table
The logic is not the point, it’s more about the structure/syntax of the code. In this case-expression, we examine one (and only one) column, is_active
, and look out for certain values it might contain. For a set of pre-defined values, we replace the original value with a different value. In this particular example, it bothers us that the column is_active
is of type text
rather than of type boolean
, and we want to fix that. The three expected values would be 'true'
, 'false'
and null
. There could also be unexpected strings in that column. As I said, the logic isn’t the point, and there are several approaches to achieve this particular goal. The point in this (maybe contrived) example is to replace the values of one particular column with other values. We want to use the/some values of the given column as "keys" of a pre-defined dictionary and replace all values in the given column either with the according "dictionary value" or a default value (null
in this example). Some values of a given column serve as code for another value, similar to an identifier, and we want to de-code that column.
I could replace the code above with this (in my opinion more readable) code:
select
decode(
is_active,
'true', true,
'false', false
) as is_active
from some_table
I know I could have solved that particular problem in different ways. To show you the true value of decode
, I’ll show you another example, and then I’ll explain why I like decode
so much.
I also didn’t specially handle the else null
case, since null
is the default and fallback. I could have explicitly handled the "default default" and write this equivalent, more verbose code instead:
…
'false', false,
null
) as is_active
…
Another example#
Consider this code:
select
case
when importance = 1 then 'high'
when importance = 2 then 'medium'
when importance = 3 then 'low'
else 'unimportant'
end as importance
from some_table
We have a column, importance
, that obviously contains numbers, and we want to map those numbers to (pre-defined) strings.
Personally, I would write this equivalent code instead:
select
decode(
importance,
1, 'high',
2, 'medium',
3, 'low',
'unimportant'
) as importance
from some_table
How decode
works#
The arguments the function decode
expects can be grouped into basically three groups:
- the column to be examined
- at least one pair of a pre-defined and a pre-defined where will be substituted with
- an optional default value that will replace the column’s value if it’s not one of the pre-defined values and will be
null
if it’s not specified
Why I prefer decode
whenever it can be used#
When decode
is used, I (as a reviewer) no longer have to pay special attention to the when
part to check whether the condition consists of something more than the basic comparison of the column value with a specific value. As soon as I see the keyword decode
, I no longer have to remain alert to the possibility that the condition could surprise me; that there could come a case that would require tricky special logic that I could miss if I don’t keep a watchful eye.
The flexibility of the case-statement allows that logic which looks simple at a first glance could contain a twist; that one of the conditions could, for example, be importance = 3 and some_other_column is true
instead of just importance = 3
like all the previous conditions; or that a condition could be entirely unrelated to the column importance
. When decode
is used, I can be sure from the beginning that no such thing can happen.
When my co-workers use decode
whenever they can, it makes reviewing their SQL code more pleasant for me. I can be sure that I can skim over the conditions and won’t overlook a special condition because there can’t be a special condition. The limitations of decode
simply don’t allow for deviations in the structure of the conditions which means they’ll all have the same structure.
Final words#
Today I learned about the SQL function decode
which can be used as a more readable alternative to the case-expression if all conditions follow a certain pattern. Hopefully, you could learn something from this post too. In any case, be well.
