The Query
Posts
🤓 Learn 2 key SQL concepts in 10 minutes

🤓 Learn 2 key SQL concepts in 10 minutes

The Query
June 15, 2023

Read Time: 3 minutes

GM crunchers! The Query here — the data newsletter that's like a neural network for your data career. We'll help you connect the dots and master the art of turning data into actionable intelligence.

Here’s what we have for you today:

Learn SQL bucketing and why you shouldn’t use select *
A sporting good store product analytics portfolio project dataset ⚽️
Avoiding Zero-Division Errors in SQL
A basket full of funnies 🧺 🤣

def learn_data_analysis(👨‍💻):

1. Bucketing is a simple, yet core concept in writing SQL.

If you want to create a chart showing average income by age bucket (e.g. 20-29, 30-39, 40-49, etc.), but your dataset only has age, you’ll need to create the buckets yourself.

An easy way to do this is with a CASE statement, but there are other ways.

If you aren’t comfortable with bucketing with SQL yet, I HIGHLY recommend reading this article.

2. Do you know why using select * is bad for SQL performance?

This article explains the reasons why.

If you’re fairly fluent in SQL, I recommend giving this a read.

You don’t need to understand all of it, but if you can start to understand some of the foundational concepts, it’ll make you a much better SQL coder because you’ll be aware of performance tradeoffs.

select * from dataset-of-the-week

The dataset of the week is by Podsyp on Kaggle and is for an online sporting goods store.

Business Problem:

On the main page of the store, a banner is shown to improve sales. 1 of 5 banners is randomly displayed there. Each banner advertises either a specific product or the entire company.

The CMO of the company has an offer to sell the banner space to another company instead of promoting their own product / company.

Help the CMO make a decision.

If you want some inspiration, here’s a Python notebook that shows an approach to this analysis.

This same analysis can be with SQL and Tableau / Looker Studio / PowerBI.

Good luck!

class MiniLesson:

SAFE_DIVIDE function in SQL

If you’ve been analyst for long you have probably run into this common error:

Can’t divide by zero.

SAFE_DIVIDE is a function in SQL that allows you to perform division operations while safely handling potential “divide-by-zero” errors.

If a divide-by-zero error is encountered, instead of causing your query to crash or return an error, the SAFE_DIVIDE function will return NULL.

Here's an example to illustrate its usage:

Suppose we have a table named sales with the following schema and data:

We want to calculate the average revenue per unit for each product. If we try to divide total_revenue by units_sold directly, we will encounter a divide-by-zero error for product_id 2.

To avoid this error and handle the division safely, we can use the SAFE_DIVIDE function:

In this example, the SAFE_DIVIDE function takes two arguments:

The numerator (total_revenue)
The denominator (units_sold)

When the denominator is zero, as in the case of product_id 2, the function returns NULL instead of causing an error.

As a beginner data analyst, understanding how to use functions like SAFE_DIVIDE to handle potential errors in your SQL queries is important.

It helps you maintain the stability and reliability of your analysis, allowing you to work with data that may contain unexpected zero values or other edge cases that could otherwise cause your queries to fail.

Note: SAFE_DIVIDE is what this function is called in BigQuery. Other SQL environments may have a different name for this function but it is essentially the same thing. To find yours just google “Safe divide [insert platform i.e. Snowflow, Postgres, etc.]”

import memes as 😂

That’s it for today.

Stay crunchin’ folks and see you next week!

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

What'd you think of today's newsletter?

new new