- The Query
- Posts
- 🤓 Ecommerce SQL project for data analysts
🤓 Ecommerce SQL project for data analysts
Read Time: 3 minutes
Hey crunchers! This is The Query — the data newsletter that's like a well-tuned algorithm for your data career. We'll help you navigate complex terrain with efficiency and precision.
Here’s what we have for you today:
Build 6 Excel projects in under 2.5 hours.
An amazing dataset for portfolio projects.
Ranking data with SQL and Window Functions.
Avocados that metaphorically represent your life as a DA 🥑
def learn_data_analysis(👨💻):
1. Build 6 Excel projects in under 2.5 hours!
Learning by doing is key to developing skills in data analytics.
In this YouTube tutorial, you’ll learn Excel by building 6 projects — a payroll project, a grade book project, a sales database, and more.
2. Using mental models to learn SQL faster 🧠
A mental model is an interrelated set of beliefs that shape how a person forms expectations for the future and understands the way the world (or in this case, SQL) works.
In Select Star SQL (the free, online book), Zi Chong Kao the author, teaches SQL to beginners by conveying a mental model for writing SQL.
select * from dataset-of-the-week
This week’s dataset is the Olist (largest department store in Brazilian marketplaces) E-commerce Dataset!
This dataset is perfect for a portfolio project AND for obtaining experience:
Analyzing real-world, sales data
Practicing SQL JOINS (there are 9 CSVs you’ll need to JOIN together)
Upload these CSVs into your database of choice and write SQL to bring your tables together.
Here are some questions to get your analysis started:
How often is the average order value for a second purchase, larger than the first purchase?
What is the average order value by city?
What’s the total value of orders that haven’t been delivered?
class MiniLesson:
Ranking Data with SQL Window Functions
Data analysts are get asked to rank data.
But this can be tricky because in SQL there are 3 ranking functions that do nearly the same thing.
The 3 functions I'm referring to are:
ROW_NUMBER
RANK
DENSE_RANK
So what’s different about them?
The main difference is in how they handle a tie (i.e. equal values).
Suppose you have a table called sales
.
You want to rank the salespeople based on their sales amount.
The example data, query, and output are shown below with examples of each ranking function.
As you can see RANK, ROW_NUMBER, AND DENSE_RANK don't produce the same rankings.
ROW_NUMBER() assigns unique values to each row, even if the sales_amount is the same. It will not show any ties.
Use this if you want to make 100% sure you have a tie breaker.
RANK() assigns the same rank to the rows with the same sales_amount and leaves gaps in the ranking sequence.
This is similar to how a sporting event like golf would rank players in a tournament.
DENSE_RANK() assigns the same rank to the rows with the same sales_amount without leaving gaps in the ranking sequence.
Knowing the difference between these three can ensure you always use the right one.
import memes as 😂
When you thought data analysis was the main course…
But life serves you a giant avocado pit of data cleaning. 🧽 🫧
If you’ve been reading The Query, you probably noticed the format change we made this week.
We changed the format because:
We want you to be able to read the newsletter in under 5 minutes.
We want the newsletter to be as actionable as possible.
Quality of resources > quantity of resources.
Reply to this email and let us know if you like the new format.
Can’t wait to get your feedback 🤓
That’s it for today.
Stay crunchin’ folks and see you next week!
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
What'd you think of today's newsletter? |