🤓 4 best books for data analysts
Read Time: 4 minutes
Hey crunchers! The Query here — the data analyst newsletter that's like a pivot table for your data career, summarizing and analyzing data to reveal hidden trends and insights.
Here’s what we have for you today:
The best books for data analysts 📖
What it actually means for a business to be “data-driven”
A way to “QUALIFY” your SQL statements
Joy in the form of memes 🤣
This post contains affiliate links. We may earn a commission for purchases made at no additional cost to you.
def learn_data_analysis(👨💻):
1. The BEST books to supplement your learning as a data analyst 📚️
When I’m learning something new, I like using books as supplementary learning resources.
I usually only pick them up in the evenings, right before bed.
When picking up a new skill like data analysis, 80% of your time should be “learning by doing” (i.e. building projects).
But when the day is over and you’re winding down, picking up a related book can help keep your brain focused on data and speed up the learning process.
Here are my 4 personal favorite books related to data analytics:
How to Measure Anything: Finding the Value of Intangibles in Business - This book totally changed the way I look at analytics. Its main lesson is that the goal of analytics is to reduce uncertainty in decision-making, not to eliminate it. I can’t recommend it enough.
Lean Analytics: Use Data to Build a Better Startup Faster - If you’re looking to be a data analyst at a startup or to found your own startup someday, definitely read this book. You’ll learn about the North Star Metric (NSM) and other important concepts for analytics at a startup.
Naked Statistics: Stripping the Dread from the Data - An excellent introduction to statistics. If you haven’t taken a stats course or it’s been a while, don’t start with a textbook. Start with this. It explains basic statistics from a first-principles perspective while also being an interesting and engaging read.
Growth Units: Learn to Calculate Customer Acquisition Cost, Lifetime Value, and Why Businesses Behave the Way They Do - If you want to work in growth or marketing analytics, this is THE book. I wish someone had given me this book a decade ago at the beginning of my career. It explains the foundational metrics for measuring the growth of a business. There are only a handful of metrics, but they are VERY nuanced, and this book does an amazing job of explaining all the nuance.
2. What it means to be a “data-driven” organization
What does it mean for an organization to be data-driven?
If you’re someone who wants to work in data, this is something you MUST understand.
Here’s a series of 3 posts by George Xing that does an excellent job of explaining the nuances of this:
select * from dataset-of-the-week
This week’s dataset is an awesome portfolio project dataset.
It’s HUGE.
The UNESCO Institute for Statistics collects country-level data on the number of teachers, teacher-to-student ratios, and related figures.
Here’s what I recommend: Take 20 minutes and explore the various datasets available using the data explorer. Using the data available, come up with a few interesting questions related to the data. Then use whatever analytics workflow you’re comfortable with.
If it were me, I would:
Import the raw data to BigQuery
Use SQL to transform the data into the format I need for analysis
Complete the analysis with Tableau Public or Looker Studio (both free)
class MiniLesson:
QUALIFY Clause in SQL
The QUALIFY clause is a handy tool for making your queries more concise and readable. It isn’t part of standard SQL, but BigQuery, Snowflake, and a few other dialects support it.
I use it all the time when I’m writing SQL in BigQuery.
It allows you to filter the results of a query based on the result of a window function, such as ROW_NUMBER().
This can be particularly useful when you want to select a single row for each group based on specific criteria, without the need to write a separate Common Table Expression (CTE) or subquery.
Let's consider a sales dataset with the following columns: order_id, customer_id, product_id, sale_date, and sale_amount.
Suppose you want to find the most recent purchase for each customer.
You can use QUALIFY along with ROW_NUMBER() to achieve this without using a CTE or subquery.
Here's an example of how to do this with and without QUALIFY, so you can see the benefit:
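As a rough sketch of that comparison, the snippet below holds both versions of the query as strings. The table name `sales` and the sample rows are made up for illustration. The QUALIFY version is written in BigQuery-style syntax and is only shown, not run. The CTE version is run against an in-memory SQLite database (SQLite has no QUALIFY, and needs version 3.25 or newer for window functions), so you can see the result both queries are meant to return.

```python
import sqlite3

# Hypothetical sample rows: (order_id, customer_id, product_id, sale_date, sale_amount)
SALES_ROWS = [
    (1, 101, 9001, "2023-01-05", 20.0),
    (2, 101, 9002, "2023-03-10", 35.5),
    (3, 102, 9001, "2023-02-14", 12.0),
    (4, 103, 9003, "2023-01-20", 50.0),
    (5, 103, 9002, "2023-04-01", 18.25),
]

# WITH QUALIFY (BigQuery-style; shown for comparison only, not executed here).
WITH_QUALIFY = """
SELECT
  order_id
  , customer_id
  , product_id
  , sale_date
  , sale_amount
FROM sales
WHERE TRUE  -- harmless no-op; BigQuery has historically required QUALIFY
            -- to be paired with WHERE, GROUP BY, or HAVING
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY sale_date DESC) = 1
"""

# WITHOUT QUALIFY: compute the row number in a CTE, then filter on it outside.
WITHOUT_QUALIFY = """
WITH ranked AS (
  SELECT
    order_id
    , customer_id
    , product_id
    , sale_date
    , sale_amount
    , ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY sale_date DESC) AS rn
  FROM sales
)
SELECT order_id, customer_id, product_id, sale_date, sale_amount
FROM ranked
WHERE rn = 1
ORDER BY customer_id
"""


def latest_purchase_per_customer(rows):
    """Load rows into an in-memory SQLite table and run the CTE version."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "order_id INTEGER, customer_id INTEGER, product_id INTEGER, "
        "sale_date TEXT, sale_amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(WITHOUT_QUALIFY).fetchall()
    conn.close()
    return result


latest = latest_purchase_per_customer(SALES_ROWS)
for row in latest:
    print(row)
```

The QUALIFY version needs no wrapper query and no extra `rn` column, which is where the savings in length come from.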
In this query, the ROW_NUMBER() window function numbers the rows within each customer_id partition, ordered by sale_date from newest to oldest, so row number 1 is always that customer's most recent purchase.
The QUALIFY clause then filters the results to only include rows with a row number of 1, effectively returning the most recent purchase for each customer.
The benefits of using QUALIFY with ROW_NUMBER() include:
Simplifying your query: Using QUALIFY can help you avoid writing complex subqueries or CTEs, making your query easier to understand and maintain.
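Potentially improving performance: Since QUALIFY filters on the result of the window function directly, it can reduce the amount of data that needs to be processed in subsequent steps. Many query engines plan it the same way as the equivalent subquery, though, so treat any speedup as a bonus and not a guarantee.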
Enhancing readability: By removing the need for nested subqueries or CTEs, QUALIFY can make your query more readable, allowing you and your colleagues to understand it more easily.
The QUALIFY clause can be a powerful tool for data analysts, simplifying queries and improving overall readability.
Remember QUALIFY next time you are working with window functions!
import memes as 😂
Me Singing to ChatGPT: “You raise me upppp…”
That’s it for today.
Stay crunchin’ folks and see you next week!