• The Query
  • Posts
  • 🤓 The perfect 1st project to learn data analysis

🤓 The perfect 1st project to learn data analysis

Read Time: 3 minutes

GM crunchers! The Query here — the data newsletter that's like a sleek dashboard for your data career, guiding you through a narrative of evolving data trends and insights.

Here’s what we have for you today:

  • The perfect first Excel data analysis project 🥇 

  • SQL vs Pandas: What tool to use for the job? 🐼 

  • A handy SQL function for cleaning data 🧹

  • A juicy meme 🤣

def learn_data_analysis(👨‍💻):

1. Building a personal budget is an excellent first Excel project

If you’re just getting into data analysis, you’re probably feeling overwhelmed by the sheer amount of information you need to absorb.

Here’s a piece of advice — spend most of your time learning by doing.

In the beginning, learning by doing is guided learning where you’re following a tutorial.

This Coursera Personal Budget Excel project is exactly that.

2. Choosing the right tool for the job: SQL vs. Pandas

Both of us (Kyle and Cody) have been doing data analysis for a decade each.

In the beginning, we (like everyone) had shiny object when it came to tools.

We thought, “to be a real data analyst, we need to learn [insert tool name].”

The truth is, you can do most analysis within Excel.

The question is, “What’s the best tool for the job?”

Regarding Python’s Pandas library and SQL, it’s tough to know which tool is better in which situation.

If you’re a bit more advanced reader of The Query, give this article a read.

You’ll learn about:

  • the performance difference between SQL and Pandas

  • how to do more complex tasks in both SQL and Pandas

  • and more!

It’s a great piece because it shows 2 approaches (SQl vs Pandas) to solving the same problems.

select * from dataset-of-the-week

Customer insights analyses are a type of analysis of a company’s ideal customers. They help a business better understand its customers and make it easier for them to modify products according to their specific needs, behaviors, and concerns.

Customer insights analyses help a business to modify its product based on its target customers from different types of customer segments. For example, instead of spending money to market a new product to every customer in the company’s database, a company can analyze which customer segment is most likely to buy the product and then market the product only to that particular segment.

Here are the columns you’ll have to work with:

What insights can you pull from this dataset to help inform marketing or product-related decisions?

class MiniLesson:

Cleaning Data with LOWER

I reviewed all the SQL code I’ve written recently; the data cleaning function I use most often is LOWER.

The LOWER() function in SQL is a string manipulation function that converts text to lowercase.

It is useful when you need to normalize text data, such as email addresses, to ensure consistent formatting and avoid issues with case sensitivity in your queries.

Email addresses are often entered by users in a free-form manner, leading to inconsistent capitalization.

Some SQL environments are case-sensitive, which means that '[email protected]' and '[email protected]' would be treated as different email addresses, even though they represent the same person.

By using the LOWER() function, you can standardize email addresses to lowercase, ensuring that your queries and joins work correctly regardless of the original capitalization.

Usage is straightforward. Here is an example…

Let's consider an example where we have a table named users with email addresses in various capitalizations.

We want to convert all email addresses to lowercase.

We can use the LOWER() function in a query to achieve this:

Simple, but powerful.

As a data analyst, understanding how to use the LOWER() function in SQL is essential as it allows you to standardize and clean text data within your queries.

This can be particularly helpful when working with datasets that contain inconsistent capitalization or case-sensitive data, such as email addresses.

By using the LOWER() function, you can ensure that your queries, joins, and comparisons work correctly regardless of the original capitalization, improving the accuracy and reliability of your analysis.

import memes as 😂 

That’s it for today.

Stay crunchin’ folks and see you next week!

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

What'd you think of today's newsletter?

Login or Subscribe to participate in polls.