🤓 The BEST dataset ever?

GM data wizards! This is The Query — the data newsletter that's like an efficient data pipeline for your data career. We'll help you transform raw information into actionable knowledge.

Here’s what we have for you today:

  • An epic public dataset for portfolio projects 🤘 

  • A world-class intro to Python course

  • An unexpected use of CASE statements

  • A tool for leveraging AI in spreadsheets

  • A sizable serving of silly 😆

best_public_dataset_for_portfolio_projects

Enter IPEDS, the Integrated Postsecondary Education Data System.

It’s the statistics and research arm of the US Department of Education and holy 🐮 is it great for becoming a better analyst and building portfolio projects.

Why?

1. The data isn’t perfect.

There’s missing data, null values, etc.

Welcome to the real world of data analysis 🙂 

Practice how you play.

2. The data is spread out across more than 25 tables.

You’ll get tons of practice JOINing these tables together. All you need to do is import them into your DB or DW of choice.

3. The data is time series.

Each table is for a single year, but the data goes back to 1980, so there’s lots of analysis to be done over time.

How to come up with project ideas with IPEDS?

Review the data dictionary for each survey in the dataset.

The dictionary will give you a sense of what the data in each table means — specifically the tabs varlist and Description.

Import a few of the CSV files into your DB/DW that pique your interest and begin exploring!

Here are a few questions you can answer with data to get you started:

  • Of the institutions that closed, what’s the median and average length of time they were open for?

  • Build a distribution of open institutions by size.

  • What’s the male-to-female ratio of schools bucketed by size?

  • Perform a cost analysis on schools by state.

To download the CSVs containing the data from IPEDS, click the link in the Data File column.

Good luck!

select * from content-spotlight

1. The Phases of a Data-Driven Startup. This article explains the 4 phases startups evolve through in their data analytics journey. When you join a new company, assess the stage the company is in so you can make the most meaningful impact as an analyst.

2. The Best Intro to Python Course I’ve Taken. There are a lot of different resources to learn python as a beginner. When I first started, I probably took over 10 introductory courses. The best one of them all was this course by Code with Mosh. At around 6 hours total you can knock it out in 2 sittings at 2x speed. Happy learning!

3. 11 New Google Sheets Functions. Google Sheets recently launched a list of 11 new functions available for 2023. This article by Ben Collins explains each one and provides examples. Skim through each one to see if you think you can apply them on the job.

class LearningResources: 📊

u/mrjsmathematics on Reddit made a successful career change from math teacher to data analyst. 👏 

Use their story as inspiration and motivation to continue your own learning journey with the resources below.

three project-based learning resources

The projects below are ordered from easiest to hardest.

All are beginner friendly 🙂 

two technical tips

Time to get technical…

1. Python tip 🐍: Use meaningful variable names and follow the PEP 8 style guide.

If you’re new to Python, you’ll save time in the long run by building good habits up front.

Writing variable names that clearly describe the data they hold is one of those good habits.

Your code will be easier to read and understand for future you and anyone else who might work with your code.

Here are a few pointers from the style guide:

  1. Use lower_case_with_underscores for variable and function names.

  2. Use UPPERCASE for constants.

  3. Indent using 4 spaces, not tabs.

  4. Keep lines to a maximum of 79 characters.

  5. Place imports at the top of your file, separated by a single blank line.

  6. Add a space after commas and around operators.

Here's a simple example that follows these guidelines 😍 

2. SQL Tip 👨‍💻: Use CASE statements to pivot data dynamically

CASE statements can transform data in ways beyond simple conditional expressions.

One interesting application of CASE statements + aggregate functions is to pivot data dynamically, creating a more readable and structured view of your dataset.

Pivoting data means converting rows into columns, which can be especially useful when dealing with categorical data.

Here's an example of how you can use the CASE statement to pivot data:

In this query, we used SUM functions and CASE statements to pivot the age_group column, converting it into separate columns for each age group's dog count.

Here is what that looks like before & after the pivot:

one tool 🔧

Meet PromptLoop — a tool for leveraging the power of ChatGPT inside Google Sheets.

Image using a function like SUMIF or COUNTIF, but the function being powered by ChatGPT 🤯 

That’s one big 🧠 function.

Here are a few possible applications:

  • Customize Emails to Coworkers

  • Generate Dummy Data

  • Extract something from a large piece of text (without writing REGEX)

It’s only a matter of time before Microsoft builds this sort of feature into Excel.

def data_jobs(👨‍💼👩‍💼):

remote, entry-level data jobs

Because who likes writing SQL from a busy office?

  1. Data Analyst @ Jam City — $81-129k (apply here)

  2. Data Analyst, Product @ Warner Bros. — $80-149k (apply here)

  3. BI Analyst @ Acumed — $57-106k (apply here)

  4. Data Analyst @ Stride — $48-105k (apply here)

  5. Data Analyst, Marketing @ Nutrisense — $40-65k (apply here)

freelance data gigs

Need work experience?

Freelancing is a GREAT way to improve your data analytics skill set and develop entrepreneurial skills.

  1. V-Lookup Help (This is a layup) — $10-35 hourly (apply here)

  2. Data Visualization in Google Sheets — $10 fixed price (apply here)

  3. Python Script to Pull Data — $40-81 hourly (apply here)

  4. Connect DB to Excel — $40 fixed price (apply here)

import entertainment as fun

meme of the week

Ah, the perfect date. So romantic. ❤️

r/dataisbeautiful

Two days ago was Pi day!

The image below shows the first 2023 digits of PI — the darker the color the higher the number.

Can you spot the impressive sequence of 9’s near the middle?

data tok

Insanity 😲 

@codeseb

Has AI come too far? 😱 #chatgpt #pythonprogramming #codingmemes #programmerhumor #pythoncode #codinghumor

That’s it for today. Stay crunchin’ folks and see you next week!

If you want more, be sure to follow us on LinkedIn (here and here) — we post data content daily.

And if you like The Query, send it to a fellow data cruncher 🤓 

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

What'd you think of today's newsletter?

Login or Subscribe to participate in polls.