Starting Out with SQL — Part 1

Jim Fay
4 min readOct 8, 2020

Why Use SQL?

The Data-Driven World

Thanks to developments in technology and communication, we’re spending more and more of our lives online. In 2019 the average internet-using American spent 6 hours and 31 minutes online per day! There will be 320 billion emails sent each day by 2021. Everything from our cars, to our refrigerators, to our flip flops are connected to the internet.

What does all of this mean? That more and more data is being generated at a faster rate each day — and someone will have to make sense of it all. SQL can help us do just that.

Much of the data being generated today is unstructured. This refers to things like audio files, videos, social media posts, and more. Working with this kind of data will have to wait for another post, because SQL works with structured data. An easy way to think about this is that structured data is data that can be stored in a good old fashioned Excel sheet. You have rows representing a person, object, payment, etc., and you have columns that store features or attributes associated with that object.

What is SQL?

SQL is a programming language used to access, manage, and process data stored in a relational databases. A relational database stores tables containing structured data. This data can be queried using SQL, with the output of each query being a new table. Each row in one of these tables uniquely identifies one of the entities or objects stored in the table.

For example, one table in a database could store customer information (one row per customer), another could store product information (one row per product), and a third table could store transaction-level data (one row per purchase).

This data can be accessed and manipulated using SQL queries! For example, to find the 5 top selling products you could query the database using the following code:

SELECT product_ID, SUM(price) AS product_revenueFROM transaction_dataGROUP BY product_IDORDER BY product_revenue DESCLIMIT 5;

This would output a new structured data table looking something like this:

Example Output

Don’t worry about the specifics of this code just yet. How to actually write code in SQL will be explained in further blog posts.

Can’t I Just Use Excel?

You might be wondering why not just use Microsoft Excel for these tasks? Excel is fairly simple, widely used, and easy to interpret, but there are situations where SQL has the upper hand.

SQL is:

Faster (with large datasets)

Excel can be speedy when working with smaller data sets, but loading times can increase dramatically once the number of rows gets into the hundreds of thousands or millions. Formulas, pivot tables, workbooks with many tabs can also add to these loading times.

Switch to SQL and say goodbye to waiting 15 minutes after every adjustment made to a large pivot table!

Reproducible

When producing reports or modifying with Excel there is no easy way to document every change you’ve made to the data. Detailed notes are necessary if you want anyone else to be able to produce the same results with confidence.

In my previous life as a financial analyst, much of my time was taken up by documenting steps in producing reports so that the same analysis could be done the next month or quarter. This takes time and is error prone. It’s all too easy to assume a step is obvious, or to simply miss a step in the directions. This can lead to incorrect outcomes and time consuming analysis.

SQL (and programming languages in general) solves this issue. Each step in the data analysis is documented right there in the code! This makes SQL operations/workflows both easy to interpret and easy to reproduce.

Less Prone to Error & Accidental Changes

Errors are easy to make when working in complex Excel workbooks. An accidental keystroke could overwrite a cell value. Modifications could accidentally be made on a filtered data set instead of the full set.

In Excel, you generally have your analysis and your data in the same file, meaning that accidental errors in analysis can lead to errors in the data itself.

SQL solves these issues. Firstly, it separates the data from the analysis. The data is stored in the relational database, and your SQL code just says which tables to use and how to manipulate them. This makes it much harder to accidentally alter the data itself. Additionally, because all steps are listed in the code, it’s easy to make sure you’re only performing operations that you actually mean to perform.

Conclusion

Hopefully all of this has given you a reason to be interested in SQL! In further articles I will walk through the basics of SQL, along with some more interesting features.

Stay tuned!

--

--