Predicting Spending in Google’s Online Store — Regression Modeling
This blog post is part two of a walkthrough of a recent machine learning project. Part one covered the data and business context; this installment covers the modeling process. View the entire project (including all code and the accompanying slide deck) on GitHub.
In part one of this series I described our data, did some exploratory data analysis, and walked through some of the preprocessing done to get the data ready for modeling.
The data includes 717k rows, with each row being one visit to Google’s online store between 2016 and 2018. The dataset includes features describing geography, traffic source, device properties, page views, time, price, and spending. About 2.5% …
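To make the setup concrete, here is a minimal sketch of the kind of regression problem described: predicting spending from visit-level features. The feature names and data below are invented stand-ins, not the actual 717k-row dataset or the model from the post.

```python
# Hedged sketch of a visit-level spending regression. All data here is
# synthetic; the real features (geography, traffic source, device, etc.)
# come from the Google Merchandise Store dataset described above.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.integers(0, 5, n),   # e.g. an encoded traffic-source category
    rng.integers(1, 50, n),  # e.g. page views during the visit
])
# Fake spending target loosely tied to page views, plus noise.
y = 2.0 * X[:, 1] + rng.normal(0, 5, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(f"R^2 on held-out visits: {model.score(X_test, y_test):.2f}")
```

On real visit data you would of course encode categoricals properly and evaluate more carefully; this only shows the shape of the task.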
Predicting Customer Spending in Google’s Online Store
This blog post is part one of a two-part walkthrough of a recent machine learning project. View the entire project (including all code and the accompanying slide deck) on GitHub. Keep an eye out for part two, where I’ll go through the modeling process and results.
E-commerce is becoming a larger and larger part of all of our lives. In the US, $602 billion was spent online in 2019. About 75% of those who shop online do so at least once per month. …
An Intro to Dash
Dash is a Python framework for developing web applications. It’s built by Plotly on top of Flask, Plotly.js, and React, so any graph you can create with Plotly is easy to embed in an interactive web app! The potential for Dash apps is limitless, and there are plenty of complex and beautiful examples in the Dash App Gallery (source code is available for these projects too).
Why Use SQL?
Thanks to developments in technology and communication, we’re spending more and more of our lives online. In 2019 the average internet-using American spent 6 hours and 31 minutes online per day! There will be 320 billion emails sent each day by 2021. Everything from our cars to our refrigerators to our flip-flops is connected to the internet.
What does all of this mean? That more and more data is being generated at a faster rate each day — and someone will have to make sense of it all. SQL can help us do just that.
Much of the data being generated today is unstructured. This refers to things like audio files, videos, social media posts, and more. Working with this kind of data will have to wait for another post, because SQL works with structured data. An easy way to think about this is that structured data is data that could be stored in a good old-fashioned Excel sheet. You have rows representing a person, object, payment, etc., and you have columns that store features or attributes associated with that object. …
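The rows-and-columns idea above maps directly onto a SQL table. Here is a small self-contained sketch using Python's built-in sqlite3 module; the `payments` table and its contents are made up for illustration.

```python
# Structured data in SQL: rows are objects (payments here), columns are
# their attributes. The table and values are invented for this example.
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

cur.execute("CREATE TABLE payments (id INTEGER, customer TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [(1, "Ada", 19.5), (2, "Grace", 5.0), (3, "Ada", 12.5)],
)

# SQL is what lets us "make sense of it all": total spending per customer.
cur.execute(
    "SELECT customer, SUM(amount) FROM payments "
    "GROUP BY customer ORDER BY customer"
)
totals = cur.fetchall()
print(totals)  # → [('Ada', 32.0), ('Grace', 5.0)]
conn.close()
```

One declarative query replaces the loop-and-accumulate code you would otherwise write by hand, and the same query scales from three rows to millions.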
From the Big to the Really Big
This article borrows heavily from Scott Aaronson’s article on large numbers. Think of this blog post as a summary of his work. I encourage you to read his more detailed version if this article interests you!
As a kid I remember having contests with my friends to see who could name the bigger number. One of us would start with ‘a billion!’, then the other would counter with ‘a trillion!’, and then ‘a googol!’, until we eventually reached ‘infinity!’. Even this could be countered with the galaxy-brain response of ‘infinity plus one!’. …
In the last blog post I discussed cleaning and visualizing data using Pandas. Now I’ll expand on that theme by walking through how to show your data on a map using Plotly.
In this example I’ll be using US census data for Virginia. The topic of interest is population and how it has changed from 2010 to 2019. The census data gets very granular, but I’ll use county-level data here. I’ve also saved a file containing the Federal Information Processing Standard (FIPS) codes for each county in Virginia. …
Today we’ll be exploring Spotify data from Kaggle user Yamac Eren Ay. Our data contains information on the audio characteristics, popularity, key, tempo, and duration of almost 169k songs released from 1921 to 2020. We also have access to each song’s name, artist, and year of release.
First we need to import our data. …
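Loading a Kaggle CSV like this is a one-liner with Pandas. The snippet below uses a tiny inline stand-in for the real file so it runs anywhere; with the actual download you would call `pd.read_csv` on the file path instead.

```python
# Loading the Spotify data with pandas. The two rows below are invented
# stand-ins for the real ~169k-row Kaggle CSV; with the real file you'd
# just do: df = pd.read_csv("path/to/downloaded.csv")
import io
import pandas as pd

csv_text = """name,artists,year,popularity,tempo,duration_ms
Clair de Lune,['Claude Debussy'],1921,40,70.0,290000
Singin' in the Rain,['Gene Kelly'],1952,60,95.0,183000
"""
df = pd.read_csv(io.StringIO(csv_text))

# Quick sanity checks on what we loaded.
print(df.shape)  # → (2, 6)
print(df[["name", "artists", "year"]].head())
```

Note that the `artists` column arrives as a string that merely looks like a Python list, a quirk of this dataset worth handling before any per-artist analysis.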