Predicting Spending in Google’s Online Store — Regression Modeling

This post is part two of a two-part walkthrough of a recent machine learning project. Part one covered the data and business context; this installment covers the modeling process. View the entire project (including all code and the accompanying slide deck) on GitHub.

Introduction:

In part one of this series, I described our data, performed some exploratory data analysis, and walked through the preprocessing needed to get the data ready for modeling.

The data includes 717k rows, each representing one visit to Google’s online store between 2016 and 2018. The dataset includes features describing geography, traffic source, device properties, page views, time, price, and spending. About 2.5% …


Predicting Customer Spending in Google’s Online Store

This post is part one of a two-part walkthrough of a recent machine learning project. View the entire project (including all code and the accompanying slide deck) on GitHub. Keep an eye out for part two, where I’ll go through the modeling process and results.

Introduction:

E-commerce is becoming a larger and larger part of all of our lives. In the US, $602 billion was spent online in 2019. About 75% of those who shop online do so at least once per month. …


An Intro to Dash

Topics

  • What is Dash?
  • Elements of a Dash App
  • Coding a Dash App

What is Dash?

Dash is a Python framework for developing web applications. It’s built on top of Plotly, so any graph you can create with Plotly is easy to implement in an interactive web app! The potential for Dash apps is limitless, and there are plenty of complex and beautiful examples on the Dash App Gallery (source code is available for these projects too).

Today I’ll break down the basic elements of a Dash app and teach you to code your own simple app. This app will show Wal-Mart store openings across the US over time. …


Why Use SQL?

The Data-Driven World

Thanks to developments in technology and communication, we’re spending more and more of our lives online. In 2019 the average internet-using American spent 6 hours and 31 minutes online per day! There will be 320 billion emails sent each day by 2021. Everything from our cars, to our refrigerators, to our flip flops is connected to the internet.

What does all of this mean? It means that more and more data is being generated at a faster rate each day, and someone will have to make sense of it all. SQL can help us do just that.

Much of the data being generated today is unstructured. This refers to things like audio files, videos, social media posts, and more. Working with this kind of data will have to wait for another post, because SQL works with structured data. An easy way to think about this is that structured data is data that can be stored in a good old-fashioned Excel sheet. Each row represents a person, object, payment, etc., and each column stores a feature or attribute associated with that object. …
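
To make the rows-and-columns idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `payments` table, its columns, and the values in it are all made up for illustration; they are not from any dataset discussed here.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Each row is one payment; each column is an attribute of that payment.
# Table name, column names, and values are hypothetical.
conn.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO payments (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 4.5)],
)

# A SQL query summarizes the structured data: total spent per customer.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM payments "
    "GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

print(totals)  # [('alice', 24.5), ('bob', 35.5)]
```

Because every row has the same columns, one short query can answer a question about the whole table, which is the kind of work SQL is designed for.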


From the Big to the Really Big

This article borrows heavily from Scott Aaronson’s article on large numbers. Think of this blog post as a summary of his work. I encourage you to read his more detailed version if this article interests you!

Name Your Number

As a kid I remember having contests with my friends to see who could name the bigger number. One of us would start with ‘a billion!’, then the other would counter with ‘a trillion!’, and then ‘a googol!’, until we eventually reached ‘infinity!’. Even this could be countered with the galaxy-brain response of ‘infinity plus one!’.


Using Plotly to Create Choropleth Visualizations

An Example of What You Can Create With Plotly

In the last post I discussed cleaning and visualizing data using Pandas. Now I’ll expand on that theme by walking through how to show your data on a map using Plotly.

Tools:

  • Python, Plotly, Pandas, NumPy, JSON

The Data:

In this example I’ll be using US census data for Virginia. The topic of interest is population and how it has changed from 2010 to 2019. The census data gets very granular, but I’ll use county-level data here. I’ve also saved a file containing the Federal Information Processing Standard (FIPS) codes for each county in Virginia. …
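
A choropleth typically matches each county's value to its map shape by FIPS code, so the codes need one consistent format. The sketch below assumes (this is not stated in the post) that the state and county codes arrive as separate integers and need to be combined into the five-character string form. The helper name `make_fips` is hypothetical.

```python
def make_fips(state_code: int, county_code: int) -> str:
    """Combine a 2-digit state code and a 3-digit county code
    into a zero-padded, 5-character FIPS string."""
    return f"{state_code:02d}{county_code:03d}"


# Virginia's state FIPS code is 51; the county codes are illustrative inputs.
fips_codes = [make_fips(51, county) for county in (1, 3, 59)]
print(fips_codes)  # ['51001', '51003', '51059']
```

Keeping the codes as strings rather than integers preserves leading zeros, which matters for states whose code is below 10.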


Let’s explore some data from Spotify!

What we’ll accomplish in this article:

  • Import data from an .xlsx or .csv file.
  • Perform steps to clean our data.
  • Explore and manipulate the data using Pandas.
  • Visualize our findings using built-in Pandas functions.

Today we’ll be exploring Spotify data from Kaggle user Yamac Eren Ay. Our data contains information on the audio characteristics, popularity, key, tempo, and duration of almost 169k songs released from 1921 to 2020. We also have access to each song’s name, artist, and year of release.

Import the Data

First we need to import our data. …
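
The import, clean, and explore steps listed above can be sketched as follows. This is a rough sketch, not the article's actual code: the inline CSV is a tiny made-up stand-in for the Kaggle file, and the column names (`name`, `artist`, `year`, `popularity`) are assumptions.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file; names and values are made up.
raw_csv = io.StringIO(
    "name,artist,year,popularity\n"
    "Song A,Artist X,1999,40\n"
    "Song B,Artist Y,2005,\n"
    "Song A,Artist X,1999,40\n"
)

# Import: pd.read_csv reads .csv data (pd.read_excel handles .xlsx files).
df = pd.read_csv(raw_csv)

# Clean: drop exact duplicate rows, then rows with no popularity score.
df = df.drop_duplicates()
df = df.dropna(subset=["popularity"])

# Explore: average popularity per release year.
mean_by_year = df.groupby("year")["popularity"].mean()
print(mean_by_year)
```

With a real file, the `io.StringIO` object would be replaced by the path to the downloaded data.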

Jim Fay
