A machine that generates money with pandas-datareader and Prophet

What is this? This isn’t really a money machine; I’m just kidding about that, sorry. This is just a quick exploration of two awesome Python packages that I’ve wanted to play with for a while: Prophet, for time series forecasting, and pandas_datareader, for grabbing historic stock price data. Prophet seems like an awesome project by Facebook that makes state-of-the-art time series forecasting really easy. I’ve been hoping to give it a try for a while. [Read More]

What is TF-IDF? The 10 minute guide

I recently started reading up a bit on tf-idf, which stands for term frequency-inverse document frequency. Tf-idf is a simple but surprisingly powerful technique that can be used to figure out what a document is ‘about’. It’s often used in the fields of information retrieval and text mining. Documents? First, let’s just define what I mean by ‘document’. For our purposes, a document can be thought of as all the words in a piece of text, broken down by how frequently each word appears in the text. [Read More]
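As a rough illustration of that view of a document, here is a minimal Python sketch that reduces a piece of text to word counts. The function name `term_frequencies`, the sample sentence, and the simple regex tokenizer are assumptions made for this example; they are not taken from the full post, and this covers only the term-frequency half of tf-idf.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Return a Counter mapping each lowercase word to how often it appears."""
    # Naive tokenizer (an assumption for this sketch): runs of letters
    # and apostrophes, lowercased so "The" and "the" count as one word.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# A toy "document": the text broken down by how often each word appears.
doc = term_frequencies("The cat sat on the mat. The mat was flat.")
print(doc.most_common(2))  # [('the', 3), ('mat', 2)]
```

Common words like "the" dominate raw counts, which is the problem the inverse document frequency part of tf-idf is meant to address.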

A Redshift UDF to find AB test significance

I use Amazon’s Redshift every day. It’s an amazing database for data warehousing and analytics, and it allows you to analyze huge datasets in a blazingly efficient manner using SQL. The reason why Redshift is so fast for analysis work is that, unlike many other SQL databases, it uses columnar storage and is highly optimized for distributing workloads across a cluster of instances. Redshift is based on PostgreSQL 8.0.2, so it’s pretty familiar to anyone who’s used Postgres or any other mainstream SQL dialect before. [Read More]