Moneyball == Big Data

 

Brad Pitt as Billy Beane in Moneyball

Born and raised in Oakland, I've been an A's fan since I was a little kid.  So when I watched the movie Moneyball last year, lots of memories, both thrilling and painful, were brought back to life on the big screen.  I was at the game where Scott Hatteberg hit the game-winning HR to give the A's a record-setting 20 consecutive wins.  I was also at the game where Jeter made that incredible flip to home plate, which was probably the turning point of that series; the A's eventually lost it in 5 games to the Yankees.  What the A's were able to achieve during that era was absolutely incredible.  They were a bunch of kids with a team salary that was 20-25% that of the mighty New York Yankees, and yet they competed and made the playoffs 4 straight years (2000-2003 seasons).

As I think about it now, what's fascinating about the Moneyball philosophy that Billy Beane and the A's made so famous is that it tackles essentially the same problem many tech companies have been, and still are, trying to conquer.  Yes,

Moneyball == Big Data

In today's age of Facebook, iPhones, fitness tracking devices, etc., data is ubiquitous, and companies are trying hard to make sense of it.  For example, at my company, SumAll (shameless plug), we have built an app that takes data from various verticals (e.g. shopping carts, web traffic, payments, social, etc.) and combines it into a single, clean interface, allowing customers to better understand trends and patterns in their business.  The A's were able to overcome their financial disadvantage by being smart and examining data.  We think our customers can benefit the same way.

Fast forward to 2012.  Many experts predicted that the Oakland A's would lose over 100 games this year.  Why?  Because the team let go of 3 top pitchers from last year: 1) Gio Gonzalez, 21-game winner and Cy Young favorite, 2) Trevor Cahill, and 3) All-Star closer Andrew Bailey.  This year's team is a bunch of no-name kids with the LOWEST PAYROLL IN THE AMERICAN LEAGUE.  Sound familiar?

Yesterday, on Oct 3rd, 2012, the final game of the season, the A's completed an improbable run to win the American League West division by sweeping the Texas Rangers, a team that has represented the AL in the World Series the last two years.  Incredible.  Regardless of what happens in the playoffs, the fairytale season of the 2012 Oakland Athletics is already worthy of a Moneyball sequel.

I'll end with some fun stats and observations:

  • The Oakland A's had an opening day payroll of $55,372,500, the lowest in the American League.
  • The New York Yankees had an opening day payroll of $197,962,289, the highest in the AL.
  • The A's finished with a record of 94-68, 2nd only to the Yankees at 95-67.  I guess that extra game was worth the $142M.  =P
  • Dan Szymborski's preseason projections had given the A's a 1.4 percent chance of making the playoffs and 0.4 percent chance of winning the division.
  • Las Vegas oddsmakers had the A's at 100-1 odds at the All-Star break to win their division.
  • On June 30th, the A's were 37-42 and 13 games behind the Texas Rangers.
  • With 9 games left, the Rangers were 5 games ahead of the A's.
  • The only time the A's were in sole possession of 1st place in the division was after winning the final game.
  • The A's led the league with 14 walk-off wins.
  • After losing the aforementioned 3 top pitchers, the A's went on to lose 4 more pitchers (McCarthy, Anderson, Braden, and Colon), leaving them with the current starting rotation of ALL ROOKIES.  Amazing.
  • With the San Francisco Giants also making the playoffs, there's a possibility of another "Bay Bridge" World Series, which the A's of course won via a sweep in 1989.  =]
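To put the payroll joke in numbers, here is a quick back-of-the-envelope calculation using only the payroll and win totals quoted in the list above. "Cost per win" is just payroll divided by regular-season wins; it is a crude measure chosen for illustration, not an established baseball statistic.

```python
# Figures copied from the stats listed above.
AS_PAYROLL = 55_372_500
YANKEES_PAYROLL = 197_962_289
AS_WINS = 94
YANKEES_WINS = 95


def cost_per_win(payroll: int, wins: int) -> float:
    """Opening-day payroll dollars spent per regular-season win."""
    return payroll / wins


payroll_gap = YANKEES_PAYROLL - AS_PAYROLL
as_cost = cost_per_win(AS_PAYROLL, AS_WINS)
yankees_cost = cost_per_win(YANKEES_PAYROLL, YANKEES_WINS)

print(f"Payroll gap:  ${payroll_gap:,}")
print(f"A's:          ${as_cost:,.0f} per win")
print(f"Yankees:      ${yankees_cost:,.0f} per win")
print(f"Ratio:        {yankees_cost / as_cost:.1f}x")
```

By this rough measure the Yankees paid about 3.5 times as much per win as the A's did, on a payroll gap of $142,589,789.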

 

mapstagram: mapping instagram photos in real-time

For those of you who are uninterested in the details behind Mapstagram and just wanna see the damn thing, click the logo:

The Story

Instagram has quickly become one of my favorite apps on the iPhone.  It combines photography, social networking, and mobile in a seamless and fun way.  It goes to show that even though Facebook is dominating the land of social, there's still room for innovation and niches.

Although Instagram has been out since October, it still exists purely as a mobile app.  No website and therefore no way to show off your Instagram "profile" (other than through individual sharing of pics via Facebook, Twitter, etc.).  I had a sneaking suspicion that they had something in the works, so when Instagram recently announced their API, I was naturally excited.  Why build out a web interface (although they probably will do that eventually), when you can release an API and harness the power of the developer community?

So I took a look at their docs, and wasn't surprised to see the typical REST-style API with JSON-formatted payloads.  I registered to be an API developer, opened up Eclipse, and off I went.  I began by creating a pretty boring grid display of my own Instagram pics:

Grid display of my Instagram photos using REST API

Meh.  I noticed that the geo location for each photo was included in the response, so then I created a simple mashup by displaying my pics as markers on Google Maps:

Map display of my Instagram photos using REST API

Kinda cool but still nothing super special.  Then literally the next day, Instagram announced their Real-Time API.  What this means is that you can subscribe to photos based on certain aspects such as tags, named locations, or geo coordinates, and whenever a user uploads a photo that matches those criteria, you'll receive a notification from Instagram in real-time.  Awesome sauce!
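As a rough illustration of what subscribing by geo coordinates involves, the sketch below builds the form fields for a subscription request and answers the callback verification handshake. The endpoint URL, the field names (`object`, `aspect`, `lat`, `lng`, `radius`, `callback_url`, `verify_token`) and the `hub.*` query parameters are written from memory of the PubSubHubbub-style convention and should be treated as assumptions to check against the API docs; the helper names and the coordinates are hypothetical. Nothing here touches the network.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode

# Assumed endpoint; verify against the API documentation.
SUBSCRIPTIONS_URL = "https://api.instagram.com/v1/subscriptions/"


def build_geo_subscription(client_id: str, client_secret: str,
                           lat: float, lng: float, radius_m: int,
                           callback_url: str, verify_token: str) -> Dict[str, str]:
    """Form fields for a geography subscription (field names are assumptions)."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "object": "geography",
        "aspect": "media",
        "lat": f"{lat:.6f}",
        "lng": f"{lng:.6f}",
        "radius": str(radius_m),
        "callback_url": callback_url,
        "verify_token": verify_token,
    }


def answer_verification(query_string: str, expected_token: str) -> Optional[str]:
    """Return the challenge to echo back, or None if the request looks wrong."""
    params = parse_qs(query_string)
    if params.get("hub.mode") != ["subscribe"]:
        return None
    if params.get("hub.verify_token") != [expected_token]:
        return None
    challenge = params.get("hub.challenge")
    return challenge[0] if challenge else None


# Hypothetical values: a 1 km circle around an illustrative point in Oakland.
form = build_geo_subscription("CLIENT-ID", "CLIENT-SECRET",
                              37.7516, -122.2005, 1000,
                              "https://example.com/callback", "my-token")
body = urlencode(form)  # what would be POSTed to SUBSCRIPTIONS_URL
print(body)
```

The verification helper only echoes the challenge when both the mode and the token match, so a stray request to the callback URL cannot confirm a subscription by accident.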

And this is where the idea of Mapstagram took off.  I began architecting a backend system to handle real-time updates from Instagram (technical details below), while my friend, Jochem Geerdink, worked on the creative and front-end side.  Together, we created a dynamic visual display of Instagram photos on Google Maps in real-time.  

We give you Mapstagram:

Mapstagram

Technical Details

Mapstagram real-time flow diagram

High level real-time flow:

  1. Instagram user uploads new geo-tagged photo via his/her iPhone.
  2. Instagram processes this update and immediately notifies every subscriber (such as Mapstagram) to that photo's geo coordinates.
  3. Mapstagram receives the notification, then queries Instagram for the new photo(s) in that geo location.
  4. Mapstagram sends these new photos to all connected clients.
  5. Photos are displayed on Google Maps, and added to the scrolling list of photo updates.  
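Steps 3 and 4 of the flow above can be sketched with in-memory stand-ins. This is a minimal sketch, not the actual Mapstagram backend (which ran on Google App Engine): `Photo`, `Relay`, and `fetch_recent` are hypothetical names, per-client lists stand in for real-time channels, and the de-duplication by photo id is an added assumption, on the reasoning that a notification only says "something changed here", so the follow-up query may return photos that were already sent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Photo:
    photo_id: str
    thumbnail_url: str
    lat: float
    lng: float


@dataclass
class Relay:
    # Stand-in for the query in step 3: geography id -> recent photos.
    fetch_recent: Callable[[str], List[Photo]]
    # Stand-in for open client channels: client id -> photos pushed so far.
    clients: Dict[str, List[Photo]] = field(default_factory=dict)
    seen_ids: Set[str] = field(default_factory=set)

    def connect(self, client_id: str) -> None:
        self.clients[client_id] = []

    def on_notification(self, geography_id: str) -> List[Photo]:
        """Step 3: query for photos; step 4: push unseen ones to every client."""
        recent = self.fetch_recent(geography_id)
        fresh = [p for p in recent if p.photo_id not in self.seen_ids]
        self.seen_ids.update(p.photo_id for p in fresh)
        for inbox in self.clients.values():
            inbox.extend(fresh)
        return fresh


def fake_fetch(geography_id: str) -> List[Photo]:
    # Canned response in place of a real API call.
    return [Photo("p1", "https://example.com/p1.jpg", 37.75, -122.20)]


relay = Relay(fetch_recent=fake_fetch)
relay.connect("browser-1")
print(len(relay.on_notification("geo-42")))  # first notification delivers p1
print(len(relay.on_notification("geo-42")))  # repeat delivers nothing new
```

Keeping the fetch function injectable makes the relay easy to exercise without network access, which is the only reason it is passed in here.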

For the software stack, I decided to toy with some technologies that I've dabbled with in the past, but never got the chance to go in depth with:

  • Google App Engine (GAE) - It's free and super easy to develop/deploy on using the Eclipse plugin.  Here are some of the GAE features I'm using:
    • Schema-less Datastore - essentially a layer on top of Bigtable, exposed through the JDO interface
    • Memcache - distributed memory cache
    • Channel API - GAE's implementation of the Comet model - this was a crucial piece to Mapstagram's real-time architecture
    • URL Fetch API - communication API for external calls
    • Cron Service - a way to perform scheduled tasks
  • Google Maps JavaScript API V3
    • Markers - I customized the markers by using Instagram's thumbnail URLs.  At present, when you click the small image, I'm actually removing the marker and creating a new marker with the URL to a larger image.  Somewhat of a hack but it works - kinda clever, no?  I plan on changing this in the future.
    • Events - I use the marker's "click" event to implement the aforementioned hack.  
  • jQuery - I have the most experience with YUI, but I know jQuery is the most popular JS library so I went with it:
    • $.ajax() - For AJAX calls to server
    • $.queue() - Used to queue the pipeline of real-time pics, as well as the "Replay" feature
    • Effects - To perform basic animations
    • Timeago plugin - Used to display relative time (e.g. "2 mins ago")
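The queue-plus-"Replay" idea from the jQuery bullets can be illustrated independently of the browser. The sketch below is written in Python rather than jQuery and is not how `$.queue()` works internally; `PhotoPipeline` and its methods are hypothetical names, and replaying shown photos ahead of anything still pending is an assumed design choice.

```python
from collections import deque
from typing import Deque, Optional


class PhotoPipeline:
    """Pending photos wait in a FIFO queue; shown photos are kept for replay."""

    def __init__(self, history_limit: int = 50) -> None:
        self.pending: Deque[str] = deque()
        # Bounded history so a long-running page does not grow without limit.
        self.history: Deque[str] = deque(maxlen=history_limit)

    def push(self, photo_id: str) -> None:
        self.pending.append(photo_id)

    def show_next(self) -> Optional[str]:
        """Pop the oldest pending photo and remember it for replay."""
        if not self.pending:
            return None
        photo_id = self.pending.popleft()
        self.history.append(photo_id)
        return photo_id

    def replay(self) -> None:
        """Re-queue everything already shown, oldest first, ahead of pending."""
        self.pending.extendleft(reversed(self.history))
        self.history.clear()


pipeline = PhotoPipeline()
for pid in ("a", "b", "c"):
    pipeline.push(pid)
pipeline.show_next()
pipeline.show_next()
pipeline.replay()
print(list(pipeline.pending))
```

Processing one photo at a time from a queue is what lets a burst of incoming photos be animated onto the map in order instead of all at once.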

Future

Mapstagram started off with me goofing around with the Instagram API, and quickly became a fun 2-person project with Jochem.  We definitely have more ideas and improvements that we'd love to add to Mapstagram, so hopefully we'll have the time and energy to implement them.  Feedback is definitely welcome!

Look for me (eimajination) and Jochem (orangeup) on Instagram, and add us to your friend list!