Sunday, June 22, 2014

d3 visualization experiences Schrodinger's cat paradox (i.e., win and fail)

A few weekends ago, I took a d3 visualization workshop hosted at Zipfian. Eager to show off my newfound skillz (with a "z"), I tried dumping my previous post's data into d3. I ran into countless hurdles along the way (asynchronous calls, adding the jQuery source to Blogspot, cross-site hosting, MIME-type errors, jsonCallback, getting pandas to output JSON in a desirable format, etc.), and it is now two weeks later. I have good news and bad news.

The good news is that after hours of futzing around on a perfectly sunny Saturday afternoon, I managed to get a prototype working! It reads in JSON files hosted on my GitHub and plots nice, simple bars, dots, and an SF street map overlay (or underlay?). See above, if you're lucky.

The bad news, I discovered this morning, is that it actually still doesn't work (for you unlucky folks). As far as I can tell, it doesn't work with Chrome on Windows or Chrome on Chrome OS. It also doesn't work on mobile (but even my MathJax equations from the Markov Chains and Bayes-MCMC posts didn't work on mobile). I am working on an Ubuntu machine with Chrome, and it works here for some reason.

The problem apparently is that GitHub serves raw files as text/plain to prevent hotlinking. Initially only Internet Explorer respected the Content-Type header, but Chrome and Firefox got on board about a year ago. My guess is the Ubuntu version of Chrome hasn't been updated yet, which is why it works for me.

So... back to the drawing board.

Monday, June 2, 2014

Gone in 134 seconds... Actually I still see you.

For my Zipfian Academy Capstone Project, I want to work on something related to traffic. A few weekends ago, I found some interesting data sets on the interwebs, including Anonymized Uber GPS Traces from several years ago. The data contains only trip IDs, timestamps, latitudes, and longitudes. The timestamps preserve the time of day and day of week, but the actual dates have been modified to protect the privacy of drivers and passengers. Moreover, the start and end of each ride are truncated, also for privacy reasons.



I created the above image right off the bat. It is just an overlay of all the rides together (after a little housekeeping). In case you don't recognize the geography, it is the San Francisco Bay Area. The long tail reaching southward is mostly trips to SFO Airport, and the eastward tail is mostly trips to OAK Airport.

Just last week, I went to a meetup where I met Kevin Novak, the lead data scientist at Uber. He mentioned one particular driver who has been with Uber since nearly the beginning. This driver, who shall not be named here, is well regarded but known particularly for driving under the speed limit.

So I decided to try and find this driver.

First, I needed to create new features based on the limited data I had. Using the central finite difference method, I took time derivatives to compute the speed and acceleration associated with each datapoint. This sounds trivial, but it was actually somewhat challenging to do within the Python pandas framework.
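Until the real details get written up, here is a minimal sketch of what a central-difference pass over per-trip GPS points could look like in pandas. It is not necessarily the code used for this post: the column names (`trip_id`, `timestamp`, `lat`, `lon`), the haversine distance, and the helper names are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (assumed constant)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

def central_diff(df, col, group="trip_id", time="t"):
    """(x[i+1] - x[i-1]) / (t[i+1] - t[i-1]) within each trip; NaN at trip ends."""
    g = df.groupby(group)
    dx = g[col].shift(-1) - g[col].shift(1)
    dt = g[time].shift(-1) - g[time].shift(1)
    return dx / dt

def add_speed_and_acceleration(df):
    """Return a copy with cumulative distance, speed (m/s) and acceleration (m/s^2)."""
    df = df.sort_values(["trip_id", "timestamp"]).reset_index(drop=True)
    # Seconds elapsed since the earliest timestamp in the whole data set.
    df["t"] = (df["timestamp"] - df["timestamp"].min()).dt.total_seconds()
    g = df.groupby("trip_id")
    # Distance from the previous point of the same trip (NaN for the first point).
    step = haversine_m(g["lat"].shift(1), g["lon"].shift(1), df["lat"], df["lon"])
    # Cumulative distance travelled along each trip.
    df["dist_m"] = step.fillna(0.0).groupby(df["trip_id"]).cumsum()
    df["speed_mps"] = central_diff(df, "dist_m")
    df["accel_mps2"] = central_diff(df, "speed_mps")
    return df
```

The `shift(-1)` and `shift(1)` calls are done per trip, so the first and last points of every trip get NaN instead of borrowing a neighbor from a different trip. Acceleration loses one more point at each end, since it is a central difference of the speed.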

(I will add some details here later when it's not 2am.)

I converted speed and acceleration (both scalars for now) into units of miles per hour and meters per second-squared, respectively. It turns out there are several "trips" that have long latency times. Specifically, this means the car is stopped, potentially for hours, and this can drive the mean and median speeds way down. Of course, there are ways around this, but for a quick result, I decided to look instead at maximum acceleration of each driver. Presumably this unnamed slow driver not only drives slow, but accelerates slowly as well.
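As a rough illustration of that step, the sketch below converts speed to miles per hour and ranks trips by their maximum acceleration. The column names and the tiny data frame are made up for the example and are not values from the Uber data set.

```python
import pandas as pd

MPS_TO_MPH = 3600.0 / 1609.344  # meters per second -> miles per hour

def max_acceleration_per_trip(df):
    """Largest acceleration seen in each trip, gentlest trip first."""
    return df.groupby("trip_id")["accel_mps2"].max().sort_values()

# Made-up samples: trip 1 accelerates briskly, trip 2 barely at all.
samples = pd.DataFrame({
    "trip_id":    [1, 1, 1, 2, 2, 2],
    "speed_mps":  [0.0, 5.0, 12.0, 0.0, 0.0, 3.0],
    "accel_mps2": [2.5, 3.0, 1.0, 0.0, 0.1, 0.2],
})
samples["speed_mph"] = samples["speed_mps"] * MPS_TO_MPH

ranking = max_acceleration_per_trip(samples)
print(ranking)
```

Taking the maximum sidesteps the stopped-car problem, because long stretches of zero speed only add zeros and cannot pull the maximum down the way they pull down a mean or median.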

Behold, my findings:


In the upper plot, I have just shown the path taken by "Driver 19393." I believe it is 101-N to I-80E. In the middle plot, I have shown the speed as a function of time. Driver 19393 has a top speed of approximately 17 miles per hour. In the bottom plot, I have shown the acceleration as a function of time. Driver 19393 has a maximum acceleration of approximately 0.2 meters per second squared. Held constant, that is equivalent to 0 to 60 in a whopping 134 seconds.
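For anyone who wants to check the arithmetic, here is a quick sketch. It assumes the acceleration stays constant at 0.2 meters per second squared the whole way, and the function name is just for illustration.

```python
MPH_TO_MPS = 1609.344 / 3600.0  # miles per hour -> meters per second

def seconds_to_reach(target_mph, accel_mps2):
    """Time to go from rest to target_mph at a constant acceleration."""
    return target_mph * MPH_TO_MPS / accel_mps2

zero_to_sixty = seconds_to_reach(60.0, 0.2)
print(round(zero_to_sixty))  # 134
```

Sixty miles per hour is about 26.8 meters per second, and 26.8 divided by 0.2 is about 134.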

Now, one might be inclined to ask if this driver was driving in traffic. I cannot say definitively, but I know this trip happened between 12:00am and 12:30am. I've actually driven near that highway junction close to midnight quite often, and have yet to encounter any traffic.

So maybe Driver 19393 is our mystery driver? We'll see if deeper exploration reveals anything different...