Monday, June 2, 2014

Gone in 134 seconds... Actually I still see you.

For my Zipfian Academy Capstone Project, I want to work on something related to traffic. A few weekends ago, I found some interesting data sets on the interwebs, including Anonymized Uber GPS Traces from several years ago. The data contains only trip IDs, timestamps, latitudes, and longitudes. The timestamps preserve time of day and day of week, but the actual dates have been modified to protect the privacy of drivers and passengers. Moreover, the start and end of each ride are truncated, also for privacy reasons.

I created the above image right off the bat. It is just an overlay of all the rides together (after a little housekeeping). In case you don't recognize the geography, it is the San Francisco Bay Area. The long tail reaching southward is mostly trips to SFO Airport, and the eastward tail is mostly trips to OAK Airport.

Just last week, I went to a meetup where I met Kevin Novak, the lead data scientist at Uber. He mentioned one particular driver who has been with Uber since nearly the beginning. This driver, who shall not be named here, is well regarded but known particularly for driving under the speed limit.

So I decided to try and find this driver.

First, I needed to create new features based on the limited data I had. Using the central finite differences method, I took time derivatives to compute the speed and acceleration associated with each datapoint. This seems trivial, but it was actually somewhat challenging to do within the Python pandas framework.
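A minimal sketch of the central-difference idea in pandas, not necessarily the code used for this post. It assumes hypothetical column names (`trip_id`, `timestamp` as a datetime, `lat`, `lon`) and uses the haversine formula for the distance between the two neighbors of each point, which is one reasonable choice among several. The first and last points of each trip have no neighbor on one side, so they come out as NaN.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def add_speed_and_acceleration(df):
    """Add central-difference speed (m/s) and acceleration (m/s^2) per trip.

    Column names are assumptions for this sketch, not the dataset's own.
    """
    df = df.sort_values(["trip_id", "timestamp"]).copy()
    g = df.groupby("trip_id")

    # Shifting inside the groupby keeps neighbors from leaking across trips.
    dt = (g["timestamp"].shift(-1) - g["timestamp"].shift(1)).dt.total_seconds()
    dist = haversine_m(g["lat"].shift(1), g["lon"].shift(1),
                       g["lat"].shift(-1), g["lon"].shift(-1))

    # Speed at point i: distance from point i-1 to point i+1, over elapsed time.
    df["speed_mps"] = dist / dt

    # Acceleration at point i: change in speed between the same two neighbors.
    g = df.groupby("trip_id")
    df["accel_mps2"] = (g["speed_mps"].shift(-1) - g["speed_mps"].shift(1)) / dt
    return df
```

Duplicate timestamps within a trip would give a zero time step and infinite values, so real data would likely need deduplication first.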

(I will add some details here later when it's not 2am.)

I converted speed and acceleration (both scalars for now) into units of miles per hour and meters per second-squared, respectively. It turns out there are several "trips" with long idle periods. Specifically, the car is stopped, potentially for hours, and this can drive the mean and median speeds way down. Of course, there are ways around this, but for a quick result, I decided to look instead at the maximum acceleration of each driver. Presumably this unnamed slow driver not only drives slowly, but accelerates slowly as well.
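As a rough sketch of that step, the snippet below converts speed to miles per hour and ranks trips by their maximum acceleration, lowest first. The column names (`trip_id`, `speed_mps`, `accel_mps2`) and the function name are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Exact conversion factor: seconds per hour divided by meters per mile.
MPS_TO_MPH = 3600.0 / 1609.344


def summarize_trips(df):
    """Per-trip summary, sorted so the gentlest accelerators come first."""
    with_mph = df.assign(speed_mph=df["speed_mps"] * MPS_TO_MPH)
    summary = with_mph.groupby("trip_id").agg(
        max_speed_mph=("speed_mph", "max"),
        median_speed_mph=("speed_mph", "median"),  # dragged down by long stops
        max_accel_mps2=("accel_mps2", "max"),
    )
    return summary.sort_values("max_accel_mps2")
```

The median speed is included only to show why it can mislead: a trip with hours of standing still will have a median near zero no matter how the driver behaves while moving.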

Behold, my findings:

In the upper plot, I have just shown the path taken by "Driver 19393." I believe it is 101-N to I-80E. In the middle plot, I have shown the speed as a function of time. Driver 19393 has a top speed of approximately 17 miles per hour. In the bottom plot, I have shown the acceleration as a function of time. Driver 19393 has a maximum acceleration of approximately 0.2 meters per second-squared. This is equivalent to 0 to 60 in a whopping 134 seconds.
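The 134-second figure follows from assuming constant acceleration: 60 miles per hour is about 26.8 meters per second, and dividing by 0.2 meters per second-squared gives roughly 134 seconds. A quick check, with a helper name made up for this example:

```python
MPH_TO_MPS = 1609.344 / 3600.0  # meters per mile / seconds per hour


def seconds_to_reach(target_mph, accel_mps2):
    """Time to go from rest to target_mph at a constant acceleration."""
    return target_mph * MPH_TO_MPS / accel_mps2


print(round(seconds_to_reach(60, 0.2)))  # 134
```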

Now, one might be inclined to ask if this driver was driving in traffic. I cannot say definitively, but I know this trip happened between 12:00am and 12:30am. I've actually driven near that highway junction close to midnight quite often, and have yet to encounter any traffic.

So maybe Driver 19393 is our mystery driver? We'll see if deeper exploration reveals anything different...
