Alma Andersson

Getting to work

Three to four mornings a week, I walk to one of the Genentech shuttle buses, hop on, fall asleep, and wake up slightly disoriented as we arrive on campus. On the other days, I bike or run to work. To some extent this walk is a part of my routine, and when something becomes part of my routine I want to optimize it.

In this case, optimization means finding the time at which I should leave home that:

  1. Guarantees that I do not miss the bus
  2. Minimizes the time I have to wait for the bus to arrive

Both criteria are important; if I only cared about not missing the bus I could leave 2h before the bus departs and just wait there. It would definitely make me not miss the bus, but who wants to stand waiting at a bus stop for 1.5h? However, I don’t want to push it and have to sprint to the bus or risk missing it. In short, it’s a balance of not stressing while also not wasting time just waiting.

To figure this out, I first had to collect data 🔍. My process was simple: the moment I stepped out of the apartment door I started the stopwatch on my phone, and when I reached the bus stop I paused it and took a screenshot. I kept this up for about two months, which resulted in 19 data points - the math doesn’t fully add up because I forgot to time my walk multiple times…

With only 19 data points, I could have manually transferred the times from the screenshots into a text file for downstream analysis – but that’s no fun. Instead, I uploaded all the screenshots to my computer and processed them with a simple Python script that: read the images, cropped the area around the watch face, and extracted the text from the cropped sections using OCR. I used pytesseract (a wrapper around Tesseract OCR) for the OCR step – this was very straightforward to use despite no prior experience. You can see some examples below 👇

[Figure: example screenshots with the cropped area and the extracted times]

The red dashed box indicates the cropped out area, “time” is the time format and “minute” is the decimal format of the time.
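The crop, OCR, and conversion steps could look roughly like the sketch below. It is not the original script: the stopwatch format `MM:SS.cc`, the function names, and the `box` coordinates are all assumptions made for illustration. The OCR libraries are imported inside `read_stopwatch` so the time-parsing helper can be used on its own.

```python
import re

# Assumed stopwatch format: "MM:SS.cc" (minutes, seconds, hundredths).
STOPWATCH_RE = re.compile(r"(\d{1,2}):(\d{2})[.,](\d{2})")


def to_minutes(text):
    """Convert the first 'MM:SS.cc' match in `text` to decimal minutes."""
    match = STOPWATCH_RE.search(text)
    if match is None:
        raise ValueError(f"no stopwatch time found in {text!r}")
    minutes, seconds, hundredths = (int(g) for g in match.groups())
    return minutes + (seconds + hundredths / 100) / 60


def read_stopwatch(path, box):
    """Crop a screenshot to box=(left, upper, right, lower) and OCR it.

    `path` and `box` are hypothetical inputs; the right crop depends on
    the phone and the stopwatch app.
    """
    # Imported lazily so to_minutes() works without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        crop = img.crop(box)
        # --psm 7 asks Tesseract to treat the crop as a single line of text.
        text = pytesseract.image_to_string(crop, config="--psm 7")
    return to_minutes(text)
```

Calling `to_minutes` on the OCR text gives the "minute" value directly, and a `ValueError` flags screenshots where the OCR output does not contain a recognizable time.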

This is a tiny dataset that doesn’t really hold up for rigorous statistical analysis, but we can still play around with it just for fun.

To make things simple, we’re going to assume that the data follows a normal distribution i.e.,

$$x \sim \mathcal{N}(\mu,\sigma^2)$$

which, given the shape of the observed data, is at least not horribly wrong:

[Figure: distribution of the observed walking times]

Next, we put some priors on our parameters $(\mu, \sigma^2)$. I always say that it takes on average 20min to get to work, so we’ll use that to inform the prior for the mean:

$$\mu \sim \mathcal{N}(\mu_0 = 20, \sigma_0^2 = 1), \qquad \sigma^2 \sim \textrm{Inverse-Gamma}(\alpha_0 = 2, \beta_0 = 2) $$

These priors are conditionally conjugate, meaning the conditional distribution of each parameter (given the other parameter and the data) has a closed form. With this we can use Gibbs sampling to get the posterior of the parameters and the posterior predictive of the data. For $N$ observations $X$ with sample mean $\bar{x}$, the conditionals are

$$ \mu_* | \sigma_{*}^2, X \sim \mathcal{N}(\hat{\mu}, \hat{\sigma}^2), \quad \textrm{with} \quad \hat{\mu} = \frac{N\sigma_0^2\bar{x} + \sigma_*^2\mu_0}{N\sigma_0^2 + \sigma_*^2} \qquad \hat{\sigma}^2 = \frac{\sigma_0^2\sigma_*^2}{N\sigma_0^2 + \sigma_*^2}$$

and

$$\sigma_*^2 | \mu_*,X \sim \textrm{Inverse-Gamma}(\hat{\alpha},\hat{\beta}), \quad \textrm{with} \quad \hat{\alpha} = \alpha_0 + \frac{N}{2}, \qquad \hat{\beta} = \beta_0 + \frac{1}{2}\sum_{i=1}^{N}(x_i - \mu_*)^2$$

We take 5000 samples and discard the first 1000 as burn-in, which gives us the following distributions 👇

[Figure: posterior distributions from the Gibbs sampler]
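For the curious, the sampling scheme described above can be sketched with nothing but Python's standard library. This is a minimal illustration, not the code behind the figures: the function name `gibbs_normal` is made up, the conditional for $\mu$ is written in its standard form for $N$ observations, and the walking times are synthetic stand-ins rather than my actual measurements.

```python
import random
import statistics


def gibbs_normal(x, mu0=20.0, var0=1.0, alpha0=2.0, beta0=2.0,
                 n_samples=5000, burn_in=1000, seed=0):
    """Gibbs sampler for x_i ~ N(mu, sigma^2) with priors
    mu ~ N(mu0, var0) and sigma^2 ~ Inverse-Gamma(alpha0, beta0)."""
    rng = random.Random(seed)
    n = len(x)
    x_bar = statistics.fmean(x)
    mu, var = x_bar, statistics.variance(x)  # starting values
    mus, variances, predictive = [], [], []
    for it in range(n_samples):
        # mu | sigma^2, X ~ N(mu_hat, var_hat)
        denom = n * var0 + var
        mu_hat = (n * var0 * x_bar + var * mu0) / denom
        var_hat = var0 * var / denom
        mu = rng.gauss(mu_hat, var_hat ** 0.5)

        # sigma^2 | mu, X ~ Inverse-Gamma(alpha_hat, beta_hat)
        alpha_hat = alpha0 + n / 2
        beta_hat = beta0 + 0.5 * sum((xi - mu) ** 2 for xi in x)
        # If G ~ Gamma(shape=a, scale=1/b), then 1/G ~ Inverse-Gamma(a, b).
        var = 1.0 / rng.gammavariate(alpha_hat, 1.0 / beta_hat)

        if it >= burn_in:  # keep only post-burn-in draws
            mus.append(mu)
            variances.append(var)
            # One posterior predictive draw per retained sample.
            predictive.append(rng.gauss(mu, var ** 0.5))
    return mus, variances, predictive


# Synthetic stand-in data (NOT the real stopwatch times from this post).
_data_rng = random.Random(1)
walk_times = [_data_rng.gauss(19.0, 0.8) for _ in range(19)]

mus, variances, predictive = gibbs_normal(walk_times)
print("posterior mean of mu:", statistics.fmean(mus))
print("posterior mean of sigma^2:", statistics.fmean(variances))
```

Each retained iteration yields one draw of $\mu$, one of $\sigma^2$, and one posterior predictive walking time, which is all that is needed for the quantiles later on.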

Almost there 🎉 The only thing left to do was to decide how certain I want to be of making it in time for the bus. I went with 90-99% certainty; those are pretty good odds. With that, we just need to find the 0.90-0.99 quantiles, which we can do in two ways: by plugging the posterior mean estimates of the parameters into the normal distribution, or by using the posterior predictive. Both options give fairly similar results.

[Figure: cumulative distribution used to read off the quantiles]
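The two ways of getting the quantiles can be sketched as follows. The function names are hypothetical, the plug-in version assumes "posterior mean of $\sigma^2$, then square root" for the standard deviation, and the posterior samples at the bottom are constant placeholders so the snippet runs on its own; in practice they would come from the sampler.

```python
import random
from statistics import NormalDist, fmean


def plug_in_quantile(mus, variances, q):
    """Quantile of a normal with posterior-mean parameters plugged in."""
    mu = fmean(mus)
    sigma = fmean(variances) ** 0.5
    return NormalDist(mu, sigma).inv_cdf(q)


def predictive_quantile(mus, variances, q, seed=0):
    """Empirical quantile of posterior predictive draws.

    Draws one walking time per posterior sample (mu, sigma^2) and
    returns a simple order-statistic estimate of the q-quantile.
    """
    rng = random.Random(seed)
    draws = sorted(rng.gauss(m, v ** 0.5) for m, v in zip(mus, variances))
    idx = min(int(q * len(draws)), len(draws) - 1)
    return draws[idx]


# Placeholder posterior samples (assumed values, for illustration only).
mus = [20.0] * 4000
variances = [1.0] * 4000

for q in (0.90, 0.95, 0.99):
    print(q, plug_in_quantile(mus, variances, q),
          predictive_quantile(mus, variances, q))
```

The plug-in version ignores the uncertainty in the parameters, while the predictive version carries it along, so the predictive quantiles tend to sit slightly further out when the posterior is wide.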

From this, I should be in the clear if I leave approximately 21min (20.5min rounded up) before any bus departs. Leaving at this time, I should be able to stroll at my regular pace without ever having to worry about being late. 😎

Fun fact, a short while after I was done with this little project I moved in with my amazing girlfriend ❤️ I’m obviously happier than ever, but since my route has changed I’m now back to the data collection step again! 💪

#Random