Debugging Visualizations with Visualizations | by Shawn Allen

Best viewed at original size.

I've been having some issues with our MoMA-bound Cabspotting visualization lately, and, as is often the case, ended up having to create another visualization just to figure out what the problem was.

Each of the white dots represents a discrete data sample: the location of a specific cab at a particular time. Here, samples for each cab are placed on a separate row and arranged temporally from left to right. More "active" cabs (i.e., the ones with more available samples) are placed at the top.
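To make the layout concrete, here is a minimal Python sketch of the row arrangement described above. It is not the code behind the image; the function name `layout_rows`, the `(cab_id, timestamp)` input shape, and the `width` parameter are all assumptions made for illustration.

```python
from collections import defaultdict

def layout_rows(samples, width=800.0):
    """Group samples by cab and order rows from most to least active.

    `samples` is an iterable of (cab_id, timestamp) pairs (a hypothetical
    input shape). Returns a list of (cab_id, x_positions), where each x is
    the sample's time mapped linearly onto [0, width].
    """
    by_cab = defaultdict(list)
    for cab_id, t in samples:
        by_cab[cab_id].append(t)
    if not by_cab:
        return []

    # Shared time axis across all cabs, so rows line up left to right.
    t_min = min(min(ts) for ts in by_cab.values())
    t_max = max(max(ts) for ts in by_cab.values())
    span = (t_max - t_min) or 1.0  # avoid dividing by zero

    # Cabs with more samples go first (the top rows).
    ordered = sorted(by_cab.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [
        (cab_id, [(t - t_min) / span * width for t in sorted(ts)])
        for cab_id, ts in ordered
    ]
```

Each returned row can then be drawn as a line of dots, one row per cab, with the row index as the y coordinate.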

The green and red marks at the top represent the start and end times of the displayed period. For each cab, an algorithm seeks through the list of segments between consecutive samples and picks out the ones that fall within that period. Each segment's hue corresponds to its position between the start and end of the period: green lines are closer to the start time, red ones to the end time.
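A rough Python sketch of that segment search and coloring follows. The details are assumptions, not the original implementation: here a segment counts as "within" the period if it overlaps it at all, its position is taken at the midpoint of the overlapping part, and the names `segments_in_period` and `frac_to_rgb` are made up for this example.

```python
import colorsys

def segments_in_period(timestamps, start, end):
    """Return (t0, t1, frac) for segments overlapping [start, end].

    A segment joins two consecutive samples. `frac` is 0.0 at the start
    of the period and 1.0 at the end (assumed definition: midpoint of the
    part of the segment that lies inside the period).
    """
    ts = sorted(timestamps)
    out = []
    for t0, t1 in zip(ts, ts[1:]):
        if t1 <= start or t0 >= end:
            continue  # segment lies entirely outside the period
        mid = (max(t0, start) + min(t1, end)) / 2.0
        out.append((t0, t1, (mid - start) / (end - start)))
    return out

def frac_to_rgb(frac):
    """Map 0.0 -> green and 1.0 -> red by sweeping the HSV hue 120 -> 0 degrees."""
    hue = (1.0 - frac) * (120.0 / 360.0)
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)
```

Sweeping the hue rather than blending RGB values keeps the midpoint a clear yellow instead of a muddy brown, which makes the progression from start to end easier to read.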

So, what does it show? Primarily, that there is quite a bit of "bad" data in our set. Those long lines at the bottom indicate extended periods of time during which those cabs weren't transmitting their locations. Most cabs tend to ping the depot every 30-60 seconds, but some do it less than once per hour. For the most part, though, the consistency of that green-to-red column seems to indicate that we've got a pretty good idea of where most of the cabs were in that time period, and with a reasonable degree of resolution.
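The long lines amount to large gaps between consecutive samples, which can also be flagged without drawing anything. A small Python sketch, assuming timestamps in seconds and a hypothetical one-hour threshold (the name `find_gaps` and the default are illustrative only):

```python
def find_gaps(timestamps, max_gap_seconds=3600):
    """Return (t0, t1) pairs where consecutive samples are further apart
    than `max_gap_seconds` (an assumed cutoff, not one from the project)."""
    ts = sorted(timestamps)
    return [(t0, t1) for t0, t1 in zip(ts, ts[1:]) if t1 - t0 > max_gap_seconds]
```

Running this per cab gives a quick count of how many cabs have suspiciously long silences in a given period.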

God, I'm such a geek.

Uploaded on February 8, 2008
