Google Analytics (GA) is generally a very powerful tool, but there are some things it doesn’t let you do that would be really useful.
For example, as standard, GA only gives you the mean average for time metrics, like time on page, or session (visit) duration. The mean is what most people think of when thinking of an average; for session duration it’s the duration of all sessions added together, divided by the number of sessions.
Most of the time, what services want to know is the ‘typical’ amount of time spent on a page or on the service. The mean average isn’t a great measure of this, as it’s affected by outliers; a few hour-long sessions can have a big impact on the mean, particularly if the service has relatively few users.
The median average is generally a better measure. The median is found by ranking all the values (for example, durations of sessions) from smallest to largest, then finding the one that’s halfway down the list. This means it isn’t affected by those hour-long sessions anywhere near as much as the mean is. The way the median is calculated means half of the sessions are shorter than (or the same as) the median, and half are longer than (or the same as) it. That’s probably better than the mean as an indication of typical use of the service, but you need to be able to see the distribution of session durations to really understand how long users spend on the service.
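To make the difference concrete, here is a minimal sketch in Python. The session durations are made up for illustration (eight ordinary visits plus two hour-long sessions) and don’t come from any real service.

```python
from statistics import mean, median

# Hypothetical session durations in seconds:
# eight ordinary visits and two hour-long outliers.
durations = [45, 60, 75, 90, 100, 120, 150, 180, 3600, 3600]

mean_duration = mean(durations)      # sum of all durations / number of sessions
median_duration = median(durations)  # middle of the ranked list

print(f"mean: {mean_duration:.0f}s, median: {median_duration:.0f}s")
# prints "mean: 802s, median: 110s"
```

With these numbers the two long sessions drag the mean above 13 minutes, while the median stays under 2 minutes, which is much closer to what most of the sessions looked like.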
But the standard GA implementation doesn’t let you have that; all it’ll give you is the mean. To get around this, we’ve implemented a solution developed by Simo Ahava (a well-respected web analyst). By gathering some kind of session ID as a custom dimension, we can see how long each individual session was, and therefore the distribution of durations and the median.
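Once each session can be told apart, the analysis itself is simple. The sketch below assumes the data has already been exported as one row per session ID with a duration in seconds; the row format, the IDs, the values and the 60-second bucket width are all assumptions made for illustration, not part of GA or of Simo Ahava’s solution.

```python
from collections import Counter
from statistics import median

# Hypothetical export: one row per session ID with its duration in seconds.
rows = [
    ("s-001", 12), ("s-002", 48), ("s-003", 95), ("s-004", 130),
    ("s-005", 35), ("s-006", 610), ("s-007", 72), ("s-008", 3540),
]

def bucket(seconds, width=60):
    """Return the lower bound (in seconds) of the bucket a duration falls into."""
    return (seconds // width) * width

durations = [seconds for _, seconds in rows]
median_duration = median(durations)

# Count how many sessions fall into each 60-second bucket.
distribution = Counter(bucket(s) for s in durations)

print(f"median: {median_duration}s")
for lower in sorted(distribution):
    print(f"{lower:>5}s to {lower + 59:>5}s: {'#' * distribution[lower]}")
```

The printed rows form a rough text histogram, which is often enough to spot whether the durations cluster in one place or several.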
An example of where this has proved useful was page load time for an internal service we’re supporting (ie a service where the users are staff rather than customers). This is usually only available as the mean average, but because we gather session ID on this service we were able to see that the distribution had two distinct peaks. By segmenting this data, we identified the cause: the office in Glasgow had a poor internet connection, meaning page load times were longer than at the other offices.
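Segmenting can be sketched in the same way. In the example below the office labels other than Glasgow and all of the load times are invented for illustration, and it assumes each page load has already been tagged with the office it came from; the real data and segments will look different.

```python
from collections import defaultdict
from statistics import median

# Hypothetical page load times in seconds, each tagged with an office.
# Only Glasgow is mentioned in this post; the other offices are made up.
page_loads = [
    ("London", 1.2), ("London", 1.5), ("London", 1.1),
    ("Leeds", 1.4), ("Leeds", 1.3), ("Leeds", 1.6),
    ("Glasgow", 4.8), ("Glasgow", 5.2), ("Glasgow", 4.5),
]

# Group the load times by office.
by_office = defaultdict(list)
for office, seconds in page_loads:
    by_office[office].append(seconds)

# The median per segment shows which group sits in the slower peak.
median_by_office = {office: median(times) for office, times in by_office.items()}

for office, value in sorted(median_by_office.items(), key=lambda item: item[1]):
    print(f"{office}: {value}s")
```

If one segment’s median sits well apart from the rest, as Glasgow’s does in this made-up data, that segment is a likely explanation for a second peak in the overall distribution.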
Although there’s not a happy ending in this case (the Glasgow office still has long page load times), we know this isn’t a problem with something we can fix, such as our hosting, so we aren’t wasting time on an unfixable issue.