Statistics and analytics are two branches of data science that share many of their early heroes, so the occasional beer is still dedicated to lively debate about where to draw the boundary between them. Practically, however, modern training programs bearing those names emphasize completely different pursuits. While analysts specialize in exploring what’s in your data, statisticians focus more on inferring what’s beyond it.

Human search engines

When you have all the facts relevant to your endeavor, common sense is the only qualification you need for asking and answering questions with data. Simply look the answer up.

Want to see basic analytics in action right now? Try Googling the weather. Whenever you use a search engine, you’re doing basic analytics. You’re pulling up weather data and looking at it.

Even kids can look facts up online with no sweat. That’s democratization of data science right here. Curious to know whether New York is colder than Reykjavik today? You can get near-instant satisfaction. It’s so easy we don’t even call this analytics anymore, though it is. Now imagine trying to get that information a century ago. (Exactly.)

When you use a search engine, you’re doing basic analytics.

If reporting raw facts is your job, you’re pretty much doing the work of a human search engine. Unfortunately, a human search engine’s job security depends on your bosses never finding out that they can look the answer up themselves and cut out the middleman… especially when shiny analytics tools eventually make querying your company’s internal information as easy as using Google Search.

Inspiration prospectors

If you think this means that all analysts are out of a job, you haven’t met the expert kind yet. Answering a specific question with data is much easier than generating inspiration about which questions are worth asking in the first place.

I’ve written a whole article about what expert analysts do, but in a nutshell they’re all about taking a huge unexplored dataset and mining it for inspiration.

“Here’s the whole internet, go find something useful on it.”

You need speedy coding skills and a keen sense of what your leaders would find inspiring, along with all the strength of character of someone prospecting a new continent for minerals without knowing anything (yet) about what’s in the ground. The bigger the dataset and the less you know about the types of facts it could potentially cough up, the harder it is to roam around in it without wasting time. You’ll need unshakeable curiosity and the emotional resilience to handle finding a whole lot of nothing before you come up with something. It’s always easier said than done.

While analytics training programs usually arm their students with software skills for looking at massive datasets, statistics training programs are more likely to make those skills optional.

Leaping beyond the known

The bar is raised when you must contend with incomplete information. When you have uncertainty, the data you have don’t cover what you’re interested in, so you’re going to need to take extra care when drawing conclusions. That’s why good analysts don’t come to conclusions at all.

Instead, they try to be paragons of open-mindedness if they find themselves reaching beyond the facts. Keeping your mind open crucial, else you’ll fall for confirmation bias — if there are twenty stories in the data, you’ll only notice the one that supports what you already believe… and you’ll snooze past the others.

Beginners think that the purpose of exploratory analytics is to answer questions, when it’s actually to raise them.

This is where the emphasis of training programs flips: avoiding foolish conclusions under uncertainty is what every statistics course is about, while analytics programs barely scratch the surface of inference math and epistemological nuance.

Without the rigor of statistics, a careless Icarus-like leap beyond your data is likely to end in a splat. (Tip for analysts: if you want to avoid the field of statistics entirely, simply resist all temptation to make conclusions. Job done!)

Analytics helps you form hypotheses. It improves the quality of your questions.

Statistics helps you test hypotheses. It improves the quality of your answers.

A common blunder among the data unsavvy is to think that the purpose of exploratory analytics is to answer questions, when it’s actually to raise them. Data exploration by analysts is how you ensure that you’re asking better questions, but the patterns they find should not be taken seriously until they are tested statistically on new data. Analytics helps you form hypotheses, while statistics lets you test them.

Statisticians help you test whether it’s sensible to behave as though the phenomenon an analyst found in the current dataset also applies beyond it.

I’ve observed a fair bit of bullying of analysts by other data science types who seem to think they’re more legitimate because their equations are fiddlier. First off, expert analysts use all the same equations (just for a different purpose) and secondly, if you look at broad-and-shallow sideways, it looks just as narrow-and-deep.

I’ve seen a lot of data science usefulness failures caused by misunderstanding of the analyst function. Your data science organization’s effectiveness depends on a strong analytics vanguard, or you’re going to dig meticulously in the wrong place, so invest in analysts and appreciate them, then turn to statisticians for the rigorous follow-up of any potential insights your analysts bring you.

Image for post

You need both!

Choosing between good questions and good answers is painful (and often archaic), so if you can afford to work with both types of data professional, then hopefully it’s a no-brainer. Unfortunately, the price is not just personnel. You also need an abundance of data and a culture of data-splitting to take advantage of their contributions. Having (at least) two datasets allows you to get inspired first and form your theories based on something other than imagination… and then check that they hold water. That is the amazing privilege of quantity.

Misunderstanding the difference results in lots of unnecessary bullying by statisticians and lots of undisciplined opinions sold as a finished product by analysts.

The only reason that people with plenty of data aren’t in the habit of splitting data is that the approach wasn’t viable in the data-famine of the previous century. It was hard to scrape together enough data to be able to afford to split it. A long history calcified the walls between analytics and statistics so that today each camp feels little love for the other. This is an old-fashioned perspective that has stuck with us because we forgot to rethink it. The legacy lags, resulting in lots of unnecessary bullying by statisticians and lots of undisciplined opinions sold as a finished product by analysts. If you care about pulling value from data and you have data abundance, what excuse do you have not to avail yourself of both inspiration and rigor where it’s needed? Split your data!

If you can afford to work with both types of data professional, then hopefully it’s a no-brainer.

Once you realize that data-splitting allows each discipline to be a force multiplier for the other, you’ll find yourself wondering why anyone would approach data any other way.