
In the era of big data analytics, there may still be room for human input and judgement.

A recent Harvard Business Review article discusses the very real likelihood of reaching different conclusions from the same data. The article recounts how multiple teams of analysts were given the same question to answer and the same data set to research. Of 29 teams working the problem, 20 found a statistically significant relationship that answered the question. Nine teams found no significant relationship.

In the end, the teams “converged toward agreement” that there was “a small, statistically significant relationship,” the cause of which was “unknown.”

This phenomenon can be helpful. If you have the luxury of multiple teams, their disagreement can prompt a more thorough investigation and debate. It can also be harmful, leading to an endless cycle of analysis paralysis.

Big data only magnifies this problem. Imagine multiple teams working with multiple data sets, each of which is relevant to the answer, but none of which is sufficient by itself.

How can you tackle this?

Aside from compromise or consensus answers, the article mentions averaging different conclusions as another possible approach.
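As a rough illustration of averaging conclusions, here is a minimal Python sketch. The estimates are invented for this example and do not come from the article, and `pool_estimates` is a hypothetical helper name. It simply reports the mean and the median of several teams' effect estimates; a real analysis would probably also weigh each estimate by its uncertainty.

```python
from statistics import mean, median

# Hypothetical effect estimates reported by separate teams.
# These values are invented for illustration only.
team_estimates = [1.10, 1.31, 0.96, 1.25, 1.42, 1.18]


def pool_estimates(estimates):
    """Summarize several teams' estimates with a mean and a median."""
    if not estimates:
        raise ValueError("need at least one estimate")
    return {"mean": mean(estimates), "median": median(estimates)}


pooled = pool_estimates(team_estimates)
print(pooled)
```

The median is included because it is less sensitive than the mean to a single team with an extreme result.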

In big data analytics, you might substitute multiple algorithms for multiple teams. Ensemble methods have gained strong traction recently. For example, the Netflix Prize was won by an ensemble that blended many models, restricted Boltzmann machines (RBMs) among them. It's fair to say that ensembles of regression trees, such as boosted trees (BT), are among the most popular methods for classification. Amex, for example, uses BT for fraud detection and creditworthiness scoring.
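The idea of substituting several algorithms for several teams can be sketched in a few lines of Python. This is a simplified illustration, not boosted trees and not any company's actual system: the three "models" are hypothetical hand-written rules, the transaction fields (`amount`, `foreign`, `hour`) are assumed for the example, and the ensemble just averages their predicted probabilities (often called soft voting).

```python
# Three hypothetical "models". Each returns an estimated probability of fraud.
# The rules and numbers are invented for illustration only.
def model_amount(tx):
    return 0.9 if tx["amount"] > 1000 else 0.2


def model_country(tx):
    return 0.8 if tx["foreign"] else 0.1


def model_hour(tx):
    return 0.7 if tx["hour"] < 6 else 0.3


MODELS = [model_amount, model_country, model_hour]


def ensemble_probability(models, tx):
    """Average the probabilities produced by every model (soft voting)."""
    probs = [model(tx) for model in models]
    return sum(probs) / len(probs)


def ensemble_predict(models, tx, threshold=0.5):
    """Flag the transaction when the averaged probability reaches the threshold."""
    return ensemble_probability(models, tx) >= threshold


daytime_purchase = {"amount": 1500, "foreign": False, "hour": 14}
night_purchase = {"amount": 1500, "foreign": True, "hour": 3}

print(ensemble_predict(MODELS, daytime_purchase))
print(ensemble_predict(MODELS, night_purchase))
```

No single rule decides the outcome: one model flags the daytime purchase strongly, but the other two outvote it once the probabilities are averaged.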

Beyond the analytics itself, business considerations might provide additional, deciding constraints for sorting out multiple approaches. Feasibility, budget or timeline for implementation, safety, regulatory constraints and other considerations could be the deciding factors when choosing an algorithm. For example, a financial company could use BT to train its models at scale, but switch to simple regression-based classification in production to stay in compliance with regulations.

Having data to support your conclusions is usually better than having none. Better yet is a thorough examination of the methods behind your analytical approach to deriving and applying value from big data.
