On “Lowering the Cost of Curiosity”

Earlier this week, I gave a talk entitled “It’s Not ‘Big,’ It’s Complex: An Outsider’s Perspective on Retail Data” at the National Retail Federation’s Big Show 2016. During that talk, I asked, “How might we lower the cost of curiosity?”

I wish I could take credit for the idea of “lowering the cost of curiosity,” but I can’t; I am not entirely sure of the phrase’s provenance. A former colleague once used it in a conversation with me and it resonated. Analysts tend to be intellectually curious and tenacious; when presented with a problem (or a question or an outlier), they often want to try to gain insight into why they are seeing it and what it might mean.

The data on hand, however, might not be enough to allow them to satisfy their curiosity.

Within a retailer, for example, there might be large volumes of point-of-sale (POS) / transaction log (TLog) data, or customer data (e.g., from loyalty cards). Contextual data such as e-receipts, competitive pricing, or weather data (to name just a few examples) might exist within or outside of the organization.

Within an organization, where and how the data is stored often influences the cost of curiosity: different data warehouses, different data formats, and different data governance policies, for example, all might raise it. Contextual data that exists outside of the organization is a different problem: the factors influencing the cost of curiosity often revolve around how quickly the data can be acquired and put into an analytically friendly format.

Given that time is often the most precious commodity in any organization, acquiring and formatting data represents an opportunity lost: tactically speaking, the analyst (or manager or executive) who had the question has likely moved on to the next question.

Why an opportunity lost? I assume that “hidden” opportunities and risks lurk in data, particularly when corporate data holdings are joined with contextual data. On the one hand, there might be a degree of bliss in ignorance; on the other hand, from the perspective of opportunity analysis or risk management, ignorance is rarely a friend.

In my experience, it is difficult, if not impossible, to presuppose what questions an analyst might have on a day-to-day basis.

As a result, the ease with which ad hoc joins can be made is another important factor influencing the cost of curiosity. The joins can be made on common facets of the data (e.g., date/time stamps), though those facets might need to be normalized between data sets (e.g., think of all the ways that date/time stamps are presented). Many analytic platforms address the importance of joins; the question is how easy it is for the user to make those joins. In 1010data’s Trillion Row Spreadsheet, for example, it is a matter of matching one or more columns in two tables.
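To make the idea of normalizing and then joining a little more concrete, here is a minimal sketch in plain Python. It is not how 1010data or any particular platform does it; the field names (sold_at, observed, high_temp_f), the sample rows, and the list of date formats are all hypothetical, chosen only to illustrate two data sets whose date/time stamps are written differently.

```python
from datetime import date, datetime

# Hypothetical formats; real data sets would need their own list.
KNOWN_FORMATS = ("%Y-%m-%d %H:%M:%S", "%m/%d/%Y", "%Y%m%d")


def normalize_date(raw: str) -> date:
    """Parse a date/time stamp in any known format into a calendar date."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognized date/time stamp: {raw!r}")


def join_on_date(sales, weather):
    """Left join: attach the day's weather (if any) to each sales row."""
    weather_by_day = {normalize_date(w["observed"]): w for w in weather}
    joined = []
    for s in sales:
        day = normalize_date(s["sold_at"])
        w = weather_by_day.get(day)
        joined.append({
            "date": day.isoformat(),
            "amount": s["amount"],
            "high_temp_f": w["high_temp_f"] if w else None,
        })
    return joined


# Made-up sample data: transaction rows and contextual weather rows.
sales = [
    {"sold_at": "2016-01-18 09:15:00", "amount": 42.50},
    {"sold_at": "2016-01-19 14:02:31", "amount": 17.25},
    {"sold_at": "2016-01-20 11:45:10", "amount": 8.00},
]
weather = [
    {"observed": "01/18/2016", "high_temp_f": 28},
    {"observed": "20160119", "high_temp_f": 31},
]

rows = join_on_date(sales, weather)
for row in rows:
    print(row)
```

The third sales row has no matching weather observation, so it is kept with an empty weather value rather than dropped; whether to keep or drop unmatched rows is a choice the analyst would make depending on the question being asked.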

I am not talking about the Tier 0 questions that are often reflected in reports; I am talking about questions that the reports might generate: are increases in the volume of sales resulting in an increase in profitability? Why or why not? What are, or seem to be, the most significant variables at play? If we cannot answer the question to our satisfaction, what information might provide us with better insight?

I am also talking about the questions that have been set aside in explicit recognition that corporate data holdings and enterprise information technologies are simply not up to the task of letting users get insight into the questions that they really have. This might be a series of iterative queries drawing on different combinations of data or it might be the development of a simple model.

In any case, analytic curiosity is a resource: the availability of data and the ease with which that data might be used, particularly in unanticipated ways, affect the cost of curiosity. The question for organizations that rely on analyses to inform their decisions is how they are lowering (or inadvertently raising) the cost of curiosity.

If you like this post, you might want to watch “The Cost of a New Question,” a Webinar given by Afshin Goodarzi, 1010data’s Chief Analyst, for a closer look at how 1010data creates the opportunity for quickly developing new analytic insights from massive volumes of data.