The Importance of Defining a Research Goal in a Data Science Project

“Hey! You there – Data Scientist. Tell us something about our data.”

In the digital economy, data is the new gold for businesses – indeed, there’s a new gold rush. However, like gold, data isn’t valuable in its raw state. To obtain value from gold, the raw material first needs to be processed – minted into coins or fashioned into jewelry and other products that consumers want to own. Similarly, data needs to be processed – manipulated and analyzed – to extract real business value. And this is where data science comes in. Data scientists are the prospectors, and the tools they use are the innovations that make them more effective.

In reality, it is big insights rather than big data itself that properly define the new data rush for enterprises. The data may be the nuggets, but the insights are the refined bars.

Gathering data is of course crucially important, but it is only by conducting meaningful, full-scope data science projects that the insights can be extracted. These insights are needed to solve previously unsolvable problems so that companies can gain competitive advantage, make better evidence-based decisions, and ultimately drive increased profit.

The question, however, is what makes a data science project “meaningful” in the first place. A project that begins with the vague instruction to “Tell us something about our data” will rarely unearth those 24-karat insights that assist in decision making. More often than not, the answer is found through good research design – and one of the most important elements of good research design is defining the research or business problem in concrete terms at the outset of a project. “Concrete terms” are those that have clear and concise definitions that are easily understood by non-technical people. This is crucial. What the data scientist needs is an exact definition of the research or business question – in other words, a clear research goal.

Defining the Research Goal

Even with all the programming, machine learning, mathematics, analytics and visualization capabilities in the world, time and time again the success of a data science project hinges on goal-setting. Failure to define a clear research goal at the launch of a data science project quickly leads to confusion and havoc – collaboration becomes impossible, actions are misaligned, and nobody can tell whether the goal has been reached.

As such, every data science project should aim to fulfill a precise and measurable goal that is clearly connected to the purposes, workflows, and decision-making processes of the business. What’s more, in order to achieve maximum return on investment (ROI), this goal must be expressed in such a way as to lead to advantageous changes in business operations. To this end, efforts must be centered around identifying the right business question, which starts by asking, “How will the results be used?”

With this in mind, the research goal cannot be vague or abstract; it must be concrete and impactful. This usually requires a process of gradual refinement. For example, the marketing department may begin by requesting a model that will help them determine how to improve lead generation efforts. This is along the right lines, but still a little hazy. Accordingly, the data scientist engages in discussions with the marketing team to gain a better understanding of the business problem and precisely what it is that needs to be optimized, or changed altogether. Through these discussions, the goal starts to become clearer – increase the percentage of marketing qualified leads (MQLs) for sales, for example – and keeps getting refined until it is concrete and serves the aims of the marketing department: discover which products merit more promotional support amongst which demographics in which geographic areas. This goal is now clear, because the results can be used directly to change the actions the marketing team takes.

Objectives and Key Results (OKRs) – A Goal-Setting System

As the underlying purpose of every data science project is to fulfill a goal, it should come as no surprise that many organizations use tried and tested goal-setting systems to clarify precisely the goals they are trying to achieve and the results they want to see. One such system is the method pioneered by Intel and later popularized by Google, known as Objectives and Key Results, or OKRs.

In simple terms, the “Objectives” in OKRs relate to the goal of the project, and the “Key Results” express how that goal will be reached. John Doerr, the man responsible for introducing OKRs to Google and author of Measure What Matters: How Google, Bono, and the Gates Foundation Rock the World with OKRs, explains the purpose of OKRs with the simple formula: “I will accomplish ‘X’ Objective as Measured by ‘Y’ Key Result.”

OKRs must be set at the beginning of a project and define the end-goal. The idea is to choose the most meaningful metric – the highest-ranking component – associated with a project and set it as the Objective. This shows the goal you want to achieve. Next, the Key Results are defined to show how the Objective will be reached. Key Results are aggressive, measurable, and usually limited to three to five per Objective.

OKR software company Atiim provides examples of good OKRs in its “Definitive OKR Goals Infographic”:

Examples of good OKRs

(Image source: atiim.com)

The combination of an overarching Objective with Key Results to work towards provides a team with clear and powerful direction. It aligns everyone’s work and connects employees to an organization’s mission.

Naturally, OKRs can be applied specifically to data science projects. Writing in Towards Data Science, Jan Zawadzki, a data scientist at Volkswagen Group Services, gives the example of a team working in the automotive industry. They want to develop a driver-assistance function that alerts truck drivers to the presence of pedestrians in urban areas using the truck chassis camera. They agree that a 98% detection rate is a proper stretch-goal for the first quarter of the project. To meet the goal, the team will need a dataset with at least 10,000 labeled images. In addition, they will need adequate time to conduct research and implement a prototype, and still more time and resources to iterate until the goal is reached. Transforming this information into OKRs, Zawadzki gives us this:

Specific OKR example

(Image source: towardsdatascience.com)
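
For teams that prefer to track goals in code, the idea can be sketched in a few lines of Python. This is only an illustration built from the figures quoted above (a 98% detection rate and 10,000 labeled images), not a reproduction of Zawadzki's OKRs. The class names, the wording of the Objective, the current values, and the simple averaging of progress are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeyResult:
    """A measurable result with a numeric target (hypothetical structure)."""
    description: str
    target: float
    current: float = 0.0

    def progress(self) -> float:
        # Fraction of the target reached so far, capped at 1.0.
        # A linear ratio is a simplification chosen for this sketch.
        if self.target <= 0:
            raise ValueError("target must be positive")
        return min(self.current / self.target, 1.0)


@dataclass
class Objective:
    """An Objective together with the Key Results that measure it."""
    description: str
    key_results: List[KeyResult] = field(default_factory=list)

    def progress(self) -> float:
        # Unweighted average of Key Result progress (an assumption).
        if not self.key_results:
            return 0.0
        total = sum(kr.progress() for kr in self.key_results)
        return total / len(self.key_results)

    def statement(self) -> str:
        # Follows the "X as measured by Y" pattern described earlier.
        measured = "; ".join(kr.description for kr in self.key_results)
        return f"I will accomplish '{self.description}' as measured by: {measured}"


# Targets come from the example in the text; the current values are invented.
okr = Objective(
    "Alert truck drivers to pedestrians in urban areas",
    [
        KeyResult("Collect at least 10,000 labeled images",
                  target=10_000, current=2_500),
        KeyResult("Reach a 98% pedestrian detection rate",
                  target=0.98, current=0.49),
    ],
)

print(okr.statement())
print(f"Overall progress: {okr.progress():.0%}")
```

Keeping each Key Result numeric is the point of the sketch: if a result cannot be given a target and a current value, it is probably not yet measurable enough to serve as a Key Result.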

OKRs can be used to set goals and Key Results for all manner of projects – from identifying pricing levels that optimize revenue for ecommerce companies, to developing software that assesses the credit risk of customers, to social projects that aim to improve education or healthcare. OKRs enable data scientists to align and prioritize their work, set stretch-goals, track progress, and – crucially – to connect each project with the overarching purposes and objectives of the business.

Final Thoughts

For organizations new to data science, the vast array of tools and techniques available to them can be overwhelming. Is Python or R the right choice for our project? Are we using the right database? While such questions do indeed need to be asked, they are not the most important ones. The most relevant questions center around the business – “What business problem do we need to solve and what data do we have to solve it?” This is where data science projects need to begin – deciding which tool, technology, or algorithm to use comes much later. Though all data science projects are different, they all begin with setting a goal, and OKRs form one of the very best goal-setting systems to keep projects on track for measurable progress and ultimate success.