A Recipe For Building A Data Strategy

Summary: This article details a results-focused approach to building a data strategy for learning initiatives.

Cooking Rice Takes Too Long!

An evidence-based conclusion I made as a child: rice takes way too long to cook! The evidence was my mom's answer every time I asked when dinner would be done: "As soon as the rice is done." Needless to say, once I started cooking my own rice, I quickly found out that rice doesn't take long to cook; my mom simply made it last so she could serve it hot.

This article is not about rice. It's about the importance of putting together a solid data strategy when we want to leverage data to draw intelligent conclusions and make decisions about the learning experiences we serve up to our learners.

Why Do You Need To Invest In A Data Strategy?

Unless your learning content is free floating on connected devices and multiple web platforms (think PDF document, video file, etc., shared via email), it is sitting within a web platform that is capable of collecting at least some data about your content and learners. A good start, but without a data strategy in place, it's unfortunately a good start in the wrong direction.

Almost all learning platforms will require a learner to create a profile and provide some basic information about themselves. Learning content is uniquely identifiable, and learner progress is sometimes, and to varying extents, tracked. Looks great so far! A reporting engine capable of generating multiple types of reports and graphs at the click of a button completes the picture. This is when many are led to believe that they have the tech in place to capture the value of their data, leading to a thought process centered around tech. What can be expected down this path is a lot of frustration, disappointment, and tonnes of value left on the table.

Let's put the tech aside for a while and explore a different path towards extracting the most value out of our data.

Building A Data Strategy

To jump straight into data strategy, we need to accept the following statements to be true and assume them to be the solid foundation on which we build our strategy:

  • Protecting our learners' privacy is not negotiable.
  • Data is the new gold; let's not leave too much of it on the table.
  • The journey to AI is fueled by data.

It's important to note that a data strategy is expected to remain relevant for an extended period of time, and thus should result in a system that is robust but also malleable enough to allow for iterative enhancement along the way.

The data strategy should tackle three main components:

  1. Data collection
  2. Data storage
  3. Data utilization


Collecting The Right Data

More is not necessarily better, as it may add unnecessary complexity and costs down the line. The most important, and possibly the most time consuming, part of building a data strategy is deciding on what data should be collected.

At the strategy level, we may not need to define granular dimensions of the data we want to collect but rather focus on the meta-dimensions. That is, we need to generate a list of data points (or types) with associated attributes, along with an answer to who needs this data and why. I like to follow a top-down approach, starting from the stakeholders as described below.


Who will benefit from data is highly dependent on the nature of the learning initiative in focus. However, in all scenarios, the learner is a primary stakeholder through the learning experience they are being served and the data they can utilize about their own learning journey. Course instructors/facilitators are stakeholders in facilitated learning journeys, while course creators may be more prominent stakeholders in self-paced course formats. In commercial learning applications, business owners, along with their sales and marketing teams, are major stakeholders. Non-commercial, public awareness, and corporate social responsibility (CSR) initiatives depend highly on entities funding those initiatives, which makes them major stakeholders as well. You get the idea.

Since the learning experience is first, foremost, and always the primary focus, it's good practice to split stakeholders into two categories: learning and other stakeholders.

Stakeholder Metrics

Once the stakeholders are identified, we need to know what they care about, what their key objectives are, what their performance metrics are, etc. Multiple approaches may be utilized to capture this information from stakeholders. It's not within the scope of this article to expand on these approaches; suffice it to say that I lean heavily towards the use of inclusive, collaborative, and human-centred approaches.

While different stakeholder groups may be investigated separately, there may be overlapping interests which need to be identified and consolidated into a final list of metrics in preparation for the next step.

Data Points

When extracting data points from the metrics we have already identified, we may opt for a multi-level approach that increases in granularity. For example, at the highest level, we may group data points into types which share the same attributes, such as registered learner demographic data and platform visitor demographic data. We separate these into two types because they don't share the same attributes, such as the source of data. We can expect registered learner demographic data to be stored on our learning platform, while visitor demographic data may be sourced from Google Analytics or similar applications. Below are some of the data attributes we should be investigating and recording at this point:

  • Source of data – Where is the data coming from? Is it a basic learning event tracked by my LMS from a SCORM package it hosts? Is it a more advanced learning event, in an xAPI package, stored on a Learning Record Store (LRS) independent of my LMS? Or is it an LTI package from a third-party activity provider that I've integrated into my LMS? Is it a web form?
  • Ownership of the data – Who owns the data I need, and do I have the required permissions to access this data?
  • Access methods – How can I access the data? What method should I use to query the data? Do I need to build a custom integration?
  • Storage format – Is the data stored in an SQL DB, JSON statements in an LRS, or in simple flat files?
  • Anonymization – Is the data anonymized, or would I need to manipulate the data to secure my users' privacy?
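One way to record these attributes is a simple data inventory. The sketch below is illustrative only: the class name, field names, and the two sample entries are assumptions made for this example, not part of any particular platform.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataPointType:
    """One entry in a data inventory (all field names are hypothetical)."""
    name: str
    source: str          # where the data is coming from
    owner: str           # who owns the data
    access_method: str   # how we can query it
    storage_format: str  # e.g. SQL, JSON statements, flat files
    anonymized: bool     # is it already anonymized at the source?
    needed_by: List[str] = field(default_factory=list)  # stakeholders
    purpose: str = ""    # why those stakeholders need it


# Sample entries based on the two demographic data types discussed above.
inventory = [
    DataPointType(
        name="registered learner demographics",
        source="learning platform",
        owner="our organization",
        access_method="database query",
        storage_format="SQL",
        anonymized=False,
        needed_by=["course creators", "marketing"],
        purpose="understand who enrols",
    ),
    DataPointType(
        name="platform visitor demographics",
        source="web analytics application",
        owner="third party",
        access_method="reporting API",
        storage_format="JSON",
        anonymized=True,
        needed_by=["marketing"],
        purpose="understand who visits without registering",
    ),
]


def needs_privacy_work(items):
    """Return the names of data types that are not yet anonymized."""
    return [item.name for item in items if not item.anonymized]


print(needs_privacy_work(inventory))
```

Even a small inventory like this makes it easy to see which data types need privacy work before they can be used.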

Data Storage And Utilization

The thorough work done on laying out our data landscape during the data collection stage feeds into building our data storage and utilization strategies. What we can easily consolidate at this point is a list of all the different data sources, our access rights to each, methods of access, etc.

The reason data storage and utilization need to be investigated together is that they are very much interconnected, whereby how and where our data is stored influences what we can do with the data and vice versa. My preferred approach is to first go back to the stakeholder metrics we collected previously and start mapping out how we plan to use our data to provide intelligence about our metrics. This will, on the one hand, inform us of the type of reports, dashboards, and visualizations we need to produce and, on the other hand, how we would like to relate and analyze our data points against each other. Think about the intelligence we can produce from correlating marketing data with course purchase decision data and learner performance data! There are opportunities to enhance the learning experiences we deliver to different groups of people, hone our marketing efforts, and acquire more learners.
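As a minimal sketch of relating data points against each other, the example below joins marketing data with learner performance data to compare average scores by acquisition channel. The record layout, learner IDs, channel names, and scores are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical marketing records: which channel brought each learner in.
marketing = [
    {"learner": "a1", "channel": "newsletter"},
    {"learner": "a2", "channel": "newsletter"},
    {"learner": "a3", "channel": "social"},
]

# Hypothetical performance records: final score (0-100) per learner.
performance = [
    {"learner": "a1", "score": 90},
    {"learner": "a2", "score": 70},
    {"learner": "a3", "score": 50},
    {"learner": "a4", "score": 60},  # no marketing record, so it is skipped
]


def average_score_by_channel(marketing_rows, performance_rows):
    """Join the two data sets on learner ID and average scores per channel."""
    channel_of = {row["learner"]: row["channel"] for row in marketing_rows}
    scores = defaultdict(list)
    for row in performance_rows:
        channel = channel_of.get(row["learner"])
        if channel is not None:
            scores[channel].append(row["score"])
    return {channel: mean(values) for channel, values in scores.items()}


print(average_score_by_channel(marketing, performance))
```

A join like this only works if both data sets share a learner identifier, which is one reason storage and utilization have to be planned together.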

Finally, we can answer key questions to make intelligent decisions about how we will store our data. Questions such as: Can we utilize our data as we plan to with the distributed data as is? Do we need to consolidate data into a central database to achieve this? What data can we consolidate? And do we need data transformers to prepare data for consolidation? What measures need to be taken to ensure user data privacy?
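To illustrate what a data transformer might look like, here is a minimal sketch that maps two differently shaped records onto one common layout and replaces the learner's email with a keyed hash. The LMS row layout, the common schema, and the secret key are assumptions for this example. Note that keyed hashing is pseudonymization rather than full anonymization, since anyone holding the key can recompute the mapping.

```python
import hashlib
import hmac

# Assumption: in practice the key would live in a secrets store, not in code.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(learner_id: str) -> str:
    """Replace an identifier with a keyed SHA-256 hash (HMAC)."""
    return hmac.new(SECRET_KEY, learner_id.encode("utf-8"), hashlib.sha256).hexdigest()


def from_lms_row(row: dict) -> dict:
    """Transform a hypothetical LMS export row into the common layout."""
    return {
        "learner": pseudonymize(row["email"]),
        "activity": row["course_id"],
        "event": "completed" if row["completed"] else "attempted",
    }


def from_xapi_statement(statement: dict) -> dict:
    """Transform a simplified xAPI statement into the common layout."""
    mbox = statement["actor"]["mbox"]
    email = mbox[len("mailto:"):] if mbox.startswith("mailto:") else mbox
    return {
        "learner": pseudonymize(email),
        "activity": statement["object"]["id"],
        # Use the last path segment of the verb IRI as a short event name.
        "event": statement["verb"]["id"].rsplit("/", 1)[-1],
    }


lms_row = {"email": "learner@example.com", "course_id": "course-101", "completed": True}
xapi_statement = {
    "actor": {"mbox": "mailto:learner@example.com"},
    "verb": {"id": "http://adlnet.gov/expapi/verbs/completed"},
    "object": {"id": "https://example.com/courses/course-101"},
}

consolidated = [from_lms_row(lms_row), from_xapi_statement(xapi_statement)]
print(consolidated)
```

Because both transformers use the same key, the same learner ends up with the same pseudonym across sources, so records can still be related after consolidation.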


As previously mentioned, we need to ensure that our strategy allows for iteration and continuous improvement. Stakeholders may be inspired by the effectiveness of the system we built and think up new metrics to measure. Organizational changes may force a change in how we utilize our data. Or, a sensational learning experience designer may suggest an innovative way to utilize data to enhance the learning experience they design for our learners.

Learning Data Plan

To ensure that we keep our focus on our learners and utilize our data strategy to deliver the best learning experiences we possibly can, it's worth extending our effort to the learning content level. Incorporating the development of a data plan into the learning experience design process for every course, learning object, etc., allows us to granularly define what data we should collect and analyze at this level. The opportunities to collect learning events are endless when utilizing standards designed for that, such as the xAPI standard, so let's plan for it!
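For a sense of what a planned learning event could look like, here is a minimal sketch that builds an xAPI-style statement with the actor, verb, and object structure the standard defines. The helper name, the LMS home page, the activity ID, and the learner account are hypothetical; a real data plan would list the verbs and activities that matter for each course.

```python
import json


def build_statement(learner_account, activity_id, activity_name, scaled_score):
    """Build a 'completed' statement as a plain dictionary (illustrative helper)."""
    return {
        "actor": {
            "objectType": "Agent",
            # Identify the learner by platform account rather than by email.
            "account": {"homePage": "https://lms.example.com", "name": learner_account},
        },
        "verb": {
            "id": "http://adlnet.gov/expapi/verbs/completed",
            "display": {"en-US": "completed"},
        },
        "object": {
            "objectType": "Activity",
            "id": activity_id,
            "definition": {"name": {"en-US": activity_name}},
        },
        # A scaled score is a value between -1 and 1.
        "result": {"completion": True, "score": {"scaled": scaled_score}},
    }


statement = build_statement(
    learner_account="learner-42",
    activity_id="https://example.com/courses/course-101/module-1",
    activity_name="Module 1",
    scaled_score=0.85,
)
print(json.dumps(statement, indent=2))
```

Sending the statement to an LRS is left out here, since that depends on the LRS in use and its authentication setup.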

Rice Does Not Take Too Long To Cook, A Data Strategy Does!

For those still wondering, rice takes about ten minutes to cook and an additional ten minutes of patience to let it rest. You don't need a data strategy to reach that conclusion. Building a data strategy, on the other hand, does take effort and the collaboration of individuals with multiple skill sets, including learning experience designers, data scientists and developers, Subject Matter Experts, and business stakeholders.

While Artificial Intelligence and the plethora of AI-enabled tools were not discussed in this article, it's important to acknowledge that we are on a journey towards the integration and utilization of more and more AI-enabled tools in digital education. Having a data strategy and a thorough understanding of our data enables us to make better decisions as we go through our AI enablement journey.

eBook Release: Kashida
We are about learning, simply. We design and create custom learning content and deliver it across multiple platforms, always enriching learning with technology. Gold winners at Learning Technologies Awards UK 2018 for Best Learning Technologies Project.