
In August of 2019, I was invited by my friend Preston Adie of Africa’s Talking to attend a panel discussion called “AI in Kenya” with 3 other smart practitioners in the Kenyan data science space: Babatunde Ekemode of Africa’s Talking, Skyler Spearman of IBM, and Fiona Rasanga of Griffin Insurance.

The discussion was wide-ranging, from what we thought the state of data science in Kenya was to the trends we saw. It was interesting, and I hope some video of the conversation comes out.

Each member of the panel was asked to make a short presentation to seed some of the anticipated conversation. I chose to talk about Design Thinking in Data Science. This is an edited version of what I presented.

Let’s start by looking at fundamentals. From Wikipedia:

Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.

https://en.wikipedia.org/wiki/Data_science

Personally, this is how I like to think about data science:

The process of using data to tell stories.

Sidney Ochieng

It’s one of the reasons that I like to call myself a storyteller; I think I can tell a good story.

I’ve noticed that data scientists tend to jump directly to the data and immediately begin modelling to answer the question we’ve been asked. However, before getting to that point, I think it’s a good idea to engage in some design thinking.

Looking back at the two definitions we have of data science, Wikipedia’s and my own, I think the key elements are extracting knowledge and insights and telling stories. We love to think about the modelling (the extraction) without contemplating much else. But when we talk about knowledge, insight or storytelling, we really need to think about more.

It’s imperative to think about who we’re presenting the knowledge or insight to. We also need to think about things like where the data lives, who owns it and how it was collected. We need to think about all of this not only so that we’re answering the right question, using the right data and accounting for any bias in it, but also so that the insight we generate is presented in a way that lets it be put to the best use.

Data science, as a relatively new field, is still coming up with processes and ways to think about the issues I’ve raised, but I think the language of design thinking can help us here as data scientists.

So what is design thinking?

Design thinking is a human-centered approach to innovation that draws from the designer’s toolkit to integrate the needs of people, the possibilities of technology, and the requirements for business success.

Tim Brown, CEO of IDEO

Design thinking is an ideology supported by an accompanying process. A complete definition requires an understanding of both.

The design thinking ideology asserts that a hands-on, user-centric approach to problem solving can lead to innovation, and innovation can lead to differentiation and a competitive advantage. This hands-on, user-centric approach is defined by the design thinking process and comprises 6 distinct phases, as illustrated below.

If you look at the image above, you can see that the design thinking process consists of 6 stages (empathise, define, ideate, prototype, test and implement) that fall under three broad categories: understand, explore and materialise. These categories work well for data science too.

Of this entire process, I think the most important part is understanding, because without adequate understanding we end up answering the wrong question, and after that it won’t matter what model you use.

Under understand, we have empathise and define. Here you talk not only to the person asking you the question, finding out exactly why they’re asking it and how they hope to use the knowledge you generate, but also to other stakeholders, such as the person in charge of the dataset you plan to use. This whole process ensures not only that you’re answering the right question but that you’re prepared to answer it in the best way for your audience.

The explore phase is the part of data science that most of us think of when we think “data science”. We do exploratory data analysis (EDA), and we come up with the charts, graphs and models that answer the questions we framed in the understand phase.
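
To make this concrete, below is a minimal sketch of what the explore phase might look like in Python with pandas. The dataset, column names and numbers are all invented for illustration; in practice this would be the data whose owners and provenance you vetted in the understand phase.

    # A minimal sketch of the explore phase on a hypothetical
    # marketing dataset (all figures invented for illustration).
    import pandas as pd

    df = pd.DataFrame({
        "channel": ["instagram", "facebook", "twitter",
                    "instagram", "facebook", "twitter"],
        "spend":   [120.0, 200.0, 80.0, 150.0, 180.0, 60.0],
        "signups": [34, 40, 10, 45, 36, 8],
    })

    # Basic EDA: get a feel for the shape and spread of the data.
    print(df.describe())
    print(df.isna().sum())  # check for missing values before modelling

    # A first-pass "model": signups per unit of ad spend, by channel.
    agg = df.groupby("channel")[["signups", "spend"]].sum()
    agg["signups_per_spend"] = agg["signups"] / agg["spend"]
    print(agg.sort_values("signups_per_spend", ascending=False))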

Finally, we have the materialise phase. Here we take our results and present them to our stakeholders. We might have a single answer, for instance “we need to spend more on Instagram ads”, but we also have to present the evidence that supports it. This is where the understand phase comes back to help, because when you understand the needs of your stakeholders, you’re prepared to make the most persuasive arguments.
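
As a sketch of what that could look like (again with invented numbers), the entire recommendation might be carried by one well-labelled chart whose title states the answer itself:

    # A sketch of the materialise phase: the hypothetical explore-phase
    # numbers distilled into one chart a stakeholder can act on, with
    # the recommendation stated right in the title.
    import matplotlib.pyplot as plt

    channels = ["instagram", "facebook", "twitter"]
    signups_per_spend = [0.29, 0.20, 0.13]  # invented figures

    fig, ax = plt.subplots()
    ax.bar(channels, signups_per_spend)
    ax.set_ylabel("signups per shilling of ad spend")
    ax.set_title("Instagram converts most efficiently: spend more there")
    fig.savefig("channel_efficiency.png", dpi=150)

The chart library matters much less than the framing; letting the title do the arguing is only possible because the understand phase told you what your audience cares about.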

From the diagram, and from personal experience, you can see that each of these phases can feed into the others, but I think it’s important to codify them like this so that we don’t skip any of them and are thoughtful about each stage and everything that goes into it. That discipline is particularly important for making sure we’re delivering value with our solutions.

If this talk came off as vague, it’s because I’m still thinking this through and clarifying my own thoughts. I was heavily inspired by the work of Hilary Parker, a data scientist at Stitch Fix, and Roger Peng, a professor in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health, and by their podcast Not So Standard Deviations.

I’ll work to expound on what needs to happen at each step so that it’s clear whether you’re doing enough, because even I am likely to skip stages. This will be a continual process. Peace!!

Links:

https://simplystatistics.org/2019/04/17/tukey-design-thinking-and-better-questions/

https://simplystatistics.org/2018/09/14/divergent-and-convergent-phases-of-data-analysis/