Understanding the whole predictive story

At Dacture, we currently offer three tools that help you better understand your data: Data Exploration, Predictive Modeling, and Statistical Modeling. You might be wondering what type of information you can gather with each of them. This post walks through all three using a customer success dataset in which each customer is a company.

Our main goals are to 1) better understand what factors are associated with whether a company churns, and 2) predict whether a particular company will churn.

Step one: Data Exploration

To start, we will go into Dacture’s Data Exploration tool and look at a correlation heat map. A correlation heat map visualizes the relationships between a set of variables. This tool is helpful for gaining a general sense of the patterns present in the dataset.

In the figure below, we are most interested in looking at the last column, as it shows which variables are associated with whether a customer churns (Churn_Y).

Data Exploration chart

From the heat map, we can see that UptimeOver90Days, SupportContactIn180d, Feature 4, and Feature 1 correlate most strongly with whether a company will churn. Specifically, UptimeOver90Days and Feature 1 have a negative relationship with Churn, whereas SupportContactIn180d and Feature 4 have a positive relationship with Churn.
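To make the heat map's last column concrete, here is a minimal sketch of how a correlation against Churn_Y could be computed by hand. The rows are hypothetical toy values invented for illustration (they are not the dataset used in this post), and the `pearson` helper is our own, not part of Dacture.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy rows, invented for illustration only.
data = {
    "UptimeOver90Days": [99.9, 99.5, 98.7, 96.0, 94.2, 91.0],
    "SupportContactIn180d": [0, 1, 0, 3, 4, 6],
    "Churn_Y": [0, 0, 0, 1, 1, 1],
}

# One column of the heat map: each variable's correlation with Churn_Y.
churn_column = {
    name: pearson(values, data["Churn_Y"])
    for name, values in data.items()
    if name != "Churn_Y"
}

for name, r in churn_column.items():
    direction = "negative" if r < 0 else "positive"
    print(f"{name}: r = {r:+.2f} ({direction} relationship with churn)")
```

The sign of each coefficient gives the direction of the relationship, and its absolute value gives the strength, which is what the heat map encodes as color.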

Step two: Predictive Modeling

Next, we’ll move on to the Predictive Modeling tool, which uses supervised learning to predict which specific companies are most susceptible to churning. After we set Churn as our target variable and CompanyID as our identifier, Dacture builds our model.

Predictive modeling output

Above, we see the most impactful predictors and a performance metric score. The first thing we want to check is our performance metric. The Matthews Correlation Coefficient (MCC) ranges from -1 to 1, where 1 indicates perfect prediction, 0 is no better than a coin toss, and -1 indicates total disagreement between the predictions and the actual outcomes. Because our MCC is 0.934 (i.e., excellent), we can be confident in the results that the model generates for us.
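For readers who want to see where an MCC value comes from, here is a minimal sketch of the standard formula computed from the four cells of a confusion matrix. The counts in the examples are made up to show the endpoints of the scale; they are not the counts behind the 0.934 reported here.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts (true/false positives/negatives)."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when a row or column is empty.
    return numerator / denominator if denominator else 0.0

# Made-up counts illustrating the three reference points of the scale.
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)     # every prediction right
coin_toss = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)  # right half the time
inverted = matthews_corrcoef(tp=0, tn=0, fp=50, fn=50)     # every prediction wrong

print(perfect, coin_toss, inverted)
```

Because the formula uses all four cells, MCC stays informative even when churners are much rarer than non-churners, which is one reason it is a sensible headline metric for churn data.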

Consistent with the heat map results, we can see that UptimeOver90Days, SupportContactIn180d, Feature 4, and Feature 1 were our most impactful predictors. It’s important to note that although this plot tells us what the most impactful predictors are, it does not tell us the direction of the relationship. However, we can go back and look at the heat map we generated using the Data Exploration tool to see the direction of the relationship.

Step three: Statistical Modeling

Next, we may be interested in taking a closer look at specific relationships among our variables using the Statistical Modeling tool. After selecting our data source and specifying Churn as our target variable, we can select UptimeOver90Days and MonthlyCharges as the variables we want to use to predict Churn.

Unsurprisingly, given the most impactful predictors plot from the Predictive Modeling tool, our model shows us that UptimeOver90Days was a statistically significant predictor of Churn. We also learn that each one-unit increase in UptimeOver90Days is associated with a 63% decrease in the odds of the company churning.
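It can help to unpack what a 63% decrease in the odds means arithmetically. The sketch below assumes the figure comes from a logistic-regression-style odds ratio (our reading; the tool's output is not shown in this post): each extra unit multiplies the odds by 0.37, so the effect compounds over several units rather than adding up.

```python
import math

# A 63% decrease in odds per unit means the odds are multiplied by 0.37.
odds_ratio = 1 - 0.63

# The equivalent coefficient on the log-odds scale (about -0.994),
# assuming a logistic-regression-style model.
coefficient = math.log(odds_ratio)

def odds_multiplier(delta_units):
    """Factor by which the odds of churning change for a given change in the predictor."""
    return odds_ratio ** delta_units

def odds_to_probability(odds):
    """Convert odds (p / (1 - p)) back to a probability."""
    return odds / (1 + odds)

for delta in (1, 2, 5):
    decrease = (1 - odds_multiplier(delta)) * 100
    print(f"+{delta} unit(s): odds multiplied by {odds_multiplier(delta):.4f} "
          f"({decrease:.1f}% decrease)")
```

Note that a two-unit increase corresponds to roughly an 86% decrease in the odds (0.37 × 0.37 ≈ 0.137), not 126%. Note also that odds are not probabilities: how much the probability of churning moves depends on where the company starts.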

Although MonthlyCharges appeared in the most impactful predictors plot from the Predictive Modeling results, its impact was not very high, and our statistical model tells us that MonthlyCharges was not a statistically significant predictor of Churn.

The Statistical Modeling tool also allows us to plot relationships:

Statistical Modeling chart

From the plot, we can see evidence that when UptimeOver90Days was less than 97, most companies churned (Churn = 1). However, when UptimeOver90Days was greater than 97, most companies did not churn (Churn = 0).

Step four: What-If Scenarios

With a better understanding of how our predictors relate to Churn, we can now start creating “what-if scenarios.”

Let’s take a closer look at CompanyID 5. Here are its original data and its predicted outcome:

What-if original output

What-if predicted outcome

Based on what we’ve learned about Churn’s relationship with various predictors, let’s start playing around with some what-if scenarios.

Through the Predictive Modeling tool, we’ve learned that UptimeOver90Days is the predictor that has the most impact on Churn. Through the Statistical Modeling tool, we’ve learned that this predictor has a negative relationship with Churn. More specifically, each one-unit increase (i.e., one percentage point) is associated with a 63% decrease in the odds of the company churning. Furthermore, the plot we generated shows that a change in prediction is most likely to occur when UptimeOver90Days is in the 90s.

Currently, UptimeOver90Days is at 100 and the model is highly confident that the company will not churn. What if this company’s UptimeOver90Days were 95 instead of 100?

What-if predicted outcome

Now, the model is predicting that the company will churn but is only moderately confident in this prediction.

Finally, let’s change UptimeOver90Days to 85.

What-if predicted output

Now, the model is highly confident that this company will churn.
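The same what-if exercise can be sketched with a toy one-predictor logistic curve. This is not Dacture's model: it uses only the 0.37 odds ratio reported earlier plus a hypothetical midpoint of 97 (our assumption, read off the plot), and it ignores every other predictor, so the probabilities it prints will not match the tool's confidence scores.

```python
import math

ODDS_RATIO_PER_UNIT = 0.37  # from the 63% decrease in odds reported earlier
ASSUMED_MIDPOINT = 97.0     # hypothetical: eyeballed from the plot, not a fitted value

def churn_probability(uptime):
    """Toy one-predictor logistic curve for the probability of churning."""
    log_odds = math.log(ODDS_RATIO_PER_UNIT) * (uptime - ASSUMED_MIDPOINT)
    return 1 / (1 + math.exp(-log_odds))

# Replay the three scenarios tried for CompanyID 5.
for uptime in (100, 95, 85):
    p = churn_probability(uptime)
    label = "churn" if p >= 0.5 else "no churn"
    print(f"UptimeOver90Days = {uptime}: P(churn) = {p:.3f} -> {label}")
```

Even this simplified curve reproduces the qualitative pattern above: the prediction flips between 100 and 95, and confidence in churn grows as uptime drops further.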

In summary, all of Dacture’s tools provide insight into how different predictors are related to a target or outcome. Now that we know that UptimeOver90Days is strongly associated with churning, we can strategize ways to increase uptime for the companies we serve.

As always:

  • If you want to learn more about what Dacture can do for your organization, discuss use cases, or see a demo, you can schedule something with us
  • You can contact us if you have questions