Telecom Probable Churn Detection Using ML
Machine Learning (ML) is a widely used term today, often shortened to “ML”. We can solve many critical business problems with Machine Learning. What is Machine Learning? Machine Learning is a statistical process of learning patterns from a specific training dataset, using a computer algorithm or a set of algorithms, in order to make predictions on another dataset. Today we will discuss how to apply Machine Learning to a business problem. The purpose of this writeup is to help you understand the Machine Learning process step by step.
- Problem Statement
- Prerequisites
- Data Collection
- Data Analysis
- Data Imputation
- Data Splitting
- Feature Selection
- Training Models
- Selecting the Best Model
- Validating Model Performance
- Conclusion
1. Problem Statement
Churn is one of the most critical and challenging problems in any business. Some industries call it attrition, but the idea is the same. Today we will discuss churn in the telecommunication industry. Most countries have more than one telecom operator, so customers have several options. Telecom churn occurs when a customer stops doing business with an operator. An operator that cannot control churn loses market share month after month. Higher churn not only shrinks the REC (Revenue Earning Customer) base but also reduces overall revenue.
Let’s look at a problem faced by XYZ telecommunication company. The company’s current churn rate is 27%, whereas its winback rate is about 19%. A winback program is generally a reactive approach, and XYZ is spending a huge amount of money on it. Despite a range of lucrative offers, the company has failed to raise its gross winback rate to match its gross churn rate. As a result, the company’s Base Management Team faces a great challenge in retaining its monthly targeted base. Because the monthly gross churn rate is higher than the monthly gross winback rate, net churn (gross churn minus winback) is positive, and if the gross add rate ever falls below net churn, net add becomes negative. The Acquisition Team faces a similar challenge in growing the gross add rate month on month. Management therefore has to spend a huge amount of money on winback and acquisition every month, which hurts the company’s overall profitability. Recently, management learned about Advanced Data Analytics and formed an Advanced Data Analytics (ADA) team to apply data analytics programs across different parts of the company.
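The relationship between churn, winback and acquisition described above is simple arithmetic. A minimal sketch, assuming net churn = gross churn − winback and net add = gross add − net churn; the function name and the subscriber counts are hypothetical and are not XYZ’s figures.

```python
def net_add(gross_add: int, gross_churn: int, winback: int) -> int:
    """Net subscriber change for one month.

    Assumed definitions (for illustration only):
      net churn = gross churn - winback
      net add   = gross add - net churn
    """
    net_churn = gross_churn - winback
    return gross_add - net_churn


# Hypothetical month: 1,000 customers churn and 190 are won back,
# so net churn is 810 and any gross add below 810 makes net add negative.
print(net_add(gross_add=900, gross_churn=1000, winback=190))  # 90
print(net_add(gross_add=700, gross_churn=1000, winback=190))  # -110
```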
The ADA team then took the following steps to identify, with machine learning, the customers most likely to churn, so that the Base Management Team (BMT) can run cost-effective special campaigns targeting those customers before they churn. Let’s walk through the overall process.
2. Prerequisites:
Before getting started, the following resources are required. The ADA team has all of them:
- Data Analytics Human Resource(s)
- Well Configured Computer
- Data Analytics Software
- Different Machine Learning Algorithms
- Data Availability
3. Data Collection:
The ADA team can collect data in several ways:
- Direct data from production system
- Data through IT department
- Data through BI or reporting team
In the production system, data are stored at the raw level. To build a profile from raw data, you need the data dictionary of the production system; with it, you can build one profile per customer. Let’s assume the ADA team collected the raw data together with the data dictionary from the IT team and then built the customer profile.
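To make “building a profile” concrete, here is a minimal Python sketch (the writeup itself used R). It assumes hypothetical raw monthly billing records with the fields `customer_id`, `month_index` and `charge`. These names are illustrative and do not come from any real production system. The sketch collapses the records into one row per customer with Tenure (months billed), MonthlyCharges (latest charge) and TotalCharges (sum of charges).

```python
from collections import defaultdict


def build_profiles(raw_bills):
    """Aggregate raw monthly bill records into one profile per customer.

    raw_bills: iterable of dicts with hypothetical keys
    'customer_id', 'month_index' (1 = first month) and 'charge'.
    """
    by_customer = defaultdict(list)
    for bill in raw_bills:
        by_customer[bill["customer_id"]].append(bill)

    profiles = {}
    for cust_id, bills in by_customer.items():
        bills.sort(key=lambda b: b["month_index"])  # oldest to newest
        profiles[cust_id] = {
            "Tenure": len(bills),                   # number of months billed
            "MonthlyCharges": bills[-1]["charge"],  # most recent monthly charge
            "TotalCharges": round(sum(b["charge"] for b in bills), 2),
        }
    return profiles


raw = [
    {"customer_id": "A", "month_index": 1, "charge": 29.85},
    {"customer_id": "B", "month_index": 1, "charge": 56.95},
    {"customer_id": "B", "month_index": 2, "charge": 56.95},
]
print(build_profiles(raw))
```

A real profile would usually combine several raw sources, such as billing, subscriptions and usage, using the data dictionary. Only billing is shown here to keep the sketch short.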
4. Data Analysis:
Let’s see the data structure of the profiled dataset:
'data.frame': 7043 obs. of 21 variables:
$ CustomerID      : chr "7590-VHVEG" "5575-GNVDE" "3668-QPYBK" "7795-CFOCW" ...
$ Gender          : chr "Female" "Male" "Male" "Male" ...
$ SeniorCitizen   : int 0 0 0 0 0 0 0 0 0 0 ...
$ Partner         : chr "Yes" "No" "No" "No" ...
$ Dependents      : chr "No" "No" "No" "No" ...
$ Tenure          : int 1 34 2 45 2 8 22 10 28 62 ...
$ PhoneService    : chr "No" "Yes" "Yes" "No" ...
$ MultipleLines   : chr "No phone service" "No" "No" "No phone service" ...
$ InternetService : chr "DSL" "DSL" "DSL" "DSL" ...
$ OnlineSecurity  : chr "No" "Yes" "Yes" "Yes" ...
$ OnlineBackup    : chr "Yes" "No" "Yes" "No" ...
$ DeviceProtection: chr "No" "Yes" "No" "Yes" ...
$ TechSupport     : chr "No" "No" "No" "Yes" ...
$ StreamingTV     : chr "No" "No" "No" "No" ...
$ StreamingMovies : chr "No" "No" "No" "No" ...
$ Contract        : chr "Month-to-month" "One year" "Month-to-month" "One year" ...
$ PaperlessBilling: chr "Yes" "No" "Yes" "No" ...
$ PaymentMethod   : chr "Electronic check" "Mailed check" "Mailed check" "Bank transfer (automatic)" ...
$ MonthlyCharges  : num 29.9 57 53.9 42.3 70.7 ...
$ TotalCharges    : num 29.9 1889.5 108.2 1840.8 151.7 ...
$ Churn           : chr "No" "No" "Yes" "No" ...
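There are 7,043 observations of 21 variables. The ADA team then made a few minor changes to the profiled dataset. Yes/No values were recoded to 1/0, with “No phone service” treated as 0. Gender was abbreviated to F/M, and SeniorCitizen was stored as character. The resulting structure is: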
'data.frame': 7043 obs. of 21 variables:
$ CustomerID      : chr "7590-VHVEG" "5575-GNVDE" "3668-QPYBK" "7795-CFOCW" ...
$ Gender          : chr "F" "M" "M" "M" ...
$ SeniorCitizen   : chr "0" "0" "0" "0" ...
$ Partner         : chr "1" "0" "0" "0" ...
$ Dependents      : chr "0" "0" "0" "0" ...
$ Tenure          : int 1 34 2 45 2 8 22 10 28 62 ...
$ PhoneService    : chr "0" "1" "1" "0" ...
$ MultipleLines   : chr "0" "0" "0" "0" ...
$ InternetService : chr "DSL" "DSL" "DSL" "DSL" ...
$ OnlineSecurity  : chr "0" "1" "1" "1" ...
$ OnlineBackup    : chr "1" "0" "1" "0" ...
$ DeviceProtection: chr "0" "1" "0" "1" ...
$ TechSupport     : chr "0" "0" "0" "1" ...
$ StreamingTV     : chr "0" "0" "0" "0" ...
$ StreamingMovies : chr "0" "0" "0" "0" ...
$ Contract        : chr "Month-to-month" "One year" "Month-to-month" "One year" ...
$ PaperlessBilling: chr "1" "0" "1" "0" ...
$ PaymentMethod   : chr "Electronic check" "Mailed check" "Mailed check" "Bank transfer (automatic)" ...
$ MonthlyCharges  : num 29.9 57 53.9 42.3 70.7 ...
$ TotalCharges    : num 29.9 1889.5 108.2 1840.8 151.7 ...
$ Churn           : chr "0" "0" "1" "0" ...
After that, the ADA team converted all character variables into categorical variables (factors). This is very simple in R: a single line of code converts all character variables into factors within a second.
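The recoding shown above can be sketched as follows. The writeup did this in R, so this Python version is illustrative only. The list of columns to recode is read off the listings above. Mapping “No internet service” to 0 is an assumption; that value does not appear in the rows shown. R’s factor conversion has no direct standard-library equivalent in Python, so the sketch stops at the string recoding.

```python
# Columns recoded from Yes/No style strings to "1"/"0"
# (an assumption based on the before/after listings above).
BINARY_COLUMNS = [
    "Partner", "Dependents", "PhoneService", "MultipleLines",
    "OnlineSecurity", "OnlineBackup", "DeviceProtection", "TechSupport",
    "StreamingTV", "StreamingMovies", "PaperlessBilling", "Churn",
]
# Assumption: "No phone service" and "No internet service" both count as "0".
YES_NO_MAP = {"Yes": "1", "No": "0",
              "No phone service": "0", "No internet service": "0"}
GENDER_MAP = {"Female": "F", "Male": "M"}


def recode_row(row: dict) -> dict:
    """Return a recoded copy of one customer record (values stay strings)."""
    out = dict(row)  # leave the input record untouched
    for col in BINARY_COLUMNS:
        out[col] = YES_NO_MAP[row[col]]
    out["Gender"] = GENDER_MAP[row["Gender"]]
    out["SeniorCitizen"] = str(row["SeniorCitizen"])  # int 0/1 -> "0"/"1"
    return out


# First customer from the listing (7590-VHVEG): everything "No" except a few fields.
row = {col: "No" for col in BINARY_COLUMNS}
row.update({"Gender": "Female", "SeniorCitizen": 0, "Partner": "Yes",
            "MultipleLines": "No phone service", "OnlineBackup": "Yes",
            "PaperlessBilling": "Yes"})
recoded = recode_row(row)
print(recoded["Gender"], recoded["Partner"], recoded["MultipleLines"], recoded["Churn"])
# F 1 0 0
```

Using an explicit mapping dictionary means an unexpected value, such as a typo in the raw data, raises a `KeyError` rather than being silently recoded.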
5. Data Imputation:
Now it’s time to analyze missing values in the dataset. There are many ways to inspect missing values in R; the ADA team used a function they developed themselves. Here is its output:
Row | variable_name | variable_type | record_count | unique_count | empty_count | null_count | missing_count |
20 | TotalCharges | numeric | 7043 | 6531 | 0 | 11 | 11 |
1 | CustomerID | factor | 7043 | 7043 | 0 | 0 | 0 |
2 | Gender | factor | 7043 | 2 | 0 | 0 | 0 |
3 | SeniorCitizen | factor | 7043 | 2 | 0 | 0 | 0 |
4 | Partner | factor | 7043 | 2 | 0 | 0 | 0 |
5 | Dependents | factor | 7043 | 2 | 0 | 0 | 0 |
6 | Tenure | numeric | 7043 | 73 | 0 | 0 | 0 |
7 | PhoneService | factor | 7043 | 2 | 0 | 0 | 0 |
8 | MultipleLines | factor | 7043 | 2 | 0 | 0 | 0 |
9 | InternetService | factor | 7043 | 3 | 0 | 0 | 0 |
10 | OnlineSecurity | factor | 7043 | 2 | 0 | 0 | 0 |
11 | OnlineBackup | factor | 7043 | 2 | 0 | 0 | 0 |
12 | DeviceProtection | factor | 7043 | 2 | 0 | 0 | 0 |
13 | TechSupport | factor | 7043 | 2 | 0 | 0 | 0 |
14 | StreamingTV | factor | 7043 | 2 | 0 | 0 | 0 |
15 | StreamingMovies | factor | 7043 | 2 | 0 | 0 | 0 |
16 | Contract | factor | 7043 | 3 | 0 | 0 | 0 |
17 | PaperlessBilling | factor | 7043 | 2 | 0 | 0 | 0 |
18 | PaymentMethod | factor | 7043 | 4 | 0 | 0 | 0 |
19 | MonthlyCharges | numeric | 7043 | 1585 | 0 | 0 | 0 |
21 | Churn | factor | 7043 | 2 | 0 | 0 | 0 |
22 | PredictedTotalCharges | numeric | 7043 | 6995 | 0 | 0 | 0 |
Total missing values: 11
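ADA’s own function is not shown in the writeup. Below is a minimal Python sketch of a similar per-column summary. It rests on three assumptions: each record is a dict, an empty string counts as “empty” and `None` (or an absent key) counts as “null”, and missing = empty + null. The last assumption is consistent with the table above, where missing_count equals empty_count + null_count. The function name is hypothetical.

```python
def missing_summary(rows, columns):
    """Per-column counts similar to the table above.

    Assumptions: "" counts as empty, None (or an absent key) counts as null,
    and missing = empty + null.
    """
    summary = []
    for col in columns:
        values = [r.get(col) for r in rows]
        empty = sum(1 for v in values if v == "")
        null = sum(1 for v in values if v is None)
        summary.append({
            "variable_name": col,
            "record_count": len(values),
            "unique_count": len(set(values)),  # counts "" and None as values too
            "empty_count": empty,
            "null_count": null,
            "missing_count": empty + null,
        })
    # Most incomplete variables first, as in the table above.
    summary.sort(key=lambda s: s["missing_count"], reverse=True)
    return summary


rows = [
    {"Tenure": 1, "TotalCharges": 29.85},
    {"Tenure": 0, "TotalCharges": None},
    {"Tenure": 0, "TotalCharges": ""},
]
for s in missing_summary(rows, ["Tenure", "TotalCharges"]):
    print(s["variable_name"], s["missing_count"])
# TotalCharges 2
# Tenure 0
```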
Only one variable, TotalCharges, has missing values (11 of 7,043 records). Once all variable types are correct, the dataset is ready for building a model to impute the missing values. There are many algorithms for missing-value imputation, and the ADA team used one of them. Here is the variable importance plot from that algorithm:
The plot shows that Tenure is the most important variable for predicting the missing TotalCharges values, followed by MonthlyCharges, InternetService and so on. After training a model with the chosen algorithm, the ADA team used it to impute the missing values.
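The writeup does not name its imputation algorithm. As a deliberately simpler stand-in, not ADA’s method, the Python sketch below fits an ordinary least-squares line, TotalCharges ≈ a + b × (Tenure × MonthlyCharges), on the complete rows and uses it to fill the missing ones. Using the single product feature is an assumption. It reflects the ranking above: total charges roughly accumulate monthly charges over the tenure. The function names and the sample rows are hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Least-squares intercept a and slope b for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) * (x - mean_x) for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def impute_total_charges(rows):
    """Fill rows whose TotalCharges is None, in place, using the fitted line.

    Feature (an assumption for this sketch): Tenure * MonthlyCharges.
    """
    complete = [r for r in rows if r["TotalCharges"] is not None]
    xs = [r["Tenure"] * r["MonthlyCharges"] for r in complete]
    ys = [r["TotalCharges"] for r in complete]
    a, b = fit_simple_ols(xs, ys)
    for r in rows:
        if r["TotalCharges"] is None:
            r["TotalCharges"] = round(a + b * r["Tenure"] * r["MonthlyCharges"], 2)
    return rows


rows = [
    {"Tenure": 1, "MonthlyCharges": 20.0, "TotalCharges": 20.0},
    {"Tenure": 10, "MonthlyCharges": 50.0, "TotalCharges": 500.0},
    {"Tenure": 24, "MonthlyCharges": 80.0, "TotalCharges": 1920.0},
    {"Tenure": 5, "MonthlyCharges": 30.0, "TotalCharges": None},  # to impute
]
impute_total_charges(rows)
print(rows[-1]["TotalCharges"])  # 150.0
```

A tree-based model, such as the one suggested by the variable importance plot, can also use categorical predictors like InternetService. This one-feature line only illustrates the idea of training on complete rows and predicting the incomplete ones.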
6. Data Splitting
Data splitting is an important step before modeling. The training data should be drawn at random from every category (for example, from both churners and non-churners) so that the model learns from a representative sample. Doing this manually is not easy, and many algorithms exist for the job; the ADA team selected one of them. After selecting the splitting algorithm, you have to choose the split ratio. Here are some ratios commonly used to split a dataset into training and validation sets:
Training Dataset% | Validation Dataset% |
50% | 50% |
60% | 40% |
70% | 30% |
You can use any of the above ratios, and you are not limited to them. In most cases, the ADA team chooses a 70%/30% split.
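A stratified split keeps the churn proportion roughly the same in the training and validation sets. The writeup does not name the algorithm it used; in R, caret’s `createDataPartition()` performs this kind of class-balanced split. Below is a minimal Python sketch. It assumes records are dicts with a `Churn` key; the function name, the fixed seed and the sample data are illustrative.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="Churn", train_ratio=0.7, seed=42):
    """Split rows into (train, validation), sampling within each class.

    Each class is shuffled and cut at train_ratio, so both sets keep
    roughly the original class proportions.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    train, validation = [], []
    for members in by_class.values():
        rng.shuffle(members)
        cut = round(len(members) * train_ratio)
        train.extend(members[:cut])
        validation.extend(members[cut:])
    return train, validation


# Hypothetical data: 100 customers, 27 of them churners ("1").
data = [{"id": i, "Churn": "1" if i < 27 else "0"} for i in range(100)]
train, valid = stratified_split(data)
print(len(train), len(valid))  # 70 30
```

Here 19 of the 70 training rows and 8 of the 30 validation rows are churners, so both sets stay close to the overall 27% churn rate.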
7. Feature Selection
Feature selection is an important step before model building. Unnecessary features or variables waste valuable processing time, so you have to understand which features to include and which to exclude. You don’t need to do this by eye; many algorithms exist for the purpose. The CustomerID variable is the unique ID of each customer, so it is wise to exclude it from the model. The ADA team also checked the correlation among the numeric independent variables:
Variable | Tenure | MonthlyCharges | TotalCharges |
Tenure | 1.0000000 | 0.2482966 | 0.8255787 |
MonthlyCharges | 0.2482966 | 1.0000000 | 0.6515933 |
TotalCharges | 0.8255787 | 0.6515933 | 1.0000000 |
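The table above is a Pearson correlation matrix. Below is a minimal Python sketch of how such a matrix is computed, assuming the numeric columns are given as equal-length lists. The four customers use the rounded values from the first rows of the dataset listing, so the printed numbers will not match the full-dataset table. Python 3.10+ also provides `statistics.correlation()` for a single pair.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every ordered pair of columns."""
    return {(a, b): pearson(columns[a], columns[b])
            for a in columns for b in columns}


cols = {
    "Tenure": [1, 34, 2, 45],
    "MonthlyCharges": [29.9, 57, 53.9, 42.3],
    "TotalCharges": [29.9, 1889.5, 108.2, 1840.8],
}
matrix = correlation_matrix(cols)
for a in cols:
    print(a, [round(matrix[(a, b)], 3) for b in cols])
```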
TotalCharges and Tenure are strongly correlated (r ≈ 0.83), so you could exclude one of them; if you do, TotalCharges is the one to drop. However, the ADA team kept it, because excluding it slightly decreased model performance (by about 0.5%), and the number of variables is small anyway. Now let’s look at feature importance using a feature selection algorithm:
Variable Name | Importance Value | Status |
Tenure | 40.0291093 | Confirmed |
Contract | 37.1939485 | Confirmed |
TotalCharges | 35.5461398 | Confirmed |
MonthlyCharges | 28.1973298 | Confirmed |
OnlineSecurity | 22.0938315 | Confirmed |
TechSupport | 21.7071516 | Confirmed |
InternetService | 20.3317942 | Confirmed |
PaymentMethod | 13.0625411 | Confirmed |
OnlineBackup | 12.5324037 | Confirmed |
DeviceProtection | 11.7226860 | Confirmed |
PaperlessBilling | 8.7510135 | Confirmed |
StreamingTV | 7.4218089 | Confirmed |
SeniorCitizen | 6.9929649 | Confirmed |
MultipleLines | 6.7559581 | Confirmed |
StreamingMovies | 6.3251025 | Confirmed |
Partner | 4.6631715 | Confirmed |
Dependents | 3.6353647 | Confirmed |
PhoneService | 2.4218330 | Rejected |
Gender | 0.9484301 | Rejected |
Let’s see the variable importance graphically:
The selected algorithm rejects two variables: Gender and PhoneService. Of the two, Gender’s importance (0.95) is insignificant, while PhoneService’s (2.42) is somewhat higher. So the ADA team decided to exclude only Gender and keep PhoneService. In the end, two independent variables were excluded from the model: CustomerID and Gender.
8. Training Models
Now the data are ready for training models with different machine learning algorithms. R offers many algorithms; the ADA team used 14 of the most popular ones to train 14 different models. Training all of them takes a good amount of time, depending mainly on your computer’s processor and random access memory (RAM).
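To compare models fairly, each one is trained and scored on the same resamples, and the results are summarized with a confidence interval. The writeup appears to do this in R for its 14 models. The Python sketch below shows the idea at toy scale, with several simplifications that are assumptions of the sketch:
- Two hypothetical models stand in for the real algorithms: a majority-class baseline and a one-variable tenure threshold rule. Both are far simpler than the algorithms ADA used.
- Each model is scored with 5-fold cross-validation.
- The 95% interval uses a normal approximation (mean ± 1.96 × standard error).
- The data are synthetic.

```python
import math
import random
import statistics


def k_fold_indices(n, k, seed=1):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_majority(train):
    """Baseline: always predict the most common class in the training data."""
    labels = [r["Churn"] for r in train]
    majority = max(set(labels), key=labels.count)
    return lambda row: majority


def fit_tenure_rule(train):
    """Predict churn ("1") when Tenure < t, choosing t by training accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted({r["Tenure"] for r in train}):
        acc = sum(("1" if r["Tenure"] < t else "0") == r["Churn"]
                  for r in train) / len(train)
        if acc > best_acc:  # ties keep the smallest threshold
            best_t, best_acc = t, acc
    return lambda row: "1" if row["Tenure"] < best_t else "0"


def cross_validate(fit, rows, k=5):
    """Accuracy of the fitted model on each held-out fold."""
    scores = []
    for fold in k_fold_indices(len(rows), k):
        held_out = set(fold)
        train = [r for i, r in enumerate(rows) if i not in held_out]
        test = [rows[i] for i in fold]
        predict = fit(train)
        scores.append(sum(predict(r) == r["Churn"] for r in test) / len(test))
    return scores


def summarize(scores):
    """Mean accuracy with a normal-approximation 95% interval."""
    mean = statistics.mean(scores)
    half_width = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - half_width, mean + half_width


# Synthetic data: customers with tenure under 12 months churn.
rows = [{"Tenure": t % 72, "Churn": "1" if t % 72 < 12 else "0"}
        for t in range(720)]
models = {"majority": fit_majority, "tenure_rule": fit_tenure_rule}
results = {name: summarize(cross_validate(fit, rows))
           for name, fit in models.items()}
for name, (mean, lo, hi) in results.items():
    print(f"{name}: {mean:.3f} (95% CI {lo:.3f}-{hi:.3f})")
best = max(results, key=lambda name: results[name][0])
print("best model:", best)  # best model: tenure_rule
```

Because every model sees the same folds (same seed), differences in the intervals reflect the models rather than luck in the split.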
9. Selecting the Best Model
It’s time to measure the performance of different models. Let’s see the performance of those 14 Machine Learning algorithms:
According to the chart above, the xgbDART model (XGBoost with the DART booster) shows the highest performance of all models at the 95% confidence level, so the ADA team selected xgbDART as its final algorithm. Let’s look at the relative importance of the variables used by the final model:
10. Validating Model Performance
Now it’s time to evaluate the selected model on the validation dataset. Scoring the validation dataset with the model gives the following result:
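The model’s specificity is 50.9% and its sensitivity is 90%, with an overall accuracy of 79.3%. That accuracy is clearly above the roughly 73.5% that a naive model predicting “no churn” for everyone would score at a 26.5% churn rate. Note which class these figures refer to. The figure 0.735 × 90% + 0.265 × 50.9% ≈ 79.6% matches the reported accuracy, whereas 0.265 × 90% + 0.735 × 50.9% ≈ 61% does not. So the 90% appears to be measured on non-churners, and the model correctly flags roughly half of the actual churners. The real value of the model lies in scoring the whole customer base by predicted churn propensity. After scoring, the base is sorted from highest to lowest propensity and divided into deciles. Let’s see the performance of each decile from top to bottom: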
The churn rate in the 1st decile is 73.1%, followed by 57.8% in the 2nd decile and so on, whereas the average churn rate is 26.5%. The top decile therefore churns at about 2.8 times the average rate. The ADA team suggested that management target the top 30% of customers with a suitable retention program. Management, the Base Management Team and the Acquisition Team are very happy that they can now start a scientific, proactive approach, targeting the top N% most probable churners, to reduce the gross churn rate from now on.
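Two calculations from this section can be sketched in Python, which is illustrative rather than the writeup’s R code:
- Sensitivity and specificity depend on which class is treated as “positive”. With two classes, R’s `caret::confusionMatrix()` uses the first factor level (here “0”, non-churn) unless the `positive` argument says otherwise. Swapping the positive class swaps the two numbers, while accuracy is unchanged.
- The decile table sorts customers by predicted churn probability, cuts them into ten equal groups, and compares each group’s churn rate with the overall rate (lift).

Function names and sample data are hypothetical.

```python
def confusion_metrics(actual, predicted, positive="1"):
    """Sensitivity, specificity and accuracy for a chosen positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),  # recall on the positive class
        "specificity": tn / (tn + fp),  # recall on the other class
        "accuracy": (tp + tn) / len(pairs),
    }


def decile_table(scores, churned):
    """(decile, churn rate, lift) per decile, highest propensity first.

    scores: predicted churn probabilities; churned: 1 = churned, 0 = stayed.
    Assumes at least 10 customers and at least one churner; the last decile
    absorbs any remainder when the count is not a multiple of 10.
    """
    ranked = sorted(zip(scores, churned), key=lambda pair: pair[0], reverse=True)
    size = len(ranked) // 10
    overall = sum(churned) / len(churned)
    table = []
    for d in range(10):
        end = (d + 1) * size if d < 9 else len(ranked)
        chunk = ranked[d * size:end]
        rate = sum(label for _, label in chunk) / len(chunk)
        table.append((d + 1, rate, rate / overall))
    return table


actual = ["0", "0", "0", "1", "1"]
predicted = ["0", "0", "1", "1", "0"]
print(confusion_metrics(actual, predicted, positive="1"))
# {'sensitivity': 0.5, 'specificity': 0.6666666666666666, 'accuracy': 0.6}
print(confusion_metrics(actual, predicted, positive="0"))
# {'sensitivity': 0.6666666666666666, 'specificity': 0.5, 'accuracy': 0.6}

scores = [0.05 * i for i in range(20)]
churned = [1 if i >= 14 else 0 for i in range(20)]
for decile, rate, lift in decile_table(scores, churned)[:4]:
    print(decile, rate, round(lift, 2))
```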
11. Conclusion
We hope you have enjoyed this writeup. For this writeup we used a sample telecom dataset from IBM. We used R, the popular open-source data analytics software, together with various machine learning algorithms available in R, to solve the business problem. If you have any questions about this writeup, please feel free to comment on our Facebook page. Please note that the purpose of this writeup is not to teach R scripting but to help you understand the step-by-step process of data analytics and how machine learning is applied in the real business world. Next time we will bring an example from another industry. To stay updated, you can subscribe to our Facebook page http://www.facebook.com/LearningBigDataAnalytics.