In my years at Turnkey, I have regularly built predictive models for clients. Here I want to share some simple observations from those models that reflect the importance and quality of the data that goes into them. Turnkey offers two types of models, Priority and Capacity, each customized to the client, and my role is to build them. The Priority model predicts how likely a customer is to purchase a large package with an organization (such as a full-season ticket), and the Capacity model predicts how much that customer is expected to spend with the organization over the course of a year. More than 100 Acxiom fields are considered in building each model, and about 50 are used per model. I set out to examine which fields are most important for predicting purchase behavior and which appear most frequently in the models.
To answer this, I took a sample of Turnkey’s existing custom Priority and Capacity models, found the 10 most important features in each model, and examined which features appeared most often in those top 10 lists. I also considered which features tended to rank highly. Put simply, a feature’s importance is determined by how much the model’s accuracy suffers when that feature is absent from the model. The 43 Priority models in the analysis have a total of 29 unique features appearing somewhere in their top 10 lists, while the 39 Capacity models have 56, almost twice as many. This is mostly because the Capacity model’s algorithm is notably more complicated than the Priority model’s.
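As a rough illustration of that definition of importance, here is a minimal Python sketch. It is not Turnkey's actual implementation: the function names, the `score_fn` callback, and the placeholder field names and numbers are all assumptions made up for the example. The idea is simply to score the model with every feature, score it again with one feature left out, and treat the drop in score as that feature's importance.

```python
from typing import Callable, Dict, List, Sequence


def omission_importance(
    features: Sequence[str],
    score_fn: Callable[[Sequence[str]], float],
) -> Dict[str, float]:
    """Importance of each feature = baseline score minus score without it.

    `score_fn` is a hypothetical callback that fits and evaluates a model
    on the given feature subset and returns its accuracy.
    """
    baseline = score_fn(features)
    importance = {}
    for feature in features:
        reduced = [f for f in features if f != feature]
        importance[feature] = baseline - score_fn(reduced)
    return importance


def top_n(importance: Dict[str, float], n: int = 10) -> List[str]:
    """Return the n features with the largest importance, highest first."""
    return sorted(importance, key=importance.get, reverse=True)[:n]


# Toy stand-in for a real model: each placeholder field adds a fixed,
# invented amount of accuracy. These numbers are NOT real results.
TOY_CONTRIBUTION = {"field_a": 0.10, "field_b": 0.08, "field_c": 0.02}


def toy_score(subset: Sequence[str]) -> float:
    return 0.5 + sum(TOY_CONTRIBUTION[f] for f in subset)


toy_importance = omission_importance(list(TOY_CONTRIBUTION), toy_score)
toy_top_two = top_n(toy_importance, n=2)
```

In practice `score_fn` would retrain or re-evaluate a real model, which is far more expensive than this toy, but the ranking logic stays the same.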
I found that the five most prevalent Acxiom fields in the Priority models are Age, PersonicX Cluster, Presence (or Lack) of Children in the Household, Occupation, and Gender. Of these, Occupation and Age are frequently ranked highest in importance, while Gender and PersonicX Cluster tend to rank a little lower. Other, somewhat less prevalent fields that also rank highly are Dominant Vehicle Type (the kind of vehicle used most in the household), Education Level, and Retail Purchase Type (the kinds of retail purchases the individual frequently makes).
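The two measures used here, how often a field shows up in the top lists and how high it tends to rank, can be tallied with a few lines of Python. This is only a sketch under stated assumptions: the function name is hypothetical, and the three short lists use placeholder field names rather than real model output.

```python
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple


def summarize_top_lists(
    top_lists: Sequence[Sequence[str]],
) -> List[Tuple[str, int, float]]:
    """Return (feature, appearances, mean rank) rows.

    Each inner list is one model's top features, most important first.
    Rows are sorted by appearances (descending), then mean rank (ascending).
    """
    counts = Counter()
    ranks = defaultdict(list)
    for ranked in top_lists:
        for position, feature in enumerate(ranked, start=1):
            counts[feature] += 1
            ranks[feature].append(position)
    rows = [
        (feature, counts[feature], sum(ranks[feature]) / len(ranks[feature]))
        for feature in counts
    ]
    rows.sort(key=lambda row: (-row[1], row[2]))
    return rows


# Invented top-3 lists for three hypothetical models (placeholder names).
toy_top_lists = [
    ["field_a", "field_b", "field_c"],
    ["field_b", "field_a", "field_d"],
    ["field_a", "field_b", "field_c"],
]

toy_summary = summarize_top_lists(toy_top_lists)
toy_unique_features = len(toy_summary)  # unique features across all lists
```

Keeping prevalence and mean rank as separate columns matters because a field can appear in many top lists while sitting near the bottom of each one.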
As noted above, the Capacity model draws on a much wider array of data. This is likely because its model-building process is significantly more complicated, and because separating customers by how much they spend is a harder task than predicting who will buy tickets. Its five most prevalent fields are County/Distance from Venue, PersonicX Cluster, Education Level, Gender, and Age, which also appear often among the top factors of Priority models. These fields are also typically the highest ranking.
Interestingly, the Capacity models also make frequent use of features that rarely, if ever, appear in the top 10 for Priority models (although the Priority models definitely still use them, just outside the top 10 most important features). These include Online Purchasing, Type(s) of Credit Card Used, Interest in Sports & Leisure, Active Investor, and New Car Buyer.
Each client with models receives its own memos showing which factors are most descriptive of its customers’ interactions with the team. This analysis, by contrast, should show on an overall scale which factors influence those interactions most frequently and most strongly.