CHAID Analysis

CHAID (Chi-square Automatic Interaction Detection) analysis is a method that identifies how predictor variables can best be combined to explain the outcome of a specified dependent variable. The model can be used for forecasting, for understanding responses in areas such as market penetration, and for many other research questions. It is particularly useful when the data are categorical rather than continuous, because standard tools such as linear regression are not well suited to that case, whereas CHAID can still identify the associations among the variables. One of its major benefits is that it presents the relationship between the target (dependent) variable and the related factors as a tree diagram, which makes the result easy to visualize.

First, a range of predictor variables is examined to find out whether splitting the sample by each predictor produces a statistically significant difference in the dependent variable. For this, chi-square tests (for a categorical dependent variable) or F tests (for a continuous dependent variable) are applied and their p-values are computed. Where the p-value is not statistically significant, the algorithm merges the predictor categories concerned, because they do not differ in their effect on the dependent variable. Where the p-value is statistically significant, a split is made. This produces the first branching of the tree. Each resulting group is then examined in the same way to see whether it can be split further into subgroups that differ significantly on the dependent variable. At the end of the tree-building process, the result is a set of groups that differ considerably from one another on the dependent variable. CHAID analysis is especially useful when the main objective is to find patterns in complicated datasets.
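The split-selection step described above can be illustrated with a minimal Python sketch. It is not a full CHAID implementation: it omits category merging and the multiple-comparison adjustment of p-values that complete implementations typically apply, and it handles only a categorical dependent variable. The function names (`chi2_sf`, `chi_square_p_value`, `choose_split`), the 0.05 threshold, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square variable, using closed forms."""
    half = x / 2.0
    if dof % 2 == 0:
        # Even degrees of freedom: exp(-x/2) times a finite power series.
        term = total = 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd degrees of freedom: start from erfc and add correction terms.
    total = math.erfc(math.sqrt(half))
    for i in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chi_square_p_value(rows, predictor, target):
    """P-value of the chi-square test of independence between two columns."""
    cells = Counter((r[predictor], r[target]) for r in rows)
    row_totals = Counter(r[predictor] for r in rows)
    col_totals = Counter(r[target] for r in rows)
    n = len(rows)
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    if dof == 0:
        return 1.0  # only one category on some side: nothing to test
    stat = 0.0
    for a in row_totals:
        for b in col_totals:
            expected = row_totals[a] * col_totals[b] / n
            stat += (cells[(a, b)] - expected) ** 2 / expected
    return chi2_sf(stat, dof)


def choose_split(rows, predictors, target, alpha=0.05):
    """Return the most significant predictor (or None) and all p-values."""
    p_values = {p: chi_square_p_value(rows, p, target) for p in predictors}
    best = min(p_values, key=p_values.get)
    return (best if p_values[best] < alpha else None), p_values


# Hypothetical toy data: purchases depend on region but not on gender.
rows = []
for region, bought, count in [("north", "yes", 4), ("north", "no", 1),
                              ("south", "yes", 1), ("south", "no", 4)]:
    for gender in ("male", "female"):
        rows += [{"region": region, "gender": gender, "bought": bought}] * count

best, p_values = choose_split(rows, ["gender", "region"], "bought")
print(best, p_values)
```

In this toy sample, region is the only predictor with a significant p-value, so it would form the first branching of the tree. Building the whole tree would mean repeating `choose_split` on the rows within each resulting group until no predictor is significant.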