In this section, let’s talk about how to validate different approaches.
Predictive Power vs. Total Causal Impact – There are two important uses of multiple regression, namely:
- Prediction
- Causal Analysis
The aim of predictive analysis is to develop a formula that predicts the dependent variable from the observed values of the independent variables. In causal analysis, by contrast, the independent variables are regarded as causes of the dependent variable.
The predictive power is measured by the R² of the driver analysis. But be aware that this R², and with it the predictive power, only measures the validity of the direct path, that is, the direct causal impact on the outcome. It does not measure the importance or the role of the indirect effects. To capture those, you need to look at the R² of every variable that has drivers. In a causal context, there are intermediary variables, such as sentiment, as well as the final outcome. So if you need to fully understand the network, you have to look at all of those outcomes.
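As a rough illustration of looking at the R² of every variable that has drivers, here is a minimal sketch in plain Python. The three variables (`service`, `sentiment`, `loyalty`), the simulated coefficients, and the helper functions are all hypothetical and exist only for this example. The total effect is computed the way linear path models usually do it: the direct coefficient plus the product of the coefficients along the indirect path.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]

def r_squared(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Simulated data (an assumption for illustration):
# service -> sentiment -> loyalty, plus a direct path service -> loyalty.
random.seed(0)
n = 200
service = [random.gauss(0, 1) for _ in range(n)]
sentiment = [0.8 * s + random.gauss(0, 0.5) for s in service]
loyalty = [0.2 * s + 0.7 * m + random.gauss(0, 0.5)
           for s, m in zip(service, sentiment)]

# One regression, and one R², per variable that has drivers.
X_sent = [[s] for s in service]
beta_sent = fit_ols(X_sent, sentiment)
r2_sentiment = r_squared(sentiment, predict(beta_sent, X_sent))

X_loy = [[s, m] for s, m in zip(service, sentiment)]
beta_loy = fit_ols(X_loy, loyalty)
r2_loyalty = r_squared(loyalty, predict(beta_loy, X_loy))

# Direct effect of service on loyalty vs. total effect (direct + indirect).
direct = beta_loy[1]
indirect = beta_sent[1] * beta_loy[2]
total = direct + indirect

print(f"R² sentiment: {r2_sentiment:.2f}, R² loyalty: {r2_loyalty:.2f}")
print(f"direct: {direct:.2f}, indirect: {indirect:.2f}, total: {total:.2f}")
```

In this simulated setup most of the influence of `service` runs through `sentiment`, so the direct coefficient alone would understate its total impact.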
Cross Validation – Cross validation is a useful resampling technique for assessing how well your model generalizes. It checks for overfitting, which is a particular risk when you have a small dataset but a large number of drivers. So it’s good to use machine learning approaches like cross validation to find out how large the overfitting is so that we can minimize it.
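A closely related resampling technique is the jackknife. In the grouped form described here, you take your dataset and split it into a number of pieces (for instance, ten). You take out one piece at a time, fit the model on the remaining pieces, and predict the piece you took out. This is the same procedure as k-fold cross validation; the classic jackknife leaves out one observation at a time instead of a whole piece. Either way, you can use this methodology to estimate the model's predictive power on unseen data.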
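The split-and-predict procedure can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data are simulated (40 rows, 10 drivers, only the first of which matters), and `fit_ols`, `predict`, `r_squared`, and `k_fold_r2` are hypothetical helpers written for this example, not part of any library.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]

def r_squared(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def k_fold_r2(X, y, k=10, seed=0):
    """R² computed only from predictions on rows the model did not see."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    held_out = [0.0] * len(y)
    for fold in range(k):
        test = idx[fold::k]                      # the piece taken out
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        beta = fit_ols([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, predict(beta, [X[i] for i in test])):
            held_out[i] = p                      # prediction for an unseen row
    return r_squared(y, held_out)

# Small dataset, many drivers: a setting where overfitting is likely.
rng = random.Random(1)
n, n_drivers = 40, 10
X = [[rng.gauss(0, 1) for _ in range(n_drivers)] for _ in range(n)]
y = [0.9 * row[0] + rng.gauss(0, 1) for row in X]

in_sample_r2 = r_squared(y, predict(fit_ols(X, y), X))
cv_r2 = k_fold_r2(X, y, k=10)
print(f"in-sample R²: {in_sample_r2:.2f}, cross-validated R²: {cv_r2:.2f}")
```

The gap between the in-sample R² and the cross-validated R² is one simple way to gauge how much of the apparent fit comes from overfitting.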
Impact vs. Effect Strength – Impact is the influence of an action or a phenomenon, whereas effect is the consequence or outcome of that phenomenon. In short, impact refers to how the consequence of some action is going to affect someone or something, while effect refers only to the consequence itself.