What is Adjusted R-Squared?
Adjusted R-Squared is a statistical measure used to evaluate the goodness of fit of regression models; it is a refinement of the ordinary R-squared (coefficient of determination). While R-squared measures the proportion of the variance in the dependent variable explained by the independent variables, it has a well-known limitation: R-squared never decreases when more independent variables are added, even if they contribute nothing to the model's true predictive power. This is where Adjusted R-Squared becomes crucial.
Adjusted R-Squared accounts for the number of predictors in the model relative to the number of data points. It adjusts the ordinary R-squared using the degrees of freedom (determined by the number of predictors and the sample size), penalizing models that carry unnecessary predictors. Thanks to this adjustment, the metric improves only when added variables genuinely contribute to explaining the variance in the dependent variable.
The formula for Adjusted R-Squared is:

Adjusted R² = 1 − (1 − R²) × (n − 1) / (n − k − 1)

where R² is the ordinary R-squared, n is the number of observations, and k is the number of independent variables.
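As a minimal sketch in Python (the function name and the example numbers are illustrative, not from any particular library), the adjustment is a one-line computation once R-squared, the sample size, and the predictor count are known:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared from ordinary R-squared (r2),
    sample size (n), and number of predictors (k)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Example: R-squared of 0.85 with 50 observations and 5 predictors
print(adjusted_r2(0.85, n=50, k=5))  # about 0.833, slightly below R-squared
```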
In essence, the (n − 1) / (n − k − 1) factor pulls R-squared downward when irrelevant predictors are included, so Adjusted R-Squared rises only when an additional predictor improves the model's explanatory power by more than chance alone would. This makes Adjusted R-Squared a more reliable metric for comparing models, especially models with different numbers of independent variables.
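To see the difference concretely, here is a small illustration (a hypothetical example using NumPy and scikit-learn; the data, coefficients, and seed are arbitrary). Ordinary least squares can never fit worse when columns are added, so R-squared rises with the extra noise predictors, while Adjusted R-Squared does not reward them:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=(n, 1))
y = 3 * x[:, 0] + rng.normal(size=n)   # target driven by a single real predictor

noise = rng.normal(size=(n, 10))       # ten irrelevant, pure-noise predictors
for X in (x, np.hstack([x, noise])):
    k = X.shape[1]
    r2 = r2_score(y, LinearRegression().fit(X, y).predict(X))
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    print(f"k={k:2d}  R2={r2:.4f}  adjusted R2={adj:.4f}")
# R2 creeps upward with the extra columns; adjusted R2 typically does not,
# because the (n - 1) / (n - k - 1) penalty offsets the chance improvement.
```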
A higher Adjusted R-Squared indicates that the model fits the data well given the number of predictors it uses. Unlike R-squared, it can decrease when unnecessary variables are included in the model. It therefore strikes a balance between model complexity and explanatory power, helping to avoid overfitting by ensuring that only meaningful variables raise the score.
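In practice you rarely compute it by hand; statsmodels, for instance, reports both quantities on a fitted OLS result (the data below is synthetic, for illustration only):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=100)

results = sm.OLS(y, sm.add_constant(X)).fit()
print(results.rsquared)      # ordinary R-squared
print(results.rsquared_adj)  # Adjusted R-squared, penalized for the 3 predictors
```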
In summary, Adjusted R-Squared is a valuable tool for assessing the fit of regression models, providing a more accurate measure of performance by penalizing overfitting and rewarding true predictive improvements.