I intend to estimate prediction error for a dataset with a binary response variable (0 or 1). My plan is to fit a binary logistic regression model (the response has only two categories, so a multinomial model is not needed) to estimate the probability of a 1 response from several predictor variables. To evaluate the model’s accuracy, I will use k-fold cross-validation with k between 5 and 10, so that the error estimate does not depend heavily on any single split of the data.
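
The cross-validation plan above can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the helper names (`fit_logistic`, `predict_proba`, `kfold_error`) are hypothetical, the model is fitted by plain gradient descent instead of a statistics library, and a 0.5 probability cutoff is assumed for classifying a response as 1.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit binary logistic regression by batch gradient descent.

    Returns the weights with the intercept first. lr and epochs are
    illustrative defaults, not tuned values.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def predict_proba(w, x):
    # Estimated probability that the response is 1.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def kfold_error(X, y, k=5, seed=0):
    """Return the k-fold cross-validated misclassification rate."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all rows
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        # Count held-out rows whose predicted class (cutoff 0.5) is wrong.
        errors += sum((predict_proba(w, X[i]) >= 0.5) != (y[i] == 1) for i in fold)
    return errors / len(X)

# Synthetic data (assumption): two predictors, response drawn from a logistic model.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(2.0 * x[0] - 1.0 * x[1]) else 0 for x in X]

cv_error = kfold_error(X, y, k=5)
print(f"5-fold CV misclassification rate: {cv_error:.3f}")
```

Each observation is held out exactly once, so the returned rate is the fraction of all observations misclassified by a model that never saw them during fitting. Changing `k` to 10 only changes how the rows are partitioned.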
- Because the dataset is small, I’m also considering a bootstrap procedure, which would generate resampled datasets by drawing from the observed data with replacement. I’m not yet sure whether this is appropriate for estimating prediction error, so I plan to ask my instructors in class whether bootstrap resampling is suitable here and how best to implement it.
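
One common way to use the bootstrap for prediction error is the out-of-bag estimate: fit on each resample and test only on the observations that the resample happened to leave out. The sketch below shows the mechanics, not a recommendation that it suits this project. The function `bootstrap_oob_error` and the one-feature midpoint classifier are hypothetical stand-ins (a logistic regression fit would be passed in the same way), and the data are synthetic.

```python
import random

def bootstrap_oob_error(X, y, fit, predict, n_boot=200, seed=0):
    """Average out-of-bag misclassification rate over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(X)
    rates = []
    for _ in range(n_boot):
        # Draw n row indices with replacement from the observed data.
        sample = [rng.randrange(n) for _ in range(n)]
        in_bag = set(sample)
        oob = [i for i in range(n) if i not in in_bag]
        # Skip degenerate resamples: nothing left out, or only one class drawn.
        if not oob or len({y[i] for i in sample}) < 2:
            continue
        model = fit([X[i] for i in sample], [y[i] for i in sample])
        wrong = sum(predict(model, X[i]) != y[i] for i in oob)
        rates.append(wrong / len(oob))
    return sum(rates) / len(rates)

# Hypothetical stand-in classifier: threshold one feature at the midpoint
# between the two class means (assumes class 1 has the larger mean).
def fit_midpoint(X, y):
    ones = [x[0] for x, yi in zip(X, y) if yi == 1]
    zeros = [x[0] for x, yi in zip(X, y) if yi == 0]
    return (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2.0

def predict_midpoint(threshold, x):
    return 1 if x[0] > threshold else 0

# Synthetic data (assumption): 100 rows, balanced classes, one predictor.
data_rng = random.Random(1)
y_small = [i % 2 for i in range(100)]
X_small = [[data_rng.gauss(2.0 * yi, 1.0)] for yi in y_small]

oob_error = bootstrap_oob_error(X_small, y_small, fit_midpoint, predict_midpoint)
print(f"Bootstrap out-of-bag misclassification rate: {oob_error:.3f}")
```

Resampling reuses the same observations and adds no new information. Each resample contains on average about 63% of the distinct rows, so every fit is tested on the remaining rows it did not see.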