Projects

Validating Machine Learning Models Under Spatial Dependence

Although cross-validation is widely used to assess machine learning models, it can lead to optimistic results when the data exhibit spatial dependence. The spatial dependence structure observed in the sample may not be present in out-of-sample data, so the estimated model error will be biased and will not reflect the error on the entire population. Alternatives for dealing with spatial dependence in model validation adopt a worst-case scenario, in which the objective is to assess the model when the training data do not present the same spatial dependence as the test data. The models selected are therefore those that learn best independently of the spatial dependence structure. However, creating these scenarios is not an easy task: training data must be removed to ensure independence between the test and training sets, without increasing the probability of overfitting. This project addresses these issues by proposing a graph-based spatial cross-validation approach to assess models learned from spatially contextualized datasets.
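
To make the idea of removing training data around the test set concrete, here is a minimal sketch of one fold of a distance-buffered split. It is not the graph-based method proposed in this project: it swaps in a simple Euclidean buffer for illustration. The function name `buffered_split`, the `buffer_radius` parameter, and the toy coordinates are all assumptions introduced for this example.

```python
import math


def buffered_split(coords, test_idx, buffer_radius):
    """Split one fold into training and removed points (illustrative sketch).

    Any non-test point lying within `buffer_radius` of a test point is
    dropped from training, so the remaining training points are spatially
    separated from the test fold.
    """
    test_set = set(test_idx)
    train_idx, removed_idx = [], []
    for i, point in enumerate(coords):
        if i in test_set:
            continue
        # Distance from this candidate training point to its nearest test point.
        nearest = min(math.dist(point, coords[j]) for j in test_set)
        if nearest <= buffer_radius:
            removed_idx.append(i)  # too close to the test fold: discard
        else:
            train_idx.append(i)
    return train_idx, removed_idx


# Toy data (hypothetical): five points on a line, the middle one is the test fold.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
train_idx, removed_idx = buffered_split(coords, test_idx=[2], buffer_radius=1.0)
print(train_idx)    # [0, 4]
print(removed_idx)  # [1, 3]
```

The sketch also shows the trade-off mentioned above: a larger `buffer_radius` gives stronger separation between training and test data but leaves fewer points to train on. With `buffer_radius=2.0` in this toy example, no training points would remain.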