Sitemap

A list of all the posts and pages found on the site. For you robots out there is an XML version available for digesting as well.

Pages

Posts

Spatial Cross-Validation: Estimating the Generalized Error in the Worst Scenario

1 minute read

Published:

Most recently, there has been an increase discussion towards the use of Spatial Cross-Validation (SCV) techniques to assess models learned from data with spatial dependence. While some authors advocate for always use the SCV, others does not recommend its usage since it can produce pessimistic results. In my point of view, the answer to wether your should or should not use such validation technique lies on your sample distribution, as dicussed in (cite), and how confident you are that the spatial dependence structure observed will be the same on the out of sample data. If I was not clear, imagine a sample dataset S, with a distrbution D presenting a spatial dependence structure. If S is representative enough so D is the same for the out of sample data. Them the traditional Cross-Validation (CV) will not overstimate the generalized error. On the other hand, if you do not have the confidence that the sample distribution D will be observed in the out of sample data, or you know that your sample is biased (e.g, you collected data from spatial clustered areas), then SCV is a better option than the traditional CV. However, you need to keep in mind that SCV will evaluate your model on the worst scenario, when the test set dependence structure is not the same as in the training set. In this way, the best models will be those that can generalize independently of the spatial data structure. In summary, the SCV is not a final validation technique for data with spatial dependence, but it is theoretically stablished method to evaluate your models on the worst scenario situations.

portfolio

Validating Machine Learning Models Under Spatial Dependence

Although cross-validation is a widely used procedure for assessing machine learning models, it can leads to optimistic results when evaluating data with spatial dependence. The spatial dependence structure observed in the sample dataset may not be present on the out of sample data. In this way, the model error will be biased and no reflect the error on the entire population. Alternatives to deal with spatial dependence in model validation fall into a worst scenario situation, where the objective is to asses the model when the training data does not present the same spatial dependence as in the test. Thus, the models selected will be those that can best learn indepently of spatial dependence structure. However, to create this scenarios is not an easy task. We need to remove training data to ensure independence between test and training data without increase the probability of overfitting. This project addresses these issues by proposing a graph-based spatial cross-validation approach to assess models learned from spatially contextualized datasets.

publications

A Fuzzy Variant for On-Demand Data Stream Classification Permalink

Published in BRACIS, 2017

T. P. da Silva, G. A. Urban, P. De Abreu Lopes and H. de Arruda Camargo, “A Fuzzy Variant for On-Demand Data Stream Classification,” 2017 Brazilian Conference on Intelligent Systems (BRACIS), 2017, pp. 67-72, doi: 10.1109/BRACIS.2017.60.

A Fuzzy Multiclass Novelty Detector for Data Streams Permalink

Published in FUZZ-IEEE, 2018

T. P. da Silva, L. Schick, P. de Abreu Lopes and H. de Arruda Camargo, “A Fuzzy Multiclass Novelty Detector for Data Streams,” 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2018, pp. 1-8, doi: 10.1109/FUZZ-IEEE.2018.8491545.

talks

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.