Extensions of the spatial scan statistic using multi-objective optimization techniques

AUTHOR(S)
PUBLICATION DATE

2009

ABSTRACT

This work presents three new extensions of Kulldorff's spatial scan statistic for the detection and inference of spatial clusters. Consider a map divided into m regions, each with a known population at risk and a known number of cases of some disease. We would like to know whether the cases are randomly distributed over the m regions and, if they are not, whether it is possible to locate a specific area within the map with an abnormal concentration of cases. We are interested in testing the null hypothesis (there are no clusters in the map) against the alternative hypothesis (there is a cluster in the map).

In the first part, we propose a novel tool for testing hypotheses concerning the adequacy of environmentally defined factors for local clustering of diseases, through the comparative evaluation of the significance of the most likely clusters detected in maps whose neighborhood structures were modified according to those factors. A multi-objective genetic algorithm scan statistic is employed for finding spatial clusters in a map divided into a finite number of regions, whose adjacency is defined by a graph structure. This cluster finder maximizes two objectives: the spatial scan statistic and the regularity of cluster shape. Instead of specifying locations for the possible clusters a priori, as is currently done for cluster finders based on focused algorithms, we alter the usual adjacency induced by the common geographical boundary between regions. In our approach, the connectivity between regions is reinforced or weakened according to certain environmental features of interest associated with the map. We build various plausible scenarios, each time modifying the adjacency structure in specific geographic areas of the map, and run the multi-objective genetic algorithm to select the best cluster solutions for each of the selected scenarios. The statistical significance of the most likely clusters is estimated through Monte Carlo simulations. 
The clusters with the lowest estimated p-values, along with their corresponding maps of enhanced environmental features, are displayed for comparative analysis. The probability of cluster detection is thus increased or decreased according to the changes made in the adjacency graph structure, which reflect the selection of environmental features. The eventual identification of the specific environmental conditions that induce the most significant clusters enables the practitioner to accept or reject different hypotheses concerning the relevance of geographical factors. Numerical simulation studies and an application to malaria clusters in Brazil are presented.

In the second part, we develop a new methodology for analyzing clustering in maps of regions. Situations where a disease cluster does not have a regular shape are fairly common. Moreover, maps with multiple clustering, where there is no clearly dominating primary cluster, also occur frequently. We would like to develop a method to analyze more thoroughly the several levels of clustering that arise naturally in a disease map divided into m regions. The spatial scan statistic is the usual measure of the strength of a cluster. Another important measure is its geometric regularity. A multi-objective genetic algorithm was developed elsewhere to identify irregularly shaped clusters. Its search aims to maximize two objectives, namely the scan statistic and the regularity of shape (using the compactness concept). The solution presented is a Pareto-set, consisting of all the clusters found that are not worse than some other cluster in both objectives simultaneously. Significance is evaluated in parallel for all clusters in the Pareto-set through Monte Carlo simulation, and the most likely cluster is then identified. Instead of using a genetic algorithm, our novel method incorporates the simplicity of the circular scan while still being able to detect and evaluate irregularly shaped clusters. 
We define the circular occupation (CO) of a cluster candidate roughly as its population divided by the population inside the smallest circle containing it. The CO concept, computationally faster and relying on familiar concepts, here replaces the compactness definition as the measure of regularity of shape. The scan statistic is evaluated for each of the m regions of the map taken individually. The regions are ranked accordingly in decreasing order. A Monte Carlo procedure is used for significance evaluation. The presence of knees in the Pareto-sets indicates sudden transitions in the cluster structure, corresponding to rearrangements due to the coalescence of loosely knit (usually disconnected) clusters. As each Pareto-set contains the most likely clusters within a certain level of geographical information, they could be joined to provide an overall complete description. The multi-objective circular scan is a fast method that allows peering into the clustering structure of a map. The comparison of Pareto-sets for observed cases with those computed under the null hypothesis provides valuable hints about the spatial occurrence of diseases. The ability to monitor incipient spatio-temporal clusters at several geographic scales simultaneously makes it a promising tool in syndromic surveillance, especially for contagious diseases with a mix of short- and long-range spatial interactions.

In the third part, we explore the novel concept of a disaggregated spatial scan statistic. This part of the thesis is still under development, so we present only introductory work and a few examples. We present a multi-objective variant of Kulldorff's spatial scan statistic, defining the search for the most likely cluster as a multi-objective problem. Two functions were considered for maximization in the multi-objective setting: the rate and the number of cases within the cluster. 
We show through examples that this novel approach presents some attractive features: it is capable of distinguishing families of clusters of geographical significance within the set of potential solutions, grouped by their relative position in the rates versus cases space. Thus the clustering structure is readily available to the practitioner, and more detailed inference could be derived through this new tool.
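
The two-objective search described in the third part can be illustrated with a minimal sketch. It is not the thesis implementation: the `Candidate` type, the helper names, and the numbers are hypothetical, and the filter uses standard Pareto dominance (at least as good in both objectives and strictly better in one), which is slightly stricter than the wording used in the abstract.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical cluster candidate: a set of region ids with its totals."""
    regions: frozenset
    cases: int
    population: int

    @property
    def rate(self) -> float:
        # Cases per person at risk inside the candidate cluster.
        return self.cases / self.population


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b in both objectives and better in one."""
    return (a.rate >= b.rate and a.cases >= b.cases
            and (a.rate > b.rate or a.cases > b.cases))


def pareto_set(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up numbers for illustration only (not taken from the thesis).
candidates = [
    Candidate(frozenset({1}), cases=10, population=100),         # high rate, few cases
    Candidate(frozenset({1, 2}), cases=18, population=300),      # intermediate
    Candidate(frozenset({1, 2, 3}), cases=20, population=1000),  # low rate, many cases
    Candidate(frozenset({4}), cases=5, population=200),          # dominated by the first
]
front = pareto_set(candidates)
```

In this toy example the first three candidates trade rate against number of cases and all remain in the Pareto-set, while the fourth is discarded. In practice each retained candidate would still need a Monte Carlo significance evaluation, as the abstract describes.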

SUBJECT(S)

Statistics, theses.
