Keywords: Epidemiology, Models, Network analysis, Statistical physics, Epidemic Dynamics, Public Health
A. Contact:
- E-mail of the proposer: Eric.Fleury@inria.fr
- Research team name: D-NET / Research unit: Rhône-Alpes / ENS de Lyon
- Research theme: Networks, Systems and Services, Distributed Computing
- Research team leader name: Pr. Eric Fleury
- Intern tutor: Pr. Eric Fleury in collaboration with Dr. Patrice Abry
- http:
- http://perso.ens-lyon.fr/patrice.abry
- http://perso.ens-lyon.fr/eric.fleury/
- http://www.ens-lyon.fr/LIP/D-NET/
- http://www.ens-lyon.fr/PHYSIQUE/
B. Research program and methodology
The goal is to study and analyze a data set provided by the iBird-INVEST project (IP6 MOSAR project). This data set combines human interaction network dynamics (all interactions/contacts between 600 people were recorded every 30 s over a 6-month period thanks to logsensors[1]), individual antibiotic exposure and antibiotic-resistant bacteria (ARB) propagation patterns. Together, these sources should allow us to directly estimate the epidemicity parameters of all circulating ARB clones. To that aim, we will obtain model predictions with the “completed” ARB, taking into account the contact network data as well as the observed data on the antibiotic exposure of patients. We will then compare these predictions with the observed ARB propagation patterns in order to investigate several hypotheses regarding the relative epidemicity of co-circulating bacterial strains.
The dynamically evolving interaction networks under study can be regarded as “out of equilibrium” systems, whose dynamics are typically characterized by non-standard statistical properties such as non-stationarity, long-range memory effects and intricate space and time correlations. Their dynamics also often exhibit no preferred time scale or, equivalently, involve a whole range of scales, and are characterized by a scaling (scale invariance) property. Another important aspect of our real-world data set (iBird-INVEST) is that the recorded information is of different natures and is collected at different, unsynchronized resolutions in both time and space. This property, referred to as multi-modality, is generic and central to most dynamical networks.
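Scale invariance is commonly read off a log-scale diagram: if the variance of wavelet coefficients grows as a power of scale, the log2 of that variance is linear in the octave j, and the slope gives the scaling exponent. The following is a minimal sketch, assuming the Haar wavelet (the simplest choice; the proposal does not name one) and synthetic series rather than iBird observables; the function names are hypothetical. For white noise the slope should be close to 0, and for a random walk close to 2, consistent with the slope 2H + 1 expected for a Hurst exponent H = 1/2.

```python
import math
import random
from itertools import accumulate


def haar_logscale_diagram(x, n_octaves):
    """Return (j, log2 mean squared Haar detail) for octaves j = 1..n_octaves.

    Orthonormal Haar transform: at each octave, adjacent pairs of approximation
    coefficients are split into a difference (detail) and a sum (next
    approximation), both scaled by 1/sqrt(2).
    """
    approx = list(x)
    diagram = []
    for j in range(1, n_octaves + 1):
        n = len(approx) // 2
        if n < 2:  # too few coefficients left for a meaningful average
            break
        details = [(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2) for k in range(n)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2) for k in range(n)]
        energy = sum(d * d for d in details) / n
        diagram.append((j, math.log2(energy)))
    return diagram


def lsq_slope(diagram):
    """Ordinary least-squares slope of log2 energy against octave j."""
    js = [j for j, _ in diagram]
    ys = [y for _, y in diagram]
    mj, my = sum(js) / len(js), sum(ys) / len(ys)
    num = sum((j - mj) * (y - my) for j, y in zip(js, ys))
    den = sum((j - mj) ** 2 for j in js)
    return num / den


# Synthetic check (not iBird data): white noise vs. its cumulative sum.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(2 ** 14)]
walk = list(accumulate(noise))
noise_slope = lsq_slope(haar_logscale_diagram(noise, 8))  # expected close to 0
walk_slope = lsq_slope(haar_logscale_diagram(walk, 8))    # expected close to 2 (= 2H + 1, H = 1/2)
print(f"white noise: {noise_slope:.2f}, random walk: {walk_slope:.2f}")
```

On real observables, the range of octaves over which the diagram is linear is itself informative: a straight line over many octaves is the signature of the “no preferred time scale” behavior described above.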
The work program will be devoted to constructing a methodology that provides relevant statistical analyses and characterizations of network dynamics. Specifically, it will concentrate on developing tools, grounded in statistical signal processing theory and in the statistical study of complex networks, that address the difficulties listed above. In parallel, the postdoc will work on a methodology providing relevant definitions, statistical models and a theory of dynamic graphs, leading to a good understanding of the structure of the graph and of its dynamics. Based on this characterization of network dynamics from both a statistical and a structural point of view, we will be able to study epidemic models and tackle the problem of dynamic processes on dynamic networks.
Task 1: From “primitive” to “analyzable” data: Observables.
The numerous and varied modalities of information collected on the network generate a huge “primitive” data set. It must first be processed to extract “analyzable” data, which can be considered at different time and space resolutions: these can be local quantities, such as the number of contacts of each individual or pairwise contact times and durations, or global measures, e.g., the fluctuations of the average connectivity. The first research direction of Task 1 therefore consists in identifying, from the “primitive” data, a set of “analyzable” observables that are relevant and meaningful for the analysis of network dynamics and network diffusion phenomena. Such observables also need to be extracted from the large “primitive” data set with reasonable complexity, memory and computational load. To analyze and understand network dynamics through observables, it is essential to be able to assess their (statistical) dependencies, correlations and causalities. For instance, in the iBird framework, it is crucial to assess the form and nature of the dependencies and causal relations between time series reflecting, e.g., the evolution over time of strain resistance to antibiotics and the fluctuations at the inter-contact level. However, the multimodal nature of the collected information, together with its complex statistical properties, makes this a challenging task. Task 1 will therefore address the design of statistical tools that specifically aim to measure dependency strengths and causality directions among multivariate signals, using techniques from multi-resolution and non-stationary signal processing, as detailed below.
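The step from “primitive” to “analyzable” data can be illustrated on the contact logs alone. The following is a minimal sketch, assuming a hypothetical record format (t, i, j), one record per 30 s reading in which individuals i and j were in contact; the actual iBird-INVEST format is not described here, and all names are illustrative. It merges consecutive readings into contact events in a single pass over the records, then derives three of the local observables mentioned above: number of distinct contacts per individual, contact durations and inter-contact times.

```python
from collections import defaultdict

STEP = 30  # seconds between two sensor readings, as stated in the proposal


def contact_events(records, step=STEP):
    """Merge per-reading records (t, i, j) into events (i, j, start, duration).

    Hypothetical input format: one record per reading in which i and j were
    in contact, with t a multiple of `step`. A reading at t is taken to cover
    [t, t + step); consecutive readings of the same pair form one event.
    """
    times_by_pair = defaultdict(set)
    for t, i, j in records:
        times_by_pair[(min(i, j), max(i, j))].add(t)
    events = []
    for (a, b), times in times_by_pair.items():
        times = sorted(times)
        start = prev = times[0]
        for t in times[1:]:
            if t - prev > step:  # gap in readings: close the current event
                events.append((a, b, start, prev - start + step))
                start = t
            prev = t
        events.append((a, b, start, prev - start + step))
    return sorted(events, key=lambda e: e[2])


def local_observables(events):
    """Distinct partners per individual, event durations, and per-pair inter-contact times."""
    partners = defaultdict(set)
    durations, inter_contact = [], []
    last_end = {}
    for a, b, start, duration in events:  # events are sorted by start time
        partners[a].add(b)
        partners[b].add(a)
        durations.append(duration)
        if (a, b) in last_end:
            inter_contact.append(start - last_end[(a, b)])
        last_end[(a, b)] = start + duration
    degree = {person: len(p) for person, p in partners.items()}
    return degree, durations, inter_contact


records = [(0, 1, 2), (30, 2, 1), (60, 1, 2), (300, 1, 2), (30, 1, 3)]
events = contact_events(records)
degree, durations, inter_contact = local_observables(events)
print(events)  # [(1, 2, 0, 90), (1, 3, 30, 30), (1, 2, 300, 30)]
print(degree, durations, inter_contact)
```

Each observable can then be binned in time (e.g., contacts per hour) to produce the time series analyzed in Task 2.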
Task 2: Granularity and resolution
The observables of the network dynamics will take the form of time series, “condensing” the description at various granularity levels, both in time and space. For instance, the existence of a contact between two individuals can be seen as a link in a network of contacts, and contact networks can be built by aggregating contact sequences at different analysis scales (potentially ranging from hours to days or weeks). However, it is so far unclear to what extent the choice of analysis scale affects the relevance of the description and analysis of network dynamics. An interesting open issue is understanding how the network evolves from a set of isolated contacts (when analyzed at fine time resolution, i.e., over short aggregation windows) to a globally interconnected ensemble of individuals (at large analysis scales). More generally, this raises the question of selecting the adequate level of granularity at which the dynamics should be analyzed. This difficult problem is further complicated by the multi-modality of the data, which may come with different time resolutions. As an alternative, we will therefore consider analyzing network dynamics jointly at all resolutions, through wavelet decompositions and multi-resolution analyses. While these tools have been studied for self-similar and multifractal processes, tailoring them to network dynamics raises challenging theoretical issues that will be addressed: multivariate models for such processes remain to be invented, and alternations between periods of regular and irregular behavior may require developing processes that go beyond the so-called intermittency models.
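The transition from isolated contacts to a connected ensemble can be probed by aggregating the same contact sequence over windows of increasing length and tracking the size of the largest connected component in each window. The following is a minimal sketch, assuming timestamped contacts (t, i, j) in seconds and non-overlapping windows; the function names and the toy sequence are hypothetical. Note that the static aggregate ignores the time order of contacts inside a window.

```python
from collections import defaultdict


def largest_component_size(edges):
    """Size of the largest connected component of an undirected edge list (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    sizes = defaultdict(int)
    for node in list(parent):
        sizes[find(node)] += 1
    return max(sizes.values(), default=0)


def largest_component_by_window(contacts, window):
    """Aggregate timestamped contacts (t, i, j) into consecutive windows of `window`
    seconds and return {window index: largest component size} for non-empty windows."""
    buckets = defaultdict(list)
    for t, i, j in contacts:
        buckets[t // window].append((i, j))
    return {w: largest_component_size(edges) for w, edges in sorted(buckets.items())}


# Toy contact sequence (seconds, individual, individual): a chain 1-2-3-4-5-6 spread over time.
contacts = [(0, 1, 2), (40, 3, 4), (70, 2, 3), (100, 5, 6), (130, 4, 5)]
print(largest_component_by_window(contacts, 60))   # {0: 2, 1: 2, 2: 2}
print(largest_component_by_window(contacts, 180))  # {0: 6}
```

On real data, one would sweep the window from minutes to weeks and plot the average largest-component size against window length to locate the scales at which the aggregated network becomes connected.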
Task 3: Spreading processes & Inference in epidemic models
Complex networks often support dynamical processes such as disease spreading. While much effort has been devoted to the analysis of such processes on static networks, much less is known when the network itself evolves dynamically. The effects of bursty, non-stationary contact dynamics and of causality (a transmission chain must follow the time ordering of contacts) have barely been investigated. Starting with simple models of diffusion (random walks) or spreading (SI, SIS), we will first characterize these processes and how they differ from the same processes running on static networks, possibly aggregated over different time scales. We will also generalize the heterogeneous mean-field approaches, which have been very effective for understanding dynamical processes on static complex networks, and tailor them to take into account the dynamical evolution of the network and its multimodal aspects. Another important issue concerns the definition of ‘key’ individuals for the spreading process. In static networks, betweenness centrality identifies nodes through which many spreading/diffusion paths pass. Because of causality effects, its generalization to dynamically evolving networks will have to be carefully designed.
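The role of causality shows up on a toy example. On the aggregated static graph, everyone in a connected component is reachable, but on the temporal network a transmission chain has to follow the time order of contacts. The following is a minimal sketch of an SI process driven by a timestamped contact sequence, compared with reachability in the aggregated graph. The contact format (t, i, j), the per-contact transmission probability beta and the function names are assumptions for illustration; with beta = 1 the SI outcome is exactly the set of individuals reachable through time-respecting paths.

```python
import random
from collections import defaultdict, deque


def si_temporal(contacts, source, beta, t0=0, seed=0):
    """SI process driven by a time-ordered contact sequence.

    Each contact (t, i, j) with t >= t0 between an infected and a susceptible
    individual transmits with probability beta. Returns {individual: infection time}.
    Contacts sharing a timestamp are processed in input order (stable sort), so a
    chain can propagate within a single time step.
    """
    rng = random.Random(seed)
    infected = {source: t0}
    for t, i, j in sorted(contacts, key=lambda c: c[0]):
        if t < t0:
            continue
        if (i in infected) != (j in infected) and rng.random() < beta:
            new = j if i in infected else i
            infected[new] = t
    return infected


def static_reach(contacts, source):
    """Individuals reachable from source in the time-aggregated (static) contact graph."""
    adj = defaultdict(set)
    for _, i, j in contacts:
        adj[i].add(j)
        adj[j].add(i)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


contacts = [(10, "b", "c"), (20, "a", "b")]
print(si_temporal(contacts, "a", beta=1.0))  # {'a': 0, 'b': 20}: c is never reached
print(si_temporal(contacts, "c", beta=1.0))  # {'c': 0, 'b': 10, 'a': 20}
print(sorted(static_reach(contacts, "a")))   # ['a', 'b', 'c']
```

The asymmetry (c can infect a, but a cannot infect c) is invisible in the static aggregate, which is why centrality notions such as betweenness need time-respecting paths on dynamic networks.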
Another aspect is how epidemiological parameters can be estimated for disease spreading over such networks, which is ultimately what is required to use models for prediction purposes. We will design methods for estimating epidemic model parameters when such detailed contact data is available, and compare their accuracy, variability and ease of use, especially in terms of computational cost. To this aim, epidemics will be simulated using the simple epidemic models described in the preceding paragraph, and the simulated data will serve as a basis for comparing estimation procedures. Marginal approaches will be used first: using the contact rates of each individual with others, estimated from the characteristics of the contact network, unsupervised or semi-supervised methods will cluster individuals according to their contact patterns. This ad hoc description will then be used to structure a compartmental model, whose parameters may be estimated using computer-intensive inference approaches, especially MCMC and data augmentation.
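The clustering step of the marginal approach can be sketched on per-individual features derived from the contact network. The following is a minimal sketch, assuming two hypothetical features (contacts per hour and mean contact duration in minutes) and plain k-means (Lloyd's algorithm) as the unsupervised method; the proposal does not commit to a specific clustering algorithm, and the data below are made up. Each resulting cluster could then define one group of a structured compartmental model.

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means on tuples of floats.

    Initial centroids are k points spread evenly along the data sorted
    lexicographically: deterministic, for illustration only (k-means++
    is the usual choice in practice).
    """
    data = sorted(points)
    if k == 1:
        centroids = [data[0]]
    else:
        centroids = [data[round(q * (len(data) - 1) / (k - 1))] for q in range(k)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid in Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: centroid = mean of its members (kept unchanged if the cluster is empty).
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical features per individual: (contacts per hour, mean contact duration in minutes).
features = [(2.0, 1.0), (3.0, 1.5), (2.5, 1.2), (40.0, 5.0), (42.0, 6.0), (38.0, 4.5)]
labels, centroids = kmeans(features, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

With real features on very different scales, standardizing each feature before clustering would be advisable, since Euclidean distance is otherwise dominated by the largest-valued feature.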
However, it may be better to use more details from the contact network. We will start from the case of full observation of both the epidemic and the contact network. In this ideal case, the probability of transmission may be estimated by the ratio of “effective contacts” (i.e., contacts leading to transmission) to all contacts, possibly weighted according to the frequency of these contacts. Proper weighting may also be one way to take the dynamics of the network into account. Working from this ideal case, we will try to extend the methods to situations of partial observation, including partial observation of the epidemic, partial observation of the contact network, and so on. In recent years, a number of estimation procedures relying more or less directly on random graphs have been proposed for epidemic models. We will study how these methods may be adapted to contact networks belonging to an empirically determined family, in order to propose efficient estimation algorithms.
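In the fully observed case, the estimator is a weighted proportion, p̂ = Σ_c w_c·δ_c / Σ_c w_c, where δ_c = 1 if contact c led to transmission and 0 otherwise; with all weights w_c = 1 it reduces to the plain ratio of effective contacts to all contacts. The following is a minimal sketch, assuming that “all contacts” means contacts between an infectious and a susceptible individual (the only ones that can lead to transmission), and that each contact is summarized by a hypothetical (effective, frequency) pair; the weighting function is left as a parameter because the proposal leaves the weighting scheme open.

```python
def transmission_probability(contacts, weight=lambda freq: 1.0):
    """Weighted ratio of effective contacts to all at-risk contacts.

    contacts: iterable of (effective, frequency) pairs, one per infectious-susceptible
    contact, where `effective` is True if the contact led to transmission
    (full observation of epidemic and network is assumed).
    weight: maps a contact's frequency to its weight; the default gives the plain ratio.
    """
    num = den = 0.0
    for effective, freq in contacts:
        w = weight(freq)
        num += w * effective  # effective contributes 1.0 or 0.0
        den += w
    if den == 0:
        raise ValueError("no at-risk contacts")
    return num / den


contacts = [(True, 1), (False, 4), (False, 5), (True, 2), (False, 8)]
plain = transmission_probability(contacts)                      # 2 / 5 = 0.4
by_freq = transmission_probability(contacts, weight=lambda f: f)  # (1 + 2) / 20 = 0.15
print(plain, by_freq)
```

Keeping the weight as a function makes it easy to compare schemes (uniform, frequency-based, or time-dependent weights reflecting network dynamics) on simulated epidemics where the true transmission probability is known.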
[1] See http://perso.ens-lyon.fr/eric.fleury/Upload/Mosar/MosarEng080120.wmv for a short movie explaining the experiment that took place at the Hôpital Maritime in Berck, France, within the MOSAR context.