Investigating self-similarity and heavy tailed distributions on a large scale experimental facility

Patrick Loiseau, Damien Ancelin, Matthieu Imbert, Paulo Gonçalves, Pascale Primet,
Guillaume Dewaele, Pierre Borgnat and Patrice Abry.


Keywords: Grid5000, Metrology, Traffic analysis, Self-similarity, Heavy-tailed flow size distributions, Protocols

Overview:


After the seminal work by Taqqu et al. relating self-similarity to heavy-tailed distributions, a number of research articles verified that aggregated Internet traffic time series show self-similarity and that Internet attributes, such as Web file sizes and flow lengths, are heavy tailed. However, the theoretical prediction relating self-similarity and heavy tails has not been satisfactorily validated: it has been investigated either with numerical or network simulations, or with uncontrolled Web traffic data. Notably, this prediction has never been conclusively verified on real networks using controlled and stationary scenarios, prescribed heavy-tailed distributions, and estimated confidence intervals. In the present work, we use the facilities offered by the large-scale, deeply reconfigurable and fully controllable experimental Grid5000 instrument to investigate whether the prediction can be observed on real networks. To this end, we organize a large number of controlled traffic sessions on a nationwide real network involving two hundred independent hosts. We use an FPGA-based measurement system to collect the corresponding traffic at packet level. We then independently estimate the self-similarity exponent of the aggregated time series and the heavy-tail index of the flow size distributions. Comparing these two estimated parameters enables us to discuss the practical conditions under which the theoretical prediction applies.
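The prediction referred to above is commonly stated as H = (3 - alpha) / 2 for a tail index 1 < alpha < 2, with H = 1/2 (no long-range dependence) once alpha reaches 2 or more. The following minimal sketch only evaluates that textbook form; the function name `predicted_hurst` is an illustrative assumption and is not taken from the original work.

```python
def predicted_hurst(alpha: float) -> float:
    """Self-similarity exponent H predicted from the flow-size tail index alpha.

    Assumes the usual statement of the heavy-tail / self-similarity relation:
    H = (3 - alpha) / 2 for 1 < alpha < 2, and H = 1/2 for alpha >= 2.
    """
    if alpha <= 1.0:
        # The relation is stated for flow sizes with a finite mean (alpha > 1).
        raise ValueError("alpha must be greater than 1")
    return max((3.0 - alpha) / 2.0, 0.5)


if __name__ == "__main__":
    # Tabulate the theoretical curve on both sides of the transition at alpha = 2.
    for alpha in (1.2, 1.5, 1.8, 2.0, 2.5):
        print(alpha, predicted_hurst(alpha))
```

Plotting such a curve against independently estimated (alpha, H) pairs gives the kind of comparison the overview describes.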



Estimated self-similarity index H of the aggregated traffic (aggregation interval = 10us) versus estimated tail index of the corresponding flow size distribution. Solid lines represent the theoretical model proposed by Taqqu et al.; dashed lines correspond to experimental results: (a) with the TCP protocol; (b) with the UDP protocol.



Experimental verification of the relation between the heavy-tail index of flow size distributions and the self-similarity exponent of aggregated traffic time series

All estimates correspond to simulations:


We have revisited the relationship between file sizes and the self-similarity of traffic observed at link level. This work rests on three innovative factors: the use of accurate estimation tools, a deeper analysis of the applicability conditions of Taqqu’s Theorem, and the use of a large-scale reconfigurable experimental facility. The wavelet-based estimation procedure for H is known as a state-of-the-art tool, being one of the most reliable and robust (against non-stationarities); it has been used with care. The estimation procedure used here for the tail index alpha, shown to outperform previously available techniques, had never been used before with Internet data. The deeply asymptotic nature of Taqqu’s Theorem is widely acknowledged; we have accounted for it better by estimating the self-similarity parameter at very coarse scales (coarse meaning scales far beyond those of the system dynamics). This asymptotic limit requires producing traffic over particularly long observation durations that is nonetheless stationary and well controlled. The nationwide and fully reconfigurable Grid5000 instrument enables the generation, control and monitoring of a large number of finely controlled transfer sessions with real transport protocol stacks, end-host mechanisms, network equipment and links. Given such real and very long traces, we have been able to demonstrate experimentally, with an accuracy never previously achieved with either real data or simulations, that Taqqu’s Theorem and the relation between self-similarity and heavy-tailedness can actually be observed. In particular, we obtained significant agreement between theoretical and experimental values in the transition region around alpha = 2. This region is particularly difficult because it combines estimation issues of different kinds, affecting both the self-similarity exponent and the heavy-tail index.
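As a rough illustration of the two independent estimations discussed above, the sketch below pairs a classical Hill estimator for the tail index with an unweighted Haar-wavelet log-scale regression for H. Both are simplified stand-ins chosen for brevity: they are not the estimators used in this work, and the names `hill_tail_index` and `haar_hurst` and the parameters `k`, `j_min` and `j_max` are assumptions made for the example.

```python
import math
import random


def hill_tail_index(sizes, k):
    """Hill estimate of the tail index from the k largest observations."""
    ordered = sorted(sizes, reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < number of observations")
    threshold = ordered[k]  # the (k+1)-th largest value
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


def haar_hurst(series, j_min, j_max):
    """Estimate H from the slope of log2 Haar detail energy versus octave.

    For a stationary long-range dependent series the energy of the detail
    coefficients grows roughly as 2**(j * (2H - 1)), so H = (slope + 1) / 2.
    A plain least-squares fit over octaves j_min..j_max is used here.
    """
    if j_max <= j_min:
        raise ValueError("need at least two octaves for the regression")
    approx = list(series)
    octaves, log_energy = [], []
    for j in range(1, j_max + 1):
        pairs = len(approx) // 2
        if pairs < 2:
            raise ValueError("series too short for the requested octaves")
        # One level of the orthonormal Haar transform.
        detail = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(pairs)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(pairs)]
        if j >= j_min:
            octaves.append(j)
            log_energy.append(math.log2(sum(d * d for d in detail) / pairs))
    mean_j = sum(octaves) / len(octaves)
    mean_y = sum(log_energy) / len(log_energy)
    slope = (sum((j - mean_j) * (y - mean_y) for j, y in zip(octaves, log_energy))
             / sum((j - mean_j) ** 2 for j in octaves))
    return (slope + 1.0) / 2.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic Pareto "flow sizes" with a prescribed tail index of 1.5.
    sizes = [rng.paretovariate(1.5) for _ in range(20000)]
    print("tail index estimate:", hill_tail_index(sizes, k=2000))
    # White Gaussian noise has no long-range dependence (H = 1/2).
    noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
    print("H estimate:", haar_hurst(noise, j_min=1, j_max=8))
```

In line with the remark on coarse scales, `j_min` would in practice be set well above the octaves influenced by the system dynamics, which in turn requires long traces so that enough coefficients remain at the coarsest octaves.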
Concerning the relationship between transport protocols and self-similarity, which remained quite controversial after, and despite, the meaningful contributions of Crovella et al., our observations confirm that protocols, rate mechanisms, and packet-level or flow-level control mechanisms do not impact the observed self-similarity. Our analyses show that this is mostly because the ranges of scales related to self-similarity are far coarser than those (fine and medium scales) associated with such mechanisms. The ranges of scales have been quantified in terms of RTT and mean flow durations. We plan to investigate this issue further by designing specific experiments on our unique experimental Grid5000 tool. We aim to systematically investigate long-term traffic traces generated under various congestion and aggregation levels, heterogeneous source rates, mixed source protocols, various RTTs, different bottleneck and buffer capacities, and new high-speed transport protocol variants. We expect this to contribute to a better understanding and a finer prediction of network traffic in the current and future Internet, as well as to a relevant design of future transport and network control mechanisms.


Publications: