While our capacity to collect real-time data has grown significantly over the past decade, our ability to analyze that data and turn it into knowledge has not kept pace. The focus of big data is therefore shifting toward the design of tools and applications able to extract information from collected data in real time. However, meeting this real-time requirement is hampered by traditional cloud system designs and management strategies. Current systems for big data applications rely on heterogeneous resources distributed across the resource-constrained Edge and the powerful Cloud. In addition, applications are now built as sets of self-contained microservices developed by independent teams. This evolution in system design has introduced extreme heterogeneity and uncertainty into emerging applications, exposing the limitations of traditional management strategies.
In this thesis, we focus on designing a system for big data applications that rethinks existing management strategies, with particular emphasis on the heterogeneity of incoming data, applications, and resources. We first study the decoupling of data producers and consumers in emerging microservice-based applications as an entry point to effectively leverage available services, including newly published ones. Second, we investigate the trade-off between the quality and the urgency of results in big data applications distributed across the Edge-to-Cloud continuum as a promising strategy for overcoming the limited and heterogeneous capacity of system resources. We then apply the proposed approaches in the context of Deep Learning applications and evaluate them on a real-life testbed.