It has long been clear that statistics and probability are the natural languages for climate:
for given boundary conditions, there are typical states, the climatology, and fluctuations around these typical states, referred to as climate variability. We are often interested in predicting the occurrence of specific fluctuations of the climate system, be it a given mode of climate variability, such as the El Niño Southern Oscillation (ENSO), or rare events such as heat waves, cold spells or extreme precipitation. All these events have a probability of occurring in any given year, i.e. with respect to climatological conditions, but one may also be interested in their probability of occurrence conditioned on the state of the climate system at the time of the prediction. A natural example is medium-range forecasting, which is inherently probabilistic because it lies beyond the deterministic predictability time of the atmosphere, but for which statistically significant predictions can still be made, depending on the current state of the system. The purpose of this study is twofold. On the one hand, it aims to discuss the mathematical structure of such climate prediction problems and to introduce a quantity which corresponds precisely to this type of prediction problem: the committor function. The committor function is the probability for an event to occur in the future, as a function of the current state of the system. On the other hand, it aims to explain how to efficiently compute the committor function from observations through several data-driven approaches, such as direct estimates, kernel-based methods and neural networks. How to validate an estimate of the committor function is also discussed. Finally, a method for learning effective dynamics is introduced. It relies on approximating the real dynamics by a Markov chain on the data. 
Such a Markov chain allows many interesting quantities of the original system, including the committor function, to be computed quickly and at low computational cost, regardless of the complexity of the system under investigation. Moreover, it overcomes some of the limitations of the other methods, proving useful even when data are scarce. All the concepts mentioned so far will be illustrated in the framework of deterministic or stochastic dynamical systems. Each topic will first be introduced in simple models, and its application to more complex systems related to climate dynamics will then be shown.
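
To make the notion of a committor function concrete, the following minimal sketch estimates it directly by simulation in a toy stochastic model. Everything specific here is an assumption chosen for illustration and is not taken from the study: the one-dimensional double-well dynamics dx = (x - x^3) dt + sqrt(2 eps) dW, the sets A = {x <= -1} and B = {x >= 1}, the parameter values, and the function name `direct_committor`. The committor at x0 is approximated by the fraction of trajectories started at x0 that reach B before A.

```python
import numpy as np


def direct_committor(x0, n_traj=2000, dt=1e-2, eps=0.25,
                     max_steps=100_000, seed=0):
    """Monte Carlo estimate of P(reach B before A | X_0 = x0).

    Hypothetical toy model: overdamped double-well dynamics integrated
    with the Euler-Maruyama scheme, with A = {x <= -1} and B = {x >= 1}.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_traj, float(x0))
    hit_b = x >= 1.0                      # trajectories already inside B
    active = (x > -1.0) & (x < 1.0)       # trajectories still outside A and B

    for _ in range(max_steps):
        n_active = int(active.sum())
        if n_active == 0:
            break
        xa = x[active]
        noise = rng.standard_normal(n_active)
        # Euler-Maruyama step for dx = (x - x^3) dt + sqrt(2 eps) dW
        x[active] = xa + (xa - xa**3) * dt + np.sqrt(2.0 * eps * dt) * noise

        reached_b = active & (x >= 1.0)
        reached_a = active & (x <= -1.0)
        hit_b |= reached_b
        active &= ~(reached_a | reached_b)

    # Trajectories still active after max_steps are counted as not reaching B.
    return float(hit_b.mean())


if __name__ == "__main__":
    for start in (-0.5, 0.0, 0.5):
        print(f"x0 = {start:+.1f}: committor ~ {direct_committor(start):.3f}")
```

By symmetry of this toy model the estimate at x0 = 0 should be close to 1/2, and it should increase as x0 moves toward B. The cost of this brute-force approach grows quickly when the event of interest is rare or the model is expensive, which is one motivation for the data-driven estimates discussed in the text.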