Online statistical inference in streaming data: renewability, dependence, and dynamics
Lan Luo, PhD
Assistant Professor, Department of Statistics and Actuarial Science
College of Liberal Arts and Sciences
University of Iowa
New data collection and storage technologies have given rise to the field of streaming data analytics, including real-time statistical methodology for online data analysis. Streaming data refers to high-throughput recordings in which large volumes of observations are gathered sequentially and perpetually over time. Such data collection schemes are pervasive not only in biomedical sciences such as mobile health, but also in other fields such as IT, finance, services, and operations. Despite a large body of work on online learning, most existing methods are established under the strong assumption of independent and identically distributed data, and very few target statistical inference. This talk will center around three key components of streaming data analysis: (i) renewable updating, (ii) cross-batch dependency, and (iii) time-varying effects. I will first introduce how to conduct a renewable updating procedure for independent data batches, with the particular aim of achieving statistical properties similar to those of offline oracle methods while enjoying great computational efficiency. Then I will discuss how we handle the dependency structure that spans a sequence of data batches, so as to maintain statistical efficiency during renewable updating. Lastly, a dynamic weighting scheme will be integrated into the online inference framework to account for time-varying effects. I will provide both a conceptual understanding and theoretical guarantees of the proposed method, and illustrate its performance via numerical examples.
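As a rough illustration of the renewable updating idea in its simplest setting, the sketch below fits a one-covariate linear model to independent batches while storing only running summary statistics, then checks that the result agrees with an offline fit on the pooled data. This is an assumed toy example, not the method presented in the talk: the class name `RenewableOLS`, the helper `offline_ols`, and the simulated data are all hypothetical, and the linear model is chosen only because its summary statistics make the online and offline estimates coincide.

```python
import random


class RenewableOLS:
    """Streaming least squares for y = b0 + b1 * x (hypothetical helper).

    Only running sums are stored, so each raw batch can be discarded
    once it has been absorbed.
    """

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.sx = 0.0     # sum of x
        self.sxx = 0.0    # sum of x^2
        self.sy = 0.0     # sum of y
        self.sxy = 0.0    # sum of x * y

    def update(self, xs, ys):
        """Absorb one batch by updating the summary statistics."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sxx += x * x
            self.sy += y
            self.sxy += x * y

    def coef(self):
        """Solve the 2x2 normal equations from the accumulated sums."""
        det = self.n * self.sxx - self.sx ** 2
        b1 = (self.n * self.sxy - self.sx * self.sy) / det
        b0 = (self.sy - b1 * self.sx) / self.n
        return b0, b1


def offline_ols(xs, ys):
    """Textbook least squares on the full pooled data (the 'oracle')."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


# Simulated stream: 5 independent batches of 100 observations each.
rng = random.Random(0)
est = RenewableOLS()
all_x, all_y = [], []
for _ in range(5):
    xs = [rng.uniform(0.0, 10.0) for _ in range(100)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    est.update(xs, ys)                  # online: batch is not retained
    all_x.extend(xs)                    # kept only for the offline check
    all_y.extend(ys)

online = est.coef()
offline = offline_ols(all_x, all_y)
```

Here `online` and `offline` agree up to floating-point error, since for a linear model the accumulated sums carry all the information the estimator needs. For nonlinear models the same idea would require carrying forward approximate summaries rather than exact ones, and this sketch does not address cross-batch dependence or time-varying effects.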