On Learning in the Presence of Friction
In practical statistical analysis, data are often biased or have undergone transformations during collection or processing, prior to observation. If left unaccounted for, these transformations can lead to misleading conclusions. It is therefore a fundamental problem to identify such transformations, which we call data frictions, and, more importantly, to develop efficient and accurate statistical analysis methods even in the presence of these frictions. In this paper, we study a generalized friction model and analyze both the feasibility and the efficiency of statistical learning on data subject to friction. In particular, we focus on two statistical learning problems: Mean Estimation and Linear Regression. We also draw connections between our general form of friction and specific types of data bias, such as truncated data and data under classification, both of which have been extensively studied. For these two learning problems, we first give a sufficient condition for identifying the mean of the Gaussian, or the regression parameters, under a specific family of friction functions. Then, for Mean Estimation, we establish a connection between assigning equal probability mass to multiple Gaussian measures and the Consensus-Halving problem, which allows us to construct unidentifiable instances of friction functions for an arbitrary number of Gaussians. Finally, under a set of assumptions, we develop an efficient Projected SGD algorithm that outputs an estimator of the mean of the Gaussian with a certain probability of success.
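To make the Projected SGD idea concrete for the simplest friction named above, the sketch below estimates the mean of a unit-variance Gaussian from truncated observations (samples kept only when they land above a known threshold). This is a generic illustration of the projected-SGD-on-the-likelihood template, not the paper's actual algorithm: the friction here is specialized to one-sided truncation, the variance is assumed known, and the projection set, step-size schedule, and initialization are illustrative choices. The stochastic gradient `z - x` is the standard unbiased gradient of the truncated-Gaussian negative log-likelihood in the mean parameter.

```python
import random

def sample_truncated(mu, a):
    """Rejection-sample z ~ N(mu, 1) conditioned on z >= a."""
    while True:
        z = random.gauss(mu, 1.0)
        if z >= a:
            return z

def psgd_truncated_mean(samples, a, steps=20000, lr0=2.0, radius=2.0):
    """Projected SGD on the truncated-Gaussian negative log-likelihood
    in the mean parameter, with known unit variance (illustrative sketch)."""
    mu = sum(samples) / len(samples)   # initialize at the (biased) empirical mean
    lo, hi = mu - radius, mu + radius  # projection set, assumed to contain the true mean
    for t in range(1, steps + 1):
        x = random.choice(samples)
        z = sample_truncated(mu, a)            # fresh sample from the current model, re-truncated
        grad = z - x                           # unbiased stochastic gradient of the NLL in mu
        mu = min(hi, max(lo, mu - (lr0 / t) * grad))  # gradient step, then project
    return mu
```

On data drawn from N(2, 1) but observed only above the threshold a = 1.5, the naive empirical mean overestimates the true mean by about half a standard deviation, while the projected SGD iterate corrects for the truncation bias.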