Non-Randomly Sampled Networks: Biases and Corrections
This paper analyzes statistical issues that arise when network data come from non-representative samples of the population, the most common form of network data in use. We first characterize, theoretically and numerically, the biases that non-random sampling induces in both network statistics and estimates of network effects. Sampled network data systematically distort the measured properties of the network and introduce non-classical measurement error when used as regressors. Beyond the sampling rate and the elicitation procedure, these biases depend in a non-trivial way on which subpopulations are more likely to be missing. We then propose a methodology that adapts post-stratification weighting to networked contexts, enabling researchers to recover several network-level statistics and reduce the biases in estimated network effects. The proposed methodology has three advantages: it applies to network data collected through both designed and non-designed sampling procedures, it does not require assuming any network formation model, and it is straightforward to implement. Using Monte Carlo simulations and two widely used empirical network data sets, we show that accounting for the non-representativeness of the sample dramatically changes the results of regression analysis.
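To make the idea of post-stratification weighting concrete, the sketch below applies textbook post-stratification weights to a node-level statistic (mean degree). It is only an illustration and not the estimator proposed in the paper: the function names, the toy data, and the group labels are hypothetical. It also assumes that the population share of each group is known and that each sampled node's degree is observed without truncation.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per sampled node.

    weight = (population share of the node's group)
             / (sample share of the node's group)
    """
    n = len(sample_groups)
    counts = Counter(sample_groups.values())
    return {
        node: population_shares[group] / (counts[group] / n)
        for node, group in sample_groups.items()
    }


def weighted_mean(values, weights):
    """Weighted average of node-level values, normalized by total weight."""
    total_weight = sum(weights[node] for node in values)
    return sum(values[node] * weights[node] for node in values) / total_weight


# Hypothetical toy sample: group "a" is over-represented (3 of 4 nodes)
# although both groups make up half of the population (assumed known).
sample_groups = {1: "a", 2: "a", 3: "a", 4: "b"}
population_shares = {"a": 0.5, "b": 0.5}

# Assumed degrees of the sampled nodes (taken as correctly observed).
degrees = {1: 2, 2: 2, 3: 2, 4: 6}

weights = poststratification_weights(sample_groups, population_shares)
naive = sum(degrees.values()) / len(degrees)  # unweighted sample mean
adjusted = weighted_mean(degrees, weights)    # post-stratified mean

print(f"naive mean degree:    {naive:.2f}")
print(f"adjusted mean degree: {adjusted:.2f}")
```

In this toy case the unweighted mean is pulled toward the over-sampled low-degree group, while the weighted mean restores each group to its population share. Correcting statistics that depend on links between sampled and unsampled nodes requires more than node-level reweighting, which this sketch does not attempt.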