Estimating seroprevalence with data from an imperfect test on a convenience sample

Update Jan, 2022: Since this post was published in May 2020, Gelman and Carpenter (2020) have published a more comprehensive analysis on how to adjust for test imperfections using a Bayesian approach which goes beyond many of the ideas here. I recommend checking out their article.

Several recent studies have used data from antibody tests performed on a convenience sample to estimate seroprevalence of COVID-19 in a population. Estimating seroprevalence from this type of data presents two challenges. First, the analyst must take steps, through weighting or other measures, to deal with likely sample selection bias. Second, the analyst must take into account imperfections in the test itself. Addressing either of these challenges on their own is relatively straightforward to do using existing tools but addressing both at the same time is pretty tricky.

In this blog post, I first describe the most commonly used tools for adjusting for test imperfections and performing inference on a convenience sample. I then describe two different ways of tackling both of these issues at once: a simple approach which sequentially applies the two simple approaches and a more complicated, but also more theoretically sound, Bayesian approach. I also report results from a quick Monte Carlo experiment I used to assess both approaches. A companion git repo includes code that (hopefully) can be relatively easily adapted to estimate seroprevalence from other studies using these approaches.

The TLDR version of the results is that the naive approach seems to work fine for estimating overall population prevalence but that you should use the more sophisticated approach when generating estimates for subgroups.

The Rogan and Gladen Adjustment to Account for Test Imperfections

Rogan and Gladen (1978) developed a simple approach to adjust estimates of prevalence that takes into account test imperfections. If we define \(pt\) to be the true positive rate, \(se\) to be the test sensitivity (i.e. true positive rate), and \(sp\) to be the test specificity (i.e. the true negative rate), then the share of a population that will test positive is:

\[ pa = se*pt+(1-sp)(1-pt) \]

The Rogan and Gladen approach to adjusting for test imperfections solves for \(pt\) in this equation and replaces the true values of each parameter with their estimated values.

\[ \widehat{pt} = \frac{\widehat{pa}+\widehat{sp}-1}{\widehat{se}+\widehat{sp}-1} \]

Rogan and Gladen developed a few options for calculating standard errors of these estimates (and Reiczigel, Földi, and Ózsvári (2010) developed more sophisticated confidence intervals). Rogan and Gladen’s simplest approach to estimating the standard errors (which I will use later) is to calculate the variance of \(\widehat{pt}\) using the formula for the variance of a proportion.

\[ \widehat{Var({\widehat{pt}})}= \frac{\widehat{pa}(1-\widehat{pa})}{N(se+sp-1)^2}\]

The R package “epiR” allows users to apply the Rogan and Gladen adjustment and calculate confidence intervals using a variety of approaches. (See the function “epi.prev” in this package.)

Adjusting a Convenience Sample Using Post-Stratification

There are a lot of different ways to account for potential sample selection bias when analyzing data from a convenience sample but the simplest, and most commonly used, method is post-stratification. To apply post-stratification, we first divide up our samples into groups based on whatever demographic info we collected and calculate estimates for each group. We then weight the estimates for each of these groups according to the share of the total population that they represent. For example, if we sought to estimate the mean of some variable \(y\) for the total population our estimate would be:

\[ \widehat{\bar{y}} = \sum_{h=1}^H{\frac{N_h*\bar{y_h}}{N}} \] where \(\bar{y_h}\) is the estimate for group h, and \(\frac{N_h}{N}\) is the share of group h in the total population from, for example, census data.

The naive approach: simple poststratification followed by a Rogan Gladen adjustment

The simplest approach to adjusting for both test imperfections and potential sample selection bias arising from convenience sampling is to first estimate the apparent prevalence rate (without accounting for test imperfections) using poststratification and then apply the Rogan and Gladen adjustment.

Issues with the naive approach

In theory, the naive approach shouldn’t work too well. To see why this is the case, suppose you have two strata of equal sample size but one stratum represents a much larger portion of the population than the other strata (i.e. if you were to use weights, the weights for observations from this stratum would be much higher than observations from other strata). Suppose also that true prevalence is very low. Due to random test error, you will likely have some false positives in your sample. If you happen to get a false positive in the stratum with high weights, then the naive approach will lead you to overestimate the overall true prevalence. On average, your estimate of the true prevalence will be Ok but it (in theory) will have pretty high variance. (I caveat these claims with the phrase “in theory” since, as we will see below, for the simulated data I create it isn’t actually that much of a problem.)

A Bayesian Approach using Modified MRP

Theoretically, we should be able to improve on this approach by more carefully taking into account potential test imperfections. To use the example from above, if we saw that there was one positive test in the highly weighted stratum and 0 positive tests in the other stratum, we should adjust downward our overall estimate of the prevalence.

Quick overview of MRP

One way to do this is using a fully Bayesian approach built on multi-level regression and post-stratification (MRP). (For another Bayesian approach to this problem which doesn’t use MRP, see Larremore et al (2020).)

MRP is an approach to small area estimation in which the analyst first estimates the mean of each strata using a multi-level model and then weights up these estimates using the poststratification weights. For example, to estimate the overall proportion \(\theta\) in a population using data \(y_i\) for each individual, you might use a simple model as follows to first estimate, \(\theta_j\), the proportion in each stratum j using stratum variables \(X_j\).

\[ \theta_j =logit^{-1}(X_j\beta); \beta\sim MVN(0,\sigma I); y_i \sim bernoulli(\theta_{j[i]}) \] To derive your estimates of the total population, you just weight up. i.e. you calculate

\[ \widehat{\theta} = \sum_{h=1}^H{\frac{N_h*\widehat{\theta_j}}{N}} \]

MRP is especially useful when you have a lot of different strata (which is often the case) since it allows you to more effectively “borrow strength” between strata compared to the approach where you simply model a different intercept for each stratum. (If you are simply modeling a separate intercept for each stratum, then there is no way for the model to know, for example, that a stratum for white men between 41 and 45 in Georgia and a stratum for white men between 46 and 50 are likely to be similar.) It is also, believe it or not, relatively straightforward compared to other approaches to small area estimate. For a more thorough overview of MRP, I highly recommend this vignette.

Modified MRP to account for test imperfections

If implementing MRP using a Bayesian approach, it is fairly straightforward to modify the MRP model to take into account test error. As before, we use a multilevel model for the likelihood of the true prevalence. But in our likelihood of the test data, we use the apparent prevalence rate, which is the probability of a test being positive taking into account both prevalence and test imperfections, rather than the true prevalence. Lastly, we also model uncertainty in our estimates of the sensitivity and specificity using data on the number of true positives (tp), true negatives (tn), false positives (fp), and false negatives (fn) from a validation study of the antibody test.

\[ pt_j =logit^{-1}(X_j\beta); \beta\sim MVN(0,\sigma I)\] \[ pa_j = se*pt_j+(1-sp)*(1-pt_j) \] \[ se \sim binom(tp, tp+fn); sp \sim binom(tn, tn+fp)\] \[ y_i \sim bern(pa_{j[i]}) \] For a complete Bayesian model, we also need to add priors for sensitivity and specificity.

Results from a Monte Carlo Simulation

Theory is all well and good, but how do the two approaches compare when using data? To test this, I ran a simple Monte Carlo simulation using code borrowed from Kennedy and Gabry’s MRP tutorial. (And big thanks to them for letting me copy their code!)

Surprisingly, the naive approach actually did slightly better (in terms of average absolute deviation from the true seroprevalence) when it came to estimating overall seroprevalence. This is especially surprising since the data generating process used for the simulations is almost identical to my MRP model. The modified MRP process does much better when estimating subgroups (the Rogan and Gladen estimates for subgruops are often negative, which happens sometimes) but clearly, given the additional hassle of generating the code, the modified MRP approach is only worth it if you really want to estimate subgroup effects.

For people interested in using this code

All code for this analysis can be found here. If you looking to copy and adapt the code, start with the R notebook “Estimate seroprevalence” in the above repo. In that notebook, I fit the modified MRP approach in two different ways: using raw Stan code and using the brms package (with some custom code to extend the package). If you would like to use the more complicated modified MRP approach, I strongly recommend you use the brms package. If you use the brms package, you should be able to copy and paste the code I created to define a “custom family” for the brms package and then modify the code in the main call to brm to suite your data. Since brms uses the lme4 syntax for defining multi-level models, customizing this code hopefully shouldn’t be too hard. By contrast, I find that modifying raw Stan code always takes quite a bit of time.


Gelman, Andrew, and Bob Carpenter. 2020. “Bayesian Analysis of Tests with Unknown Specificity and Sensitivity.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 69 (5): 1269–83.
Reiczigel, J., J. Földi, and L. Ózsvári. 2010. “Exact Confidence Limits for Prevalence of a Disease with an Imperfect Diagnostic Test.” Epidemiology and Infection 138 (11): 1674–78.
Rogan, Walter, and Beth Gladen. 1978. “Estimating Prevalence from the Results of a Screening Test.” American Journal of Epidemiology 107 (1): 71–76.