Censored Data – What are they and what do they mean for my project?

What are censored data and what are the implications for your project?  These are two very good questions and the second can have a significant bearing on your management decisions, remedial actions, and approach to regulatory agencies.

Censored data are analytical results reported as below a detection limit, e.g., < 2 ug/L or < 2 ug/m3. These results are commonly called non-detects, and most environmental datasets contain many of them. A non-detect means the concentration is too low for the instrument or analytical method to yield a reliable number. It also means that the result is unlikely to be zero and that some amount of contaminant may be present in the sampled medium.

Historically, non-detects were handled by substituting one-half the detection limit or some other arbitrary value, or in some cases by eliminating them from the analysis altogether. Ignoring non-detects means you are tossing out useful information. Substituted, i.e., fabricated, values create an invasive signal that potentially distorts the meaning of the measured data (Helsel 2012).[1] If you insert artificial values, you are declaring that you know something that you do not. Fabricated values can deform your dataset and give unreliable or incorrect results (Helsel 2012). These invasive data can lead to poor or incorrect decisions about whether remediation is required or whether a cleanup goal has been reached. Ignoring censored results can also stack the odds against you: omitting the low-end values biases the remaining results high. Why report your results as higher than they really are? Doing so can have large, adverse, and unnecessary cost implications, and if you ignore censored data, this is what you may be doing whether you realize it or not. Likewise, if you are not handling censored data properly, you may be saying something that is not true, or at least inaccurate.
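The effect of dropping or substituting non-detects can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the concentrations are simulated (not real sampling results), the lognormal parameters and the 2 ug/L detection limit are hypothetical, and the "true" mean is something you would never know in a real project.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is repeatable

DETECTION_LIMIT = 2.0  # hypothetical detection limit, in ug/L

# Simulated "true" concentrations (assumed lognormal, so every value is > 0).
true_values = [random.lognormvariate(0.9, 0.8) for _ in range(1000)]

# In a lab report, anything below the limit would appear only as "< 2".
detects = [v for v in true_values if v >= DETECTION_LIMIT]
n_nondetects = len(true_values) - len(detects)


def mean_with_substitution(fraction):
    """Mean after replacing each non-detect with fraction * DETECTION_LIMIT."""
    substituted = detects + [fraction * DETECTION_LIMIT] * n_nondetects
    return statistics.mean(substituted)


true_mean = statistics.mean(true_values)
dropped_mean = statistics.mean(detects)  # non-detects thrown away

print(f"share of non-detects: {n_nondetects / len(true_values):.0%}")
print(f"true mean (unknowable in practice): {true_mean:.2f}")
print(f"mean with non-detects dropped: {dropped_mean:.2f}")
for fraction in (0.0, 0.5, 1.0):
    print(f"mean with {fraction:.1f} x limit substituted: "
          f"{mean_with_substitution(fraction):.2f}")
```

Dropping the non-detects necessarily pushes the mean above the true mean, because only low-end values are removed. The substituted means simply follow whichever fraction is chosen, which is the sense in which substitution inserts information you do not actually have.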

Being able to use and understand censored data can be very advantageous when the data are handled properly. Here is an example. The question was whether trichloroethene (TCE) was higher in sub-slab soil vapor in one location versus a second location on the same property. Approximately 40 percent of the sampling results were below detection limits. Visual examination of the numerical results could not discern a difference. A graph of the results (always plot the data regardless) showed a difference, but it was not known whether the difference was significant. Only a proper statistical analysis could determine that, and this was complicated because so much of the data were censored. FLS analyzed the data statistically, employing methods that accounted for the censored data without loss of information. The result was that there was no statistically significant difference in the soil vapor concentrations from the two locations, despite the apparent difference observed by visually examining the graphs.
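To give a feel for what such a comparison can look like, here is a minimal sketch of one nonparametric option for data with a single detection limit: a Wilcoxon rank-sum (Mann-Whitney) test in which every non-detect is treated as tied at the lowest rank. This is not necessarily the method FLS used in the example above. The function name `rank_sum_test`, the sample values, and the 2 ug/m3 limit are all hypothetical, and the p-value uses a tie-corrected normal approximation without a continuity correction.

```python
import math


def rank_sum_test(group_a, group_b, detection_limit):
    """Two-sided rank-sum test for data with a single detection limit.

    Each observation is a (value, is_detected) pair. All non-detects (and any
    value below the limit) are treated as tied below every detected value.
    Returns (z, p_value) from the tie-corrected normal approximation.
    """
    def recensor(observations):
        # One shared placeholder makes all censored results tie at the bottom.
        return [v if detected and v >= detection_limit else -math.inf
                for v, detected in observations]

    a, b = recensor(group_a), recensor(group_b)
    pooled = sorted(a + b)
    n_a, n_b = len(a), len(b)
    n = n_a + n_b

    # Midranks: tied values share the average of the ranks they occupy.
    midrank = {}
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i                             # size of this tie group
        midrank[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        tie_term += t**3 - t
        i = j

    u_a = sum(midrank[v] for v in a) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation tied: no evidence of a difference
        return 0.0, 1.0
    z = (u_a - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return z, p_value


# Hypothetical sub-slab vapor results in ug/m3; False marks a "< 2" non-detect.
location_1 = [(2.0, False), (2.0, False), (2.0, False), (2.0, False),
              (2.6, True), (3.1, True), (4.5, True), (5.2, True),
              (6.8, True), (9.4, True)]
location_2 = [(2.0, False), (2.0, False), (2.0, False), (2.0, False),
              (2.3, True), (2.9, True), (3.3, True), (4.1, True),
              (7.5, True), (12.0, True)]

z, p = rank_sum_test(location_1, location_2, detection_limit=2.0)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Because the test works on ranks, no number is ever invented for a non-detect; the only information used is that each one lies below the limit. With multiple detection limits or small samples, other censored-data methods would be more appropriate than this sketch.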

At FLS we understand how to deal with censored data and can use the information in them to your benefit. Sometimes there can be many non-detect values in your data. Why not put this information to good use instead of throwing it away or letting it lead you astray? FLS can help you with that, and help you get the most out of the costly information you obtained. Being able to use this information can have huge cost implications. Without it, you never know what could have been, or what was done but was not really necessary.

[1] Helsel, D. R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Inc., p. 2.