
Number 48 July 1991
A new slide show entitled Water Quality - The Challenge is Crystal Clear explains the role of the Cooperative Extension Service (CES) in the USDA water quality program. The 56-slide show covers ways in which CES is cooperating with other agencies on the water quality program; CES program components such as the Hydrologic Unit Areas, Demonstration Projects, Regional Initiatives, and Special Projects; staff training and development in water quality; and water quality components of ongoing programs. Intended audiences for the show include CES staff, other agency staff, local officials, agricultural producers, and other citizens. CES encourages duplicating the slides and incorporating them into other state or local water quality slide presentations.
To obtain a slide set, contact the Water Quality Coordinator for CES at the state level or Charles H. Ullery, National Program Leader, Water Quality, Extension Service-USDA, Room 3348, South Bldg., Washington, DC 20250, Tel: 202-447-5369.
In May 1991, Jean Spooner, Assistant Professor and Water Quality Extension Specialist with the Water Quality Group, was promoted to Group Leader.
Jean is an applied statistician who has worked with the Water Quality Group for the past seven years. Among her many duties, Jean has analyzed water quality data for Rural Clean Water Program (RCWP) and other nonpoint source projects. She has authored numerous papers on RCWP, monitoring techniques, and data analysis. Her technical notes are a regular feature of NWQEP NOTES and well-known to NOTES readers. Jean also provides technical assistance to Section 319 grant recipients on water quality monitoring designs and data analysis, work made possible through U.S. EPA grants. She served as Interim Director of the Water Quality Group from November 1990 until her appointment as Group Leader.
Jean holds a B.S. in agronomy from Cornell University, a M.S. in soil science (minor in statistics) from NCSU, a M.S. in applied statistics from Utah State University, and a Ph.D. in soil science (minor in statistics) from NCSU.
A warm welcome to Jean in her new role!
TECHNICAL NOTES
This article is the second in a two-part series on censored data values. Part 1 discussed the effect of censored data values on statistical trend analyses (see NWQEP NOTES No. 47, May 1991). This issue discusses methods of handling censored values to yield valid statistical results and methods of handling multiple detection limits.
Censored Data Values:
Description and Effect of Censoring on Statistical Trend Analyses
(Part 2)
Jean Spooner
NCSU Water Quality Group
Most researchers agree that there is no perfect way to handle censored data in statistical trend analyses. When comparing methods, most authors select as `best' the method(s) that yield minimum bias and root mean square error (RMSE) for the sample estimators of the mean and variance. Gilliom and Helsel (1986) also examined the most reliable methods to estimate the median and the interquartile range.
Gilliom et al. (1984) recommended reporting and using the actual measured concentrations even if they are below the detection limit (DL), are negative, and have high variability. Monte Carlo simulations confirmed that using non-censored values results in the most powerful and reliable method for trend detection. They used the linear nonparametric Kendall's tau trend detection technique. In the simulation performed by Gilliom et al. (1984), most of the data: 1) were assumed to be below the detection limit, 2) followed a lognormal distribution with varying scenarios of error, and 3) had varying levels of simulated trend. Not surprisingly, the authors found that the "detrimental effects of censoring increase with increasing reliability of data." The authors recommended that instrument readings beyond the DL that are usually censored be reported and used for statistical trend testing, despite the fact that censoring may be required by policy makers prior to public release of the data. Porter et al. (1988) supported the `no censoring rule' defended by Gilliom and Helsel (1986), but stated that an estimation of the observation error also should be required for use in statistical analysis of the data.
Substitution and deletion methods assign the same arbitrary replacement value to each observation that was censored or set the values to missing (i.e., delete the observations). This results in a loss of information regarding the true concentration values. In most cases, this loss of information is unavoidable because the only reliable information known is that the true concentration value lies somewhere between zero and the DL.
Setting the DL values to missing results in a tremendous loss of information and may cause bias in the analysis. For example, the sample mean will be overestimated (positive bias) and the sample variance will be underestimated (negative bias). Therefore, setting the values to missing is usually not desirable.
Using the DL as the replacement value is not widely accepted either because the real value is known to be less than this value and, therefore, any statistical analysis will be biased. For example, the mean will be overestimated and the variance will be underestimated. A replacement value of zero is not appropriate either because this will skew the data excessively, resulting in an underestimation of the mean and an overestimation of the variance.
The most common method used by researchers to handle left-censored data (data sets censored at low values) is to substitute a value halfway between zero and the detection limit (DL/2) for the censored observation in the statistical analysis (see for example, Zirschky et al., 1985). This results in an unbiased estimate of the sample mean but a biased estimate of sample variance (Gilbert, 1987, p. 178). However, Newman et al. (1989) showed that when censoring intensity was greater than 20%, estimates of the mean were biased downward and estimates of the variance were biased upward when using the DL/2 replacement value. The degree of bias in the mean and standard deviation estimates is related to the population variance and mean values (Newman and Dixon, 1990) and the censoring intensity.
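The substitution rules discussed above can be compared on a small invented sample. The sketch below is only an illustration: the ten "true" concentrations, the detection limit of 1.0, and the function names substitute and summarize are all assumptions made for this example, and a single sample cannot demonstrate bias in the statistical sense.

```python
from statistics import mean, variance

def substitute(values, dl, replacement):
    """Replace every value below dl; replacement=None deletes it instead."""
    kept = [v for v in values if v >= dl]
    n_censored = len(values) - len(kept)
    if replacement is None:
        return kept
    return [replacement] * n_censored + kept

def summarize(values):
    """Return (sample mean, sample variance)."""
    return mean(values), variance(values)

# Hypothetical "true" concentrations; the four values below DL would
# normally be reported only as "<1.0".
TRUE_VALUES = [0.2, 0.4, 0.7, 0.9, 1.3, 1.8, 2.5, 3.1, 3.6, 4.2]
DL = 1.0
RULES = {"missing": None, "DL": DL, "zero": 0.0, "DL/2": DL / 2}

if __name__ == "__main__":
    print("true    ", summarize(TRUE_VALUES))
    for name, replacement in RULES.items():
        print(f"{name:8s}", summarize(substitute(TRUE_VALUES, DL, replacement)))
```

For this particular sample, deleting the censored values or replacing them with the DL raises the mean and lowers the variance, replacing them with zero does the opposite, and DL/2 falls in between.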
Several methods were given by Gilbert (1987) to calculate an unbiased estimate of the sample mean. These methods require that the data be sampled from a normal or lognormal distribution and contain censored values. The data set can be ordered and an equal number of observations can be "trimmed" (i.e., deleted) from both the censored values and the high data values. The sample mean is then calculated from these trimmed values. Alternatively, Winsorizing the data set (Dixon and Tukey, 1968) consists of replacing data in both tails of the data series with the next most extreme value in each tail and computing the mean and standard deviation of the new data set.
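Trimming and Winsorizing as described above can be sketched in a few lines. This is a minimal illustration, assuming a single detection limit, censored observations coded with the DL as a placeholder so that they sort to the low end, and k equal to the number of censored observations; the function names are hypothetical. Any adjustment that a published procedure applies to the Winsorized standard deviation is not shown.

```python
from statistics import mean

def trimmed_mean(values, k):
    """Mean after deleting the k smallest and k largest observations."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this sample size")
    return mean(ordered[k:len(ordered) - k])

def winsorize(values, k):
    """Replace the k smallest and k largest observations with the
    next most extreme value remaining in each tail."""
    ordered = sorted(values)
    n = len(ordered)
    if 2 * k >= n:
        raise ValueError("k is too large for this sample size")
    low = ordered[k]            # smallest value kept from the low tail
    high = ordered[n - k - 1]   # largest value kept from the high tail
    return [low] * k + ordered[k:n - k] + [high] * k

def winsorized_mean(values, k):
    """Mean of the Winsorized data set."""
    return mean(winsorize(values, k))

if __name__ == "__main__":
    # Two observations were reported as "<1.0" and are coded here as 1.0.
    sample = [1.0, 1.0, 1.3, 1.8, 2.5, 3.1, 3.6, 4.2]
    print(trimmed_mean(sample, 2), winsorized_mean(sample, 2))
```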
In general, if the censoring intensity is low, the use of DL/2 yields reasonable results from statistical trend tests.
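As an illustration of how substituted values enter a trend test, the sketch below computes the ordinary Mann-Kendall statistic (the test underlying Kendall's tau trend detection) with the standard tie correction. It is not the seasonal form or any censoring-specific extension; the function name and the example series are assumptions. Once censored observations share one replacement value such as DL/2, they simply appear as ties.

```python
import math

def mann_kendall(series):
    """Return (S, Var(S), Z) for a time-ordered series, with tie correction."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)   # sign of the later-minus-earlier difference
    # Count the size of each group of tied values.
    counts = {}
    for v in series:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in counts.values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)     # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, var_s, z

if __name__ == "__main__":
    # Three early observations were "<1.0" and are entered as DL/2 = 0.5.
    print(mann_kendall([0.5, 0.5, 0.5, 1.3, 1.8]))
```

The more observations that are tied at the replacement value, the smaller Var(S) becomes and the fewer pairs contribute to S, which is one way to see why heavy censoring weakens the test.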
Most of the water quality studies comparing various methods of handling censored values have dealt only with the estimation of the mean and variance (see, for instance, Owen and DeRouen, 1980; Helsel, 1986; Gilliom and Helsel, 1986; Helsel and Gilliom, 1986; Newman et al., 1989; El-Shaarawi, 1989; Gilbert, 1987). Most of these methods provide only estimates of the mean and standard deviation and do not provide substitute values for the individual censored observations that can be used in subsequent linear trend tests. Of course, the mean and standard deviation estimate would be useful by themselves in the Student's t-test and the analysis of variance for the detection of trends. One method described below can also be used to generate values (based on an estimated probability density function) which can be substituted for the DL reported values and used in trend analyses such as linear regression and time series analyses.
A lognormal probability plot can be used to calculate the sample mean and standard deviation based on the 50th percentile value and the slope, respectively (Gilbert, 1987). The probabilities are based on the total number of samples, but only the values for the noncensored observations are plotted. This method is closely related to the regression of lognormal order statistics described below as the LR method.
The two basic approaches used to account for censored data values using probability theory are: 1) creation of random values to substitute for the censored values based on the underlying population distribution; and 2) maximum likelihood estimation (MLE). The MLE techniques involve theoretical corrections to the sample mean and variance calculated from the noncensored data (Cohen, 1959). These two approaches are discussed in more detail below.
Helsel (1986), Gilliom and Helsel (1986), and Helsel and Gilliom (1986) compared several methods for accounting for censored data values based on probability distribution theory including those that created random substitution values based on the assumption of the underlying distribution. Two of the methods used by these authors, the UN and the LR methods, are the most common methods employed in conjunction with probability distribution theory.
The UN method was based on the assumption that the censored values follow a uniform distribution from the zero-to-censoring level. The UN method will provide the identical sample mean value as that obtained using the DL/2 replacement value for censored observations, but the sample variance will have less bias (Helsel, 1986).
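The uniform assumption can be sketched as follows. To keep the example deterministic, the censored observations are spread evenly between zero and the censoring level rather than drawn at random; that placement, the function name un_fill, and the sample values are assumptions of this sketch and not necessarily the published procedure.

```python
from statistics import mean, variance

def un_fill(observed, n_censored, dl):
    """Return the full data set with censored values spread evenly on (0, dl)."""
    # Midpoints of n_censored equal-width bins between 0 and dl;
    # their average is exactly dl / 2.
    fills = [dl * (i + 0.5) / n_censored for i in range(n_censored)]
    return fills + list(observed)

if __name__ == "__main__":
    observed = [1.3, 1.8, 2.5, 3.1, 3.6, 4.2]    # hypothetical quantified values
    un_data = un_fill(observed, 4, 1.0)          # four observations were "<1.0"
    half_data = [0.5] * 4 + observed             # DL/2 substitution, for comparison
    print(mean(un_data), mean(half_data))
    print(variance(un_data), variance(half_data))
```

With this placement the two means agree, while the variances differ because the filled-in values are no longer all identical.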
The LR method is based on the assumption that the censored observations follow the zero-to-censoring level portion of a lognormal distribution obtained by a least-squares regression between the logarithms of the noncensored concentration observations and their normal scores (Helsel, 1986; Helsel and Gilliom, 1986). This method can be visualized by the lognormal probability plot described above. The method is equivalently called "regression on order statistics" or "normal scores" by other authors.
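A minimal sketch of the regression idea for a single detection limit is given below. Several details are assumptions made for illustration: the Blom plotting position (i - 3/8)/(n + 1/4) used to obtain normal scores, the function name lr_fill, and the sample values. Published versions of the method may use a different plotting position.

```python
import math
from statistics import NormalDist, mean

def lr_fill(observed, n_censored):
    """Fit ln(concentration) against normal scores using the noncensored
    values only, then extrapolate the line to the censored ranks.

    Returns (fill values for the censored observations, intercept, slope).
    """
    obs = sorted(observed)
    n = len(obs) + n_censored
    std_normal = NormalDist()
    # Normal scores for all n ranks; censored values occupy the lowest ranks.
    scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    z_obs = scores[n_censored:]
    y_obs = [math.log(v) for v in obs]
    z_bar, y_bar = mean(z_obs), mean(y_obs)
    sxy = sum((z - z_bar) * (y - y_bar) for z, y in zip(z_obs, y_obs))
    sxx = sum((z - z_bar) ** 2 for z in z_obs)
    slope = sxy / sxx                  # estimates the log-scale standard deviation
    intercept = y_bar - slope * z_bar  # estimates the log-scale mean
    fills = [math.exp(intercept + slope * z) for z in scores[:n_censored]]
    return fills, intercept, slope

if __name__ == "__main__":
    observed = [1.3, 1.8, 2.5, 3.6, 5.0, 8.1]    # hypothetical quantified values
    fills, intercept, slope = lr_fill(observed, 4)
    print(fills, mean(fills + observed))
```

The extrapolated values are not forced to lie below the detection limit; a value at or above it is a sign that the fitted line describes the low tail poorly for that data set.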
The maximum likelihood estimates (MLE) for the population mean and variance are based on the assumption that the underlying population is normal or lognormal (Cohen, 1959). Maximum likelihood estimates can be obtained through an iterative process. However, Cohen (1959, 1961) developed simplified formulas and tables such that the sample mean and variance calculated from the noncensored data can be easily corrected to yield unbiased estimates of the population mean and variance. This method is summarized by Gilbert (1987, p. 182-183). When the population is not normal or lognormal, the mean and variance estimates obtained using this method can be very biased (Gilliom and Helsel, 1986; El-Shaarawi, 1989; Newman et al., 1989).
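One possible iterative process is sketched below. It uses an expectation-maximization (EM) iteration for a normal sample left-censored at a single limit, which is a swapped-in technique chosen for brevity and not Cohen's tabulated correction. For lognormal data the inputs are the logarithms of the concentrations and of the detection limit. The function name, starting values, and tolerance are assumptions.

```python
import math
from statistics import NormalDist, mean, pstdev

STD_NORMAL = NormalDist()

def censored_normal_mle(observed, n_censored, limit, tol=1e-10, max_iter=1000):
    """EM estimates of (mu, sigma) for a normal sample in which
    n_censored values are known only to lie below limit."""
    n = len(observed) + n_censored
    sum_y = sum(observed)
    sum_y2 = sum(v * v for v in observed)
    mu, sigma = mean(observed), pstdev(observed)   # starting values
    for _ in range(max_iter):
        # E-step: first two moments of a normal value truncated above at limit.
        alpha = (limit - mu) / sigma
        lam = STD_NORMAL.pdf(alpha) / STD_NORMAL.cdf(alpha)
        e1 = mu - sigma * lam
        e2 = sigma ** 2 * (1 - alpha * lam - lam ** 2) + e1 ** 2
        # M-step: update from the expected sums and sums of squares.
        new_mu = (sum_y + n_censored * e1) / n
        new_sigma = math.sqrt((sum_y2 + n_censored * e2) / n - new_mu ** 2)
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma

if __name__ == "__main__":
    # Hypothetical data: six quantified values and four reported as "<1.0".
    logs = [math.log(v) for v in [1.3, 1.8, 2.5, 3.6, 5.0, 8.1]]
    mu, sigma = censored_normal_mle(logs, 4, math.log(1.0))
    # Back-transform to the mean of a lognormal distribution.
    print(mu, sigma, math.exp(mu + sigma ** 2 / 2))
```

The estimates are on the log scale; the last line applies the usual lognormal relation, mean = exp(mu + sigma^2/2), which can itself be sensitive to errors in sigma.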
El-Shaarawi (1989) showed, using both Monte Carlo simulations and actual water quality data, that when the data distribution is lognormal, the MLE corrected for bias is superior to the LR method. For cases in which the distribution is not lognormal, the author offers a modification of the MLE that was found to be superior to the LR method. However, the LR method was found to be best for small sample sizes from a lognormal distribution.
Gilliom and Helsel (1986) showed that if the parent distribution of a data set is known, the distribution of the censored data can be estimated allowing for calculation of the best estimate of the mean, median, variance, and interquartile range. This is better than guessing at the appropriate distribution. However, the LR method was shown to be the most robust method for estimating the mean and variance when the population distribution was unknown.
Helsel (1986), Gilliom and Helsel (1986), and Helsel and Gilliom (1986) compared several theoretical distributions using Monte Carlo simulations and verified their results with measured water quality data. The two best methods (UN and LR) performed equally well in producing the most unbiased estimate of the mean, variance, median, and interquartile range. The LR method was the most robust method and yielded the most unbiased estimator regardless of the theoretical distribution of the data series. However, the minimum variance estimators were the LR for the mean and standard deviation and the logarithmic MLE for the median and interquartile range. (The UN method is similar to using one-half the detection limit.)
Newman et al. (1989) performed simulations on normally and lognormally distributed water quality data that were artificially left-censored with varying intensities to compare methods used to calculate the mean and standard deviation. They found that maximum likelihood and regression on expected order statistics performed about the same, while the methods that replaced the censored values with the DL, DL/2, zero, or missing were biased in both the mean and standard deviation estimates. However, based on visual inspection of the results presented by the authors, the use of DL/2 was comparable to the use of MLE or order statistics when the censoring intensity was below 20%. Newman et al. (1989) and Newman and Dixon (1990) offer a public domain PC program called UNCENSOR. In this program the mean and standard deviation are calculated based on several popular techniques.
Methods for dealing with multiple detection limits in trend analysis have been investigated by several authors (Hughes and Millard, 1988; Millard and Deverel, 1988; Latta, 1981; Helsel and Cohn, 1988). All censored data are usually corrected to 1/2 of the DL or to the DL of the least sensitive analytical procedure (see, for example, Hirsch et al., 1982). This provides a closer comparison between samples and time periods, but results in a large loss of information. Hughes and Millard (1988), Millard and Deverel (1988), and Latta (1981) proposed and tested several nonparametric procedures that can be used to minimize the loss of information that occurs when the DL changes during the course of a study. These procedures are valid to test for trends over time.
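The common practice of recensoring everything to the DL of the least sensitive procedure can be sketched as follows, which also makes the loss of information visible. The record layout (value, is_censored), with the value holding the DL for censored entries, and the function name recensor are assumptions for this example.

```python
def recensor(records, common_dl=None):
    """Censor every record at one common detection limit.

    records is a list of (value, is_censored) pairs; for a censored pair
    the value is the detection limit that applied to that sample.
    """
    if common_dl is None:
        # Default to the highest (least sensitive) DL present in the data.
        common_dl = max((v for v, censored in records if censored), default=None)
    if common_dl is None:
        return list(records)          # nothing is censored, nothing to do
    out = []
    for value, censored in records:
        if censored or value < common_dl:
            out.append((common_dl, True))
        else:
            out.append((value, False))
    return out

if __name__ == "__main__":
    # Hypothetical series in which the DL changed from 0.5 to 1.0 during the study.
    series = [(0.5, True), (0.3, False), (1.0, True), (0.8, False), (2.4, False)]
    print(recensor(series))
```

In this invented series the quantified values 0.3 and 0.8 become nondetects after recensoring, so only one of five observations remains quantified.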
An extension to the seasonal Kendall's test for trend was proposed by Hughes and Millard (1988). The procedure is based on the calculation of `expected ranks' (based on permutation theory) which are used as input into the seasonal Kendall tau statistic.
Millard and Deverel (1988) compared methods to account for changes in the DL between samples when performing the nonparametric two-sample Wilcoxon Rank Sum test. Methods examined included the common methods used in survival analysis and failure-time data which are right-censored. In addition, these authors used the normal score tests described above. Using the criteria of valid significance levels and maximum power from Monte Carlo simulations, they found that the best data rank method was based on the normal scores, assuming a lognormal distribution and a permutation variance estimate. The permutation variance is derived assuming that all permutations are equally possible in each sample (Latta, 1981). However, the normal scores statistic found by Millard and Deverel (1988) to be the best for water quality data series is not commonly used in survival analysis.
Latta (1981) studied the effect on the two-sample test when the distributions were lognormal, Weibull, or exponential. Latta (1981) found the best data rank test to be based on an asymptotic variance estimator. This estimator is based on the log likelihood of the rank vector. Latta (1981) did not test the normal scores method examined by Millard and Deverel (1988).
The adjustments for multiple DL censored data sets just discussed for use with trend tests assume that the pattern of censoring is random (the pattern is not dependent on the trend variable, such as time or space, as would be the case under improving laboratory techniques). Both Millard and Deverel (1988) and Hughes and Millard (1988) offered modifications (not given here) for their calculations based on conditional permutation distributions that can be used when the change in DL is dependent on time.
The procedures for handling multiple DL censored data sets proposed by Hughes and Millard (1988) and Millard and Deverel (1988) have several major limitations. Most of the procedures assume that the data series has constant variance. However, since the variability is usually greater near the detection limit, this assumption would be violated. These procedures also assume that the data are independent.
As Millard and Deverel (1988) noted, the software packages of SAS and BMDP have procedures which test for median group differences based on multiple DL right-censored data. These procedures were developed for failure-time analysis, but can be used for left-censored data common with water quality data by multiplying the data by -1 and changing the sign of the resulting test statistics. The applicable SAS procedure is PROC LIFEREG (SAS Institute, 1985, Chapter 21).
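The sign change described above is simple to prepare before passing data to a survival-analysis routine. The sketch below shows only that data preparation step; the record layout and function names are assumptions, and no statistical package is called.

```python
def flip_for_survival(records):
    """Convert left-censored records to right-censored form.

    records is a list of (value, is_censored) pairs, where a censored
    value v means "true value < v". After multiplying by -1, the same
    record means "true value > -v", which is right-censoring.
    """
    return [(-value, censored) for value, censored in records]

def unflip_statistic(statistic):
    """Restore the original direction of a signed test statistic."""
    return -statistic

if __name__ == "__main__":
    # Hypothetical records: "<1.0", 2.4, "<0.5", 3.1
    records = [(1.0, True), (2.4, False), (0.5, True), (3.1, False)]
    print(flip_for_survival(records))
```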
Cohen, A.C., Jr. 1959. Simplified estimators for the normal distribution when samples are single censored or truncated, Technometrics 1(3):217-237.
Cohen, A.C., Jr. 1961. Tables for maximum likelihood estimates: Singly truncated and singly censored samples, Technometrics 3:535-541.
Dixon, W.J. and J.W. Tukey. 1968. Approximate behavior of the distribution of Winsorized t (Trimming/Winsorization 2), Technometrics 10:83-98.
El-Shaarawi, A.H. 1989. Inferences about the mean from censored water quality data, Water Resources Research 25(4):685-690.
Gilbert, R.O. 1987. Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold Company, New York, New York. 320 p.
Gilliom, R.J. and D.R. Helsel. 1986. Estimation of distributional parameters for censored trace level water quality data. 1. Estimation techniques, Water Resources Research 22(2):135-146.
Gilliom, R.J., R.M. Hirsch, and E.J. Gilroy. 1984. Effect of censoring trace-level water-quality data on trend-detection, Environmental Science and Technology 18(7):530-535.
Helsel, D.R. 1986. Estimation of distributional parameters for censored water quality data. p. 137-157. In: Statistical Aspects of Water Quality Monitoring. A.H. El-Shaarawi and R.E. Kwiatkowski (Eds.). Elsevier Publishers, New York.
Helsel, D.R. 1987. Advantages of nonparametric procedures for analysis of water quality data. Hydrological Sciences J. 32(2):179-190.
Helsel, D.R. and T.A. Cohn. 1988. Estimation of descriptive statistics for multiply censored water quality data, Water Resources Research 24(12):1997-2004.
Helsel, D.R. and R.J. Gilliom. 1986. Estimation of distributional parameters for censored trace level water quality data. 2. Verification and applications, Water Resources Research 22(2):147-155.
Hirsch, R.M., J.R. Slack, and R.A. Smith. 1982. Techniques of trend analysis for monthly water quality data, Water Resources Research 18(1):107-121.
Hughes, J.P. and S.P. Millard. 1988. A tau-like test for trend in the presence of multiple censoring points, Water Resources Bulletin 24(3):521-531.
Latta, R.B. 1981. A Monte Carlo study of some two-sample rank tests with censored data, J. American Statistical Association 76(375):713-719.
Millard, S.P. and S.J. Deverel. 1988. Nonparametric statistical methods for comparing two sites based on data with multiple nondetect limits, Water Resources Research 24(12):2087-2098.
Newman, M.C. and P.M. Dixon. 1990. UNCENSOR: A program to estimate means and standard deviations for data sets with below detection limit observations, American Environmental Laboratory 2(2):26-30.
Newman, M.C., P.M. Dixon, B.B. Looney, and J.E. Pinder, III. 1989. Estimating mean and variance for environmental samples with below detection limit observations, Water Resources Bulletin 25(4):905-916.
Owen, W.J. and T.A. DeRouen. 1980. Estimation of the mean for lognormal data containing zeroes and left-censored values, with applications to the measurement of worker exposure to air contaminants, Biometrics 36(4):707-719.
Porter, P.S., R.C. Ward, and H.F. Bell. 1988. The detection limit: Water quality monitoring data are plagued with levels of chemicals that are too low to be measured precisely, Environmental Science and Technology 22(8):856-861.
SAS Institute Inc. 1985. SAS User's Guide: Statistics, Version 5 Edition. SAS Institute Inc., Cary, North Carolina. 956 p.
Zirschky, J., G.P. Keary, R.O. Gilbert, and E.J. Middlebrooks. 1985. Spatial estimation of hazardous waste site data, J. Environmental Engineering 111(6):777-789.
The Soil Conservation Service (SCS) and the Agricultural Research Service (ARS) are evaluating water quality computer models as a part of the USDA Water Quality Initiative. The purpose of the evaluation is to determine which models SCS should support. Through the evaluation, the two agencies hope to determine the technical reliability of each model (does the model produce reasonable information) as well as to evaluate its usability and utility (is the model user-friendly and does its output meet SCS needs).
Four models were selected for initial testing: two watershed-scale models (AGNPS and SWRRBWQ) and two field-scale models (EPIC and NLEAP). The NPURG procedure is also being evaluated. Additional models will be tested in FY92.
The models will be run at eleven field test sites representing a variety of water quality problems. The sites include 9 Hydrologic Unit Areas and 2 Demonstration Projects located in 11 states.
Conservation Tillage Fact Sheets
The Conservation Tillage Educational Planning Group has published 15 fact sheets on conservation tillage. The fact sheets are part of an effort by federal agencies and conservation groups to help farmers comply with the soil conservation provisions of the farm bill. Contributors to the Group include U.S. EPA, U.S.D.A. (Soil Conservation Service and Cooperative Extension Service), the Conservation Technology Information Center, and the Midwest Plan Service.
Copies of the fact sheets are available from: CTIC, 1220 Potter Dr., Room 170, West Lafayette, IN 47906-1334. Quantities of the fact sheets are available at $0.03 per piece. There is a minimum charge of $3 per order.
New Guide Will Help Farmers Use Sustainable Options in 1990 Farm Bill
A new Farm Program Options Guide prepared by the Sustainable Agriculture Working Group explains each sustainable agriculture-related program of the 1990 Farm Bill. The reader is taken through a step-by-step process for assessing whether each option makes economic and stewardship sense from a farmer's point of view. Some of the program options examined in the guide include: 1) integrated farm management program options, 2) Conservation Reserve Program (CRP) changes, and 3) the Water Quality Incentive Program. The guide is available ($3 per copy) from the Center for Rural Affairs, P.O. Box 405, Walthill, NE 68067, Tel: (402) 846-5428.
Video Trains Citizen Stream Monitors
A new 28-minute video describing the Izaak Walton League of America's Save Our Streams (SOS) program is available for training citizens to become active stream monitors. The SOS program involves a simple, scientific way to trap, identify, and record stream life to determine water quality. Based on the biota found, a water quality rating for the stream can be determined.
The video provides a hands-on demonstration of stream monitoring techniques and discussion of water quality issues for citizens including: 1) stream pollution problems, 2) stream sampling, 3) identification of stream organisms, 4) use of stream survey, and 5) how to adopt a stream.
Copies of the video may be purchased ($15) from: SOS Video, Izaak Walton League of America, 1401 Wilson Blvd., Level B, Arlington, VA 22209 (make checks payable to IWLA).
Conservation Tillage Symposium Proceedings
Proceedings of the Great Plains Conservation Tillage Symposium, held Aug 21-23, 1990, in Bismarck, ND, are available. The book includes reports by 36 researchers on studies of conservation tillage, seedbed preparation, soil fertility, water quality, and sustainable agriculture in the Great Plains from Texas to the Canadian Provinces.
Copies of the 305-page proceedings may be obtained ($10) from Dr. Carl Fanning, Plant Science-Soils, 203 Waldron Hall, North Dakota State University, Fargo, ND 58105 (checks payable to Conservation Tillage).
Comprehensive Bibliography of Water Quality Educational Materials
A comprehensive bibliography of educational materials relating to water quality has been prepared by the Washington State University Extension Service. The bibliography is available on 3 1/2 or 5 1/4-inch diskettes and lists more than 400 resources, including ground and surface water best management practices applicable to both urban and rural areas.
To obtain a free copy, send a blank diskette to: Chris Feise, Coordinator, Water Quality, WSU Puyallup Research and Extension Center, 7612 Pioneer Way East, Puyallup, WA 98371-4998. (A limited number of printed copies are available while they last.)
Bibliography of Western U.S. Macroinvertebrates
The Idaho Department of Health and Welfare recently published a bibliography authored by William H. Clark entitled Literature Pertaining to the Identification and Distribution of Aquatic Macroinvertebrates of the Western U.S. with Emphasis on Idaho. The 59-page bibliography may be requested by contacting William H. Clark, Water Quality Bureau, Division of Environmental Quality, Idaho Dept. of Health and Welfare, 1410 North Hilton St., Boise, ID 83720-9000.
Video Addresses Farm Chemicals and Water Quality
A 12-minute video entitled Our Bread and Water has been produced by the National Council of Farmer Cooperatives. The video, targeted to a citizen audience, explains the role of farm chemicals in today's farming operations and strives to dispel the myth that farmers' practices are driven entirely by economics. Individuals interviewed discuss the use of Best Management Practices and Integrated Pest Management. The video (VHS) can be purchased from: Don Hanes or Norvalla Reid, National Council of Farmer Cooperatives (202) 626-8700. Cost is $20.
NWQEP NOTES is issued bimonthly. Subscriptions are free (contact: Publications Coordinator at the address below or via email at wq_puborder@ncsu.edu). A list of publications on nonpoint source pollution distributed by the NCSU Water Quality Group is included in each hardcopy issue of the newsletter.
I welcome your views, findings, information, and suggestions for articles. Please feel free to contact me.
Judith A. Gale, Editor
Water Quality Extension Specialist
North Carolina State University Water Quality Group
Campus Box 7637
North Carolina State University
Raleigh, NC 27695
Tel: 919-515-3723
Fax: 919-515-7448
email: notes_editor@ncsu.edu