*** Q. Do data gaps mean that we have some information but try to apply it sensibly to the areas where we don't have information, or does it refer solely to the fact that we just don't have specifics for each individual and don't have a magic crystal ball with the future in it?
A. I put a slide show in the Module 10 Closure file, which you can access. (After the show you'll have to find your way back here; the trick to linking them eludes me.) The first 10 slides explain this better, using technical terms I've tried to avoid. Show from here. I've tried to just skim the surface of sampling and data analysis. When I teach this in the classroom I have a colleague, a Ph.D. chemist type, who gives that class. It can get very involved. The take-home message is that you need to prepare and plan for the sampling. Planning includes a careful review of the regulations, coordination with the testing labs, and coordination with the agencies that are responsible for accepting your final report.
*** Q. There were lots of words and terms in the paper by Ames and Gold that I have never heard of. But one question is torturing me the most: Who is Rachel Carson?
A. Rachel Carson wrote a book published in 1962, Silent Spring. The book was a benchmark in the public's attitude toward environmental pollution, especially pesticides, and most especially DDT. Here is a great website, http://onlineethics.org/index.html, and an article about Carson: http://onlineethics.org/moral/carson/main.html The Ames and Gold article was poking fun at her apparent main thesis, that chemicals are bad. Her book would have lacked its appeal to the mass media if it had spoken to the technical issues very well; besides, many of them were not well defined in 1962. (That web site might provide a great final exam question.)
*** Q. Please clarify the response you were looking for in your question about the two "commonly used" extrapolations to predict human health effects based on animal testing.
A. See the discussion in 10B. The answer was 1) the extrapolation from high doses to low doses in animals, and 2) the extrapolation from animal testing to humans. This latter is not an "extrapolation" in the mathematical or scientific sense.
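For illustration only, here is a minimal Python sketch of the first kind of extrapolation, assuming a simple straight-line fit through the origin to high-dose animal data. The doses, responses, and the low dose are made-up numbers, and this is not the actual regulatory procedure.

import numpy as np

# Hypothetical high-dose animal bioassay results (made-up numbers).
doses = np.array([10.0, 50.0, 100.0])        # mg/kg-day, high doses tested
extra_risk = np.array([0.05, 0.22, 0.45])    # fraction of animals responding

# Fit a straight line through the origin: risk ~ slope * dose.
slope = np.sum(doses * extra_risk) / np.sum(doses ** 2)

# Extrapolate down to a low, environmentally relevant dose.
low_dose = 0.001  # mg/kg-day
print(f"slope ~ {slope:.4f} per mg/kg-day")
print(f"predicted extra risk at {low_dose} mg/kg-day ~ {slope * low_dose:.2e}")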
*** Q. On the exposures page of submodule 10A, non-detected chemicals were discussed. When it says that many of the samples at the contaminated site do not report a particular chemical, does this mean the value was zero, or that the test for a certain chemical might not have been done?
A. The lab reports should never show zero. They show "ND" for non-detect, or other codes. These mean the sample was presented to the lab. Some samples misfire for one reason or another, and these are also coded. Generally the risk assessor gets all the lab data and then tries to arrange it so those questions do not come up. That can be a big job on a complicated site. Be nice to your chemist.
*** Q. Also, if you plug zeros into a t-test, the UCL will drop. Does this drop mean a decrease, or is the UCL eliminated to zero?
A. Plugging a whole bunch of the same number into a t-test will increase the degrees of freedom and, if the number is close to the average, bring the UCL in. Most environmental data have lots of zeros, and the important limits are not that far from zero. So how you handle the non-detects can skew your data quite a bit. See next.
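To see the effect numerically, here is a minimal Python sketch of how the one-sided 95% UCL on the mean shifts depending on whether non-detects are entered as zero or as half the detection limit. The concentrations, the number of non-detects, and the detection limit are made-up assumptions for illustration.

import numpy as np
from scipy import stats

detects = np.array([12.0, 8.5, 15.2, 6.1, 9.8])   # mg/kg, detected results (made up)
n_nd = 10                                          # number of non-detects
dl = 1.0                                           # assumed detection limit, mg/kg

def ucl95(x):
    """One-sided 95% upper confidence limit on the mean, t-statistic."""
    n = len(x)
    return x.mean() + stats.t.ppf(0.95, n - 1) * x.std(ddof=1) / np.sqrt(n)

as_zero = np.concatenate([detects, np.zeros(n_nd)])
as_half_dl = np.concatenate([detects, np.full(n_nd, dl / 2)])

print(f"UCL95 with NDs as 0:    {ucl95(as_zero):.2f} mg/kg")
print(f"UCL95 with NDs as DL/2: {ucl95(as_half_dl):.2f} mg/kg")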
*** Q. What would be an abnormal distribution, something other than a bell curve?
A. The most common is the log-normal. Remember, the normal distribution tapers at both ends, with a more or less even distribution on each side of the mean. Environmental data have a minimum of zero and are often skewed to the low end. This is not a normal distribution, but the log of it is. See page three of Module 12E. The first two figures, although they are talking about risk, are similar to what environmental data often look like. You can clearly see how the arithmetic data are skewed, but the log of the data is normal. You should not use the log directly in the t-test, but there are very similar tests that you can use for the same purpose, namely estimating the confidence that two means are the same.
Another common "abnormal" data set is bimodal. That is, there are two peaks.
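A quick way to see the log-normal behavior, as a sketch with simulated data rather than site data: generate log-normally distributed "concentrations," then compare the skewness of the raw values to the skewness of their logs.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated "concentrations": log-normal, bounded below by zero, long high tail.
conc = rng.lognormal(mean=1.0, sigma=1.0, size=500)

print(f"skewness of raw data: {stats.skew(conc):.2f}")         # strongly positive
print(f"skewness of log data: {stats.skew(np.log(conc)):.2f}")  # near zero, roughly normal
print(f"arithmetic mean {conc.mean():.2f} vs median {np.median(conc):.2f}")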
  *** Q. What is sensitivity analysis?
A. When you have a model, or just an equation, some of the factors (parameters, or unknowns) affect the answer more than others. Sometimes this is obvious: if the price of apples = 0.01X + 100Y^3, the price is much more sensitive to changes in Y than to changes in X (provided both X and Y are greater than 1.0). In other cases it is not obvious, so you play with the model, varying one parameter at a time to determine whether your result changes much. You may find that the parameter you were worried about does not make much difference, so a guess is good enough. Or the model might be very sensitive, and you need to be very careful what you use. Of course some parameters affect others, so that is one reason a mathematician or statistician is a recommended part of the risk assessment team.
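Here is a minimal one-at-a-time sensitivity sketch in Python using the apples equation from the answer; the base-case values of X and Y are made up.

def price(x, y):
    return 0.01 * x + 100.0 * y ** 3

x0, y0 = 5.0, 2.0            # assumed base-case values
base = price(x0, y0)

# Vary one parameter at a time by +10% and see how much the answer moves.
dx = (price(x0 * 1.1, y0) - base) / base * 100
dy = (price(x0, y0 * 1.1) - base) / base * 100

print(f"base price: {base:.2f}")
print(f"+10% in X changes the price by {dx:.4f}%")   # tiny
print(f"+10% in Y changes the price by {dy:.1f}%")   # large: the model is sensitive to Y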
Q. I have one more question regarding Module 10 and the statistics refresher. How do you know what sample size would be appropriate for an assessment?
A. Ideally you "design" a study and consider the "power" of the sampling. I did not go into power, but if you are trying to determine whether two populations are "different," by which we usually mean they have different means, then the greater the difference between their means and the smaller their standard deviations, the fewer samples we will need to tell them apart. Based on our preliminary notions about the populations, we can estimate how many samples are needed.
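As a rough sketch of that idea, using the standard normal-approximation formula and made-up means and standard deviation: the required sample size per group grows as the difference between the means shrinks or the spread grows.

import math
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate samples per group needed to detect a mean difference
    `delta` between two populations with common standard deviation `sigma`
    (two-sided test, normal approximation)."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

# Made-up example: background vs. suspect-area concentrations.
print(n_per_group(delta=5.0, sigma=10.0))   # small difference, wide spread -> many samples
print(n_per_group(delta=20.0, sigma=10.0))  # big difference -> few samples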
For real environmental sampling, we are hampered because the suspect contamination is likely just a portion of our study area. If it is obvious, such as a stain, it is easy to delimit "samples of the stain." If it is not obvious, there may be dozens of "areas" that are contaminated and many that are not. An old drill pad or a gravel contractor's equipment yard might have many stains related to specific spills. What then? From the regulator's viewpoint, if you try to use an average, the site may average below the action levels even though many spots within it would be above the action levels if you sampled those spots. Hot spots are typically the danger to on-site people and cleanup crews.
The EPA has several guidance documents about sampling, using grids and random numbers. You really have to use these if you do not have any known hot spots. If you have known hot spots, you want to sample these separately and distinguish them from the "rest of the site." Then use the grid for the rest of the site. You can get such guidance from RAGS, and there are many others. How many samples per hot spot? You want to be sure to get several at depth; surface, two feet, and four feet are common. I would want at least one sample per 1,000 CY of estimated contamination, but no fewer than 5 from each hot spot.
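Putting that rule of thumb in one line of Python (the "1 per 1,000 CY, at least 5" figures come from the paragraph above and are my rule of thumb, not a regulation):

import math

def samples_per_hot_spot(est_cy):
    """At least one sample per 1,000 CY of estimated contamination,
    but never fewer than 5 per hot spot."""
    return max(5, math.ceil(est_cy / 1000))

print(samples_per_hot_spot(800))    # 5
print(samples_per_hot_spot(12500))  # 13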
The usual scenario is that the regulators just want enough samples to prove there is a problem. Then it is up to the PRP to define the problem. Usually the PRP will present a sampling plan, which the regulators then accept or modify. Here the regulators just want to be sure the PRP samples each hot spot; the PRP will then take lots of samples to find the limits of each hot spot, because otherwise the regulators will have them clean up the whole thing.
For risk assessment for populations distant from the site, on the other hand, 
  it is usually the average contamination at the site that determines the risk, 
  not the hot spots. For these the grids work fine. 
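For the "rest of the site," a gridded layout with a randomized point in each cell might look like this rough sketch; the site dimensions and grid spacing are invented, and a real layout should follow the EPA guidance documents mentioned above.

import random

random.seed(7)
site_x, site_y = 300.0, 200.0   # assumed site dimensions, feet
spacing = 50.0                  # assumed grid spacing, feet

# One random sample location within each grid cell.
points = []
x = 0.0
while x < site_x:
    y = 0.0
    while y < site_y:
        px = x + random.uniform(0, min(spacing, site_x - x))
        py = y + random.uniform(0, min(spacing, site_y - y))
        points.append((round(px, 1), round(py, 1)))
        y += spacing
    x += spacing

print(f"{len(points)} sample locations, e.g. {points[:3]}")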
Q. For the weight-of-evidence classification implemented by the EPA and the IARC, who decides what chemical belongs to what class? Also, who decides what the criteria are?
  A. "Experts." For EPA the process goes something like this: In-house 
  experts decide that there is some need to classify a chemical or change a classification. 
  They get some budget together and contract with a beltway bandit-type company 
  to study the issue and draft a report or summary, based on the primary literature. 
  This report is then reviewed by a committee composed of experts usually hired 
  individually for this task from industry and academia. These experts write a 
  recommendation to the Administrator of the EPA (who I doubt sees it, actually 
  the same EPA people who started do read it.) EPA has a Scientific Review Board 
  also composed of outside experts that then reviews the recommendation. Since 
  these classifications are not regulations per se, I'm not sure what the public 
  comment is. Although I do not have personal knowledge of the people involved, 
  I would bet the selection of "experts" tends to be skewed towards 
  people who give the EPA the answers they want. There are many experts out there, 
  but most are busy people and not available to do this sort of consulting. 
Q. Three methods were discussed for handling the parameter uncertainties. Using the EPA defaults was seen as the cheapest. Is this one used more than the second method? Or is there some sort of combination of the two? Is the third one so expensive that it is often not used?
A. Pay now or pay later. If using the default assumptions results in an answer you like, you are home free. If they result in an expensive cleanup, you want to check them more closely, if you can.
Q. Aroclor 1260 is considered 60% chlorine by weight; what about 1016? Does that have the same naming convention?
  A. Aroclor 1016 is an exception to the naming rule. It is a random mixture of 
  1 through 6 chlorines per molecule and is about 41% chlorine by weight.
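The convention, as a small sketch: for most Aroclors the last two digits of the name give the approximate percent chlorine by weight, with 1016 as the exception noted above.

def aroclor_pct_chlorine(name):
    """Approximate weight-percent chlorine from the Aroclor number.
    For most Aroclors the last two digits are the percent chlorine;
    Aroclor 1016 is the exception, at roughly 41%."""
    if name == 1016:
        return 41
    return name % 100

print(aroclor_pct_chlorine(1260))  # 60
print(aroclor_pct_chlorine(1016))  # 41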
Q. What is BTEX? 
A. Benzene, Toluene, Ethylbenzene, and Xylene. These have 6, 7, and 8 carbons and are light, volatile aromatics. They are all considered toxic.
Q. If the hydrocarbon groupings are useful for regulatory purposes but are not very valuable scientifically, then why are they used? I hope it's not just because it's cheaper.
A. One man's trash is another man's treasure. The toxicity needs to be evaluated for each chemical, which is impractical for hydrocarbons, because most are mixtures that contain hundreds of chemicals. The same is true, but to a lesser extent, for fate and transport. So it is practical to "bundle" certain groups of chemicals together, "gasoline range organics" (GRO) for example, and make some sort of rational decision about the group. For example (I'm making these numbers up), "if the GRO in the soil is less than 100 mg/kg, it is unlikely to be a significant risk to wells more than 500 feet away." If you wanted to be scientific, you would need to analyze that GRO for its constituent chemicals, probably about 30, and do a risk analysis for each. But why bother, unless you are working for a wealthy client or her lawyer, who wants to prove that it is harmful or wants to confuse a jury.
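As a screening sketch using the made-up numbers from the answer (100 mg/kg GRO, wells more than 500 feet away; both figures are illustrative, not real action levels):

def gro_screen(gro_mg_per_kg, well_distance_ft,
               action_level=100.0, min_distance_ft=500.0):
    """Illustrative screen: pass the site unless the GRO result exceeds the
    (made-up) action level or the nearest well is closer than the
    (made-up) minimum distance."""
    if gro_mg_per_kg < action_level and well_distance_ft > min_distance_ft:
        return "unlikely to be a significant risk; no further analysis"
    return "evaluate further (constituent analysis or site-specific risk assessment)"

print(gro_screen(60.0, 800.0))
print(gro_screen(250.0, 800.0))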
Q. Your professional opinion: As you have probably noticed, I have a problem with computer models. Yet I have done some biological and chemical work and have used statistical analysis methods, and I am able to accept what I get from statistics, with some caution. Do you think that there is a large difference between computer models and statistical analysis?
A. OK, see the discussions above regarding sensitivity and error terminology.
Q. Under "Special Chemical", was there any reason why in particular 
  you chose the chemicals you did to discuss? I began thinking about that, and 
  there are many others that you could have mentioned or discussed
so I guess 
  I am asking what was the madness behind it? Was it just because we encounter 
  these chemicals more often? I suppose these are the killers, so to speak.
  A. "...and all the children are above average." We are all special 
  and so are our chemicals. These particular chemicals are mentioned because they 
  are very common chemicals in hazardous waste sites (but not fresh spills) and 
  their risks are computed using a slightly different algorithm then most chemicals. 
Q. Does statistics fully consider the uncertainty in the parameters? Or is it a mere jugglery of numeric data? I think that if it fully depends on the data collected, then our results fully depend on how the data are collected (I mean without any errors). I suspect any small error in data collection may lead to big mistakes in the statistical conclusions. Thus statistics should also consider that.
A. OK, see the discussions above regarding sensitivity and error terminology.