Comments on Does Statistical Significance Stink?TypePad2011-01-05T06:26:02ZMark Thomahttps://economistsview.typepad.com/economistsview/tag:typepad.com,2003:https://economistsview.typepad.com/economistsview/2011/01/does-statistical-significance-stink/comments/atom.xml/Calvin commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e2017ee56a33c8970d2012-11-20T00:48:48Z2012-11-20T00:48:48ZCalvinhttp://zuwengo.comIt's not always what makes sense logically. Also, stats can be very misleading.<p>It's not always what makes sense logically. Also, stats can be very misleading.</p>Antonio commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c7673654970c2011-01-08T01:13:36Z2011-01-08T01:13:36ZAntoniohttp://omwo.blogspot.com> to pretend that once those priors are plugged in, >the results are somehow more "objective". They >aren't, of course...<p>> to pretend that once those priors are plugged in, >the results are somehow more "objective". They >aren't, of course</p>
<p>It is not about being more "objective", it is about answering the question that is actually being asked -and in a self-consistent way. (The "subjective/objective" red herring is useless unless you define what those words mean mathematically - and once you do it, it really is very simple).</p>
<p>It is reasonable to ask that a method be justified at least within its own theory. </p>
<p>Now, calculating posteriors from priors follows directly from the axioms of bayesian theory.</p>
<p>But calculating p-values and using them to decide on a hypothesis does not follow from anything in classical probability. It is an ad-hoc statistical method. If you want to test the hypothesis then what you want is the probability of that hypothesis given the data. The p-value is *not* that probability, as is well known. Hence, it does not answer the question. It is a probability of something else - of a value of the data even more extreme being observed *given* the hypothesis. This is an ad-hoc method that in some cases is related to the actual probability we care for but can't calculate within the framework, and in other cases it may break down badly. But nobody denies it is an ad-hoc method: it does not follow logically from the axioms, cannot be deduced from them, it is just proposed as a solution that seems to make some sort of sense, *some* of the time. It is used simply because classics can calculate p(data>x|H) but have no way of calculating p(H|data), so they calculate what they can. It is a case of looking for the keys under the lamp because you can see better there, even though that's not where you lost the keys.</p>
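<p>To put the two quantities side by side, a minimal sketch (the coin-flip counts are invented for illustration, and the flat prior is just one possible choice):</p>

```python
from math import comb, lgamma, log, exp

# Invented data: 60 heads in 100 flips; H0 is a fair coin (theta = 0.5).
n, x = 100, 60

# Classical p-value: p(data at least this extreme | H0).
# H0 sits on the RIGHT of the conditioning bar -- this is not the probability of H0.
p_value = sum(comb(n, k) * 0.5**n for k in range(x, n + 1))

# Bayesian route: with a flat Beta(1,1) prior, theta | data ~ Beta(x+1, n-x+1).
# Estimate p(theta > 0.5 | data) by midpoint integration of the posterior density.
a, b = x + 1, n - x + 1
log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)

def density(t):
    return exp(log_norm + (a - 1) * log(t) + (b - 1) * log(1 - t))

steps = 10_000
h = 0.5 / steps
post_tail = sum(density(0.5 + (i + 0.5) * h) for i in range(steps)) * h

print(f"p(data >= 60 | fair coin) = {p_value:.4f}")   # about 0.028
print(f"p(theta > 0.5 | data)     = {post_tail:.4f}") # well above 0.9
```

<p>The two numbers answer different questions, which is the whole point: the first conditions on the hypothesis, the second conditions on the data.</p>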
<p> This part of the problem has nothing to do with the choice of arbitrary cut-off values or priors. And it has nothing to do with bad practitioners (both fields have those, it is another red herring to argue about practitioners). It is about bad *methods*.</p>
<p>Here is another related difficulty exclusive to classics: Since the p-values are not the probabilities of the hypothesis H conditioned on the data, even if you attribute losses to each outcome, you are still limited because you don't know what the probability of each outcome is. Hence you cannot really map your decision costs to the probabilities that matter, only to values whose meaning is not the desired one - as the Jedi would say, "these are not the probabilities you are looking for".</p>
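<p>A hedged sketch of that limitation (all costs and probabilities below are invented): expected loss is a cost-weighted sum, and the weights have to be p(H|data) - a p-value has no seat in this formula.</p>

```python
# Invented posterior probability that the adverse effect is real.
p_H_given_data = 0.12

cost_warn_needlessly = 1.0    # loss if we warn and there is no effect
cost_fail_to_warn    = 50.0   # loss if we stay silent and the effect is real

# Expected loss of each action, weighted by p(H|data) -- the quantity
# a classical p-test never delivers.
expected_loss_warn   = (1 - p_H_given_data) * cost_warn_needlessly
expected_loss_silent = p_H_given_data * cost_fail_to_warn

decision = "warn" if expected_loss_warn < expected_loss_silent else "stay silent"
print(decision)  # "warn": 0.88 < 6.0, despite the modest 12% probability
```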
<p>You are calculating p(data>*|H) when what you need is p(H|data). No matter how much you squirm within that frame there is no way to deny the fact that you simply do not have anything equivalent to the bayesian case. Hey, the H is just on the wrong side! Pretty simple to see.</p>
<p>>They aren't, of course </p>
<p>Is that a theorem?</p>
<p>>(at least, not in cases like these</p>
<p>Ah!....at least not in...</p>
<p>"the theorem is true under the conditions necessary for it to be true"</p>
<p>What objective criteria does one have to actually know when that supposed equivalence applies? One asks the expert? The attraction of classical statistics: the inner sanctum defined by how many ad-hoc recipes and arbitrary conditions of application you can stomach learning by rote in order to claim expert status. "I have found over the years that a correcting term of X works well for a sample size of around Y". In other fields one learns theorems instead. </p>
<p>It is one thing to state that in this case the problem followed from a bad application of rules.</p>
<p>It is another to state that two methods are equivalent when they are clearly not, and to deny a problem of consistency that is both real and well known. And it is simply amazing that a whole field should choose to stick to an old method, after an inconsistency has been shown and solved, simply because it is the old method and it *sort* of works *most* of the time. It is only in statistics proper that this routine seems acceptable; in grownup fields of mathematics and physics this sort of nonsense consistently makes heads shake in disbelief. It is only statisticians that deny their own disrepute.</p>
<p>Damn those "obnoxious" bayesians, always coming around and claiming time-honoured methods are wrong just because they happen to be wrong. Really no sense of decorum.</p>Darren McHugh commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e15d432c970b2011-01-08T00:04:46Z2011-01-08T00:04:46ZDarren McHugh"The conflating of "significance" meaning importance, and "significance" meaning likely to be a real effect as opposed to random sampling...<p>"The conflating of "significance" meaning importance, and "significance" meaning likely to be a real effect as opposed to random sampling error, in the article is unfortunate."</p>
<p>You don't seem to realize that this conflation of the two meanings of the word 'significant' is endemic to statistical practice and teaching. It is by no means confined to this article.</p>Darren McHugh commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c766da88970c2011-01-08T00:02:49Z2011-01-08T00:02:49ZDarren McHugh"If it is not statically significant is not relevant" "Relevant" to what? To a decision you are making?Your decision rule...<p>"If it is not statically significant is not relevant"</p>
<p>"Relevant" to what? To a decision you are making? Your decision rule is wrong. This is the classical error, in a nutshell.</p>
<p>The truth of the matter, of course, is that depending on the relative costs of type I and type II error, the "insignificant" result may be very relevant to the decision indeed.</p>NotMarkT commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c7645fd1970c2011-01-07T16:43:18Z2011-01-07T16:43:18ZNotMarkTAnother case where the courts have latched onto a bright line statistical rule is in equating "preponderance of evidence" with...<p>Another case where the courts have latched onto a bright line statistical rule is in equating "preponderance of evidence" with a minimum relative risk of 2.0 in epidemiologic studies under Daubert review of the admissibility of causal scientific evidence. (The relative risk is the relative frequency observed in the exposed group divided by the relative frequency observed in the unexposed group.) In practice, there are often conflicting epi studies involving different populations, methods, etc. that make it difficult to confidently assert that the true value is above or below the threshold.</p>Antonio commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e159b80d970b2011-01-07T13:54:09Z2011-01-07T13:54:09ZAntoniohttp://omwo.blogspot.comCorrection above: Not "if A implies B and A then B becomes more probable", ", meaning, $p(A|B,I)\geq p(A|B)$. (GOD what...<p>Correction above: Not</p>
<p>"if A implies B and A then B becomes more probable", ", meaning, $p(A|B,I)\geq p(A|B)$.</p>
<p>(GOD what a mess!! I was in a hurry so by habit started writing modus ponens automatically then just skidded off the road completely) but:</p>
<p>"if A implies B, and B, then *A* becomes more probable" , meaning $p(A|B,I)\geq p(A|I)$, where "I" is any other info you have.</p>
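<p>The corrected syllogism can be checked with a two-line Bayes computation (the probabilities are invented; since A implies B, p(B|A) = 1):</p>

```python
# Invented numbers for a world where A implies B.
p_A = 0.2                  # p(A|I)
p_B_given_not_A = 0.3
p_B = p_A * 1.0 + (1 - p_A) * p_B_given_not_A   # total probability of B

# Bayes' theorem: p(A|B,I) = p(B|A,I) p(A|I) / p(B|I)
p_A_given_B = p_A * 1.0 / p_B
print(round(p_A_given_B, 3))   # 0.455, larger than p(A|I) = 0.2
```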
<p>Sorry, bit of a rush, I'm writing too much precisely because I don't have enough time to condense right now. Hope something still makes sense. Fully understand if TL;DR</p>Antonio commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e15906a3970b2011-01-07T11:51:40Z2011-01-07T11:51:40ZAntoniohttp://omwo.blogspot.comNow regarding the part where I agreed with you: even if you are a bayesian, and you state priors, and...<p>Now regarding the part where I agreed with you:</p>
<p>even if you are a bayesian, and you state priors, and then calculate the posterior probability, you can still, if you want, be an idiot and decide on a magical cutoff level, just as in classical statistics. So, yeah, although it *wouldn't be* equivalent (because at least you are cutting off on probability values of what you are actually testing and not on the p-value, which is the probability of something altogether different), you can still screw up in a similar fashion. *BUT* if you do that you are clearly going against the very basis of your discipline. I get students who insist upon doing that kind of thing because they are so used to it from using p-tests. It gets so bad that I did this: I spend the start of the semester just teaching how to calculate posteriors and I refuse to allow any estimators or general reductions of information, or any conclusions except "the probability of this model being the right one, assuming this prior, is such and such" but never allowing any "therefore we accept this model" or "we estimate this to be the 'best' value". Only later do I allow for that, always stressing that "best estimate" has to answer "best for what?". The thing is, the formalism *allows* this. You cannot do that in classical statistics. The very basis of that formalism is giving you estimates of parameters, results of tests for models, hypotheses, etc. In Bayesian, I can totally separate the two aspects. A student can later screw it up completely, but he has to go out of his way to do it, and really what he does is technically *just wrong*, no matter if he calls it "bayesian" or not. 
That is a huge difference.</p>Antonio commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e158ea33970b2011-01-07T11:30:12Z2011-01-07T11:30:12ZAntoniohttp://omwo.blogspot.comI hate saying "the probability of your hypothesis"...mixing the two formalisms here...but you get my drift.<p>I hate saying "the probability of your hypothesis"...mixing the two formalisms here...but you get my drift.</p>Antonio commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c7628b0e970c2011-01-07T11:27:28Z2011-01-07T11:27:28ZAntoniohttp://omwo.blogspot.comOk...wait a minute...on second thought it seems I didn't get you wrong the first time... <p>Ok...wait a minute...on second thought it seems I didn't get you wrong the first time... </p>
<p>when you state "usual p-test (...) *logic*" you just mean it as an informal phrase, right? You mean "the usual logic behind using p-tests". Ok, then, that's what I understood the first time.</p>
<p>And you were stating that my first post was wrong because actually *neither* p-tests (even when properly used) nor bayesian methods (which, you think, just formalize the same thing) get away from being what you call "arbitrary" and "subjective".</p>
<p>Am I right about this or am I failing to get your meaning?</p>
<p>If I am right about this, then my first post stands. I disagree, and in fact your statement is factually wrong. You state both methods are "arbitrary" and "subjective". It is factually wrong, because *both* methods *are* subjective BUT bayesian methods are *NOT* arbitrary unlike p-tests.</p>
<p>1-Both methods are *subjective* (I agree with you here) in the sense that they depend on *priors* (stated, in the bayesian case, unstated in the other case). BUT! - you always have to have priors, because not even in logic can you start without initial assumptions, and the priors are the equivalent of that in probabilistic terms. </p>
<p>2-Bayesian methods are *not* arbitrary as p-tests. Why? Because you can prove that bayesian logic ("logic" used as a formal term here) is the only generalization of classical logic that is consistent with "common sense" (See Cox's theorem for the formal requirements of "common sense"). The product and sum rules of Bayesian prob. are the formalization of weak syllogisms of the type "if A implies B and A then B becomes more probable", meaning, $p(A|B,I)\geq p(A|B)$.</p>
<p>Now, again, unless you mean by "p-test logic" something more than the usual meaning of p-tests, you simply cannot state equivalence of methods. You still, whether using p-tests "properly" or not, cannot calculate posteriors. You still cannot calculate directly the probability of your hypothesis. Am I wrong on this? If so, why? If I am wrong, then tell me specifically: how do you calculate the posterior? If you can't, then how do you state the theories are *equivalent*? Can you calculate the actual probability of your hypothesis being correct? That is certainly not what classical statisticians usually claim, and certainly not what I have learned anywhere.<br />
</p>Antonio commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e158a052970b2011-01-07T10:42:07Z2011-01-07T10:42:07ZAntoniohttp://omwo.blogspot.com>igh. So you snip the part after "p-test" - by accident, right? - er....what are you sugesting, that I was...<p>>igh. So you snip the part after "p-test" - by accident, right? - </p>
<p>er....what are you suggesting, that I was trying to quote you out of context...when your comment was ...right above mine? Here is the reason: I didn't *read* that word. It was separated by that parenthesis and somehow I missed it, so I thought you were talking about a p-test and answered that. Now, you could just have pointed that out politely and I would have said "oh, sorry about that, I really am not familiar with 'p-test logic', could you elaborate?" Instead I don't feel much like apologizing now since you are acting a bit like a...you know. But I will apologize anyway, not because you deserve it, but because I am that kind of guy. </p>
<p>Sorry.</p>
<p>>(which, yes, does cover Type I and II errors, their significance,specificity vs sensitivity, etc) (...)</p>
<p>Does it allow you to set the prior assumptions as probabilities over logical propositions and calculate the posteriors of the parameters, and separately assign the utility functions? You said it was all the same, that is what I questioned. Is it? Honest question.</p>
<p>>Then you go on about me being "completely wrong"</p>
<p>About a specific assertion. (I didn't kill your dog or insult your mother or state that you are generally a loser in life, so what's up with the defensiveness?)</p>
<p>>but then allow later on that yeah, I'm not really wrong,</p>
<p>About another assertion.</p>
<p>>Why do I get the impression you're some sort of >libertarian wank? </p>
<p>Something to do with your priors? (Or because...OhmyGod, are you a communist???? Now, that was silly, wasn't it?)</p>
<p>>One who, furthermore, hasn't had much in the way >of formal statistics training, let alone has ever >taught a class in statistics?</p>
<p>Taught sampling theory for...about 7 years now, I think, and started the first bayesian course at my department, and next year I may teach it at the master's program, not that I need to tell you this, answering the arguments should be enough without going into personal insults, much less diverging into speculations on political affiliations. I guess that just goes to illustrate how not separating your priors from your values makes for bad logic?</p>
<p>> (said instant expertise also a hallmark >libertarian trait)</p>
<p>And where did you get that instant aggressiveness towards someone who disagrees with you? Too much caffeine?</p>
<p>Did I even state I was an expert? Does it matter if I am? Again, answer the arguments (or don't, if you don't care to) and leave out the silly speculation. I really don't need this. If you must know, I did my undergrad in physics and my grad work in algebraic geometry before I started dabbling in statistics, so I happily agree that there's lots I don't know - I know very well some areas of statistics and am totally ignorant of whole other fields, so I talk about the former and listen regarding the latter. I came to bayesian methods because I was taught classic methods in physics and they really, really sucked. I stumbled into bayesian methods and things became clear: it was clean, it was consistent, it was logical, that's all. But if I find a better theory tomorrow I'll dump it without a thought, I don't have my ego invested in it like some classical statisticians seem to do, perhaps because they hope never to have to learn anything new again, or to have to say they were wrong (a pretty strange habit of mind if you choose science as a way of life, I'd say). So if you care to teach me something/give me references I'll certainly listen rather than growl at you.</p>
<p>Regarding arguments: All I stated in my first message was that bayesian methods handle the problem that relates to the original post we are discussing. You answered me stating that you disagreed with my post. But apparently you state that your own preferred theory handles it just as well as bayesian methods (since they are "just a formalization" of the way your preferred method works). If that is your point (whether you are right about it or not) then in what way were you really disagreeing with my original statement? Seems like you just wanted to insert an unrelated rant against bayesian methods and a plug for your own pet, so I kind of feel dirty and used. :p</p>
<p>Again, please either clarify in a way that teaches me something or if you feel like raging and insulting someone just spare me.</p>ScentOfViolets commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c7614c6b970c2011-01-07T07:06:24Z2011-01-07T07:06:24ZScentOfVioletsActually, in cases like this, it's six of one, half a dozen of the other. Let me cite something from...<p>Actually, in cases like this, it's six of one, half a dozen of the other. Let me cite something from the wiki which is mostly correct:</p>
<p><br />
"When comparing two hypotheses and using some information, frequency methods would typically result in the rejection or non-rejection of the original hypothesis at a particular significance level, and frequentists would all agree that the hypothesis should be rejected or not at that level of significance.[dubious – discuss] However, there is no normative methodology to choose levels of significance.[citation needed] Bayesian methods would suggest that one hypothesis was more probable than the other, but individual Bayesians might differ about which was the more probable and by how much, by virtue of having used different priors; but that's the same thing as disagreeing on significance levels"</p>
<p><br />
There's a lot of "dubious - discuss" tags there, I know, but the takeaway is that in situations like this, the differing approaches result in identical outcomes. Which is as it should be - these are interpretations after all!</p>
<p>The problem with the Bayesian approach (aside from - I must say - the obnoxiousness with which it is pushed by a lot of people who apparently haven't had much formal instruction in statistics) is that both approaches have a strong subjective component, in one case on what (roughly) p-value to use, in the other, what priors go into the model. Now, deciding on which p-value to use makes the judgment call pretty up front and explicit. However - ain't it always the case - there is a regrettable tendency in those using the Bayesian approach to pretend that once those priors are plugged in, the results are somehow more "objective". They aren't, of course (at least, not in cases like these), and that's a real danger. Sadly, there's also a lot of ego involved in getting one's model of the priors accepted as being the "correct" one.</p>
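<p>The prior's influence is easy to exhibit (the likelihoods below are invented; both analysts share the same model and data, only the prior differs):</p>

```python
# Invented likelihoods, fixed by the shared model: p(data|H) and p(data|not H).
lik_H, lik_not_H = 0.30, 0.05

def posterior(prior_H):
    joint_H = prior_H * lik_H
    return joint_H / (joint_H + (1 - prior_H) * lik_not_H)

print(round(posterior(0.5), 3))   # even-odds prior -> 0.857
print(round(posterior(0.1), 3))   # skeptical prior -> 0.4
```

<p>Same data, different priors, different posteriors - the judgment call has simply moved from the significance level to the prior.</p>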
<p>Let me emphasize that in this particular situation, either approach will work, and work equally well . . . if they are done right. But what is clear here is that what is considered the traditional approach wasn't done right; in fact, it looks as if it was deliberately short-circuited. Trying to suggest that a Bayesian approach would have prevented this from happening is not just wrong, it's wrong-footed. Which again should be obvious, because the defendants are using the "who could have known" defense - in the counterfactual Bayesian suit, the attorneys would be using the same defense, only it would be applied to the model of the priors, not to the p-value.</p>Rick commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e157326b970b2011-01-07T05:31:50Z2011-01-07T05:31:50ZRickhttp://economistsview.typepad.com/economistsview/2011/01/does-statistical-significance-stink.htmlI haven't looked at the case so what I say is speculative but assuming the issues are captured in the...<p>I haven't looked at the case so what I say is speculative but assuming the issues are captured in the following quote </p>
<p>"Should drug manufacturers and other companies be required to report the adverse effect of a product on users, if the effect is not statistically significantly different from zero at the 5% level? Can people file claims against companies which fail to warn shareholders about adverse effects that are not statistically significant at the 5% level?" </p>
<p> there are some observations one can make. The first is that the case may turn on law and precedent more than statistics, for the Court follows its own muse. Second, the questions are poorly phrased, for they reflect a magic view of statistics and significance level, as they seem to posit as a possible answer that, for legal purposes at least, if an effect is not significant at the .05 level it does not exist and if it is present at the .05 level it exists and is important. The question the law should be asking is when, based on the evidence, the possibility of a serious side effect is sufficiently likely or, if not necessarily likely, so incapable of being rejected with sufficient confidence, given the harm caused if the side effect does exist, that there is (should be) an obligation on the part of the drug manufacturer to report it to the FDA and/or to potential consumers. One may use classical statistics to get a handle on this but classical statistics does not entail always applying a .05 decision rule. As has been pointed out, sample size matters greatly in deciding whether the evidence provides a reasonable basis for rejecting the null hypothesis, and the cost of type one and type two error should also figure in. In this case, however, if I have been properly educated by the above discussion, a Bayesian approach or even a Bayesian perspective on classical results has much to commend it. If it is long established that zinc can destroy a person's sense of smell, the chance that loss of smell found after exposure to zinc was caused by the exposure is and should, as a subjective matter, be considerably higher than would be estimated if there were no a priori scientific or experimental reasons to think this effect might exist. 
And a company or scientist whose subjective judgment based on the evidence and prior knowledge suggests a reasonable likelihood of danger should report this, or at least this is what I think the legal system should require.</p>cm commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c7605959970c2011-01-07T03:44:15Z2011-01-07T03:44:15ZcmGood point. But I suspect in the end it comes down to what is the meaning of a word versus merely...<p>Good point. But I suspect in the end it comes down to what is the meaning of a word versus merely a connotation. "Self-centered" certainly eliminates some connotations.</p>
<p>Other than that, it's probably a pretty thin line. I'm inclined to believe that people who care about others' well-being also generally have a circle/peer-group of people whose opinion (in general and pertaining to themselves) they consider important. I'm not an expert in nonconformists, but it seems to me that people who don't care much what *everybody* thinks about them also don't care much in general. But maybe that's not what you meant.</p>
<p>As for (non)conformism, that's such a minefield of loaded terms that I'd rather not step into it.<br />
</p>ScentOfViolets commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e1560cf3970b2011-01-07T01:39:48Z2011-01-07T01:39:48ZScentOfVioletsSigh. So you snip the part after "p-test" - by accident, right? - so that your quote doesn't read "p-test...<p>Sigh. So you snip the part after "p-test" - by accident, right? - so that your quote doesn't read "p-test logic" (which, yes, does cover Type I and II errors, their significance, specificity vs sensitivity, etc). Then you go on about me being "completely wrong" but then allow later on that yeah, I'm not really wrong, but I didn't say my words right . . . or at least I didn't say them in the way you liked.</p>
<p>Why do I get the impression you're some sort of libertarian wank? One who, furthermore, hasn't had much in the way of formal statistics training, let alone has ever taught a class in statistics? (said instant expertise also a hallmark libertarian trait)</p>
<p>In any event, it's too close to New Years for me to break any of my resolutions by wasting any more effort replying to an obvious time waster, so all I'll say is "Grow up."</p>Patricia Shannon commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e15579f0970b2011-01-06T23:45:04Z2011-01-06T23:45:04ZPatricia Shannonhttp://www.patriciashannon.blogspot.comI would say someone who is individualistic is not a conformist. But they aren't necessarily anti-conformist. They do things in...<p>I would say someone who is individualistic is not a conformist. But they aren't necessarily anti-conformist. They do things in a way that makes sense to them. They don't care a lot what people think about them, but that doesn't mean they don't care about the welfare of other people.</p>jrossi commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e1550880970b2011-01-06T22:20:03Z2011-01-06T22:20:03ZjrossiAnother example of pure evil from our friends in Pharma. Every stinkin' medical student in this country takes a biostatistics...<p>Another example of pure evil from our friends in Pharma. Every stinkin' medical student in this country takes a biostatistics class in the first or second year (or if they don't, they should; I know they take it at Univ. of Washington anyhow) that pounds into their brains the difference between clinical significance and statistical significance. Some things are the former but not the latter--appendectomy for appendicitis (never been subject to a trial for statistical significance, for obvious reasons), others are the latter but not the former (living an extra month longer in the ICU when given drug X instead of drug Y). 
</p>Antonio commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e153da8e970b2011-01-06T18:47:13Z2011-01-06T18:47:13ZAntoniohttp://omwo.blogspot.com>The problem here is that those utility functions and >priors are just formalizing the usual p-test This is just factually...<p>>The problem here is that those utility functions and >priors are just formalizing the usual p-test </p>
<p>This is just factually wrong. I don't know what you've been reading, but this is just not true. </p>
<p>A p-test looks at how likely something is to have happened assuming a null-hypothesis. Meaning, you are looking at the wrong thing already! If what you want is the probability of the null-hypothesis being true/false, then why are you using that as the condition instead of the main proposition? Because you can't do it the right way around, since you refuse to make the inversion through bayes theorem, and that in turn because you refuse to state your priors clearly. And all of this because you state that some things are random variables and others aren't (god knows why). Listen, it is not "formalizing p-values", because, for one, I can calculate the posterior distributions of variables that you don't even agree have distributions at all, so clearly there is no possible equivalence. There is a whole different point of view: Classical statistics is bean counting - it is about frequencies and repeatable experiments; Bayesian probability is the generalization of classical logic - it gives you the probability of a proposition being true assuming the truth of your priors, whether these are related to frequencies or not. </p>
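<p>What "a distribution over the parameter itself" looks like in the simplest conjugate case (the prior and data are invented for illustration):</p>

```python
# Beta prior over an unknown success rate theta, updated by binomial data.
a0, b0 = 2, 2        # Beta(2,2) prior (invented)
n, x = 50, 35        # invented data: 35 successes in 50 trials

a1, b1 = a0 + x, b0 + n - x      # posterior is Beta(37, 17)
prior_mean = a0 / (a0 + b0)      # 0.5
posterior_mean = a1 / (a1 + b1)  # 37/54
print(prior_mean, round(posterior_mean, 3))  # 0.5 0.685
```

<p>The whole object theta gets a distribution, before and after the data - which is exactly the move classical statistics refuses to allow.</p>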
<p>> and are just as arbitrary and prone to subjective judgment</p>
<p>There is no way of not being "subjective". What classics call being "subjective" is just "being subject to the hypothesis in use" - but in those terms, logic itself is "subjective". Just as in logic, there is no way to reason non-tautologically without making assumptions. The difference between bayesian and classic is that bayesian formalism forces you to state all your hypotheses in your model and prior, while classic statistics states the model but hides the hypotheses related to the prior under the mechanics of the arbitrary "test" or "statistic". The problem with that is that you can't quite make out what the assumptions are, you just see them very vaguely through the mechanics of the test. In Bayesian, you can just look at the prior and say "I don't buy your prior", and then we know where you differ, and we can run it through your prior and so on.</p>
<p>Now, p-tests and all that jazz hide and mix together the priors *and* the utility. Perhaps what you meant by "just formalizing" is that bayesian prob. "just" separates and exposes the two of them? Well, then your meaning of "just" has a pretty large order of magnitude. On the order of "Relativity theory is *just* classical relativity with a different invariant form". :)</p>
<p>Bayesian probability is actually a pretty modest theory: unlike classical statistics it says "I can't really decide". You feed it the assumptions (priors) and it gives you the probability of something being true but doesn't tell you what that probability means you should do. You have to specify what each outcome means to you in terms of cost. And that means you have to go out of probability and into bayesian decision theory.</p>
<p>I cannot stress enough how important it is "just" to make those separations! You *cannot* avoid assuming things and making *arbitrary* attributions of value to each possible outcome. What you *can* do, is:</p>
<p>1) make sure that all those assumptions are logically self-consistent</p>
<p>2) separate and clearly state what your assumptions are, what your criterion of choice is, so that, when a conclusion is achieved, you know where it comes from. </p>
<p>*That* is what you get from Bayesian formalism. That, and a formalism that at least seems to be internally consistent.</p>
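<p>That separation in miniature (the posterior and losses below are invented): the probability is computed once, from model plus prior; the loss assignments are swapped afterwards without touching it.</p>

```python
# One posterior probability of the hypothesis, from model + prior (invented).
p_H = 0.3

def decide(loss_act_if_H_false, loss_wait_if_H_true):
    # Expected loss of each action under the SAME posterior.
    expected_act  = (1 - p_H) * loss_act_if_H_false
    expected_wait = p_H * loss_wait_if_H_true
    return "act" if expected_act < expected_wait else "wait"

print(decide(1, 10))   # misses are costly -> "act" on a 30% probability
print(decide(10, 1))   # acting is costly  -> "wait" despite the same 30%
```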
<p>>About the only good thing I can say for Bayesian >analysis at this point is at least they >acknowledge the existence and significance of Type >I and Type II errors and their differential >impact.</p>
<p>I really don't know what you've been reading, but this is just a strange thing to say.</p>
<p>>Anyone else know a certain discipline where models >are confidently trotted out, numbers plugged in . >. . and totally wrong predictions come out?</p>
<p>Classical statistics? :))</p>
<p>Now seriously, you must be kidding. From astrophysics to sound and image processing to genetics, there is a large body of probabilistic Bayesian methods that are known to be more efficient than their classical equivalents, and, more importantly, there are methods that work where the classical ones can't even formulate the problems. That was the reason people in those areas took up Bayesian methods, when really they didn't care either way about statistics per se - they did it because it was the only thing that did the job.</p>
<p>Certainly I've noticed you can find a lot of nonsense with the Bayesian buzzword attached to it; if you've been reading that nonsense (and I've seen some horrible stuff) then I can understand where you're coming from. But you can say that about any buzzword. Mostly, there's a bunch of new converts from classical statistics who would have been better left unconverted. They have arrived and written a lot of nonsense and invented a lot of "tests" and "estimators" just like they used to do in their old field, and they attach the term "bayesian" to it, and it gets to be a mess... well, what can one do? </p>
<p>I recommend you really read through the older stuff, like Jaynes's book ("Probability Theory: The Logic of Science" - which you can read free online, I think) to see how things are really simple and elegant when you don't screw it up with unrelated ad-hoc methods, and, most of all, I think you'll see it really doesn't conform to the notions you seem to be referring to. </p>cm commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e15050da970b2011-01-06T07:30:32Z2011-01-06T07:30:32ZcmPlus the ads will have to be even longer to list all the additional disclosed side effects, which often enough...<p>Plus the ads will have to be even longer to list all the additional disclosed side effects, which often enough sound more debilitating than the thing the medicine is supposed to address.</p>cm commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c759e634970c2011-01-06T07:27:21Z2011-01-06T07:27:21ZcmThe issue of whether the trial methodology, data interpretation, or whatever else was flawed is probably only relevant to a...<p>The issue of whether the trial methodology, data interpretation, or whatever else was flawed is probably only relevant to a possible neglect or malfeasance aspect. If a causality between taking the drug and the damage can be proven at court standards, the company will still be liable for tort. This will not require incompetence or adverse intent.<br />
</p>cm commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e1504498970b2011-01-06T07:20:21Z2011-01-06T07:20:21ZcmI believe there are no longer *any* accepted tags, for some time. Something that looks like an http URL will...<p>I believe there are no longer *any* accepted tags, for some time. Something that looks like an http URL will be expanded into a link, but I'm not sure any a tags of your own will be honored. You have to quote or emphasize using text characters.<br />
</p>cm commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c759d7f5970c2011-01-06T07:15:55Z2011-01-06T07:15:55ZcmOther than being more descriptive, what do you think is the difference?<p>Other than being more descriptive, what do you think is the difference?</p>ScentOfViolets commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e14ee751970b2011-01-06T02:47:45Z2011-01-06T02:47:45ZScentOfVioletsZicam is a tricky case because it presents a hybrid problem. First, suppose that there is a problem with a...<p>Zicam is a tricky case because it presents a hybrid problem. First, suppose that there is a problem with a drug, how material a hit to the business profits would that be? Second, what is the likelihood that undue skepticism in drug testing contexts is going to cause the drug to have a regulatory problem that is in turn an economic problem for that product? Third, how likely is it that there actually is a problem with the drug that wasn't revealed in studies and is that likelihood so low that a reasonable scientist doing that kind of work would dismiss that risk as not empirically supported by lab tests? (i.e. were the scientists unduly skeptical in their drug testing)?</p>
<p>This is the well-known problem of balancing Type I errors against Type II errors. Or rather, it's the problem of sensitivity (the fraction of genuinely affected cases that the test correctly flags as positive) coming at the expense of specificity (the fraction of genuinely unaffected cases that the test correctly reports as negative). Ideally, it's possible to push both numbers as close to 100% as you like (by increasing the size of the sample). In practice, one almost always comes at the expense of the other.</p>
<p>So it's possible to game the testing procedures by having tests that are very sensitive to positives but not very specific, i.e., by designing the protocols to minimize the number of false negatives at the expense of increasing the number of false positives.</p>
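The trade-off can be made concrete with a toy simulation (synthetic data, nothing to do with the actual Zicam trials): sliding a decision threshold buys sensitivity at the cost of specificity, and vice versa.

```python
import random

random.seed(0)
# Synthetic "test scores": affected subjects score higher on average
affected = [random.gauss(2.0, 1.0) for _ in range(10000)]
unaffected = [random.gauss(0.0, 1.0) for _ in range(10000)]

def rates(threshold):
    # Classify everyone above the threshold as positive
    tp = sum(x > threshold for x in affected)     # true positives
    fn = len(affected) - tp                       # false negatives
    tn = sum(x <= threshold for x in unaffected)  # true negatives
    fp = len(unaffected) - tn                     # false positives
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

for t in (0.0, 1.0, 2.0):
    sens, spec = rates(t)
    print(f"threshold={t:.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

A low threshold misses almost no affected subjects (few false negatives) but flags many healthy ones; a high threshold does the reverse.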
<p>And vice versa of course when you're testing for safety. Which is what these guys may have done. I'm not saying they did, of course, I'm merely being long-winded about the usual cant concerning liars and statistics.</p>NotMarkT commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e14e631d970b2011-01-06T01:14:45Z2011-01-06T01:14:45ZNotMarkTThere is a need to distinguish between the "clinical significance" or severity and permanence of adverse events and their "statistical...<p>There is a need to distinguish between the "clinical significance" or severity and permanence of adverse events and their "statistical significance." In many cases, adverse effects are anticipated, perhaps based on pharmocology, but the potential benefits (e.g., for life threatening conditions) are thought to outweigh the downside risks and clinical trials are allowed to proceed. Second, it is well known that clinical trials are too small to reliably detect rare adverse events. This is why there is a post-market adverse event reporting system.</p>
<p>Regarding the failure to report adverse events observed during a clinical trial, FDA's non-binding "Guidance on Statistical Principles for Clinical Trials" (<a href="http://www.fda.gov/downloads/RegulatoryInformation/Guidances/UCM129505.pdf" rel="nofollow">http://www.fda.gov/downloads/RegulatoryInformation/Guidances/UCM129505.pdf</a>), while not mandatory, is pretty clear:<br />
“Later phase controlled trials represent an important means of exploring, in an unbiased manner, any new potential adverse effects, even if such trials generally lack power in this respect.... All adverse events should be reported, whether or not they are considered to be related to treatment.”</p>
<p>Institutional review boards that perform the "ethics" oversight of human trials often complain that they are swamped with adverse event reports.</p>Patricia Shannon commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e14db6fb970b2011-01-05T22:59:24Z2011-01-05T22:59:24ZPatricia Shannonhttp://www.patriciashannon.blogspot.comThere have been numerous cases of drug companies hiding adverse results.<p>There have been numerous cases of drug companies hiding adverse results.</p>Patricia Shannon commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c7574a39970c2011-01-05T22:57:32Z2011-01-05T22:57:32ZPatricia Shannonhttp://www.patriciashannon.blogspot.comI see "individualism" used a lot when I think the proper term should be "self-centeredness".<p>I see "individualism" used a lot when I think the proper term should be "self-centeredness".</p>ohwilleke commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c7571788970c2011-01-05T22:18:39Z2011-01-05T22:18:39Zohwillekehttp://washparkprophet.blogspot.comThe conflating of "significance" meaning importance, and "significance" meaning likely to be a real effect as opposed to random sampling...<p>The conflating of "significance" meaning importance, and "significance" meaning likely to be a real effect as opposed to random sampling error, in the article is unfortunate.</p>
<p>Yes, 5% (roughly two sigma) is an arbitrary number. In physics, nobody even pays attention to a finding until it hits the three-sigma level, and results aren't considered established until they hit five sigma. In other circumstances (for example, when you must make one of two choices no matter what), insisting on two-sigma support for an option is excessively skeptical.</p>
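For reference, the two-sided normal tail probabilities behind those sigma conventions are easy to compute:

```python
import math

def two_sided_p(z):
    # Probability of a result at least z standard deviations from the mean,
    # in either direction, under a normal distribution
    return math.erfc(z / math.sqrt(2))

for z in (1.96, 2.0, 3.0, 5.0):
    print(f"{z} sigma -> p = {two_sided_p(z):.2e}")
```

1.96 sigma is the two-sided 5% point; two sigma is about 4.55%, three sigma about 0.27%, and five sigma about 5.7e-7.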
<p>The legal concept of materiality at issue in the case also involves both probability and the stakes. An almost certainly wrong $3 misstatement in a $3 billion corporate earnings report is still immaterial. A modest possibility of a $600 million misstatement in a $3 billion corporate earnings report is material. Using a 5% threshold to sweep real risks of high-stakes problems under the rug is a problem. </p>
<p>The appropriate statistical significance cutoff in a given situation depends on context. Five percent is a good number for lots of circumstances, but shouldn't have the force of law. </p>
<p>Zicam is a tricky case because it presents a hybrid problem. First, suppose that there is a problem with a drug, how material a hit to the business profits would that be? Second, what is the likelihood that undue skepticism in drug testing contexts is going to cause the drug to have a regulatory problem that is in turn an economic problem for that product? Third, how likely is it that there actually is a problem with the drug that wasn't revealed in studies and is that likelihood so low that a reasonable scientist doing that kind of work would dismiss that risk as not empirically supported by lab tests? (i.e. were the scientists unduly skeptical in their drug testing)? </p>
<p>Before the U.S. Supreme Court the next issue is to what extent questions one, two and three are questions of law (i.e. what is the legal standard used to answer each), and to what extent are questions one, two and three questions of fact to which you defer to the finder of fact without creating precedents for future cases.</p>
<p>The easy out for the U.S. Supreme Court is to call all three of the questions factual issues to be decided by juries (or by judges in bench trials) on a case by case basis.</p>Min commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e14d5876970b2011-01-05T21:47:29Z2011-01-05T21:47:29ZMin"If it is not statically significant is not relevant. I don't want people filing and winning lawsuits based of flukes,...<p>"If it is not statically significant is not relevant.<br />
I don't want people filing and winning lawsuits based of flukes, pure randomness and background noise, and low probability events. Nothing is perfect and carries risks. Are more people helped or harmed by a product?"</p>
<p>That is a policy decision. I tend to agree with you. :)</p>
<p>However, it was well known, without making clinical trials, that pain was an indicator that the zinc was damaging nasal cells, including smell receptors, and that immediate steps needed to be taken to prevent further damage. It was also known that permanent loss of the sense of smell was possible in such cases. </p>
<p>Now, many parents, my grandmother included, believe that medicine often tastes bad or causes discomfort, even pain. Without a warning, such parents might continue treatment, telling the child, "It's good for you." Others might stop administering the medicine, but not realize the need to wash out the nose immediately. For Matrixx not only to fail to give a warning, but to fail to give instructions about what to do in case of pain was highly irresponsible.</p>Min commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c756ceb8970c2011-01-05T21:28:38Z2011-01-05T21:28:38ZMin""which evidently causes some users to permanently lose their sense of smell. " Matt Young: "What evidence does he have?...<p>""which evidently causes some users to permanently lose their sense of smell. "</p>
<p>Matt Young: "What evidence does he have? None, he just pointed out he had no evidence."</p>
<p>Poor writing, perhaps. The Matrixx data do not provide sufficient evidence that zinc can cause permanent loss of the sense of smell. Other data do.</p>Alex Tolley commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e14cae27970b2011-01-05T19:55:08Z2011-01-05T19:55:08ZAlex TolleyI wasn't aware that drug companies had a choice in reporting adverse effects. If the drug is approved, physicians certainly...<p>I wasn't aware that drug companies had a choice in reporting adverse effects. If the drug is approved, physicians certainly will.</p>
<p>It therefore seems foolish to not report any adverse reactions, especially if subsequent post release work shows the adverse reaction is significant.</p>
<p>I would hazard a guess that the manufacturer wants to remove such a finding because they think that this will inhibit sales. What would be better is that all incidence probabilities are documented so that physicians and patients can assess the risks for themselves. Clearly a chef, parfumier or wine taster will want to know the risks as the impact on their livelihoods is so large. </p>Andy Harless commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c7561a5d970c2011-01-05T19:32:58Z2011-01-05T19:32:58ZAndy Harlesshttp://blog.andyharless.com/I'm happy with classical statistics in practice, but there need to be two different criterion levels. One says, "This result...<p>I'm happy with classical statistics in practice, but there need to be two different criterion levels. One says, "This result does not appear to be due to chance." That could be the conventional 5% level (but maybe it should be stronger, like 1%). The other says, "This result could very easily be due to chance, but it's interesting enough to be worth reporting." That criterion has to depend on the cost of ignoring the result if it turns out to be real. In the end it has to be a judgment call (whether or not you formalize the judgement with Bayesian priors and utility functions). But for effects that could do serious damage, the appropriate criterion for that would be maybe more like the 20% level or possibly even weaker. The "5% or it didn't happen" rule is blatantly irrational.</p>
<p>Such a duality of criteria ought, I believe, to be reflected in future legal (as well as academic) standards. However, without some evidence of "consciousness of guilt," I would be hesitant to hold a business responsible for ignoring results that don't rise to the level of currently established legal/academic standards. I can't see expecting businesses, whenever they face a decision with potential legal ramifications, to debate what the relevant standards possibly ought to be rather than simply responding to what they are.</p>CraigDB commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e14bb6ab970b2011-01-05T18:19:13Z2011-01-05T18:19:13ZCraigDBIf it is not statistically significant, it is not relevant. I don't want people filing and winning lawsuits based of flukes,...<p>If it is not statistically significant, it is not relevant.</p>
<p>I don't want people filing and winning lawsuits based on flukes, pure randomness and background noise, and low-probability events. Nothing is perfect, and everything carries risks. Are more people helped or harmed by a product? </p>
<p>You give a million people some drug (or really any product), and it will kill a few of them directly or indirectly. Sorry, it sucks to draw the short straw, but that is life, and I, and hopefully the courts, do not think you are entitled to damages. </p>Matt Young commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e14b9708970b2011-01-05T17:55:44Z2011-01-05T17:55:44ZMatt YoungWe have the Savage-Sotomayor legal doctrine, Preemptive Due Process. We are allowed to sue just in case.<p>We have the Savage-Sotomayor legal doctrine, Preemptive Due Process.</p>
<p>We are allowed to sue just in case.</p>Bill commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e14b7cf8970b2011-01-05T17:35:58Z2011-01-05T17:35:58ZBillEverybody is a bit off-base here -- Zicam, the court briefers, and the interesting repliers above. In randomized controlled medical...<p>Everybody is a bit off-base here -- Zicam, the court briefers, and the interesting repliers above.</p>
<p>In randomized controlled medical trials, some differences between the treatment and control groups will be due to chance variation. Tests of statistical significance are indeed required to tell us the odds that -- given the size of the randomized groups -- any differences might be due to ordinary chance variation between the groups.</p>
<p>After that essential first step in the data analysis of true experiments, if a relationship is statistically significant (i.e., unlikely to be due to chance), we must then ask "so what?" and interpret its substantive, practical importance.</p>
<p>Actually, a trickier interpretation problem requires incorporating SAMPLE SIZE, something that has not been mentioned here. If the randomized groups were small (as is usually the case with preliminary medical studies), it is always going to be difficult to detect (via st. sig. tests) relatively rare side effects and be confident they are not due to chance. With larger groups, it becomes easier.</p>
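A back-of-the-envelope calculation makes the sample-size point vivid. Suppose a side effect has a true incidence of 1 in 1,000 (an assumed number, not from the Zicam data); the chance a trial observes even one case depends heavily on the group size:

```python
def p_at_least_one(incidence, n):
    # Probability of observing at least one case among n independent subjects
    return 1 - (1 - incidence) ** n

for n in (50, 500, 5000):
    print(f"n={n}: P(at least one case) = {p_at_least_one(0.001, n):.2f}")
```

With a 50-person group the trial will usually see nothing at all, so a null result says almost nothing about a rare side effect.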
<p>Detecting true side effects in medical studies is especially challenging due to high reactivity from both placebo effects (the powerful way people tend to assume medicine works) and misattribution (hypochondria as well as ordinary physical ups and downs that patients erroneously blame on the placebo or test drug). As a result, researchers cannot take all posttest claims at face value. Some people in both groups will always have headaches, upset stomachs, cramps, and so forth.</p>
<p>Anyway, statistical significance tests are necessary tools but the results must still be interpreted in light of the sample size and put into context.</p>
<p>To make "sense" out of this case, among other things, it would be crucial to know the size of the Zicam groups and how many in each group said they suffered from scent loss.</p>don commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e14b767c970b2011-01-05T17:31:22Z2011-01-05T17:31:22Zdon"Matrixx, the maker of Zicam, claims that the adverse effect (loss of smell) was not in clinical trials "statistically significant"...<p>"Matrixx, the maker of Zicam, claims that the adverse effect (loss of smell) was not in clinical trials "statistically significant" at the 5% level of statistical significance and thus they claim, incorrectly it turns out, that the adverse effect was not "important" to human health nor to shareholders' wealth."</p>
<p>Yeah, the old type I error, type II error thing. Is the burden of proof to be borne by those who say the adverse effect exists, or by those who say it doesn't? Is the burden reversed when we move from the decision as to whether to allow the drug to be sold, to the decision as to whether those who sold the drug can be held to account for the damage? </p>
<p>I greatly enjoyed McCloskey's old paper (in the AER) on the distinction between practical and statistical significance. If other economic papers were as well written, it would make the study of economics much more pleasurable.</p>Vince commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e14b6738970b2011-01-05T17:20:00Z2011-01-05T17:20:00ZVinceScience advances one funeral at a time.<p>Science advances one funeral at a time.</p>Vince commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e14b638e970b2011-01-05T17:17:03Z2011-01-05T17:17:03ZVinceYes! Thanks Steve for mentioning Bayes. The real question is not whether X is "statistically significant" from zero (the 5%...<p>Yes! Thanks Steve for mentioning Bayes. </p>
<p>The real question is not whether X is "statistically significantly" different from zero (the 5% threshold is a holdover from the days when statistical distributions were looked up in tables in books; we've long been able to calculate exact p-values).</p>
<p>The real question is "what is the probability of X occurring, given the observed data?", which is exactly what Bayes offers.</p>
<p>Then of course the discussion will be around what are acceptable probabilities, which will lead to cost-benefit analysis versus our constitutionally guaranteed rights. But that is exactly the discussion we SHOULD be having, not about some arbitrary cut-off for statistical significance.</p>
<p>Which, by the way, also has an impact on FDA approval of new drugs. New drugs have to show they are statistically better than existing medications, but with large samples even the smallest improvements can be statistically significant while offering no practical significance. A Bayesian answer to "what is the probability this new medication is better than existing ones?" may be only 50.01% regardless of how much data is collected -- not much better than a coin flip.</p>
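The large-sample point is easy to verify with a stylized calculation (the effect size here is invented): a fixed, practically negligible improvement eventually clears any significance threshold as n grows.

```python
import math

def two_sided_p_from_z(z):
    # Two-sided normal tail probability for a z-statistic
    return math.erfc(abs(z) / math.sqrt(2))

effect = 0.01  # assumed true improvement, in standard-deviation units: tiny
for n in (100, 10_000, 1_000_000):
    # z-statistic for a two-sample comparison of means with n subjects per arm
    z = effect / math.sqrt(2.0 / n)
    print(f"n={n}: z = {z:.2f}, p = {two_sided_p_from_z(z):.3g}")
```

At n = 100 the tiny effect is nowhere near significant; at n = 1,000,000 it is overwhelmingly "significant" while remaining practically worthless.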
<p>Given the implications this case could have on the pharmaceutical industry, I fully expect the Supreme Court to decide for the defendants. Status quo, here we go!</p>Matt Young commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c754d942970c2011-01-05T17:00:28Z2011-01-05T17:00:28ZMatt YoungSo we have no significant reason to believe the drug caused the problem, no measurement we can produce that ties...<p>So we have no significant reason to believe the drug caused the problem, no measurement we can produce that ties the drug to the loss of smell.</p>
<p>Whoops!</p>
<p>"which evidently causes some users to permanently lose their sense of smell. "</p>
<p>What evidence does he have? None, he just pointed out he had no evidence.</p>
<p>This guy needs some Bayes, or aspirin or something.</p>ScentOfViolets commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c754b779970c2011-01-05T16:36:44Z2011-01-05T16:36:44ZScentOfVioletsEr, that first paragraph up there isn't mine, it's a quote. I guess blockquote isn't an accepted tag.<p>Er, that first paragraph up there isn't mine, it's a quote. I guess blockquote isn't an accepted tag.</p>ScentOfViolets commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c754b577970c2011-01-05T16:35:08Z2011-01-05T16:35:08ZScentOfVioletsFor instance, suppose we assume that a coin is unbiased (null hypothesis), yet when we flip it we see 10...<p>For instance, suppose we assume that a coin is unbiased (null hypothesis), yet when we flip it we see 10 heads in a row. We may then decide to reject the null hypothesis. But maybe that result was a fluke, and the coin really is unbiased, as more flips would reveal. Similarly, we might not reject the null hypothesis when more data would reveal that we should.</p>
<p>Usually - well, at least the way I usually teach this - we then look at the probability of Type I errors vs Type II errors and at their significance. For a study like this, I'd have no problem with using the numbers from those results in wagering a bar bet. The odds seem pretty good, relatively speaking, and if I lose, well, I lose only the price of a drink. Making a wager using those same numbers with someone's life hanging in the balance? No way.</p>
<p>I suspect that most of these people learn their statistical methodology in classes like mine, which gives them a rather skewed view of how applicable the numbers are. The example I like to use involves the various sampling techniques used when conducting a poll. As I tell my class, the hard part isn't reducing the data to a few singular numbers to get, say, Dewey defeats Truman. The hard part is collecting the data :-)</p>Min commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c754a6af970c2011-01-05T16:24:32Z2011-01-05T16:24:32ZMinIn my explanation of statistical significance, I should have said that the test looks not only at the data, but...<p>In my explanation of statistical significance, I should have said that the test looks not only at the data, but at even more unlikely hypothetical data, to determine the level of significance. If that confuses you, you have company. ;)</p>mark commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c754a055970c2011-01-05T16:19:24Z2011-01-05T16:19:24ZmarkFascinating, and thanks for publishing. Definitely endorse the idea that statistical significance is "neither necessary nor sufficient" for any action...<p>Fascinating, and thanks for publishing. Definitely endorse the idea that statistical significance is "neither necessary nor sufficient" for any action to be taken. That should include publishing research too... That phrase should be required disclosure on every policy think tank publication and on every journalist who reports on science and social sciences.</p>
<p>But don't agree that "billions of lives are at stake"!</p>
<p>Also, there is a bit of the pundit's one-sidedness in his argument - it's all about the flaws in statistical significance but there are huge problems with the alternatives he references (but does not critique), most notably the inability to quantify hedonic functions and other incommensurability problems.</p>im1dc commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e14b1014970b2011-01-05T16:15:24Z2011-01-10T13:51:42Zim1dchttp://profile.typepad.com/im1dcAn even bigger problem for scientists, imo, before statistical aberration and anomaly, are the ASSUMPTIONS hidden within their premises, hypotheses...<p> An even bigger problem for scientists, imo, before statistical aberration and anomaly, are the ASSUMPTIONS hidden within their premises, hypotheses and theories beginning with subject selection bias:</p>
<p><a href="http://www.foreignpolicy.com/articles/2011/01/02/ideas_weird_science" rel="nofollow">http://www.foreignpolicy.com/articles/2011/01/02/ideas_weird_science</a></p>
<p>Weird Science</p>
<p>"Most of what we know about how the world thinks comes from research on a handful of American undergrads."<br />
<br />
BY JOSHUA E. KEATING ... JANUARY/FEBRUARY 2011</p>
<p>"If you're reading this, there's a pretty good chance that you're one of the weirdest people on Earth. Don't be insulted, though -- most of your friends and acquaintances probably are too. Recent psychological research suggests that people from Western, educated, industrialized, rich, and democratic societies -- WEIRD, for short -- not only live differently from the vast majority of the world's population, but think differently too. </p>
<p>The unsettling realization for psychologists -- the vast majority of whom are WEIRD themselves -- is that they don't actually know much about how the rest of the world thinks. A recent study by Joseph Henrich, Steven Heine, and Ara Norenzayan at the University of British Columbia notes that in the top international journals in six fields of psychology from 2003 to 2007, 68 percent of subjects came from the United States -- and a whopping 96 percent from Western, industrialized countries. In one journal, 67 percent of American subjects and 80 percent of non-American subjects were undergraduates in psychology courses.</p>
<p>Does this really matter? Aren't we all the same, after all? Not really, it turns out. WEIRDos tend to be more individualistic and more competitive than people from non-industrialized Asian and African societies. In tests measuring how groups of people work together, Westerners -- and Americans in particular -- are far more likely to look out for themselves. They even perceive space differently. When viewing the classic Müller-Lyer illusion (>--), Americans are far more often and more easily fooled than Africans, possibly as a consequence of living in a world of concrete square angles rather than natural shapes.</p>
<p>But while undergrads in Wisconsin clearly perceive the world differently from Masai warriors or Vietnamese farmers, psychologists continue to make sweeping generalizations about how the human mind works. For example, the notion that choice is unquestionably a good thing is a very American idea, says Heine, "yet psychologists have traditionally assumed that everyone wants to go to Starbucks and choose one of 10,000 different permutations of coffee."</p>
<p>Of course, thanks to globalization, even poor countries are now getting exposure to the WEIRD world's culture of individualism (and getting their own Starbucks), but Heine doesn't see the gap narrowing. "American society is becoming more individualistic as well," he says. So don't worry, America: Your WEIRDness is still in a class of its own."<br />
</p>Min commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c7549716970c2011-01-05T16:12:36Z2011-01-05T16:12:36ZMin"Matrixx, the maker of Zicam, claims that the adverse effect (loss of smell) was not in clinical trials "statistically significant"...<p>"Matrixx, the maker of Zicam, claims that the adverse effect (loss of smell) was not in clinical trials "statistically significant" at the 5% level of statistical significance and thus they claim, incorrectly it turns out, that the adverse effect was not "important" to human health nor to shareholders' wealth."</p>
<p>As a Bayesian, I do not put much stock in significance tests. However, the reality of an effect is much harder to establish than reaching the 5% level of significance. So if an effect is unproven by standard statistics at the 5% level, it is unproven by Bayesian methods as well.</p>
<p>Thanks to the good ole internet, I quickly found this reference: <a href="http://www.george-eby-research.com/anosmia/anosmia.html" rel="nofollow">http://www.george-eby-research.com/anosmia/anosmia.html</a><br />
:)</p>
<p>It really seems that Matrixx did not do their homework. The fact that zinc can damage olfactory receptors was established in 1938!!! (Zinc sulphate was thought to help prevent polio.) Furthermore, pain(!) -- let me repeat that, PAIN -- indicates damage to cells in the nose, including olfactory receptors. The nose should be irrigated immediately to prevent further damage. Furthermore, 15% of children -- more than one in seven -- in the 1938 report experienced anosmia after receiving the anti-polio treatment. Permanent anosmia was rare.</p>
<p>Let's review, for the lay reader, what a test of significance means. We start with the hypothesis of no effect (called the null hypothesis) and try to disprove it, statistically. We assume that the null hypothesis is true, and if the data are sufficiently improbable, given that assumption, we are willing to act as though there is an effect. For instance, suppose we assume that a coin is unbiased (null hypothesis), yet when we flip it we see 10 heads in a row. We may then decide to reject the null hypothesis. But maybe that result was a fluke, and the coin really is unbiased, as more flips would reveal. Similarly, we might not reject the null hypothesis when more data would reveal that we should.</p>
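The coin example above can be made numeric. Under the null hypothesis of a fair coin, the probability of the observed run is:

```python
# Probability of 10 heads in 10 flips of a fair coin (one-sided)
p_ten_heads = 0.5 ** 10
print(p_ten_heads)        # 0.0009765625, i.e. 1/1024

# A two-sided version (10 heads OR 10 tails) doubles it
p_two_sided = 2 * 0.5 ** 10
print(p_two_sided)        # 0.001953125 -- far below the conventional 5% cutoff
```

So at the 5% level we would reject the null hypothesis of a fair coin, even though, as noted, the run might still be a fluke.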
<p>Note that the best evidence might not be statistical. If we examined the coin and found that it had two heads, we would not have to flip it. Even in lesser cases of bias, direct examination of the coin may provide best evidence. </p>
<p>In the case of Zicam, the fact that zinc can damage cells as well as viruses is clear, right? Given that, ***any*** incidence of pain and loss of smell in the clinical trials, no matter how rare, is enough to require a warning. (I remember when a report of tingling -- not even pain -- was enough for the FDA to pull Thalidomide from the market.) Apparently, ***permanent*** anosmia was rare enough not to meet the test of statistical significance, and the company relied upon that. </p>
<p>The 5% level of significance is a low hurdle, but it seems plain that the reason that the effect was not statistically significant is that not enough data were collected.</p>
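<p>The "not enough data" point can be made concrete with a back-of-envelope calculation (the 1% adverse-event rate and 100 subjects are illustrative assumptions, not Matrixx's actual numbers):</p>

```python
# Illustrative only: how easily a small trial misses a real, rare effect.
# The 1% event rate and n=100 subjects are assumed numbers.
def prob_zero_events(rate, n):
    # Chance that a trial with n subjects observes no events at all.
    return (1 - rate) ** n

print(round(prob_zero_events(0.01, 100), 3))  # 0.366
```

<p>Even when the side effect is real, more than a third of such small trials would record no cases at all, let alone a statistically significant count.</p>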
<p>And what kind of "clinical" trials did Matrixx conduct? A hefty percentage of their subjects experienced pain and at least temporary loss of smell. What clinician would ignore that data? </p>
<p>I am surprised that Matrixx did not settle quickly. </p>ScentOfViolets commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e14afb3a970b2011-01-05T16:01:03Z2011-01-05T16:01:03ZScentOfVioletsSpeaking as someone who's taught a bit of stat himself (college undergraduate) for several years, I couldn't disagree with Antonio...<p>Speaking as someone who's taught a bit of stat himself (college undergraduate) for several years, I couldn't disagree with Antonio more.</p>
<p>Yeah, I know the entire Bayesian schtick forwards and backwards. The problem here is that those utility functions and priors are just formalizing the usual p-test (the way it should be done, not the way it's usually done, sigh) logic - and are just as arbitrary and prone to subjective judgment as anything else in statistics.</p>
<p>About the only good thing I can say for Bayesian analysis at this point is at least they acknowledge the existence and significance of Type I and Type II errors and their differential impact.</p>
<p>Anyone else know a certain discipline where models are confidently trotted out, numbers plugged in . . . and totally wrong predictions come out?</p>Bill Jefferys commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c75454f9970c2011-01-05T15:24:59Z2011-01-05T15:24:59ZBill Jefferyshttp://bayesrules.net@Antonio: Well stated, I could not have said it better.<p>@Antonio: Well stated, I could not have said it better.</p>Bill Jefferys commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c75451f9970c2011-01-05T15:22:31Z2011-01-05T15:22:31ZBill Jefferyshttp://bayesrules.netIoannidis: "Why Most Published Research Findings Are False" http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182327/ Wagenmakers et al.: "Why Psychologists Must Change the Way They Analyze...<p>Ioannidis: "Why Most Published Research Findings Are False"</p>
<p><a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182327/" rel="nofollow">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182327/</a></p>
<p>Wagenmakers et al.: "Why Psychologists Must Change the Way They Analyze Their Data: The Case of Psi." (While specialized to psychology, this paper has wider implications for the use of p-values and significance tests.)</p>
<p><a href="http://www.ruudwetzels.com/articles/Wagenmakersetal_subm.pdf" rel="nofollow">http://www.ruudwetzels.com/articles/Wagenmakersetal_subm.pdf</a><br />
</p>Antonio commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e14ac08c970b2011-01-05T15:19:49Z2011-01-05T15:19:49ZAntoniohttp://omwo.blogspot.comWhy do people keep either using arbitrary paleo-statistical thinking or re-inventing the wheel? All of this is handled automatically by...<p>Why do people keep either using arbitrary paleo-statistical thinking or re-inventing the wheel? All of this is handled automatically by Bayesian probability. You cannot decide to dispose of a hypothesis without considering both the alternatives and the value you attribute to each outcome. This is standard Bayesian decision theory. Without considering a set of hypotheses you can't calculate meaningful probabilities, and without a utility function you cannot calculate meaningful decisions. Amazing how denial still subsists in both statistics and economics departments in this regard. I can teach a first-year student Bayesian probability, and he will learn it faster than classical probability and find it far more natural, and he will never, ever, fall into this nonsense of arbitrary significance levels, or arbitrary estimators for that matter.</p>
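<p>The contrast being drawn -- p(data|H) versus p(H|data) -- can be made concrete with Bayes' rule (a toy sketch; the two hypotheses and the 50/50 priors are illustrative assumptions):</p>

```python
# Toy sketch of computing p(H | data) via Bayes' rule.
# The hypotheses and the 50/50 priors are assumed for illustration.
def posterior(prior_h, lik_h, prior_alt, lik_alt):
    # Posterior probability of H against a single alternative hypothesis.
    num = prior_h * lik_h
    return num / (num + prior_alt * lik_alt)

# Fair coin vs. two-headed coin, data = 10 heads in 10 flips.
p_fair = posterior(0.5, 0.5**10, 0.5, 1.0)
print(round(p_fair, 4))  # 0.001
```

<p>Unlike the p-value, this number is the probability of the hypothesis given the data -- the quantity the p-value does not deliver.</p>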
<p>Muddled thinking has a way of being tenacious, though...arbitrary estimators and methods are a great way to publish papers, and a great way to make an arbitrary assumption or decision take the status of "scientifically justified".<br />
</p>Sandwichman commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20147e14a4a7f970b2011-01-05T13:58:25Z2011-01-05T13:58:28ZSandwichmanhttp://profile.typepad.com/sandwichmanHooray for Ziliak and McCloskey! I can just see the Matrixx (irony much?) arguing after they lose the Supreme Court...<p>Hooray for Ziliak and McCloskey! I can just see the Matrixx (irony much?) arguing after they lose the Supreme Court case that the outcome was "not "statistically significant".</p>derangedlemur commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c753809d970c2011-01-05T12:56:30Z2011-01-05T12:56:30Zderangedlemurhttp://en.wikipedia.org/wiki/Lemur"The oral argument is scheduled for January 10, 2011" When's the nasal one?<p>"The oral argument is scheduled for January 10, 2011"</p>
<p>When's the nasal one?</p>Steve Bannister commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c7533cb6970c2011-01-05T12:09:33Z2011-01-05T12:09:33ZSteve BannisterBravo bravo bravo. Ziliak and McCloskey are obviously right. I have heard Deirdre argue the point convincingly. It doesn't make...<p>Bravo bravo bravo. Ziliak and McCloskey are obviously right. I have heard Deirdre argue the point convincingly. It doesn't make interpreting the results we get when we "push the button" any easier, but such is life. </p>
<p>On a technical point, Bayesian methods can offer some respite, and are now computationally tractable. </p>Ignacio commented on 'Does Statistical Significance Stink?'tag:typepad.com,2003:6a00d83451b33869e20148c752ac8e970c2011-01-05T10:09:26Z2011-01-05T10:09:26ZIgnacioThe cult of statistical significance smells like, or stinks like, the cult of VaR (Value at Risk) models that recently...<p>The cult of statistical significance smells like, or stinks like, the cult of VaR (Value at Risk) models that recently proved to be significantly stupid. </p>