Science
Scientists Perturbed by Loss of Stat Tool to Sift Research Fudge from Fact
The journal Basic and Applied Social Psychology recently banned the use of p-values and other statistical methods to quantify uncertainty in research results.

Psychology researchers have recently found themselves engaged in a bout of statistical soul-searching. In apparently the first such move ever for a scientific journal, the editors of Basic and Applied Social Psychology announced in a February editorial that researchers who submit studies for publication would not be allowed to use a common suite of statistical methods, including a controversial measure called the p-value.
These methods, referred to as null hypothesis significance testing, or NHST, are deeply embedded in the modern scientific research process, and some researchers have been left wondering where to turn. The p-value is the most widely known statistic, says biostatistician Jeff Leek of Johns Hopkins University. Leek has estimated that the p-value has been used in at least three million scientific papers. Significance testing is so popular that, as the journal editorial itself acknowledges, there are no widely accepted alternative ways to quantify the uncertainty in research results, and uncertainty is crucial for estimating how well a study's results generalize to the broader population.
Unfortunately, p-values are also widely misunderstood, often believed to furnish more information than they do. Many researchers have labored under the misbelief that the p-value gives the probability that their study's results are just pure random chance. But statisticians say the p-value's information is much more non-specific, and can be interpreted only in the context of hypothetical alternative scenarios: The p-value summarizes how often results at least as extreme as those observed would show up if the study were repeated an infinite number of times when in fact only pure random chance were at work.
This means that the p-value is a statement about imaginary data in hypothetical study replications, not a statement about actual conclusions in any given study. Instead of being a scientific lie detector that can get at the truth of a particular scientific finding, the p-value is more of an alternative reality machine that lets researchers compare their results with what random chance would hypothetically produce. What p-values do is address the wrong questions, and this has caused widespread confusion, says psychologist Eric-Jan Wagenmakers at the University of Amsterdam.
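A short simulation can make this "alternative reality machine" idea concrete. The sketch below (Python with NumPy and SciPy; the group sizes and distributions are illustrative assumptions, not anything from the article) replays a two-group study many times under pure chance and counts how often a difference at least as extreme as the observed one turns up:

```python
# Hypothetical illustration: what a p-value summarizes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# One observed "study": two groups that in truth come from the same population.
a = rng.normal(loc=0.0, scale=1.0, size=30)
b = rng.normal(loc=0.0, scale=1.0, size=30)
observed_diff = abs(a.mean() - b.mean())

# Replay the study many times when only pure random chance is at work,
# counting results at least as extreme as the one observed.
n_reps = 100_000
extreme = 0
for _ in range(n_reps):
    x = rng.normal(0.0, 1.0, 30)
    y = rng.normal(0.0, 1.0, 30)
    if abs(x.mean() - y.mean()) >= observed_diff:
        extreme += 1

print("simulated p-value:", extreme / n_reps)
print("t-test p-value:   ", stats.ttest_ind(a, b).pvalue)
```

The two numbers come out close, and neither says anything about whether this particular study's conclusion is true — only how unusual the result would be across those imaginary replications.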
Read more: http://www.scientificamerican.com/article/scientists-perturbed-by-loss-of-stat-tool-to-sift-research-fudge-from-fact/
drm604
(16,230 posts)

I have no idea how much math is required for such a degree, but maybe it's not enough? If they don't understand one of their basic statistical tools, then something's not right.
TexasTowelie
(112,150 posts)

Statistics is typically a required course for a psychology degree, so most psychology students will have taken it during their undergraduate course load.
Here is an approximation of what it means in layman's terms, although I have a degree in math:
There is a phrase, "correlation does not imply causation," which means that just because two events tend to occur together does not mean that one event is dependent upon the other. In effect, this applies to the testing of the null hypothesis. It is possible to obtain a statistically significant (low) p-value, often misread as a high probability that the effect is real, when in fact there is no relation between the two events mentioned in the null hypothesis. The error is not necessarily in the statistics themselves, but in comparing associations under an erroneous premise between two separate events. A significant p-value is then sometimes erroneously taken as proof that the association between the two events is real rather than due to random chance.
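A quick simulation (Python with NumPy and SciPy; the sample sizes and the 0.05 cutoff are illustrative assumptions) shows how "significant" associations turn up between events that have no relation at all:

```python
# Hypothetical illustration: significance without any real relationship.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_pairs, false_alarms = 1000, 0

for _ in range(n_pairs):
    x = rng.normal(size=50)   # two completely independent series:
    y = rng.normal(size=50)   # neither one depends on the other
    r, p = stats.pearsonr(x, y)
    if p < 0.05:
        false_alarms += 1

# Roughly 5% of the purely random pairs come out "significant".
print(f"{false_alarms} of {n_pairs} unrelated pairs had p < 0.05")
```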
bananas
(27,509 posts)

The article mentions gene researchers:
It's also misused by the FDA for drug evaluation:
It is widely known that the Agency will conclude that a trial is considered positive if the p value generated for the between-treatment contrast is less than or equal to 0.05.
It's widely misused in all areas of science.
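As a toy sketch of that decision rule (Python with NumPy and SciPy; the effect size, sample sizes, and data are made up for illustration and have nothing to do with any real trial):

```python
# Hypothetical illustration of the "positive if p <= 0.05" rule.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
treatment = rng.normal(loc=0.3, scale=1.0, size=100)  # assumed small true effect
placebo   = rng.normal(loc=0.0, scale=1.0, size=100)

result = stats.ttest_ind(treatment, placebo)
verdict = "positive" if result.pvalue <= 0.05 else "negative"
print(f"between-treatment p = {result.pvalue:.4f} -> trial {verdict}")
```

The objection being raised here is that a single hard threshold like this turns a graded, noisy quantity into a yes/no verdict.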
Sancho
(9,067 posts)

Social scientists have been arguing about hypothesis testing, preset alpha, and p-levels for decades. In the early '80s, I heard these discussions in graduate classes, and those arguments were nothing new. I read articles about alternatives (Bayes, CIs, etc.) even back then. Is there any stat class in the last 50 years that didn't discuss Type I and Type II errors, power, and similar concerns? Students don't get it at first, but everyone is clearly taught that the probability distributions are theoretical, so replication, etc. must confirm conclusions based on hypothesis testing.
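For readers who didn't take those classes, Type I error rates and power can be estimated with a short simulation like the one below (Python with NumPy and SciPy; the effect size, sample size, and alpha are illustrative assumptions):

```python
# Hypothetical illustration: Type I error rate and power by simulation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n, reps = 0.05, 30, 10_000

def rejection_rate(true_effect):
    """Fraction of simulated studies whose t-test gives p < alpha."""
    rejections = 0
    for _ in range(reps):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(true_effect, 1.0, n)
        if stats.ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    return rejections / reps

print("Type I error rate (no real effect):", rejection_rate(0.0))  # ~ alpha
print("Power (true effect d = 0.5):       ", rejection_rate(0.5))  # ~ 0.47
```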
Authors need to make the case that their study is valid and generalizable. Which tools and techniques they use is up to them, as long as they get the job done: hypothesis testing is just one avenue.
BASP seems to be a typical publisher-based journal: http://www.tandfonline.com/action/journalInformation?show=readership&journalCode=hbas20#.VTeCbFy2afw
Many of these journals are not profitable anymore because researchers can publish in online, open-access journals. No need for print and paper anymore. Maybe this is a way to get some publicity? Judging from the title, I'm wondering what empirical research there is that is not either "basic" or "applied," so maybe it's a dated title. Not much else except an editorial or literature review (meta-analysis). Interesting.
At any rate, a typical article in the latest issue is "Should He Chitchat? The Benefits of Small Talk for Male Versus Female Negotiators". I don't see why that kind of study couldn't use hypothesis testing and reach useful conclusions.
Maybe the journal wants to go qualitative, but empirical studies will almost always end up with some statistics, even if they don't report probabilities. I would hope that most consumers of scholarly research are familiar with the pros and cons of hypothesis testing. Banning the technique seems to me like overreacting to an imaginary problem, and it will likely reduce the number of quality submissions.
StatGirl
(518 posts)

If they refuse to quantify uncertainty, then basically anyone can draw any conclusions they like from the data.
I've worked with people who insist on seeing relationships when there is very clearly insufficient evidence for them, and it's aggravating. They project their own wishful thinking onto randomness, and eventually it will bite them when their pet theory repeatedly fails to replicate. But in the meantime, tons of money will have been wasted on useless research.
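For what it's worth, even a bare-bones interval estimate quantifies uncertainty. A minimal sketch (Python with NumPy and SciPy; the measurements are made-up numbers for illustration):

```python
# Hypothetical illustration: a confidence interval as quantified uncertainty.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.normal(loc=2.0, scale=1.5, size=40)  # made-up measurements

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, df=len(sample) - 1,
                                   loc=mean, scale=sem)
print(f"mean = {mean:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```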
They are basically turning their journal into "The Bible Code." The fact is, humans are pattern-recognizing machines, and we will see patterns where none exist. It's what we do. Hence all the pictures of the face of Jesus in burnt toast.
Statistical procedures exist to protect us from this tendency.
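One concrete example of such a procedure: correcting for multiple comparisons when hunting for patterns. A minimal sketch (Python with NumPy and SciPy; the number of tests and the noise data are illustrative assumptions), using a Bonferroni threshold:

```python
# Hypothetical illustration: a Bonferroni correction reining in pattern-finding.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n_tests, alpha = 20, 0.05

pvals = []
for _ in range(n_tests):
    x = rng.normal(size=40)
    y = rng.normal(size=40)   # pure noise: there is no pattern to find
    pvals.append(stats.pearsonr(x, y)[1])

naive = sum(p < alpha for p in pvals)
corrected = sum(p < alpha / n_tests for p in pvals)  # Bonferroni threshold
print(f"naive 'patterns' found: {naive}; after correction: {corrected}")
```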