A PhD student from a Turkish university called to interview for a six-month visiting scholar position. Her dissertation was on a topic only indirectly related to our Lab's mission, but she really wanted to come and we had the room, so I said "Yes." When she arrived, I gave her the data set from a self-funded, failed study with null results (it was a one-month study in an all-you-can-eat Italian restaurant buffet where we had charged some people half as much as others). I said, "This cost us a lot of time and our own money to collect. There's got to be something here we can salvage, because it's a cool (rich and unique) data set." I had three ideas for potential Plan B, C, and D directions (since Plan A had failed). I told her what the analyses should be and what the tables should look like. I then asked her if she wanted to do them.

Every day she came back with puzzling new results, and every day we would scratch our heads, ask "Why?", and come up with another way to reanalyze the data with yet another set of plausible hypotheses. Eventually we started discovering solutions that held up regardless of how we pressure-tested them. I outlined the first paper, she wrote it up, and every day for a month I told her how to rewrite it and she did. The same thing happened with a second paper, and then a third (this one based on her own discovery while digging through the data).

At about the same time, I had a second data set that I thought was really cool, which I had offered to one of my paid post-docs (again, the woman from Turkey was an unpaid visitor). Just as this post-doc had originally declined to analyze the buffet data because they weren't sure where it would be published, they also declined this second data set. They said it would have been a "side project" for them and they didn't have the personal time to do it. Boundaries. I get it.

He got a lot of negative responses, such as:
This is a great piece that perfectly sums up the perverse incentives that create bad science. I'd eat my hat if any of those findings could be reproduced in preregistered replication studies. The quality of the literature takes another hit, but at least your lab got 5 papers out.

and:
This may sound snarky, but I am genuinely curious. How many of your other 500 publications resulted from similar data fishing expeditions?

His use of bad statistics eventually did him in, and he had to resign from Cornell. You may be unfamiliar with the heightened risk of false positives when conducting multiple comparisons; I highly recommend Daniel Lakens' Coursera course, "Improving Your Statistical Inferences": https://www.coursera.org/learn/statistical-inferences
The sad part is that he was probably following practices that were considered acceptable in the field 20 years ago, and did not know that they were statistically bogus. Few scientists are trained in statistics.
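The multiple-comparisons problem mentioned above is easy to quantify. A minimal sketch (the `familywise_error_rate` function and the test counts are illustrative, and it assumes independent tests at the conventional alpha = 0.05):

```python
# If you run many independent tests on pure noise, the chance that
# at least one comes out "significant" by luck alone grows quickly.

def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across n_tests
    independent tests of true null hypotheses."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 5, 20, 100):
    print(f"{n:3d} tests -> {familywise_error_rate(n):.3f}")
# 1 test stays at 0.05, but 20 tests already give ~0.64 --
# roughly the situation when a data set is reanalyzed under
# "yet another set of plausible hypotheses" until something holds.
```

This is why exploratory reanalysis needs either a correction (e.g. Bonferroni) or a preregistered replication before the result is trusted.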
A lot of people think that the whole point of learning fancy statistical tests is to extract conclusions from what seems like random data. Actually that rarely happens. If you have a scientifically valid result, it is almost always possible to find a way to show it in a graph, without using any fancy statistics.
I just post this as an example of how bad science gets done. He was succeeding in academia and publishing in peer-reviewed journals. His peers were not statisticians either. Entire fields can be bogus without papers getting rejected.
Another famous example is the rat dck paper. People complained about the hilarious AI-generated figures. I do not know whether the science was valid.
It is thought that the overpublication of statistically bogus results was only discovered in 2010 or so, but here is a 1990 paper that spells out the problem pretty clearly.
I once told my mother (who was also a mathematician):
It is a well established fact that 100 percent of people who have ever eaten a vegetable have eventually died.
She was not amused.
She informed me there was a near 0 percent chance of me getting out of the dining room alive without eating my green beans.
I was not amused.