Peter A. Lawrence, of the Department of Zoology, University of Cambridge, and the MRC Laboratory of Molecular Biology, Cambridge, has written a beautifully argued article, The Mismeasurement of Science. It appeared in Current Biology, August 7, 2007: 17 (15), R583. [Download pdf]
It should be read by every scientist. Even more importantly, it should be read by every vice chancellor and university president, by every HR person and by every one of the legion of inactive scientists who, increasingly, tell active scientists what to do.
Here are some quotations.
“The use of both the H-index and impact factors to evaluate scientists has increased unethical behaviour: it rewards those who gatecrash their names on to author lists. This is very common, even standard, with many people authoring papers whose contents they are largely a stranger to.”
“. . . trying to meet the measures involves changing research strategy: risks should not be taken . . .”
“. . . hype your work, slice the findings up as much as possible (four papers good, two papers bad), compress the results (most top journals have little space, a typical Nature letter now has the density of a black hole), simplify your conclusions but complexify the material (more difficult for reviewers to fault it!), . . . it has become profitable to ignore or hide results that do not fit with the story being sold — a mix of evidence tends to make a paper look messy and lower its appeal.”
“These measures are pushing people into having larger groups. It is a simple matter of arithmetic. Since the group leader authors all the papers, the more people, the more papers. If a larger proportion of young scientists in a larger group fail, as I suspect, this is not recorded. And because no account is taken of wasted lives and broken dreams, these failures do not make a group leader look less productive.”
“It is time to help the pendulum of power swing back to favour the person who actually works at the bench and tries to discover things.”
The position of women
Lawrence argues eloquently a point that I too have been advocating for years. It is well known that, in spite of an increased proportion of women entering biomedical research as students, there has been little, if any, increase in the representation of women at the top. This causes much hand-wringing among university bureaucrats, who fail to notice that one reason for it is the very policies that they themselves advocate. Women, I suspect, are less willing to embrace the semi-dishonest means that are needed to advance in science. As Lawrence puts it:
“Gentle people of both sexes vote with their feet and leave a profession that they, correctly, perceive to discriminate against them [17]. Not only do we lose many original researchers, I think science would flourish more in an understanding and empathetic workplace.”
The success of the LMB
It is interesting that Peter Lawrence is associated with the Laboratory of Molecular Biology, one of the most successful labs of all time. In an account of the life of Max Perutz, Daniela Rhodes said this:
“As evidenced by the success of the LMB, Max had the knack of picking extraordinary talent. But he also had the vision of creating a working environment where talented people were left alone to pursue their ideas. This philosophy lives on in the LMB and has been adopted by other research institutes as well. Max insisted that young scientists should be given full responsibility and credit for their work. There was to be no hierarchy, and everybody from the kitchen ladies to the director were on first-name terms. The groups were and still are small, and senior scientists work at the bench. Although I never worked with Max directly, I had the great privilege of sharing a laboratory with him for many years. The slight irritation of forever being taken to be his secretary when answering the telephone—the fate of females—was amply repaid by being able to watch him work and to talk with him. He would come into the laboratory in the morning, put on his lab-coat and proceed to do his experiments. He did everything himself, from making up solutions, to using the spectrophotometer and growing crystals. Max led by example and carried out his own experiments well into his 80s.”
Max Perutz himself, in a history of the LMB, said:
“Experience had taught me that laboratories often fail because their scientists never talk to each other. To stimulate the exchange of ideas, we built a canteen where people can chat at morning coffee, lunch and tea. It was managed for over twenty years by my wife, Gisela, who saw to it that the food was good and that it was a place where people would make friends. Scientific instruments were to be shared, rather than being jealously guarded as people’s private property; this saved money and also forced people to talk to each other. When funds ran short during the building of the lab, I suggested that money could be saved by leaving all doors without locks to symbolise the absence of secrets.”
That is how to get good science.
Now download a copy of Lawrence’s paper and send it to every bureaucrat in your university.
Follow up
- The Times Higher Education Supplement, 10 Aug 2007, had a feature on this paper. Read it here if you have a subscription, or download a copy.
- In the same issue, Denis Noble and Sir Philip Cohen emphasise the importance of basic research. Cohen says:
“In 1994, after 25 years in the relative research wilderness, the whole thing changed.
“Suddenly I was the best thing since sliced bread,” Sir Philip said. “We set up the Division of Signal Transduction Therapy, which is the largest-ever collaboration between the pharmaceutical industry and academia in the UK.”
But the present research funding culture could prevent similar discoveries. “In today’s climate that research would not have been funded,” Sir Philip said. “The space programme hasn’t allowed us to colonise the universe, but it has given us the internet – a big payoff that industry could never have envisaged.” (Download a copy.)
- Comments from Pennsylvania at http://other95.blogspot.com
- How to slow down science. Another reference to Lawrence’s paper from a US (but otherwise anonymous) blog, BayBlab.
How to select candidates
I have, at various times, been asked how I would select candidates for a job, if not by counting papers and impact factors. This is a slightly modified version of a comment that I left on a blog, which describes roughly what I’d advocate.
After a pilot study, the Research Excellence Framework (which attempts to assess the quality of research in every UK university) made the following statement:
“No sub-panel will make any use of journal impact factors, rankings, lists or the perceived standing of publishers in assessing the quality of research outputs”
It seems that the REF is paying attention to the science, not to bibliometricians.
It has been the practice at UCL to ask people to nominate their best papers (2–4 papers, depending on age). We then read the papers and ask the candidates hard questions about them (not least about the methods section). It’s a method that I learned a long time ago from Stephen Heinemann, a senior scientist at the Salk Institute. It’s often been surprising to learn how little some candidates know about the contents of papers which they themselves have selected as their best. One aim of this is to find out how much the candidate understands the principles of what they are doing, as opposed to following a recipe.
Of course we also seek the opinions of people who know the work, and preferably know the person. Written references have suffered so much from ‘grade inflation’ that they are often worthless, but a talk on the telephone to someone who knows both the work and the candidate can be useful. That, however, is now banned by HR, who seem to feel that any knowledge of the candidate’s ability would lead to bias.
It is not true that use of metrics is universal, and thank heavens for that. There are alternatives, and we use them.
Incidentally, the reason that I have described the Queen Mary procedures as insane, brainless and dimwitted is that their aim of increasing their ratings is likely to be frustrated. No person in their right mind would want to work for a place that treats its employees like that, if they had any other option. And it is very odd that their attempt to improve their REF rating uses criteria that have been explicitly ruled out by the REF. You can’t get more brainless than that.
This discussion has been interesting to me, if only because it shows how little bibliometricians understand how to get good science.
I think we urgently need the funding rules changed so that postdocs can hold their own grants. In the publish-as-much-as-you-can world, group leaders can often survive only by taking their postdocs’ and PhD students’ ideas and making them their own.
I took a research idea to a group leader (the best boss I ever had, btw). It was my idea, and not an area exactly congruent with the boss’s past work. We eventually got it funded by the BBSRC, but the best I could be on MY idea was named researcher, and we had to justify my inclusion on my own grant (it wasn’t my first postdoc).
Now, in the age of the RAE, one of the catch-22s of getting your own lab is that you must be able to demonstrate an ability to attract funding. How do I do that if I cannot hold my own grants?
I actually got a letter published in the Guardian on this very subject once, but nothing has changed. Talent and creativity will leave science when they realise they have no chance of a career and their minds are being plundered. I have left science as I cannot get another postdoc, punished for not achieving the unachievable.
I am also a basic researcher; applied research does not interest me. In today’s world basic research is being squeezed so hard that only established people have any chance of funding. So if anyone wants their muscle mutants analysed, they can whistle.
In the US we seem to be a little behind the RAE spirit, but we are clearly heading in that direction. The question to entertain, from my perspective, is “What can we give the beancounters as an alternative?” (Let us recognize that in terms of career progress, the “beancounters” are quite often ourselves, meaning our chairs and division heads and grant reviewers.)
You suggest evaluation should be put back in the hands of the experts. Who? If you are really interested in gender issues, you must be familiar with issues of bias. I find people in unrelated (sub)fields at one’s own institute or department copping out with the excuse that “I cannot evaluate your science, so I have to rely on objective measures”.
Everyone is motivated to come up with metrics that make their own career look good. It ends up in a circular process with those who are successful by currently popular measures defining what the measure should be. So thank you for speaking up…
Ah well, just to liven up the discussion, I think I’ll say a word in defense of impact factors. Here in France an economics magazine has just published the results of a bibliometric study of biomedical research to try and compare different towns, institutions, research units and indeed researchers across bureaucratic borders (which for us are the CNRS, INSERM, and universities).
One of the interesting things to come out of this publication was a ranking of different research units within the same discipline. Now for labs that I know – both good and bad – I found the bibliometric classification to be refreshingly accurate. And it was all done automatically using some kind of objective data.
Now I contrast this with what I have seen of an official CNRS evaluation, which as far as I can tell consisted of asking the people in the research unit to say how wonderful they were, and then putting it all into a lovely report. As DC points out, there is no substitute for reading the paper and drawing your own conclusions as to its value. But if, as I suspect, the people on the evaluation committees don’t bother to do that, they end up relying on other sources of information. Now I ask you, which is likely to be a more accurate assessment of a group’s performance: their own opinion of how good they are, or their publication record over the past five years?
However, that being said, application of bibliometric assessment to individual scientists, ostensibly to identify the best biomedical researchers in France, showed up some of the limitations of the approach. Heavy weighting was given to position on the list of authors, so that, more or less, only last authors counted, and this of course rewarded many of the sharp practices referred to in the Lawrence article (senior scientists imposing themselves as last authors on papers that were conceived, performed and written up by post-docs and graduate students comes immediately to mind).
So overall I say yes to bibliometric evaluation of research structures, but not for individuals.
DMcIlroy
I can’t say I’m convinced. The institutional assessment is just the aggregate result of assessing individuals by methods which you agree are inappropriate.
If that method were adopted, the institution would pressurise its members to behave in precisely the same bad ways as Lawrence has described, in order to increase the aggregate score (see also http://www.dcscience.net/goodscience/?p=4).
Just to give the link to the data, the page with links to the “write-up” of the study is
http://www.lesechos.fr/info/france/300189884.htm
Each chapter on this page gives you a pdf with either the objectives of the study, an explanation of the methodology, or a bunch of tables and graphs with the different levels of organisation under comparison.
Chapters IX–XII are where the results are. For research units, for example, there are tables by research area, and the research units in the top and bottom categories are listed. Anyone can have a look at their own research area, and see whether the research units at the top end correspond to places where good quality work is being done.
In terms of methodology, I think there was an effort made to use impact factors in an intelligent way. That is, rather than just add up the numerical value of IFs, journals were classified into three groups: superb, excellent, and also-rans, where superb is Nature/Science/Nature Medicine/NEJM, and excellent was supposed to represent the top journals in each particular discipline. Again, looking at the way the journals were classified, it seemed to me that the most widely read and regarded journals (in immunology) had been correctly identified. There was one howler though. Nature Immunology was somehow classified as a “superb” journal, so anyone who happened to have published there automatically became a “superb” researcher, even though I would say that the overall level of publications in Nat Immunol is equivalent to, say, Nature Cell Biology or Nature Genetics.
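(For concreteness, here is a minimal sketch of how a tiered scoring scheme of that kind might be coded. The tier lists and weights below are illustrative guesses only, not the ones the study actually used.)

```python
# A minimal sketch of tiered journal scoring: instead of summing raw impact
# factors, each paper is scored by the tier of the journal it appeared in.
# Tier membership and weights are illustrative assumptions, not the study's own.

TIERS = {
    "superb": {"Nature", "Science", "Nature Medicine", "NEJM"},
    "excellent": {"Nature Cell Biology", "Nature Genetics"},
}
WEIGHTS = {"superb": 3, "excellent": 2, "also-ran": 1}


def tier_of(journal):
    """Return the tier for a journal; anything not listed counts as an also-ran."""
    for tier, journals in TIERS.items():
        if journal in journals:
            return tier
    return "also-ran"


def unit_score(publications):
    """Score a research unit from the list of journals its papers appeared in."""
    return sum(WEIGHTS[tier_of(journal)] for journal in publications)


# Example: one Nature paper, one specialist-journal paper, one Nature Genetics paper.
print(unit_score(["Nature", "Journal of Immunology", "Nature Genetics"]))  # 3 + 1 + 2 = 6
```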
Well, I suppose you might object that the whole exercise is fundamentally unsound, as it doesn’t necessarily follow that publication record really reflects more lasting scientific impact. To which I would reply that it is very unlikely that any “real” scientific output is going to come out of the research units that aren’t producing any “surrogate markers” in the form of publications.
It is important to identify these places, just to stop giving them resources in the form of technicians and researchers.
Having some kind of measure to compare sites that is at least independent of cronyism and regional politics is a good thing in my view. Otherwise, how is one to know whether scientific research is better in UCL than in Teesside Uni, or vice versa?
I think you could very easily distinguish UCL from less research-intensive places on the basis of peer-reviewed grants (or just ask anyone). Furthermore, people in Teesside (to take your example) should not be punished en masse. A grant application from there should have the same chance as one from UCL. It is people who do research, not departments or universities.
The lack of any detectable correlation between citations and impact factor is enough, on its own, to dispose of any idea that you can classify research as “superb” on the basis of where it is published. And citations are almost equally unreliable as an indicator, as shown by the real-life examples at http://www.dcscience.net/goodscience/?p=4
I remain unconvinced.
I agree that using grant income would be an easily quantifiable, objective assessment of research activity that perhaps has more of a basis in reality than impact factors or citation scores. However, if you apply a “just ask anyone” approach then you’ll end up ranking everyone according to their reputation within the peer group. Or rather, their reputation as perceived by the people on some evaluation committee or other.
Now, if the evaluators were brilliant scientists themselves, that wouldn’t be so bad. But most people want to get on with their research rather than sit on committees, so I fear that the people you get doing the evaluation are beancounters and apparatchiks. Since I have little confidence in their subjective scientific judgement, I would rather have them base their decisions on at least some kind of objective measure.
I know you still won’t be convinced by bibliometric (sorry, bibliomic) measures, but what is the alternative? I read the linked article, but I didn’t find an answer.
I didn’t say grant income. The cost of research in different areas differs too much, and using income just encourages over-size groups with poor supervision and all the other problems that Lawrence points out. But the peer review process for getting a grant is probably the best we’ve got, as long as the funding body has people who are sufficiently well-informed to pick appropriate referees.
And of course, as you say, committees of apparatchiks are useless. Furthermore, the grant criterion still won’t work for research in subjects like mathematics, which don’t cost anything. I notice, sadly, that even pure mathematicians are now coming under pressure to get grants. In their case the only way to assess their quality is to ask top mathematicians in their field. And the problem remains that the apparatchiks may not be sufficiently well-informed to pick the right referees.
I guess we are all familiar with the phenomenon of second rate research being praised by second rate referees, chosen by a second rate committee.