Imagine a forensic technique so powerful that it can reliably identify the author of a document such as a will or a confession. Not only that, but it can detect if even a single paragraph has been surreptitiously added. It may sound far-fetched, but this is exactly what the Reverend Andrew Morton, a researcher in linguistic analysis at the University of Glasgow, claims for a procedure called 'cusum' - a contraction of cumulative sum.
When it is used on a document, Morton says cusum identifies 'consistent habits' in the author's use of language. Deviations from this profile can constitute evidence of tampering. Linguistic techniques have been used for decades to investigate instances of disputed authorship, such as Jacobean plays. However, they use large samples of text and are backed by extensive statistical testing.
In the past few years, Morton's technique has been enthusiastically used by defence lawyers in Britain seeking to cast doubt over confessions allegedly made by their clients. But now, questions have been raised about the validity of cusum. This month, the Crown Prosecution Service will receive a report based on seven independent studies of the technique. Compiled by David Canter, professor of psychology at the University of Surrey, it raises serious doubts over cusum as a forensic technique.
The concern over cusum goes much deeper, however. It raises questions about how juries and the judiciary can be expected to assess complex scientific evidence in court.
Morton, who strenuously defends the technique, has been involved in a number of high-profile cases. He was consulted, for example, over whether the police could be prosecuted after the convictions of the Birmingham Six were quashed. That case is now in abeyance. His greatest triumph came in July 1991 when a man convicted of armed robbery and given a 12-year sentence was freed by the Court of Appeal. Cusum evidence, which cast doubt over the authenticity of the defendant's confession, played a vital part in persuading the appeal judges that he had been wrongly convicted. 'If that evidence had been before the jury we feel that the jury may well have been swayed,' said Lord Justice Taylor, who is now the Lord Chief Justice.
In court, Morton said cusum had been applied to the statements of 2000 people and found to be valid in every case. Yet not a single refereed academic paper has been published giving detailed statistical support for this degree of reliability. Morton says this is because no journal could be expected to carry such details.
There is no shortage, however, of peer-reviewed research reaching altogether different conclusions about the reliability of cusum. Over the past few months three papers have been published in respected journals all criticising the technique.
According to Morton, the rate at which people use certain types of words constitutes 'consistent habits' that can identify the speaker or author. Such linguistic clues include the number of two- and three-letter words, nouns, or words beginning with vowels that people use in every sentence.
Morton generates a chart which, he says, reflects the writer's use of language. The chart plots, for example, how a writer's use of short words and sentence lengths varies in a text. One line represents each habit, and disparities between the lines can reveal evidence of multiple authorship, says Morton. Critics have focused on two key questions: why should cusum work, and what evidence is there that it does work?
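To make the idea of the chart concrete, here is a minimal sketch in Python of how two such lines might be computed. It assumes the conventional definition of a cumulative sum (a running total of each sentence's deviation from the average for the whole text), which the article does not spell out. The sentence splitter, the function names and the sample text are illustrative assumptions, not Morton's own procedure.

```python
import re
from itertools import accumulate


def split_sentences(text):
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def habit_counts(sentences):
    # For each sentence, count all words and the two- and three-letter words.
    lengths, short_words = [], []
    for sentence in sentences:
        words = re.findall(r"[A-Za-z]+", sentence)
        lengths.append(len(words))
        short_words.append(sum(1 for w in words if len(w) in (2, 3)))
    return lengths, short_words


def cusum(values):
    # Running total of deviations from the mean (assumed definition).
    mean = sum(values) / len(values)
    return list(accumulate(v - mean for v in values))


# Hypothetical sample text, invented for illustration only.
sample = ("It is a far cry from the old way. "
          "We saw the man go in. "
          "Nobody expected considerable trouble.")

lengths, short_words = habit_counts(split_sentences(sample))
length_line = cusum(lengths)      # one line on the chart
habit_line = cusum(short_words)   # the other line on the chart
```

Plotting `length_line` and `habit_line` against sentence number would give the two lines whose agreement or disagreement is then judged by eye. Because each line sums deviations from its own mean, it always returns to zero at the last sentence.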
Morton's claim that cusum can detect as few as ten sentences inserted into a text has raised worries that normal variation in the way people write and speak would create statistical 'noise' in the technique's findings. In the current issue of Literary and Linguistic Computing, Michael Hilton, professor of computer science at the University of South Carolina, and David Holmes of the University of the West of England, a specialist in the quantitative analysis of language, show that the 'consistent habits' used by cusum can be overwhelmed by this noise. 'This is just not a valid technique for authorship detection,' says Hilton.
Others, such as Tony Hardcastle, a former Home Office scientist who works for Document Evidence, a forensic science consultancy in Birmingham, want to know why Morton concentrates only on 'habits' such as two- or three-letter words. 'Morton has advanced no theoretical basis as to why the features he examined should be consistent in a person's utterances, or why they should be characteristic of a person,' he says.
Morton says he chose the habits after research spanning 30 years. 'We had pretty good hints from previous work on what was likely to work, and carried out lots of tests on them,' he says. What these tests were is not clear.
Canter, who is a specialist in forensic psychology, says no aspect of human behaviour - even measurements of intelligence - has shown such a high level of consistency as that claimed for cusum.
Other questions have been raised over the way cusum findings are presented to juries. For years, scientists have used standard statistical techniques to measure the significance of their results or the degree of correlation between two variables. Without such objective tests, data analysis becomes a free-for-all: a researcher could claim to have found something 'significant', only to have another dismiss it as a fluke.
Morton dismisses such techniques as a 'red herring'. 'A grasp of probability theory is not something you can expect of a judge, a lawyer or a layman,' he says. In court, Morton uses the cusum charts - graphs that are easy for juries to understand. It is an approach that concerns some legal experts. 'It is very difficult to cross-examine a witness when the similarities and dissimilarities are displayed, frequently magnified, for all, including the jury, to see,' says David Carson, editor of the British forensic journal, Expert Evidence.
A study by Hardcastle in the latest issue of the Journal of the Forensic Science Society crystallises these concerns. He applied the cusum technique to texts written by known authors, including some that had been analysed by Morton. Hardcastle discovered that there was considerable scope for interpretation. 'Where Morton produced charts with compressed scales, a consistency was often identified,' says Hardcastle. 'On the other hand, where he produced charts with expanded scales, differences tended to be reported.'
Hardcastle condemns the 'deplorable lack of scientific rigour' used in cusum, and concludes that the technique 'is ill-defined'. Morton insists that detailed statistical analysis would go over the heads of most people.
But when standard statistical techniques are applied to cusum charts, the results are far from encouraging. In Expert Evidence (No 3, p 93, 1992), Canter analyses more than 100 cusum charts using the 'Spearman rank correlation test'. This provides a quantitative assessment of the agreement between the two lines on a cusum chart. If the relationship between these habits is more than coincidence, then a Spearman test on charts from the same author will produce a result close to 1.0. If the two lines show no similarities, the result will be zero.
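As a rough illustration of the test described above, the following Python sketch computes a Spearman rank correlation between two series of values. It uses the standard textbook definition (the Pearson correlation of the ranks, with tied values sharing an average rank). The function names and the sample numbers are illustrative assumptions and are not taken from Canter's study.

```python
from math import sqrt


def average_ranks(values):
    # Rank values from 1 upwards; tied values share the mean of their ranks.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    # Ordinary product-moment correlation of two equal-length sequences.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    # Spearman's coefficient is the Pearson correlation of the ranks.
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical chart lines, invented for illustration only.
line_a = [1.0, 2.5, 2.0, 4.0, 3.5]
line_b = [0.5, 1.5, 1.0, 3.0, 2.5]
rho = spearman(line_a, line_b)  # the two lines rise and fall together
```

Two lines that rise and fall in the same order give a value of 1.0, while two lines in exactly opposite order give -1.0. Values near zero indicate no consistent relationship between the rankings.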
In psychological research, a finding below 0.9 is commonly held to be unreliable. The cusum charts failed to reach this value in almost half the texts tested. 'This means in virtually half the cases, cusum analysts would have mistakenly reached the conclusion of multiple authorship,' says Canter.
In further tests on 130 passages by known authors that had been deliberately doctored, Canter found that 65 per cent gave correlation values above 0.9. 'In other words, the cusum comparisons would have mistakenly supported the claim for single authorship in two out of three cases.' Like Hardcastle, Canter is in no doubt about cusum analysis. 'It fails at the level of theory, it fails at the level of practice - it doesn't work.' Nonetheless, Morton says that a forthcoming paper for the Royal Statistical Society will answer his critics.
He also has his supporters. Michael Farringdon, a computer scientist at the University College of Swansea's European Business Management School, and a co-author with Morton on cusum, says Canter's statistical arguments are suspect. He adds that his statistical analysis confirms that the subjective interpretation of cusum results is reliable. But his analysis remains unpublished. Farringdon's wife, Jillian, who is a researcher in linguistics, says she will publish new cusum findings in The Swansea Review and D H Lawrence Review this year.
Critics of cusum remain concerned over the ease with which it has been accepted by courts. Last month, the Royal Commission on Criminal Justice rejected the idea of setting up an accreditation scheme for expert witnesses. It argued instead that courts should be guided by a witness's professional qualifications and affiliations.
In the US, the judicial system has dealt with new scientific techniques by subjecting them to the 'Frye test', which permits only testimony based on 'generally accepted' scientific techniques to be put before a jury. Academic opposition to cusum would almost certainly have prevented it passing the Frye test, says Carson.
But in June, the US Supreme Court voted to abandon the Frye test. From now on, expert testimony will not have to undergo the rigours of peer review, and the decision of whether something is 'scientific' will be left to judges.
Faced with a scientific controversy as complex as that now raging over cusum, can lawyers, judges and juries be expected to decide between the scientific and the specious?
Robert Matthews is science correspondent of The Sunday Telegraph.