This is the third in our series of Bytes on this significant procurement case. The first Byte looked at record keeping in procurement. The second looked at the issue of the calling, preparation and payment of witnesses. In this Byte we look at how the court dealt with some of Energysolutions' allegations of manifest error in NDA's evaluation and scoring of their tender.
It is well established that a court may interfere with a contracting authority's evaluation of a tender (including by setting aside the authority's score and substituting its own) where that evaluation has been carried out in manifest error. The court exercises a supervisory jurisdiction in this respect and is not permitted to embark on a remarking exercise substituting its own views for those of the authority. Where (as in many cases) an evaluative judgment has been involved in arriving at a particular score, the authority enjoys a "margin of appreciation" and a mere difference of opinion is not sufficient to have the score set aside or changed. The court may only interfere where an error "has clearly been made". The courts have recently stated that the test for "manifest error" is broadly comparable to the test of irrationality or Wednesbury unreasonableness in domestic judicial review. While a margin of appreciation exists in relation to evaluative judgments, contracting authorities enjoy no such margin in relation to the obligations of equal treatment and transparency.
The court's conclusions on manifest error
The judge in the NDA case carefully directed himself, over the course of some 30 paragraphs, on the legal test to be applied to the question of manifest error, and undertook a comprehensive review of the cases which have considered and defined that test. The court's detailed examination of NDA's alleged errors of evaluation takes up most of the lengthy judgment (573 out of 948 paragraphs). In relation to the NDA's evaluation of tenders the court concluded that:
- some of the scores given to RSS (the consortium bidder of which Energysolutions was part) and CFT (the winning bidder) were manifestly erroneous;
- the corrected scores given by the court meant that RSS in fact submitted the most economically advantageous tender and therefore should have won; and
- in any event the winning bidder should have been disqualified from the competition but had not been because evaluators gave manifestly erroneous high scores in respect of "threshold" questions precisely in order to avoid that outcome.
Examples of manifest error
The range of areas in which the judge found manifest error provides a salutary lesson for contracting authorities. A sample is set out below.
1. Failing correctly to apply the requirements in the tender and Good Industry Practice
This arose in relation to the requirement to identify "critical assets". A dispute arose as to whether a risk assessment was required in relation to identification of an asset, or only in relation to its management. RSS's bid was marked down because its assessment of the criticality of an asset was based, in part, on an assessment of risk.
The Judge found that NDA's approach was:
a) An inconsistent application of principles derived from other sources (Good Industry Practice, and Standards expressly referenced in the procurement documents); and
b) Inconsistent with NDA's own guidance documents (written – unfortunately for NDA – by its main witness). That guidance related to NDA's own approach to identifying critical assets (which incorporated a risk assessment). The court found the NDA's approach to the evaluation of the questions relating to the identification of critical assets was manifestly erroneous because it did not comply with Good Industry Practice and was inconsistent with NDA's own guidance.
There was also inconsistency in the marking of questions which contained the same requirement (relating to the identification of critical assets). Six questions were scored "1"; one was scored "5" (which score the Judge said was "plainly right"). The Judge was very critical of NDA's attempt to argue that "5" was an error, rather than to accept the "obvious errors" in scoring the others "1":
"Applying what is obviously the wrong test … (and a test directly contrary to Good Industry Practice) is precisely the sort of manifest error … that is susceptible to review by the court in the exercise of its supervisory jurisdiction. I do not find that the margin of appreciation available to NDA in matters of evaluative judgment permits it to escape a finding that the scoring of these Requirements was manifestly erroneous."
2. Unequal application of the requirements
Again in relation to the identification of critical assets, the court accepted NDA's submission that whether something is critical involves an evaluative judgment on the part of NDA (and is therefore an area in which the authority enjoys a margin of discretion). However, in one case both RSS and the winning bidder had failed to identify the asset in question as critical. Despite this common error, RSS scored "1" and the winning bidder scored "3". The Judge held that either an asset is critical or it is not:
"It would be wholly irrational for the [evaluators] to conclude that groundwater ingress was important at Dungeness such that RSS should be considered to have made a material error in this respect, but not CFT."
3. Whether answers contained "material omissions"
In relation to the re-scoring of manifestly erroneous scores, NDA contended for a re-score to 3 (from 1) rather than the 5 contended for by Energysolutions. The definition for a score of 3 was that the submission contained a "material omission". The judge was very critical of NDA's "artificial" attempts to identify omissions in RSS's bid (this needs to be seen in light of the deletion and shredding of contemporaneous notes, as we highlighted in our first Byte). In light of that artificiality he found there were in fact no material omissions in RSS's answers and therefore the only possible score was a 5.
4. Misunderstanding NDA's own requirements
Again in relation to the identification of critical assets, RSS were scored "1" for failing to identify a crane as such. NDA's explanation of this score was based on an erroneous understanding (which NDA admitted) of what the crane was actually used for. The judge noted that:
"The accepted error of fact [as to what the crane did] is, in my judgment, sufficient on its own to demonstrate that the evaluation was performed in manifest error."
The Judge went beyond admitted errors of fact, though, and found manifest error in NDA's evaluative judgment as to whether something is in fact a critical asset by looking at what it does. In response to one question RSS had scored "1" because it failed to identify a Pond Water Treatment Plant ("PWTP") as a critical asset. The Judge reviewed what the PWTP in fact did and concluded that it was not a critical asset, so the score was in manifest error.
This seems to come quite close to substituting the court's own view in place of the evaluator's (which is not permitted). Notably the Judge had reminded himself a few paragraphs earlier that this was not the court's function. It is arguable that the judge in this example (which is not unique) went beyond the bounds of the court's supervisory jurisdiction. This was not a case of the court applying the criteria and methodology to re-score a question having found manifest error in the authority's approach. Rather it appears to be the court imposing its own judgment about what an authority requires (in this case whether an asset should be identified as critical) and marking the tender responses accordingly.
What led the Court to exercise its supervisory jurisdiction in this manner? One may reasonably speculate that by the time the case concluded, the Court had assimilated extensive technical submissions, and had relatively little by way of audit trail from the contracting authority to explain what the NDA's approach had been. In our experience, the Courts are far less likely to interfere where a comprehensive audit trail exists.
5. Mis-application of scoring criteria
In relation to one question the relevant requirement was stated to be:
"A description of [bidder's] strategy for the delivery of Common Support Functions and the high level objectives of this strategy."
In order to score 5 (top marks) a bidder had to: "demonstrate … consideration of the … available and needed competencies." RSS scored 3. NDA initially justified this score on grounds that RSS's response had not "clearly or fully defined" the competencies. The court found that there was no requirement to "clearly or fully define" them; it was only to "demonstrate consideration" of them. NDA's argument was that "something more" was required in order to score full marks but there was no requirement for "something more" in the criteria nor did that requirement come within the scoring definitions. The Judge also noted that NDA's witness was unable to define what "something more" was. This is a common issue in that evaluators often justify not awarding top marks on similarly vague grounds ("could have scored higher by providing more detail"). Unless the criteria and/or scoring methodology make it clear that extra marks are available for this reason, this approach can invite challenge from bidders who comply with the strict letter of the requirements.
6. Threshold questions
Certain questions were to be scored pass/fail and were "threshold requirements" in that a failure would result in exclusion from the competition. During evaluation, NDA became aware of the severe consequences of a "fail", and NDA accepted in evidence that there may have been a reluctance on the part of evaluators to reach a finding that would lead to failing a threshold requirement and therefore exclusion. NDA argued that they were permitted to "lean against disqualification" but the court rejected that suggestion: the criteria had to be applied as written, without regard to what the consequences of that application might be:
"There is no further step of consideration available to the NDA after any bidder had failed a threshold Requirement, to ask itself "was that threshold Requirement really that important?", to arrive at the conclusion that it was not, and then use that conclusion to justify increasing the score to a higher one than the content merited (or to justify failing to disqualify that bidder). That would be unlawful, in my judgment, for three reasons at least. Firstly it would be a failure to apply the terms of the [requirements] in respect of that particular bidder. Secondly, it would be scoring the Requirement in question manifestly erroneously. Thirdly, and equally importantly, that increased score (for those requirements that had a score associated with them) would then comprise a component of the bidder's overall total score towards the eventual total. The artificially inflated score would count towards that bidder's overall percentage in the competition, thus potentially distorting the whole result of the competition."
Lessons to be learned
The case illustrates the court's willingness to forensically examine – and to correct where manifest error is found – the scores awarded to bidders, with dramatic consequences: in this case finding that the winning bidder should have been excluded from the competition and that Energysolutions should have won. The case will now proceed to a quantum assessment of Energysolutions' damages claim, which it values at some £100m. Manifest errors can come at a high price.
- Know and apply your requirements: Particularly in procedures involving negotiation/dialogue in which requirements are developed and refined, it is essential that evaluators fully understand and apply the final requirements precisely as formulated. Where applicable those requirements should be formulated with regard to Good Industry Practice unless there are good reasons to depart from that and these reasons are clearly communicated to all bidders.
- Avoid inconsistent scoring: Where two bidders both fail to answer (or provide the same answer to) a question there should be very good reasons for different scores being awarded. An effective moderation process should pick up and resolve any such inconsistencies.
- Have an open mind about possible errors: Mistakes happen. The judge in this case reserved some of his strongest criticism for NDA's refusal to accept that mistakes were made:
"The attempts by the NDA to cling to their convoluted explanations concerning this glaringly obvious error demonstrated the degree to which those at the NDA found themselves unable to admit to any mistakes. … [T]he NDA's overall approach in this trial … was never to accept that any mistakes were made, regardless of evidence to the contrary."
When concerns are first raised about marking, consider seeking an independent review in order to avoid your judgment being clouded by the understandable desire not to have to admit that a mistake was made. Early admissions are always less expensive than convoluted denials that are later proved unfounded.
- Keep an evaluation audit trail: We looked at this in our first Byte. The absence of notes makes it difficult to justify a particular score to a disappointed bidder and to the court. The absence of an explanation can only increase the risk that the court will find that a given score is incapable of rational explanation (which was the basis on which some scores in the NDA case were criticised and changed).
- Apply the criteria irrespective of the consequences: A contracting authority is not permitted to "soften" the scoring criteria because a particular score may result in exclusion of a bidder. A desire not to exclude bidders is understandable in that it ensures a healthy competition. However, as the court noted, the cumulative effect of "leaning against disqualification" could alter the outcome of the whole competition and needs to be carefully guarded against.