The panel majority, eager to reprimand the Commissioner of Social Security for what it deems to be inexcusably sloppy practices, disregards — I suggest, respectfully — the deference we owe under law to the agency’s determinations. Rather than observing the standard for harmless error that our precedents have previously prescribed, the majority has erroneously presumed that the Commissioner’s ostensible error has prejudiced Stephanie Garcia, the claimant in this case. I respectfully dissent from this regrettable exaggeration of our Court’s properly limited role in the adjudication of Social Security disability benefits claims.
I
Congress has carefully prescribed a minimal role for the Federal courts in adjudicating claims of disability under the Social Security Act. See 42 U.S.C. § 405(g). Accordingly, we have only limited authority to nullify the decisions of the agency and its administrative law judges with which we disagree. As the majority opinion correctly notes, we may not disturb an ALJ’s denial of benefits unless “it is not supported by substantial evidence or is based on legal error.” Robbins v. Soc. Sec. Admin., 466 F.3d 880, 882 (9th Cir. 2006). Legal error alone, furthermore, is not sufficient to warrant our interference: for example, we generally must stay our hand if it is “clear from the record” that any ostensible error “was inconsequential to the ultimate nondisability determination.” Tommasetti v. Astrue, 533 F.3d 1035, 1038 (9th Cir. 2008) (internal quotation marks omitted).
Indeed, one such error that we have identified in past cases has been an ALJ’s failure “fully and fairly [to] develop the record and to assure that the claimant’s interests are considered.” Celaya v. Halter, 332 F.3d 1177, 1183 (9th Cir. 2003). This “special” and “independent” duty of the ALJ exists in all circumstances, although, when the applicant is uncounseled, the responsibility to ensure an adequate record is heightened. See Tonapetyan v. Halter, 242 F.3d 1144, 1150 (9th Cir. 2001); Smolen v. Chater, 80 F.3d 1273, 1288 (9th Cir. 1996). Despite our solicitude in this regard, we have nevertheless clearly limned the outer boundaries of such responsibility. “An ALJ’s duty to develop the record further is triggered only when there is ambiguous evidence or when the record is inadequate to allow for proper evaluation of the evidence.” Mayes v. Massanari, 276 F.3d 453, 459-60 (9th Cir. 2001) (emphasis added).
More recently, we have refined, in the context of the ALJ’s duty to develop the record, the standard by which we appraise whether any such error prejudiced the claimant. In McLeod v. Astrue, the unsuccessful applicant for disability benefits contended that the “ALJ erred by failing to develop the record adequately,” specifically by not “request[ing] more explanation from two of his treating physicians” and by not obtaining “whatever VA disability rating” he may have had. 640 F.3d 881, 884 (9th Cir. 2011). We determined that the ALJ had shirked this duty to develop the record, but nevertheless that this dereliction was not alone sufficient warrant for reversal. Rather, we explained that “the burden is on the party attacking the agency’s determination to show that prejudice resulted from the error.” Id. at 887. But “where the circumstances of the case show a substantial likelihood of prejudice,” the reviewing court can remand the case so the agency may reconsider the claimant’s eligibility for benefits. Id. at 888. We emphasized, nevertheless, that a “mere probability” of prejudice “is not enough.” Id. Either the claimant must himself shoulder the burden of demonstrating prejudice, or otherwise such prejudice must be apparent on the face of the record or the “circumstances of the case.”
II
The majority’s opinion turns this duty-to-develop doctrine on its head. Even assuming, arguendo, that the ALJ committed legal error by not ordering Dr. McDonald to perform another round of IQ tests on Miss Garcia,1 the majority misstates — and misapplies — the proper standard for assessing any prejudice such error caused.
In the first place, the majority correctly acknowledges that “[w]e will not reverse an ALJ’s decision on the basis of a harmless error,” which occurs “when it is clear from the record that the ALJ’s error was inconsequential to the ultimate nondisability determination,” maj. op. at 932 (internal quotation marks omitted). Although the majority does not expressly state that such rule is the exclusive standard by which to assess the harm caused by an error, its reasoning assumes so. For the majority detects prejudice in “a genuine probability” that a complete set of IQ test scores may have altered the medical reports or provided another basis for Miss Garcia to challenge the ALJ’s determination. Id. at 933. McLeod, however, specifically forecloses this basis for reversing a denial of benefits: a “mere probability,” no matter how “genuine,” simply does not suffice. 640 F.3d at 888. The majority articulates an exclusive standard for harmless error that presumes prejudice unless such error appear “inconsequential” on the face of the record. Such may be the ordinary analysis for determining the prejudice caused by legal error. In the special context of the ALJ’s duty to develop the record, however, our Court has already clearly explained that we cannot find prejudice unless and until demonstrated by the claimant or the record and circumstances of the case.
Furthermore, the majority offers no basis, either in law or in fact, for its assertion that the absence of a full set of IQ test scores would likely have affected the ALJ’s disability determination. The majority first observes that “[b]oth Dr. Middleton and Dr. Murillo considered Garcia’s incomplete IQ test results in assessing her ability to support herself through gainful employment.” Maj. op. at 933. Indeed, the medical experts considered the test scores, but they also considered sundry other relevant data, such as her employment history, educational and recreational activities, financial independence, grooming, and the cooperation and comprehension she displayed during her clinical evaluation. The majority points to nothing in these experts’ reports indicating that the partial test scores figured decisively in their recommendations. Nor does the majority opinion advert to any item in the record or the “circumstances of the case” that suggests the slightest chance, let alone a “genuine probability,” that the ALJ would have concluded differently had he seen a full set of IQ test scores.
Even Miss Garcia’s own briefing does not attempt such an argument. In her opening brief, she emphasizes only that, deprived of a full battery of test scores, she lost the opportunity to qualify for automatic disability benefits under Listing 12.05 C or D, see 20 C.F.R. pt. 404, subpt. P, app. 1. She does not, however, attempt affirmatively to link the incomplete IQ tests with the medical reports and the ALJ’s determination of her residual functional capacity. Only in her supplemental brief does Miss Garcia clearly assert such a connection, and even there she does not offer any reason why we may expect the medical experts would have substantively revised their reports in light of complete test results.
The majority assures us, however, that an alternative finding by the ALJ “seems particularly plausible” based on Miss Garcia’s “considerably lower” test results as a juvenile. Maj. op. at 933. But this is a non sequitur. The ALJ determined Miss Garcia not to be disabled in light of her record as a whole: he did not indicate that the partial IQ test scores carried dispositive weight. Nothing in the record to which either Miss Garcia or the majority points suggests a necessary connection between marginally lower IQ scores and an RFC finding that would prevent her from procuring and performing gainful employment. This “genuine probability” of a different outcome that the majority identifies, accordingly, appears little more than an unsubstantiated hunch.
In addition, Listings 12.05 C and D require not only a sufficiently low IQ test score, but also additional impairments, before the applicant may qualify for disability benefits thereunder. Miss Garcia does not, before this court, argue that she may have qualified under Listing 12.05 B, which she would satisfy simply by scoring below 60 on any of her tests without presenting any additional impairment.2 Nevertheless, the majority, pointing to her substantially lower testing results as a juvenile, predicts that Miss Garcia may have scored low enough to qualify as disabled under Listing 12.05 B. For such reason, the majority finds prejudice in Dr. McDonald’s failure to administer the entire battery of IQ tests and in the ALJ’s acceptance of these partial scores. In effect, this reasoning holds, bizarrely, that Miss Garcia wins an argument she does not make. Since she never claimed on appeal that she would have qualified under Listing 12.05 B, the possibility that she could have so qualified should not serve as grounds for finding that she suffered prejudice.
III
The majority’s reasoning, furthermore, threatens to undermine the highly deferential standard under which we review the Commissioner’s decisions. When presented with an appeal from an unsuccessful applicant, we may not second-guess the Commissioner’s determination or reverse him simply because we disagree with the result. Our authority to order relief is more limited: if substantial evidence exists in the record to support the agency’s fact-bound conclusions, our analysis must generally come to an end. Here the majority opinion does not suggest an absence of substantial evidence to ballast the ALJ’s nondisability finding; rather, it posits that, despite any such substantial evidence, the ALJ might have reached an alternative conclusion if the record had contained a full set of IQ scores.
Such holding opens a potentially fatal breach in the substantial-evidence framework. To be sure, the majority determines that the ALJ committed legal error by not developing the record to include a full set of test scores, and “legal error” is a basis for reversal distinct from the lack of substantial evidence. Nevertheless, the relationship between these two standards, in the context of the ALJ’s legal duty to develop the record, should be apparent enough. Claimants previously required to disprove the existence of substantial evidence will now plead an incomplete record and, citing the majority opinion, will assert that the outcome of their case “might have been different,” maj. op. at 933. Seldom will be the occasion where the ALJ could not have examined more reports or ordered more tests. In Mayes, we specifically rejected a challenge from a claimant who contended, in effect, that substantial evidence did not support the ALJ’s denial because he did not adequately develop the record. 276 F.3d at 459. The substantial-evidence standard protects against precisely such attacks on the administrative process: the courts may not overturn the agency’s findings, substantiated by sufficient data, even in the presence of compelling countervailing evidence. Claimants ought not be able to circumvent this standard by invoking hypothetical evidence that the ALJ could have considered but, for one reason or another, did not. Id. Our procedure, elucidated in McLeod, for assessing the prejudice caused by an inadequately developed record reinforces these principles. The ALJ’s duty to develop “is triggered only” in certain circumstances, Mayes, 276 F.3d at 459, and, unlike in other contexts, we do not presume prejudice; rather, the claimant or the record must demonstrate it, see McLeod, 640 F.3d at 887-88.
The majority’s doctrinal innovation destabilizes this framework, substantially lowering the burden for plaintiffs seeking the intervention of the Federal courts in the Commissioner’s decision-making processes and threatening to make substantial-evidence review a dead letter. Such result contravenes the precedents of this Court, the intent of Congress, and the separation of powers.
IV
For the foregoing reasons, I respectfully dissent.
1. I remain unconvinced that, at least in the circumstances of this case, the ALJ erred by not ordering a new, and complete, round of IQ tests. The majority opinion does not assert that the partial test scores constitute “ambiguous evidence” or make the “record ... inadequate” for the purposes of assessing residual functional capacity. See Mayes, 276 F.3d at 460.
At most, the majority opinion gleans from the regulations an expectation of, or a preference for, “multiple scores” from a Wechsler series IQ test, maj. op. at 931. Our precedents do not appear to compel the conclusion that such regulatory intimations can “trigger[]” the ALJ’s duty further to develop the record, see Mayes, 276 F.3d at 459, and the majority does not pause to explain why they would.
Furthermore, the majority scarcely indicates what countervailing constraints, if any, may defeat the regulations’ preference for or expectation of multiple test scores. Dr. McDonald’s purported reasons for not administering the complete Wechsler series IQ test were “the constraints of time and the slowness with which [Miss Garcia] worked.” The majority simply deems this explanation an “excuse,” dismissing it as “troublesome” and scolding the district court, which in its judgment “should not have accepted it in the absence of some more compelling reason.” Maj. op. at 927-28 & n. 1.
I strongly resist this lecture to medical practitioners. Not only does the record lack any clear implication of either excuse-making or duty-shirking, but it is also not self-evident that the time Dr. McDonald did devote to administering the tests and interviewing Miss Garcia was insufficient or otherwise imprudent. We should be reluctant to craft, in footnotes to our opinions, legal rules governing the minutiae of medical practice, such as how and when to schedule tests and interviews, where Congress has not legislated and where the agency has not regulated. And especially not where the record and the parties’ briefs do not present an adequate basis for determining which sorts of constraints are reasonable and which are merely “excuses.”
2. In her opening brief, Miss Garcia specifically argued that “a valid IQ score on one of the two missing IQ tests may provide satisfaction of the Listing at § 12.05(C) or (D).”