ORDER AND JUDGMENT*
MONROE G. McKAY, Circuit Judge.
Plaintiff Jerry A. Garcia appeals from a district court order affirming the Commissioner’s denial of social security disability benefits. We review the Commissioner’s decision to ensure that it is supported by substantial evidence and adheres to applicable legal standards. Chambers v. Barnhart, 389 F.3d 1139, 1142 (10th Cir.2004). As explained below, we reverse and remand for further agency proceedings.
Mr. Garcia’s primary physical complaint is back pain with associated limitations caused by herniated lumbar/sacral discs. In August 1993, he had a hemilaminotomy at the L4-5 level and a hemilaminectomy at the L5-S1 level. He returned to work and re-injured his back in December 1993, and since that time he has not engaged in substantial gainful activity, though he has assisted in minor ways with a child care business his wife runs in their home. On several occasions in early 2000, he saw Dr. Frank Maldonado, who ordered an MRI and new x-rays that showed: “disc herniation at L4-5 and L5-S1” with “marked narrowing at the L5-S1 interspace [and] encroachment of the neural foramina ... partly from the disc and partly from bone.” App. II at 181. Dr. Maldonado recommended a “two level lumbar diskectomy, L4-5 and L5-S1 with interbody fusion,” but he also acknowledged that “there were no guarantees, particularly with a re-operated back.” Id. He noted that in the meantime Mr. Garcia remained totally disabled. Id. Eight months later, Mr. Garcia applied for benefits.
The Commissioner sent Mr. Garcia to Dr. Debra Schenck for a consultative examination. Dr. Schenck concluded that Mr. Garcia’s abilities to sit, stand and walk were unlimited, that he could lift up to ten pounds frequently and twenty occasionally, and, consequently, that he was capable of performing a full range of light work. Id. at 173-74, 179. Mr. Garcia returned to Dr. Maldonado to obtain a more specific assessment of the functional limitations reflected in the broad conclusion he had made regarding disability in his prior report. Dr. Maldonado found that Mr. Garcia could not sit or stand/walk for more than one to two hours (each) in an eight hour day and could not lift more than ten pounds occasionally. Id. at 213-14. He also noted limitations on pushing/pulling, as well as environmental restrictions (heights, vibrations, moving machinery) due to balance and agility issues. Id. at 215. The reports of Dr. Maldonado and Dr. Schenck are the primary medical assessments in the record.
This case has been heard by two administrative law judges. The first ALJ found Mr. Garcia capable of performing light work and, given his age and (limited) educational level, concluded that he was not disabled under the Medical Vocational Guidelines, 20 C.F.R., pt. 404, subpt. P, app. 2, Rule 202.18. See App. II at 15-16. The district court reversed that decision, because the ALJ had not properly addressed Dr. Maldonado’s opinions and had failed to mention those of a Dr. Hood, whom Mr. Garcia had seen in 1994 shortly after re-injuring his back. On remand, the second ALJ found Mr. Garcia capable of only a limited range of sedentary work, but concluded he was not disabled based on expert vocational testimony identifying three jobs within the limitations specified. This second decision is the subject of our review here.
Mr. Garcia raises two broad issues: (1) the ALJ’s handling of the opinions of Dr. Maldonado and Dr. Hood (who again was not mentioned) on remand was still inadequate and led to undue weight being given the contrary opinion of agency consultant Dr. Schenck; and (2) the ALJ failed to obtain an explanation from the vocational expert (VE) regarding a discrepancy between her testimony and the job descriptions in the Dictionary of Occupational Titles (DOT), as required by Haddock v. Apfel, 196 F.3d 1084, 1087 (10th Cir.1999) and Social Security Ruling (SSR) 00-4p, 2000 WL 1898704. We hold that the ALJ’s error in the first respect, particularly with respect to Dr. Maldonado, requires reversal and remand for proper consideration of the medical opinion evidence. In light of that holding, we need not decide if the second issue requires reversal, although we do explain why, if a similar VE-DOT discrepancy arises on remand, it should be addressed as prescribed in the authorities noted.
MEDICAL OPINION EVIDENCE
Dr. Maldonado recognized significant limitations, particularly an inability to sit, stand, and/or walk for more than a total of two to four hours in a work day, which were not included in the hypothetical posed to the VE. Indeed, consistent with SSR 96-8p (clarifying that functional capacity assessment looks to “ability to do sustained work-related ... activities in a work setting on a regular and continuing basis,” i.e., “8 hours a day for 5 days a week, or an equivalent work schedule,” 1996 WL 374184, at *1 (emphasis added)), such limitations would dictate a finding of disability, as the Commissioner has conceded in other cases. Bladow v. Apfel, 205 F.3d 356, 359 (8th Cir.2000); Kelley v. Apfel, 185 F.3d 1211, 1214 (11th Cir.1999); see also Rodriguez v. Bowen, 876 F.2d 759, 763 (9th Cir.1989) (pre-SSR 96-8p case directing award of benefits “[b]ecause capability to work only a few hours per day does not constitute the ability to engage in substantial gainful activity”). Thus, it was critical for the ALJ to properly assess these findings in light of Dr. Maldonado’s treatment relationship with Mr. Garcia, as prescribed by 20 C.F.R. § 404.1527(d).
Our review of the matter is hampered by two basic deficiencies in the ALJ’s decision. First, the ALJ never addressed Dr. Maldonado’s treatment relationship with Mr. Garcia, in particular whether he qualified as a “treating source” ordinarily entitled to special deference under § 404.1527(d). The Commissioner now insists he was not a treating source. Given the general prohibition on post hoc justification of agency decisions, see Allen v. Barnhart, 357 F.3d 1140, 1145 (10th Cir.2004); Knipe v. Heckler, 755 F.2d 141, 149 (10th Cir.1985), we would not uphold the agency’s decision on this basis unless we could confidently say that no reasonable ALJ considering our record and following the proper analysis could have resolved the matter in any other way, Allen, 357 F.3d at 1145. Such a determination is not warranted by our record, which shows that Dr. Maldonado saw Mr. Garcia several times (beginning eight months before Mr. Garcia applied for benefits), performed repeated physical examinations, ordered and reviewed x-rays and an MRI, and, on the basis of these efforts, arrived at a recommendation for treatment involving significant surgical intervention, the risks and benefits of which he fully explained to Mr. Garcia. This case is thus unlike Doyal v. Barnhart, 331 F.3d 758 (10th Cir.2003), on which the Commissioner relies, where a physician was held not to be a treating source because the record showed only that he had seen the claimant two times some seven years apart.[1] See id. at 763-64. Under the circumstances, we assume for purposes of this appeal that Dr. Maldonado is a treating source and review the ALJ’s analysis of his opinions accordingly. We do not, however, purport to usurp the ALJ’s role as first-line decisionmaker, see Allen, 357 F.3d at 1144, and, thus, we do not foreclose further consideration of Dr. Maldonado’s treating relationship with Mr. Garcia on remand should the Commissioner wish to pursue that.
The second impediment to our review is the ALJ’s failure to discuss the two primary opposing medical sources, Dr. Maldonado and Dr. Schenck, specifically in relation to each other and the scheme set out in § 404.1527(d) for the evaluation of medical source opinions. As a result, we are left with gaps and unmade connections in the administrative analysis under review. In any event, filling these as best we can by inference from the ALJ’s stated analysis still leaves us with the firm conviction that the decision denying benefits cannot stand on the rationale given.
This court has detailed the principles governing the assessment of medical opinions on numerous occasions. See, e.g., Langley v. Barnhart, 373 F.3d 1116, 1119 (10th Cir.2004); Hamlin v. Barnhart, 365 F.3d 1208, 1215 (10th Cir.2004). For our purposes, it will suffice to note that (1) “[t]he ALJ is required to give controlling weight to the opinion of a treating physician as long as the opinion is supported by medically acceptable clinical and laboratory techniques and is not inconsistent with other substantial evidence in the record,” and (2) “[w]hen an ALJ rejects a treating physician’s opinion, he must articulate specific, legitimate reasons for his decision,” Hamlin, 365 F.3d at 1215 (quotation omitted). Here, the ALJ “g[a]ve Dr. Maldonado’s conclusions no significant weight,” App. II at 239, for three reasons that do not stand up to scrutiny in light of the record and controlling law.
First, the ALJ concluded that “Dr. Maldonado relied excessively on the claimant’s subjective complaints,” id., citing one page from the notes attached to Dr. Maldonado’s October 2001 functional assessment, where he said: “I had a long talk with the patient. He is miserable and his life is miserable. He says he is not happy. I told him I could see that in his face, as well as in the examination,” id. at 211. The same notes, however, and other treatment records the ALJ failed to mention, show that Dr. Maldonado also had multiple physical examinations, as well as xray/MRI confirmation of recurrent disc herniation, to rely on for his assessment. See id. at 181-83, 189-91, 210-11. This court has made it clear that when an ALJ rejects a medical opinion under such circumstances, based on his speculation that the doctor was unduly swayed by a patient’s subjective complaints, the ALJ deviates from correct legal standards and his decision is not supported by substantial evidence. Langley, 373 F.3d at 1121.
Second, the ALJ stated that Dr. Maldonado’s functional assessment was “not consistent with Dr. Maldonado’s own objective findings.” App. II at 239. Some of the findings the ALJ cited to support this statement are simply irrelevant, such as the normal sensory function in Mr. Garcia’s right foot and the “okay” range of motion in his knees. Id. The finding that Mr. Garcia’s “hip motion was okay,” id., might be relevant, but this one positive indication is drained of any significance in light of plainly pertinent findings that the ALJ ignored, such as “significant tenderness to percussion over the lower lumbar spine. There is mild to moderate lumbar paraspinous muscle spasm. Forward bending is very minimal. Extension is almost non-existent. The lateral bend is not able to be done”—all clearly functionally related to Mr. Garcia’s objectively demonstrated underlying disc herniation.[2] Id. at 210. Finally, the finding that Mr. Garcia was “temporarily totally disabled from even the most sedentary activity because of inability to sit still for more than a few minutes,” id. at 239, does not undercut Dr. Maldonado’s RFC assessment that Mr. Garcia could sit for no more than one to two hours in a workday; indeed, it reflects an even more extreme, albeit more specific, postural limitation. To the extent the ALJ was focusing on the temporal qualification in the cited finding (suggested by the ALJ’s alteration of the original “temporarily totally disabled ...,” id. at 212, to “only temporarily totally disabled ...,” id. at 239 (emphasis added)), there is nothing inconsistent in so qualifying a limit on the ability to sit continuously without changing position and not so qualifying a limit on the total amount of time one can sit (in any manner) in an eight hour day. In sum, the ALJ’s criticism that Dr. Maldonado’s objective findings contradicted his RFC assessment is unfounded.[3]
Third, the ALJ held that Dr. Maldonado’s assessment was “not consistent with the record as a whole.” Id. The only supporting reference given by the ALJ was a general allusion to “evidence of symptom magnification.” Id. Earlier in his decision, the ALJ had discussed Dr. Schenck’s consultative report stating that Mr. Garcia “exhibited marked pain behavior ... much more evident when observed than when unobserved,” noting that his “antalgic gait [i.e., limp] walking across the parking lot ... got much worse when he got into the clinic,” and contrasting his “exaggerated pain behavior ... during the entire exam” with the fact that “when distracted, he was able to sit and fill out his paperwork, do an eye exam, and sit in the hearing booth.” Id. at 238. Mr. Garcia explained, however, that the latter seated activities were of short duration (five to fifteen minutes) and were alleviated by intervening periods of standing and walking. Id. at 115. He also raised issues about the accuracy, completeness, impartiality, and reliability of the consultative report—on which we need not and do not comment. The decisive point here concerns, rather, the ALJ’s inversion of the prescribed weighing of medical opinions, which gives precedence to a treating opinion over a consulting opinion unless, given the substance and substantiation of the latter, it demonstrably outweighs the former. See Hamlin, 365 F.3d at 1219 (explaining that “consultative physician’s report should be ‘examined to see if it “outweighs” the treating physician’s report, not the other way around).’ ” (quoting Reyes v. Bowen, 845 F.2d 242, 245 (10th Cir.1988)). Contrary to that rule, the ALJ rejected Dr. Maldonado’s treating opinion because a consulting physician simply disagreed with him over the interpretation and assessment of the symptoms of Mr. Garcia’s impairment.
There may be cases in which a consultant’s suspicion of exaggeration, if well-substantiated and corroborated, could displace a treating physician’s opinion that had been based predominantly on subjective symptoms, but we do not have such circumstances here. Dr. Maldonado’s opinion was based on a convergence of objective tests and physical examinations and was tied to a recommendation for significant surgical intervention; it was not a mere recitation of subjective complaints related by the claimant. Dr. Schenck’s suspicion of exaggeration, on the other hand, was an inference from inconclusive observations[4] made during a single visit (by a physician not responsible for successfully treating the condition in question) and was not voiced by any other examining physician in our record (and there were at least five doctors who saw Mr. Garcia at least once and noted his symptoms of pain without suggesting any exaggeration).[5] If we held such a suspicion sufficient to defeat a properly substantiated treating opinion, the treating-physician rule would become a rather hollow principle.
Dr. Schenck also offered her own functional assessment of Mr. Garcia, in which she discounted any limitations on his ability to sit, stand, or walk, and opined that he would be able to perform light work. Given the direct opposition between these findings and those of Dr. Maldonado, and the absence of any other comparable medical source evidence,[6] what we have just said about the relative priority of treating and consulting opinions is implicated in this regard as well.
Although the ALJ summarily concluded that Dr. Maldonado’s findings were contrary to the record, and the source of contradiction in the record was Dr. Schenck’s report, the ALJ never specifically held, much less explained why, Dr. Maldonado’s opinions were outweighed by those of the consultant. Indeed, the ALJ did not adopt Dr. Schenck’s findings either—she had opined that Mr. Garcia had no stand/walk restrictions and could therefore perform light work, while the ALJ held he could perform only a limited range of sedentary work. If an ALJ cannot summarily accept and elevate a consulting opinion over a treating opinion, a fortiori the ALJ here could not reject Dr. Maldonado’s opinion simply based on its contradiction by a consulting opinion the ALJ also did not accept.[7] Given the ALJ’s failure to address these medical opinions with reference to the controlling regulation, and his resultant failure to adequately justify his rejection of the treating physician’s opinion (under which the claimant would clearly be disabled), the decision denying benefits must be reversed.
VE-DOT DISCREPANCY
Mr. Garcia also argues that the ALJ improperly relied on VE testimony that conflicted with the DOT. Specifically, in response to the ALJ’s hypothetical questioning that indicated Mr. Garcia could perform only “routine, repetitive” and “simple” work, App. II at 284, 285; see also id. at 240-41, the VE identified jobs (charge account clerk and surveillance system monitor) for which, as explained below, the DOT specifies a higher reasoning level (level 3).
This court adheres to the rule, adopted in Haddock, 196 F.3d at 1087, reaffirmed in Hackett v. Barnhart, 395 F.3d 1168, 1175 (10th Cir.2005), and codified in SSR 00-4p, that an ALJ must elicit a reasonable explanation for any material conflicts between a VE’s testimony and occupational information in the DOT. More specifically relevant here, in Hackett we found a facial conflict between a claimant’s “inability to perform more than simple and repetitive tasks” and the “level-three reasoning” required in the DOT for jobs identified by the VE, and, consequently, reversed and remanded for an explanation, if any, that would resolve the conflict so as to permit reliance on the VE’s testimony. Hackett, 395 F.3d at 1176. Mr. Garcia argues Hackett requires reversal and remand here.
In response, the Commissioner again resorts to post hoc justifications to support the ALJ’s decision. As noted earlier, such justifications cannot succeed unless they are so conclusive that no reasonable ALJ could have resolved the disputed matter in any other way. Allen, 357 F.3d at 1145. We doubt any of the Commissioner’s efforts are so compelling; indeed, some are inadequate on their face. For example, the Commissioner insists the ALJ was referring to physically simple and routine tasks and, thus, not suggesting any mental restriction. But, as Hackett itself reflects, “simple and routine” is a limitation associated with the mental, not physical, aspects of work (indeed, it is a cognitively-based limitation typically restricting the claimant to physical labor, not a physical restriction on such labor), and the context does not suggest the ALJ used the phrase in another sense. Rather, the context indicates the ALJ was addressing Mr. Garcia’s lack of learned skills and limited education (and in that regard we note Mr. Garcia had been held back a grade in elementary school and had failed the written test for military service). See App. II at 283. The Commissioner also argues that the evidence does not actually support a limitation to simple and routine work. This line of argument, however, is foreclosed by the fact “that the ALJ himself noted the limitation.” Saiz, 392 F.3d at 399.
We need not respond point-for-point to all of the Commissioner’s efforts to justify the ALJ’s decision insofar as it implicates the VE-DOT conflict, nor must we resolve whether the one occupation identified by the VE unaffected by the conflict (jewelry preparer, involving 160 state jobs) reflects work in sufficient numbers to conclusively establish the requisite numerical significance, see Allen, 357 F.3d at 1144-45, as we are remanding the case on account of other error in any event. But we do emphasize the analytical significance and consequences of the ALJ’s finding regarding the cognitive demands Mr. Garcia is able to meet and reaffirm that, if the ALJ seeks to properly account for these on remand, he must heed the guidance provided by this court’s decisions in Hackett and Haddock.
The judgment of the district court is REVERSED and the cause is REMANDED with instructions to remand, in turn, to the Commissioner for further proceedings consistent with this order and judgment.
* After examining the briefs and appellate record, this panel has determined unanimously to grant the parties’ request for a decision on the briefs without oral argument. See Fed. R.App. P. 34(f); 10th Cir. R. 34.1(G). The case is therefore ordered submitted without oral argument. This order and judgment is not binding precedent, except under the doctrines of law of the case, res judicata, and collateral estoppel. The court generally disfavors the citation of orders and judgments; nevertheless, an order and judgment may be cited under the terms and conditions of 10th Cir. R. 36.3.
[1] The Commissioner mistakenly cites Frey v. Bowen, 816 F.2d 508, 513-15 (10th Cir.1987), as another example where treating-source status was rejected. Actually, Frey reversed a denial of benefits because the treating physician’s opinions should have been accorded controlling weight.
[2] It bears repeating that Dr. Maldonado’s findings of pain and impairment were severe enough to prompt him to recommend a second invasive spinal surgery (a two level lumbar diskectomy with interbody fusion) even though he could not offer any guarantees that it would provide relief. See App. II at 181.
[3] The Commissioner attempts to buttress the ALJ’s analysis with some more specific points. The Commissioner notes that Mr. Garcia told Dr. Maldonado that “sitting for more than two hours aggravates the pain in almost any kind of chair,” App. II at 187, 190, but we fail to see how that statement fatally undermines the doctor’s opinion that Mr. Garcia should sit no more than two hours in a workday. The Commissioner also notes that Dr. Maldonado’s more detailed functional assessment was prepared several months after Mr. Garcia’s insured status expired, but given its consistency with Dr. Maldonado’s initial opinion of total disability and his treatment notes from over a year before the expiration of Mr. Garcia’s insured status, the timing issue raised post hoc here is more form than substance.
[4] For example, Dr. Schenck did not say that Mr. Garcia fabricated a limp, but only that she perceived a worsening of the limp admittedly evident as he crossed the parking lot. App. II at 178. She commented on his “moaning and groaning” during his physical exam, id., but she had no comparable experience with similar exertional (flexion/extension/abduction/rotation) activity by Mr. Garcia from which to gauge the presence or extent of the exaggeration she inferred.
[5] The ALJ also noted that an athletic trainer indicated a “tendency toward symptom magnification,” App. II at 238, based on Mr. Garcia’s responses to a pain questionnaire, see id. at 168. An athletic trainer, however, is not a “medical source” to be weighed under the scheme for evaluating medical opinion evidence, see § 404.1527(a)(2) (referring to statements of “physicians and psychologists or other acceptable medical sources”); § 404.1513(a)(1)-(5) (specifying “acceptable medical sources”). In any event, considered as an informed lay observer, the trainer still noted significant sitting limitations (one hour at a time, no more than four hours a day) that are much more consistent with Dr. Maldonado’s findings than those of Dr. Schenck. See App. II at 119.
[6] Other medical records, including those of Dr. Hood, relate to Mr. Garcia’s condition leading up to and following his initial surgery in 1993-94, and do not contain the detailed functional findings provided in the reports from 2000-01. We note, however, that Dr. Hood’s records, which the ALJ again failed to address as directed on remand, did state that Mr. Garcia should avoid “long periods of standing or sitting or any like activity,” App. II at 132—a restriction much more consistent with Dr. Maldonado’s findings than those of Dr. Schenck. It was error for the ALJ to ignore this supporting evidence from Dr. Hood, see Sullivan v. Hudson, 490 U.S. 877, 886, 109 S.Ct. 2248, 104 L.Ed.2d 941 (1989) (“Deviation from the court’s remand order in the subsequent administrative proceedings is itself legal error, subject to reversal on further judicial review.”), though the Commissioner argues the omission was harmless. We need not resolve this collateral dispute, as the ALJ’s deficient assessment of Dr. Maldonado’s findings necessitates reversal in any event.
[7] Similarly, lifting restrictions of fifty and thirty-five pounds noted in 1994 by Dr. Garland and Dr. Reiter, respectively (cited by the Commissioner as a broad indication of greater functional capacity than that found by Dr. Maldonado) are not relevant given that the ALJ specifically found Mr. Garcia’s “maximum lifting ability is 10 pounds,” App. II at 240. See Saiz v. Barnhart, 392 F.3d 397, 399 (10th Cir.2004) (holding Commissioner’s argument “undone by the obvious point that the ALJ himself noted [a contrary] limitation”).