Case: 21-1747 Document: 48 Page: 1 Filed: 04/21/2022
United States Court of Appeals
for the Federal Circuit
______________________
MID CONTINENT STEEL & WIRE, INC.,
Plaintiff-Appellee
v.
UNITED STATES,
Defendant-Appellee
PT ENTERPRISE INC., PRO-TEAM COIL NAIL
ENTERPRISE INC., UNICATCH INDUSTRIAL CO.,
LTD., WTA INTERNATIONAL CO., LTD., ZON MON
CO., LTD., HOR LIANG INDUSTRIAL
CORPORATION, PRESIDENT INDUSTRIAL INC.,
LIANG CHYUAN INDUSTRIAL CO., LTD.,
Defendants-Appellants
______________________
2021-1747
______________________
Appeal from the United States Court of International
Trade in Nos. 1:15-cv-00213-CRK, 1:15-cv-00220-CRK,
Judge Claire R. Kelly.
______________________
Decided: April 21, 2022
______________________
ADAM H. GORDON, The Bristol Group PLLC, Washing-
ton, DC, argued for plaintiff-appellee. Also represented by
PING GONG.
MIKKI COTTET, Appellate Staff, Civil Division, United
States Department of Justice, Washington, DC, argued for
defendant-appellee. Also represented by BRIAN M.
BOYNTON, JEANNE DAVIDSON, PATRICIA M. MCCARTHY;
VANIA WANG, Office of the Chief Counsel for Trade Enforce-
ment and Compliance, United States Department of Com-
merce, Washington, DC.
NED H. MARSHAK, Grunfeld, Desiderio, Lebowitz, Sil-
verman & Klestadt LLP, New York, NY, argued for defend-
ants-appellants. Also represented by MAX F. SCHUTZMAN;
DHARMENDRA NARAIN CHOUDHARY, ANDREW THOMAS
SCHUTZ, Washington, DC.
______________________
Before NEWMAN, LOURIE, and TARANTO, Circuit Judges.
TARANTO, Circuit Judge.
In 2015, the United States Department of Commerce
issued an antidumping duty order covering steel nails from
Taiwan. In 2019, we ordered a remand to Commerce for
further explanation of one aspect of the methodology it had
adopted to determine whether there was “a pattern of ex-
port prices . . . that differ significantly among purchasers,
regions, or periods of time” under 19 U.S.C.
§ 1677f-1(d)(1)(B)(i). Mid Continent Steel & Wire, Inc. v.
United States, 940 F.3d 662, 675 (Fed. Cir. 2019) (CAFC
2019 Op.). The present appeal involves Commerce’s rede-
termination on remand from our 2019 decision.
In this proceeding, as in others, Commerce, in order to
assess the significance of the difference between the prices
of two groups of sales, stated that it was using a widely
known statistical measure called the Cohen’s d coefficient.
As applied to groups of sales, that coefficient is a ratio
whose numerator is the difference between means of the
prices of the two groups and whose denominator is a figure,
reflecting the general dispersion of the pricing data, that
serves as a benchmark against which to judge the signifi-
cance of the difference stated in the numerator. Commerce
used, for that benchmark, a figure based on the standard
deviations of the prices in the two groups; it squared the
standard deviations of the prices of each group (yielding
the variances), added them together and divided by two,
then took the square root. The middle step—adding to-
gether and dividing by two—is “simple averaging,” which
gives equal weight in the average to each group, even if
they are very different in size (e.g., if the first group reflects
sales of 5 units and the second group reflects sales of 95
units). A “weighted average” approach, in contrast, would,
at the middle step, assign weights proportionate to each
group’s share of the total (e.g., multiplying the first group’s
variance by 5 and the second by 95, then dividing the sum
by 100, thus giving 5/100 weight to the first group and
95/100 weight to the second group). In 2019, we held that
Commerce did not adequately explain why it was reasona-
ble to use simple averaging. Id. at 673–75. On remand
from our decision, Commerce again chose to use simple av-
eraging for its version of a Cohen’s d denominator.
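The arithmetic of the two averaging approaches can be sketched in a few lines of Python. This is only an illustration: the function names and the standard deviations (3.0 and 1.0) are hypothetical, and only the 5-unit and 95-unit group sizes come from the example in the text.

```python
import math

def simple_average_sd(sd_a, sd_b):
    """Square both standard deviations, average them with equal weight, take the root."""
    return math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)

def weighted_average_sd(sd_a, sd_b, size_a, size_b):
    """Same steps, but each variance is weighted by its group's share of the total."""
    total = size_a + size_b
    return math.sqrt((size_a * sd_a ** 2 + size_b * sd_b ** 2) / total)

# Hypothetical standard deviations; group sizes 5 and 95 follow the example in the text.
SD_A, SD_B = 3.0, 1.0
simple = simple_average_sd(SD_A, SD_B)              # sqrt((9 + 1) / 2)
weighted = weighted_average_sd(SD_A, SD_B, 5, 95)   # sqrt((5*9 + 95*1) / 100)
print(f"simple: {simple:.3f}, weighted: {weighted:.3f}")
```

With these assumed inputs the simple-average figure is about 2.236 and the weighted figure about 1.183, so the choice of averaging changes the benchmark materially when the groups differ in size.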
The Court of International Trade (Trade Court) upheld
Commerce’s decision. Mid Continent Steel & Wire, Inc. v.
United States, 495 F. Supp. 3d 1298, 1308 (Ct. Int’l Trade
2021) (CIT 2021 Op.). The Taiwanese producers and ex-
porters of the steel nails at issue appeal. We conclude that
the relevant statistical literature cited by Commerce uni-
formly uses weighted averaging in the Cohen’s d denomi-
nator calculation and that Commerce has not offered a
reasonable justification for its departure from the cited lit-
erature. We therefore vacate the Trade Court’s decision
and require a remand to Commerce for further considera-
tion of its methodology for applying § 1677f-1(d)(1)(B)(i)
here.
I
A
In an antidumping duty investigation, when Com-
merce seeks to determine whether the foreign-originated
merchandise of a foreign producer or exporter is being sold
in the United States at less than fair value, see 19 U.S.C.
§ 1673, it must compare the home-country “normal value”
(often the sale price in the home country) with the actual
or constructed “export price” reflecting the price at which
the merchandise is sold into the United States. CAFC 2019
Op., 940 F.3d at 665. That comparison usually calls for use
of an “average-to-average” method. When the normal
value is based on home-country sales prices of a foreign
producer or exporter who is a respondent in the proceeding,
the average-to-average method compares “the weighted av-
erage of the respondent’s sales prices in its home country
during the investigation period to the weighted average of
the respondent’s sales prices in the United States during
the same period.” Stupp Corp. v. United States, 5 F.4th
1341, 1345 (Fed. Cir. 2021); CAFC 2019 Op., 940 F.3d at
666; see also 19 U.S.C. § 1677f-1(d)(1); 19 C.F.R.
§ 351.414(b)(1), (c)(1). But that average-to-average com-
parison is not the only authorized method: two other meth-
ods are authorized, of which one is at issue here.
The statute permits comparisons on a “transaction-to-
transaction” basis in unusual circumstances, 19 U.S.C.
§ 1677f-1(d)(1)(A)(ii); 19 C.F.R. § 351.414(c)(2), but that
method is not at issue here. What is at issue is a third
method authorized by Congress under certain circum-
stances—an “average-to-transaction” method. This
method calls for the “weighted average of normal values”
in the home country to be compared to the “export values
(or constructed export values) of individual transactions”
in the United States. 19 U.S.C. § 1677f-1(d)(1)(B); 19
C.F.R. § 351.414(b)(3). The object is to uncover “targeted”
dumping, a label for an exporter’s unduly low pricing in
portions (less than all) of its overall U.S. sales, which would
be “masked” (offset) by the exporter’s other, higher-priced
sales if only overall averages are considered. See Stupp, 5
F.4th at 1345. Congress directed that Commerce may use
the “average-to-transaction” method only if
(i) there is a pattern of export prices (or constructed
export prices) for comparable merchandise that dif-
fer significantly among purchasers, regions, or pe-
riods of time, and
(ii) the administering authority explains why such
differences cannot be taken into account using a
method described in paragraph (1)(A)(i) [average-
to-average] or (ii) [transaction-to-transaction].
19 U.S.C. § 1677f-1(d)(1)(B).
The statute does not specify how Commerce should de-
termine whether those conditions are met. Stupp, 5 F.4th
at 1346. Starting in 2014, Commerce has used a two-stage
“differential pricing” analysis. See Differential Pricing
Analysis; Request for Comments, 79 Fed. Reg. 26,720,
26,722 (May 9, 2014) (Differential Pricing RFC); see also
Stupp, 5 F.4th at 1346–48. The first stage of that process
corresponds to the inquiry in paragraph (i)—whether
“there is a pattern of export prices . . . that differ signifi-
cantly among purchasers, regions, or periods of time”—and
itself has two parts: the “Cohen’s d test,” followed by the
“ratio test.” Differential Pricing RFC at 26,722–23. The
second (final) stage involves a “meaningful difference” as-
sessment to make the determination required in paragraph
(ii). Id. The present case involves the Cohen’s d test—the
first part of the first stage of Commerce’s overall differen-
tial pricing analysis.
Under the method as described in 2014, Commerce,
considering all sales in the United States by an exporter, is
to select a specific purchaser, region, or period of time, form
a “test group” consisting of all the exporter’s sales meeting
that criterion, and put all the exporter’s remaining U.S.
sales in the “comparison group.” Id. at 26,722. That is,
Commerce is to compare sales to one purchaser to sales to
all others, sales in one region to sales in all others, and
sales in one period to sales in all others—in fact, to do so
for each purchaser, each region, and each period. For each
such test group, Commerce is to compute the Cohen’s d co-
efficient by comparing the average price of sales within the
test group to the average price of sales within the corresponding
comparison group. Id.[1] How Commerce did that
comparison to calculate the Cohen’s d in this matter—
which appears to be representative of its general ap-
proach—is the subject of the dispute before us.
Commerce explained that it started with the following
formula from Cohen’s textbook to calculate d:
\[ d = \frac{m_A - m_B}{\sigma} \]
J.A. 1079 (quoting, with font changes, Jacob Cohen, Statis-
tical Power Analysis for the Behavioral Sciences 20 (2d ed.
1988) (Cohen)).[2] In that formula, $m_A$ is the mean of the test
group (here, the weighted average of the prices of sales in
the group), $m_B$ is the mean of the comparison group (here,
the weighted average of the prices of sales in that group),
and σ is “the standard deviation of either population [the
test group or the comparison group] (since they are assumed
equal).” Cohen at 20. Where, as here, the groups
consist of sales at known prices, $m_A - m_B$ is in price units
(e.g., dollars per kilogram), and so is σ, so the ratio d is a
pure (unitless) number.
[1] “The Department calculates the Cohen’s d coefficient
with respect to comparable merchandise if the test
and comparison groups of data each have at least two
observations, and if the sales quantity for the comparison
group accounts for at least five percent of the total sales
quantity of the comparable merchandise.” Id.
[2] It appears that Commerce may have used the “two-tailed”
version of the test to account for differences in either
direction ($m_A > m_B$ or $m_A < m_B$), taking the absolute value
of the coefficient, which is not shown in the formula in the
text. See Stupp, 5 F.4th at 1346; Cohen at 20. That choice
is not in dispute here, and the issue before us is unaffected
by the presence or absence of absolute value signs in the
formula.
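The grouping step and the basic ratio can be sketched as follows. This is a minimal illustration, not Commerce's program: the sales records, the field names (`purchaser`, `price`, `qty`), and the helper names are all hypothetical, and σ is passed in by the caller because how it should be computed is the question in dispute.

```python
def weighted_mean(sales):
    """Quantity-weighted average price of (price, quantity) pairs."""
    total_qty = sum(qty for _, qty in sales)
    return sum(price * qty for price, qty in sales) / total_qty

def split_groups(sales, key):
    """For each distinct value of `key`, yield (value, test group, comparison group).

    The test group holds the sales matching that value; the comparison
    group holds all remaining sales.
    """
    for value in sorted({sale[key] for sale in sales}):
        test = [(s["price"], s["qty"]) for s in sales if s[key] == value]
        comparison = [(s["price"], s["qty"]) for s in sales if s[key] != value]
        yield value, test, comparison

def cohens_d(test, comparison, sigma):
    """d = (m_A - m_B) / sigma, with sigma supplied by the caller."""
    return (weighted_mean(test) - weighted_mean(comparison)) / sigma

# Hypothetical sales records, invented purely for illustration.
SALES = [
    {"purchaser": "P1", "price": 10.0, "qty": 2},
    {"purchaser": "P1", "price": 12.0, "qty": 2},
    {"purchaser": "P2", "price": 14.0, "qty": 1},
    {"purchaser": "P2", "price": 16.0, "qty": 3},
]

results = {
    value: cohens_d(test, comparison, sigma=2.0)  # sigma=2.0 is an assumed figure
    for value, test, comparison in split_groups(SALES, "purchaser")
}
```

The same `split_groups` call could be repeated with a region or period field. The sketch omits the minimum-observation and five-percent screens described in the quoted Commerce notice.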
Commerce then changed the denominator to a figure,
also drawn from Cohen, designed to be applied when the
two groups, though of the same size, have different stand-
ard deviations. Specifically, for this new denominator σ′,
Commerce used the following formula:
\[ \sigma' = \sqrt{\frac{\sigma_A^2 + \sigma_B^2}{2}} \]
J.A. 1080 (quoting Cohen at 44). In this formula, $\sigma_A^2$ and $\sigma_B^2$
are the squared standard deviations (variances) of the
prices in the test and comparison groups, respectively. The
simple average is used under the square-root sign (with no
weighting by the sizes of groups A and B), reflecting the
fact that, in the situation addressed in the section of Cohen
containing this formula, groups A and B are of the same
size: “$n_A = n_B$.” Cohen at 43. This formula involves “pooling”
the data from the two groups, and the name “pooled
standard deviation” is used for both the above formula and
also the variation where a weighted average is used instead
of a simple average. E.g., CIT 2021 Op., 495 F. Supp. 3d at
1300; see also CAFC 2019 Op., 940 F.3d at 673 (referring to
the expression as the “pooled variance” because $\sigma_A^2$ and $\sigma_B^2$
are the variances of the prices in the two groups).
The disputed feature of the formula is that it does not
use the size of the groups to weight the two figures
(squared standard deviations, i.e., variances) being aver-
aged. It is undisputed that, when the groups are of the
same size, simple averaging equals weighted averaging.
But Commerce used the formula without group-size
weighting even when, unlike in the situation described in
the Cohen section from which the formula is borrowed, the
groups are of different sizes. In that circumstance, it is un-
disputed, simple averaging does not equal weighted aver-
aging. Commerce noted: “To be sure, the use of a simple
versus weight[ed] average yields very different results.”
J.A. 667.
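The undisputed point that the two averages coincide for equal-size groups and diverge otherwise can be checked directly from price lists. In this sketch the prices are hypothetical, the weights are assumed to be the number of sales in each group, and each group's variance is assumed to be the population variance of its unweighted prices.

```python
import math
from statistics import pvariance

def sigma_prime_simple(prices_a, prices_b):
    """Square root of the equal-weight average of the two group variances."""
    return math.sqrt((pvariance(prices_a) + pvariance(prices_b)) / 2)

def sigma_prime_weighted(prices_a, prices_b):
    """Square root of the average of the variances weighted by group size."""
    n_a, n_b = len(prices_a), len(prices_b)
    pooled = (n_a * pvariance(prices_a) + n_b * pvariance(prices_b)) / (n_a + n_b)
    return math.sqrt(pooled)

# Hypothetical prices. Group A has variance 4.0; both versions of group B have variance 0.25.
GROUP_A = [10.0, 14.0]
GROUP_B_SAME_SIZE = [11.0, 12.0]
GROUP_B_LARGER = [11.0, 12.0, 11.0, 12.0, 11.0, 12.0]

equal_simple = sigma_prime_simple(GROUP_A, GROUP_B_SAME_SIZE)
equal_weighted = sigma_prime_weighted(GROUP_A, GROUP_B_SAME_SIZE)
unequal_simple = sigma_prime_simple(GROUP_A, GROUP_B_LARGER)
unequal_weighted = sigma_prime_weighted(GROUP_A, GROUP_B_LARGER)
```

With equal sizes both functions return the same value. Once the second group is three times larger, the simple-average figure is unchanged while the weighted figure falls toward the larger group's smaller dispersion.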
The steps following the calculation of Cohen’s d in
Commerce’s analysis are not in dispute. Nor, we note, has
Commerce relied on those steps to help justify the simple-
averaging choice it has made for the denominator at the
first step. We briefly summarize the remaining steps.
Upon calculating d for a test group of sales, Commerce
described the test group as having “passed” the Cohen’s d
test if d for that group exceeded 0.8, i.e., if the difference in
means was at least 80% of the pooled standard deviation.
See Mid Continent Steel & Wire, Inc. v. United States, 219
F. Supp. 3d 1326, 1338–39 (Ct. Int’l Trade 2017) (CIT 2017
Op.).[3] Commerce then computed, for the sales of the subject
merchandise of a given respondent, the ratio of (a) the
total value of those sales which were part of any group that
passed the Cohen’s d test (whether by a purchaser, region,
or period comparison) to (b) the total value of all the re-
spondent’s sales being studied by Commerce. Id. at 1343
n.24. Because that “ratio test” produced a ratio between 33
and 66 percent in this matter, Commerce tentatively de-
cided to use average-to-transaction comparisons in part.
See CAFC 2019 Op., 940 F.3d at 671–72.
[3] A “pass” thus indicates that the test group’s prices
are sufficiently different from the comparison group’s
prices to contribute to a finding of targeted dumping. In
this way, the label means the opposite of the word’s usual
connotation of success in avoiding trouble.
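The pass threshold and the ratio test described above can be sketched as below. Several details are assumptions rather than statements from this opinion: the use of the absolute value and of "equal to or exceeds" at the 0.8 threshold, the treatment of ratios at or above 66 percent and at or below 33 percent, the handling of the exact boundaries, and all function names and sales figures.

```python
def passes_cohens_d(d, threshold=0.8):
    """A test group 'passes' when |d| meets the threshold (assumed inclusive)."""
    return abs(d) >= threshold

def ratio_test(sales):
    """Share of total sales value that falls in any passing test group.

    `sales` is a list of (value, in_passing_group) pairs.
    """
    total = sum(value for value, _ in sales)
    passing = sum(value for value, passed in sales if passed)
    return passing / total

def comparison_method(ratio):
    """Map the ratio to a comparison method.

    Only the middle branch (between 33 and 66 percent, average-to-transaction
    in part) is stated in the text; the outer branches are assumptions.
    """
    if ratio >= 0.66:
        return "average-to-transaction"
    if ratio > 0.33:
        return "average-to-transaction in part"
    return "average-to-average"

# Hypothetical sales values and pass flags.
SALES = [(100.0, True), (300.0, False), (100.0, True)]
share = ratio_test(SALES)          # 200 / 500
method = comparison_method(share)
```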
To make its final determination whether to use an av-
erage-to-transaction method, Commerce asked, pursuant
to § 1677f-1(d)(1)(B)(ii), whether the pricing differences
found “cannot be taken into account using” average-to-av-
erage or transaction-to-transaction comparisons. For that
determination, Commerce asked whether using a compari-
son other than average-to-transaction would make a
“meaningful difference” in the result. Commerce found
that there would be such a difference and so adopted the
average-to-transaction method. See CAFC 2019 Op., 940
F.3d at 672.
B
1
In response to a petition by Mid Continent Steel &
Wire, Inc., Commerce initiated an antidumping duty inves-
tigation of certain steel nails from Taiwan and certain
other countries. See CAFC 2019 Op., 940 F.3d at 665. The
investigation of nails from Taiwan—for the period April 1,
2013, to March 31, 2014—was broken out separately, and
Commerce selected PT Enterprise Inc. and its affiliated
producer Pro-Team Coil Nail Enterprise Inc. as mandatory
respondents. In May 2015, Commerce issued an affirma-
tive final determination of less-than-fair-value sales in the
United States and determined that the appropriate
weighted-average dumping margin for those respondents
was 2.24%. Certain Steel Nails from Taiwan: Final Deter-
mination of Sales at Less Than Fair Value, 80 Fed. Reg.
28,959, 28,961 (Dep’t of Commerce May 20, 2015) (Final
Determination). Following the International Trade Com-
mission’s affirmative determination of material injury to a
domestic industry, Commerce issued an antidumping duty order.
In 2017, following an appeal to the Trade Court, Commerce
revised the dumping margin for the respondents to
2.16%. The all-others rate was also set at 2.16%.
Those respondents and other Taiwanese producers and
exporters (collectively, PT) and Mid Continent brought
actions in the Trade Court to challenge Commerce’s deter-
mination. The Trade Court sustained Commerce’s applica-
tion of the Cohen’s d test in determining whether “there is
a pattern of export prices . . . for comparable merchandise
that differ significantly among purchasers, regions, or pe-
riods of time,” 19 U.S.C. § 1677f-1(d)(1)(B)(i), and in partic-
ular approved Commerce’s decision “to use a simple
average to calculate the pooled standard deviation in the
Cohen’s d test of the differential pricing analysis.” CIT
2017 Op., 219 F. Supp. 3d at 1330. In 2019, we mostly af-
firmed the Trade Court’s decision, but we vacated it in part,
holding that Commerce’s explanation of its use of “a simple
average, rather than a weighted average, to calculate the
pooled variance used in the Cohen’s d calculation” was in-
sufficient, requiring a remand to Commerce “for further ex-
planation.” CAFC 2019 Op., 940 F.3d at 673, 675.
Specifically, we noted that (1) “Commerce said that it
was simply using a widely accepted statistical test; yet it
did not acknowledge that the only cited literature source
for the relevant aspect of the test itself calls for the use of
weighted averages”; (2) Commerce’s statement that
weighted averaging produces “skewing” was a “mere con-
clusion” without independent explanation of what the stat-
ute calls for; (3) Commerce’s rebuttal of PT’s argument
against the simple average was unsupported and also was
not itself an affirmative argument for simple averaging;
and (4) Commerce’s “predictability” concern seemed tied to
the manipulability of reporting sales by number of trans-
actions and Commerce did not indicate why the concern
would be present if the average used weighting by quanti-
ties or weight of nails sold (nails seemingly being priced per
kilogram). Id. at 674 (cleaned up). We did not preclude
Commerce from making the same decision on remand if it
supplied adequate reasoning in support. Id. at 675.
2
In December 2019, the Trade Court remanded the mat-
ter to Commerce in accordance with our decision. In early
March 2020, Commerce issued a draft redetermination de-
cision, again opting to use the simple average to calculate
the pooled standard deviation, J.A. 660–76, and attaching
portions of three statistics references: Cohen, J.A. 723–61;
Paul D. Ellis, The Essential Guide to Effect Sizes (2010) (El-
lis), J.A. 678–721; and Robert Coe, It’s the Effect Size Stu-
pid: What Effect Size Is and Why It Is Important, Paper
presented at the Annual Conf. of British Educational Re-
search Ass’n (Sept. 2002) (Coe), J.A. 763–73.
In response, PT submitted comments in mid-March
2020, J.A. 780–1004, arguing that “use of simple averaging
is both mathematically and statistically inaccurate,” J.A.
781. PT pointed to sections of Cohen (at 67), of Coe (at 6),
and of Ellis (at 10, 26, 27), all of which set forth formulas
that clearly use weighted averages when comparing groups
that have both different sizes and different standard deviations
(and hence variances). J.A. 790–98.[4] PT proposed a
modification, under which the variances of the two groups
(test group, comparison group) are weighted by the total
weight, in kilograms, of the goods in each group, so the
denominator would be
\[ \sqrt{\frac{W_a}{W_a + W_b}\,\sigma_a^2 + \frac{W_b}{W_a + W_b}\,\sigma_b^2} \]
J.A. 791–92. In that formula, $W_a$ and $W_b$ are the kilogram
weights of the test-group goods and comparison-group
goods, respectively (and $\sigma_a^2$ and $\sigma_b^2$ again refer to the
variances of the sale prices in the test and comparison groups,
respectively). This formula differs in minor ways from the
specific formulas in Cohen, Coe, and Ellis, which involve
details of weighted averaging appropriate for sampling
when not all population data is known. Commerce did not
object to PT’s formula on the ground that it departed from
those models, but rather on the ground that it used
weighted averages rather than simple averages.
[4] The Coe reference, at 6 (question 7), is the reference
discussed in our 2019 opinion. CAFC 2019 Op., 940 F.3d
at 673–74.
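PT's proposed denominator translates directly into code. The sketch below follows the formula as described in the text; the function name and the sample variances and kilogram totals are hypothetical.

```python
import math

def pt_denominator(var_a, var_b, kg_a, kg_b):
    """Square root of the two variances averaged with kilogram-share weights.

    var_a, var_b: price variances of the test and comparison groups.
    kg_a, kg_b:   total kilograms sold in each group (W_a and W_b).
    """
    total_kg = kg_a + kg_b
    return math.sqrt((kg_a / total_kg) * var_a + (kg_b / total_kg) * var_b)

# Hypothetical inputs: the test group is one quarter of the total weight.
weighted_sd = pt_denominator(var_a=4.0, var_b=1.0, kg_a=1000.0, kg_b=3000.0)

# With equal kilogram totals the weights are both 1/2, matching a simple average.
equal_weight_sd = pt_denominator(var_a=4.0, var_b=1.0, kg_a=2000.0, kg_b=2000.0)
```

Under these assumed figures the kilogram-weighted value is the square root of 1.75, while equal weights give the square root of 2.5, the same as simple averaging.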
In May 2020, Mid Continent submitted comments ar-
guing for the simple-average approach. J.A. 1005–70. It
included in its comments a discussion of a portion of Cohen
to which Commerce, in its draft redetermination, had not
pointed. J.A. 1022–24 (citing Cohen at 360–61). Mid Con-
tinent pointed to a statement in Cohen—discussing an ex-
ample involving a researcher’s creating equal-size samples
of the groups under study, even though some of the groups
are a much smaller share of the overall population than the
others—about treating a group’s characteristic as an “ab-
stract effect quite apart from the relative frequency with
which that effect . . . occurs in the population.” Id.
In June 2020, Commerce published its final redetermi-
nation. J.A. 1073–1121. Commerce continued to use a sim-
ple average, and it “provid[ed] further explanation of [its]
methodology as requested.” J.A. 1073. Commerce ex-
plained that to determine whether there was a pattern of
export prices that “differ significantly” among purchasers,
regions, or periods, it used the widely accepted Cohen’s d
test to measure the “effect size” on price associated with
sales to certain purchasers, in certain regions, or during
certain periods of time, and it relied on Ellis, Cohen, and
Coe for elaboration. See J.A. 1077–80. It noted that the
denominator of the Cohen’s d coefficient was a “yardstick
to gauge the significance of the difference of the means,”
J.A. 1079, and it stated that the statistical literature pre-
sented different methods for computing the denominator,
“including the square root of the simple average of the var-
iances within each group,” J.A. 1080 (citing Cohen at 44).
To justify its decision to use the simple average to cal-
culate the denominator, Commerce wrote:
[T]he purpose of Commerce’s Cohen’s d test is to
determine whether U.S. prices differ significantly
among purchasers, regions, or time periods – i.e.,
do prices to each purchaser, region, or time period
differ significantly from all other prices of the com-
parable merchandise. Although these are all prices
in the U.S. market made by the respondent, this
analysis requires that these prices be subdivided
into separate distinct groups to consider separately
whether the respondent’s pricing behavior for sales
to one specific group differs from its pricing behav-
ior for all other sales. In other words, these prices,
all of which are used to evaluate: 1) a respondent’s
pricing behavior in the U.S. market; and 2)
whether the respondent is dumping, are now con-
sidered to represent two distinct pricing behaviors
which may differ significantly. For the purpose of
this particular analysis, Commerce finds that these
two distinct pricing behaviors are separate and
equally rational, and each is manifested in the in-
dividual prices within each group. Therefore, each
warrants an equal weighting when determining
the “standard deviation” used to gauge the signifi-
cance of the difference in the means of the prices of
comparable merchandise between these two
groups. Because Commerce finds that each of
these pricing behaviors are equally genuine when
considering the distinct pricing behaviors between
a given purchaser, region, or time period and all
other sales, an equal weighting is justified when
calculating the “standard deviation” of the Cohen’s
d coefficient. To do otherwise and use an average
weighted by sales volume, sales value, or number
of transactions would give preference to one pricing
behavior over the other, and therefore would bias
the “yardstick” by which Commerce measures the
observed difference in prices between the test and
comparison groups.
J.A. 1080–81.
In responding to comments, Commerce referred to the
“abstract effect” idea invoked by Mid Continent. J.A. 1112,
1116–17. It also pointed to the difference between this con-
text, in which Commerce has the complete population data
pool (and each pairwise comparison involves the entire pop-
ulation), and the context of the cited literature involving
sampling from a population. J.A. 1109. Commerce further
said that PT’s challenge of the simple average relied on con-
clusory allegations of “skewed” results, J.A. 1081, incorrect
assumptions about the relationship between standard de-
viation and group size, J.A. 1083–84, and “cherry-picked”
data, J.A. 1084–85. It added that the simple average pro-
vides “predictability” because “the importance given to
each pricing behavior will be the same for all products,”
and it concluded that the use of a simple average was “not
only a reasonable approach but a more accurate and con-
sistent measurement.” J.A. 1087.
3
The matter returned to the Trade Court. PT submitted
comments that included extensive attachments containing
the sales information before Commerce and figures that,
according to PT, showed why weighted averaging is sub-
stantially better than simple averaging at capturing those
instances in which a test group’s prices are noticeably out-
side the dispersion of prices generally. J.A. 1122–1373.
The government responded, arguing, among other things,
that PT failed to exhaust administrative remedies as to
some of what PT presented. J.A. 1397–1428.
In January 2021, the Trade Court sustained Com-
merce’s determination. CIT 2021 Op., 495 F. Supp. 3d at
1300. It accepted Commerce’s explanation that a weighted
average would “inappropriately move the pooled standard
deviation toward the pricing behavior of either the test or
comparison group,” id. at 1304, and agreed that an equal
weighting was justified because the prices in each test and
comparison group “separately and equally represent the re-
spondent’s pricing behavior,” id. at 1308 (quoting J.A.
1108). The Trade Court did not refer to the “abstract effect”
idea invoked by Mid Continent and Commerce.[5]
PT timely appealed to this court. We have jurisdiction
under 28 U.S.C. § 1295(a)(5).
II
A
We review Commerce’s decisions using the same stand-
ard of review applied by the Trade Court, while carefully
considering the Trade Court’s analysis. CAFC 2019 Op.,
940 F.3d at 667. Commerce’s selection of a methodology for
implementing the statutory directive of § 1677f-1(d)(1)(B)
is “an interpretation of that statutory language” that we
review for reasonableness. Stupp, 5 F.4th at 1352–53; see
Ningbo Dafa Chem. Fiber Co. v. United States, 580 F.3d
1247, 1256 (Fed. Cir. 2009) (“It is well established that
statutory interpretations articulated by Commerce during
its antidumping proceedings are entitled to judicial defer-
ence under Chevron.” (cleaned up)).
[5] The Trade Court reached its conclusion without
having to determine which if any submissions by PT were
objectionable under the exhaustion requirement, because
the court concluded that all of the submissions were, in any
event, answered by the just-noted rationale. Id. at 1306–
08. Our decision does not rely on the materials that were
the subject of the exhaustion dispute, which we therefore
need not address.
“Commerce has discretion to make reasonable choices
within statutory constraints.” CAFC 2019 Op., 940 F.3d at
667; see also Stupp, 5 F.4th at 1353, 1354. Commerce’s
“special expertise in administering antidumping duty law”
is one recognized basis for the “significant deference” em-
bodied in the reasonableness standard. Ningbo Dafa, 580
F.3d at 1256; see also Wheatland Tube Co. v. United States,
495 F.3d 1355, 1361 (Fed. Cir. 2007). Expertise enables an
agency to identify a reasonable interpretation and to set
forth an adequate justification for choosing it over others,
but it remains a judicial obligation to ensure that the
agency has done so, while avoiding judicial usurpation of
agency authority to make pertinent factual and policy de-
terminations. See Burlington Truck Lines, Inc. v. United
States, 371 U.S. 156, 167–69 (1962); CS Wind Vietnam Co.
v. United States, 832 F.3d 1367, 1377 (Fed. Cir. 2016). For
us to fulfill that obligation, we must ensure that Commerce
provides “an explanation that is adequate to enable the
court to determine whether the choices are in fact reason-
able, including as to calculation methodologies.” CAFC
2019 Op., 940 F.3d at 667; Stupp, 5 F.4th at 1357.
Last year, in Stupp, we held that Commerce had pro-
vided an inadequate explanation of the reasonableness of
its use of Cohen’s d in its differential-pricing analysis in
circumstances where that use seemingly departed from
what the statistical literature taught. Stupp, 5 F.4th at
1357–60. What was unjustified there was Commerce’s use
of Cohen’s d “in adjudications in which the data groups be-
ing compared are small, are not normally distributed, and
have disparate variances.” Id. at 1357. We remanded for
further consideration.
On the record presented to us here, we do the same,
focusing on a different feature of Commerce’s use of Co-
hen’s d. We hold that Commerce has not adequately justi-
fied its adoption of simple averaging for the Cohen’s d
denominator. Commerce has departed from the methodol-
ogy described in all the cited statistical literature
governing Cohen’s d, but it has not justified that departure
as reasonable. We again remand for further consideration.
B
1
Commerce recognized that the function of the denomi-
nator in the Cohen’s d coefficient is to be a “yardstick to
gauge the significance of the difference of the means” of the
sales prices of the test and comparison groups. J.A. 1079.
The numerator of Cohen’s d is the difference in weighted
average sales prices between the test and comparison
groups. Without further context, i.e., without a basis for
comparison, it is impossible to say whether that difference
is “significant,” under 19 U.S.C. § 1677f-1(d)(1)(B)(i) or oth-
erwise. The central purpose of using the Cohen’s d ratio is
to provide the missing basis of comparison—the “yard-
stick.” Cohen’s d relates, by division, the difference in
mean prices of the two particular groups to a figure repre-
senting the magnitude of differences in (dispersion of) the
prices in the data pool more generally. See CAFC 2019 Op.,
940 F.3d at 671. If the mean-price difference is large
enough compared to the more general dispersion measure
(i.e., the ratio of the two is at least 0.8), “Commerce deems
the sales prices in the test group to be significantly differ-
ent from the sales prices in the comparison group.” Stupp,
5 F.4th at 1347; see Differential Pricing RFC at 26,722
(“The Department finds that the difference is significant,
and that the sales of the test group pass the Cohen’s d test,
if the calculated Cohen’s d coefficient is equal to or exceeds
the large threshold.”).
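The role of the "yardstick" can be illustrated with a short sketch in which one fixed difference in mean prices is judged against two different dispersion figures. All numbers and names here are hypothetical, and the use of the absolute value and of an inclusive 0.8 threshold is an assumption. The sketch takes no position on which yardstick is appropriate.

```python
import math

def cohens_d_ratio(mean_diff, yardstick):
    """Difference in mean prices divided by the dispersion benchmark."""
    return abs(mean_diff) / yardstick

def is_significant(mean_diff, yardstick, threshold=0.8):
    """True when the ratio is at least the threshold."""
    return cohens_d_ratio(mean_diff, yardstick) >= threshold

# One hypothetical mean-price difference, two hypothetical yardsticks.
MEAN_DIFF = 1.5
NARROW_YARDSTICK = math.sqrt(1.4)   # about 1.183
WIDE_YARDSTICK = math.sqrt(5.0)     # about 2.236

with_narrow = is_significant(MEAN_DIFF, NARROW_YARDSTICK)  # ratio about 1.27
with_wide = is_significant(MEAN_DIFF, WIDE_YARDSTICK)      # ratio about 0.67
```

The same numerator clears the threshold against the narrower benchmark and falls short against the wider one, which is why the method of computing the denominator can decide whether a test group passes.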
The cited literature makes clear that one way to form
the more general data-pool dispersion figure for the denom-
inator—seemingly the preferred way if the full set of popu-
lation data is available—is to use the standard deviation
for the entire population. But the references recognize that
entire population data may be unavailable, in which case
an alternative is needed, and the alternative is chosen with
the object of estimating (approximating) the unavailable
population standard deviation. Thus, Ellis states:
To calculate the difference between two groups we
subtract the mean of one group from the other (M1
– M2) and divide the result by the standard devia-
tion (SD) of the population from which the groups
were sampled. The only tricky part in this calcula-
tion is figuring out the population standard devia-
tion. If this number is unknown, some
approximate value must be used instead.
Ellis at 10 (emphasis added). Coe presents the formula for
measuring effect size as
\[ \frac{[\text{Mean of experimental group}] - [\text{Mean of control group}]}{\text{Standard Deviation}} \]
and then states:
The “standard deviation” is a measure of the
spread of a set of values. Here it refers to the
standard deviation of the population from which
the different treatment groups were taken. In prac-
tice, however, this is almost never known, so it
must be estimated either from the standard devia-
tion of the control group, or from a “pooled” value
from both groups . . . .
Coe at 2 (emphasis added). And Cohen similarly indicates
that the ideal denominator is the full population’s standard
deviation, which may be approximated by a pooled esti-
mate. See Cohen at 27 (dividing by “the common within-
population standard deviation”); Cohen at 67 (noting that
the denominator is “the usual pooled within sample esti-
mate of the population standard deviation”—indicating
that the pooling method, based on the standard deviations
of each of the two groups, aims to estimate the standard
deviation of the overall population). When the full popula-
tion data set is unavailable, all of the cited literature points
to use of a “pooled standard deviation” of the two particular
groups at issue to form the denominator. Cohen at 67; Ellis
at 10, 26–27; Coe at 6.
In this matter, Commerce did not use the standard de-
viation of all the data for its denominator. It made that
choice even while recognizing that it had the full set of data
for U.S. sales for the period Commerce was reviewing. J.A.
1109 (“Commerce’s analysis is based on all of the U.S. sales
data for the respondent . . . . Commerce does not sample
the respondent’s U.S. sales data used in the Cohen’s d test,
and the calculated means and variances of the U.S. prices
are the actual values of the entire population of U.S. sales
and are not estimates of those values.”). Indeed, in each
test-group/comparison-group pair, the test and comparison
groups together make up “the entire universe, i.e., popula-
tion, of the available data,” J.A. 1115, because for each test
group, the comparison group is all other sales data.
Rather than use the population standard deviation in
the denominator, Commerce used a “pooled standard devi-
ation,” pooling the standard deviations for each pair of test
and comparison groups. As discussed above, it used simple
averaging to do the pooling—even where the test and com-
parison groups have different sizes. In making that choice
to use simple averaging, however, Commerce departed
from, rather than followed, the cited statistical literature.
As we have described above, Commerce’s formula for the
denominator,
\[ \sqrt{\frac{\sigma_A^2 + \sigma_B^2}{2}} \]
comes from a section of Cohen that addresses a situation in
which the two groups at issue are of the same size. Cohen
at 43–44; id. at 43 (“CASE 2: σ_A ≠ σ_B, n_A = n_B”). By contrast,
when the sampled groups have unequal sizes, the cited lit-
erature uniformly teaches use of a pooled standard devia-
tion estimate that involves weighted averaging. See Cohen
at 67; Ellis at 26–27; Coe at 6.
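As a hedged illustration of why the choice matters when group sizes differ, the sketch below computes three candidate denominators for two groups of unequal size: a simple average of the two variances, a size-weighted average, and the standard deviation of the combined data. The data are invented, the function names are hypothetical, and the weighted form shown (weights proportional to group size, with no sampling corrections) is an assumed simplification for full-population data, not a formula reproduced from the cited literature or the record.

```python
import math
from statistics import pstdev, pvariance

# Hypothetical prices; the two groups together form the whole "population".
test_group = [10.0, 14.0]                                         # 2 sales
comparison_group = [9.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 11.0]  # 8 sales


def simple_pooled_sd(a, b):
    """Square root of the simple (unweighted) average of the two variances."""
    return math.sqrt((pvariance(a) + pvariance(b)) / 2)


def weighted_pooled_sd(a, b):
    """Square root of the variance average weighted by group size.

    Assumption: weights are the observation counts and no
    degrees-of-freedom correction is applied.
    """
    na, nb = len(a), len(b)
    return math.sqrt((na * pvariance(a) + nb * pvariance(b)) / (na + nb))


def population_sd(a, b):
    """Standard deviation of all observations taken together."""
    return pstdev(list(a) + list(b))


for name, fn in [("simple", simple_pooled_sd),
                 ("weighted", weighted_pooled_sd),
                 ("population", population_sd)]:
    print(name, round(fn(test_group, comparison_group), 4))
```

With these invented figures the three denominators differ from one another, so the same mean-price difference would yield three different Cohen's d values depending on which denominator is used.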
The section of Cohen (at 359–61) cited by Mid Conti-
nent and Commerce for its “abstract effect” language is no
exception. It nowhere recites use of a simple average for
calculating a pooled standard deviation from groups of un-
equal size. The discussion in that section involves f, an ef-
fect size index that is related to, but not the same as, the
Cohen’s d coefficient, applicable when there are arbitrarily
many groups to compare, rather than just two. See Cohen
at 274–80. It expressly sets forth a simple average formula
for when the groups are equal in size but a weighted aver-
age formula for when the groups are of different size. Id.
at 359–60. The language of “abstract effect” is used in a
discussion of forming certain equal-size groups for the com-
parative analysis: in the example given, if the object was to
identify differences in viewpoint on a topic (attitudes to-
ward the United Nations) among three groups (Jews,
Protestants, Catholics), the researcher could form equal
groups even though random sampling from a population
would produce different-size groups. Id. at 360–61. Noth-
ing in the section applies simple averaging to pooled stand-
ard deviation estimates for different-size groups.
2
Commerce offered one principal reason for departing
from the teaching of all the cited statistical literature. It
said that the data in each group (the test and comparison
groups) represent “equally rational” and “equally genuine”
pricing choices and that, therefore, each group “warrants
an equal weighting” for calculating the pooled standard de-
viation. J.A. 1080–81. We see no basis for questioning,
here or generally, the premise of equal rationality of the
pricing behavior (and equal genuineness, if that is differ-
ent, which is not clear). But Commerce has not offered an
adequate explanation of why that premise supports the
particular step Commerce must justify: a choice of how to
form the denominator in the Cohen’s d formula.
The fact that the seller is acting rationally and genu-
inely in its pricing choices in both the test and comparison
groups provides no apparent reason for assigning equal
weight to each group’s standard deviation when computing
the pooled standard deviation. The rationality and genu-
ineness of the seller’s pricing choices have no evident con-
nection to the undisputed purpose of the denominator
figure—to provide a dispersion figure for the more general
pool that serves as a yardstick for deciding on the signifi-
cance of the difference in mean prices of the two groups.
Both the numerator and denominator take the behavior as
a given and form certain statistical measures from the ob-
jective data that are then related in the ratio that is Co-
hen’s d. Commerce has not identified anything in the
statistical measure at issue that depends on considerations
of rationality and genuineness of the conduct that gave rise
to the objective data. Indeed, Commerce has not shown
that the numerous real-world examples used in Cohen to
illustrate the methods taught are different in the respect
Commerce now features, i.e., Commerce has not shown
that the Cohen examples (generally or, perhaps, ever) in-
volve sampled groups of data that reflect behavior that is
not “rational” and “genuine.” Thus, Commerce has not ad-
equately justified, through its central rationale, its depar-
ture from the statistical literature’s description of the
Cohen’s d coefficient.
Commerce also asserted that a simple average provides
“predictab[ility]” in that “the importance given to each pric-
ing behavior will be the same for all products.” J.A. 1087.
But Commerce did not suggest that this basis would suffice
for its denominator choice without the principal basis we
have just discussed and found inadequate. And in any
event, Commerce has not provided a reasonable explana-
tion for this predictability assertion. It is not clear from
Commerce’s language, including its “importance given to
each pricing behavior” language, what meaning Commerce
was ascribing to “predictability” independent of its equality
(of rationality and genuineness) basis. If Commerce was
referring, as “predictability” would suggest, to the ability
to predict the consequences for the dumping analysis based
on the ability to predict the weighting of a sale (the “im-
portance” component of the analysis), Commerce did not
explain why simple averaging has greater predictability
than weighted averaging (let alone than using the full pop-
ulation’s standard deviation for every d calculation). The
mathematical formulas have no identified elements of dis-
cretion, or other components, that distinguish them with
respect to prediction. Specifically, Commerce provided no
basis for an assertion of lesser “predictability” if weighted
averaging is done on the basis of weight (or dollars or
units), not transactions, as we discussed in our 2019 opin-
ion. See CAFC 2019 Op. at 674. Not having provided an
adequate explanation of “predictability,” Commerce also
did not provide an adequate explanation of what signifi-
cance this consideration should have in the overall choice
of denominator for Cohen’s d.
In its final redetermination, Commerce invoked the
“abstract effect” idea mentioned in the section of Cohen dis-
cussed above. J.A. 1112, 1116–17. As we have noted, that
section does not call for simple averaging for unequal size
groups in the denominator of Cohen’s d or in the formula
for the related f figure. And Commerce has not explained
how such simple averaging could be derived from the “ab-
stract effect” idea itself. We do not understand Commerce,
in invoking this idea, to be saying anything other than that
the statutory “differ significantly” analysis focuses on the
difference between the test and comparison groups for its
own sake, rather than for what it indicates about the over-
all population. One difficulty with this observation is that
Commerce has not explained how it affects comparisons,
such as those Commerce makes in its differential-pricing
analysis, where the groups together make up the entire
population (which was not the case in the section of Cohen
at issue). More broadly and fundamentally, Commerce has
not explained why the fact that the focus is being placed on
the difference between the groups distinguishes the teach-
ing of the cited literature—which, as discussed, uses the
Cohen’s d coefficient precisely to provide a yardstick for de-
termining the significance of the difference in group means.
Thus, Commerce has not explained why that focus calls for
a simple-averaging yardstick figure for determining the
significance of the difference when calculating Cohen’s d
(or, even, the f statistical measure) for different-size
groups.
Commerce observes that the cited literature discusses
“sampling” from a population, whereas Commerce has the
entire population data and each of its test-comparison
group pairs involves the entire population. J.A. 1109. In
Stupp, we stated that Commerce had not explained how
this difference bears on the reasonableness of Commerce’s
use of Cohen’s d in certain respects not at issue in the pre-
sent matter. 5 F.4th at 1360. Here, too, although it is un-
disputed that sampling for estimation of an unknown
overall population figure requires certain minor alterations
of the formula for weighted averaging not needed in the
present context, compare, e.g., Cohen at 67, with J.A. 792
(PT proposal), Commerce has not explained why the basic
choice of weighted averaging of unequal-size groups fails to
apply to the present context. The cited literature nowhere
suggests simple averaging for unequal-size groups. In-
deed, when the entire population is known, the cited liter-
ature points toward using the standard deviation of the
entire population as the denominator in Cohen’s d—which
Commerce has not done.
3
Commerce’s job is not to follow a statistical test as ex-
plained in published literature for its own sake, but to im-
plement the statutory mandate to determine when prices
of certain groups “differ significantly.” 19 U.S.C. § 1677f-
1(d)(1)(B)(i). In implementing a statutory mandate, an
agency is not duty-bound to follow published literature
when, e.g., the literature is inapplicable to the specific
problem before the agency or is not itself well grounded.
But here Commerce embraced the Cohen’s d statistics
measure and relied on the literature for that measure in
making its statutory significance assessment—and that
embrace extends beyond the first step and is the founda-
tion of the remaining steps. After the calculation of Co-
hen’s d, the next step in Commerce’s analysis is to declare
what number is high enough to be significant (constituting
“passing” the Cohen’s d test), and the number it uses is 0.8,
the threshold for a “large” effect size stated in Cohen. See
Cohen at 26; J.A. 1080; Differential Pricing RFC at 26,722;
Stupp, 5 F.4th at 1347. The “passing” sales then determine
the results of the next “ratio test” step.
In this situation, Commerce needs a reasonable justifi-
cation for departing from what the acknowledged literature
teaches about Cohen’s d. It has departed from those teach-
ings about how to calculate the denominator of Cohen’s d,
specifically in deciding to use simple averaging when the
groups differ in size. And its explanations for doing so fail
to meet the reasonableness threshold (a deferential one, in
recognition of expertise) for the reasons we have set forth.
We must remand for further proceedings before Com-
merce in light of the identified deficiencies—as we did in
this matter in 2019 regarding the simple-averaging choice
and as we did in Stupp regarding other aspects of Com-
merce’s use of Cohen’s d. Commerce must either provide
an adequate explanation for its choice of simple averaging
or make a different choice, such as use of weighted averag-
ing or use of the standard deviation for the entire popula-
tion. 6
6 Mid Continent argues that, if weighted averaging
is to be done, the weighting should be based on the number
of transactions, rather than on a measure of how much is
sold (e.g., number of nails, weight of nails, dollars paid).
Mid Continent Br. 28–29. But Commerce rejected
weighted averaging altogether, so we do not have before us
for review a choice of one basis of weighting rather than
another. We make two observations relevant to Com-
merce’s consideration of that choice if it adopts weighted
averaging on remand. First, when it uses the average-to-
average method, Commerce computes average prices by
quantity sold, not by transaction. See J.A. 1111. Second,
in our earlier opinion, we recognized that Commerce had
criticized weighting by the number of transactions as sus-
ceptible to manipulation, and we noted that weighting by
quantity appears to address that issue. CAFC 2019 Op.,
940 F.3d at 674.
III
For the foregoing reasons, we vacate the decision of the
Trade Court and remand for further proceedings consistent
with this opinion.
No costs.
VACATED AND REMANDED