The SAT Math Equivalent (SATME)
Education in States and Nations: 1991 (ESN) Indicator 25: Note on mathematics proficiency
Notes on Figure and Tables
England, Scotland: School or student response rate is below the 85 percent standard employed by INES.
Israel: Hebrew-speaking schools only.
Italy, Spain: Ninety percent or less of the international target population was sampled.
Portugal, Switzerland: School or student response rate is below the 85 percent standard employed by INES. Ninety percent or less of the international target population was sampled.
Soviet Union: Fourteen of fifteen republics. Russian-speaking schools only.
Spain: All regions except Cataluña. Spanish-speaking schools only.
Switzerland: Fifteen of twenty-six cantons included.
United States: The U.S. sample for the International Assessment of Educational Progress (IAEP) consisted of both public and private schools. Only 13-year-olds were included. The state samples for the National Assessment of Educational Progress (NAEP), on the other hand, consisted of 8th-grade classrooms in public schools only. On average, students in the state samples were likely to be older than those in the U.S. IAEP sample.
Technical Notes
Description of levels of mathematics proficiency

Level 300: Moderately Complex Procedures and Reasoning
Students at this level are developing an understanding of number systems. They can compute with decimals, simple fractions, and commonly encountered percents. They can identify geometric figures, measure lengths and angles, and calculate areas of rectangles. These students are also able to interpret simple inequalities, evaluate formulas, and solve simple linear equations. They can find averages, make decisions based on information drawn from graphs, and use logical reasoning to solve problems. They are developing the skills to operate with signed numbers, exponents, and square roots.

Level 250: Numerical Operations and Beginning Problem Solving
Students at this level have an initial understanding of the four basic operations. They are able to apply whole-number addition and subtraction skills to one-step word problems and money situations. In multiplication, they can find the product of a two-digit and a one-digit number. They can also compare information from graphs and charts, and are developing an ability to analyze simple logical relations.

Level 200: Beginning Skills and Understandings
Students at this level have considerable understanding of two-digit numbers. They can add two-digit numbers but are still developing an ability to regroup in subtraction. They know some basic multiplication and division facts, recognize relations among coins, can read information from charts and graphs, and use simple measurement instruments. They are developing some reasoning skills.

Level 150: Simple Arithmetic Facts
Students at this level know some basic addition and subtraction facts, and most can add two-digit numbers without regrouping. They recognize simple situations in which addition and subtraction apply. They are also developing rudimentary classification skills.

Issues in Linking Different Tests
Indicator 25 uses data drawn from two sources. The data for the countries included in Figure 25 and Table 25a were obtained from the 1991 International Assessment of Educational Progress (IAEP), which tested 13-year-olds in public and private schools in participating countries. The data for the states included in Figure 25 and Table 25b were obtained from the 1992 National Assessment of Educational Progress (NAEP) Trial State Assessment, which tested eighth graders in public schools. To compare the mathematics achievement of the countries, which were tested as part of the IAEP, with that of the states, which were tested as part of the NAEP, it is necessary to link scores on the two tests. Several approaches to test linking are available, and the appropriate linking strategy depends on the characteristics of the tests involved. Mislevy (1992) describes four main strategies: equating, calibration, projection, and moderation.

The choice of an appropriate strategy for linking the IAEP and the NAEP depends on the degree to which the two tests measure the same constructs in the same ways. Overall, the IAEP and NAEP have a number of similarities and differences. The IAEP curriculum framework was adapted from the framework used for the NAEP, and the two tests contain similar (but not identical) items and were administered using similar procedures. In addition, both tests have been scaled using item response theory (IRT) methods. (4)
At the same time, the two tests differ in a number of ways, most notably in that the IAEP was explicitly designed to be administered in countries that differ in language, curriculum, and instructional practice, while the NAEP was not. The tests also differ in length. In the IAEP mathematics assessment, one common form of the test was administered to all 13-year-olds. The form included 76 items, and students were given 60 minutes to complete the assessment (not including time for background questions). In the NAEP mathematics assessment, 26 different test booklets were prepared, each containing a somewhat different number of items, and each sampled student completed one booklet. A typical NAEP booklet included about 60 items, and students were given 45 minutes to complete the assessment (not including time for background questions). Because the IAEP was somewhat longer than the NAEP, it may provide somewhat more reliable individual-level scores.

Given the similarities and differences between the two tests, it would be plausible to link them through calibration, projection, or moderation. Because the IAEP and NAEP differ both in the detailed curriculum frameworks employed and in reliability, we chose a form of projection to predict NAEP scores from IAEP scores.
The projected NAEP scores reported for Indicator 25 are based on analyses conducted by Pashley and Phillips (1993) and Pashley, Lewis, and Yan (1994). In developing their estimates, Pashley and Phillips relied on data collected in a "linking study," in which both the IAEP and NAEP instruments were administered to a sample of 1,609 U.S. students who were in eighth grade or 13 years old in the spring of 1992. Pashley and Phillips used the linking-study data to estimate a linear regression model predicting a student's NAEP score from his or her IAEP score. (5) (See Table S21, row A, for the estimated coefficients.) (6) They then used the regression equation to develop predicted NAEP scores for the students in the IAEP sample in each participating country. (7) Using the predicted scores, Pashley and Phillips obtained various statistics, including the means and percentile scores for the nations presented in Indicator 25. (Table S22, column A, provides the projected NAEP-scale means Pashley and Phillips obtained for each IAEP country.)
Table S21 Sensitivity of parameters used to link mean IAEP scores for countries
to the NAEP scale to data source and method
--------------------------------------------------------------------------------------------
                                                    Projected NAEP   Additional NAEP score
                                                    score at         points per IAEP point
Samples used                          Method        IAEP = 500       above 500
--------------------------------------------------------------------------------------------
A  IAEP cross-linking sample          Projection    265              0.44
B  IAEP cross-linking sample          Moderation    263              0.53
C  IAEP and 1990 NAEP Trial State     Moderation    264              0.69
   Assessment in public schools
D  IAEP and 1992 NAEP Trial State     Moderation    270              0.72
   Assessment in public schools
--------------------------------------------------------------------------------------------
NOTE AND SOURCE: The IAEP scale ranges from 0 to 1000; the NAEP scale ranges from 0 to 500. Parameters in this table were calculated using the means and standard deviations of scores in each sample and, for row A, the correlation of the scores in the cross-linking sample. Pashley and Phillips (1993) used the sample and method of row A. Beaton and Gonzalez (1993) used the samples and method of row C.
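Each row of Table S21 defines a straight line: NAEP-scale score = (projected score at IAEP = 500) + (slope) * (IAEP score - 500). The sketch below, whose constant and function names are our own, applies all four rows to the Swiss IAEP mean, using both the weighted value (532.36) and the published unweighted value (538.75) cited in the note to Table S22. With row A, the weighted mean maps to about 279 and the unweighted mean to about 282, which shows why the choice of weights matters for Switzerland. Because the tabled parameters are rounded, values computed this way may differ slightly from published country means.

```python
# Table S21: (NAEP-scale score at IAEP = 500, NAEP points per IAEP point above 500)
TABLE_S21 = {
    "A": (265.0, 0.44),  # cross-linking sample, projection
    "B": (263.0, 0.53),  # cross-linking sample, moderation
    "C": (264.0, 0.69),  # IAEP + 1990 NAEP public-school sample, moderation
    "D": (270.0, 0.72),  # IAEP + 1992 NAEP public-school sample, moderation
}

def link_to_naep(iaep_mean, at_500, slope):
    """Express an IAEP-scale country mean on the NAEP scale."""
    return at_500 + slope * (iaep_mean - 500.0)

# Swiss IAEP means quoted in the note to Table S22.
SWISS_WEIGHTED = 532.36
SWISS_UNWEIGHTED = 538.75

for row, (at_500, slope) in TABLE_S21.items():
    weighted = link_to_naep(SWISS_WEIGHTED, at_500, slope)
    unweighted = link_to_naep(SWISS_UNWEIGHTED, at_500, slope)
    print(f"Row {row}: weighted {weighted:.1f}, unweighted {unweighted:.1f}")
```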
Table S22 Alternative projections of country mean IAEP scores onto the NAEP
scale, by country
----------------------------------------------------------------------------
                           Samples and method      | Difference in projections
Country                    A     B     C     D     | (B - A)  (C - A)  (D - C)
----------------------------------------------------------------------------
Taiwan                    285   287   297   303    |     2       12        6
Korea                     283   286   294   301    |     3       11        7
Switzerland(1)            279   281   288   294    |     2        9        6
Soviet Union(2)           279   281   288   294    |     2        9        7
Hungary                   277   279   285   291    |     2        8        6
France                    273   274   278   284    |     1        5        6
Emilia Romagna, Italy(3)  272   272   276   283    |     0        4        6
Israel(4)                 272   272   277   283    |     0        5        6
Canada(5)                 270   270   274   280    |     0        4        6
Scotland                  269   270   272   279    |     1        3        6
Ireland                   269   268   271   277    |    -1        2        6
Slovenia                  266   265   267   273    |    -1        1        6
Spain(6)                  263   261   262   267    |    -2       -1        5
United States(7)          262   260   262   266    |    -2        0        4
Jordan                    246   241   236   240    |    -5      -10        4
----------------------------------------------------------------------------
(1) Fifteen out of 26 cantons.
(2) Fourteen out of 15 republics; Russian-speaking schools only.
(3) Combined school and student participation rate is below .80 but at least .70. Interpret with caution due to possible nonresponse bias.
(4) Hebrew-speaking schools only.
(5) Nine out of 10 provinces.
(6) All regions except Cataluña; Spanish-speaking schools only.
(7) Eighth-graders took the test, and not all were 13 years old.

Samples and method:
A. Cross-linking sample and projection method
B. Cross-linking sample and moderation method
C. IAEP and NAEP 1990 public school samples and moderation method
D. IAEP and NAEP 1992 public school samples and moderation method

Difference in projections:
(B - A) Moderation versus projection in the same (cross-linking) sample
(C - A) Moderation with 1990 NAEP/IAEP samples versus projection with the cross-linking sample
(D - C) 1992 NAEP/IAEP versus 1990 NAEP/IAEP samples, both using the moderation method

NOTE AND SOURCE: Countries are sorted from high to low based on their mean scores using sample and method A (cross-linking sample and projection method). Columns B and C are from Pashley, Lewis, and Yan (1994) and Beaton and Gonzalez (1993), respectively. Both used student-weighted data. Columns A and D are based in part on tabulations produced by the IAEP Processing Centre in June 1992. It appears that these tabulations did not use student weights. For most countries, the use of weights made little difference to estimated country mean IAEP scores. Switzerland is an exception, due to the complex sample design used there. Therefore, an unpublished weighted mean IAEP score of 532.36 was used for Switzerland instead of the published unweighted mean of 538.75.
The most widely discussed alternative to the projection method used by Pashley and Phillips is a moderation method carried out by Beaton and Gonzalez (1993). Beaton and Gonzalez based their analysis on the 1991 IAEP United States sample and the 1990 NAEP eighth-grade winter public school sample. They translated IAEP scores into NAEP scores by aligning the means and standard deviations of the two tests. (8) Using the techniques of linear equating, they estimated conversion constants to transform the U.S. IAEP scores into a distribution having the same mean and standard deviation as the 1990 NAEP scores. (The conversion constants are shown in Table S21, row C.) They then used these conversion constants to transform the IAEP scores of the students in the IAEP samples in each participating country into equivalent NAEP scores. (The moderated country NAEP-scale means produced by Beaton and Gonzalez are shown in Table S22, column C. Full state and nation results for Indicator 25 using the Beaton and Gonzalez method are displayed in Table S23.)

The projection method used to develop Indicator 25 and the moderation method used by Beaton and Gonzalez produce somewhat different results, especially for countries with high average IAEP scores. (See Table S22.) For example, Korea is estimated to have a 1992 NAEP score of 283 using the projection method employed in Indicator 25 (see column A), while it has an estimated 1990 NAEP score of 294 using the Beaton and Gonzalez method (see column C).
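Moderation by linear equating, as described above, can be sketched as follows. The function name and both score lists are hypothetical; the two lists stand for separate samples of students (as with the U.S. IAEP and 1990 NAEP samples), and the sketch uses unweighted population standard deviations with no plausible values. The slope is the ratio of the NAEP standard deviation to the IAEP standard deviation and the intercept aligns the means, so the transformed IAEP scores take on the NAEP mean and standard deviation by construction.

```python
from statistics import fmean, pstdev

def moderation_constants(iaep_scores, naep_scores):
    """Linear-equating constants mapping the IAEP distribution onto the NAEP one.

    The two lists may come from different samples of students.
    Returns (intercept, slope) for: naep_equivalent = intercept + slope * iaep.
    """
    slope = pstdev(naep_scores) / pstdev(iaep_scores)
    intercept = fmean(naep_scores) - slope * fmean(iaep_scores)
    return intercept, slope

# Hypothetical scores from two separate samples (not real IAEP or NAEP data).
us_iaep = [430.0, 470.0, 495.0, 520.0, 560.0, 585.0]
us_naep = [220.0, 245.0, 262.0, 270.0, 290.0]

a, b = moderation_constants(us_iaep, us_naep)
equated = [a + b * x for x in us_iaep]

print(f"intercept {a:.2f}, slope {b:.3f}")
print(f"equated mean {fmean(equated):.2f} vs NAEP mean {fmean(us_naep):.2f}")
print(f"equated SD {pstdev(equated):.2f} vs NAEP SD {pstdev(us_naep):.2f}")
```

The same two constants would then be applied to every student's IAEP score in each country's sample, which is the second step described in the text.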
The observed differences in transformed scores can be attributed in part to differences in the data sets on which Pashley and Phillips and Beaton and Gonzalez relied in developing their estimates. The students in the "linking study" sample used by Pashley and Phillips included both 13-year-olds and eighth graders in public and private schools. Beaton and Gonzalez used two samples to develop their estimates: the regular 1991 U.S. IAEP sample and the regular winter eighth-grade 1990 NAEP administration. The 1991 U.S. IAEP sample on which they relied included 13-year-olds (but not other eighth graders) in public and private schools, while the 1990 NAEP sample included eighth graders (but not other 13-year-olds) in public schools only. (9) Perhaps as a result of these differences, the estimation samples have somewhat different distributions. Both estimation methods are particularly sensitive to the ratio of the standard deviations of the NAEP and the IAEP. (10) In the linking sample used to develop the projection estimates, the ratio of the NAEP and IAEP standard deviations was about 0.53, while for the samples used by Beaton and Gonzalez it was about 0.69. This difference in standard deviations yields predicted NAEP scores under the projection method that lie closer to the mean than the corresponding scores under the Beaton and Gonzalez method.
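Using the formulas in footnote 10, the slopes in Table S21 can be reproduced from figures quoted in the text: the correlation r = .825 reported for the linking sample (footnote 6) and the standard-deviation ratios of about 0.53 and 0.69. The constant names are our own, and the 40-point IAEP gap is a hypothetical value chosen only to show how far each slope moves a country away from the IAEP = 500 anchor. The projection slope, 0.825 * 0.53 = 0.437, rounds to the 0.44 in row A; the moderation slopes are 0.53 (row B) and 0.69 (row C).

```python
# Figures quoted in the text (footnote 6 and the paragraph above).
R_LINKING = 0.825        # IAEP-NAEP correlation in the linking sample
SD_RATIO_LINKING = 0.53  # NAEP SD / IAEP SD in the linking sample
SD_RATIO_BG = 0.69       # NAEP SD / IAEP SD in the Beaton-Gonzalez samples

# Footnote 10: projection slope = r * sy / sx; moderation slope = sy / sx.
slopes = {
    "A: projection, linking sample": R_LINKING * SD_RATIO_LINKING,
    "B: moderation, linking sample": SD_RATIO_LINKING,
    "C: moderation, Beaton-Gonzalez samples": SD_RATIO_BG,
}

IAEP_GAP = 40  # hypothetical: a country 40 IAEP points above 500

for label, slope in slopes.items():
    shift = slope * IAEP_GAP
    print(f"{label}: slope {slope:.3f}, {shift:.1f} NAEP points above the anchor")
```

The multiplication by r (always below 1 in practice) is what pulls projected scores toward the anchor even when the two methods share the same standard-deviation ratio, as in rows A and B.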
To examine the sensitivity of the results to the samples used, we applied the Beaton and Gonzalez method to the data in the "linking sample" used by Pashley and Phillips. (11) The conversion coefficient estimates are shown in Table S21, row B, and the estimated country NAEP means are shown in Table S22, column B. (12) The estimated country means are much closer to the projection results obtained by Pashley and Phillips (column A) than are the Beaton and Gonzalez results obtained using the regular IAEP and 1990 winter public eighth-grade samples. For example, the difference between the projection and moderation estimates for Korea drops from 11 to 3 points.

To explore this issue further, we applied the moderation method using one additional NAEP data set: the 1992 public eighth-grade sample. (This sample corresponds to the sample used in the 1992 Trial State Assessment on which the state results in Indicator 25 are based.) The conversion coefficients are displayed in Table S21 (row D), and the moderated NAEP-scale country means are displayed in Table S22 (column D). This sample produces country results more extreme than those from any of the other samples we tried.

These experiments clearly indicate that different samples produce different results. But they do not indicate which sample is "best." One advantage of the linking sample used by Pashley and Phillips is that the same students took both the IAEP and the NAEP. Hence, the estimated conversion coefficients are not biased by possible differences between the IAEP and NAEP samples. But the IAEP standard deviation in the linking sample is substantially higher than the standard deviation in the regular U.S. administration of the IAEP, while the NAEP standard deviation in the linking sample is similar to the regular NAEP standard deviation; this may at least partly counterbalance the other apparent advantages of the linking sample.

In addition to the effects of the sample on coefficient estimates, several conceptual issues should be considered in evaluating linking methods. We briefly review three of these issues below: the age or grade-level interpretation placed on predicted test scores; the effects of unreliability in the measures on coefficient estimates; and potential country-level contextual effects.
First, different linking approaches may produce results that differ in the age or grade level to which the predicted scores apply. For example, since the data used by Pashley and Phillips to derive their coefficient estimates involved a sample of students who completed both the IAEP and the NAEP, the predicted NAEP scores based on their coefficients should be viewed as the NAEP scores that would be obtained by students of the same age or grade as the students whose IAEP scores are used as predictors. Since the regular country administration of the IAEP involved sampling 13-year-olds, the predicted NAEP scores using the Pashley and Phillips method should be viewed as predicted NAEP scores for 13-year-old students. The predicted NAEP scores obtained by Beaton and Gonzalez, on the other hand, should be interpreted as the scores that 13-year-olds who took the IAEP would receive if they completed the NAEP in eighth grade. (13) Since average NAEP scores for eighth graders are generally somewhat higher than average scores for 13-year-olds, the approach to sample specification used by Beaton and Gonzalez is likely to produce somewhat higher scores than the approach used by Pashley and Phillips.
Linking methods may also differ in their sensitivity to unreliability in the predictor variable (in this case, the IAEP). In general, regression estimates of the effects of variables measured with error are biased toward zero. Hence, projection coefficients estimated using unreliable measures are likely to be attenuated. (14) The effects of unreliability on conversion coefficients obtained using moderation methods are more difficult to determine. In the special case in which the predictor and outcome variables are measured with the same reliability, the moderation coefficients should be roughly unbiased. (15)
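The attenuation argument can be illustrated with a small simulation under classical test theory assumptions (observed score = true score + independent error). It is a sketch with hypothetical data, not an analysis of IAEP or NAEP scores, and it does not model plausible values. Both simulated tests measure the same standard-normal true score with unit error variance, so each has reliability 0.5 and the slope relating the true scores is 1. The regression (projection) slope should come out near 0.5, attenuated by the predictor's reliability, while the standard-deviation ratio (moderation) slope should come out near 1 because both measures share the same reliability.

```python
import random
from statistics import fmean, pstdev

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1992)  # fixed seed so the run is repeatable
n = 20_000
true_scores = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Two tests of the same true score, each with error variance 1:
# reliability = var(true) / var(observed) = 1 / (1 + 1) = 0.5 for both.
test_x = [t + rng.gauss(0.0, 1.0) for t in true_scores]  # predictor (plays "IAEP")
test_y = [t + rng.gauss(0.0, 1.0) for t in true_scores]  # outcome (plays "NAEP")

projection_slope = ols_slope(test_x, test_y)        # expected near 0.5
moderation_slope = pstdev(test_y) / pstdev(test_x)  # expected near 1.0

print(f"projection slope: {projection_slope:.3f}")
print(f"moderation slope: {moderation_slope:.3f}")
```

If the error variances of the two simulated tests were made unequal, the moderation slope would drift away from 1 as well, which matches the text's point that the effect of unreliability on moderation is harder to pin down in general.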
Finally, linking methods that are based on data from a single country may not properly reflect country-level contextual effects. Suppose, for example, that individual NAEP and IAEP scores were obtained for a sample of students in each of n countries. (16) Both the projection and moderation methods rest on an assumption that the relationship between IAEP and NAEP scores (pooling students across countries) can be expressed as a simple linear model of the form:
estimated NAEP score = constant + slope * IAEP score
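The linear model above, together with the country-dummy extension discussed in the next paragraph, could be fitted as sketched below if paired IAEP and NAEP scores were available for students in several countries. The countries, scores, and function name are hypothetical, and the data are noise-free for clarity. The function uses the within-country (fixed-effects) estimator, which gives the same common slope as an OLS regression with a full set of country dummies. Here Country Y's scores sit 12 NAEP points above Country X's at every IAEP score, so a line fitted to Country X alone underpredicts Country Y's mean by 12 points.

```python
from statistics import fmean

def fit_with_country_effects(data):
    """OLS of NAEP on IAEP with a separate intercept for each country.

    `data` maps a country name to a list of (iaep, naep) pairs. Demeaning
    within each country and pooling gives the common slope; each country's
    intercept is then its mean NAEP minus slope * its mean IAEP.
    """
    sxy = 0.0
    sxx = 0.0
    country_means = {}
    for country, pairs in data.items():
        mx = fmean([x for x, _ in pairs])
        my = fmean([y for _, y in pairs])
        country_means[country] = (mx, my)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercepts = {c: my - slope * mx for c, (mx, my) in country_means.items()}
    return slope, intercepts

# Hypothetical, noise-free data: same slope, different country intercepts.
data = {
    "Country X": [(x, 40.0 + 0.45 * x) for x in (460.0, 500.0, 540.0)],
    "Country Y": [(x, 52.0 + 0.45 * x) for x in (480.0, 520.0, 560.0)],
}

slope, intercepts = fit_with_country_effects(data)
effect = intercepts["Country Y"] - intercepts["Country X"]

# Predicting Country Y's mean from Country X's line misses the country effect.
mx_y = fmean([x for x, _ in data["Country Y"]])
my_y = fmean([y for _, y in data["Country Y"]])
from_x_line = intercepts["Country X"] + slope * mx_y

print(f"common slope {slope:.2f}; Country Y effect {effect:.1f} NAEP points")
print(f"Country Y mean {my_y:.1f}; predicted from X's line {from_x_line:.1f}")
```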
It is possible, however, that country-context effects exist. One simple specification might add country dummies to the simple linear model above. If the country dummies differ significantly from zero, the within-country regression of NAEP scores on IAEP scores will not correctly reproduce between-country relationships. Contextual effects of this sort might arise, for example, if the standardized test style used in the IAEP and NAEP is quite common in some countries but rarely used in others. Unfortunately, without linked IAEP and NAEP data for a sample of countries, the possibility of contextual effects cannot be ruled out.

This brief discussion clearly indicates that different methods of linking the IAEP and NAEP can produce different results, and further study is necessary to determine which method is best. For this reason, Indicator 25 is labeled "experimental."

For more information on cross-linking and on the specific approaches used in developing Indicator 25, see Peter J. Pashley and Gary W. Phillips, Toward World-Class Standards: A Research Study Linking International and National Assessments (Princeton, NJ: Educational Testing Service, June 1993); Peter J. Pashley, Charles Lewis, and Duanli Yan, "Statistical Linking Procedures for Deriving Point Estimates and Associated Standard Errors," paper presented at the National Council on Measurement in Education (Princeton, NJ: Educational Testing Service, April 1994); Albert E. Beaton and Eugenio J. Gonzalez, "Comparing the NAEP Trial State Assessment Results with the IAEP International Results," in Setting Performance Standards for Student Achievement: Background Studies (Stanford, CA: National Academy of Education, 1993); Robert J. Mislevy, Albert E. Beaton, Bruce Kaplan, and Kathleen M. Sheehan, "Estimating Population Characteristics from Sparse Matrix Samples of Item Responses," Journal of Educational Measurement, Summer 1992, vol. 29, no. 2, pp. 133-161; and Robert J. Mislevy, Linking Educational Assessments: Concepts, Issues, Methods, and Prospects (Princeton, NJ: Educational Testing Service, December 1992).

Table S23 Mathematics proficiency scores for 13-year-olds in countries and
public school 8th-grade students in states, calculated using the equi-percentile
linking method, according to Beaton and Gonzalez, by country (1991) and state
(1990)
-----------------------------------------------------------------------------------------
| Percent of population
| in each proficiency score range
-----------------------------------------------------------------------------------------
COUNTRY/State Mean SE | <200 200-250 250-300 300-350 >350
-----------------------------------------------------------------------------------------
TAIWAN 296.7 1.5 3.2 13.4 33.9 36.6 12.9
KOREA 294.1 1.3 1.9 10.3 41.8 39.3 6.7
SOVIET UNION 287.6 1.5 0.8 10.4 53.1 34.0 1.7
SWITZERLAND 287.5 1.9 0.2 8.8 57.9 32.2 0.9
HUNGARY 284.8 1.4 1.4 13.5 52.6 29.9 2.7
North Dakota 281.1 1.2 0.8 13.2 60.0 24.8 1.3
Montana 280.5 0.9 0.5 14.3 59.5 24.9 0.8
FRANCE 278.1 1.3 1.4 16.8 57.5 23.4 1.0
Iowa 278.0 1.1 0.6 18.3 57.0 23.3 0.7
ISRAEL 276.8 1.3 1.5 15.6 61.6 20.7 0.6
ITALY 276.3 1.4 1.6 18.1 57.7 22.0 0.5
Nebraska 275.7 1.0 2.0 18.6 56.2 22.4 0.9
Minnesota 275.4 0.9 1.6 19.2 57.0 21.2 1.1
Wisconsin 274.5 1.3 1.5 20.8 55.4 21.6 0.7
CANADA 274.0 1.0 1.4 17.6 63.7 16.7 0.7
New Hampshire 273.2 0.9 1.4 21.2 58.1 18.9 0.5
SCOTLAND 272.4 1.5 1.6 20.6 59.7 17.7 0.4
Wyoming 272.2 0.7 1.1 20.9 60.3 17.4 0.2
Idaho 271.5 0.8 1.2 22.1 59.7 16.8 0.2
IRELAND 271.4 1.4 3.1 21.0 57.1 18.0 0.8
Oregon 271.4 1.0 2.2 23.8 54.2 19.2 0.6
Connecticut 269.9 1.0 3.2 25.3 50.7 20.1 0.7
New Jersey 269.7 1.1 2.4 26.9 50.2 19.7 0.8
Colorado (NAEP) 267.4 0.9 2.8 26.5 54.7 15.7 0.4
SLOVENIA 267.3 1.3 1.6 25.7 60.2 12.2 0.4
Indiana 267.3 1.2 2.0 28.2 53.9 15.4 0.5
Pennsylvania 266.4 1.6 3.2 27.5 53.0 15.8 0.5
Michigan 264.4 1.2 3.1 30.1 51.7 14.5 0.6
Virginia 264.3 1.5 3.3 32.8 47.3 15.4 1.3
Colorado (IAEP) 264.2 0.7 3.1 28.8 55.4 12.4 0.4
Ohio 264.0 1.0 3.1 30.5 52.4 13.8 0.3
Oklahoma 263.2 1.3 2.8 30.8 53.8 12.5 0.2
SPAIN 261.9 1.3 2.1 29.0 62.0 6.9 0.0
UNITED STATES (IAEP) 261.8 2.0 5.0 30.6 52.0 11.5 0.9
United States (NAEP) 261.8 1.4 5.0 31.5 49.0 14.0 0.5
New York 260.8 1.4 5.9 31.4 48.0 13.9 0.8
Maryland 260.8 1.4 5.7 33.1 45.3 15.3 0.6
Delaware 260.7 0.9 4.6 34.2 47.6 13.0 0.6
Illinois 260.6 1.7 5.7 31.4 49.1 13.4 0.5
Rhode Island 260.0 0.6 5.0 34.0 47.3 13.5 0.3
Arizona 259.6 1.3 4.5 33.8 49.7 11.7 0.4
Georgia 258.9 1.3 5.3 35.2 46.5 12.5 0.6
Texas 258.2 1.4 4.8 36.4 46.7 11.7 0.4
Kentucky 257.1 1.2 3.9 38.2 47.9 9.8 0.2
New Mexico 256.4 0.7 4.3 38.2 47.7 9.6 0.3
California 256.3 1.3 6.9 35.9 45.2 11.5 0.4
Arkansas 256.2 0.9 4.6 37.3 49.4 8.6 0.1
West Virginia 255.9 1.0 4.3 38.7 48.4 8.5 0.2
Florida 255.3 1.3 6.6 37.7 44.3 11.2 0.2
Alabama 252.9 1.1 6.2 40.5 44.8 8.3 0.3
Hawaii 251.0 0.8 9.9 39.2 39.8 10.6 0.5
North Carolina 250.4 1.1 7.9 41.2 42.6 8.1 0.0
Louisiana 246.4 1.2 8.2 46.1 40.6 4.9 0.2
JORDAN 236.1 1.9 16.0 48.3 32.6 3.1 0.0
District of Columbia 231.4 0.9 16.7 56.9 23.6 2.5 0.3
-----------------------------------------------------------------------------------------
NOTE: Countries and states are sorted from high to low based on their mean proficiency scores. Colorado participated both in the NAEP Trial State Assessment and, separately, in the International Assessment of Educational Progress.
SOURCE: Albert E. Beaton and Eugenio J. Gonzalez, "Comparing the NAEP Trial State Assessment Results with the IAEP International Results," in Setting Performance Standards for Student Achievement: Background Studies (Stanford, CA: National Academy of Education, 1993).
Footnotes

(4) For the NAEP and IAEP IRT scales, conventional individual scale scores are not generated. Instead, the scaling process generates a set of five "plausible values" for each student. The five plausible values reported for each student can be viewed as draws from a distribution of potential scale scores consistent with the student's observed responses on the test and the student's measured background characteristics. Across students, the plausible values are constructed to have a mean and variance consistent with the underlying true population values; in this sense, they correct for unreliability. See Mislevy, Beaton, Kaplan, and Sheehan (1992).
(5) The actual procedure used by Pashley and Phillips was somewhat more complex than the method described in the text. Five regressions were estimated, one for each pair of IAEP and NAEP plausible values (see the previous footnote). Given the sample sizes involved, the regression parameters produced by the five regressions differ only marginally.
(6) The regression parameters shown in the table are based on an approximate analysis using the reported correlation between the IAEP and NAEP total mathematics scores (r = .825), together with the means and standard deviations of the IAEP and NAEP in the linking sample, averaged across the five sets of plausible values. The results obtained by averaging in this way differ only slightly from those of the method used by Pashley and Phillips, which was based on separate regressions for each of the five plausible-value pairs. See the previous two footnotes.
(7) In the method as implemented by Pashley and Phillips, each of the five regression equations was used to obtain predicted NAEP scores at the individual level, and the results were averaged to produce country means. The results are very similar to those obtained using the somewhat simpler method discussed in the text.
(8) Like Pashley and Phillips, Beaton and Gonzalez carried out their procedure separately for each of the five sets of plausible values and then averaged the results obtained for each set. The results differ only slightly when their procedure is carried out once using published estimates of means and standard deviations.
(9) The 1990 NAEP mathematics results were rescaled in 1992, producing slightly different scale scores. Beaton and Gonzalez used the 1992 rescaling.
(10) The simple regression coefficient required for the projection method can be expressed as $r\,s_y/s_x$, where $r$ is the correlation between the IAEP and the NAEP, $s_y$ is the standard deviation of the NAEP, and $s_x$ is the standard deviation of the IAEP. The conversion coefficient required for the moderation method is simply $s_y/s_x$.
(11) Given the required data, it is possible to develop moderation estimates similar to those developed by Beaton and Gonzalez for several different samples. But because the Pashley and Phillips projection method requires paired IAEP and NAEP data, the linking sample is the only data set in which it can currently be applied.
(12) As discussed in footnotes 4-8 above, Beaton and Gonzalez based their estimates on the full set of individual-level plausible values for each country. We developed the estimates in Tables S21 and S22 using only the reported country means and standard deviations computed from the plausible values. These results differ only slightly from those that would be obtained using the full set of plausible values.
(13) The interpretation of the predicted NAEP scores based on the moderation method is complicated by the fact that the IAEP sample used to develop the conversion constants included students in both public and private schools, while the NAEP sample included only public school students. Since the NAEP results for the full sample of eighth graders, including both public and private school students, differ only modestly from the results for the public-only sample, this problem probably accounts for relatively little of the difference in predicted outcomes between the projection and moderation approaches.
(14) The plausible values generated for the IAEP and NAEP are designed to reflect the true population mean and variance, but correlations among plausible values are attenuated due to unreliability.
(15) Since the IAEP and NAEP plausible values are designed to produce unbiased estimates of population variance, moderation methods that make use of the plausible values should not be sensitive to measurement error.
(16) To obtain valid NAEP scores in countries outside the United States, language and other issues would of course need to be taken into account.
Using the Beaton-Gonzalez cross-link study above together with published NAEP mathematics scores for other groups of students and schools produces the following more complete table:

-----------------------------------------------------------------------------------------
COUNTRY/State Mean SE | <200 200-250 250-300 300-350 >350
-----------------------------------------------------------------------------------------
Asians Maryland 306
Whites Washington, DC 303
Texas religious 301
Washington religious 299
TAIWAN 296.7 1.5 3.2 13.4 33.9 36.6 12.9
North Dakota religious 296
KOREA 294.1 1.3 1.9 10.3 41.8 39.3 6.7
SOVIET UNION 287.6 1.5 0.8 10.4 53.1 34.0 1.7
SWITZERLAND 287.5 1.9 0.2 8.8 57.9 32.2 0.9
Montana Whites 287
HUNGARY 284.8 1.4 1.4 13.5 52.6 29.9 2.7
Whites in DOD schools 284
California religious 284
North Dakota 281.1 1.2 0.8 13.2 60.0 24.8 1.3
Whites national 281
Montana 280.5 0.9 0.5 14.3 59.5 24.9 0.8
Virginia Whites 279
FRANCE 278.1 1.3 1.4 16.8 57.5 23.4 1.0
Iowa 278.0 1.1 0.6 18.3 57.0 23.3 0.7
ISRAEL 276.8 1.3 1.5 15.6 61.6 20.7 0.6
ITALY 276.3 1.4 1.6 18.1 57.7 22.0 0.5
Nebraska 275.7 1.0 2.0 18.6 56.2 22.4 0.9
Minnesota 275.4 0.9 1.6 19.2 57.0 21.2 1.1
Wisconsin 274.5 1.3 1.5 20.8 55.4 21.6 0.7
CANADA 274.0 1.0 1.4 17.6 63.7 16.7 0.7
New Hampshire 273.2 0.9 1.4 21.2 58.1 18.9 0.5
SCOTLAND 272.4 1.5 1.6 20.6 59.7 17.7 0.4
Wyoming 272.2 0.7 1.1 20.9 60.3 17.4 0.2
Idaho 271.5 0.8 1.2 22.1 59.7 16.8 0.2
IRELAND 271.4 1.4 3.1 21.0 57.1 18.0 0.8
Oregon 271.4 1.0 2.2 23.8 54.2 19.2 0.6
Connecticut 269.9 1.0 3.2 25.3 50.7 20.1 0.7
New Jersey 269.7 1.1 2.4 26.9 50.2 19.7 0.8
Colorado (NAEP) 267.4 0.9 2.8 26.5 54.7 15.7 0.4
SLOVENIA 267.3 1.3 1.6 25.7 60.2 12.2 0.4
Indiana 267.3 1.2 2.0 28.2 53.9 15.4 0.5
Pennsylvania 266.4 1.6 3.2 27.5 53.0 15.8 0.5
Michigan 264.4 1.2 3.1 30.1 51.7 14.5 0.6
Virginia 264.3 1.5 3.3 32.8 47.3 15.4 1.3
Colorado (IAEP) 264.2 0.7 3.1 28.8 55.4 12.4 0.4
Ohio 264.0 1.0 3.1 30.5 52.4 13.8 0.3
Oklahoma 263.2 1.3 2.8 30.8 53.8 12.5 0.2
SPAIN 261.9 1.3 2.1 29.0 62.0 6.9 0.0
UNITED STATES (IAEP) 261.8 2.0 5.0 30.6 52.0 11.5 0.9
United States (NAEP) 261.8 1.4 5.0 31.5 49.0 14.0 0.5
New York 260.8 1.4 5.9 31.4 48.0 13.9 0.8
Maryland 260.8 1.4 5.7 33.1 45.3 15.3 0.6
Delaware 260.7 0.9 4.6 34.2 47.6 13.0 0.6
Illinois 260.6 1.7 5.7 31.4 49.1 13.4 0.5
Rhode Island 260.0 0.6 5.0 34.0 47.3 13.5 0.3
Arizona 259.6 1.3 4.5 33.8 49.7 11.7 0.4
Georgia 258.9 1.3 5.3 35.2 46.5 12.5 0.6
Texas 258.2 1.4 4.8 36.4 46.7 11.7 0.4
Kentucky 257.1 1.2 3.9 38.2 47.9 9.8 0.2
New Mexico 256.4 0.7 4.3 38.2 47.7 9.6 0.3
California 256.3 1.3 6.9 35.9 45.2 11.5 0.4
Arkansas 256.2 0.9 4.6 37.3 49.4 8.6 0.1
West Virginia 255.9 1.0 4.3 38.7 48.4 8.5 0.2
Florida