At the direction of the Institute of Education Sciences (IES), we have developed recommendations for middle grades (5-8) math tests. The goal of these recommendations is to provide researchers, including potential IES grantees, with a robust list of high-quality assessment tools. The recommended instruments can be administered in both research and educational evaluation settings, and we highlight them to encourage the use of common measures. This memo not only highlights specific middle grades math tests but also provides a transparent summary of the steps taken to curate our full list of recommended assessments.
Collection and Coding Process
Middle grades math tests were collected by reviewing publisher websites, soliciting expert recommendations, and conducting ERIC searches (i.e., reviewing the assessment/survey filters in ERIC). The following protocol was used during our collection process:
- We included only formal assessments that could be administered by a researcher or in collaboration with schools/districts for the purpose of a large-scale program evaluation.
- We did not focus on tests designed solely to be used by teachers or clinicians for individual diagnostics.
- We also did not include state summative tests (e.g., individual state tests, PARCC, Smarter Balanced, or ACT Aspire). Iowa Assessments are an exception given the history of use of the older Iowa Tests of Basic Skills and the ability to administer these assessments specifically for research.
- We did not include assessments that are soon to be discontinued (e.g., easyCBM). Where multiple editions of a test were available, we included the newest edition.
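For readers who track candidate instruments in a script rather than by hand, the protocol above can be sketched as a simple screening filter. This is an illustrative sketch only: the `Candidate` fields and the `meets_inclusion_criteria` function are hypothetical names, not part of our coding spreadsheet, and the exception for the Iowa Assessments is modeled here as an explicit flag.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate test. All field names are hypothetical."""
    name: str
    formal_assessment: bool = True            # usable by researchers or with schools/districts
    individual_diagnostic_only: bool = False  # designed solely for teacher/clinician diagnostics
    state_summative: bool = False             # e.g., individual state tests, PARCC
    documented_exception: bool = False        # e.g., Iowa Assessments (research administration)
    soon_discontinued: bool = False           # e.g., easyCBM
    newest_edition: bool = True               # only the newest edition is kept


def meets_inclusion_criteria(c: Candidate) -> bool:
    """Apply the four screening bullets in order."""
    if not c.formal_assessment or c.individual_diagnostic_only:
        return False
    # State summative tests are excluded unless an exception is documented.
    if c.state_summative and not c.documented_exception:
        return False
    if c.soon_discontinued or not c.newest_edition:
        return False
    return True


# Flags below follow the examples named in the protocol.
candidates = [
    Candidate("NWEA MAP Growth"),
    Candidate("PARCC", state_summative=True),
    Candidate("Iowa Assessments", state_summative=True, documented_exception=True),
    Candidate("easyCBM", soon_discontinued=True),
]

included = [c.name for c in candidates if meets_inclusion_criteria(c)]
print(included)
```

In this sketch the screening decisions remain judgment calls made by the coder; the code only records them and applies them consistently.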
The entire coding process was undertaken by one researcher on our team for purposes of consistency. Table 1 describes each variable in the attached spreadsheet, including further information on the process.
While potential grantees should consider the utility of each assessment for their specific research design and purpose, we recommend special consideration of the NWEA MAP Growth. This assessment is of high technical quality, commonly used in research, and well known in practice. The newer Iowa Assessments (formerly the Iowa Tests of Basic Skills) should also be strongly considered for the same reasons. If individually administered (one-on-one) assessments are appropriate for the research design and sufficient resources are available, additional high-quality options include the Woodcock-Johnson IV and the WIAT-4. The SAT10 and GMADE are older tests (predating the CCSS and instead aligned to the NCTM 2000 standards), but they are of high quality and have also been used in program evaluation research. While the math portions of aimswebPlus have not been used widely in program evaluation research, the older aimsweb tests were, and researchers may be interested in these brief assessments. Although the WRAT5 focuses mainly on computation, it is a fairly brief math assessment (~25 minutes), and the older WRAT assessments were common in research, so it could also be considered. Finally, the TOMA-3 offers some distinctive features, such as word problems, mathematics in everyday life, and measures of mathematics attitudes, that researchers may find desirable.
Table 1. Spreadsheet Variables and Variable Descriptions
|Variable|Description|
|---|---|
|External Link|Link to the publisher website, where the instrument can be purchased. Along with the technical reference (below), information used for coding was found here.|
|Grade Range|Range of grades for which the test is appropriate, based on information from the publisher|
|Middle Grades Content|Test content for Grades 5-8, as reported by the publisher|
|Assessment Type|General information about the test|
|Middle Grades Reliability|Reliability reported for Grades 5-8|
|Validity evidence?|Is there evidence of criterion and/or construct validity? Yes/No|
|Evidence of fairness?|Were issues of fairness/bias taken into consideration (e.g., differential item functioning or bias reviewers)? Yes/No|
|Nationally representative norms?|Yes/No|
|Technical Reference|Reference used for coding instruments, in addition to the External Link|
|Used in program evaluation research?|Describes the extent to which the test has been used in research studying the effects of a policy/program/intervention. To locate published empirical studies, we first scanned the publisher website. If none were available there, we searched for the test name in ERIC and Google Scholar.|
|Time|Reported administration time or number of items|
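Researchers who load the attached spreadsheet programmatically may want to check each row against the variables listed above. The following is a minimal sketch under stated assumptions: the column names are taken from the table, the actual spreadsheet may contain additional columns, and `validate_row` and the example rows are hypothetical illustrations with placeholder values rather than real coded data.

```python
# Column names as listed in the table above.
COLUMNS = [
    "External Link",
    "Grade Range",
    "Middle Grades Content",
    "Assessment Type",
    "Middle Grades Reliability",
    "Validity evidence?",
    "Evidence of fairness?",
    "Nationally representative norms?",
    "Technical Reference",
    "Used in program evaluation research?",
    "Time",
]

# Variables coded as Yes/No in the table.
YES_NO_COLUMNS = [
    "Validity evidence?",
    "Evidence of fairness?",
    "Nationally representative norms?",
]


def validate_row(row: dict) -> list:
    """Return a list of problems found in one spreadsheet row (hypothetical helper)."""
    problems = [f"missing column: {col}" for col in COLUMNS if col not in row]
    for col in YES_NO_COLUMNS:
        value = row.get(col)
        if value is not None and value not in ("Yes", "No"):
            problems.append(f"{col} must be Yes or No, got {value!r}")
    return problems


# Placeholder rows for illustration only; these are not real coded values.
good_row = {col: "placeholder" for col in COLUMNS}
for col in YES_NO_COLUMNS:
    good_row[col] = "Yes"

bad_row = dict(good_row)
bad_row["Evidence of fairness?"] = "Maybe"

print(validate_row(good_row))
print(validate_row(bad_row))
```

A check like this only confirms that a row is structurally complete; it does not assess whether the coded values themselves are accurate.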