Abstract: This study was conducted to develop a math proficiency test using Item Response Theory (IRT), based on an adapted NAEP Mathematics Framework and aligned with the newly developed standards and benchmarks of the National Mathematics Curriculum. Mathematical proficiency is the ability to use mathematical power for conceptual understanding, procedural knowledge, and problem-solving in the real world using appropriate strategies. One hundred and ninety-six items were developed. Owing to the limitations of the study, only dichotomously scored multiple-choice and short constructed-response items were developed. Items were spot tested on 200 students and piloted on 550 students, and a final 60-item math proficiency test was constructed. The population of the study comprised all 9th grade students in public high and higher secondary schools in the province of Punjab. A stratified cluster sampling technique was used to administer the final test: a sample of 2680 students in 134 schools was selected using sample design tables and the requirements of IRT-based analysis. The final math proficiency test was administered, and data were received from 2617 students. Data were analyzed using SPSS, ConQuest, and MULTILOG. The data fitted the Rasch model well. The infit and outfit mean square statistics of the items were within 0.80 to 1.30. The reliability of the final math proficiency test was above 0.90, and its discrimination index was above 0.45. The IRT-based person-item map showed that the test items covered an ability range of ±3. Test characteristic and item characteristic curves, factor analysis, and dimensionality analysis also supported the reliability and validity of the test. Different ability estimates (WLE, EAP, and MLE) also supported the reliability and validity of the test items. Each of these estimates has advantages and disadvantages relative to the others, so for this study their average was used as the proficiency score for each student. The results indicate that the math proficiency test is appropriately constructed. Replication of this study using the 2PL and 3PL models of IRT is recommended.
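The quantities named in the abstract can be illustrated with a minimal sketch. It is not the procedure used in the study, which relied on ConQuest and MULTILOG. It assumes that person abilities and item difficulties have already been estimated, that the infit and outfit mean squares follow their standard Rasch definitions, and that the "average" of the WLE, EAP, and MLE estimates is an unweighted arithmetic mean. All function names are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one dichotomous item.

    responses: 0/1 scores, one per person
    thetas:    person ability estimates (assumed already estimated)
    b:         item difficulty (assumed already estimated)
    """
    sq_resid = 0.0  # sum of squared raw residuals
    info = 0.0      # sum of model variances p * (1 - p)
    z_sq = 0.0      # sum of squared standardized residuals
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        var = p * (1.0 - p)
        sq_resid += (x - p) ** 2
        info += var
        z_sq += (x - p) ** 2 / var
    infit = sq_resid / info          # information-weighted mean square
    outfit = z_sq / len(responses)   # unweighted mean square
    return infit, outfit


def within_fit_range(value, low=0.8, high=1.3):
    """Check a mean square against the 0.80 to 1.30 range reported above."""
    return low <= value <= high


def proficiency_score(wle, eap, mle):
    """Unweighted mean of the three ability estimates (an assumption)."""
    return (wle + eap + mle) / 3.0
```

Under these definitions, an item whose observed responses vary exactly as much as the model predicts has infit and outfit values of 1.0. Values above 1.0 indicate more noise than the model expects, and values below 1.0 indicate less.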
