Pu Ying Huang
Associate education professor Walter Stroup supervised earlier studies on the subject by UT students before making this discovery.
As students and educators across the state attempt to raise standardized test scores, UT associate education professor Walter Stroup has found that the validity of standardized testing is questionable.
Along with two other researchers, Stroup has found an issue with the real-world application of Item Response Theory (IRT), the method used to develop the majority of standardized tests across the country. A 2009 study of the Texas Assessment of Knowledge and Skills (TAKS) exam by UT Ph.D. student Vinh Huy Pham, supervised by Stroup, showed that standardized testing has become more about a student's ability to interpret tricky language on an exam than about how much a student actually knows about the relevant material. Stroup has recently worked to bring attention to this research by speaking at a June meeting of the Texas House Public Education Committee and giving interviews to multiple mainstream publications.
He said that because other standardized exams are developed in the same way, they could easily have similar issues. In a country where testing is heavily used to hold both students and educators accountable, Stroup said that if the tests aren't accurately measuring a student's knowledge of a subject, holding people responsible for the results is unfair and a waste of taxpayers' money.
Test distributor Pearson currently has a five-year, $468 million contract to create Texas’ standardized exams through 2015 and uses the real-world IRT method in developing them.
Stroup said the real-world IRT method uses benchmarks to determine how well students should be doing on the standardized tests and develops the tests accordingly. In the process of trying to develop an exam at a specific difficulty level, he said vendors will often choose a question because its wording is more complex and not because it actually covers more difficult material.
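The difficulty benchmarks Stroup describes come from statistical models of how students respond to individual questions. A minimal sketch of the simplest such model, the one-parameter (Rasch) IRT model, is below; the function and parameter names are illustrative, not drawn from any vendor's actual software.

```python
import math

def rasch_p_correct(theta, b):
    """Probability that a student with ability `theta` answers an item
    of difficulty `b` correctly, under the one-parameter (Rasch) IRT
    model. Names and values here are illustrative only."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# For the same student, a harder item (larger b) is less likely
# to be answered correctly.
p_easy = rasch_p_correct(theta=0.0, b=-1.0)
p_hard = rasch_p_correct(theta=0.0, b=1.0)
```

Test assemblers can then pick items whose difficulty parameters, in aggregate, hit a desired overall benchmark for the exam.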
Stroup's work began when he was piloting a program for middle school math students in Richardson, Texas. He noticed that the students' improvement in the classroom and on local exams wasn't reflected in their minimal gains on the TAKS exam.
“It didn’t add up,” he said. “Their scores on the TAKS exam didn’t correlate with how much they had learned throughout the year.”
The study shows a 70/30 split in what the test measures, and the gap is growing: 70 percent of a student's TAKS score reflects the ability to "think like a gamer," or to read deceptive language. The remaining 30 percent reflects other factors, such as classroom instruction and class scheduling, the factors that "should show up on these tests," Stroup said.
In the 2009 analytical study, Stroup's team analyzed the TAKS scores of nearly 100,000 students. The results showed that all four tested sections, English, math, science and social studies, had very similar score distributions. This indicated that Pearson, the exam's vendor, was aiming for a consistent distribution of test scores in each area, which Stroup said is a major issue with the test creation method.
Another major issue occurs with question selection, which Stroup said is flawed.
“For example, the test vendor may take 20 questions on slope and choose one because the right percentage of students get it correct,” Stroup said.
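The selection step Stroup describes can be sketched as choosing, from a pool of field-tested questions on the same skill, the one whose observed percent-correct lands closest to a target. This is a hypothetical illustration of the practice he describes, not actual vendor code; all question IDs and numbers are invented.

```python
def pick_item(candidates, target_p_correct):
    """From field-test results, pick the question whose observed
    proportion-correct is closest to the target difficulty.
    `candidates` maps a question ID to the fraction of pilot students
    who answered it correctly. Purely illustrative."""
    return min(candidates, key=lambda q: abs(candidates[q] - target_p_correct))

# Twenty slope questions might all cover the same skill, but only the
# one hitting the desired difficulty (say, 55 percent correct) survives.
slope_items = {"slope_q1": 0.92, "slope_q2": 0.57, "slope_q3": 0.31}
chosen = pick_item(slope_items, target_p_correct=0.55)
```

The point of the sketch is that, under this procedure, the choice is driven by the statistics of who got the question right, not by which question best covers the underlying skill.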
Stroup said the skills necessary to do well on these exams are somewhat unknown but that whatever they are, these skills are not what instructors should be teaching students.
"The test-taking profile is an ability to read tricky language, to think strategically about which one of these four letters is right. I don't think society should be optimizing education for that," he said.
UT alumnus Steven Maddox has worked in education for the last 30 years and is currently an assistant principal at Austin High School. Maddox said he has seen classroom education altered to conform to the tests being given, something that takes away from classroom instruction and is futile if the tests aren't emphasizing the right skills.
“We tend to have to narrow our focus on what is going on in the classroom, forcing us to not be able to branch out when opportunities arise,” he said. “Those branch-out opportunities many times are great learning opportunities, but they aren’t specifically the objectives that the teacher has to focus on. There is a lot of pressure on all of us to make sure that the kids are successful on those tests.”
Maddox said the exams often put great pressure on excellent teachers who seem to be teaching classroom material well and can't figure out why their students aren't doing better on such exams.
He said the result is that teachers make their instruction more like what appears on the exams, taking classroom time simply to show students how to break down tricky exam questions so they can score better on what the exams are actually testing.
Stroup said he feels test vendors have lost some credibility by producing such a flawed testing method.
“There are many people that have the ability to properly create standardized exams,” he said.
Stroup said he plans to address the Texas House Public Education Committee again in June 2013 and will seek publication for his research as well.
DeEtta Culbertson, a spokesperson for the Texas Education Agency, said her office is still reviewing Stroup's research and has no final opinion on it at this time.