Article about Test-Taking Reactions

*Note: I may have to change the micro-study for class to focus on students’ reactions to the test rather than on the test itself because the IRB process is prohibitive. I will need to email you or Skype with you later this week.*

 

Bradshaw, Jenny. “Test-takers’ Reactions to a Placement Test.” Language Testing 7.1 (1990): 13-30. Print.

Students completed a questionnaire half an hour after taking a placement exam. The students were divided into groups based on their first language. The questionnaire used 7-point Likert scales tailored to each question type.

The purpose of the study was not to test particular hypotheses but rather to identify possible areas for future research. The study asked whether differences in the amount of test time or in the instructions given would influence students’ perceptions of the test, and control groups were created for comparison.

There were four groups of students. The first two groups spoke the same first language and were given the test and questionnaire under similar circumstances as a check on the questionnaire’s reliability; once the results confirmed the reliability of the instrument, these groups were combined. The third group spoke a different first language. The fourth group spoke the same first language as the first two but received extended time and further instructions.

The researchers predicted that students who scored higher on the test would respond more favorably to it than students who scored lower. The students who scored lower “rated the tests significantly lower for time available, clarity of instructions, nervousness and difficulty on all three parts” (21).

Many charts are included that report the mean score for each item, along with separate columns indicating whether that mean reflects largely neutral responses or a near-even split of negative and positive responses.
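To make that distinction concrete, here is a minimal sketch (my own illustration with invented numbers, not the article’s data or method) of how two sets of 7-point Likert responses can share the same mean while one set is mostly neutral and the other is polarized:

```python
from statistics import mean, stdev

# Two hypothetical sets of 7-point Likert ratings with the same mean (4.0).
# The first is mostly neutral; the second is a split of opposite extremes.
neutral_responses = [4, 4, 3, 4, 5, 4, 4, 4, 3, 5]
polarized_responses = [1, 7, 2, 6, 1, 7, 1, 7, 2, 6]

for label, responses in (("neutral", neutral_responses),
                         ("polarized", polarized_responses)):
    at_midpoint = sum(r == 4 for r in responses) / len(responses)
    print(f"{label}: mean={mean(responses):.1f}, "
          f"sd={stdev(responses):.1f}, "
          f"share at midpoint={at_midpoint:.0%}")
```

Both sets average 4.0, but the spread and the share of midpoint answers separate the genuinely neutral item from the polarizing one, which is exactly what the extra columns in the charts convey.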

The analysis of the positive components of the test finds that “perceptions of difficulty appeared to have more connections with reactions on other dimensions than did test score,” particularly on the third part of the test (22). A chart of correlation coefficients is included as evidence for this finding.
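As a rough illustration of what such a correlation table captures (again with my own invented numbers, not the study’s data), one could pair each student’s difficulty rating with another reaction dimension and with the test score, then compare the Pearson coefficients:

```python
from statistics import correlation  # Pearson's r; requires Python 3.10+

# Hypothetical per-student values, for illustration only: a 7-point
# difficulty rating, a 7-point nervousness rating, and a test score.
difficulty = [6, 5, 7, 3, 4, 6, 2, 5, 6, 3]
nervousness = [5, 5, 6, 2, 4, 6, 2, 4, 6, 3]
score = [70, 55, 62, 75, 68, 58, 60, 72, 66, 64]

# In this invented data, difficulty correlates strongly with the other
# reaction dimension while the test score correlates only weakly with it,
# mirroring the kind of pattern the chart documents.
print(correlation(difficulty, nervousness))
print(correlation(score, nervousness))
```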

While there were differences between students who scored at either end of the spectrum, there were no significant differences by gender, first language, or nervousness. Providing more time did not seem to have an impact.

A different set of questionnaires was given to the teachers; their perceptions of test difficulty differed from the students’ perceptions.

REVIEW:

In particular, I liked that the study included a reliability check of the instrument being used; the researcher cited criteria by Nevo as her guide for this. I also appreciate the inclusion of data showing whether each mean reflected largely neutral responses or a wash of negative and positive responses.

Little discussion is given to why perceptions of difficulty seem to matter more than actual test scores in shaping student reactions, especially in light of the more negative perceptions reported by the students who scored lowest on the test.

 

 

Summary/Review of CCRC’s Working Paper (Empirical)

The main findings are that the COMPASS test is better at predicting student success in math than in English. In both subjects, COMPASS does relatively well at predicting which students will perform well in a course; it is much less successful at identifying which students will perform at a C level (16). For these reasons and others, the paper recommends that colleges use multiple measures or create alternative developmental paths to success.

Scott-Clayton says, “The predictive power of placement exams is in a sense quite impressive given how short they are (often taking about 20-30 minutes per subject/module). But overall the correlation between scores and later course outcomes is relatively weak” (37).

When the data are examined to see what would happen if all students were placed directly into college-level coursework, the English placement tests “increase the success rates in college-level coursework […] by 11 percentage points” (27); however, “[…] these tests generate virtually no reduction in the overall severe error rate (in other words, while the placement tests reduce severe overplacements, they increase underplacements by the same amount)” (27).
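The arithmetic behind that trade-off is easier to see with a toy example. The numbers below and the simplified error definitions are my own assumptions, not Scott-Clayton’s data or exact definitions: a cutoff can raise the success rate among students placed into the college-level course while leaving the overall severe error rate flat, because the overplacements it removes are matched by the underplacements it creates.

```python
# Toy data: (placement test score, would the student succeed in the
# college-level course if allowed to take it?).
students = [(20, False), (25, False), (30, True), (35, True),
            (40, False), (45, True), (55, True), (60, True),
            (65, False), (70, True), (75, True), (80, True)]

def evaluate(cutoff):
    college = [ok for score, ok in students if score >= cutoff]
    remedial = [ok for score, ok in students if score < cutoff]
    success_rate = sum(college) / len(college) if college else 0.0
    overplaced = sum(not ok for ok in college)   # placed up, would fail
    underplaced = sum(remedial)                  # placed down, would succeed
    severe_error_rate = (overplaced + underplaced) / len(students)
    return success_rate, severe_error_rate

print(evaluate(cutoff=0))   # place everyone up: success ~0.67, severe error ~0.33
print(evaluate(cutoff=50))  # use the test: success rises to ~0.83, severe error still ~0.33
```

With the cutoff, three overplacements disappear but three underplacements appear, so the success rate in the college-level course improves while the severe error rate does not move.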

REVIEW:

One immediate concern I had when reading the data analysis is that withdrawals are treated the same as failing grades in Scott-Clayton’s analysis (14). Part of the rationale given for counting withdrawals as failures is “because withdrawal decisions are not likely to be random, but rather may be closely linked to expectations regarding course performance” (14). This is the only rationale provided, and it is not enough. If Scott-Clayton assumed students only withdraw when they think they will fail anyway, that discounts the many students who withdraw each semester for personal reasons that have nothing to do with the course and everything to do with what is happening outside the classroom.

In the predictive validity section, Scott-Clayton quotes Sawyer as saying that the conditional probability of success can be estimated with reasonable accuracy when 25% or fewer students are placed in developmental courses (8), but students are placed into developmental courses at much higher rates than that. Her solution was to eliminate the very low scorers in the sensitivity analysis. I have the benefit of hindsight here that the researcher did not: in 2015, three years after this publication, ACT Inc. admitted that “A thorough analysis of customer feedback, empirical evidence and postsecondary trends led us to conclude that ACT Compass is not contributing as effectively to student placement and success as it had in the past” before voluntarily phasing out the test. Removing the very low scorers probably could not account for the large percentage of students who were being placed in developmental courses. Besides, the lowest levels of developmental courses exist to serve low-performing students. What concerns me most is how many borderline students have been under-placed into a course that delays their timeline into credit-bearing courses; that did not seem to be the priority of this analysis.

“ACT Compass.” ACT. ACT Inc., n.d. Web. 10 Feb. 2016.

Scott-Clayton, Judith. “Do High-Stakes Placement Exams Predict College Success?” Community College Research Center. Columbia University, Feb. 2012. Web. 2 Feb. 2016.

 

Summary and Review of the TYCA White Paper on Placement Reform

My project this semester is to test the validity of a Rhetorical Analysis Diagnostic Exam (RADE) that my department has created. This test is meant to supplement or replace the English portion of Accuplacer as a placement exam for the writing courses at my institution. My WPD gave me the task of reviewing TYCA’s White Paper before revising RADE. Over the next two weeks, I will revise the test a colleague created and reformat it within Blackboard.

 

The TYCA White Paper is meant to guide decisions about placement at two-year colleges given the current upheaval created by the loss of COMPASS. COMPASS was an inexpensive option for many schools, and its discontinuation gives two-year colleges “an opportunity and a challenge: how to replace an easy-to-use and relatively cheap placement process which has been shown to be severely flawed with a practical and affordable process that is supported by current research” (2).

 

In the “business as usual” section, the committee explains the problems inherent in replacing one flawed high-stakes test with another. Such assessments are problematic for many reasons, not least of which is weak validity. Additionally, the practice of using a high-stakes exam further divides institutional practices from the professional expertise of the faculty.

 

TYCA has long recognized that the most effective way to evaluate students’ writing ability is to assess their writing, but there are problems with implementing this type of placement at a two-year college. To be most effective, the writing sample should not be a single piece of writing, and it should be “situated within the context of the institution” (7). It should also be assessed by faculty who teach the courses into which students will be placed. Because this process is both time-consuming and costly, most two-year colleges will not be able to implement it.

 

The committee recommends basing placement on multiple measures rather than on one high-stakes test or a stand-alone writing sample. Possible measures include high school GPA or transcript, the Learning and Study Strategies Inventory (LASSI), an interview, a writing sample, previous college coursework, and/or a portfolio (8-9). The committee supports the use of Directed Self-Placement (DSP) but recognizes that it might not be feasible for many institutions. Other options are in-class diagnostic writing samples with the opportunity to move into credit-bearing courses, or acceleration models that allow students to take a credit-bearing course alongside a Basic Writing course and progress to 101 on the merit of the credit-bearing course’s grade.

 

If a stand-alone test is going to be used, special attention must be paid to ensuring it is fair and non-discriminatory toward students of differing backgrounds, age ranges, etc. Among the committee’s recommendations is that all reforms should “be grounded in disciplinary knowledge” and “be assessed and validated locally” (21).

 

The TYCA White Paper will serve as an invaluable resource as my colleagues and I continue to argue for the use of RADE as an alternative to Accuplacer. Many of the reforms mentioned in the white paper are not possible at my institution, as the administration has already decided to use Accuplacer and will not pay for additional tests to be administered. Additionally, much of our enrollment comes from students who expect to register for classes the day they enroll in the college. For that reason, multiple measures (and DSP, which relies on multiple measures) will not be possible without endangering enrollment procedures, something the college is understandably loath to do. My personal takeaway from the white paper is the importance of making sure our diagnostic test does not unfairly privilege any demographic group over others.

TYCA Research Committee. “TYCA White Paper on Writing Placement Reform.” Teaching English in the Two-Year College, pending, Sept. 2016.