Time to stop adjusting grades/grade boundaries?

If using an algorithm to adjust marks is unfair, as it has been deemed to be this year, then surely this practice must cease going forward.

The last few weeks have been filled with issues surrounding exam results. One of these was how the A-Level results were adjusted from centre assessed grades using a statistical algorithm. This was deemed to be unfair as it penalised some students, or groups of students, more than others. The lack of equity was clearly evident because schools could compare their centre assessed grades with the finally awarded grades. It was therefore evident how the statistical adjustment, carried out in the interests of keeping results broadly in line with previous years’ results, impacted individual students. The faces and lives of individual students could be attached to the grade adjustments. This was deemed unacceptable.

My worry here is that this statistical adjustment has always gone on. Normally students would sit exams, with their resulting scores undergoing adjustment in the form of changes to the grade boundaries. Again, this was done in the interests of keeping results broadly in line with previous years’ results, and again some groups of students would likely be penalised more than others. The grade boundaries changed because the exam was deemed generally easier or harder. The focus on the difficulty of the exam meant that we seldom associated the resulting grade changes with individual students; we don’t generally attach faces to this change, yet some students would have received lower grades than they would have had the adjustment not been carried out, just as happened this year. This seemed acceptable, and has been the way things have been done for decades, but I don’t see how it is any fairer than what happened this year.
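To make the mechanism concrete, here is a minimal sketch in Python of how raising a grade boundary re-grades the very same raw marks. This is purely my own illustration: the boundary values and marks are invented, and it is not any exam board’s actual procedure.

```python
# Minimal sketch (not any exam board's actual method): how shifting a
# grade boundary re-grades the same raw marks for individual students.

def grade(mark, boundaries):
    """Return the grade whose minimum mark is met; boundaries map
    grade -> minimum mark and are checked from highest to lowest."""
    for g, minimum in sorted(boundaries.items(), key=lambda kv: -kv[1]):
        if mark >= minimum:
            return g
    return "U"

original = {"A": 80, "B": 70, "C": 60}
adjusted = {"A": 83, "B": 72, "C": 60}   # boundaries raised after the paper is judged "easier"

for mark in [84, 81, 71, 65]:
    print(mark, grade(mark, original), "->", grade(mark, adjusted))
# 81 and 71 drop a grade under the adjusted boundaries, while 84 and 65 are
# unaffected: the adjustment keeps the overall distribution broadly in line,
# but it changes outcomes for particular individuals.
```

The point of the sketch is simply that boundary adjustment, like this year’s algorithm, operates on the distribution as a whole while its effects land on identifiable students.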

Maybe, following this year’s issues, we need to take another look at how we assess and measure students’ learning and achievement, including the associated processes.

Standardized Testing

I have written a number of times about my feelings with regard to standardized testing. (You can read some of my previous postings here – Some thoughts on Data, Building Test Machines.) Having worked internationally in schools in the Middle East, I am particularly aware of standardized testing and the weight put on the results from such testing. Within the UAE there is a focus on ensuring that education is of an international standard, with the measure of this international standard being the results from PISA and also from the EMSA testing regime. As a result, individual schools and their teachers are expected to pore over the EMSA results and analyse what they mean. I feel that this focus on a standardized testing regime such as PISA is misplaced: how can we, on one hand, seek differentiated learning tailored to students as individuals while, on the other, measuring all students with a single standardized measure?

As such it was with great interest that I read the article in the TES titled “‘Ignore Pisa entirely,’ says world expert”. The article refers to comments by Professor Yong Zhao, whom I was lucky enough to see at an SSAT conference event back in 2009. Back then I found Professor Zhao to be an engaging and inspiring presenter, with some of his thoughts echoing my own and also shaping some of the thoughts and ideas that I came to develop. Again I find myself in agreement with Professor Zhao. I particularly liked his comment regarding the need for “creativity, not uniformity”.

I feel the focus on PISA is the result of valuing what is measurable as opposed to measuring what is valued. Measuring student performance in a standardized test is easy, and various statistical methods then allow for what appears to be complex analysis of the data, seemingly letting us prove or disprove various theories or beliefs. Newspapers and other publishers then sensationalize the data and create causal explanations. Education in Finland was recently heralded as excellent on the strength of its PISA results. Teaching in the UAE was deemed to be below the world average, albeit better than in most other Middle East countries. Did PISA really provide a measure of the quality of education? I think not!

Can education be boiled down to a simple test? Is a student’s ability to do well in the PISA test what we value? Does it take into consideration the student’s pathway through learning, given that the pathway differs from one country to another? Does it take into consideration local needs? Does it take into consideration the cultural, religious or other contexts within which the learning is taking place? Does it take into account students as individuals? Now, I acknowledge that it may be difficult or even impossible to measure the above, but does that mean we should accept a lesser measure such as PISA just because it is easier?

There may be some place for the PISA results in education, however I feel we would do much better focusing on the micro level, on our own individual schools and on seeking to continually improve, as opposed to what Professor Zhao described as little more than a “beer drinking contest”.


Some thoughts on Data

A recent article in the Telegraph (read it here) got me thinking once more about data. It also brought to mind the book “Thinking, Fast and Slow” by Daniel Kahneman, which I only recently finished reading. The book highlighted a number of issues which I feel have implications for education and need to be considered by school leaders.

Firstly, the small numbers effect: the Bill and Melinda Gates Foundation commissioned a study to examine schools in search of the most effective ones. It found, perhaps unsurprisingly, that small schools, in terms of student numbers, achieved the best results, outperforming larger schools. Contradictorily, it also found that small schools achieved the worst results. The reason for this, as explained by Kahneman, is that where a data set contains only a small number of items the potential for variability is high. As such, due to a variety of random variables and possibly a little helping of luck, some small schools do particularly well, out-achieving big schools. Other small schools are not so lucky, the variables don’t fall so well, and they end up with the worst results.


To clarify this, consider throwing three darts at a dartboard, aiming for the centre. This represents the results of a school with a small number of students, with darts nearer the centre scoring higher and darts further from the centre scoring lower. In the case of student results an average would then be calculated for the school, and the same can be done for the positions of the darts. Assuming you are not a professional darts player, you may do well or you may not, due to a variety of random variables. Given the limited number of darts the potential for variability is high, hence a very high or very low average is quite possible. Next consider throwing sixty darts at the board and taking the average across all the throws. Given the number of darts, the average will regress towards your mean dart-throwing ability. The increased number of data items means that variability is reduced, as each unusually good or poor throw is averaged out among the other throws.
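For anyone who prefers to see the effect rather than imagine it, the short simulation below runs the darts analogy many times over. It is a rough sketch with invented numbers (a fixed “ability” of 50 and an arbitrary spread), not real school data, but it shows how 3-throw averages swing far more widely than 60-throw averages even though the underlying ability is identical.

```python
# Rough simulation of the darts analogy: the average of 3 throws varies far
# more from trial to trial than the average of 60 throws, despite identical
# underlying ability. Numbers are invented purely for illustration.
import random
import statistics

random.seed(1)

def average_score(n_throws):
    # Each throw scores between 0 and 100, centred on a fixed ability of 50.
    return statistics.mean(min(100, max(0, random.gauss(50, 20))) for _ in range(n_throws))

small = [average_score(3) for _ in range(1000)]    # "small school" trials
large = [average_score(60) for _ in range(1000)]   # "large school" trials

print("3 throws:  spread of averages (st. dev.) =", round(statistics.stdev(small), 1))
print("60 throws: spread of averages (st. dev.) =", round(statistics.stdev(large), 1))
# The 3-throw averages produce both the best and the worst results purely
# through variability; the 60-throw averages cluster around the true ability.
```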

Within schools a great deal of value is attached to statistical analysis of school data, including standardised testing, however care must be taken. As I have suggested above, a statistical analysis showing school A to be better than school B could easily be the result of factors such as school size, school resourcing and funding, or plain randomness, as much as it may be related to better quality teaching and learning and improved student outcomes.

Another issue is how we respond to the results. Kahneman suggests that we commonly look for causal factors. As such we seek to associate the data with a cause, which in schools could be any number of things, however our tendency is to focus on whatever comes most easily to mind. As such poorer (and, less often, better) results are most often attributed to teachers and the quality of their teaching, as this is what is most frequently on the minds of school leaders. We often arrive at this conclusion without considering other possible explanations, such as the variable difficulty of the assessments, how the assessments were implemented, the specific cohort concerned, the sample size as discussed earlier, and a multitude of other potential factors. We also, having arrived so quickly at a causal factor which clearly must be to blame and therefore needs to be rectified, fail to consider the statistical validity of our data. We fail to consider the margins for error which may exist in our data, including what we might consider acceptable margins for error. We also fail to consider a number of other factors which influence our interpretation of the data, including the tendency to focus more on addressing the results which are perceived to be negative. This constant focus on the negative can result in a blame culture developing, which in turn can lead to increasingly negative results and increasing levels of blame. Maybe an alternative approach would be to focus more on the marginally positive results, how they were achieved and how they could be built upon.
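As a simple illustration of what checking the margin of error might look like, the sketch below uses invented cohort scores and a rough two-standard-error rule (my own choice, not a prescribed procedure) to ask whether a year-on-year dip is even distinguishable from noise before anyone goes looking for someone to blame.

```python
# Rough sketch (invented data, informal two-standard-error rule): is a dip in
# results larger than the margin of error implied by the cohort size?
import math
import statistics

this_year = [54, 61, 58, 47, 66, 52, 59, 63, 49, 57]   # hypothetical cohort scores
last_year_mean = 60.0                                    # hypothetical previous mean

mean = statistics.mean(this_year)
standard_error = statistics.stdev(this_year) / math.sqrt(len(this_year))
margin = 2 * standard_error   # roughly a 95% margin of error

print(f"mean = {mean:.1f}, margin of error ≈ ±{margin:.1f}")
if abs(mean - last_year_mean) <= margin:
    print("The change is within the margin of error - no causal story is needed.")
else:
    print("The change exceeds the margin of error - worth investigating further.")
```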

The key issue, in my view, is that we need to take care with data and the conclusions we infer from it. We cannot abandon the use of data, for how else would we measure how we are doing? Equally, though, we cannot take it as wholly factual. The world is a complex place filled with variables, randomness and luck, and we need to examine school data bearing this in mind. We also need to bear in mind that data is a tool to help us deliver the best learning opportunities for students; data is not an end in itself!