Testing – Technology and Learning

2023 Exam Results: A prediction

And so exam results day once again approaches and I would like to share a psychic prediction: that the newspapers will be filled with headlines on how A-Level results have fallen compared with last year.

Ok, so it isn’t so much psychic as based on what we know about the UK exams system. We know that each year the grade boundaries are adjusted and that the trend pre-pandemic was for grades generally to increase year on year. The ever-increasing grades weren’t necessarily the result of improving educational standards or brighter students, although both of these may or may not be the case; they were the result of a decision taken when setting grade boundaries. With the student exam scores available, the setting of the grade boundaries decided how many students would get an A*, an A, etc. and therefore the headline results. It’s a bit like the old goal seek lessons I used to teach in relation to spreadsheets. Using Excel I could ask what input values I would need to provide in order to attain a given result. So, looking at exam results, what grade boundaries would I need to set in order to maintain the ever-increasing grades while also avoiding it looking like grade inflation or other manipulation of the results? Now I note that within the generally increasing grades some subjects showed more improvement than others, with some subjects showing dips, but summed across all subjects the results tended to show improvement year on year.
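To make the goal seek comparison concrete, here is a minimal sketch in Python. It is purely illustrative and is not how any exam board actually sets boundaries: the function name, the marks and the target percentages are all made up for the example. It simply works backwards from the share of students we want at each grade to the mark that would deliver it.

```python
def boundaries_for_targets(scores, targets):
    """Work backwards from target shares to grade boundaries.

    scores  -- raw exam marks, one per student (hypothetical data)
    targets -- grade -> cumulative share of students who should get
               that grade or better (hypothetical policy decision)
    """
    ranked = sorted(scores, reverse=True)
    n = len(ranked)
    boundaries = {}
    for grade, share in targets.items():
        # How many students should reach this grade or better.
        count = max(1, round(share * n))
        # The boundary is the mark of the last student let in.
        boundaries[grade] = ranked[count - 1]
    return boundaries


# Ten invented marks and an invented set of targets.
scores = [92, 88, 81, 77, 74, 70, 66, 61, 55, 43]
this_year = boundaries_for_targets(scores, {"A*": 0.1, "A": 0.3, "B": 0.6})
print(this_year)  # {'A*': 92, 'A': 81, 'B': 70}

# Same students, same marks, but a more generous target for the A grade.
generous = boundaries_for_targets(scores, {"A*": 0.1, "A": 0.4, "B": 0.6})
print(generous)   # {'A*': 92, 'A': 77, 'B': 70}
```

The marks never change between the two runs; only the decision about how many students should get an A does, and the boundary moves to suit.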

And then we hit the pandemic and teacher assessed grades, and the outcry about how an algorithm was adjusting teacher awarded grades into the final grades students achieved. Students and parents were rightly outraged and this system of adjustment was dropped. But how is this much different from the adjustment of the grade boundaries as mentioned above? The answer is quite simply that the teachers, and often students and parents, were aware of the teacher assessed grades and therefore could quantifiably see the adjustment when compared against the awarded grade. When looking at the pre-pandemic exams, teachers, students and parents don’t have visibility of what the student’s grade might have been before adjustments were made to the grade boundaries. They simply see the adjusted score and adjusted final grade. Now I note that a large part of the outrage was in relation to how the grade adjustment appeared to impact some schools, areas or other demographics of students more than others; however I would suggest this is also the case when the grade boundaries are set or adjusted, albeit the impact is less obvious, transparent or well known.

So, we now head into the exam results following the period of teacher assessed grades with students back doing in-person exams. Looking at this from an exam board level, and reading the press as it was after the 2022 exam results, we know that a larger than normal increase was reported over the teacher assessed grade years, with this being put down to teacher assessed grades versus the normal terminal exams. As such I would predict that the exam boundaries will be set in such a way to make the correction. I predict the exam boundaries will therefore be set to push exam results downwards although it is unclear how much the results will be pushed down. It may be that the results are reduced slightly to avoid too much negative press or it may be that a more significant correction is enforced based on the fact that this might be easily explained by the previous teacher assessed grades plus also the lack of proper exams experience held by the students who sat their A-Level exams this time; remember these students missed out on GCSE exams due to the pandemic.

Conclusion

My prediction is that the exam results stats will be lower than last year, not necessarily because students have done worse but because of a decision that the results should be worse, given last year’s apparently more generous results plus the fact these particular students have less exam experience than previous, pre-pandemic, cohorts. I suspect my prediction is all but guaranteed, but an interesting question from all of this has to be: is this system fair? I believe the answer is no, although I am not sure I can currently identify a necessarily fairer system. But I think in seeking a better system, the first step is to identify that the current system isn’t necessarily fair.

And one final thought, to those students getting their results: all I can simply say is very well done! This was the culmination of years’ worth of study and effort, during a period of great upheaval the world over, unlike anything in my or your history to date. No matter the grades, you did well for getting through it. The grades, no matter what they are, do not define you; your effort, your resilience and what you decide to do next, your journey, is what really matters. Well done and all the very best for the future!!

The future of exams

We are now in the exams season with students all over the world sat in exam halls with pen (and pencil) and paper, completing their GCSE and A-Level exams. 5 years ago it was the same, as it was 10 years ago and 20 years ago; in fact I suspect we could go back over 100 years and we would see a similar scene of rows of students sat taking paper-based examinations. Isn’t it about time we looked at a more modern solution to the need for terminal exams?

Computer Based Testing – Challenges

One of the big challenges in any computer-based examination solution would be the requirement for schools and colleges to have large numbers of computers available for students to use in taking their exams. If we are simply substituting the paper test for an electronic test, where all students across the country are expected to sit the same exam at the same time, I feel this problem will be difficult for schools and colleges to resolve especially with core subjects like Maths and English.

We could, as an alternative, look to allow the taking of tests using students’ own devices; however, equally this is problematic as students will not have equal access to equipment and in some cases might not have access to a suitable device, plus there would be concerns in relation to cheating where students are using their own equipment. We saw some of these issues, particularly in relation to access to technology, during the pandemic.

Remote Invigilation or proctoring

There is also a question as to whether we even need to get students into a common location. Following the pandemic, where a lot of teaching moved to online tools and video, is it possible to use the same technologies to allow students to take their exams remotely in their own time? I myself experienced this only a few years ago when doing a cybersecurity exam which involved remote proctoring and someone monitoring my exam efforts via my web camera. This might be another option; however, the potential safeguarding implications would need to be considered.

Adaptive Testing

The use of adaptive testing might be another solution here as in this situation the students do not necessarily do the same questions. The questions are selected from a pool with the adaptive testing solution then selecting subsequent questions based on how the students do in each question. Using adaptive testing we wouldn’t be as worried about all students sitting the same test at the same time, given the students wouldn’t be receiving the same questions. As such schools could use their available IT resources over a period of time to allow students to access the relevant tests. The challenge, I suspect, with adaptive testing will be convincing parents and students that it is fair. Fairness is easy to point to where all students do the same test at the same time but not so easy where they are doing different questions at different times.
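As a rough illustration of the idea, the sketch below shows one very simple way an adaptive test could choose its next question: step up a difficulty level after a correct answer and step down after an incorrect one. This is an assumption made for the example, not a description of any real adaptive testing product, which would likely use something more sophisticated. The question pool, the function name and the simulated student are all hypothetical.

```python
def run_adaptive_test(question_pool, answers_correctly, num_questions=5, start_level=3):
    """Ask questions one at a time, adjusting difficulty as we go.

    question_pool     -- difficulty level -> list of question ids
    answers_correctly -- callable(question_id, level) -> True/False,
                         standing in for the student's response
    Returns a list of (question_id, level, correct) tuples.
    """
    levels = sorted(question_pool)
    level = start_level
    used = set()
    asked = []
    for _ in range(num_questions):
        # Only offer questions this student has not already seen.
        available = [q for q in question_pool[level] if q not in used]
        if not available:
            break
        question = available[0]
        used.add(question)
        correct = answers_correctly(question, level)
        asked.append((question, level, correct))
        # Step up after a correct answer, down after an incorrect one,
        # staying within the range of levels in the pool.
        position = levels.index(level)
        position = min(position + 1, len(levels) - 1) if correct else max(position - 1, 0)
        level = levels[position]
    return asked


# A made-up pool: five difficulty levels with five questions each.
pool = {level: [f"L{level}-Q{i}" for i in range(1, 6)] for level in range(1, 6)}

# A made-up student who copes with anything up to level 3.
def student(question_id, level):
    return level <= 3

for question, level, correct in run_adaptive_test(pool, student):
    print(question, level, correct)
```

Two students of different ability would see different questions here, which is exactly why the test need not be sat by everyone at once, and also why the fairness argument becomes harder to make.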

And do we need knowledge based final assessment?

We also need to question whether there is still the need for the final assessment of students. For some students it is an opportunity to show all they have learned, but for others it is a massive stress and a negative impact on their wellbeing. I have long been a supporter of vocational qualifications based on ongoing assessment throughout the course rather than the heavily weighted final exams of so called “academic” qualifications.

Additionally, in a world where we routinely use technology tools such as Google to search for answers and solutions, should we actually be considering how such technology might have a place in future exams, rather than banning such devices from exam halls?

Conclusion

I don’t have an answer for this challenge. Any change is likely to be difficult, especially after over 100 years of terminal exams. It is however noteworthy that a number of examination bodies are actively looking at and trialling alternative digital exam solutions.

Here is another example of where the pandemic has fuelled an exploration of future solutions. I suspect however it will be some years, maybe 10 or more, before any real change happens, although I hope it happens sooner.

Exams: Why should 1/3 of students fail?

Not so long ago I read of a discussion in relation to whether the GCSE English Language should be scrapped. Part of the reasoning behind this is identified as being due to the subject identifying a third of students as having failed. As a headline I think it is difficult to disagree with. How can identifying a third of students as having failed be an acceptable thing to do? On reflection my view is that this issue is less about the English Language subject and more about the educational system as it is now and as it has been for over one hundred years.

I remember when I worked within an FE college and I was involved in enrolment following the release of the GCSE results. A-Level and Level 3 BTec courses had clear admissions requirements in terms of the minimum number of B’s or C’s required to gain entry to each course. This often included the need for a minimum of a C in Maths or English. I also remember working with students on their university applications, post A-Levels, where once again universities have entry requirements which students must achieve to gain entry. Once again there might be a need for three C’s to get on their preferred university course.

The issue with the above is that a certain set of grades will gain entry and other lower grades will not result in entry. It is easy to therefore perceive some grades as being passes and as a result the other remaining grades must be fails. The education system as we know it is built on the ability to group students in terms of their ability, as described by their grades, and through this identify the opportunities which will be available. As a result of this, independent of the U, or ungraded option, there will always be a perception as to some grades, those that easily permit entrance to the next level of education, being perceived as being passes and the remainder as being fails.

An alternative is to have qualifications which allow all students to pass. From the headline point of view, improving from only two thirds of students passing to one hundred percent of students passing sounds logical and a success worth celebrating. The issue is that it is unlikely to result in any real change. FE colleges will still need to set requirements, meaning some passing grades will permit entry while others will not. Universities will also set their requirements and again some grades will allow students to pass onto the next level whereas others will see their application fail to get them in.

The above alternative continues to be based on an education system where students pass through the system based on their age. Given this there is a need to differentiate the students hence assigning grades to students based on their exams and coursework.

If we are to consider a system where all students are to achieve, we need to acknowledge that students learn at different rates. We therefore need to allow students to progress through education at different rates. The different rates of progress can then be used to differentiate students and identify when they are ready to progress to the next educational level. Again this seems like an enviable solution in that students either complete or can be considered as having not yet completed or achieved. They haven’t failed, as the opportunity to complete always exists, being available for them at a time that suits their learning and rate of progression. The issue here is once again perception, in that a view will quickly form as to what the expected rate of progression should be. This might be that by the age of 18 students will progress to university. Instantly with this perception the media will be able to quote the percentage of students who proceed on or ahead of this target and therefore the percentage who do not. Again we have those who progress as normally expected, those who pass, and those who progress at a slower rate and therefore have not passed; those who are perceived to have failed.

I don’t like the idea of one third of students failing. It simply doesn’t feel right. That said, it is difficult to find an alternative solution that won’t simply see us back in the same position a couple of years in the future.

Standardized Testing

I have written a number of times about my feelings with regards to standardized testing. (You can read some of my previous postings here – Some thoughts on Data, Building Test Machines). Having worked internationally in schools in the Middle East I am particularly aware of standardized testing and the weight put on the results from such testing. Within the UAE there is a focus on ensuring that education is of an international standard, with the measure of this international standard being the results from the PISA and EMSA testing regimes. As a result individual schools and their teachers are expected to pore over the EMSA results and analyse what the results mean. I feel that this focus on a standardized testing regime such as PISA is misplaced: how can we on one hand seek differentiated learning tailored to students as individuals while measuring all students with a single standardized measure?

As such it was with great interest that I read the article in the TES titled “‘Ignore Pisa entirely,’ says world expert”. The article refers to comments provided by Professor Yong Zhao, who I was lucky to see at an SSAT conference event back in 2009. Back then I found Professor Zhao to be both engaging and inspiring as a presenter, with some of his thoughts echoing my own and also shaping some of the thoughts and ideas that I came to develop. Again I find myself in agreement with Professor Zhao. I particularly liked his comment regarding the need for “creativity, not uniformity”.

I feel the focus on PISA is the result of valuing what is measurable as opposed to measuring what is valued. Measuring student performance in a standardized test is easy, with various statistical methods then allowing for what appears to be complex analysis of the data, therefore lending us to be able to prove or disprove various theories or beliefs. Newspapers and other publishers then sensationalize the data and create causal explanations. Education in Finland was heralded to be excellent recently as a result of the results from PISA testing. Teaching in the UAE was deemed to be below the world average however better than most other Middle East countries. Did PISA really provide a measure of the quality of education? I think not!

Can education be boiled down to a simple test? Is a student’s ability to do well in the PISA test what we value? Does it take into consideration the student’s pathway through learning, as the pathway differs from one country to another? Does it take into consideration local needs? Does it take into consideration the cultural, religious or other contexts within which the learning is taking place? Does it take into account students as individuals? Now I acknowledge that it may be difficult or even impossible to measure the above; however, does that mean that we accept a lesser measure such as PISA just because it is easier?

There may be some place for the PISA results in education however I feel we would be much better focusing on the micro level, on our own individual schools and on seeking to continually improve, as opposed to what Professor Zhao described as little more than a “beer drinking contest”.

Some thoughts on Data

A recent article in the Telegraph (read it here) got me thinking once more about data. This also got me thinking about the book “Thinking, Fast and Slow” by Daniel Kahneman which I have only recently finished reading. The book highlighted a number of issues which I feel have implications for education and need to be considered by school leaders.

Firstly, the small numbers effect: the Bill and Melinda Gates Foundation commissioned a study to examine schools in search of the most effective schools. It found, unsurprisingly, that small schools, in terms of student numbers, achieved the best results, ahead of larger schools. Contradictorily, it also found that small schools achieved the worst results. The reason for this, as explained by Kahneman, is that where a data set contains only a small number of items the potential for variability is high. As such, due to a variety of random variables and possibly a little helping of luck, some small schools do particularly well, outperforming big schools. Other small schools are not so lucky and the variables don’t fall so well, resulting in the worst results.


To clarify this consider throwing three darts at a dart board aiming for the centre. This represents the results of a school with a small number of students with higher scores being nearer centre and a lower score being those darts ending further from the centre. In the case of student results an average result would then be calculated for the school and the same can be done looking at the position of the darts. Assuming you are not a professional darts player you may do well or you may not do so well due to a variety of random variables. Given the limited number of darts the potential for variability is high hence a high average or low average is very possible. Next consider if you were to continue and throw sixty darts at the dart board, taking the average across all the dart throws. Given the number of darts the average will regress towards your mean darts throwing ability. The increased number of data items means that variability is reduced as each significant good or poor throw is averaged out among the other throws.
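For anyone who would rather see this than take it on trust, the darts example can be simulated in a few lines of Python. This is only a sketch under my own assumptions: each throw is modelled as a score drawn from a normal distribution with a mean of 50 and a standard deviation of 20, figures picked purely for illustration, and the function name is made up.

```python
import random
import statistics


def spread_of_averages(throws_per_round, rounds=2000, seed=1):
    """Play many rounds and report how much the round averages vary.

    Each throw is an invented score: normally distributed around a
    "true ability" of 50 with a standard deviation of 20.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    averages = []
    for _ in range(rounds):
        throws = [rng.gauss(50, 20) for _ in range(throws_per_round)]
        averages.append(statistics.mean(throws))
    # Standard deviation of the averages: bigger means less stable.
    return statistics.pstdev(averages)


three_darts = spread_of_averages(3)
sixty_darts = spread_of_averages(60)
print(f"Spread of averages with 3 darts:  {three_darts:.1f}")
print(f"Spread of averages with 60 darts: {sixty_darts:.1f}")
```

The player's ability is identical in both cases. The averages from three darts simply bounce around far more than the averages from sixty, which is the small school versus big school effect in miniature.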

Within schools a great deal of value is being attached to statistical analysis of school data including standardised testing however care must be taken. As I have suggested above a statistical analysis showing school A is better than school B could easily be the result of random factors such as school size, school resourcing and funding, etc as much as it may be related to better quality teaching and learning, and improved student outcomes.

Another issue is how we respond to the results. Kahneman suggests that commonly we look for causal factors. As such we seek to associate the data with a cause, which in schools could be a number of different things; however, our tendency is to focus on that which comes easily to mind. As such poorer results (and better results, although not as often) are most often attributed to teachers and the quality of their teaching, as this is what is most frequently on the mind of school leaders. We arrive at this conclusion often without considering other possible conclusions such as the variable difficulty of the assessments, assessment implementation, the specific cohort concerned, the sample size as discussed earlier and a multitude of other potential factors. We also, due to arriving so quickly at a causal factor which clearly must be to blame and therefore needs to be rectified, fail to consider the statistical validity of our data. We fail to consider the margins for error which may exist in our data, including what we may consider acceptable margins for error. We also fail to consider a number of other factors which influence our interpretation of the data, including the tendency to focus more on addressing the results which are perceived to be negative. This constant focus on the negative can result in a blame culture developing, which can result in increasingly negative results and increasing levels of blame. Maybe an alternative approach would be to focus more on the marginally positive results, how they were achieved and how they could be built upon.

The key issue in my belief is that we need to take care with data and the conclusions we infer from it. We cannot abandon the use of data as how else would we measure how we are doing, however equally we cannot take it as fully factual. The world is a complex place filled with variables, randomness and luck, and we need to examine school data bearing this fact in mind. We also need to bear in mind that data is a tool to help us deliver the best learning opportunities for students; data is not an end in itself!

Class sizes

This morning before walking out the front door I saw someone on a BBC morning programme suggesting that their political party’s contribution to the education sector was a reduction in classroom sizes.

I find this interesting that classroom sizes continue to be considered as a measure of how good a school or an education system is. In the case of the comments on BBC the person making the comments was equating classroom size to an improvement in the quality of education.

Hanushek (1998) suggested that the linkage between smaller class sizes and improved student results was “generally erroneous”. Kahneman (2011) went further in suggesting that the facts associated with such a claim were “wrong”.

Kahneman’s explanation (2011) was that the reason for the findings relates to statistics and what he refers to as the “law of small numbers”. A small class is made up of a smaller number of students, which therefore results in higher levels of variability in terms of the average. He uses an example of drawing coloured marbles from a jar to demonstrate this. Consider randomly picking marbles from a jar containing red and blue marbles. There is a higher probability of drawing out 3 of a colour (3 high achievers in a small class) than of drawing out 6 of a colour (the equivalent, 6 high achievers, in a bigger class).
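The marble example can be checked with a little arithmetic. The sketch below assumes, for simplicity, a very large jar that is half red and half blue, so that each draw is effectively independent; those assumptions and the function name are mine rather than anything taken from Kahneman.

```python
from fractions import Fraction


def prob_all_same_colour(draws, share_red=Fraction(1, 2)):
    """Chance that every marble drawn is the same colour.

    Assumes each draw is independent (a very large jar, or drawing
    with replacement) with a fixed share of red marbles.
    """
    share_blue = 1 - share_red
    # Either every draw is red, or every draw is blue.
    return share_red ** draws + share_blue ** draws


small_sample = prob_all_same_colour(3)
large_sample = prob_all_same_colour(6)
print(small_sample)                 # 1/4
print(large_sample)                 # 1/32
print(small_sample / large_sample)  # 8
```

Under these assumptions a run of 3 of the same colour is eight times more likely than a run of 6, with nothing about the jar having changed.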


Within a larger class size there is a greater tendency towards regression to the mean and therefore a more stable and less variable average across schools.

The association of improved results with more teacher time, more support, etc. arising from smaller class sizes is therefore unfounded. The improved results in schools with smaller class sizes are simply a feature of the statistical analysis of small sample sizes. Kahneman suggests that if the researchers were to change their question and look at whether poor results could be linked to small classes, they would find this to be equally true.

My feeling on this is that generally class size doesn’t have a significant impact on student results within lower and upper limits. Where the ratio is 1 teacher to 2 or 3 students I would expect to see a positive impact and equally at 50+ students I would expect to see a negative impact. Within the larger range between 5 and 50 I would expect the impact to be minimal if evident at all.

Care needs to be taken with the use of statistics and care has to be taken in believing them. As Kahneman explains, it is easy to create a causal explanation for why a given set of statistics such as those on class size make sense. The ease with which a causal explanation comes to mind however doesn’t necessarily make the explanation and resulting judgement true.

Sources

Hanushek, E. A. (1998), The evidence on class size, W. Allen Wallis Institute of Political Economy

Kahneman, D. (2011), Thinking, fast and slow, Penguin Books

Gaming

The subject of schools “gaming” school league tables and performance measures such as Progress 8 has made the news recently so I have decided to contribute my opinion to the mix. Before doing so I need to be clear that I don’t have any particularly strong views with regards this issue. I therefore believe that my points represent a balanced viewpoint. I will however acknowledge that my assessment of my viewpoint as balanced is based on the context as set by my viewpoint, perception and the paradigms within which I operate as an individual. As such, from the point of view of those reading, including yourself, this may not be balanced after all. I make no apologies for this as all I can offer is my opinion, which is never wrong in that it is my opinion and therefore is formed based on my viewpoint and context.

Back on the subject of “gaming” the discussion seems to have opposing viewpoints. One of these viewpoints is that a school should try to offer its students the best opportunities for success in the future. As such it is important to enable them to achieve as many successful qualifications as possible. These schools therefore look to enroll students in qualifications which for minimal effort return successful qualification, such as ECDL.

The other viewpoint is that schools enrolling students in bulk in ECDL are doing so in order to influence league tables and performance measures such as Progress 8. Educators taking this position are of the opinion that these qualifications are of lesser value than other qualifications which may take longer to achieve or which are more difficult to achieve yet have comparable impact on league tables and other performance measures.

For me there may be truth in both viewpoints. If the studying of specific exams is in the interest of students’ futures then surely it is the correct thing to do. Consider two schools which are identical in outcomes except for the fact that students in one achieve an additional ECDL qualification. Surely this puts students who leave with an additional qualification in a more positive position. I myself worked in a school where we delivered OCR National IT to all students. The reason we did this was the vocational nature of the qualification, which suited our student cohort, plus the breadth of study and options available, which allowed us to accommodate individual student needs and interests.

Equally there is truth in the other viewpoint in that if a school put all students in for the ECDL qualification or the OCR National they may have done so purely in the interest of achieving a better league table position than other schools. This may put students under stress where the qualification is additional, or may represent an unfair advantage where an “easy” subject has been substituted in place of a more difficult or valued subject with an equivalent or near equivalent league table or performance measurement points worth.

Both of the viewpoints include identical actions in the batch enrolling students in a given qualification yet both viewpoints result in totally opposing opinions. The key fact is not so much what schools do but why they do it. In one viewpoint it is about the students and the benefit to them while in the other viewpoint it is about the school and getting the best league table or performance measure result.

If OFSTED are to clamp down on “gaming” they are therefore going to have to try and identify why a school took the chosen action. How are they going to do this? How are they going to measure the “intentions” of school leaders? Are we going to start seeing OFSTED inspectors administering polygraph lie detector tests on school leaders?

I also feel that this discussion has a lesser discussed aspect to it in the value of differing qualifications. This discussion has raged for some time on the value of so called “core” subjects and the perceived lesser value of the arts and creative subjects. The new “gaming” discussion adds differing values in terms of the perceived difficulty level of a course along with the time taken to deliver the course, with shorter courses perceived to have lesser value. Who will decide the relative worth of each course and the total worth of any individual student’s curriculum of study?

We should all be working in the interests of our students to try and provide them every competitive advantage with regards Further Education or Higher Education options, or options into employment, or even more generally into their future lives. A key part of this is the qualifications they achieve so we need to get them everything reasonably possible. In teaching we use every trick in the book to try and make sure students are learning plus are ready and able to succeed in whatever assessment is required to achieve a given qualification. If this is “gaming” then maybe we are all involved.

Mandatory Testing?


As I was heading to work on Friday I heard a BBC news story regarding new proposals for “testing” of 4 year old children at the start of their school experience. This immediately had me asking about the differences between assessment and testing. I am not sure there is a difference however I am quite happy to listen to anyone who is able to explain this.

For me, independent of the age of students, one of the first things I need to do is to “test” or assess them. I need to find out a little about them, about the things they like, about the things they are good at and the areas within which they still need to develop. I have worked in secondary education, further education and higher education and across each stage the first thing I have done with new students is to assess or test them in order to help in planning their learning experience.

So this led me to ask why the story was so newsworthy. My first assumption was that it related to the differing perspectives and definitions for the terms “assessment” and “testing”. It could be that some see the two terms as meaning the same thing, as I do, while others see each term as meaning something different. This differing perspective leads to the debate around whether the proposal in question is a good or bad thing and therefore to a newsworthy story.

Upon thinking on it further and accepting the commonality of the two terms, I came to think that it is not what the two words mean or “are” which is the issue but the reason why we undertake them. In the case of my testing at the start of working with new students, this is done as I know the benefit such testing will have in terms of providing the best learning experience possible. It is done because good teaching demands it be done. In the case of the news story they are discussing mandated testing. The reason for mandating such testing may be linked to the reasoning I used in deciding to test; however, the fact it is mandated detracts from this.

The other issue is what is done with the results. In my example the data is solely for me and to inform learning. There needn’t be a score or a rubric attached. In the case of mandated data collection, those mandating it want the data, which therefore requires quantifiable and comparable scores and grades, or at least we might assume this is the case.

Maybe we need to trust teachers more rather than mandating what must be done as the act of mandating something changes the activity being mandated!

Building Testing Machines.

Around 4 or 5 years ago, while working in the Middle East as an educational consultant, I emailed around 200 colleagues to ask what they considered the purpose of education to be. I then analysed the words which those who replied used in their responses. At the time the word which came out as the most frequently used was “knowledge”.

At the time I wondered about this given access to the internet and its apparently boundless “knowledge”. At the time 21st century skills were widely talked about as important; however, when it came down to it those working in education still clung to the importance of knowledge. Words such as creativity, critical thinking, collaboration and communication appeared significantly less frequently.

It is a recent blog (See full blog here) which makes me reflect on my findings back then, as it raises the issue of identifying what the purpose of education really is. Mr Ferriter’s argument focuses on EdTech use, although beyond this he goes on to suggest that the superficial usage of EdTech may be the result of the pressures being put on teachers to achieve high student results in standardized tests.

This use of tests, including PISA tests, to measure the success of teaching and of education in a wider sense seems to imply that the purpose of education is to get students high test results. I have very strong beliefs that this narrow view of education is damaging to student learning. As educators, is the purpose of education not to prepare students for the future, with the skills required to deal with the largely unpredictable and the often changeable? Is education not about developing students as adaptable, resilient, self-aware, responsible members of local and global society? And if this is the case, how do a series of test questions fit into the equation?

The big question is how we balance the requirements of accountability and the need for quantifiable and comparable data, such as that presented by testing, with the requirement to develop students as individuals prepared for what lies ahead, and the qualitative data this produces. I don’t know what the answer is to this dilemma; however, we are currently moving steadily towards the quantifiable end of the balance, with the continuing focus being put on exam results and standardized tests. I believe we need to re-establish a balance here before we lose sight of the importance of some of the less quantifiable but equally (and possibly more) important activities carried out within classrooms across the world. After all, are we in the business of building students into test taking machines that regurgitate facts and knowledge, or are we trying to develop individuals capable of lifelong learning?

http://www.teachingquality.org/content/blogs/bill-ferriter/blaming-and-shaming-teachers-low-level-edtech-practices Bill Ferriter (Sept 2015)
