Standardized Testing

I have written a number of times about my feelings with regard to standardized testing. (You can read some of my previous postings here: Some thoughts on Data, Building Testing Machines.) Having worked internationally in schools in the Middle East I am particularly aware of standardized testing and the weight put on the results from such testing. Within the UAE there is a focus on ensuring that education is of an international standard, with the measure of this international standard being the results from the PISA and EMSA testing regimes. As a result individual schools and their teachers are expected to pore over the EMSA results and analyse what the results mean. I feel that this focus on a standardized testing regime such as PISA is misplaced: how can we on one hand seek differentiated learning tailored to students as individuals while measuring all students with a single standardized measure?

As such it was with great interest that I read the article in the TES titled “‘Ignore Pisa entirely,’ says world expert”. The article refers to comments provided by Professor Yong Zhao, who I was lucky to see at an SSAT conference event back in 2009. Back then I found Professor Zhao to be both engaging and inspiring as a presenter, with some of his thoughts echoing my own and also shaping some of the thoughts and ideas that I came to develop. Again I find myself in agreement with Professor Zhao. I particularly liked his comment regarding the need for “creativity, not uniformity”.

I feel the focus on PISA is the result of valuing what is measurable as opposed to measuring what is valued. Measuring student performance in a standardized test is easy, with various statistical methods then allowing for what appears to be complex analysis of the data, seemingly allowing us to prove or disprove various theories or beliefs. Newspapers and other publishers then sensationalize the data and create causal explanations. Education in Finland was recently heralded as excellent on the basis of the results from PISA testing. Teaching in the UAE was deemed to be below the world average, although better than in most other Middle East countries. Did PISA really provide a measure of the quality of education? I think not!

Can education be boiled down to a simple test? Is a student’s ability to do well in the PISA test what we value? Does it take into consideration the student’s pathway through learning, as the pathway differs from one country to another? Does it take into consideration local needs? Does it take into consideration the cultural, religious or other contexts within which the learning is taking place? Does it take into account students as individuals? Now I acknowledge that it may be difficult or even impossible to measure the above, however does that mean that we accept a lesser measure such as PISA just because it is easier?

There may be some place for the PISA results in education however I feel we would be much better focusing on the micro level, on our own individual schools and on seeking to continually improve, as opposed to what Professor Zhao described as little more than a “beer drinking contest”.

 


Some thoughts on Data

A recent article in the Telegraph (read it here) got me thinking once more about data.   This also got me thinking about the book “Thinking, Fast and Slow” by Daniel Kahneman which I have only recently finished reading.  The book highlighted a number of issues which I feel have implications for education and need to be considered by school leaders.

Firstly, the small numbers effect: the Bill and Melinda Gates Foundation commissioned a study to examine schools in search of the most effective schools. It found, unsurprisingly, that small schools, in terms of student numbers, achieved the best results, ahead of larger schools. Contradictorily, it also found that small schools achieved the worst results. The reason for this, as explained by Kahneman, is that where a data set contains only a small number of items the potential for variability is high. As such, due to a variety of random variables and possibly a little helping of luck, some small schools do particularly well, outperforming big schools. Other small schools are not so lucky and the variables don’t fall so well, resulting in the worst results.


To clarify this, consider throwing three darts at a dart board, aiming for the centre. This represents the results of a school with a small number of students, with darts landing nearer the centre representing higher scores and darts ending further from the centre representing lower scores. In the case of student results an average result would then be calculated for the school, and the same can be done by looking at the position of the darts. Assuming you are not a professional darts player you may do well or you may not do so well due to a variety of random variables. Given the limited number of darts the potential for variability is high, hence a high average or low average is very possible. Next consider if you were to continue and throw sixty darts at the dart board, taking the average across all the dart throws. Given the number of darts the average will regress towards your mean darts throwing ability. The increased number of data items means that variability is reduced, as each significantly good or poor throw is averaged out among the other throws.
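For anyone who would like to see the darts effect in numbers, the following is a minimal sketch in Python rather than anything taken from Kahneman's book. It assumes every round is thrown by the same player with the same ability, and that each miss distance is the absolute value of a normally distributed error with a spread of 10 units. The function names and those figures are made up purely for illustration.

```python
import random
import statistics


def average_miss(num_darts, rng):
    """Average distance from the centre over one round of num_darts throws.

    Assumption: each miss distance is |N(0, 10)|, so the thrower's
    underlying ability is identical in every round.
    """
    misses = [abs(rng.gauss(0, 10)) for _ in range(num_darts)]
    return statistics.mean(misses)


def spread_of_averages(num_darts, trials=2000, seed=1):
    """Standard deviation of the round averages across many repeated rounds."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same figures
    averages = [average_miss(num_darts, rng) for _ in range(trials)]
    return statistics.stdev(averages)


if __name__ == "__main__":
    for n in (3, 60):
        spread = spread_of_averages(n)
        print(f"{n:>2} darts per round: spread of round averages = {spread:.2f}")
```

Nothing about the thrower changes between the two cases, yet the averages from three-dart rounds should be scattered far more widely than those from sixty-dart rounds. That is the same pattern that puts small schools at both the top and the bottom of a results table.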

Within schools a great deal of value is being attached to statistical analysis of school data, including standardised testing; however, care must be taken. As I have suggested above, a statistical analysis showing school A is better than school B could easily be the result of random factors such as school size, school resourcing and funding, etc., as much as it may be related to better quality teaching and learning, and improved student outcomes.

Another issue is how we respond to the results. Kahneman suggests that commonly we look for causal factors. As such we seek to associate the data with a cause, which in schools could be a number of different things; however, our tendency is to focus on that which comes easily to mind. As such poorer (and better, although not as often) results are most often attributed to teachers and the quality of their teaching, as this is what is most frequently on the mind of school leaders. We often arrive at this conclusion without considering other possible explanations such as the variable difficulty of the assessments, assessment implementation, the specific cohort concerned, the sample size as discussed earlier and a multitude of other potential factors. We also, due to arriving so quickly at a causal factor which clearly must be to blame and therefore needs to be rectified, fail to consider the statistical validity of our data. We fail to consider the margins for error which may exist in our data, including what we may consider acceptable margins for error. We also fail to consider a number of other factors which influence our interpretation of the data, including the tendency to focus more on addressing the results which are perceived to be negative. This constant focus on the negative can result in a blame culture developing, which can lead to worsening results and increasing levels of blame. Maybe an alternative approach would be to focus more on the marginally positive results, how they were achieved and how they could be built upon.

The key issue in my belief is that we need to take care with data and the conclusions we infer from it.   We cannot abandon the use of data as how else would we measure how we are doing, however equally we cannot take it as fully factual.   The world is a complex place filled with variables, randomness and luck, and we need to examine school data bearing this fact in mind.   We also need to bear in mind that data is a tool to help us deliver the best learning opportunities for students;  data is not an end in itself!

 

Class sizes

This morning before walking out the front door I saw someone on a BBC morning programme suggesting that their political party’s contribution to the education sector was a reduction in classroom sizes.

I find it interesting that classroom sizes continue to be considered a measure of how good a school or an education system is. In the case of the comments on the BBC, the person making the comments was equating a smaller classroom size with an improvement in the quality of education.

Hanushek (1998) suggested that the linkage between smaller class sizes and improved student results was “generally erroneous”. Kahneman (2011) went further in suggesting that the facts associated with such a claim were “wrong”.

Kahneman’s explanation (2011) was that the reason for the findings relates to statistics and what he refers to as the “law of small numbers”. A small class is made up of a smaller number of students, which therefore results in higher levels of variability in terms of the average. He uses an example of drawing coloured marbles from a jar to demonstrate this. Consider randomly picking marbles from a jar containing red and blue marbles. There is a higher probability of drawing out 3 of a colour (3 high achievers in a small class) than of drawing out 6 of a colour (the equivalent number of high achievers, 6, in a bigger class).


Within a larger class size there is a greater tendency towards regression to the mean and therefore a more stable and less variable average across schools.
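The marble comparison can be worked through exactly. The sketch below is only an illustration, under the assumptions that the jar is half red and half blue and that each draw is independent (a very large jar, or each marble put back after drawing). The function name and the default share of red marbles are hypothetical choices, not figures from Kahneman.

```python
from fractions import Fraction


def prob_all_same_colour(draws, share_red=Fraction(1, 2)):
    """Probability that every marble drawn is the same colour.

    Assumptions: only red and blue marbles, independent draws, and a
    fixed share of red marbles in the jar (one half by default).
    """
    share_blue = 1 - share_red
    # Either every draw is red or every draw is blue.
    return share_red ** draws + share_blue ** draws


if __name__ == "__main__":
    for n in (3, 6):
        p = prob_all_same_colour(n)
        print(f"{n} draws: P(all one colour) = {p} (about {float(p):.3f})")
```

Under these assumptions a uniform draw of 3 has a probability of 1/4 while a uniform draw of 6 has a probability of 1/32, so the extreme outcome is eight times more likely in the smaller sample.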

The association of improved results with more teacher time, more support, etc. arising from smaller class sizes is therefore unfounded. The improved results in schools with smaller class sizes are simply a feature of the statistical analysis of small sample sizes. Kahneman suggests that if the researchers were to change their question and look at whether poor results could be linked to small classes, they would find this to be equally true.

My feeling on this is that generally class size doesn’t have a significant impact on student results within lower and upper limits.    Where the ratio is 1 teacher to 2 or 3 students I would expect to see a positive impact and equally at 50+ students I would expect to see a negative impact.   Within the larger range between 5 and 50 I would expect the impact to be minimal if evident at all.

Care needs to be taken with the use of statistics and care has to be taken in believing them.   As Kahneman explains, it is easy to create a causal explanation for why a given set of statistics such as those on class size make sense.   The ease with which a causal explanation comes to mind however doesn’t necessarily make the explanation and resulting judgement true.

Sources

Hanushek, E. A. (1998), The evidence on class size, W. Allen Wallis Institute of Political Economy

Kahneman, D. (2011), Thinking, fast and slow, Penguin Books

 

 

Gaming

The subject of schools “gaming” school league tables and performance measures such as Progress 8 has made the news recently, so I have decided to contribute my opinion to the mix. Before doing so I need to be clear that I don’t have any particularly strong views with regard to this issue. I therefore believe that my points represent a balanced viewpoint. I will however acknowledge that my assessment of my viewpoint as balanced is based on the context as set by my viewpoint, perception and the paradigms within which I operate as an individual. As such, from the point of view of those reading, including yourself, this may not be balanced after all. I make no apologies for this as all I can offer is my opinion, which is never wrong in that it is my opinion and therefore is formed based on my viewpoint and context.

Back on the subject of “gaming”, the discussion seems to have opposing viewpoints. One of these viewpoints is that a school should try to offer its students the best opportunities for success in the future. As such it is important to enable them to achieve as many successful qualifications as possible. These schools therefore look to enroll students in qualifications which return a successful qualification for minimal effort, such as ECDL.

The other viewpoint is that schools enrolling students in bulk in ECDL are doing so in order to influence league tables and performance measures such as Progress 8.    Educators taking this position are of the opinion that these qualifications are of lesser value than other qualifications which may take longer to achieve or which are more difficult to achieve yet have comparable impact on league tables and other performance measures.

For me there may be truth in both viewpoints. If the studying of specific exams is in the interest of students’ futures then surely it is the correct thing to do. Consider two schools which are identical in outcomes except for the fact that students in one achieve an additional ECDL qualification. Surely this puts students who leave with an additional qualification in a more positive position. I myself worked in a school where we delivered OCR National IT to all students. The reason we did this was the vocational nature of the qualification, which suited our student cohort, plus the breadth of study and options available, which allowed us to accommodate individual student needs and interests.

Equally there is truth in the other viewpoint in that if a school put all students in for the ECDL qualification or the OCR National they may have done so purely in the interest of achieving a better league table position than other schools.   This may put students under stress where the qualification is additional, or may represent an unfair advantage where an “easy” subject has been substituted in place of a more difficult or valued subject with an equivalent or near equivalent league table or performance measurement points worth.

Both of the viewpoints involve identical actions, namely batch enrolling students in a given qualification, yet they result in totally opposing opinions. The key fact is not so much what schools do but why they do it. In one viewpoint it is about the students and the benefit to them, while in the other viewpoint it is about the school and getting the best league table or performance measure result.

If OFSTED are to clamp down on “gaming” they are therefore going to have to try and identify why a school took the chosen action.     How are they going to do this?    How are they going to measure the “intentions” of school leaders?       Are we going to start seeing OFSTED inspectors administering polygraph lie detector tests on school leaders?

I also feel that this discussion has a lesser discussed aspect to it in the value of differing qualifications. This discussion has raged for some time on the value of so called “core” subjects and the perceived lesser value of the arts and creative subjects. The new “gaming” discussion adds differing values in terms of the perceived difficulty level of a course along with the time taken to deliver the course, with shorter courses perceived to have lesser value. Who will decide the relative worth of each course and the total worth of any individual student’s curriculum of study?

We should all be working in the interests of our students to try and provide them with every competitive advantage with regard to Further Education or Higher Education options, or options into employment, or even more generally into their future lives. A key part of this is the qualifications they achieve, so we need to get them everything reasonably possible. In teaching we use every trick in the book to try and make sure students are learning and are ready and able to succeed in whatever assessment is required to achieve a given qualification. If this is “gaming” then maybe we are all involved.

Mandatory Testing?


As I was heading to work on Friday I heard a BBC news story regarding new proposals for “testing” of 4-year-old children at the start of their school experience. This immediately had me asking about the differences between assessment and testing. I am not sure there is a difference, however I am quite happy to listen to anyone who is able to explain this.

For me, independent of the age of students, one of the first things I need to do is to “test” or assess them.   I need to find out a little about them, about the things they like, about the things they are good at and the areas within which they still need to develop.    I have worked in secondary education, further education and higher education and across each stage the first thing I have done with new students is to assess or test them in order to help in planning their learning experience.

So this led me to ask why the story was so newsworthy. My first assumption was that it related to the differing perspectives on and definitions of the terms “assessment” and “testing”. It could be that some see the two terms as meaning the same thing, as I do, while others see each term as meaning something different. This differing perspective leads to the debate around whether the proposal in question is a good or bad thing and therefore to a newsworthy story.

Upon thinking on it further and accepting the commonality of the two terms, I came to think that it is not what the two words mean or “are” which is the issue but the reason why we undertake them. In the case of my testing at the start of working with new students, this is done as I know the benefit such testing will have in terms of providing the best learning experience possible. It is done because good teaching demands it be done. In the case of the news story they are discussing mandated testing. The reason for mandating such testing may be linked to the reasoning I used in deciding to test, however the fact it is mandated detracts from this.

The other issue is what is done with the results. In my example the data is solely for me and to inform learning. There needn’t be a score or a rubric attached. In the case of mandated data collection, those mandating it want the data, which therefore requires quantifiable and comparable scores and grades, or at least we might assume this is the case.

Maybe we need to trust teachers more rather than mandating what must be done as the act of mandating something changes the activity being mandated!

 

Building Testing Machines.

Around 4 or 5 years ago, while working in the Middle East as an educational consultant, I emailed around 200 colleagues asking what they considered the purpose of education to be. I then analysed the words which those who replied used in their responses. At the time the word which came out as the most frequently used was “knowledge”.

At the time I wondered about this given access to the internet and its apparently boundless “knowledge”.     At the time 21st century skills were widely talked about as important however when it came down to it those working in education still clung to the importance of knowledge.   Words such as creativity, critical thinking, collaboration and communication appeared significantly less frequently.
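For readers curious about how such a word count might be done, here is a rough sketch in Python. It is not the method I used back then. The stop-word list, the function name and the sample replies are all invented for illustration and are not the actual responses I received.

```python
import re
from collections import Counter

# Hypothetical list of common words to ignore; a real analysis would need a fuller list.
STOP_WORDS = {"the", "of", "to", "and", "a", "is", "for", "in", "it", "that", "on", "about"}


def word_frequencies(responses):
    """Count how often each non-stop word appears across all responses."""
    counts = Counter()
    for response in responses:
        # Lower-case the text and pull out runs of letters (and apostrophes).
        words = re.findall(r"[a-z']+", response.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts


if __name__ == "__main__":
    # Invented placeholder replies, NOT the replies from my colleagues.
    sample_replies = [
        "Education is about passing on knowledge.",
        "To build knowledge and creativity.",
    ]
    print(word_frequencies(sample_replies).most_common(3))
```

A simple count like this treats every word equally and ignores context, so it is best read as a rough indication of emphasis rather than a precise measure.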

It is a recent blog (See full blog here) which makes me reflect on my findings back then, as it raises the issue of identifying what the purpose of education really is. Mr Ferriter’s argument focuses on EdTech use, although beyond this he goes on to suggest that the superficial usage of EdTech may be the result of the pressures being put on teachers to achieve high student results in standardized tests.

This use of tests, including PISA tests, to measure the success of teaching and of education in a wider sense seems to imply that the purpose of education is to get students high test results. I have very strong beliefs that this narrow view on education is damaging to student learning. As educators, is the purpose of education not to prepare students for the future, with the skills required to deal with the largely unpredictable and the often changeable? Is education not about developing students as adaptable, resilient, self-aware, responsible members of local and global society? And if this is the case, how do a series of test questions fit into the equation?

The big question is how we balance the requirements of accountability and the need for quantifiable and comparable data, such as that presented by testing, with the requirement to develop students as individuals prepared for what lies ahead, and the qualitative data this produces. I don’t know what the answer to this dilemma is, however we are currently progressing steadily towards the quantifiable end of the balance, with the continuing focus being put on exam results and standardized tests. I believe we need to re-establish a balance here before we lose sight of the importance of some of the less quantifiable but equally (and possibly more) important activities carried out within classrooms across the world. After all, are we in the business of building students into test taking machines that regurgitate facts and knowledge, or are we trying to develop individuals capable of lifelong learning?

 

http://www.teachingquality.org/content/blogs/bill-ferriter/blaming-and-shaming-teachers-low-level-edtech-practices    Bill Ferriter   (Sept 2015)