School Data: The tip of an iceberg

Schools gather a wealth of data in their everyday operation: everything from attendance information and academic achievement to library book loans, free school meals and a wide range of other data. We use this data regularly; however, I think we are missing out on many of the opportunities which this wealth of data might provide.

The key for me lies in statistical analysis of the data, looking for correlations. Is there a link, for example, between the amount of reading a student does, as measured by the number of library loans, and their academic performance? Are there any indicators which might help us in identifying students who are more likely to underperform?
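As a rough illustration of the kind of check described above, the sketch below computes a Pearson correlation coefficient between library loans and exam scores. The figures are made up purely for illustration, and the `pearson` helper is my own minimal version rather than anything a school system provides.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)

# Made-up illustrative figures, NOT real school data.
library_loans = [2, 5, 1, 8, 4, 7]        # loans per student over a term
exam_scores = [48, 60, 45, 75, 58, 70]    # the same students' exam scores

r = pearson(library_loans, exam_scores)
print(f"correlation between loans and scores: r = {r:.2f}")
```

A value of r near +1 or -1 suggests a strong linear relationship and a value near 0 suggests little or none. Even a strong correlation would not show that borrowing more books causes better results.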

The issue here is how the data is stored. A large amount of the data is stored in tables within our school management system; however, no easy way exists to pull different data together in order to search for correlations. I can pull out data showing which students have done well, which subjects students perform well in, etc.; however, I can’t easily cross-link this with other information such as the distance the student travels to school or their month of birth. Some of the data may exist in separate systems such as a separate library management system, print management system and catering system. This makes it even more difficult to pull data together.

A further issue is that the data in its raw format may not make it easy for correlations to be identified. A student’s postcode, for example, is not that useful in identifying correlations; however, if we convert it to a distance from the school we have a better chance of identifying a correlation.
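The postcode example could be sketched as below, using the haversine formula for straight-line distance between two latitude/longitude points. The postcode labels, the coordinates and the `POSTCODE_COORDS` lookup table are all hypothetical placeholders; in practice the coordinates would have to come from a real postcode lookup source.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Hypothetical lookup: placeholder postcode labels mapped to made-up coordinates.
POSTCODE_COORDS = {
    "POSTCODE_A": (51.00, -3.00),
    "POSTCODE_B": (51.10, -3.20),
}
SCHOOL_COORDS = (51.02, -3.05)  # made-up school location

def distance_to_school(postcode):
    """Straight-line distance (km) from a postcode's coordinates to the school."""
    lat, lon = POSTCODE_COORDS[postcode]
    return haversine_km(lat, lon, *SCHOOL_COORDS)

for code in POSTCODE_COORDS:
    print(code, round(distance_to_school(code), 1), "km")
```

This gives distance as the crow flies rather than travel distance or travel time, which may matter more for some questions.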

In schools we continue to sit on an iceberg’s worth of data, although all we can perceive is that which lies above the water. We perceive a limited set of possibilities in terms of what we can do with the data: analysing it in terms of pupil performance against baselines, with filtering possible by gender, SEN status and a few other flags. However, given the wealth of data we have, this is just the start of what is possible. We just need to be able to look below the water, as the potential to use the data better and more frequently is there, and in doing so we may be able to identify better approaches and more effective early interventions to ensure the students in our care achieve the best possible outcomes.


Data: Making better use?

One of the areas which I want to work on over the next year is that of Management Information. In my school, as in almost all schools, we have a Management Information System (MIS), sometimes referred to as an SIS (School or Student Information System). This system stores a large amount of student data, including information on their performance as measured by assessments or by teacher professional judgement. We also have data either coming from or stored in other data sources such as GL or CEM in relation to baseline data. These represent the tip of the iceberg in terms of the data stored by, or at least available to, schools and their staff.

Using the data we then generate reports which do basic summaries or analysis based on identified factors such as the gender of students, whether they are second language learners of English, etc. Generally these reports are limited in that they consider only a single factor at a time, as opposed to allowing for analysis of compound factors. So gender might be considered in one report and then age in another, but not gender and age simultaneously. In addition the reports are generally presented in a tabular format, with rows and columns of numeric values which require some effort to interpret. You can’t just look at a tabular report and make a quick judgement; instead you need to exercise some mental effort in examining the various figures, considering them and then drawing a conclusion.
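To make the difference between single-factor and compound-factor analysis concrete, here is a minimal sketch. The records and field names are invented for illustration, and `mean_score_by` is a hypothetical helper, not a feature of any particular MIS.

```python
from collections import defaultdict
from statistics import mean

# Invented illustrative records, NOT real student data.
records = [
    {"gender": "F", "year": 7, "score": 62},
    {"gender": "F", "year": 7, "score": 70},
    {"gender": "M", "year": 7, "score": 55},
    {"gender": "M", "year": 8, "score": 68},
    {"gender": "F", "year": 8, "score": 74},
    {"gender": "M", "year": 8, "score": 60},
]

def mean_score_by(records, *factors):
    """Average score for every combination of the named factors."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[f] for f in factors)  # e.g. ("F", 7)
        groups[key].append(rec["score"])
    return {key: mean(scores) for key, scores in groups.items()}

by_gender = mean_score_by(records, "gender")                    # one factor
by_gender_and_year = mean_score_by(records, "gender", "year")   # compound

print(by_gender)
print(by_gender_and_year)
```

The same function handles one factor or several, which is the flexibility the fixed single-factor reports lack. With real data the group sizes shrink quickly as factors are combined, so small groups need to be read with care.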

My focus is on how we can make all the data we have useful and more usable. Can we allow staff to explore the data in an easier way, allowing for compound factors to be examined? Can we create reports which present data in a form from which a hypothesis can be quickly drawn? Can the data be made live and dynamic, as opposed to fixed into the form of predetermined “analysis” reports? Can we adopt a broader view of what data we have and therefore gather and make greater use of a broader dataset?

I do at this point raise a note of caution.   We aren’t talking about doing more work in terms of gathering more data to do more analysis.  No, we are talking about allowing for the data we already have to be better used and therefore better inform decision making.

I look forward to discussing data on Saturday as part of #EdChatMeda. It may be that after this I will be better able to answer the above questions.

Data, data and more data

This morning it was the turn of the NHS to be the focus of the morning TV discussion about how things aren’t going well. I suppose I should be partially thankful as this takes the spotlight off education, at least for a short while. That said, it also once again shows the superficial use of data.

This morning’s TV took some time, along with fancy graphics, to outline how NHS waiting times had increased. The specific figure they presented was the percentage of patients at A&E who were seen within 4 hours. This seems like a reasonable statistic to use from the perspective of a patient, as it suggests the likelihood that, should I need to turn up at A&E, I would be seen in 4 hours or less. I suspect the fact that it is so potentially meaningful for prospective patients, the average TV viewer, is why they picked this statistic over others.

The issue with this is what it doesn’t tell us: the additional context which may be important in interpreting the figures. Over the period under consideration, did the number of patients attending A&E remain static, or did it in fact increase, which may be a contributing factor to increased waiting times? A briefing report by Carl Baker from November 2016 suggested that in 2016 the number of patients at major A&E departments increased 6.3% over attendance levels in 2015. Were there any changes in the demographics of patients attending A&E? An increase in elderly people attending may mean that patients are less likely to be quickly seen and discharged, again contributing to increased waiting times. What about the staffing levels of A&E over the period? Did these change, as a reduction in staffing may account for increased waiting times? Also, the figures look specifically at average data for the whole of England; were there any regional variations? Personally I live in the South West and feel that it is difficult to access a doctor, which may mean that I would attend A&E on occasions where someone with more ready access to a GP would not. Are there also differences between A&Es serving urban and rural areas? Are there differences between A&Es serving larger and those serving smaller populations or population densities?

In the current performance indicator and accountability led environment we often focus on specific figures such as the percentage of patients seen in 4 hours or the number of pupils achieving A*-C, or on Progress 8, PISA, EMSA, TIMSS, PIPS or other measures. Each of these pieces of data is informative and tells us something; equally, however, there are a lot of things that it doesn’t tell us. We need to ask what this data doesn’t tell us and seek data to add context.

Only with context is data useful.

Accident and Emergency Statistics: Demand, Performance and Pressure, C Baker (2016), House of Commons Briefing Library (6964)


Some thoughts on GCSE and A-Level results

Having read various articles following the recent A-Level and GCSE results I can’t help but think that schools, and more importantly education in general, need to make a decision as to what we are seeking to achieve, and stop acting reactively to limited data which has been used to draw generalised conclusions.

Take for example the shortage of STEM graduates and students.    This was and still is billed as a big issue which has resulted in a focus on STEM subjects in schools.   More recently there has been a specific focus on computer programming and coding within schools.     In a recent article it was acknowledged that the number of students taking A-Level Computing had “increased by 56% since 2011” (The STEM skills gap on the road to closing, Nichola Ismail, Aug 2016).     This appears to suggest some positive movement however in another article poor A-Level ICT results were cited as a cause for concern for the UK Tech industry (A Level Results raise concern for UK tech industry, Eleanor Burns, Aug 2016).  Now I acknowledge this data is limited as ideally I need to know whether ICT uptake has been increasing and also whether A-Level Computing results declined, however it starts to paint a picture.

Adding to this picture is an article from the Guardian discussing entries:

Arts subjects such as drama and music tumbled in terms of entries, and English was down 5%. But it was the steep decline in entries for French, down by 6.5% on the year, as well as German and Spanish, that set off alarm bells over the poor state of language teaching and take-up in Britain’s schools.

(Pupils shun English and physics A-Levels as numbers with highest grades falls, Richard Adams, Aug 2016)

So we want STEM subjects to increase, and they seem to be for computing; however, we don’t want modern languages entries to fall. Will this mean that next year there will be a focus on encouraging students to take modern foreign languages? And if so, and this results in the STEM numbers going down, will we then re-focus once more on STEM subjects until another subject shows signs of suffering?

It gets even more complex when a third article raises the issue of Music A-Level entries, which “dropped by 8.8% in a single year from 2015 and 2016” (We stand back and allow the decline of Music and the Arts at our peril, Alun Jones, Aug 2016). Drama entries are also shown to have seen a decrease this year (Dont tell people with A-Levels and BTecs they have lots of options, Jonathan Simons, Aug 2016). So where should our focus lie? Should it be on STEM subjects, foreign languages, drama or music?

I suspect that further research would result in further articles raising concerns about still further subjects, either in the entries or the results. Can we divide our focus across all areas, or is there a particular area, such as STEM subjects, which is more worthy of focus? Do the areas for focus change from year to year?

As I write this my mind drifts to the book I am currently reading, Nassim Taleb’s The Black Swan, and to Taleb’s snooker analogy as to variability. We may be able to predict a single snooker shot with a reasonable level of accuracy; however, as we try to predict further ahead we need more data. As we predict five shots ahead the quality of the surface of the table, the balls, the cue, the environmental conditions in the room, etc. all start to matter more and more, and therefore our ability to predict becomes less and less accurate. Taking this analogy and looking at schools, what chance do we have of predicting the future and what the UK or the world will need from our young adults? How can we predict the future requirements which will be needed from the hundreds of thousands of students across thousands of schools, studying a variety of subjects from a number of different examining bodies, in geographical locations across the UK and beyond?

These generalisations of data are subject to too much variability to be useful.    We should all focus on our own schools as by reducing the scope we reduce the variability and increase the accuracy.   We also allow for the context to be considered as individual school leaders may know the significant events which may impact on the result of their cohort, individual classes or even individual students.  These wide scale general statements as to the issues, as I have mentioned in a number of previous postings, are of little use to anyone.   Well, anyone other than editors wishing to fill a space in a newspaper or news website.






Some thoughts on Data

A recent article in the Telegraph got me thinking once more about data. This also got me thinking about the book “Thinking, Fast and Slow” by Daniel Kahneman, which I have only recently finished reading. The book highlighted a number of issues which I feel have implications for education and need to be considered by school leaders.

Firstly the small numbers effect: the Bill and Melinda Gates Foundation commissioned a study to examine schools in search of the most effective schools. It found, unsurprisingly, that small schools, in terms of student numbers, achieved the best results, ahead of larger schools. Contradictorily, it also found that small schools achieved the worst results. The reason for this, as explained by Kahneman, is that where a data set contains only a small number of items the potential for variability is high. As such, due to a variety of random variables and possibly a little helping of luck, some small schools do particularly well, outperforming big schools. Other small schools are not so lucky and the variables don’t fall so well, resulting in the worst results.


To clarify this, consider throwing three darts at a dart board, aiming for the centre. This represents the results of a school with a small number of students, with higher scores being those darts landing nearer the centre and lower scores being those landing further from it. In the case of student results an average result would then be calculated for the school, and the same can be done looking at the position of the darts. Assuming you are not a professional darts player, you may do well or you may not do so well, due to a variety of random variables. Given the limited number of darts the potential for variability is high, hence a high average or a low average is very possible. Next consider continuing to throw until you have thrown sixty darts at the dart board, taking the average across all the throws. Given the number of darts, the average will regress towards your mean darts throwing ability. The increased number of data items means that variability is reduced, as each significantly good or poor throw is averaged out among the other throws.
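The darts example can be simulated with a short sketch. The assumptions here are mine: each throw is modelled as a random score drawn from a normal distribution with mean 50 and standard deviation 15, and 2,000 imaginary players are compared. None of these figures come from Kahneman or from real school data.

```python
import random
from statistics import mean, pstdev

def average_score(n_darts, rng):
    """Average of n_darts throws; each throw is a noisy score around 50."""
    return mean([rng.gauss(50, 15) for _ in range(n_darts)])

def spread_of_averages(n_darts, n_players=2000, seed=1):
    """Standard deviation of the per-player averages across many players."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    averages = [average_score(n_darts, rng) for _ in range(n_players)]
    return pstdev(averages)

small = spread_of_averages(3)    # "small school": three darts each
large = spread_of_averages(60)   # "large school": sixty darts each

print(f"spread of averages with 3 darts:  {small:.2f}")
print(f"spread of averages with 60 darts: {large:.2f}")
```

Every imaginary player has exactly the same underlying ability, yet the three-dart averages vary far more than the sixty-dart averages. The best and the worst averages will therefore tend to come from the three-dart group, which is the pattern seen with small schools.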

Within schools a great deal of value is being attached to statistical analysis of school data, including standardised testing; however, care must be taken. As I have suggested above, a statistical analysis showing school A is better than school B could easily be the result of random factors such as school size, school resourcing and funding, etc. as much as it may be related to better quality teaching and learning, and improved student outcomes.

Another issue is how we respond to the results. Kahneman suggests that commonly we look for causal factors. As such we seek to associate the data with a cause, which in schools could be a number of different things; however, our tendency is to focus on that which comes easily to mind. As such poorer results (and better ones, although not as often) are most often attributed to teachers and the quality of their teaching, as this is what is most frequently on the mind of school leaders. We often arrive at this conclusion without considering other possible explanations such as the variable difficulty of the assessments, assessment implementation, the specific cohort concerned, the sample size as discussed earlier and a multitude of other potential factors. We also, due to arriving so quickly at a causal factor which clearly must be to blame and therefore needs to be rectified, fail to consider the statistical validity of our data. We fail to consider the margins for error which may exist in our data, including what we may consider acceptable margins for error. We also fail to consider a number of other factors which influence our interpretation of the data, including the tendency to focus more on addressing the results which are perceived to be negative. This constant focus on the negative can result in a blame culture developing, which can lead to increasingly negative results and increasing levels of blame. Maybe an alternative approach which may work would be to focus more on the marginally positive results, how they were achieved and how they could be built upon.

The key issue in my belief is that we need to take care with data and the conclusions we infer from it.   We cannot abandon the use of data as how else would we measure how we are doing, however equally we cannot take it as fully factual.   The world is a complex place filled with variables, randomness and luck, and we need to examine school data bearing this fact in mind.   We also need to bear in mind that data is a tool to help us deliver the best learning opportunities for students;  data is not an end in itself!


Inconsistency in the quality of teaching

I have come across the above statement or similar statements across schools both here and in the Middle East. At first reading I would suggest that everyone, myself included, will take this to be a negative comment. On reflection I am not so sure it necessarily is negative, or in fact that it tells us anything.

Consider the “average” school and let’s consider that the measure of quality of teaching is student outcomes. Now I know this is a very limited model; however, it will hopefully serve its purpose in terms of proving a point which could equally be proven by using a different measure for the quality of teaching.

Within this “average” school there will be some above average teachers where outcomes come out as very positive. There will also be those that come out as below average. Would this be considered consistent, given that having different qualities of teaching would clearly suggest inconsistency?

Let’s assume what is meant by inconsistency is an inconsistency when compared with the national profile for the quality of teaching within any given school. In this case our average school now becomes consistent in terms of quality of teaching. Consistency is therefore referring to the distribution of individuals within the school with regard to the quality of teaching, and how this compares to other schools.

Modifying the scenario a little, let’s say that some of the so-called “weaker” teachers’ performance only gets worse while the stronger teachers only get better. Our average still remains the same; however, is the school any more or less consistent given the wider variance between teachers and given the difference between this profile and the profile of the “average” school?
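The scenario above, where the average stays put while the spread widens, can be shown with a few invented numbers. The scores below are purely hypothetical teacher-level outcome figures and `profile` is an illustrative helper of my own.

```python
from statistics import mean, pstdev

def profile(scores):
    """Return (average, spread) for a list of teacher outcome scores."""
    return mean(scores), pstdev(scores)

# Hypothetical outcome scores for five teachers, NOT real data.
before = [40, 50, 50, 50, 60]
after = [30, 50, 50, 50, 70]   # weakest gets worse, strongest gets better

avg_before, spread_before = profile(before)
avg_after, spread_after = profile(after)

print(f"before: average {avg_before}, spread {spread_before:.2f}")
print(f"after:  average {avg_after}, spread {spread_after:.2f}")
```

Both profiles share the same average, so a judgement based on the average alone would see no change, while one based on spread would see the school as less consistent.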

If some of the teachers formerly within the “average” band improve this would shift the average and change the distribution.   Is this inconsistency and if so could it not be viewed as a positive inconsistency?

Now I was considering using some further examples; however, I have decided not to. Instead I will point out my belief that teaching is a social activity involving a class full of students and a teacher all interacting. Given it is a social activity involving 30 or more human beings, and is therefore influenced and affected by a multitude of different dynamic variables, consistency is highly unlikely. Teaching is very much like the systems described by chaos theory in that it is highly sensitive to its conditions, which are frequently changing. As such, how could any school be expected to demonstrate consistency? As with chaos theory, we can only possibly perceive a pattern by looking at the much wider picture, as under close inspection we see nothing except the variability and the differences. How might an inspection team or an internal mock-sted see this big picture? I doubt they would, so how can a judgment indicating an inconsistency be arrived at?

And maybe something different, unique or not fitting in with the usual run of play may be a positive thing.   So maybe consistency isn’t all it’s meant to be!

Silos of Data

Day 11 in the #29daysofwriting house and the housemates are getting a little restless………

Sorry couldn’t resist!  This posting every day is starting to feel a little like the diary room on an episode of Big Brother.   It is also getting steadily more difficult to decide on the topic of the day.

Today I would like to just spend 29 minutes writing on systems. In schools we have a large number of different systems. We have a school (or management) information system, an HR and payroll system, an email and file storage system, a library system, a bus/transport system and a multitude of other systems.


Each system is designed for a specific purpose.   The SIS (or MIS) system has all the personal details of students along with their academic performance data.   The library system has details of students, books and loans.   The HR system has details about all of the staff.

Each system reports its data in a specific way.  The SIS system can produce class registers and parental reports, while the HR system can produce staff lists and the Library system information about student lending habits.

The issue is that even where the systems are supposedly “integrated” in actual fact they are not.    The data exists in Silos, independently in each different system albeit linked by a common identifier such as a student ID number or other ID number.

Having recently read about the impact of silos and how overcoming them can have a significant effect, it makes me wonder about the silos in school systems. If we could extract all the data into a single common location where we could apply various business intelligence tools to analyse it, we would likely be able to draw new conclusions and through doing so be better informed. We might be able to identify linkages which previously weren’t apparent. Maybe students in particular classes or with particular teachers borrow more books and maybe, of these students, a majority perform better. Obviously I speculate here for illustrative purposes. The key point is that we might be able to identify patterns which currently cannot be identified due to the siloed nature of the data.
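As a minimal sketch of pulling two silos together, the example below joins records from a pretend SIS and a pretend library system using the shared student ID. The data, the field names and the `combine` helper are all invented for illustration; real systems would need an export or API step first, which is not shown.

```python
from collections import Counter

# Pretend SIS export: student ID -> details (invented data).
sis = {
    1001: {"name": "Student A", "avg_grade": 72},
    1002: {"name": "Student B", "avg_grade": 58},
    1003: {"name": "Student C", "avg_grade": 65},
}

# Pretend library export: one (student ID, book title) row per loan.
library_loans = [
    (1001, "Book X"),
    (1001, "Book Y"),
    (1003, "Book Z"),
]

def combine(sis, loans):
    """Join the two sources on student ID, adding a loan count per student."""
    counts = Counter(student_id for student_id, _ in loans)
    return [
        {"student_id": sid, **details, "loan_count": counts.get(sid, 0)}
        for sid, details in sorted(sis.items())
    ]

combined = combine(sis, library_loans)
for row in combined:
    print(row)
```

Once the data sits in one combined table like this, questions that span systems, such as loans against grades, become simple to ask. Students with no loans are kept with a count of zero rather than dropped, since their absence from the library data is itself information.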