Schools gather a wealth of data in their everyday operation: attendance information, academic achievement, library book loans, free school meals and a wide range of other data. We use this data regularly; however, I think we are missing out on many of the opportunities this wealth of data might provide.
The key for me lies in statistical analysis of the data, looking for correlations. Is there a link, for example, between the amount of reading a student does, as measured by the number of library loans, and their academic performance? Are there any indicators which might help us identify students who are more likely to underperform?
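To make the library-loans question concrete, here is a minimal sketch of a Pearson correlation between two per-student measures. The loan counts and scores are invented purely to show the calculation; they are not real school data, and the function name is just an illustrative choice.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel, so they are omitted)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values: library loans per student and an assessment score
loans = [2, 5, 8, 12, 15]
scores = [48, 55, 61, 70, 74]
print(round(pearson(loans, scores), 3))
```

A coefficient close to +1 would suggest that students who borrow more also tend to score higher; on its own it says nothing about which, if either, causes the other.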
The issue here is how the data is stored. A large amount of the data is held in tables within our school management system; however, there is no easy way to pull different data together to search for correlations. I can pull out data showing which students have done well, which subjects students perform well in, and so on; however, I can't easily cross-link this with other information such as the distance the student travels to school or their month of birth. Some of the data may also sit in entirely separate systems, such as a library management system, print management system or catering system, which makes it even more difficult to pull together.
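Pulling data from separate systems together essentially means joining records on something they share. The sketch below assumes, optimistically, that every system uses the same student identifier; in practice that is often not the case and matching is harder. The IDs, field names and the `join_on_student_id` helper are all hypothetical.

```python
def join_on_student_id(*sources):
    """Merge several {student_id: {field: value}} mappings into one record per student.

    Only students present in every source are kept (an inner join), so each
    merged record has all the fields needed for a cross-system comparison.
    If two sources use the same field name, the later source's value wins.
    """
    if not sources:
        return {}
    common_ids = set(sources[0])
    for source in sources[1:]:
        common_ids &= set(source)
    merged = {}
    for student_id in sorted(common_ids):
        record = {}
        for source in sources:
            record.update(source[student_id])
        merged[student_id] = record
    return merged

# Hypothetical extracts from three separate systems
mis = {"S001": {"score": 62, "birth_month": 9}, "S002": {"score": 71, "birth_month": 3}}
library = {"S001": {"loans": 4}, "S002": {"loans": 11}, "S003": {"loans": 7}}
catering = {"S001": {"fsm": True}, "S002": {"fsm": False}}

for student_id, record in join_on_student_id(mis, library, catering).items():
    print(student_id, record)
```

An inner join is used here so that only complete records are compared; an outer join that keeps students missing from one system would need a decision about how to treat the gaps.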
A further issue is that the data in its raw format may not make it easy for correlations to be identified. A student's postcode, for example, is not that useful in identifying correlations; however, if we convert it to a distance from the school we have a better chance of identifying one.
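Converting a postcode into a distance is a two-step job: look the postcode up to get a latitude and longitude, then compute the distance to the school. The sketch below covers only the second step, using the haversine (great-circle) formula. The postcode lookup is assumed to come from some postcode-to-coordinate dataset and is not shown, and the coordinates used are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius of the Earth

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle ("as the crow flies") distance in km between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical coordinates; in practice each postcode would first be looked up
# in a postcode-to-coordinate dataset (an assumption, not shown here)
school = (50.7260, -3.5275)
home = (50.7480, -3.5000)
print(f"{haversine_km(*school, *home):.2f} km")
```

Straight-line distance is only an approximation of the journey a student actually makes; road distance or travel time might turn out to be a more useful derived measure, which is itself something the data could test.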
In schools we continue to sit on an iceberg's worth of data, yet all we can perceive is the part that lies above the water. We perceive a limited set of possibilities in terms of what we can do with the data: analysing pupil performance against baselines, with filtering possible by gender, SEN status and a few other flags. Given the wealth of data we have, however, this is just the start of what is possible. We just need to be able to look below the water, as the potential to use the data better and more frequently is there, and in doing so we may be able to identify better approaches and more effective early interventions to ensure the students in our care achieve the best possible outcomes.
As we make greater use of technology in our schools, we make greater use of online services. We might use an online communication tool to improve communications with parents. We might use Google Apps or Office 365 to give staff and students cloud storage so they can access their files when away from school or on any device. We might engage with an online maths tutorial site so students can undertake self-directed study and further develop their maths skills. We might use a site to manage trips or resource bookings within our school. The number of online services we use in schools is increasing, and we are therefore sharing more and more data with online service vendors.
The above is important to note given that the new General Data Protection Regulation (GDPR) is speeding towards us. It comes into operation in May 2018 and will put the onus on all organisations to prove that they comply. It is therefore important that all organisations, including schools, get a handle on the data they hold and how it is stored and processed. For schools, part of this involves examining where third-party services are being used, such that the school's data is processed and/or stored by these service providers. We need to be asking what these service providers do to ensure the security of our data.
To help with this, given the need to review third parties and the increasing use of third-party online sites, the government has created a self-certification process through which vendors offering cloud software services to schools can self-certify their provision in relation to data protection. You can view this here. What worries me is that, as I write this, only 38 vendors are listed as appearing to have submitted a self-certification. This is only the very tip of the iceberg compared with the vast range of services being used by schools.
We all need to push vendors to answer questions in relation to the protection of our school data. We need to push them to self-certify and to share what they are doing. We need to ask the difficult questions now before they are asked of us later.
Have you considered the protection of school data on third-party services lately? It is time you did!
One of the areas I want to work on over the next year is Management Information. In my school, as in almost all schools, we have a Management Information System (MIS), sometimes referred to as a SIS (School or Student Information System). This system stores a large amount of student data, including information on their performance as measured by assessments or by teacher professional judgement. We also have baseline data either coming from, or stored in, other sources such as GL or CEM. These represent only the tip of the iceberg in terms of the data stored by, or at least available to, schools and their staff.
Using this data we then generate reports which provide basic summaries or analysis based on identified factors such as the gender of students, whether they are second language learners of English, etc. Generally these reports are limited in that they consider only a single factor at a time rather than allowing for analysis of compound factors. So gender might be considered in one report and age in another, but not gender and age simultaneously. In addition, the reports are generally presented in a tabular format, with rows and columns of numeric values which require some effort to interpret. You can't just look at a tabular report and make a quick judgement; instead you need to exercise some mental effort in examining and considering the various figures before drawing a conclusion.
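The difference between single-factor and compound-factor analysis can be sketched in a few lines. The hypothetical `mean_by_factors` helper below averages a measure for every combination of the chosen factors, so passing `["gender"]` reproduces a typical single-factor report while `["gender", "age"]` gives the compound view. The records and field names are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

def mean_by_factors(records, factors, measure):
    """Average `measure` for every combination of values of the given factor fields."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[f] for f in factors)  # e.g. ("M", 14)
        groups[key].append(record[measure])
    return {key: mean(values) for key, values in sorted(groups.items())}

# Hypothetical student records
students = [
    {"gender": "F", "age": 14, "score": 68},
    {"gender": "F", "age": 15, "score": 72},
    {"gender": "M", "age": 14, "score": 61},
    {"gender": "M", "age": 14, "score": 65},
    {"gender": "M", "age": 15, "score": 70},
]

# Single-factor view, as in a typical report
print(mean_by_factors(students, ["gender"], "score"))
# Compound view: gender and age considered simultaneously
print(mean_by_factors(students, ["gender", "age"], "score"))
```

One caveat is that each added factor splits students into smaller groups, so averages for very small groups need to be read with care.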
My focus is on how we can make all the data we have useful and more usable. Can we allow staff to explore the data in an easier way, allowing for compound factors to be examined? Can we create reports which present data in a form from which a hypothesis can be quickly drawn? Can the data be made live and dynamic rather than fixed in the form of predetermined "analysis" reports? Can we adopt a broader view of what data we have, and therefore gather and make greater use of a broader dataset?
At this point I would raise a note of caution. We aren't talking about doing more work in terms of gathering more data to do more analysis. No, we are talking about allowing the data we already have to be better used, and therefore to better inform decision making.
I look forward to discussing data on Saturday as part of #EdChatMeda. It may be that after this I will be able to better answer the above questions.
This morning it was the turn of the NHS to be the focus of the morning TV discussion about how things aren't going well. I suppose I should be partially thankful, as this takes the spotlight off education, at least for a short while. That said, it also once again shows a superficial use of data.
This morning's TV took some time, along with fancy graphics, to outline how NHS waiting times had increased. The specific figure they presented was the percentage of patients at A&E who were seen within 4 hours. This seems like a reasonable statistic to use from the perspective of a patient, as it suggests how likely it is that, should I need to turn up at A&E, I would be seen in 4 hours or less. I suspect this is why they picked it over other statistics: it is potentially so meaningful to prospective patients, which is to say the average TV viewer.
The issue is what it doesn't tell us: the additional context which may be important in interpreting the figures. Over the period under consideration, did the number of patients attending A&E remain static, or did it in fact increase, which may be a contributing factor to increased waiting times? A briefing report by Carl Baker from November 2016 suggested that in 2016 the number of patients at major A&E departments increased by 6.3% over attendance levels in 2015. Were there any changes in the demographics of patients attending A&E? An increase in elderly people attending may mean that patients are less likely to be seen and discharged quickly, again contributing to increased waiting times. What about A&E staffing levels over the period? Did these change, as a reduction in staffing may account for increased waiting times? Also, the figures look specifically at average data for the whole of England; were there any regional variations? Personally, I live in the South West and find it difficult to access a doctor, which may mean that I would attend A&E on occasions where someone with readier access to a GP would not. Are there also differences between A&Es serving urban and rural areas? Are there differences between A&Es serving large and smaller populations, or areas of different population density?
In the current performance-indicator and accountability-led environment we often focus on specific figures such as the percentage of patients seen within 4 hours, or the number of pupils achieving A*-C, or Progress 8, PISA, EMSA, TIMSS, PIPS or other measures. Each of these pieces of data is informative and tells us something; however, equally, there are a lot of things it doesn't tell us. We need to ask what this data doesn't tell us and seek data to add context.
Only with context is data useful.
Baker, C. (2016) Accident and Emergency Statistics: Demand, Performance and Pressure. House of Commons Library Briefing Paper 6964.