I had been intrigued by the idea emerging from The National Archives (TNA) of the disruptive digital archive (DLF Forum, 2019), which advocated new theories and new practices with digital archives and was more clearly articulated as a Bayesian model of digital preservation risk (iPRES, 2019). These ideas had been forerunners of TNA's project, funded by the National Lottery Heritage Fund and involving colleagues at the Applied Statistics and Risk Unit at the University of Warwick and a range of archive services active in digital preservation from across the local authority, business and higher education sectors.
Risk has always had a high profile in digital preservation, with an early focus on hardware and software obsolescence, but experience has shown that this isn’t as bad as had been feared. I was interested to see how risks could be an active part of planning considerations rather than being the end of the line as a result of inaction.
I approached colleagues at TNA to ask if participation in the project was possible despite being unattached to an archive. Luckily for me, it was. I was invited to join the two-day workshop, which moved online following the shutdown. This meant, of course, missing the networking and catch-up opportunities that come with face-to-face meetings. I also needed to supply my own tea and biscuits.
Known unknowns and the Bayesian model
Whereas Donald Rumsfeld famously said in 2002 about known unknowns “we know there are some things we do not know”, the Bayesian model is based on the idea that knowledge about something unknown can be expressed using probability. Determined not to show my ignorance of the model, I signed up for a free course on Bayesian statistics on OpenLearn which looked at this very topic.
The focus of the workshop was a series of questions, for each of which we were asked to provide three responses: what we thought the answer was, and the lower and upper ends of the range within which we thought the answer might fall. For example (not one of the actual questions)…
Out of 1000 UK archivists how many do you think will be aware of the NDSA Levels of Digital Preservation?
If I thought the answer was 400 but that it could be as low as 100 or as high as 600, these would be my three responses to this question. Somebody else might think that awareness was much higher than this and so think the answer was 850, with a likely range of 775 to 900. This gives those undertaking the analysis not only data but also an expression of the degree of certainty or uncertainty about the topic. The power of the process comes from the diverse experiences, knowledge and perspectives that those contributing bring with them.
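To make the idea concrete, here is a minimal sketch of how two three-point responses like those above could be turned into probability distributions and combined. It is only an illustration: the workshop write-up does not say how the responses were actually modelled, so the triangular distribution, the equal weighting of participants, and the names `responses`, `triangular_mean` and `sample_pooled` are all assumptions of mine.

```python
import random
import statistics

# Each response is (low, best guess, high) out of 1000 archivists.
# These are the two illustrative responses from the text, not real workshop data.
responses = [
    (100, 400, 600),
    (775, 850, 900),
]


def triangular_mean(low, best, high):
    """Mean of a triangular distribution with the given minimum, mode and maximum."""
    return (low + best + high) / 3


def sample_pooled(responses, n=10_000, seed=0):
    """Draw samples from an equally weighted mixture of triangular distributions.

    Assumption: every participant counts equally, and each three-point answer
    is read as the (min, mode, max) of a triangular distribution.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        low, best, high = rng.choice(responses)
        # random.triangular takes its arguments in the order (low, high, mode)
        draws.append(rng.triangular(low, high, best))
    return draws


draws = sample_pooled(responses)
pooled_mean = statistics.mean(draws)
pooled_spread = statistics.stdev(draws)

print(f"Pooled mean estimate: {pooled_mean:.0f} out of 1000")
print(f"Pooled standard deviation: {pooled_spread:.0f}")
```

The wide spread of the pooled samples reflects both each person's own range and the disagreement between the two of them, which is the kind of uncertainty the three-point format is meant to capture.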
A key part of the workshop was the review of the questions, to ensure we all interpreted them in the same way. Definitions had been provided for many of the specialist terms, but it was never the specialist terminology that was hotly debated. Instead it was words like ‘all’, ‘some’, ‘most’ and ‘frequently’ that we discussed, sometimes at length.
Discussion and review
We were left to answer the questions without phoning a friend or using Google, and ended day 1 by sending our answers to be compiled overnight. We returned on day 2 and went through the questions again, with access to the anonymised responses from all participants, and discussed why scores might be high or low. The purpose was not to try to reach a consensus. The variation in skills, knowledge and capabilities that we all knew existed, and had experienced in the real world, could be seen in the responses. Having discussed the questions again, we then had a chance to revisit our individual responses: some of my answers went up, some went down and some remained the same.
Now that we have submitted our second responses, the work has been passed across to colleagues at TNA and Warwick to build into the final model. As things currently stand, the plan is to launch the model in time for the ARA Conference in early September. Two DPC online workshops have been announced for 16th June and 17th July, with written articles also in the pipeline.
As a supporter of both the NDSA Levels of Digital Preservation and the DPC Rapid Assessment Model (both of which I will be returning to in future blogs), I am really looking forward to seeing the results and how this tool can help us with the decisions we make. I’d like to thank colleagues at TNA for allowing me to contribute to the work. It was intense, but the mood was lifted by variations in virtual backgrounds and the occasional “sorry, I forgot to unmute”.