E-ARK is a multinational big data research project that aims to improve the methods and technologies of digital archiving, in order to achieve consistency on a Europe-wide scale.
Tackling a range of problems associated with independent record-keeping technologies, systems and practices, E-ARK aims to impact the development of internationally accessible archives through: the provision of technical specifications and tools, the development of an integrated archiving infrastructure, the demonstration of improved availability, access and use, and the rigorous analysis of aggregated sets of archival data.
Running from 1st February 2014 to 31st January 2017, it has been co-funded by the European Commission under its ICT Policy Support Programme (PSP) within its Competitiveness and Innovation Framework Programme (CIP).
The E-ARK project has provided:
Guidelines on a pan-European e-archiving system as part of the EC e-infrastructure,
Open Archival Products (tools, services, framework, metadata specifications),
Open Technical Products (tools, services, metadata specifications),
Open Operational Products (ingest and access tools, services, metadata specifications),
Open Access tools, services, metadata specifications, including data mining tools for business intelligence,
Open interfaces from tools, services, metadata specifications to existing Systems Products,
Outcomes of legal study,
Outcomes of pilots, especially where similar archival material to that under consideration was processed,
Project papers on details of integration work undertaken.
In recent years, there has been a fundamental change in the notions surrounding what constitutes archiving. With the onset of open access and e-government policies, the image of the archive as the place where precious documents are kept hidden away forever has had to give way to alternative scenarios. E-government legislation across Europe and beyond has brought about a situation whereby archives are obliged to accept, store, and provide access to digital data on an ongoing basis. However, relatively few memory organisations have the sophisticated digital archiving infrastructure required to handle all aspects of these activities.
The process of gathering electronic content must take into account changed relationships between governments, governments and citizens, and governments and business. The move to e-interactions is supported by new business systems that streamline and automate transactions, enable integration of information and service delivery, and enhance collaboration between participants. Such changes in the way government business is carried out have significant implications for how public administrations document their activities, and make that information available to both government and citizens to aid future decision making and accountability.
These changes also need to be addressed in the archival environment within memory organisations. We need to ensure that all such digital information is appropriately gathered, along with all the contextual information required to ensure it remains comprehensible and accessible over the long term. The process of developing, implementing, and maintaining the tools, standards, and administrative processes required to support this activity is by no means a straightforward exercise.
Another issue which memory institutions must address is changed expectations: everyone in the value chain now demands more in terms of discovery, access and re-use. The desire to valorise archival material and make it widely accessible is also part of the sea-change overtaking archival practice. Academic researchers, analysts from enterprise and commerce, and citizens must all be supported as users of the valuable digital holdings residing in Europe's multifarious digital archives. New and enhanced discovery methods need to be developed to support the full exploitation of our shared digital cultural heritage, and the expansion of the European Digital Single Market which underpins our digital economy.
There are also implications brought about by the scale of operations involved. The vast quantities of data of ever-increasing complexity are potentially overwhelming. The rapid influx of material poses real challenges for the archivists and administrators managing the process, as well as for the businesses, researchers, and citizens who use the material.
Big Data is often hailed as a solution, but the underlying mechanics of Big Data are generally not well understood by end users. Perhaps more importantly, many researchers are not sufficiently familiar with the basic metadata practices required to enable them to track and query their data in the future. With the ever-increasing amounts of data involved, the failure to employ best practice is more than simply ‘problematic’: the adage ‘garbage in, garbage out’ takes on renewed significance when the scale of operation makes recovery more or less infeasible.
Addressing these issues formed part of the grand challenge posed by the European Commission (EC) in the eArchiving services Pilot B element of the Policy Support Programme (PSP) within the Competitiveness and Innovation Framework Programme (CIP). One grant of just under €6 million was awarded to the E-ARK consortium comprising five national archives (Denmark; Estonia; Hungary; Norway; Slovenia); five research institutions (Austrian Institute of Technology; University of Brighton, UK; University of Köln, Germany; University of Portsmouth, UK; Instituto Superior Técnico, Lisbon, Portugal); three SMEs (ES Solutions, Sweden; KEEP Solutions, Portugal; Magenta, Denmark); two government Home Offices (Portugal; Spain); and two pan-European umbrella organisations (the DLM Forum and the DPC).
E-ARK was thus conceived as an intensely practical project where modularity, extensibility, openness and inclusivity were design imperatives. Throughout the project, advisory boards provided vital external input and validation from the commercial and technical, archival, and data provider sectors.
E-ARK analysed existing pan-European best practices and found them to be inadequate without further modification, extension, and standardisation. In response, metadata specifications were drawn up for the preparation, ingest, and transfer of digital content into archives, and for continued access to this material. For example, E-ARK initially defined a number of data content types, including Electronic Records Management Systems (ERMSs) such as SharePoint; databases; geo-spatial data; and simple file-based systems. These were explicitly designed to be extensible, making it possible to continue to add further data content types such as 3D scans for use in museums.
Existing open source software tools were examined, tested, and where appropriate modified to meet the new E-ARK specifications. Where necessary, completely new open source software components were designed, developed and implemented to cover the archiving workflow end-to-end.
The resulting eArchiving infrastructure was piloted in seven different scenarios across six countries. An end-to-end reference implementation, E-ARK Web, was produced and is available to be downloaded and installed locally. A data mining showcase demonstrated how to use Big Data techniques such as Online Analytical Processing (OLAP) on large datasets, with geo-spatial data among the exemplars used. The scalable E-ARK infrastructure makes use of Big Data technologies such as Hadoop, Lily and Solr.
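As a rough illustration of the OLAP idea mentioned above, the following minimal Python sketch aggregates a small in-memory fact table along chosen dimensions (a "roll-up") and restricts it to one dimension value (a "slice"). It is not taken from the E-ARK data mining showcase: the table, its field names (country, year, content_type, records) and the helper functions roll_up and slice_facts are all hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical fact table (illustrative values only, not E-ARK data):
# one row per combination of dimensions, with a numeric measure "records".
FACTS = [
    {"country": "DK", "year": 2015, "content_type": "database", "records": 120},
    {"country": "DK", "year": 2016, "content_type": "geodata", "records": 80},
    {"country": "EE", "year": 2015, "content_type": "database", "records": 200},
    {"country": "EE", "year": 2016, "content_type": "database", "records": 50},
]


def roll_up(facts, dimensions, measure="records"):
    """Sum the measure for every combination of the given dimensions."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


def slice_facts(facts, **fixed):
    """Keep only the rows whose dimension values match the fixed values."""
    return [r for r in facts if all(r[k] == v for k, v in fixed.items())]


# Roll up to one total per country.
by_country = roll_up(FACTS, ["country"])

# Slice to databases only, then roll up per year.
databases_by_year = roll_up(slice_facts(FACTS, content_type="database"), ["year"])

print(by_country)
print(databases_by_year)
```

A production system would run such aggregations in a dedicated analytical engine over far larger datasets; the sketch only shows the shape of the operations.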
All this development was set against a European legislative framework and was supported by a legal study covering recent EU law on data protection, copyright and the reuse of Public Sector Information (PSI). On the business side, a maturity model was produced to assist institutions in assessing how well they are performing in their eArchiving activities. The results of the project are hosted in a knowledge centre, and will be maintained there for a minimum of ten years from the conclusion of the project.
Although geared towards national archives, the E-ARK methods, tools and infrastructure are of real use to regional and local archives, as well as archives in business; higher education; scientific and research data centres; the creative and cultural industries, etc. One of the highlights of the E-ARK project is the end-to-end approach towards database preservation and reuse, which facilitates continued access to digital archives across the board, including e-government systems, website-driven research outputs, cultural heritage databases, and many more. The discovery methods developed within the project included Big Data techniques such as 'faceted search', which open up interdisciplinary research avenues across multifarious datasets. E-ARK's pan-European specifications and standards make it possible to search more easily across archives, and to open up new research questions.
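To make the notion of faceted search concrete, here is a minimal in-memory Python sketch. It assumes a handful of invented catalogue records with hypothetical fields (title, country, type), and the function faceted_search is defined only for this example; it does not reproduce the E-ARK tools or the Solr API. The idea is that a query returns both the matching records and, for each facet field, a count of how many hits carry each value, so a user can narrow the results step by step.

```python
from collections import Counter

# Hypothetical catalogue records (invented for illustration).
RECORDS = [
    {"title": "Land registry 1998", "country": "Estonia", "type": "database"},
    {"title": "Cadastral map", "country": "Slovenia", "type": "geodata"},
    {"title": "Land use survey", "country": "Estonia", "type": "geodata"},
    {"title": "Council minutes", "country": "Denmark", "type": "records"},
]


def faceted_search(records, query="", filters=None, facet_fields=("country", "type")):
    """Return (hits, facets): records matching the query text and the
    selected facet filters, plus per-field value counts over those hits."""
    filters = filters or {}
    hits = [
        r for r in records
        if query.lower() in r["title"].lower()
        and all(r[field] == value for field, value in filters.items())
    ]
    facets = {field: Counter(r[field] for r in hits) for field in facet_fields}
    return hits, facets


# Initial search: free-text query only.
hits, facets = faceted_search(RECORDS, query="land")

# Narrow the same query by choosing one facet value.
narrowed, _ = faceted_search(RECORDS, query="land", filters={"type": "geodata"})

print([h["title"] for h in hits])
print(facets["type"])
print([h["title"] for h in narrowed])
```

Because the facet counts are computed over the current hits, each narrowing step shows the user which further refinements would still return results.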