SWAT - Snappy Web Archiving Tool

SWAT is a tool for archiving websites and presenting the archive in a simple, pedagogical way. Besides harvesting the original files from the website, SWAT renders a snapshot of each page to a TIFF file and describes the entire archive in a METS file. Because snapshots can be generated with an arbitrary rendering engine (WebKit, Trident, Gecko, etc.), future generations can see what a page looked like without having to render the HTML themselves. The system is written in Ruby and is accessed through a web application built with the Rails framework.
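
To make the role of the METS file more concrete, the following Ruby sketch builds a small METS file section that lists harvested originals and TIFF snapshots side by side. It is an illustration only and is not taken from SWAT's code: the helper names (xml_escape, mets_file, mets_document), the USE values "original" and "snapshot", and the example paths are all assumptions. A complete METS document would also need a structMap, which is omitted here.

```ruby
# Escape the five XML special characters so paths are safe inside attributes.
def xml_escape(text)
  replacements = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
                   '"' => "&quot;", "'" => "&apos;" }
  text.to_s.gsub(/[&<>"']/, replacements)
end

# Build one <file> entry pointing at a harvested original or a TIFF snapshot.
def mets_file(id, mimetype, href)
  %(      <file ID="#{xml_escape(id)}" MIMETYPE="#{xml_escape(mimetype)}">\n) +
    %(        <FLocat LOCTYPE="URL" xlink:href="#{xml_escape(href)}"/>\n) +
    "      </file>\n"
end

# Hypothetical layout: one fileGrp for originals, one for snapshots.
# Each file is a hash with :mimetype and :href keys.
def mets_document(originals, snapshots)
  groups = { "original" => originals, "snapshot" => snapshots }
  file_groups = groups.map do |use, files|
    entries = files.each_with_index.map do |file, index|
      mets_file("#{use}-#{index + 1}", file[:mimetype], file[:href])
    end
    %(    <fileGrp USE="#{use}">\n#{entries.join}    </fileGrp>\n)
  end
  %(<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">\n) +
    "  <fileSec>\n#{file_groups.join}  </fileSec>\n</mets>\n"
end

# Example: one harvested page and its snapshot from a single engine.
originals = [{ mimetype: "text/html", href: "original/index.html" }]
snapshots = [{ mimetype: "image/tiff", href: "snapshots/webkit/index.tiff" }]
puts mets_document(originals, snapshots)
```

Keeping originals and snapshots in separate file groups is one possible design choice; it lets a reader of the archive find the rendered image of a page without having to interpret the harvested HTML.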

The workflow in SWAT consists of these steps:

1. A SWAT technician, most likely an archivist, logs in to the system and creates an account for an organisation.
2. A web administrator from the organisation logs in to the system and chooses to create a new webchive, i.e. a web archive. When creating the webchive, the administrator enters the required information (title, publisher, etc.), selects which browser engines should be used to create the snapshots and enters the minimum resolution at which those engines will render. Finally, the administrator provides the system with a complete sitemap of everything that is to be archived.
3. As soon as the web administrator has submitted all the necessary data for the webchive, the system starts working. All files from the sitemap are harvested, analysed, categorised and sorted. When the work is complete, the webchive is made publicly available.
4. Once a webchive is complete, the responsible web administrator can add descriptive documents to it. This option is accessible from the first view of the webchive.
5. Once the webchive is complete, the matter of long-term storage remains. This step is carried out by the person who started the whole process, the archivist. After selecting the webchive, the archivist can download an SIP (Submission Information Package), examine it and, if it is approved, have it ingested into an OAIS-based archival system.
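
The data entered when a webchive is created, and the work the system derives from it, can be sketched in plain Ruby as follows. This is a minimal sketch under stated assumptions, not SWAT's actual model: the Webchive struct, the KNOWN_ENGINES list, the validation rules, the category table and the idea of one snapshot job per page and engine are all hypothetical.

```ruby
require "uri"

# Hypothetical record for the data the web administrator enters.
Webchive = Struct.new(:title, :publisher, :engines, :min_resolution, :sitemap,
                      keyword_init: true)

# Assumed engine names, taken from the examples mentioned in the text.
KNOWN_ENGINES = %w[webkit trident gecko].freeze

# Assumed mapping from file extension to category.
CATEGORIES = { ".html" => :page, ".htm" => :page, ".css" => :stylesheet,
               ".js" => :script, ".png" => :image, ".jpg" => :image,
               ".gif" => :image }.freeze

# Return a list of problems; an empty list means the webchive can be submitted.
def validation_errors(webchive)
  errors = []
  errors << "title is required" if webchive.title.to_s.strip.empty?
  errors << "publisher is required" if webchive.publisher.to_s.strip.empty?
  engines = Array(webchive.engines)
  errors << "select at least one browser engine" if engines.empty?
  unknown = engines - KNOWN_ENGINES
  errors << "unknown engines: #{unknown.join(', ')}" unless unknown.empty?
  width, height = webchive.min_resolution
  unless [width, height].all? { |n| n.is_a?(Integer) && n.positive? }
    errors << "minimum resolution must be two positive integers"
  end
  errors << "sitemap must list at least one URL" if Array(webchive.sitemap).empty?
  errors
end

# One snapshot job per combination of sitemap URL and selected engine.
def snapshot_jobs(webchive)
  webchive.sitemap.product(webchive.engines).map do |url, engine|
    { url: url, engine: engine, resolution: webchive.min_resolution }
  end
end

# Categorise a harvested URL by its file extension.
# URLs without an extension are assumed to be pages.
def categorise(url)
  extension = File.extname(URI.parse(url).path.to_s).downcase
  return :page if extension.empty?

  CATEGORIES.fetch(extension, :other)
end

webchive = Webchive.new(
  title: "Example site", publisher: "Example Organisation",
  engines: %w[webkit gecko], min_resolution: [1024, 768],
  sitemap: ["http://example.org/", "http://example.org/about.html"]
)

if validation_errors(webchive).empty?
  puts "Snapshot jobs to run: #{snapshot_jobs(webchive).size}"
  # Group the harvested URLs by category, as a stand-in for the sorting step.
  p webchive.sitemap.group_by { |url| categorise(url) }
end
```

Validating before any harvesting starts matches the order of the steps above, where the system only begins working after all required data has been submitted.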