I’d like to talk about what we have developed so far in the TILT project, in collaboration with the British Library. Many museums and libraries have images of old books, typescripts and manuscripts for which they want to provide better access. However, all that is usually done is to present the user with a long list of lifeless book and page images that fail to engage users in any kind of meaningful interaction. As a first step towards overcoming this problem, the page-images must be connected with the text of the books or manuscripts they relate to. The user needs to search the text, to find the corresponding page-image, and to see where the searched-for text appears on that image. The user will also need to read and comment on the text, and to compare the transcription with the original document. The two can easily differ if the original has been edited, if the spelling has been normalised, or if the transcription is simply inaccurate. A separate transcription can also greatly clarify what is written on the page, and for optimum readability on a variety of devices it should not follow the precise layout of the original.
Google has already tried this on a massive scale, using OCR techniques to overlay an approximate text on top of the page image. Although this provides access to the content, it doesn’t allow the text to be reformatted, commented on or edited. However, putting the text next to the image, as we do in TILT, creates a host of user-interface problems: how can you ensure that users can follow where they are on an image when reading the corresponding text? The answer lies in linking individual words and pages to the text, loading page-images automatically as needed, and highlighting the portions of each image that correspond to words. So as the user moves the mouse over an image or the text, the corresponding words are highlighted, making it easy to relate one to the other. The text and the image can also be kept in alignment, since the two are now connected.
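To make the hover-linking idea concrete, here is a minimal sketch in TypeScript. It assumes each transcription word is rendered as a span carrying a data-word-id attribute, and each word-region on the page-image is an SVG rectangle in an overlay carrying the same attribute; that markup, and the "highlighted" CSS class, are illustrative assumptions rather than TILT’s actual implementation.

```typescript
// Minimal sketch of hover-linking, assuming each transcription word is a
// <span data-word-id="..."> in the text pane, and each word-region on the
// page-image is an SVG <rect data-word-id="..."> in an overlay. The markup
// and the "highlighted" CSS class are illustrative assumptions.

function linkHoverHighlighting(textPane: HTMLElement, overlay: SVGSVGElement): void {
  // Toggle the highlight on every element (word span AND image region)
  // that shares the hovered word's id.
  const toggle = (id: string, on: boolean): void => {
    document.querySelectorAll(`[data-word-id="${id}"]`)
      .forEach(el => el.classList.toggle("highlighted", on));
  };
  // Find the word id for the element under the mouse, if any.
  const idOf = (e: Event): string | null =>
    (e.target as Element).closest("[data-word-id]")
      ?.getAttribute("data-word-id") ?? null;

  // Hovering in either pane highlights the word in both panes.
  for (const pane of [textPane, overlay]) {
    pane.addEventListener("mouseover", e => {
      const id = idOf(e);
      if (id) toggle(id, true);
    });
    pane.addEventListener("mouseout", e => {
      const id = idOf(e);
      if (id) toggle(id, false);
    });
  }
}
```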
TILT in two parts
Our idea for tackling this problem is to create two separate programs.
The first will exist solely on the server, as a faceless application with no user interface of its own. Its job is to recognise words on a page and to link them with the supplied transcription, using shape-based analysis rather than actual OCR. The program therefore has no idea which letters in the transcription correspond to which letters in the page-image: it works this out by establishing the sequence of word-shapes in the image, and by linking that sequence to the corresponding sequence of words in the text.
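As an illustration of how such a sequence-to-sequence linking might work, here is a hedged sketch that reduces each detected word-shape to its pixel width and each transcription word to its character count, then aligns the two sequences with a simple dynamic-programming pass. TILT’s actual shape analysis is more sophisticated; the cost function and penalty values below are assumptions for illustration only.

```typescript
// A word-shape as the server might detect it: a bounding box on the page-image.
interface WordShape { x: number; y: number; width: number; height: number; }

/** Align n shapes with m words via dynamic programming (edit-distance style).
 *  Returns, for each shape index, the index of the matched word, or -1. */
function alignShapesToWords(shapes: WordShape[], words: string[]): number[] {
  const n = shapes.length, m = words.length;
  // Estimate pixels-per-character so widths and word lengths are comparable.
  const totalWidth = shapes.reduce((s, sh) => s + sh.width, 0);
  const totalChars = words.reduce((s, w) => s + w.length, 0);
  const pxPerChar = totalChars > 0 ? totalWidth / totalChars : 1;

  // Cost of matching shape i to word j: gap between expected and actual width.
  const cost = (i: number, j: number): number =>
    Math.abs(shapes[i].width - words[j].length * pxPerChar);
  const skip = pxPerChar * 2; // penalty for leaving a shape or word unmatched

  // dp[i][j] = minimal cost of aligning the first i shapes with the first j words.
  const dp = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = 1; i <= n; i++) dp[i][0] = i * skip;
  for (let j = 1; j <= m; j++) dp[0][j] = j * skip;
  for (let i = 1; i <= n; i++)
    for (let j = 1; j <= m; j++)
      dp[i][j] = Math.min(
        dp[i - 1][j - 1] + cost(i - 1, j - 1), // match shape i with word j
        dp[i - 1][j] + skip,                   // shape has no word (noise blob)
        dp[i][j - 1] + skip                    // word has no shape (lost ink)
      );

  // Trace back through the table to recover the matching.
  const match = new Array<number>(n).fill(-1);
  let i = n, j = m;
  while (i > 0 && j > 0) {
    if (dp[i][j] === dp[i - 1][j - 1] + cost(i - 1, j - 1)) { match[--i] = --j; }
    else if (dp[i][j] === dp[i - 1][j] + skip) i--;
    else j--;
  }
  return match;
}
```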
The second will be a browser-based application that allows the user to manipulate the page-image links, and provides fast ways to correct the alignment semi-automatically. As well as letting the user draw word-shapes on the screen, the interface will make it possible to specify two anchor-points, where one word definitely corresponds to one shape, and then to ask the server program to reassess the alignment between them. In this way we hope that TILT will facilitate the rapid creation of thousands of page-to-image links, which can be used to enhance the user experience for a wide variety of page-image types.
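The exchange between the two programs might look something like the sketch below, in which the browser posts two user-confirmed anchor pairings and asks the server to realign only the span between them. The endpoint path, the Anchor shape and the field names are hypothetical illustrations, not TILT’s real protocol.

```typescript
// One user-confirmed pairing of a transcription word with a word-shape.
interface Anchor {
  wordIndex: number; // index of the word in the transcription
  shapeId: string;   // id of the word-shape the user confirmed for it
}

// Ask the server to realign only the span between two anchor pairings,
// keeping the confirmed pairings themselves fixed at each end.
async function reassessAlignment(
  docId: string, page: number, from: Anchor, to: Anchor
): Promise<void> {
  const res = await fetch(`/tilt/align/${docId}/${page}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ from, to }),
  });
  if (!res.ok) throw new Error(`realignment failed: HTTP ${res.status}`);
}
```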
What we have so far is a demonstrable version of the first program. We’d like to discuss ways in which this technology can be made available to users, what features people would like to see, and where we can go from here.
Here is a video of the current TILT in action.