
DIGITIZING THE THATCHER LIBRARY;
Making the Search for Rosebud More Pleasurable.

 


Tell me something, Miss Anderson. You're not Rosebud, are you?
(Reporter Jerry Thompson to Bertha Anderson in Orson Welles' Citizen Kane.)

The shooting script for the classic American motion picture Citizen Kane describes in one of its opening scenes a library that is wholly uninviting and nearly impossible to access, the Thatcher Memorial Library. The vault room, where investigative reporter Jerry Thompson must search for his treasure, has "all the warmth and charm of Napoleon's tomb." The curator, Bertha Anderson, is described as an "elderly, mannish spinster." Following stern warnings from Ms. Anderson about the use of Thatcher's journal and about the time constraints he will be under, Thompson begins his reading in the stark, marble tomb. At the appointed time, the stern curator returns and smugly informs him, "You have enjoyed a very rare privilege, young man." Thompson is not so sure. What is worse, he leaves the library without the answer to his question–who or what was Rosebud? Thatcher's curator could cite specific pages for Thompson to scan, and she may very well have known there was no mention of Rosebud in Thatcher's journal, but she didn't offer and he didn't ask. Worse still, at the direction of the library's directors, she limited Thompson's access to information so severely that he could offer her nothing but disrespect as he left the sacred tome to be reshelved by the armed guard in the library's vault. The image of the stern curator, admittedly, may be a cinematic exaggeration, even the worst stereotype a librarian can imagine, but it is a useful starting point for rebuilding the Thatcher library into something more inviting, a digital library.

So what lessons can we apply in digitizing the Thatcher library? Quite a few, really. First, let's put a charming face at the entrance, inviting people in through the electronic doors rather than staunchly guarding an austere, cold vault sealed with massive doors that defy ingress. Let's greet our visitors and usher them in to peruse the treasures that lie in wait to be discovered. Invite them in and ask them to stay as long as they would like, for the collections they've approached are there for them to experience now, not just as a remembrance of things past, not to be sealed in an electronic vault that must be guarded with armaments and strict time limits. And our richest treasures will be safe, because they've been scanned, recorded, digitized, and preserved electronically, where we can accommodate all who want access and never be concerned that the occasional unscrupulous user has strapped under his shirt a razor knife and hidden compartments for systematically taking away what we've worked hard to collect and preserve. A simple, warm entrance with a few doors (links) and a few exhibits will lure most people into our library, and, once they've entered, they'll find it hard to leave.

If we want frills for our digital library, a digital greeting that I have found to be particularly inviting, and that takes advantage of our technological prowess, is the multimedia opening to Softkey International's CD-ROM Multipedia, the "reference library that starts where your encyclopedia leaves off." Using video and audio, a smiling, beckoning information specialist invites the user in to peruse the stacks and offers a few basic options for browsing. This seems a good start. Computers still intimidate some people, but if you can put a human face at the entry, most people will warm more quickly to the idea of approaching a digital collection. I don't know that the bells and whistles are really so necessary for the digital library on the Web (we must keep in mind that many people have phone lines that limit their Internet connection speeds, modems that won't accommodate high-speed digital transmission, computers that need upgrading but must wait until the car repairs are completed), but a clean, inviting opening page that provides doorways to more information without overwhelming the first-time visitor might do as much as Multipedia's charming and talkative information specialist.

And should the visitor want a guided tour, hypertext provides the perfect medium for taking our electronically enabled visitor by the hand, guiding him or her to the various rooms of our library, and inviting further scrutiny. Browsing has always been a favorite means of exploring a library's print collection–pointing and clicking serves just as well for the casual browser in an electronic collection. And the stacks never close. The sound and video equipment are always accessible. We must keep in mind, however, that too many choices at first glance may turn away all but the most inquisitive. To use an old acronym familiar to most computer programmers, KISS (keep it simple, stupid) should guide our design.

Second, for those who need guidance, we should provide assistance. No digital library is complete without a reference librarian to query when needed. The Internet Public Library has its own electronic reference desk, a place where people can pose questions and get responses from other people. Thompson was greeted with a curator at the Thatcher library who told him what pages he was allowed to peruse. The digital library reference librarian should be prepared to offer the questioner numerous resources that he or she can peruse at leisure, suggesting specific sources that fit the question.

Third, for those who like to do their own sleuthing, we need a super-catalog, not just a subject-searchable interface that allows only predetermined matches on user queries, but an electronic interface that will allow matches anywhere in our full-text collections and that will also search our media collections for the appropriate film clips and sound bites. We have technology now that can enable all of this. The user interfaces may still need some polishing, but anyone who has recently used one of the major search services on the Web will likely attest to the improved retrieval that keyword searching now affords. The Web, unruly and unwieldy beast that it is, still yields wildly inappropriate results to even the most experienced searchers, but applying the technology developed by such Web powers as AltaVista and Inktomi to digital libraries, where collections have been chosen carefully and organized meticulously, promises powerful searching to users with only the most rudimentary knowledge of system protocols and instructions. Just think what Thompson could have done with Thatcher's journal had it been digitized and indexed by the Inktomi search engine! He would have needed only seconds of the limited time begrudgingly allotted him by the library's curator to discover that nowhere in the journal was there a single mention of Rosebud. The best the curator could give him was a span of pages where Kane was mentioned. No doubt she knew it by heart and could have told the journalist that Rosebud appeared nowhere, but, like so many people who do research, Thompson didn't ask the right question and received only a general response.
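To make the idea of full-text keyword searching a little more concrete, here is a minimal sketch in Python of an inverted index, the basic structure behind this kind of retrieval. It is an illustration only and not how AltaVista or Inktomi actually work; the function names and the sample "journal pages" are invented for the example and are not quotations from the film or from any real text.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(pages):
    """Map each word to the set of page numbers on which it appears."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_number)
    return index


def search(index, query):
    """Return the sorted page numbers that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


# Hypothetical sample pages, invented purely for illustration.
journal_pages = {
    1: "I first met young Kane at a boarding house in the winter.",
    2: "The boy did not wish to leave his home.",
    3: "Kane's newspaper attacked my interests again this year.",
}

index = build_index(journal_pages)
print(search(index, "Kane"))     # pages that mention Kane
print(search(index, "rosebud"))  # an empty list: the word appears nowhere
```

Because the index is built once, ahead of time, answering "does this word appear anywhere?" is a single lookup rather than a page-by-page reading, which is exactly the saving Thompson would have enjoyed.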

For the timid, the uncertain, the embarrassed, a Web search interface on a digital library collection is the next best thing to having a personal reference librarian. They are still useful--reference librarians, that is. Just ask Lisa Matson and David Bonski, who concluded in a 1997 article for Online that "Digital libraries need librarians. Jaime Carbonell of Carnegie Mellon University, pointed the way when he wrote: 'Advances in all these technologies are underway, but are not yet coordinated and targeted at the task of creating a digital librarian'."1 But some researchers prefer the thrill of the hunt, as long as the hunt is productive. Current Web interfaces have gone a long way toward empowering searchers using complex computer algorithms and full text indexing. Thompson, eat your heart out.

Fourth, we need to provide access to the original materials if possible. Just providing intellectual access, an interface, to the materials that a researcher or browser hopes to find still does not completely cover all the user's needs. Finding a resource is one thing. Reading it or appreciating it is yet something more that needs consideration. Thompson had Thatcher's handwritten journal to pore over in order to find what he wanted. In many cases it is important to preserve the original, to be able to display it in its original format, as in the case of Scandinavian manuscripts that are viewable at Projekt Runeberg. Here a researcher can view the original in digital form and read the OCR-transcribed text. Why is it important to provide the originals, though? Shouldn't the text itself be sufficient? In many cases, seeing the original lends more authenticity to the retrieved prize, so reproductions of the originals are important. Besides, unedited OCR, even at its best, can be fraught with errors. Thompson would not have trusted an OCR conversion of Thatcher's journal, even had it been available. And viewing the original of a handwritten record can provide other clues. Is the handwriting relaxed? Is it stressed? Is it slurred? These easily overlooked clues to the original intent and thought might be missed in a transcribed version, so, yes, having images of the original is important. We can do this in a digital collection. To be accessible, of course, the original has to be either transcribed or indexed in some manner, but there are many means for handling both.

We must also consider, when we provide images of the original manuscripts, that people dislike scrolling and love printing, so we should strive to provide reproductions in our digital library that fit neatly on a standard resolution display (this can vary from 640 x 480 pixels to 1280 x 1024 pixels). As Michael Lesk so aptly points out in his Practical Digital Libraries, viewing a newspaper page reproduced on a 640 x 480 monitor is like looking at the paper through a viewer the size of a 3 inch by 5 inch index card: in short, frustrating. These problems still can't be resolved effectively because of our visitors' varying computer capabilities. We can, however, try to avoid huge displays suited only for large monitors running at high resolutions by providing lower resolution alternatives in addition to high resolution scans. And, yes, the originals are not only nice to see, but, in some instances, important to examine. We can do all of this now and should. And we can still provide access to the originals via the transcribed plain text versions. Indexing, it is called, something our digital library can do so easily and transparently that Thompson might never have even bothered with the stodgy curator of the Thatcher library had it been an option.
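The arithmetic behind a "lower resolution alternative" is simple enough to sketch. The Python function below, whose name and sample scan size are assumptions made for illustration, computes the largest dimensions at which a scanned page fits a given display without scrolling and without distorting its proportions. It only does the sums; the actual resampling would be left to whatever imaging software the library uses.

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) down to fit the display, keeping proportions.

    Images that already fit are returned unchanged, never enlarged.
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical scan: a letter-sized page captured at 300 dots per inch.
scan_width, scan_height = 2550, 3300

for display in [(640, 480), (1280, 1024)]:
    fitted = fit_within(scan_width, scan_height, *display)
    print(display, "->", fitted)
```

On the smallest display the page shrinks to roughly a seventh of its scanned size, which is why the high resolution scan should stay available alongside the reduced copy for anyone who needs to examine the handwriting closely.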

Fifth, let's not ignore that video and audio records can be extremely useful means of discovering information. We should provide access to film clips and sound bites and provide keyword access as well. MPEG offers a digital standard for compressing and storing video, so we can archive hours of footage and use up relatively little disk storage doing it. What's more, MPEG is an accepted Internet standard, so Thompson could even have scanned videos from his home computer, had he had the Internet and a PC. And, if our visitors have fast Internet connections or access to a library with a fast connection, we can provide streaming audio and video using RealPlayer, thus eliminating the wait while a sound or video file downloads.

Very few visitors are going to have the time to spend poring over hours and hours of video and audio, so we also need to provide some means for accessing the content of our multimedia collections. Even though indexing video and sound has in the past been labor intensive and tricky at best, Carnegie Mellon University's Informedia project is demonstrating that it can be done. So we'll provide our users with a keyword searchable index of all the audio and video they can handle. Thompson probably would have been drooling over this technology when he was so diligently searching for some clues about Rosebud.

Finally, while we plan our digitization project, we should keep foremost in our minds what a digital library ought to be, a concept so aptly captured in the Digital Library Federation's definition.

Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities.
(Digital Library Federation)

With this in mind, we can provide users not only with the technology they need to find information but the attitude they need to be greeted with as they enter our electronic doors.

The point, then, is that, if Thompson had had all the available technology and the interfaces to the technology that we now have, the search for Kane's Rosebud would at least have taken less time. He still might not have found exactly what he had hoped for until that last revealing moment in the movie, but he would have spent far less time looking for it. Foremost in this consideration of digitizing Thatcher is what we can do in a digital environment to enable detection and retrieval in much less time. If Thatcher's journal had been digitized, indexed, and keyword searchable, and if the library's directors and curator had been more amenable to providing information rather than hoarding it, Thompson would have been on his way much sooner and much more satisfied with his experience.

As digital libraries populate the Internet, users may find they have more information than they could ever have conceived. Unfortunately, more information does not necessarily equate to quicker answers. So we still need to continue improving user systems, striving to provide even better means for getting those tidbits of information that answer those pressing questions. The systems for finding information are there. Finding it, however, still means knowing something about the systems. So we still have a way to go. But converting our treasures into a wonderful digital library means the answers are closer and easier to find, if we provide the right entry and the means for new users to get what they need. And we won't worry if Thompson lights up a cigarette as he prepares to leave, because our halls are not marble, our materials are not flammable, and we'll never even smell the smoke.

1. Matson, Lisa Dallape & David J. Bonski. "Do Digital Libraries Need Librarians? An Experiential Dialog." Online, 21.6 (Nov-Dec 1997): 87+.


Comments & Suggestions
to Jim Alderman.

Updated 10 December 1998.