Emerging Issues in Library Web Collections
Raymond D. Irwin
The Internet’s growth has been extraordinary. According to the Internet Software Consortium, the number of hosts has roughly quadrupled since the summer of 1998 (Internet Software Consortium, 2002). The number of networks and registered domains has also increased (Caslon Analytics, 2003). About 180 of the world’s nations now have access to the Internet and, perhaps more significantly, the number of people online worldwide increased from just 16 million in December 1995 to over 600 million in October 2002 (Gromov, 2002; WebMetro, 2002). Over 765 million users are expected by the end of 2005 (CommerceNet, 2002).
With this growth has come extensive use of Internet services. Worldwide, according to Nielsen, the average Internet user logs on about 22 times per month and spends almost 12 hours per month online (Nielsen//NetRatings, 2003). Estimates suggest that North American Internet traffic nearly doubled in 2002 (RHK, 2002). In other places around the world, this activity is expected to increase sharply.
Likewise, the range of Internet activities in which people are engaged is impressive. According to Jupiter Research, the activities responsible for most Internet traffic are e-mail; use of search engines; research to find products, services, and local information; entering contests; and reading news (Greenspan, 2002). The service used by the largest proportion of Internet users, e-mail, is projected to grow in volume from 31 billion messages sent per day currently to 60 billion per day by 2006 (Cyberatlas, 2002).
A popular subset of the Internet, the World Wide Web, has shown similar growth. The Web Characterization Project at the Online Computer Library Center (OCLC) noted a roughly three-fold increase in the number of Web sites between 1998 and 2002. The number of "public" Web sites—those offering "free, unrestricted access to all or at least a significant portion of [their] content"—nearly doubled between 1998 and 2000 and has remained steady at about 3 million since 2000 (OCLC, 2002). Similarly, the number of publicly available, indexable Web pages grew from approximately 800 million in February 1999 to more than 3 billion today (Lawrence & Giles, 2002).
Erosion of Time-Space Barriers
Arguably, both the Internet and World Wide Web have grown by breaking down barriers of time and space. Prior to FTP and e-mail, document delivery depended largely upon courier, mail, and fax. Twenty-four-hour shopping was limited to a few retail and convenience stores and required physical travel. Meeting people with similar interests often meant investing significant amounts of time and energy. Learning involved traveling to school and sitting in a classroom. Entertainment meant arranging one’s schedule around available movie times, concert dates, and television slots.
Now, with the appropriate machine and applications, individuals need not be so constrained by time and space boundaries. They can attend Webcast meetings without leaving home, receive documents almost instantaneously, shop at any hour of the night from their living rooms, discuss shared interests in real time and at low cost with people around the globe, turn in school assignments and carry on class discussions without stepping inside a classroom, quickly gain access to paperless newspapers, enjoy clips or entire movies on demand, and much more. Pre-computer technology made many of these activities possible, but personal computers with high-speed network interconnectivity have increased the functionality and speed with which they can be carried out.
This electronic diminution of time-space barriers has changed, and continues to change, both personal and institutional behaviors. Technology has increased user expectations. The delivery of products and services can no longer be measured in days or weeks but must be counted in hours, minutes, or seconds. Travel beyond the confines of one’s home for informational or commercial services has, for the experienced Internet user, become less common and, for many such individuals, less welcome. Institutions that at one time were located only in physical space have been compelled by customer demand to add virtual locations as well (Dembeck, 1999).
Libraries, of course, have not been immune to these changes. Few would refer to the typical library as a "click-and-mortar" operation, but most libraries have seen their share of changes resulting from the Internet’s reduction of time and space boundaries. For starters, libraries of all types have worked to obtain a visible entry point on the Web. Every state library in the United States and all ARL member libraries maintain Web sites (Berkeley Digital Library Sunsite, 2003; Association of Research Libraries, 2003), and over three-quarters (77%) of the public libraries in the U.S. and Canada have a presence on the Web (Prabha & Irwin, 2003c).
Beyond a simple presence, though, many libraries have added electronic materials and services to the "traditional" items associated with physical space. Items offered by research libraries via the Web include e-books, online catalogs, licensed databases, e-journals, research guides and finding aids, freely available Web resources and local digital collections (Wetzel & Jackson, 2002). Depending upon their sizes and budgets, public libraries present a similar array to offsite users who cannot—or would prefer not to—visit in person (Prabha & Irwin, 2003a). Almost certainly user expectations established by the "Internet culture" have driven libraries to become accessible remotely to their service populations twenty-four hours a day through the Web (Kiesler, 1997, ix-xiii).
Library Web Collections
Libraries’ posting of links to free Web resources constitutes a new type of collection aimed at taking advantage of the medium’s decreased time-space restrictions. Except for technical glitches, links pages never "close"; rather, a library’s electronic reference collections are available for consultation 24 hours a day, 7 days a week. The term "visit" now commonly refers to access to files from home via computer.
This concept of "Web collections" is relatively new to libraries. It involves initially intangible items that are not held or owned, but rather freely shared and distributed over interconnected networks. Instead of physical materials, they are computer files accessed through agreed-upon protocols, whereby a single item can be used simultaneously by many people. The creator(s) of the content can update, change, or remove access at any time without notifying potential users ahead of time; on the other hand, very often those accessing the material can create a physical or local electronic copy of the resource without alerting the creator(s). Virtually anyone, from highly regarded government agencies and commercial entities to elementary school students, can "publish" in this environment; all one needs is a network host and the software and skills to create and upload pages.
In contrast, the traditional collections model was based on "tangibility," a central tenet in the development of all types of libraries. Historically and across cultures, lending libraries in particular were developed on the principle of collecting, holding, organizing, and making physical items available for use. Three key concepts were "ownership," "lending," and "borrowing." The library owned an item, typically a book, and under conditions laid out by library authorities, lent the material to qualified borrowers. In this model libraries acquired and processed items and maintained them for as long as they were deemed useful, at which time they were sold, donated, or discarded.
General Problems with Web Resources
The movement from one model to the next has been neither straightforward nor easy for libraries and librarians. This is due, in part, to a number of oppositional ideas evident in the Web generally:
1. "Traditional" vs. "Progressive" Media. In its development the Web has essentially followed the same pattern as all other transitional, emerging media. Based initially on the equipment or tools available to most users for access to the new medium (in this case keyboards), the information available on the Web is decidedly textual, but includes growing amounts of other media such as video and sound. The Web is becoming, in terms of formats, more mixed, but remains for the time being largely print-based.
2. Scope vs. Utility. The Web contains large amounts of information on virtually every conceivable subject, yet users often complain that they cannot find exactly what they want. This "embarrassment of riches" is perhaps both the Web’s greatest advantage and its most significant liability.
3. Mediation vs. Immediacy. On one hand, because most Web materials are free and a large number are indexed by commercial search engines, the need for institutional intermediaries has seemingly been minimized. On the other hand, the lack of restrictions on what can be "published" in this medium and the sheer amount of material available arguably require more intensive intermediation in the form of selection, organization, and access.
4. Cost vs. Permanence. For most publicly available Web resources there is no explicit contractual agreement between the creator of the material and the potential user. The result is freely accessible material that the publisher has no obligation to maintain in any particular form. Therefore, though a site might be free, it is also potentially transient.
5. Standardization vs. Flexibility. The extraordinary growth of the Web can be attributed, at least in part, to the common use of certain protocols and relative lack of regulation. At the same time, the freedom to choose from among Web design styles and preferred browsers for viewing, together with the lack of agreement on metadata schemes, has made the retrieval of some Web-based information more difficult.
Specific Challenges for Libraries Developing Web Collections
These general, oppositional ideas have had a direct effect on libraries’ uses of free Web resources. Even before considering specific problems associated with collecting Web materials, the staff of each library will first have to consider two very basic, related questions:
Library administrations that proceed with the development of Web collections will need to resolve a number of other issues, most of which are related to the "general problems" outlined above. The problematic traditional-progressive continuum, for instance, demands that the format and system requirements of a selected resource be taken into account. While most Web items currently convey information textually, that is changing rapidly and could present difficulties for home users with low bandwidth connections (Nielsen, 1997). Multimedia resources also require additional staff attention since links to necessary plug-ins need to be provided as well.
A separate concern is the extent to which librarians are able to discover hard-to-find quality Web resources. This "scope vs. utility" problem plagues not only end users but information intermediaries as well. The problem for librarians involves adding value to the search, bringing to light Web resources that would be buried in most search engine retrievals. Studies suggest that public and academic libraries are already enjoying some success in this area (Prabha & Irwin, 2003b; Irwin, 2002). Finding such "diamonds in the rough" is not easy, though, and may require that future librarians receive both Web searching and material evaluation coursework in library schools as well as consistent training on the job.
Librarians will also need to confront related issues regarding the selection of Web resources. Clearly, the analogy of Web materials to physical collections is imperfect. Typically a physical item is static, whereas Web pages can be updated frequently. Moreover, catalogers can apply fairly clear standards to the format identification and classification of tangible materials, but are less able to do so in the "mixed" environment of the Web (Connell & Prabha, 2002).
Units of selection and description are similarly unclear. When a Web site is selected (e.g. http://www.yahoo.com), one must decide whether everything beneath that URL (e.g. http://www.yahoo.com/News_and_Media/) is also selected. Does every individual HTML page warrant description? If a larger Web unit ("parent") is chosen and described, under what circumstances would a smaller unit ("child") also receive treatment? Issues regarding extent and subjective scope of a resource need to be addressed.
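The question of whether a "child" falls under a selected "parent" can be pictured with a small sketch. The function below is only an illustration, not a cataloging rule: the name is_within is hypothetical, and the assumption that a child is any URL on the same host whose path sits beneath the parent's path is one possible reading of "everything beneath that URL."

```python
from urllib.parse import urlsplit


def is_within(parent_url: str, candidate_url: str) -> bool:
    """Return True when candidate_url is the parent itself or lies beneath it.

    Assumption for illustration: "beneath" means same host and a path that
    starts with the parent's path, compared one directory segment at a time.
    """
    parent = urlsplit(parent_url)
    candidate = urlsplit(candidate_url)
    # Host names are case-insensitive, so compare them in lower case.
    if parent.netloc.lower() != candidate.netloc.lower():
        return False
    # Treat both paths as directories so "/News" does not match "/Newsletter".
    parent_path = parent.path.rstrip("/") + "/"
    candidate_path = candidate.path.rstrip("/") + "/"
    return candidate_path.startswith(parent_path)
```

Under this reading, selecting http://www.yahoo.com would implicitly select http://www.yahoo.com/News_and_Media/ as well, while selecting only the News_and_Media page would not select the site as a whole. A library could just as reasonably adopt a narrower rule.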
Librarians also must consider strategies for user access to Web collections. Because relatively few libraries have the resources to build searchable link databases single-handedly or can afford the staff time required to create catalog records for all Web resources, most currently allow access to their Web collections by subject only. For all types of libraries, the key to solving the "access vs. quality" problem may be cooperative action. The widespread use of nimble systems based on emerging metadata schemes like Dublin Core would be ideal, since it would allow for searching by a number of fields (creator, etc.) and for the arrangement of search results according to user preferences. Alternatively, a system could be developed whereby library link pages and the resources to which they point are regularly cached and made searchable as a quality-filtered subset of the Web using a standard search engine. An even lower-cost, shorter-term idea to increase user access to high-quality networked resources would be simply for libraries to link to one another’s Web collections. Many already do this, but the practice perhaps deserves a higher profile.
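To make the idea of field-based searching concrete, here is a minimal sketch of a link database built on a few Dublin Core elements (identifier, title, creator, subject). The class LinkRecord, the functions search and arrange, and the sample records are all hypothetical and stand in for whatever a cooperative system might actually store.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkRecord:
    """One linked resource, described with a handful of Dublin Core elements."""
    identifier: str  # the URL of the resource
    title: str
    creator: str
    subject: str


# Hypothetical sample data, invented for illustration only.
SAMPLE_RECORDS = [
    LinkRecord("http://example.org/weather", "Weather Almanac",
               "Example Weather Office", "Meteorology"),
    LinkRecord("http://example.org/census", "Census Tables",
               "Example Statistics Bureau", "Demography"),
]


def search(records: List[LinkRecord], field: str, term: str) -> List[LinkRecord]:
    """Return the records whose chosen field contains term, ignoring case."""
    needle = term.lower()
    return [r for r in records if needle in getattr(r, field).lower()]


def arrange(records: List[LinkRecord], sort_by: str = "title") -> List[LinkRecord]:
    """Order results by whichever field the user prefers."""
    return sorted(records, key=lambda r: getattr(r, sort_by).lower())
```

A call such as search(SAMPLE_RECORDS, "creator", "statistics") retrieves by creator rather than by subject, and arrange lets the user reorder the same results by title or creator. A subject-only links page cannot offer either option.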
Tied up with selection, description, and access is the issue of permanence. Recent studies of public and academic library Web collections have shown that about 15% of the resources linked to are unavailable at any particular moment (Prabha & Irwin, 2003b; Irwin, 2002). As noted above, a general problem with the provision of free Web materials is that there is no obligation on the part of the creator to maintain a file at a network location for any specified period of time. The result for users is frequent frustration. While the larger Internet community fashions standards to deal with this problem, librarians can in the meantime do several things to minimize user discouragement. In the selection process, they can favor site-level resources, which have been found to be available over 95% of the time (Prabha & Irwin, 2003b; Irwin, 2002). They might also choose to make greater use of relatively inexpensive link checking software, like that produced by Elsop and Watchfire (Electronic Software Publishing Corporation, 2003; Watchfire, 2003).
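For libraries without commercial link-checking software, the basic check can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a description of how the Elsop or Watchfire products work: the function names are hypothetical, a link counts as available when the server answers with a 2xx or 3xx status, and some servers reject the HEAD requests used here.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if no response was obtained."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # The server answered, but with an error code such as 404.
        return error.code
    except (URLError, OSError, ValueError):
        # Unreachable host, timeout, or malformed URL.
        return 0


def check_links(urls: Iterable[str],
                fetch_status: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each URL to True when it answers with a 2xx or 3xx status."""
    return {url: 200 <= fetch_status(url) < 400 for url in urls}


def unavailable_share(results: Dict[str, bool]) -> float:
    """Return the fraction of checked links that were unavailable."""
    if not results:
        return 0.0
    dead = sum(1 for available in results.values() if not available)
    return dead / len(results)
```

Run on a schedule against a links page, unavailable_share gives a figure that can be compared with the roughly 15% reported in the studies cited above. The fetch_status parameter exists so that the logic can be tried with canned status codes before any live requests are made.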
The "standardization vs. flexibility" issue manifests itself in many of the decisions librarians must make when they develop a Web collection because this issue represents the connective tissue among all other considerations. In terms of format selection and access, Web collection librarians often must choose, for reasons of simplicity and user demand, one application over another (e.g. Windows Media Player instead of Real Media to play a video clip, or HTML instead of PDF for text). For more accurate retrieval in Web link databases, some basic agreement on a metadata scheme will be required. Librarians may also choose a single set of evaluation standards for the selection of Web resources to improve the consistency of such databases.
Studies have suggested that the Web is a perceived threat to libraries and librarianship because of the potent, widespread belief that "everything is on the Internet" and the popular view that widely available electronic information is easier to access than libraries are (Jones E-Global Library, 2002; D’Elia, Jörgensen, Woelfel, & Rodger, 2002). The relative youth of the Web and its ability to diminish constraints of time and space in the information environment provide an easy contrast to the perception of libraries as old, conservative institutions—buildings filled with books.
Clearly, however, libraries are not going away. On the contrary, librarians are using the Web to provide remote users with 24-hour access to information products and services like subscription databases, catalogs, and reference. At the same time, they are doing what virtually everyone else on the Web is doing: providing pointers to other free Web resources.
The creation of such "Web collections" carries with it certain difficulties for libraries, most of which are related to issues inherent in all modern information: format varieties, vastness of available materials, selection, durability/preservation, and development of standards. In the era when most information was tangible and costly, libraries were among the few institutions whose main interest was to acquire material and make it available for free to a specific service population. Now, however, much of the everyday information in which people have an interest (e.g. stock prices, maps, news) need not be purchased and made available only within the confines of a library. Rather, it can be gleaned from the Web using a URL or search engine, completely bypassing libraries.
Librarians should, however, be able to exploit several advantages in the Web space. Perhaps the primary advantage that libraries enjoy as aggregators of Web resources is the trust of users. Much has been made of the dubious trustworthiness of information on the Web and the typical user’s lack of understanding regarding its retrieval (Lynch, 2001; Trickey, 1998). That’s where librarians come in. A Benton Foundation survey concluded "that Americans admire, respect, and trust librarians to be their guides to information both on- and offline" (63). The non-profit Librarians’ Index to the Internet (http://lii.org/) prominently displays this sentiment in its online banner: "Information You Can Trust." Even commercial entities on the Web like Galaxy.com make use of the profession’s positive image with slogans like "Trust Your Internet Librarians" (Galaxy, 2000).
This issue of trust is based on the librarian’s traditional ethic of providing equal access to information. Users of library Web collections can be fairly certain that no political or social agenda underlies items selected and, in most cases, no commercial interest either. This can be viewed merely as an extension of what libraries have always done in the creation and maintenance of collections: serve without prejudice the interests of their users.
Beyond the matter of trust, librarians have long been known as highly proficient organizers of information. After culling information from the Web, librarians are very well suited to making it available for use. Most users understand this, generally because they associate with libraries standard organizational schema like Dewey and the Library of Congress. Librarians, more than any other professionals, are known for their expertise in creating order from masses of information. Obviously, that’s a tremendous asset in dealing with the Web.
After confronting the issues related to the creation of Web collections—formats, selection, access, permanence, and standards—libraries of all kinds will need to increase the visibility of their Web collections. Librarians have much going for them in the Web space, but they also have strong competition. Commercial interests like Yahoo have had the luxury of spending untold sums of money on branding, marketing and technical staffing. Libraries, generally, have not.
Because of funding limitations, libraries will need to consider increasing the amount of cooperative work that they do, both with other libraries and with non-library partners. With the former, greater consideration must be given to joint marketing campaigns and the branding of Web collections. Libraries should emphasize the "upside of tradition"—libraries as trustworthy stewards of organized information to which there is free access. Because of their basic computer training functions, public libraries are in a particularly strong position to cultivate loyalty to library Web collections in new Internet users.
Creating Web collections is a relatively new activity for most libraries. Basic, thorny questions related to the establishment and maintenance of such collections still need to be answered, both by librarians themselves and by advocates in professional organizations. Handling these issues well will improve the status of libraries in the Web space and help libraries continue their missions outside the boundaries of time and space. If librarians give these questions short shrift, however, larger numbers of increasingly sophisticated users and potential users may come to view libraries as irrelevant to the rapidly changing information landscape. In an era of shrinking resources for all kinds of libraries, we can scarcely afford to be pushed to the margins any further.
Dembeck, C. (1999). Toys ‘R’ Us: The latest brick-and-mortar that doesn’t get it? E-Commerce Times, July 9, 1999. Retrieved November 18, 2002, from the E-Commerce Times Web site: http://www.ecommercetimes.com/perl/story/746.html
Gromov, G. (2002). The roads and crossroads of Internet history: Growth of the Internet. Retrieved November 18, 2002, from History of the Internet and WWW Web site: http://www.netvalley.com/intvalstat.html
Lawrence, S. & Giles, L. (1999). Accessibility and distribution of information on the Web. Retrieved November 18, 2002, from http://wwwmetrics.com/
Librarians’ Index to the Internet (2003). Retrieved October 31, 2003, from http://www.lii.org/
Lynch, C. (2001). When documents deceive: Trust and provenance as new factors for information retrieval in a tangled Web. Journal of the American Society for Information Science and Technology, 52, 12-17.
Nielsen, J. (1997, Winter). Guidelines for multimedia on the Web. Advancing HTML: Style and Substance 2 (1). Retrieved November 19, 2002, from the W3C Journal Web site: http://www.w3j.com/5/s3.nielsen.html
Nielsen//NetRatings (2003). September 2003 global Internet index average usage. Retrieved October 30, 2003, from Nielsen//NetRatings Web site: http://www.nielsen-netratings.com/news.jsp?section=dat_gi
RHK Telecommunications Industry Analysis (2002). Press releases, July 24, 2002. Retrieved November 18, 2002, from RHK Telecommunications Industry Analysis Web site: http://rhk.com/pressrelease.asp?id=160
WebMetro (2002). More than 600 million people have Net access, November 4, 2002. Retrieved November 18, 2002, from WebMetro Web site: http://www.webmetro.com/News1Detail1.asp?NewsRS_Action=Find('AutoNo','829')&NewsRS_Position=FIL%3ACategory+%3D+%27Statistics%27ORD%3AABS%3A3KEY%3A829PAR%3A