Emerging Issues in Library Web Collections
Raymond D. Irwin
The Internet’s growth has been
extraordinary. According to the Internet Software Consortium, the number of
hosts has roughly quadrupled since the summer of 1998 (Internet
Software Consortium, 2002). The number of networks and registered domains
also increased (Caslon
Analytics, 2003). About 180 of the world’s nations now have access to the
Internet and, perhaps more significantly, the number of people online
worldwide increased from just 16 million in December 1995 to over 600 million
in October 2002 (Gromov, 2002; WebMetro, 2002). Over 765 million users are
expected by the end of 2005 (CommerceNet, 2002).
With this growth has come
extensive use of Internet services. Worldwide, according to Nielsen, the
average Internet user logs on about 22 times per month and spends almost 12
hours per month online (Nielsen//NetRatings, 2003). Estimates suggest that
North American Internet traffic
nearly doubled in 2002 (RHK,
2002). In other places around the world, this activity is expected to
increase sharply.
Likewise, the range of Internet
activities in which people are engaged is impressive. According to Jupiter
Research, the activities responsible for most Internet traffic are email; use
of search engines; research to find products, services, and local information;
entering contests; and reading news (Greenspan, 2002). The service used by the
largest proportion of Internet users,
e-mail, is projected to grow in volume from 31 billion messages sent per day
currently to 60 billion per day by 2006 (Cyberatlas,
2002).
A popular subset of the Internet,
the World Wide Web, has shown similar growth. The Web Characterization Project
at the Online Computer Library Center (OCLC) noted a roughly three-fold
increase in the number of Web sites between 1998 and 2002. The number of
"public" Web sites—those offering "free, unrestricted access to
all or at least a significant portion of [their] content"—nearly doubled
between 1998 and 2000 and has remained steady at about 3 million since 2000 (OCLC,
2002). Similarly, the number of publicly available, indexable Web pages
grew from approximately 800 million in February 1999 to more than 3 billion
today (Lawrence
& Giles, 2002).
Erosion of Time-Space Barriers
Arguably, both the Internet and
World Wide Web have grown by breaking down barriers of time and space. Prior to
FTP and email, document delivery depended largely upon courier, mail, and fax.
Twenty-four-hour shopping was limited to a few retail and convenience stores
and required physical travel. Meeting people with similar interests often meant
investing significant amounts of time and energy. Learning involved traveling
to school and sitting in a classroom. Entertainment required arranging one’s
schedule around available movie times, concert dates, and T.V. slots.
Now, with the appropriate machine
and applications, individuals need not be so constrained by time and space
boundaries. They can attend Webcast meetings without leaving home, receive
documents almost instantaneously, shop at any hour of the night from their
living room, discuss at low cost similar interests in real time with people
around the globe, turn in school assignments and carry on class discussions
without stepping inside a classroom, quickly gain access to paperless
newspapers, enjoy clips or entire movies upon demand, and so much more.
Pre-computer technology made many of these activities possible, but personal
computers with high-speed network interconnectivity have increased the
functionality and speed required for these actions.
This electronic diminution of
time-space barriers has changed—and continues to change—both personal and
institutional behaviors. Technology has increased user expectations. The
delivery of products and services can no longer be measured in days or weeks
but must be counted in hours, minutes, or seconds. Travel beyond the confines
of one’s home for informational or commercial services has, for the experienced
Internet user, become less common and, for many such individuals, less welcome.
Institutions that at one time were located only in physical space have been
compelled by customer demand to add virtual locations as well (Dembeck,
1999).
Libraries, of course, have not
been immune to these changes. Few would refer to the typical library as a
"click-and-mortar" operation, but most libraries have seen their
share of changes resulting from the Internet’s reduction of time and space
boundaries. For starters, libraries of all types have worked to obtain a
visible entry point on the Web. Every state library in the United States and
all ARL member libraries maintain Web sites (Berkeley
Digital Library Sunsite, 2003; Association
of Research Libraries, 2003), and over three-quarters (77%) of the public
libraries in the U.S. and Canada have a presence on the Web (Prabha
& Irwin, 2003c).
Beyond a simple presence, though,
many libraries have added electronic materials and services to the
"traditional" items associated with physical space. Items offered by
research libraries via the Web include e-books, online catalogs, licensed
databases, e-journals, research guides and finding aids, freely available Web
resources and local digital collections (Wetzel
& Jackson, 2002). Depending upon their sizes and budgets, public
libraries present a similar array to offsite users who cannot—or would prefer
not to—visit in person (Prabha
& Irwin, 2003a). Almost certainly user expectations established by the
"Internet culture" have driven libraries to become accessible
remotely to their service populations twenty-four hours a day through the Web (Kiesler,
1997, ix-xiii).
Library Web Collections
Libraries’ posting of links to
free Web resources constitutes a new type of collection aimed at taking
advantage of the medium’s decreased time-space restrictions. Except for
technical glitches, links pages never "close"; rather, a library’s
electronic reference collections are available for consultation 24 hours a day,
7 days a week. The term "visit" now commonly refers to access to
files from home via computer.
This concept of "Web
collections" is relatively new to libraries. It involves initially
intangible items that are not held or owned, but rather freely shared and
distributed over interconnected networks. Instead of physical materials, they
are computer files accessed through agreed-upon protocols, whereby a single
item can be simultaneously used by many people. The creator(s) of the content
can update, change, or remove access at any time without notifying potential
users ahead of time; on the other hand, very often those accessing the material
can create a physical or local electronic copy of the resource without alerting
the creator(s). Virtually anyone—from highly regarded government agencies and
commercial entities to elementary school students—can "publish" in
this environment; all one needs is a network host and the software and skills
to create and upload pages.
In contrast, the traditional
collections model was based on "tangibility," a central tenet in the
development of all types of libraries. Historically and across cultures, lending
libraries in particular were developed on the principle of collecting, holding,
organizing, and making physical items available for use. Three key concepts
were "ownership," "lending" and "borrowing." The
library owned an item, typically a book, and under conditions laid out
by library authorities, lent the material to qualified borrowers.
In this model libraries acquired and processed items and maintained them for as
long as they were deemed useful, at which time they were sold, donated, or
discarded.
General Problems with Web Resources
The movement from one model to
the next has not been very straightforward or particularly easy for libraries
and librarians. This is due, in part, to a number of oppositional ideas notable
in the Web generally:
1. "Traditional" vs. "Progressive" Media. In its development the Web has essentially
followed the same pattern as all other transitional, emerging media. Based
initially on the equipment or tools available to most users for access to the
new medium (in this case keyboards), the information available on the Web is
decidedly textual, but increasingly includes higher amounts of other media like
video and sound. The Web is becoming, in terms of formats, more mixed, but
remains for the time being largely print-based.
2. Scope vs. Utility. The Web
contains large amounts of information on virtually every conceivable subject,
yet users often complain that they cannot find exactly what they want. This
"embarrassment of riches" is perhaps both the Web’s greatest
advantage and its most significant liability.
3. Mediation vs. Immediacy. On
one hand, because most Web materials are free and a large number are indexed by
commercial search engines, the need for institutional intermediaries has
seemingly been minimized. On the other hand, the lack of restrictions on what
can be "published" in this medium and the sheer amount of material
available arguably require more intensive intermediation in the form of
selection, organization, and access.
4. Cost vs. Permanence. For most
publicly available Web resources there is no explicit contractual agreement
between the creator of the material and the potential user. The result is
freely accessible material that the publisher has no obligation to maintain in
any particular form. Therefore, though a site might be free, it is also
potentially transient.
5. Standardization vs. Flexibility. The extraordinary growth of the Web can be attributed, at least in part,
to the common use of certain protocols and relative lack of regulation. At the
same time, freedom to choose from among Web design styles, preferred browsers
for viewing, and the lack of agreement on metadata schemes have made the
retrieval of some Web-based information more difficult.
Specific Challenges for Libraries Developing Web Collections
These general, oppositional ideas
have had a direct effect on libraries’ uses of free Web resources. Even before
considering specific problems associated with collecting Web materials, the
staff of each library will first have to consider two very basic, related
questions:
Library administrations that
proceed with the development of Web collections will need to resolve a number
of other issues, most of which are related to the "general problems"
outlined above. The problematic traditional-progressive continuum, for
instance, demands that the format and system requirements of a selected
resource be taken into account. While most Web items currently convey
information textually, that is changing rapidly and could present difficulties
for home users with low bandwidth connections (Nielsen,
1997). Multimedia resources also require additional staff attention since
links to necessary plug-ins need to be provided as well.
A separate concern is the extent
to which librarians are able to discover hard-to-find quality Web resources.
This "scope vs. utility" problem is not one that plagues only end
users, but information intermediaries as well. The problem for librarians involves
adding value to the search, bringing to light Web resources that would be
buried in most search engine retrievals. Studies suggest that public and
academic libraries are already enjoying some success in this area (Prabha
& Irwin, 2003b; Irwin,
2002). Finding such "diamonds in the rough" is not easy, though, and
may require that future librarians receive both Web searching and material
evaluation coursework in library schools as well as consistent training on the
job.
Librarians will also need to
confront related issues regarding the selection of Web resources. Clearly, the
analogy of Web materials to physical collections is imperfect. Typically a
physical item is static, whereas Web pages can be updated frequently. Moreover,
catalogers can apply fairly clear standards to the format identification and classification
of tangible materials, but are less able to do so in the "mixed"
environment of the Web (Connell
& Prabha, 2002).
Units of selection and
description are similarly unclear. When a Web site is selected (e.g. http://www.yahoo.com), one must decide whether
everything beneath that URL (e.g. http://www.yahoo.com/News_and_Media/) is also
selected. Does every individual HTML page warrant description? If a larger Web
unit ("parent") is chosen and described, under what circumstances
would a smaller unit ("child") also receive treatment? Issues
regarding extent and subjective scope of a resource need to be addressed.
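
One way to make the parent/child question concrete is to treat it as a path-prefix test on URLs. The sketch below is illustrative only: the function name is_child is hypothetical, and it assumes that "beneath" means the same host with a longer path, which is just one possible reading of a resource's extent.

```python
from urllib.parse import urlsplit


def is_child(parent_url: str, candidate_url: str) -> bool:
    """Return True if candidate_url sits beneath parent_url on the same host.

    Assumption (not from the article): "beneath" means identical host and a
    path that extends the parent's path by at least one segment.
    """
    parent = urlsplit(parent_url)
    candidate = urlsplit(candidate_url)

    # Host names are case-insensitive, so compare them in lower case.
    if parent.netloc.lower() != candidate.netloc.lower():
        return False

    # Normalize the parent path so "/News" does not match "/News_and_Media/".
    parent_path = parent.path if parent.path.endswith("/") else parent.path + "/"

    return candidate.path.startswith(parent_path) and candidate.path != parent_path


# The example URLs are the ones used in the text above.
print(is_child("http://www.yahoo.com", "http://www.yahoo.com/News_and_Media/"))  # True
```

A selector could use a test like this to decide whether a "child" page is already covered by a described "parent" or needs its own record. It does not settle the question of subjective scope, which remains an editorial judgment.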
Librarians also must consider
strategies for user access to Web collections. Because relatively few libraries
have the resources to build searchable link databases single-handedly or can
afford the staff time required to create catalog records for all Web resources,
most currently allow access to their Web collections by subject only. For all
types of libraries, the key to solving the "access vs. quality"
problem may be cooperative action. The widespread use of nimble systems based on
emerging metadata schemes like Dublin Core
would be ideal, since it would allow for searching by a number of fields
(creator, etc.) and for the arrangement of search results according to user
preferences. Alternatively, a system could be developed whereby library link
pages and the resources to which they point are regularly cached and made
searchable as a quality-filtered subset of the Web using a standard search
engine. An even lower cost, shorter term idea to increase user access to
high-quality networked resources would be simply for libraries to link to one
another’s Web collections. Many already do this, but perhaps it deserves a
higher profile.
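
The fielded searching described above can be sketched in a few lines. This is a minimal illustration, not a description of any existing library system: the field names title, creator, subject, and identifier are Dublin Core element names, but the LinkRecord class, the search function, and the sample records (all under example.org) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkRecord:
    """A link described with four Dublin Core elements (hypothetical record type)."""
    title: str
    creator: str
    subject: str
    identifier: str  # the URL of the resource


def search(records, field, term, sort_by="title"):
    """Return records whose `field` contains `term`, ordered by `sort_by`.

    Matching is a case-insensitive substring test; `sort_by` stands in for
    the user's preferred arrangement of results.
    """
    term = term.lower()
    hits = [r for r in records if term in getattr(r, field).lower()]
    return sorted(hits, key=lambda r: getattr(r, sort_by).lower())


# Invented sample data, for illustration only.
RECORDS = [
    LinkRecord("State Census Tables", "Example State Agency",
               "Statistics", "http://example.org/census"),
    LinkRecord("Local History Archive", "Example Historical Society",
               "History", "http://example.org/history"),
    LinkRecord("Labor Force Data", "Example State Agency",
               "Statistics", "http://example.org/labor"),
]

for record in search(RECORDS, "creator", "agency"):
    print(record.title, record.identifier)
```

Because every record carries the same small set of fields, the same records could in principle be pooled across cooperating libraries and searched by creator, subject, or title without each library building its own database.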
Tied up with selection,
description, and access is the issue of permanence. Recent studies of public
and academic library Web collections have shown that about 15% of the resources
linked to are unavailable at any particular moment (Prabha
& Irwin, 2003b; Irwin,
2002). As noted above, a general problem with the provision of free Web
materials is that there is no obligation on the part of the creator to maintain
a file at a network location for any specified period of time. The result for
users is frequent frustration. While the larger Internet community fashions
standards to deal with this problem, librarians can in the meantime do several
things to minimize user discouragement. In the selection process, they can
favor site-level resources, which have been found to be available over 95% of
the time (Prabha
& Irwin, 2003b;
Irwin, 2002). They might also choose to make greater use of relatively
inexpensive link checking software, like that produced by Elsop and Watchfire (Electronic
Software Publishing Corporation, 2003; Watchfire,
2003).
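
The basic idea behind link checking can be sketched as follows. This is an assumption-laden illustration and not how the Elsop or Watchfire products work: the function names is_reachable and availability_rate are hypothetical, a link counts as available if an HTTP HEAD request succeeds, and some servers reject HEAD requests, so a production checker would need a fallback.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` succeeds (assumed definition of 'available')."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # Covers HTTP errors (4xx/5xx), DNS failures, timeouts, and malformed URLs.
        return False


def availability_rate(urls, checker=is_reachable):
    """Return the fraction of `urls` that `checker` reports as reachable."""
    urls = list(urls)
    if not urls:
        return 0.0
    reachable = sum(1 for url in urls if checker(url))
    return reachable / len(urls)
```

Running availability_rate over a library's links page on a regular schedule would give a local figure to compare against the availability rates reported in the studies cited above. The checker argument can be swapped for a stub when testing without network access.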
The "standardization vs.
flexibility" issue manifests itself in many of the decisions librarians must
make when they develop a Web collection because this issue represents the
connective tissue among all other considerations. In terms of format selection
and access, Web collection librarians often must choose—for reasons of
simplicity and user demand—one application over another (e.g. Windows Media
Player instead of Real Media to play a
video clip, or HTML instead of PDF for text). For more accurate retrieval in
Web link databases, some basic agreement on a metadata scheme will be required.
Librarians may also choose a single set of evaluation standards for the
selection of Web resources to improve the consistency of such databases.
Conclusions
Studies have suggested that the
Web is a perceived threat to libraries and librarianship because of the potent,
widespread belief that "everything is on the Internet" and the
popular view that widely available electronic information is easier to access
than libraries are (Jones
E-Global Library, 2002; D’Elia,
Jörgensen, Woelfel, & Rodger, 2002). The relative youth of the Web and
its ability to diminish constraints of time and space in the information
environment provide an easy contrast to the perception of libraries as old,
conservative institutions—buildings filled with books.
Clearly, however, libraries are
not going away. On the contrary, librarians are using the Web to provide remote
users with 24-hour access to information products and services like
subscription databases, catalogs, and reference. At the same time, they are
doing what virtually everyone else on the Web is doing: providing pointers to
other free Web resources.
The creation of such "Web
collections" carries with it certain difficulties for libraries, most of
which are related to issues inherent in all modern information: format
varieties, vastness of available materials, selection, durability/
preservation, and development of standards. In the era when most information
was tangible and costly, libraries were among the few institutions whose main
interest was to acquire material and make it available for free to a specific
service population. Now, however, much of the everyday information in which
people have an interest (e.g. stock prices, maps, news) need not be purchased
and made available only within the confines of a library. Rather, it can be
gleaned from the Web using a URL or search engine, completely bypassing
libraries.
Librarians should, however, be
able to exploit several advantages in the Web space. Perhaps the primary
advantage that libraries enjoy as aggregators of Web resources is the trust of
users. Much has been made of the dubious trustworthiness of information on the
Web and the typical user’s lack of understanding regarding its retrieval (Lynch,
2001; Trickey,
1998). That’s where librarians come in. A Benton
Foundation survey concluded "that Americans admire, respect, and trust
librarians to be their guides to information both on- and offline" (63).
The non-profit Librarians’ Index to the Internet (http://lii.org/)
prominently displays this sentiment in its online banner: "Information You
Can Trust." Even commercial entities on the Web like Galaxy.com make use of the profession’s positive
image with slogans like "Trust Your Internet Librarians" (Galaxy,
2000).
This issue of trust is based on
the librarian’s traditional ethic of providing equal access to information.
Users of library Web collections can be fairly certain that no political or
social agenda underlies items selected and, in most cases, no commercial interest
either. This can be viewed merely as an extension of what libraries have always
done in the creation and maintenance of collections: serve without prejudice
the interests of their users.
Beyond the matter of trust,
librarians have long been known as highly proficient organizers of information.
After culling information from the Web, librarians are very well suited to
making it available for use. Most users understand this, generally because they
associate libraries with standard organizational schemes like the Dewey Decimal and Library of Congress classifications. Librarians,
more than any other professionals, are known for their expertise in creating
order from masses of information. Obviously,
that’s a tremendous asset in dealing with the Web.
After confronting the issues
related to the creation of Web collections—formats, selection, access,
permanence, and standards—libraries of all kinds will need to increase the
visibility of their Web collections. Librarians have much going for them in the
Web space, but they also have strong competition. Commercial interests like Yahoo have had the luxury of spending untold
sums of money on branding, marketing and technical staffing. Libraries,
generally, have not.
Because of funding limitations,
libraries will need to consider increasing the amount of cooperative work that
they do, both with other libraries and with non-library partners. With the
former, greater consideration must be given to joint marketing campaigns and
the branding of Web collections. Libraries should emphasize the "upside of
tradition"—libraries as trustworthy stewards of organized information to
which there is free access. Because of their basic computer training functions,
public libraries are in a particularly strong position to cultivate loyalty to
library Web collections in new Internet users.
Creating Web collections is a
relatively new activity for most libraries. Basic, thorny questions related to
the establishment and maintenance of such collections still need to be
answered, both by librarians themselves and by advocates in professional
organizations. Handling these issues well will improve the status of libraries
in the Web space and help libraries continue their missions outside the
boundaries of time and space. If
librarians give these questions short shrift, however, larger numbers of
increasingly sophisticated users and potential users may come to view libraries
as irrelevant to the rapidly changing information landscape. In an era of shrinking resources for all
kinds of libraries, we can scarcely afford to be pushed to the margins any
further.
References
Association of
Research Libraries (2003). Member Libraries. Retrieved October 31, 2003, from
the ARL Web site: http://www.arl.org/members.html
Berkeley
Digital Library Sunsite (2003). Libraries on the Web: USA State. Retrieved
October 31, 2003, from the Libweb Web site: http://sunsite.berkeley.edu/Libweb/usa-state.html
Caslon Analytics
(2003). Net metrics and statistics guide, August 2003. Retrieved October 30,
2003, from Caslon Analytics Web site: http://www.caslon.com.au/metricsguide1.htm
CommerceNet (2002).
Worldwide Internet Population. Retrieved November 18, 2002, from CommerceNet
Web site: http://www.commerce.net/research/stats/wwstats.html
Cyberatlas
(2002). Email to double by 2006. Retrieved November 18, 2002, from the
Cyberatlas Web site: http://cyberatlas.internet.com/big_picture/applications/article/0,,1301_1472121,00.html
Dembeck, C.
(1999). Toys ‘R’ Us: The latest brick-and-mortar that doesn’t get it? E-Commerce
Times, July 9, 1999. Retrieved November 18, 2002, from the E-Commerce Times
Web site: http://www.ecommercetimes.com/perl/story/746.html
Electronic
Software Publishing Corporation (2003). Retrieved October 31, 2003, from the
ESPC Web site: http://www.elsop.com/
Galaxy (2000).
Galaxy Invites Public to ‘Trust Your Internet Librarians.’ Retrieved October
31, 2003, from the Galaxy Web site: http://www.galaxy.com/info/release40.html
Greenspan, R.
(2002). American surfers keep it simple. Retrieved November 18, 2002, from the
Internet News Web site: http://www.internetnews.com/stats/article.php/1466661
Gromov, G.
(2002). The roads and crossroads of Internet history: Growth of the Internet.
Retrieved November 18, 2002, from History of the Internet and WWW Web site: http://www.netvalley.com/intvalstat.html
Internet Software
Consortium (2002). Internet domain survey, July 2002. Retrieved November 18,
2002, from Internet Software Consortium Web site: http://www.isc.org/ds/WWW-200207/index.html
Jones E-Global
Library (2002). The Role of Librarians in the Digital Age. Retrieved
November 20, 2002, from the Jones Knowledge Web site: http://www.jonesknowledge.com/eglobal/ala_survey.html
Kiesler, S.
(1997). Culture of the Internet. Mahwah, N.J.: Lawrence Erlbaum
Associates.
Lawrence, S. & Giles, L. (1999). Accessibility and distribution of
information on the Web. Retrieved November 18, 2002, from http://wwwmetrics.com/
Librarians’ Index to the Internet
(2003). Retrieved October 31, 2003, from http://www.lii.org/
Nielsen, J.
(1997, Winter). Guidelines for multimedia on the Web. Advancing HTML: Style
and Substance 2 (1). Retrieved November 19, 2002, from the W3C Journal Web
site: http://www.w3j.com/5/s3.nielsen.html
Nielsen//NetRatings
(2003). September 2003 global Internet index average usage. Retrieved October
30, 2003, from Nielsen//NetRatings Web site: http://www.nielsen-netratings.com/news.jsp?section=dat_gi
OCLC Web
Characterization Project (2002). Retrieved November 18, 2002, from the Web
Characterization Project Web site: http://wcp.oclc.org
Watchfire
(2003). Retrieved October 31, 2003, from the Watchfire Web site: http://www.watchfire.com/
WebMetro (2002).
More than 600 million people have Net access, November 4, 2002. Retrieved
November 18, 2002, from WebMetro Web site: http://www.webmetro.com/News1Detail1.asp?NewsRS_Action=Find('AutoNo','829')&NewsRS_Position=FIL%3ACategory+%3D+%27Statistics%27ORD%3AABS%3A3KEY%3A829PAR%3A