Following are
some of the most frequently used terms for discussing the Internet
and virtual libraries. This is admittedly a limited selection of
terminology.
AACR2
(Anglo-American Cataloging Rules)
A set of standards for describing materials for access via
a library catalog or database.
access management Means by which a system administrator
can control who retrieves materials over a computer network and
under what circumstances. Although materials available on the Web
typically are not restricted, databases carry access restrictions.
Acrobat Software developed by Adobe Systems to facilitate
cross-platform presentation of text and graphics. Acrobat generates
documents in PDF (portable document format).
administrative metadata Such information as rights,
permissions, and access data as it applies to a specific data object.
aggregator Services that gather publications from
a number of publishers and offer the packaged data to users as a
single service.
antialiasing A technique for smoothing bitonal
graphic images by using grayscale pixels to gradually provide transition
from the black to the white portions of an image. Without antialiasing,
a diagonal line would appear choppy instead of smooth. This is especially
useful in text scans where letters may be composed of not only vertical
and horizontal strokes but of varying degrees of curves and diagonal
lines. The overall effect is a smooth outline for all characters.
applet A small program that can be served by a network
computer and run on a computer workstation. Typically applets are
thought of as Java programs that are cross-platform and can be initialized
by code strings embedded in a Web page.
ARPANET Developed in 1969 by Larry Roberts and supported
by the United States Department of Defense, the ARPANET can be considered
the beginning of today's Internet. At the time, the network was
developed to facilitate high-speed communication between institutions
doing military research.
artificial intelligence A long pursued goal of computing
system designers, a system capable of artificial intelligence would
actually be able to approximate human thought processes, understand
spoken commands, and respond appropriately. Speech synthesis has
already progressed considerably, but the goal of producing a truly
intelligent system has still not been achieved.
ASCII (American Standard Code for Information Interchange)
A text representation standard that stores one alphabetic,
numeric, or other character in one computer byte. This is the most
common means for representing characters and allows plain text to
be shared among multiple applications. Saving a document in ASCII
format in a word processor like WordPerfect means that the same
document can be used in a different word processor without
having to be converted. ASCII does not, however, support page
layout as word processing and page layout programs do. A document
originally created with bold and italic text and with special formatting
(tables, indentions, etc.) will lose all special formatting and
text characteristics when saved to ASCII format.
asymmetric cryptography Encryption that uses two different
keys, one to encode a message and another to decode it. A sender
encrypts a message with the recipient's public key, and only the
recipient, who holds the matching private key, can decode it. The
encoding key cannot be used to decode the message, so the transaction
is one-way, or asymmetric.
authentication Means by which a computer workstation
or computer user establishes a connection to another workstation
or to an online system. Authentication may be via a set protocol
or via username and password. Many database vendors now offer IP
authentication, which works by allowing a user workstation to broadcast
its IP address to the database server for verification against valid
user IPs. If the IP results in a match in the vendor's authorized
user base, the user is allowed access to the database. Otherwise,
access is blocked.
automatic speech recognition The ability of
a computer system to recognize and interpret human speech by use
of software designed to analyze sounds and interpret them according
to a programmed vocabulary. The applications of such a system range
from enabling a user to command a computer to perform certain tasks
(shut down, open file, etc.) to enabling hands-off dictation of
entire texts to a computer system to identifying individuals by
matching voice and pronunciation to known patterns.
binary search A means of quickly searching
a sorted data list. A search for a match begins in the middle
of the file and continues subdividing the list until a match is
found or until the list is small enough to be searched sequentially.
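The halving described above can be sketched in Python. This is a minimal illustration that assumes the list is already sorted; the function name and sample titles are invented for the example. This version keeps halving until the range is empty rather than switching to a sequential scan.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2          # start in the middle of the range
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1                # discard the lower half
        else:
            high = mid - 1               # discard the upper half
    return -1


titles = ["Beloved", "Dracula", "Emma", "Hamlet", "Ulysses"]
position = binary_search(titles, "Hamlet")
```

Each pass discards half of the remaining list, so the number of comparisons grows far more slowly than the list itself.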
bitmap Also called a raster image, a bitmap uses a
grid of pixels to reproduce an image. Each pixel is a small square
that is assigned a color value and location within the bitmap grid.
The combination of pixels can produce anything from a low resolution
bitonal image to a high resolution full color image of photographic
quality. Most photo-editing software uses the bitmap format as a
standard for editing images. Instead of editing an entire image,
a photo-editor can edit each individual pixel to produce desired
results.
Bitnet (Because It's There Net) Network developed
by the City University of New York and Yale University for facilitating
data transfer between the institutions. The Bitnet helped lay groundwork
for the Internet and served as a communications standard between
universities for years.
Boolean operators Logical operators used to facilitate
database searching. AND, OR, and NOT are commonly used Boolean operators.
Boolean logic is based on the work of British mathematician George
Boole, whose work in algebra established the logical principles
of set theory.
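Because Boolean searching rests on set theory, the three operators map directly onto set operations. The sketch below is illustrative only: the index and its record numbers are hypothetical, not taken from any real database.

```python
# Hypothetical index: each search term maps to the set of record numbers
# in which that term appears.
index = {
    "internet": {1, 2, 3, 5},
    "libraries": {2, 3, 4},
    "copyright": {3, 6},
}

both = index["internet"] & index["libraries"]        # internet AND libraries
either = index["internet"] | index["copyright"]      # internet OR copyright
excluding = index["internet"] - index["copyright"]   # internet NOT copyright
```

AND narrows a search, OR broadens it, and NOT removes unwanted records from the result.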
browser Software that allows a user to view
materials on the Internet. The three best known browsers are Netscape
Navigator, Microsoft Internet Explorer, and NCSA Mosaic. Although
typically thought of as Internet-only programs, browsers can also
be used to provide access to local files and to access information
offline.
browsing
Looking through a collection of materials without a particular
goal. People often browse the Internet looking for nothing in particular
so much as just for something interesting. Browsing physical library
collections is also a common user approach. When a user is familiar
with the arrangement of a collection (history in one area, literature
in another, etc.), browsing can be a productive means of finding
specific materials.
cache A temporary copy of a set of
data that is stored on a computer workstation to speed up access
times.
catalog Collection of bibliographic records created
according to strict rules.
CCITT Group IV Fax Compression
An international standard for facsimile transmission developed by
the International Telegraph and Telephone Consultative Committee
(CCITT), now the Telecommunication Standardization Sector of the
International Telecommunication Union (ITU-T). The standard supports
compression for more efficient image transmission.
CD ROM (Compact Disk Read Only Memory) A hard plastic
disk measuring 12 centimeters (about 4 3/4 inches) in diameter, which is
composed of a very thin layer of metal sandwiched between plastic
layers. The metallic layer is imprinted with depressions of varying
depths that can be interpreted as images, sounds, and text. A single
CD ROM can hold around 650 megabytes of information, over 450 times
the capacity of a single floppy disk (1.44 megabytes). Even larger
storage capacities are possible with the newer DVD technology (Digital
Versatile Disk).
CGI (Common Gateway Interface) A programming interface
that allows a browser to access information services other than
Web pages on a network server. One of the most common uses of CGI
is to process data submitted via HTML forms in a Web page. The CGI
script processes the user input and transmits the formatted data
to the server for interpretation/action.
CIE (Commission Internationale de l'Eclairage)
Standard Colorimetric System International standard for representing
color in three dimensions -- lightness, red-green, and yellow-blue.
The CIE model represents all colors that can be perceived by the
human eye. Colors that can be reproduced on a color monitor or in
a color photograph comprise only a subset of the full visual range.
classification An organizing scheme used for grouping
materials. Libraries typically organize materials by subjects. Similarly,
databases will have classification schemes that typically assign
subject headings or descriptors to articles or materials included.
client In the server/client model of computing, the
"subservient" machine that depends on the server for its
information. On an intranet, the client might be the user workstation,
which can access programs and data from the organization's file
server. On the Internet, a user's workstation is the client to the
many Web servers that offer digitized information.
clustering A technique for retrieving data based on
frequency of use. For example, a text that was accessed numerous
times and had been scanned in full (rather than being viewed and
passed over) would rank higher in a retrieval list than a text that
had only been viewed (rather than fully accessed) or that had rarely
been accessed at all. The idea behind clustering is that the more
useful items in a collection will be more frequently accessed by
users.
CMYK (Cyan Magenta Yellow Black) A color model based
on the light-absorbing properties of color inks on paper. The combination
of cyan, magenta, and yellow on paper should produce the color black,
but because of impurities in ink these must be combined with black
ink to produce true black on paper. The combination of cyan, magenta,
and yellow instead produces a brownish color. This color model is
used in the so-called "four-color" printing process.
collision chain A collection of data in a hash table
with conflicting values. In other words, because of representing
words with numeric values, numerous words may be represented by
the same value and are said to collide. Usually a secondary hashing
scheme is used to sort through collision chains within a table.
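A small Python sketch can show how a collision chain forms. The hash function here is deliberately weak and entirely hypothetical, chosen so that anagrams collide. The chain is searched with a plain scan rather than the secondary hashing scheme mentioned above.

```python
def toy_hash(word, table_size=5):
    """Deliberately weak hash: sum of character codes modulo the table size."""
    return sum(ord(ch) for ch in word) % table_size


def build_table(words, table_size=5):
    """Place each word in the bucket chosen by toy_hash; each bucket is a chain."""
    table = [[] for _ in range(table_size)]
    for word in words:
        table[toy_hash(word, table_size)].append(word)
    return table


def lookup(table, word):
    """Hash to the right bucket, then scan only that chain."""
    return word in table[toy_hash(word, len(table))]


# "tab" and "bat" contain the same letters, so they hash to the same value.
table = build_table(["tab", "bat", "cat", "act", "dog"])
```

A lookup still avoids scanning the whole table. Only the words sharing one hash value are compared.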
compression A scheme for decreasing the size of a
digital file in order to speed retrieval or to save on storage space.
Compression may be lossless or lossy, depending on the algorithm
that is used to compress the file. Where space is not so much an
issue, lossless compression is favored.
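Lossless compression can be demonstrated with Python's standard zlib module. The sample text is invented and deliberately repetitive so that it compresses well. Real savings depend on the data.

```python
import zlib

text = b"virtual library " * 200      # highly repetitive sample data
packed = zlib.compress(text)          # lossless (DEFLATE) compression
restored = zlib.decompress(packed)    # byte-for-byte copy of the original
```

The restored data is identical to the original, which is what distinguishes lossless from lossy compression.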
controlled vocabulary A collection of terms used in
indexing materials described in a database. For example, the Library
of Congress Cataloging System uses a controlled vocabulary to assist
catalogers in uniformly describing materials to be included in a
library catalog. Most online databases utilize controlled vocabularies
to provide access points to the materials indexed in the databases.
For each item described in the database, an indexer/cataloger further
analyzes the content and assigns specific subject terms to further
describe the item. Most systems using controlled vocabularies maintain
detailed thesauri that define and cross-reference the terms used
in the vocabulary.
copyright Legal claim to intellectual property. In
the United States, the 1976 revision to the copyright law provided
a duration of 75 years. The Berne Convention, which the U.S. will
be applying, provides for the life of the author plus 50 years as
the period of copyright.
cryptography The process of encoding information using
a secret algorithm or key. Cryptography is especially important
in electronic communications since it is relatively easy for an
outsider to intercept messages being sent over the Internet or even
to impersonate someone else when sending messages. People wanting
to exchange messages securely can use encryption technology to secure
their exchanges.
cryptolope or Secret Envelope Refers to IBM's encryption
technology designed to assist companies in doing business over the
Internet. The vendor using cryptolopes can set options which either
allow or don't allow users to view, download, copy, print, or otherwise
access materials from its Web site. Further information on cryptolopes
is available directly from IBM.
CSMA/CD (carrier sense multiple access, collision detection)
Standard for transferring data across an Ethernet LAN. Devices
needing access to the network will check for available bandwidth.
If the network is free, the device will broadcast. If it is busy,
the device waits for a random amount of time before trying again.
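The check-then-wait behavior described above can be sketched as a toy simulation. Everything here is an assumption made for illustration: the function name, the list of busy and free readings, and the ten-slot maximum wait. It models only the carrier-sense step, not collision detection on a real Ethernet.

```python
import random


def send_when_free(busy_states, rng, max_wait=10):
    """Return the random waits taken before the line was found free.

    busy_states yields True while the line is busy and False once it is free.
    """
    waits = []
    for busy in busy_states:
        if not busy:
            return waits                          # line is free: broadcast now
        waits.append(rng.randint(1, max_wait))    # busy: back off a random time
    raise RuntimeError("line never became free")


# Two busy readings followed by a free one, using a seeded generator.
waits = send_when_free([True, True, False], random.Random(0))
```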
CSS (Cascading Style Sheets) A W3C standard for describing
document formatting elements in an HTML document. The strength of
CSS is that numerous pages can be changed on the fly just by editing
the CSS style document that controls page formatting.
DARPA (Defense Advanced Research Projects
Agency) Formerly ARPA, this federal government agency has
traditionally been a major sponsor of computer science research
in the United States. One of the best-known outgrowths of DARPA's
funding was the ARPANET, which helped to form the backbone of today's
Internet.
data Information elements that can be gathered and
organized and presented in print or via electronic media.
database A collection of information elements that
can be managed with a database management system. A database may
be text only or may include media of many types (sound, video, graphics,
etc.).
descriptive metadata In library catalogs or indexes,
the bibliographic information for a data object.
descriptor A subject term assigned from a controlled
vocabulary to describe a data object. Most article databases assign
multiple descriptors to each article indexed or reproduced in the
database in order to provide searchers with subject access to the
collections of data.
digital library Digital libraries are organizations
that provide the resources, including the specialized staff, to
select, structure, offer intellectual access to, interpret, distribute,
preserve the integrity of, and ensure the persistence over time
of collections of digital works so that they are readily and economically
available for use by a defined community or set of communities.
(Source: Digital Library Federation)
Digital Libraries Initiative NSF-, NASA-, and ARPA-funded
program focused on digitizing and providing access to library collections.
Six universities are the primary sites for the Initiative: University
of California at Berkeley, University of Michigan, University of
Illinois, Stanford University, University of California at Santa
Barbara, and Carnegie Mellon University.
distributed computing The use of clusters of computers
to serve data and applications over a network.
distributed searching Searching over multiple systems.
distributed system A group of computers that work
together to provide access to data.
dithering A process of approximating a color or shade
of gray by using dots of varying values. Dithering is used to improve
the quality of a compressed digital image by representing subtle
changes with varying dot sizes and intensities. Dithering produces
more photorealistic images without drastically increasing file size.
DNS (Domain Name System) One of the key elements
of the Internet, the Domain Name System translates domain names into
the numeric IP addresses that computers use to locate one another.
domain name The name of a computer connected to the
Internet. For example, the domain name for the White House Web server
is whitehouse.gov.
DPI (Dots Per Inch) A means of describing printed
text or image quality based on the number of ink dots that are used
to produce a solid image. Standard resolution on paper for black
text from an inkjet printer is 300 dpi. Laser printers and some
more expensive inkjet printers are capable of 600 dpi or higher.
The higher the dpi number, the finer the printed image will appear.
DTD (Document Type Definition) Document formatting
definition file used in SGML. SGML makes use of tags to control
the presentation of a document. The tags and their functions are
defined in the DTD file.
DVD (Digital Versatile Disk) High capacity storage
medium based on and the same size as a Compact Disk. Where a standard
CD can hold up to 650 MB of information, a DVD can hold anywhere
from 4 to 9 Gigabytes on a side or up to 17 Gigabytes on a double
sided disk. DVD was developed originally as a medium for storing
compressed digital movies which were too large to fit on a single
CD.
electronic journal Commonly, a publication
that maintains many print characteristics but is produced and distributed
online. Term can be used for a journal that is solely digital and
for one that exists in both print and digital formats.
electronic library Often used interchangeably with
digital library and virtual library. A collection of resources that
users can access over a computer system. In the broadest sense of
the term, resources would be available via the Internet. In the
most narrow sense, the resources would be served over an Intranet
or even on an individual workstation.
email Electronic mail
Ethernet The most commonly used local area network
(LAN) technology. Developed by Xerox, Digital, and Intel, Ethernet
allows connection of up to 1024 nodes over twisted pair, coax, or
fiber and is established
as an IEEE standard (IEEE 802.3).
fair use A concept used in copyright
to allow for individual use/copying of materials, as long as such
use is not for profit or used to produce financial gain.
field searching Searching in a database for information
in specific fields. Metadata searching. In a database that uses
author, title, and descriptor indexing, field searching might take
the shape of title field searching or author field searching or
descriptor field searching. This is in contrast to full text or
keyword searching.
field A specific bit of metadata about a data object.
Fields might include author name, title, descriptor, publishing
data, publication date, identifiers, etc.
firewall A machine or software application used to
block or limit access to an individual workstation or to a computer
network. Just as the firewall of an automobile is designed to prevent
damage to the interior of the car and injury to the passengers,
a computer firewall is designed to prevent damage to the workstation
or file server.
FTP (file transfer protocol) An Internet communications
standard that facilitates the sharing of files from one computer
to another. Many Web sites offer file downloads to users using FTP.
Indeed, some repositories of digital texts provide users with the
option to download full text via an FTP site.
full text searching Searching through all indexed
words in a full-text database. Field searching limits the search
for information to metadata. Full text searching is not limited
to metadata, but instead probes the data itself.
gamma A numeric value representing the difference
between the input and output of a device. Most commonly, gamma is
used in relation to monitors, which may display images either brighter
or darker than the original scanned image. Gamma correction is an
integral part of photo editing software and can be used to adjust
scanned images so that they display and print more like the original.
Gecko The latest rendering engine
designed by the Mozilla project. The Gecko engine is the basis of
the Netscape browser and is compliant with current Web standards
including HTML 4.0, XML, CSS, and DOM.
GIF (Graphics Interchange Format) A standard image
representation format used extensively to make images available
over the Internet. A GIF image can have a maximum of 256 colors
or shades, which makes it useful for preserving the sharpness of
bitonal or grayscale images. Although not as useful as other formats
in preserving the richness of natural color scenes (the GIF compression
scheme reduces the number of colors in a photograph from millions
of colors to a maximum of 256), it is widely used to preserve color
images for presentation over the Internet.
gopher An Internet file sharing system that predates
the World Wide Web. Gopher systems allowed users on different computer
systems to share information over the Internet using a text-based
indexing system.
hash coding A search technique that eliminates the
need to scan an entire database by providing access to key words
based on where they appear in a database. Words and their locations
may be represented, for example, by numeric values, which are then
searchable. Hash coding provides quicker search times than linear
searching.
hierarchical organization A principle
of organization that effectively ranks information from broader to
more narrow. Some indexing systems are hierarchical in nature, providing
broad, general terms at the top of the hierarchy, with more specific,
narrower terms appearing further down the chain of concepts.
hit Successful retrieval of information from a database.
For example, a search of an article database for recent articles
on the 2000 Presidential Election might uncover thousands of "hits."
In this example, a hit would be a reference in the database to an
article about the election.
home page The starting page for an individual or organization
with a Web presence. Typically home pages are named index.html,
quite appropriately. The home page typically introduces the Web
site and provides links to other content in the site.
HTML (Hypertext Markup Language) The standard formatting
language used on the World Wide Web. Very similar in appearance
to SGML, HTML's primary difference is that it supports hypertext
links within a document, one of the hallmarks of Web page design.
HTML uses tags enclosed in angle brackets (<>) to determine
text characteristics, layout, and position of images and sound files.
HTTP (Hypertext Transfer Protocol) Standard protocol
that enables computer systems to exchange hypertext materials over
the Internet.
hyperlink A "clickable" word, phrase, or
object in a Web page that "jumps" the viewer to further
information or to other Web sites.
identifier Information that describes
an item in lay terms. Identifiers are used in many databases to
provide subject-related access points other than descriptors.
IETF (Internet Engineering Task Force) The primary
protocol and standardization development body for the Internet,
the IETF is an international task force that combines the expertise
of network researchers and designers.
information retrieval The process of gathering data
from an online system. In the case of a library catalog, the act
of locating relevant catalog records in the OPAC.
Internet An "interconnected group of independently
managed networks," the Internet is a worldwide telecommunications
backbone that supports communication among computer systems around
the world. The Internet had its roots in the ARPANET in 1969 but
has since grown into a vast communications network utilized not
only by government, but by educational institutions, private organizations,
companies, and individuals.
Internet Explorer Microsoft's answer to Netscape Navigator,
a graphical Web browser that is incorporated into the Windows operating
system.
interoperability Functionality that allows users access
across many systems using similar or same command language.
IP address Typically used to refer to the numeric
address of a computer connected to the Internet. IP addresses can
be static (permanently assigned) or dynamic (assigned temporarily
to enable computer-to-computer communications) and consist of numeric
strings separated by periods. For example, the IP address 139.62.208.133
might designate an individual workstation at a specific organization.
The first three strings of numbers would normally represent a particular
computer system within an organization, while the final three digits
of the address would locate an individual machine attached to the
network at that site.
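The split described above can be illustrated in Python using the sample address from this entry. The function name is hypothetical, and the fixed three-and-one division is this glossary's simplification. On real networks the boundary depends on how the network is configured.

```python
def split_ip(address):
    """Split a dotted IPv4 string into (first three groups, final group)."""
    parts = address.split(".")
    valid = len(parts) == 4 and all(
        p.isdigit() and 0 <= int(p) <= 255 for p in parts
    )
    if not valid:
        raise ValueError(f"not a dotted IPv4 address: {address!r}")
    return ".".join(parts[:3]), parts[3]


# The sample address used in the entry above.
network, host = split_ip("139.62.208.133")
```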
Java Platform-independent programming
language created by Sun Microsystems. Java is similar in structure to C++.
Java applets can be embedded in HTML documents and enable Web users
to run programs on their computers regardless of their operating
system (Windows, Apple, UNIX, etc.). Both Netscape Navigator and
Microsoft Internet Explorer browsers have built-in support for Java.
JavaScript A Netscape-developed scripting language
that can be used to execute commands in a Web browser. JavaScript
provides Web pages with more functionality than does HTML code alone.
JPEG (Joint Photographic Experts Group) A standard
image representation format used extensively to make images available
over the Internet. A JPEG image preserves natural color more efficiently
than the GIF format while still providing image compression sufficient
to make it useful as a means of representing images in a computer
system. The JPEG compression algorithm selectively discards color
information based on the amount of compression desired. JPEGs support
RGB, CMYK, and grayscale color modes.
keyword searching Search capability
that allows querying a database by any chosen words. Keyword searching
can accommodate the use of Boolean operators and phrase and proximity
searching, and it allows inexperienced searchers to find content
within a database without knowing the structure or subject organization.
library
linear searching Accessing a data file by starting
at the beginning and going through the entire file looking for a
string of data. Linear searching is inherently slower than other
means of accessing a file, but it can accommodate searching for
multiple strings of data.
LSI (Latent Semantic Indexing) Technique for condensing
vector space into fewer dimensions. LSI uses a matrix to facilitate
location of words within documents. LSI also facilitates cross-language
retrieval of documents.
MARC (Machine-Readable Cataloging)
Originally developed at the Library of Congress as a format for
distributing catalog records on magnetic tape, MARC is a format
for describing bibliographic items. In a MARC record, each line
of the record begins with a code indicating the line's content
followed by the content to be displayed. For example, the 245 tag
indicates the title of an item, while the 100 tag indicates the
author. MARC records are the foundation for library OPACs (Online
Public Access Catalogs) and facilitate describing, searching, and
displaying information about bibliographic items.
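The tag-then-content layout described above can be sketched in Python. The two-line record is a hypothetical, heavily simplified example. Real MARC records also carry indicators and subfield codes, which are omitted here.

```python
# Hypothetical, simplified record: one tag followed by its content per line.
record_lines = [
    "100 Melville, Herman",
    "245 Moby Dick",
]


def parse_record(lines):
    """Map each three-digit tag to the content that follows it."""
    fields = {}
    for line in lines:
        tag, _, content = line.partition(" ")
        fields[tag] = content
    return fields


fields = parse_record(record_lines)
author, title = fields["100"], fields["245"]   # 100 = author, 245 = title
```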
markup languages High-level computer languages that
describe formatting of documents or other types of materials. HTML
is one of the most well known markup languages.
metadata Data about data.
MIME Originally used to describe info sent via email
using a two-part coding scheme that provides a generic and a specific
description. For example, the MIME type for a GIF is image/gif.
Image is the generic description, while gif is the specific description.
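The two-part scheme can be seen with Python's standard mimetypes module, which looks up a type from a file extension. The file name is invented for the example.

```python
import mimetypes

# Look up the MIME type from the file extension alone.
mime_type, _ = mimetypes.guess_type("logo.gif")

# Split the type into its generic and specific parts.
generic, specific = mime_type.split("/")
```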
mirror A digital copy of a collection of data.
mirror site A network computer system used to store
copies of data collections.
Mosaic NCSA Mosaic was the first widely distributed
and used Web browser. Marc Andreessen led development of Mosaic in
1993 and would later found Mosaic Communications Corporation which
would become Netscape Communications in 1994.
Mozilla Development name for the Netscape browser.
Mozilla was to be the monster program that would kill Mosaic, thus
the name Mozilla. Currently, the Mozilla project focuses on developing
an open source browser through worldwide cooperative efforts. Current
versions of Netscape Navigator are based on Mozilla.
MPEG (Moving Picture Experts Group) A standard for
compressing and storing moving images digitally. MPEG-1 is the most
common standard and is useful for videoconferencing. MPEG-2 is more
commonly used for storing motion pictures on disk.
natural language query A database
search query issued using normal spoken language syntax. One of
the more frequently used Internet search services that uses natural
language queries is Ask Jeeves.
Netscape The company and the browser that really supercharged
the Web revolution. Begun by Marc Andreessen, one of the original
NCSA developers of Mosaic, Netscape once held over 80% of the browser
market with its popular Netscape Navigator Web browser. Microsoft's
entry into the browser market with its Internet Explorer came late,
but with the considerable clout of the company, IE has reversed
the market and now holds nearly 80% to Netscape's less than 20%.
network protocols Procedures for organizing and exchanging
data among computers. Data exchange over the Internet is managed
with the TCP/IP suite, which includes Transmission Control Protocol
and Internet Protocol.
OCR (Optical Character Recognition)
Conversion of printed text into digitally encoded text. A page of
print can be scanned using a scanner and is then converted into
a digital representation character by character. OCR converted texts
typically require editing since character recognition accuracy can
vary depending on the quality of the scanned text. Generally speaking,
sans serif fonts are more accurately converted during the OCR process
than are serif fonts. The darkness of the original type face can
also affect OCR accuracy.
OPAC (online public access catalog) Computerized version
of a library's card catalog.
PCL (Printer Control Language) Developed by Hewlett
Packard, PCL is a language designed specifically for printing. PCL
is not as portable across printer models as PostScript but it does
support faster printing of images than the PostScript language.
PDF (Portable Document Format) A display
format developed by Adobe Systems based on the PostScript printing
language which displays an exact image of an original document with
all its layout and type characteristics intact. PDF files can be
created using Adobe's Acrobat or PageMaker software and can be read
using the free Acrobat Reader, which can be used as a plug-in for
Netscape Communicator and Microsoft Internet Explorer to share PDF
files over the Internet.
PDL (Page Description Language) Formatting language
used in specifying page layouts, including positioning of text and
graphics.
PNG (Portable Network Graphics) An open source
Internet graphics compression format designed to replace the Compuserve
GIF format. Developed during 1995, PNG was recommended by the W3C
as a Web standard the following year. Like the GIF, a PNG supports
lossless, cross-platform graphics that are easily viewed over the
Internet. Development of the new format was spurred by the Unisys
Corporation announcement that it would charge license fees for its
patented LZW compression algorithm, which is the basis of the GIF
file format. Not all browsers inherently support PNG at this point,
but that should be only a temporary predicament.
PostScript A complex programming language developed
by Adobe Systems for generating graphic images that may include
multiple fonts, colors, and bitmapped images. Although PostScript
is usually thought of primarily in terms of printing, it is a full
programming language having its own set of commands that facilitate
page layout as well as printing. PostScript is widely accepted as
a standard language for interfacing with a variety of printers.
PPI (Pixels Per Inch) A measure of the dimensions
of a graphic image. A pixel is a single small square of color information
that can be displayed on a computer monitor or can be printed on
a printer. The higher the pixels per inch the finer the printed
image will be. Images stored solely for viewing on a computer monitor
need not have a high number of pixels since the typical monitor
displays at the resolution of 72 pixels per inch.
precision The extent to which a search of a database
produces results that exactly match a user's query. A search that
retrieves 30 documents, 20 of which are considered relevant, has
precision of 66.67%. Precision is used along with recall to measure
success of an information retrieval system.
protocol A procedural code. For example, network protocols
describe the procedures used by linked computer systems for exchanging
data over the network.
proxy A go-between. In the world of libraries and
databases, proxy services are used to link authorized users to research
databases.
Public Key/Private Key The more secure of two encryption
methods used for exchanging information over the Internet. Each
recipient of a message has both a public and a private encryption
key. A sender uses the recipient's public key to encrypt a message. The recipient
uses his or her private key to decrypt it.
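The exchange can be sketched with textbook RSA arithmetic on deliberately tiny numbers. This is a toy illustration, not a secure implementation: the primes, the exponent, and the function names are all assumptions chosen for the example, and real systems use far larger keys plus padding.

```python
# Toy key pair built from tiny primes (illustrative only, not secure).
p, q = 61, 53
n = p * q                    # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent: modular inverse (Python 3.8+)

public_key = (e, n)          # given to anyone who wants to send a message
private_key = (d, n)         # kept secret by the recipient

def encrypt(message, key):
    """Encrypt an integer message with the recipient's public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)

def decrypt(ciphertext, key):
    """Decrypt an integer ciphertext with the recipient's private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)

ciphertext = encrypt(65, public_key)
print(decrypt(ciphertext, private_key))  # 65
```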
PURL Persistent URL
query A request for information from
a database issued in a specific format. A user looking for information
on an author would issue an author query to a database to determine
what data is available.
RealAudio/Video Designed by RealNetworks,
RealAudio and RealVideo are generally accepted Internet standards
for enabling streaming audio and video over the Internet. Instead
of waiting for a sound or video file to download, RealPlayer allows
the file to start playing as soon as a user initiates reception.
Over a fast Internet connection, RealAudio/Video appears to come
across in real time.
recall The extent to which a search of a database
retrieves the relevant documents it contains. A search of a database that contains
40 relevant documents and retrieves 30 of them
is said to have 75% recall. Recall is used along with precision
to measure the success of an information retrieval system.
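As a minimal sketch, recall is the number of relevant documents retrieved divided by the number of relevant documents in the database. The function name is an assumption, and the example assumes that all 30 retrieved documents are among the 40 relevant ones.

```python
def recall(relevant_retrieved, total_relevant):
    """Fraction of the relevant documents in the database that were retrieved."""
    if total_relevant == 0:
        return 0.0  # no relevant documents exist, so report zero by convention
    return relevant_retrieved / total_relevant

# 40 relevant documents in the database, 30 of them retrieved
print(f"{recall(30, 40):.0%}")  # 75%
```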
render In the computer world, to render is to take
digital data and display it in viewable format on a computer screen.
For example, Web browsers receive document and image data from a
Web server and render the data in the form of a Web page or Web
pages.
RGB (Red Green Blue) A color model that uses three
colors (red, green, and blue) to reproduce up to 16.7
million colors on a computer monitor. The RGB model is used by all
computer monitors to reproduce images on the screen. In the RGB
model, each pixel that composes an image is assigned an intensity
value from 0 to 255 for each of the three colors. A pixel with all
three values at 0 is black; one with all three at 255 is white.
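The 16.7 million figure follows from three color channels with 256 intensity levels each. The sketch below works through that arithmetic and shows the hexadecimal notation commonly used for RGB colors on Web pages; the function name is an assumption for illustration.

```python
def to_hex(red, green, blue):
    """Format three 0-255 channel intensities as a Web-style hex color."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each channel must be between 0 and 255")
    return f"#{red:02X}{green:02X}{blue:02X}"

print(256 ** 3)                # 16777216 possible colors
print(to_hex(0, 0, 0))         # #000000 (black)
print(to_hex(255, 255, 255))   # #FFFFFF (white)
print(to_hex(255, 0, 0))       # #FF0000 (pure red)
```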
RSA Encryption Considered the standard for encrypting
data to be exchanged over the Internet, RSA Encryption is a public-key encryption
algorithm named for its developers (Rivest, Shamir, and Adleman) and long
licensed by the security company RSA. In 2000, RSA announced
that it was releasing the algorithm to the public domain,
which meant that anyone wanting to build secure technology based
on the RSA standard could do so without licensing the algorithm
from RSA.
scanner An optical device that can be interfaced with
a computer to allow for importing a digital image of a physical
object into a computer program. Scanners can be hand-held, flat-bed,
or sheet-fed. On a flat-bed scanner, the object to be scanned is
placed against a glass panel (usually page size) and is scanned
by a scanning head consisting of a linear array of light-sensitive
sensors. The object is illuminated by a high-intensity light, and
the reflected light is picked up by the sensors and transmitted
as digital codes to photo-editing or OCR software.
server A computer that provides, or
"serves," files and services to client computers.
SGML (Standard Generalized Markup Language) A syntax
for marking up a document to have certain layout, text display,
and image display. SGML embeds tags describing certain aspects of
a document in angle brackets (<>) and surrounds all text elements
with tags to describe how the text will be displayed. For example,
a line of bold text will be enclosed inside the tags <b> and </b>, and the
enclosed text will be displayed in bold. SGML provides
standards for laying out tables, displaying graphics, and for producing
special characters like the pound sign, the ampersand, and the copyright
symbol. Unlike PDF documents, SGML documents can be rearranged according
to user specifications, producing a document display that may be
larger than or smaller than the original. This provides considerable
flexibility from one machine to another. The downside of this is
that an author wanting control over the appearance of his or her
document essentially loses control to the reader. Current SGML standards
include the Association of American Publishers' Electronic Manuscript
Standard, the Department of Defense Continuous Acquisition and Life-Cycle
Support rules (CALS), and the Text Encoding Initiative standard
(TEI).
shared cataloging OCLC is a prime example of shared
cataloging at work. In the OCLC model, member libraries cooperatively
contribute to the cataloging database so that no one library must
carry the burden of cataloging every published book.
SMTP (Simple Mail Transfer Protocol) An Internet
standard for establishing and negotiating communications between
sender and recipient of an email transaction.
snail mail Traditional paper-based mail.
speech compression Use of algorithms to sample voice
as it is recorded and reduce the size of the resulting digital file.
The most common standard for compressing speech, which can also be
used to compress music, is the GSM algorithm used for digital cellular
phones. GSM can compress speech by a factor of 5, producing files
that occupy approximately 1600 bytes per second of recorded information.
speech recognition Means for receiving and interpreting
spoken words in a computer system. Using a microphone, a user is
able to voice commands to a software application and execute program
routines without use of the keyboard.
spider In Internet search engine technology, the computer
system that actually "crawls" the Web looking for information
to bring back for indexing. Most search engines use the combination
of a spider or crawler to search for live connections, an indexing
and storage system for keeping the information and providing access
to it, and a user interface system that allows querying of the database.
SSL (Secure Sockets Layer) A protocol developed by
Netscape for securely delivering private information over the Internet.
SSL works using encryption technology.
stop word A non-searchable word in a database. Some
common stop words are short words like prepositions and words like
and, or, and not that are used as logical connectors in the database
search query language.
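A minimal sketch of how a search system might drop stop words from a query before searching. The stop word list and the function name are assumptions for illustration; each real database defines its own list.

```python
# Hypothetical stop word list; actual lists vary from database to database.
STOP_WORDS = {"a", "an", "and", "in", "not", "of", "or", "the", "to"}

def searchable_terms(query):
    """Return the words of a query that are not stop words."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]

print(searchable_terms("The history of libraries and archives"))
# ['history', 'libraries', 'archives']
```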
structural metadata Information about document formats
and structures.
style sheet An overall guide that specifies document
formatting features. Typical elements included in a style sheet
might include paragraph formats, font faces, font sizes, and other
page layout information.
subscription Payment for continuing access to a print
publication or to an online system.
tag An element of a markup language
that describes formatting or actions in a page or document layout
scheme. For example, HTML uses page formatting tags (indicated by
enclosure in brackets <>), font formatting tags, and data
embedding tags to provide content in a browser readable format.
TCP/IP (Transmission Control Protocol/Internet Protocol)
Two basic protocols that enable communications over the Internet.
IP links computers and networks using a standard protocol or language
and routes packets of data to a specific IP address.
TCP runs over IP and ensures that the packets passed from one computer
to another arrive reliably and in order.
TCP/IP suite Set of computer programs shared among
computers linking via TCP/IP protocol, including terminal emulation
(telnet), file transfer (FTP), electronic mail (Email, SMTP).
telnet A standard Internet protocol for logging
onto another computer system over the Internet and running programs
from a remote terminal or workstation. Telnet allows a PC to act
as a "dumb terminal." Telnet was developed as part
of the ARPANET project.
TeX (LaTeX) TeX is a macro processor that provides
document creators control over typographical formatting. TeX is not the
easiest tool to master, and it was used as the basis for creating
LaTeX and Plain TeX, two typographical formatting tools with
easier-to-master user interfaces.
TIFF (Tagged Image File Format) A format used
to exchange files between computer applications and platforms (Windows,
Mac, etc.). The TIFF format is shared by almost all photo-editing
and paint programs and is supported by most scanner software as
an option for saving scanned images. TIFFs are bitmapped images
that support RGB and CMYK color schemes as well as grayscale.
truncation Use of special characters to search variant
word endings in a database. Some common truncation symbols include
the asterisk and the question mark. In a database that uses the
question mark as a truncation symbol, a searcher could retrieve any form
of the word education by using the root of the word followed by
the question mark (educat?).
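Truncation searching can be sketched as a prefix match against the indexed words. The function name and the small word list below are assumptions for illustration, with the question mark used as the truncation symbol.

```python
def truncation_search(pattern, words):
    """Match words against a pattern that may end in the '?' truncation symbol."""
    if pattern.endswith("?"):
        root = pattern[:-1].lower()
        return [w for w in words if w.lower().startswith(root)]
    # Without the truncation symbol, only exact matches are returned.
    return [w for w in words if w.lower() == pattern.lower()]

index = ["educate", "education", "educator", "edit", "library"]
print(truncation_search("educat?", index))
# ['educate', 'education', 'educator']
```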
Unicode A character standard, originally a 16-bit text format, used to
represent characters from most of the world's languages on a computer
screen. UTF-8 is an encoding scheme based on 8-bit units that is used for the same purpose.
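The difference between the encodings can be seen by counting bytes. In this sketch (the function name and sample characters are chosen for illustration), UTF-8 uses one to three 8-bit units for these characters, while the 16-bit encoding uses two bytes for each of them.

```python
def encoded_lengths(character):
    """Return the byte counts of one character in UTF-8 and in UTF-16."""
    utf8 = character.encode("utf-8")
    utf16 = character.encode("utf-16-be")  # big-endian, no byte order mark
    return len(utf8), len(utf16)

for character in ("A", "é", "€"):
    code_point = f"U+{ord(character):04X}"
    print(character, code_point, encoded_lengths(character))
# A U+0041 (1, 2)
# é U+00E9 (2, 2)
# € U+20AC (3, 2)
```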
union catalog In a multi-site system, the combined
catalogs of multiple libraries, searchable as a single database.
URL (Uniform Resource Locator) The digital "address"
of a resource available over the Internet. URLs are prefaced with
the name of the protocol used to access a resource. For example,
a hypertext source is prefaced with the http:// protocol designator
while a telnet system is prefaced with the protocol designator telnet://.
The address may be a combination of letters and numbers but is translated
into a numeric value by a domain name server, a computer system
assigned to resolve addresses on the Internet.
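The parts of a URL can be pulled apart with Python's standard urllib.parse module, as in the sketch below. The helper name and the telnet host are hypothetical; no network lookup is performed.

```python
from urllib.parse import urlsplit

def describe(url):
    """Return the protocol designator and host name found in a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname

print(describe("http://www.webopedia.com/"))
# ('http', 'www.webopedia.com')

# Hypothetical telnet address, used only to show a different protocol
print(describe("telnet://library.example.org"))
# ('telnet', 'library.example.org')
```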
URN (Uniform Resource Name) A locator for a specific
Internet resource that is independent of location. In contrast,
a URL specifies a specific machine address for locating a resource.
The URN is machine-independent.
user interface The software configuration that allows
a user to access features of a program. In a search engine, the
user interface would include navigation buttons, an input box (or
boxes) for entering data to be located, and various dialog boxes
and help screens necessary to effective navigation of the software.
vector graphic An image format that
uses geometrical shapes like lines and curves to define images.
Vector graphics are frequently used to define typefaces or high
definition shapes where the smoothness of the image at all resolution
settings is important. Unlike bitmap images, which can lose their
definition and look chunky at increased resolutions, vector graphics
retain their definition at any resolution. When displayed on a computer
monitor, both bitmaps and vector graphics are displayed in pixels.
vector model Information retrieval model introduced
by Gerard Salton that represents documents using vectors based on
word frequency within a document. The more times a word occurs within
a document, the stronger the vector representing the document. A
search of the database on a word would produce an ordered listing
of documents whose vectors most closely match the search query.
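A minimal sketch of the idea, assuming raw word counts as the vector weights and cosine similarity as the measure of how closely two vectors match. Both choices, along with the function names and sample documents, are assumptions for illustration; production systems typically use more refined weighting.

```python
import math
from collections import Counter

def term_vector(text):
    """Represent a text as a vector of word frequencies."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two word-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, documents):
    """Return the matching documents, best match first."""
    query_vector = term_vector(query)
    scored = [(cosine_similarity(query_vector, term_vector(doc)), doc)
              for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored if score > 0]

documents = [
    "library catalog library database",
    "network protocol",
    "digital library",
]
print(rank("library", documents))
# ['library catalog library database', 'digital library']
```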
virtual library A
collection of resources and services similar to those available
in a physical library building but available to users over a computer
system, typically the Internet.
Virtual Reality Modeling Language (VRML)
Originally released in 1994, VRML is a file format specification
that provides for the creation of 3D environments viewable over
the Web.
W3C (World Wide
Web Consortium) A cooperative of over 150 organizations
whose responsibility is to set standards for exchange of information
over the Web. "The World Wide Web Consortium (W3C) develops
interoperable technologies (specifications, guidelines, software,
and tools) to lead the Web to its full potential as a forum for
information, commerce, communication, and collective understanding."
WAIS (Wide Area Information Server) A search and retrieval
system used for accessing databases over the Internet. The full
text of databases can be searched using WAIS systems.
watermark In the electronic publishing world, an embedded
digital signature that identifies the creator of the file.
WAV Sound file format standard in the Windows operating
system. WAV files typically use quite a bit of disk space and aren't
the most efficient means for transmitting sound digitally.
Web crawler The part of a search engine that scours
the Web for pages that can be indexed. Also known as a spider.
Web page A document that resides on the World Wide
Web. Web pages may or may not have print equivalents, may exist
only on the Web, and may incorporate graphics, sounds, motion pictures,
and other multimedia elements to provide information.
Web server In the client/server computing model, the
file server on which Web pages reside and from which they are served
to client workstations.
Web site A collection of Web pages that are focused
on a unified theme or on multiple themes. For example, Microsoft's
Web site has information on the company, assistance for using Microsoft
products, and an online store. A Web site can consist of a few pages
(as is the case with many personal pages) or may be an extensive
collection of hundreds of pages and other servable information.
World Wide Web (WWW) The graphics-rich, sound- and
video-enabled, hyperlinked part of the Internet that many people
mistake for the Internet itself. The communication
protocol that supports the Web is called Hypertext Transfer Protocol
(HTTP). Anyone with access to the Internet who is running a
Web browser on his or her workstation can access materials on the
Web and get the full benefit of multimedia content and hypertext
links to other resources. The full Internet includes other types
of systems including FTP sites, Gopher servers, and telnet systems,
to name a few.
WYSIWYG (What You See Is What You Get) A format for
laying out documents where the image displayed on the screen is
the same as what a user will ultimately print. Developed by Xerox,
this standard for layout and print is used by most word processing
programs at this point.
Xerox PARC The Xerox Corporation's
Palo Alto Research Center (PARC). The company has been essential
in the development of many computer products, including graphical
user interfaces, electronic document delivery systems, printing
technologies, and search technologies.
XSL (eXtensible Stylesheet Language) A system of document
description, XSL is a specification that allows for the separation
of style and content in HTML and XML pages.
XML (eXtensible Markup Language) A document markup
language based on SGML that allows for user defined tags that specify
in what format content is delivered. XML has been touted as the
replacement for HTML, but widespread support and acceptance have
yet to be seen.
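User-defined tags can be illustrated with a small record parsed by Python's standard xml.etree.ElementTree module. The tag names, the sample record, and the helper function are hypothetical, invented only for this sketch.

```python
import xml.etree.ElementTree as ET

# A hypothetical record that uses tags defined by its author, not by a standard.
RECORD = """
<book>
  <title>Virtual Libraries</title>
  <author>A. Writer</author>
</book>
"""

def field(xml_text, name):
    """Return the text inside the first child element with the given tag."""
    root = ET.fromstring(xml_text.strip())
    return root.findtext(name)

print(field(RECORD, "title"))   # Virtual Libraries
print(field(RECORD, "author"))  # A. Writer
```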
Z39.50 Digital exchange protocol that
allows a computer on one system to search collections of information
on a remote system using field searching.
For additional
Web terminology, visit the Webopedia at http://www.webopedia.com/. |