| Acrobat |
|
Software developed by Adobe
Systems to facilitate cross-platform presentation of text and
graphics. Acrobat generates documents in PDF (portable document format). |
| |
|
|
| Antialiasing |
|
A technique for smoothing
bitonal graphic images by using grayscale pixels to provide a gradual
transition between the black and the white portions of an image. Without
antialiasing, a diagonal line appears jagged instead of smooth.
This is especially useful in text scans, where letters are composed
not only of vertical and horizontal strokes but also of curves and
diagonal lines of varying angles. The overall effect is a smooth outline
for all characters. |
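One common way to produce those intermediate gray pixels is supersampling: test several points inside each pixel and use the fraction that falls inside the shape as the pixel's gray level. The sketch below assumes a simple straight edge (the region where y < slope * x) and reports ink coverage from 0 (white) to 255 (solid black). Real renderers and scanners use more elaborate filters; this is only an illustration.

```python
def antialiased_edge(width: int, height: int, slope: float, samples: int = 4) -> list[list[int]]:
    """Render the region y < slope * x as ink coverage levels (0 = white, 255 = black).

    Each pixel is split into samples x samples sub-pixels; the fraction of
    sub-pixels inside the shape becomes that pixel's gray level.
    With samples=1 every pixel is either 0 or 255, i.e. a jagged bitonal edge.
    """
    image = []
    for row in range(height):
        line = []
        for col in range(width):
            inside = 0
            for sy in range(samples):
                for sx in range(samples):
                    # Centre of this sub-pixel in image coordinates.
                    x = col + (sx + 0.5) / samples
                    y = row + (sy + 0.5) / samples
                    if y < slope * x:
                        inside += 1
            line.append(round(255 * inside / (samples * samples)))
        image.append(line)
    return image


jagged = antialiased_edge(8, 4, 0.5, samples=1)   # only 0 and 255
smooth = antialiased_edge(8, 4, 0.5, samples=4)   # includes intermediate grays along the edge
```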
| |
|
|
| Arpanet (Advanced Research Projects Agency
Network) |
|
Developed in 1969 by Larry
Roberts and supported by the United States Department of Defense,
the Arpanet can be considered the beginning of today's Internet. At
the time, the network was developed to facilitate high-speed communication
between institutions doing military research. |
| |
|
|
| Artificial
Intelligence |
|
A long-pursued goal of computing
system designers, a system capable of artificial intelligence would
actually be able to approximate human thought processes, understand
spoken commands, and respond appropriately. Speech synthesis has already
progressed considerably, but the goal of producing a truly intelligent
system has still not been achieved. |
| |
|
|
| ASCII
(American Standard Code for Information Interchange) |
|
A text representation standard
that stores one alphabetic, numeric, or other character in one computer
byte. This is the most common means for representing characters and
allows plain text to be shared among multiple applications. Saving
a document in ASCII format in a word processor like WordPerfect means
that the same document can be opened in a different word processor
without having to be converted. ASCII does not, however, support the
formatting that word processing and page layout programs provide. A document
originally created with bold and italic text and with special formatting
(tables, indentation, etc.) will lose all special formatting and text
characteristics when saved to ASCII format. |
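As a small illustration, Python's built-in `str.encode("ascii")` stores each character as a single byte. ASCII itself defines 128 characters (codes 0 to 127), so a character outside that set, such as an accented letter, cannot be encoded. The function name `to_ascii_bytes` is just a label for this sketch.

```python
def to_ascii_bytes(text: str) -> bytes:
    """Encode text as ASCII: one byte (value 0-127) per character."""
    return text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII characters


data = to_ascii_bytes("Hi!")
print(len(data), list(data))  # 3 [72, 105, 33]
```

Trying `to_ascii_bytes("café")` raises `UnicodeEncodeError`, because "é" has no ASCII code.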
| |
|
|
| Asymmetric
Cryptography |
|
Encryption that uses a pair
of different keys. A message encrypted with the recipient's public key,
which anyone may have, can be decrypted only with the matching private
key, which the recipient keeps secret. Because different keys are used
to encode and to decode, the method is called asymmetric (compare
secret-key, or symmetric, cryptography, in which both parties share
a single key). |
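To make the two-key idea concrete, here is a toy version of RSA, one well-known asymmetric method, using the tiny primes 61 and 53. Numbers this small are trivially breakable; the sketch only shows that the public key encrypts and only the private key decrypts. `pow(e, -1, phi)` computes a modular inverse and needs Python 3.8 or later.

```python
def make_toy_keys() -> tuple[tuple[int, int], tuple[int, int]]:
    """Toy RSA key pair from tiny primes (illustration only, not secure)."""
    p, q = 61, 53              # real keys use primes hundreds of digits long
    n = p * q                  # modulus shared by both keys: 3233
    phi = (p - 1) * (q - 1)    # 3120
    e = 17                     # public exponent, chosen coprime with phi
    d = pow(e, -1, phi)        # private exponent: modular inverse of e
    return (e, n), (d, n)


def rsa_apply(message: int, key: tuple[int, int]) -> int:
    """Encrypt or decrypt an integer smaller than n with the given key."""
    exponent, n = key
    return pow(message, exponent, n)


public_key, private_key = make_toy_keys()
ciphertext = rsa_apply(65, public_key)           # anyone holding the public key can do this
plaintext = rsa_apply(ciphertext, private_key)   # only the private key recovers 65
print(ciphertext, plaintext)  # 2790 65
```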
| |
|
|
| Automatic
Speech Recognition |
|
The ability of a computer
system to recognize and interpret human speech by use of software
designed to analyze sounds and interpret them according to a programmed
vocabulary. The applications of such a system range from enabling
a user to command a computer to perform certain tasks (shut down,
open file, etc.) to enabling hands-off dictation of entire texts to
a computer system to identifying individuals by matching voice and
pronunciation to know patterns. |
| |
|
|
| Binary
Search |
|
A means of quickly searching
a sorted data list. A search for a match begins in the middle
of the list, discards the half that cannot contain the target, and
continues subdividing the remaining portion until a match is found
or until the list is small enough to be searched sequentially. |
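A minimal sketch of the halving step, assuming the list is already sorted. For simplicity it keeps halving all the way down to a single item instead of switching to a sequential scan at the end.

```python
def binary_search(items: list[str], target: str) -> int:
    """Return the index of target in the sorted list items, or -1 if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2      # look at the middle of the remaining range
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1            # target can only be in the upper half
        else:
            high = mid - 1           # target can only be in the lower half
    return -1


terms = ["ascii", "bitmap", "gamma", "html", "tiff"]
print(binary_search(terms, "html"))  # 3
```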
| |
|
|
| Bitmap |
|
Also called a raster image,
a bitmap uses a grid of pixels to reproduce an image. Each pixel is
a small square that is assigned a color value and location within
the bitmap grid. The combination of pixels can produce anything from
a low resolution bitonal image to a high resolution full color image
of photographic quality. Most photo-editing software uses the bitmap
format as a standard for editing images. Instead of editing an entire
image, a photo-editor can edit each individual pixel to produce desired
results. |
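The grid structure can be pictured as rows of color values. In the sketch below (the names `Pixel` and `new_bitmap` are illustrative, not from any particular editor), each pixel is an (red, green, blue) triple, and editing the image means changing individual cells of the grid.

```python
Pixel = tuple[int, int, int]  # (red, green, blue), each 0-255


def new_bitmap(width: int, height: int, fill: Pixel = (255, 255, 255)) -> list[list[Pixel]]:
    """Create a bitmap as a list of rows, every pixel set to the fill color."""
    return [[fill for _ in range(width)] for _ in range(height)]


bitmap = new_bitmap(4, 3)        # 4 pixels wide, 3 pixels high, all white
bitmap[1][2] = (255, 0, 0)       # edit one pixel: row 1, column 2 becomes red
```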
| |
|
|
| Boolean
Operators |
|
Logical operators used to
facilitate database searching. AND, OR, and NOT are commonly used
Boolean operators. Boolean logic is named for the British mathematician
George Boole, who developed an algebra of logic in the mid-nineteenth
century. |
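In a search system the three operators correspond to set operations on the lists of documents that contain each term. The small index below is hypothetical sample data.

```python
# Hypothetical index: each term maps to the set of document IDs that contain it.
index = {
    "digital": {1, 2, 4},
    "library": {2, 3, 4},
    "music": {3},
}

both = index["digital"] & index["library"]     # digital AND library: in both sets
either = index["digital"] | index["music"]     # digital OR music: in either set
without = index["library"] - index["music"]    # library NOT music: first set minus second
print(both, either, without)  # {2, 4} {1, 2, 3, 4} {2, 4}
```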
| |
|
|
| Browsing |
|
Looking through a collection
of materials without a particular goal. People often browse the Internet
looking not for anything in particular but simply for something interesting.
Browsing physical library collections is also a common user approach.
When a user is familiar with the arrangement of a collection (history
in one area, literature in another, etc.), browsing can be a productive
means of finding specific materials. |
| |
|
|
| CCITT
Group IV Fax Compression |
|
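An international standard
for facsimile transmission developed by the International Telegraph
and Telephone Consultative Committee (CCITT, from its French name), whose
work is now carried on by the International Telecommunication Union (ITU-T).
The standard uses a lossless coding scheme for black-and-white (bitonal)
images that codes each scan line relative to the line above it, compressing
pages for more efficient transmission. |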
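Fax coding rests on the observation that a scanned page consists of long runs of white and black pixels. The sketch below shows only that underlying run-length idea, as used in the simpler one-dimensional Group 3 coding; it is not the Group 4 two-dimensional code itself, which also compares each line with the line above and then assigns compact codes to the results.

```python
def run_lengths(scan_line: list[int]) -> list[int]:
    """Return alternating white/black run lengths for one bitonal scan line.

    Pixels are 0 (white) or 1 (black). By the fax convention the first run
    is white; if the line starts with black, the first run length is 0.
    """
    runs = []
    current = 0      # color of the run being counted: 0 = white, 1 = black
    count = 0
    for pixel in scan_line:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current = pixel
            count = 1
    runs.append(count)
    return runs


print(run_lengths([0, 0, 0, 1, 1, 0, 0, 0, 0]))  # [3, 2, 4]
```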
| |
|
|
| CD ROM
(Compact Disk Read Only Memory) |
|
A hard plastic disk measuring
12 centimeters (about 4 3/4 inches) in diameter, which is composed of a very
thin layer of metal sandwiched between plastic layers. The metallic
layer is imprinted with microscopic pits whose lengths and spacing encode
digital data, which can represent images, sounds, and text. A single CD ROM can hold
around 650 megabytes of information, over 450 times the capacity of
a single floppy disk (1.44 megabytes). Even larger storage capacities
are possible with the newer DVD technology (Digital Versatile Disk). |
| |
|
|
| CIE (Commission
Internationale de l'Éclairage) Standard Colorimetric System |
|
International standard for
representing color in three dimensions. In the widely used CIE L*a*b*
(CIELAB) form of the model, the three axes are lightness, red-green, and
yellow-blue. The CIE model represents all colors that can be perceived
by the human eye. Colors that can be reproduced on a color monitor
or in a color photograph make up only a subset of the full visual
range. |
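The sketch below converts an 8-bit monitor color into the CIELAB lightness, red-green, and yellow-blue coordinates. It assumes the color is in the sRGB monitor space with a D65 white point, and it uses the commonly published sRGB-to-XYZ matrix. Other monitor profiles would need different constants.

```python
def srgb_to_lab(r: int, g: int, b: int) -> tuple[float, float, float]:
    """Convert an 8-bit sRGB color to CIE L*a*b* (assuming a D65 white point)."""

    def linearize(c: int) -> float:
        # Undo the sRGB transfer curve so values are proportional to light.
        v = c / 255
        return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear sRGB -> CIE XYZ tristimulus values.
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl

    def f(t: float) -> float:
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    # Normalize by the D65 reference white.
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    lightness = 116 * fy - 16     # 0 (black) to 100 (white)
    a = 500 * (fx - fy)           # negative = green, positive = red
    b_axis = 200 * (fy - fz)      # negative = blue, positive = yellow
    return lightness, a, b_axis
```

White, `srgb_to_lab(255, 255, 255)`, comes out at lightness 100 with both color axes essentially 0. Pure red has a strongly positive red-green value.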
| |
|
|
| Clustering |
|
A technique for retrieving
data based on frequency of use. For example, a text that was accessed
numerous times and had been scanned in full (rather than being viewed
and passed over) would rank higher in a retrieval list than a text
that had only been viewed (rather than fully accessed) or that had rarely
been accessed at all. The idea behind clustering is that the more
useful items in a collection will be more frequently accessed by users. |
| |
|
|
| CMYK (Cyan
Magenta Yellow Black) |
|
A color model based on the
light-absorbing properties of color inks on paper. In theory, the combination
of cyan, magenta, and yellow on paper should produce the color black,
but because of impurities in the inks it produces a brownish color
instead, so black ink (the K) is added to produce true black on paper.
This color model is used in the so-called "four-color" printing process. |
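A naive conversion from monitor RGB values to CMYK ink proportions shows how black takes over the darkness shared by all three colors. Real print workflows use device color profiles instead, so treat this only as a sketch of the relationship.

```python
def rgb_to_cmyk(r: int, g: int, b: int) -> tuple[float, float, float, float]:
    """Naive RGB (0-255) to CMYK (0.0-1.0) conversion with full black replacement."""
    rf, gf, bf = r / 255, g / 255, b / 255
    k = 1 - max(rf, gf, bf)          # black ink covers the darkness common to all channels
    if k == 1:                       # pure black: avoid dividing by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - rf - k) / (1 - k)
    m = (1 - gf - k) / (1 - k)
    y = (1 - bf - k) / (1 - k)
    return c, m, y, k


print(rgb_to_cmyk(255, 0, 0))  # (0.0, 1.0, 1.0, 0.0): red = magenta + yellow ink
```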
| |
|
|
| Collision
Chain |
|
The list of entries in a
hash table whose keys hash to the same value. Because a hash function
represents words with a limited range of numeric values, different words
may be assigned the same value and are said to collide. Colliding entries
can be linked together in a chain attached to that position in the table
and searched one by one; an alternative approach resolves collisions with
a secondary hash function. |
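A minimal hash table that keeps collision chains as lists. The hash function here (sum of character codes) is a deliberately weak, hypothetical choice so that anagrams such as "listen" and "silent" collide and the chain becomes visible.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (illustrative sketch)."""

    def __init__(self, size: int = 8) -> None:
        self.buckets = [[] for _ in range(size)]   # one chain (list) per slot

    def slot(self, key: str) -> int:
        # Weak example hash: anagrams always land in the same slot.
        return sum(ord(ch) for ch in key) % len(self.buckets)

    def put(self, key: str, value: object) -> None:
        chain = self.buckets[self.slot(key)]
        for i, (existing, _) in enumerate(chain):
            if existing == key:
                chain[i] = (key, value)   # update an existing entry
                return
        chain.append((key, value))        # a collision just lengthens the chain

    def get(self, key: str) -> object:
        for existing, value in self.buckets[self.slot(key)]:   # walk the chain
            if existing == key:
                return value
        raise KeyError(key)


table = ChainedHashTable()
table.put("listen", 1)
table.put("silent", 2)      # same slot as "listen": both now share one chain
```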
| |
|
|
| Compression |
|
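A scheme for decreasing the
size of a digital file in order to speed transmission and retrieval or to save on storage
space. Compression may be lossless or lossy, depending on the algorithm
that is used to compress the file. Lossless compression restores the original
data exactly when the file is decompressed; lossy compression discards some
information to achieve smaller files. Where space is not so much an issue,
lossless compression is favored. |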
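Python's standard `zlib` module (DEFLATE compression) gives a quick demonstration of lossless compression: repetitive data shrinks considerably, and decompression returns every original byte. The sample data is made up for the example.

```python
import zlib

original = b"digital library " * 100        # highly repetitive sample data (1600 bytes)
packed = zlib.compress(original)            # lossless DEFLATE compression
restored = zlib.decompress(packed)

print(len(original), len(packed))           # the compressed size is far smaller
print(restored == original)                 # True: lossless, nothing was discarded
```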
| |
|
|
| Controlled
Vocabulary |
|
A collection of terms used
in indexing materials described in a database. For example, the Library
of Congress Subject Headings provide a controlled vocabulary that helps
catalogers describe materials uniformly in a library
catalog. Most online databases use controlled vocabularies to
provide access points to the materials indexed in the databases. For
each item described in the database, an indexer or cataloger
analyzes the content and assigns specific subject terms to
describe the item. Most systems using controlled vocabularies maintain
detailed thesauri that define and cross-reference the terms used in
the vocabulary. |
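Much of a thesaurus's cross-referencing amounts to "use X for Y" mappings from variant words to the single preferred term. The headings below are hypothetical sample entries, not taken from any real subject heading list.

```python
from typing import Optional

# Hypothetical thesaurus: variant (non-preferred) terms point to the preferred heading.
USE_FOR = {
    "automobiles": "Automobiles",
    "cars": "Automobiles",
    "motorcars": "Automobiles",
    "films": "Motion pictures",
    "movies": "Motion pictures",
}


def preferred_term(term: str) -> Optional[str]:
    """Map a searcher's word to the controlled heading, or None if it is not listed."""
    return USE_FOR.get(term.strip().lower())


print(preferred_term("Cars"))  # Automobiles
```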
| |
|
|
| Copyright |
|
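Legal claim to intellectual
property. In the United States, the Copyright Act of 1976 set the term
of copyright for most works created from 1978 on at the life of the author
plus 50 years (generally 75 years from publication for works made for hire).
The Berne Convention, which the U.S. joined in 1989, likewise provides for
the life of the author plus 50 years as the minimum period of copyright. |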
| |
|
|
| Cryptography |
|
The process of encoding information
using a secret algorithm or key. Cryptography is especially important
in electronic communications since it is relatively easy for an outsider
to intercept messages being sent over the Internet or even to impersonate
someone else when sending messages. People wanting to exchange messages
securely can use encryption technology to protect their exchanges. |
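The simplest cipher with a secret key is the one-time pad, sketched below. The message is combined (XOR) with a random key of the same length, and applying the same key again restores it. This is secure only if the key is truly random, kept secret, and never reused, and sharing such keys is impractical for most purposes. The sketch is an illustration of the secret-key idea, not a recommendation.

```python
import secrets


def otp_encrypt(message: bytes) -> tuple[bytes, bytes]:
    """One-time pad: XOR the message with a fresh random key of the same length."""
    key = secrets.token_bytes(len(message))      # must be random and used only once
    ciphertext = bytes(m ^ k for m, k in zip(message, key))
    return ciphertext, key


def otp_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR with the same key again restores the original message."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))


secret, key = otp_encrypt(b"meet at noon")
print(otp_decrypt(secret, key))  # b'meet at noon'
```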
| |
|
|
| Cryptolope
or Secret Envelope |
|
Refers to IBM's encryption
technology designed to assist companies in doing business over the
Internet. A vendor using cryptolopes can set options that either
allow or prevent users from viewing, downloading, copying, printing, or otherwise
accessing materials from its Web site. Further information on cryptolopes
is available directly from IBM. |
| |
|
|
| CSMA/CD
(carrier sense multiple access with collision detection) |
|
Standard for transferring
data across an Ethernet LAN. A device needing access to the network
first listens to check whether the network is idle (carrier sense). If the
network is free, the device transmits; if it is busy, the device waits until
it is free. If two devices transmit at the same moment, each detects the
collision, stops transmitting, and waits a random amount of time before
trying again. |
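The random wait after a collision follows the truncated binary exponential backoff of IEEE 802.3. After the nth collision a station waits a random number of slot times between 0 and 2^min(n, 10) - 1, and it gives up after 16 attempts. The constants below assume classic 10 Mbit/s Ethernet, where one slot is 512 bit times (51.2 microseconds).

```python
import random

SLOT_TIME_US = 51.2   # one slot = 512 bit times at 10 Mbit/s
MAX_ATTEMPTS = 16     # the frame is dropped after 16 failed attempts


def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Truncated binary exponential backoff: slot times to wait after n collisions."""
    if collisions >= MAX_ATTEMPTS:
        raise RuntimeError("too many collisions; frame is dropped")
    exponent = min(collisions, 10)               # the range stops doubling after 10 collisions
    return rng.randint(0, 2 ** exponent - 1)     # random wait of 0 .. 2^k - 1 slots


rng = random.Random()
delay_us = backoff_slots(3, rng) * SLOT_TIME_US  # after a 3rd collision: 0 to 7 slots
```

Doubling the range after each collision spreads repeated retries further apart, so stations that keep colliding become less likely to pick the same moment again.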
| |
|
|
| Database |
|
A collection of related files
that are managed with a database management system. A database may
be text only or may include media of many types (sound, video, graphics,
etc.). |
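For a hands-on picture, Python ships with SQLite, a small database management system. The sketch creates an in-memory database with one table of items of different media types (sample data made up for the example) and queries it.

```python
import sqlite3

# An in-memory database managed by SQLite, a database management system bundled with Python.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, media TEXT)")
conn.executemany(
    "INSERT INTO items (title, media) VALUES (?, ?)",
    [("Oral history interview", "sound"), ("Campus map", "graphic"), ("Annual report", "text")],
)
rows = conn.execute("SELECT title FROM items WHERE media = ?", ("sound",)).fetchall()
print(rows)  # [('Oral history interview',)]
```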
| |
|
|
| Deskewing |
|
Realignment of a scanned image
to return it to its proper orientation. Skewing can result from improper
placement of a page on a scanner or misalignment that can occur when
using a sheet feeding scanner. |
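One simple way to deskew is to measure the angle of a line of text and rotate the page back by the same amount. The sketch assumes the points along one text baseline have already been located (real deskewing software usually detects them from many lines at once). It estimates the angle with a least-squares slope and then rotates the points by the opposite angle.

```python
import math


def estimate_skew_degrees(baseline: list[tuple[float, float]]) -> float:
    """Estimate page skew from points along one text baseline (least-squares slope)."""
    n = len(baseline)
    mean_x = sum(x for x, _ in baseline) / n
    mean_y = sum(y for _, y in baseline) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in baseline)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in baseline)
    return math.degrees(math.atan(sxy / sxx))


def rotate(points: list[tuple[float, float]], degrees: float) -> list[tuple[float, float]]:
    """Rotate points about the origin; a negative angle undoes a positive skew."""
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    return [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in points]


# A baseline tilted by 3 degrees, as if the page were placed crookedly on the scanner.
tilted = [(x, x * math.tan(math.radians(3))) for x in range(0, 500, 50)]
skew = estimate_skew_degrees(tilted)          # about 3.0
straightened = rotate(tilted, -skew)          # y is now (almost exactly) 0 everywhere
```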
| |
|
|
| Digital
Library |
|
Digital libraries are organizations
that provide the resources, including the specialized staff, to select,
structure, offer intellectual access to, interpret, distribute, preserve
the integrity of, and ensure the persistence over time of collections
of digital works so that they are readily and economically available
for use by a defined community or set of communities. (Source: Digital
Library Federation) |
| |
|
|
| Dithering |
|
A process of approximating
a color or shade of gray that is not available in an image's limited palette
by arranging dots of the colors that are available. Dithering is used to
improve the quality of an image whose number of colors has been reduced (for
example, to 256 colors or to black and white): the eye blends the patterns of
dots into intermediate shades. Dithering produces more photorealistic images
without drastically increasing file size. |
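A widely used dithering method is Floyd-Steinberg error diffusion, sketched below for turning grayscale into pure black and white. Each pixel is rounded to the nearest available shade, and the rounding error is passed on to neighboring pixels that have not been processed yet. As a result, areas of mid-gray come out as a mix of black and white dots.

```python
def floyd_steinberg(gray: list[list[float]]) -> list[list[int]]:
    """Dither a grayscale image (0-255) to pure black (0) and white (255)."""
    height, width = len(gray), len(gray[0])
    img = [[float(v) for v in row] for row in gray]   # working copy that accumulates error
    for y in range(height):
        for x in range(width):
            old = img[y][x]
            new = 255 if old >= 128 else 0           # nearest available shade
            img[y][x] = new
            err = old - new
            # Spread the rounding error onto neighbors not yet processed.
            if x + 1 < width:
                img[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    img[y + 1][x - 1] += err * 3 / 16
                img[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    img[y + 1][x + 1] += err * 1 / 16
    return [[int(v) for v in row] for row in img]


mid_gray = [[128] * 8 for _ in range(8)]
pattern = floyd_steinberg(mid_gray)   # a mix of 0s and 255s averaging near 128
```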
| |
|
|
| DLI
(Digital Library Initiative) |
|
Program funded by NSF, NASA, and ARPA
that focused on digitizing and providing access to library collections.
Six universities are the primary sites for the Initiative: University
of California at Berkeley, University of Michigan, University of Illinois,
Stanford University, University of California at Santa Barbara, and
Carnegie Mellon University. |
| |
|
|
| DPI (Dots
Per Inch) |
|
A means of describing printed
text or image quality based on the number of ink dots that are used
to produce a solid image. Standard resolution on paper for black text
from an inkjet printer is 300 dpi. Laser printers and some more expensive
inkjet printers are capable of 600 dpi or higher. The higher the dpi
number, the finer the printed image will appear. |
| |
|
|
| DTD (Document
Type Definition) |
|
File that defines the markup
used in a class of SGML documents. SGML uses tags to identify the parts
of a document; the DTD declares which tags may be used, what attributes
they take, and how they may be nested. |
| |
|
|
| DVD (Digital
Versatile Disk) |
|
High-capacity storage medium
based on and the same size as a Compact Disk. Where a standard CD
can hold up to 650 MB of information, a DVD can hold about 4.7 to 8.5
gigabytes per side (depending on whether it has one or two data layers),
or up to about 17 gigabytes on a double-sided, dual-layer
disk. DVD was developed originally as a medium for storing compressed
digital movies, which were too large to fit on a single CD. |
| |
|
|
| Ethernet |
|
The most commonly used local
area network (LAN) technology. Originally developed at Xerox and later
standardized jointly by Xerox, Digital, and Intel,
Ethernet allows connection of up to 1024 nodes over twisted pair,
coax, or fiber and is established as an IEEE standard (IEEE 802.3). |
| |
|
|
| FTP (File
Transfer Protocol) |
|
An Internet communications
standard that facilitates the sharing of files from one computer to
another. Many Web sites use FTP to offer file downloads to users, and
some repositories of digital texts provide users with the option to
download full text via an FTP site. |
| |
|
|
| Gamma |
|
A numeric value describing
the nonlinear relationship between the input signal and the output brightness
of a device, usually modeled as output = input raised to the power gamma. Most commonly,
gamma is used in relation to monitors, which may display images either
brighter or darker than the original scanned image. Gamma correction
is an integral part of photo editing software and can be used to adjust
scanned images so that they display and print more like the original. |
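Under the power-law model, correcting for a display's gamma means raising each normalized pixel level to 1/gamma before display. The sketch applies that to 8-bit values. The gamma of 2.2 used in the example is a typical monitor figure, assumed here for illustration.

```python
def gamma_correct(value: int, gamma: float) -> int:
    """Gamma-correct an 8-bit level (0-255) for a display with the given gamma.

    The level is scaled to 0.0-1.0, raised to 1/gamma, and scaled back, which
    compensates for a display whose output brightness is roughly input ** gamma.
    """
    return round(255 * (value / 255) ** (1 / gamma))


print(gamma_correct(128, 2.2))  # 186: mid-tones are brightened; 0 and 255 are unchanged
```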
| |
|
|
| GIF (Graphics
Interchange Format) |
|
A standard image representation
format used extensively to make images available over the Internet.
A GIF image can have a maximum of 256 colors or shades, which makes
it useful for preserving the sharpness of bitonal or grayscale images.
GIF is less useful than other formats for preserving the richness
of natural color scenes (converting a photograph to GIF reduces its
millions of colors to a palette of at most 256, although GIF's LZW
compression itself loses no information), but it is widely used for
presenting color images over the Internet. |
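GIF compresses its pixel data with LZW, which builds a dictionary of repeated sequences as it reads and emits one code per sequence. The sketch below is plain LZW on bytes. It leaves out GIF-specific details such as variable code widths, clear codes, and bit packing, so it does not produce actual GIF files.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Basic LZW: replace repeated byte sequences with dictionary codes."""
    table = {bytes([i]): i for i in range(256)}   # start with all single bytes
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate                   # keep extending a known sequence
        else:
            codes.append(table[current])
            table[candidate] = len(table)         # learn the new sequence
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the same dictionary while decoding; the round trip is lossless."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):                  # sequence defined by this very step
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)
```

For example, `lzw_encode(b"TOBEORNOTTOBEORTOBEORNOT")` yields fewer codes than there are input bytes, and `lzw_decode` restores the input exactly.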
| |
|
|
| Hash
Coding |
|
A search technique that eliminates
the need to scan an entire database by computing, from each key word, a
numeric value (a hash) that indicates where the word and a record of its
locations in the database are stored. A search term is converted to its hash
value in the same way, leading directly to the stored entry. Hash coding
provides quicker search times than linear searching. |
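Python's built-in `dict` is itself a hash table, so a word index built on it shows the idea. Each word is hashed straight to its list of (document, position) locations, so no documents need to be scanned at search time. The documents are sample data.

```python
from collections import defaultdict


def build_index(documents: dict[int, str]) -> dict[str, list[tuple[int, int]]]:
    """Map each word to its (document ID, word position) locations.

    A dict hashes each word to find its entry, so a lookup does not scan
    the documents themselves.
    """
    index: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for doc_id, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((doc_id, position))
    return dict(index)


docs = {1: "digital library projects", 2: "library catalog records"}
index = build_index(docs)
print(index["library"])  # [(1, 1), (2, 0)]
```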
| |
|
|
| HTML (Hypertext
Markup Language) |
|
The standard formatting language
used on the World Wide Web. HTML is an application of SGML, defined by its
own SGML DTD, and so looks very similar to other SGML markup; its distinguishing
feature is support for hypertext links, one of the hallmarks of Web page design.
HTML uses tags enclosed in
angle brackets (<>) to determine text characteristics, layout, and
position of images and sound files. |
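The tags can be read by a program as well as a browser. The sketch uses Python's standard `html.parser` module to pick out the targets of hypertext links (`<a href="...">`) from a fragment of HTML. The page fragment is an invented example.

```python
from html.parser import HTMLParser
from typing import Optional


class LinkCollector(HTMLParser):
    """Collect the targets of <a href="..."> hypertext links."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


page = '<p>See the <b>glossary</b> and <a href="terms.html">more terms</a>.</p>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['terms.html']
```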
| |
|
|
| HTTP (Hypertext
Transfer Protocol) |
|
Standard protocol that enables
computer systems to exchange hypertext materials over the Internet.
|
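An HTTP exchange is plain text. The client sends a request line naming the method, the resource, and the protocol version, followed by header lines and a blank line. The sketch only builds that request text for a hypothetical host. Sending it over a network connection and reading the server's reply are left out.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, resource, protocol version
        f"Host: {host}",          # the Host header is required in HTTP/1.1
        "Connection: close",      # ask the server to close the connection after replying
        "",                       # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("www.example.org", "/index.html")
```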
| |
|
|
| Internet |
|
The worldwide telecommunications
backbone that supports communication among computer systems around
the world. The Internet had its roots in the Arpanet in 1969 but has
since grown into a vast communications network utilized not only by
government, but by educational institutions, private organizations,
companies, and individuals. |
| |
|
|
| Internet
Browser |
|
Software that allows a user
to view materials on the Internet. The three best known browsers are
Netscape Navigator, Microsoft Internet Explorer, and NCSA Mosaic.
|
| |
|
|
| IP (Internet
Protocol) Address |
|
Typically used to refer to
the numeric address of a computer connected to the Internet. IP addresses
can be static (permanently assigned) or dynamic (assigned temporarily
to enable computer-to-computer communications) and consist of four numbers,
each from 0 to 255, separated by periods. For example, the IP address 139.62.208.133
might designate an individual workstation at a specific organization.
The leading numbers identify the network at a particular site, while the
remaining numbers (here, perhaps only the final 133) locate an individual
machine attached to the network at that site. Where the division between
the two parts falls depends on how the network is configured (its address
class or subnet mask). |
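Python's standard `ipaddress` module can split an address into its network and host parts once the division point is known. The sketch assumes, purely for illustration, that the example address belongs to a network with a 24-bit prefix (/24). That means the first three numbers identify the network and the last one identifies the machine.

```python
import ipaddress

# Assumption for illustration: the site's network uses a 24-bit prefix (/24).
iface = ipaddress.ip_interface("139.62.208.133/24")

network = iface.network                                  # 139.62.208.0/24
host_part = int(iface.ip) & int(iface.network.hostmask)  # 133
print(network, host_part)
```

With a different prefix length, for example /16, the same address would divide differently between network and host.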
| |
|
|
| Java |
|
Platform-independent programming
language created by Sun Microsystems.
Java is similar in structure to C++. Java applets can be embedded in HTML
documents and enable Web users to run programs on their computers
regardless of their operating system (Windows, Mac OS, UNIX, etc.).
Both Netscape Navigator and Microsoft Internet Explorer browsers have
built-in support for Java. |
| |
|
|
| JPEG (Joint
Photographic Experts Group) |
|
A standard image representation
format used extensively to make images available over the Internet.
A JPEG image preserves natural color more efficiently than the GIF
format while still providing image compression sufficient to make
it useful as a means of representing images in a computer system.
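JPEG compression is lossy: it divides an image into 8-by-8 pixel blocks,
converts each block into frequency components, and discards fine detail and
color information to a degree set by the amount of compression desired.
JPEGs support RGB, CMYK, and grayscale color modes. |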
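The core of JPEG's lossy step can be sketched in one dimension. The discrete cosine transform (DCT) turns a row of 8 samples into 8 frequency coefficients, and each coefficient is then rounded to a multiple of a quantization step. Coarser steps discard more detail. Real JPEG applies a two-dimensional DCT to 8-by-8 blocks and uses a different step for each frequency. The single uniform step here is a simplifying assumption.

```python
import math

N = 8  # JPEG works on 8 x 8 blocks; this sketch transforms one row of 8 samples


def scale(k: int) -> float:
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(samples: list[float]) -> list[float]:
    """Orthonormal DCT-II: turn 8 samples into 8 frequency coefficients."""
    return [
        scale(k) * sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n, x in enumerate(samples))
        for k in range(N)
    ]


def idct(coeffs: list[float]) -> list[float]:
    """Inverse transform (DCT-III) back to 8 samples."""
    return [
        sum(scale(k) * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for k, c in enumerate(coeffs))
        for n in range(N)
    ]


def quantize(coeffs: list[float], step: float) -> list[float]:
    """The lossy step: round each coefficient to the nearest multiple of step."""
    return [round(c / step) * step for c in coeffs]


row = [52, 55, 61, 66, 70, 61, 64, 73]      # one row of 8-bit gray levels
approx = idct(quantize(dct(row), 10))       # close to row, but not identical
```

Because this DCT is orthonormal, the reconstruction error can never exceed the rounding error introduced in the coefficients.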
| |
|
|
| Keyword
Searching |
|
Search capability that allows
querying a database by any chosen words. Keyword searching can accommodate
Boolean operators and phrase and proximity searching, and it
allows inexperienced searchers to find content within a database
without knowing its structure or subject organization. |
| |
|
|
| Linear
Searching |
|
Accessing a data file by starting
at the beginning and going through the entire file looking for a string
of data. On large files, linear searching is slower than indexed or binary
searching because every item may have to be examined, but it requires no
sorting or index and can look for multiple strings of data in a single pass. |
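A sketch of a single linear pass that looks for several strings at once. Every line is examined in order, and no index or sorting is needed. The lines and search strings are sample data.

```python
def linear_search(lines: list[str], targets: list[str]) -> dict[str, list[int]]:
    """Scan every line once, recording which lines contain each target string."""
    hits: dict[str, list[int]] = {t: [] for t in targets}
    for number, line in enumerate(lines):
        for target in targets:          # several strings checked in the same pass
            if target in line:
                hits[target].append(number)
    return hits


text = ["digital library", "music archive", "library of music"]
print(linear_search(text, ["library", "music"]))  # {'library': [0, 2], 'music': [1, 2]}
```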
| |
|
|
| LSI (Latent
Semantic Indexing) |
|
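Technique for condensing vector
space into fewer dimensions. LSI applies singular value decomposition to the
matrix recording which words occur in which documents, so that documents can
be matched on underlying concepts even when they do not share the exact search
words. LSI also facilitates cross-language retrieval of documents. |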
| |
|
|
| MARC (Machine-Readable
Cataloging) |
|
Designed at the Library of
Congress, MARC is a format for describing bibliographic items. In
a MARC record, each field begins with a three-digit tag indicating
the field's content, followed by the content to be displayed. For example,
the 245 tag indicates the title of an item, while the 100 tag indicates
the author. MARC records are the foundation for library OPACs (Online
Public Access Catalogs) and facilitate describing, searching, and
displaying information about bibliographic items. |
| |
|
|
| MPEG (Motion
Picture Experts Group) |
|
A standard for compressing
and storing moving images digitally. MPEG-1 is the most common standard
and is useful for videoconferencing. MPEG-2 is more commonly used
for storing motion pictures on disk. |
| |
|
|
| OCR (Optical
Character Recognition) |
|
Conversion of printed text
into digitally encoded text. A page of print can be scanned using
a scanner and is then converted into a digital representation character
by character. OCR-converted texts typically require editing, since
character recognition accuracy can vary depending on the quality of
the scanned text. Generally speaking, sans serif fonts are more accurately
converted during the OCR process than are serif fonts. The darkness
of the original typeface can also affect OCR accuracy. |
| |
|
|
| PCL (Printer
Control Language) |
|
Developed by Hewlett-Packard,
PCL is a language designed specifically for printing. PCL is not as
portable across printer models as PostScript but it does support faster
printing of images than the PostScript language. |
| |
|
|
| PDF (Portable
Document Format) |
|
A display format developed
by Adobe Systems based on the PostScript printing language which displays
an exact image of an original document with all its layout and type
characteristics intact. PDF files can be created using Adobe's Acrobat
or PageMaker software and can be read using the free Acrobat Reader,
which can be used as a plug-in for Netscape Communicator and Microsoft
Internet Explorer to share PDF files over the Internet. |
| |
|
|
| Pixels Per
Inch |
|
A measure of the resolution
of a graphic image: the number of pixels in each inch of the image. A pixel
is a single small square of color information
that can be displayed on a computer monitor or can be printed on a
printer. The higher the pixels per inch, the finer the printed image
will be. Images stored solely for viewing on a computer monitor need
not have a high number of pixels, since a typical monitor displays
at a resolution of about 72 pixels per inch. |
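The number of pixels an image needs is simply its printed size in inches multiplied by the pixels per inch. The letter-size page below is just an example.

```python
def pixel_dimensions(width_in: float, height_in: float, ppi: int) -> tuple[int, int]:
    """Pixels needed to cover a given print size at a given resolution."""
    return round(width_in * ppi), round(height_in * ppi)


print(pixel_dimensions(8.5, 11, 300))  # (2550, 3300): a letter-size page at 300 ppi
print(pixel_dimensions(8.5, 11, 72))   # (612, 792): the same page at monitor resolution
```

The 300 ppi version has about 17 times as many pixels (2550 x 3300 versus 612 x 792), which is why images intended only for screen viewing can be kept much smaller.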
| |
|
|
| PostScript |
|
A complex programming language
developed by Adobe Systems for generating graphic images that may
include multiple fonts, colors, and bitmapped images. Although PostScript
is usually thought of primarily in terms of printing, it is a full
programming language having its own set of commands that facilitate
page layout as well as printing. PostScript is widely accepted as
a standard language for interfacing with a variety of printers. |
| |
|
|
| Precision |
|
The proportion of the documents
retrieved by a search of a database that are relevant to the user's query.
A search that retrieves 30 documents, 20 of which are considered relevant,
has precision of 66.7% (20/30). Precision is used along with recall
to measure success of an information retrieval system. |
| |
|
|
| Public
Key/Private Key |
|
One of the two main encryption
methods used for exchanging information over the Internet (the other is
secret-key encryption, in which sender and recipient share a single key).
Each recipient of a message has both a public and a private encryption key.
A sender uses the recipient's public key to encrypt a message. The recipient
uses his or her private key to decrypt it. |
| |
|
|
| RealAudio/Video |
|
Designed by RealNetworks,
RealAudio and RealVideo are widely used formats
for streaming audio and video over the Internet. Instead
of waiting for a sound or video file to download, RealPlayer allows
the file to start playing as soon as a user initiates reception. Over
a fast Internet connection, RealAudio/Video appears to come across
in real time. |
| |
|
|
| Recall |
|
The extent to which a search
of a database retrieves all of the relevant documents it contains. A search
of a database that contains 40 relevant documents and retrieves 30 of them
is said to have 75% recall (30/40). Recall is used along
with precision to measure success of an information
retrieval system. |
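Both measures are simple ratios. The sketch below computes them for the figures used in the Precision and Recall entries. In the recall example it is assumed, as stated there, that all 30 retrieved documents are relevant.

```python
def precision(relevant_retrieved: int, retrieved: int) -> float:
    """Share of the retrieved documents that are relevant."""
    return relevant_retrieved / retrieved


def recall(relevant_retrieved: int, relevant_total: int) -> float:
    """Share of all relevant documents in the database that were retrieved."""
    return relevant_retrieved / relevant_total


print(f"{precision(20, 30):.1%}")  # 66.7%: 20 of 30 retrieved documents are relevant
print(f"{recall(30, 40):.0%}")     # 75%: 30 of the 40 relevant documents were found
```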
| |
|
|
| RGB (Red
Green Blue) |
|
A color model that uses three
colors (red, green, and blue) to reproduce up to 16.7 million colors
on a computer monitor. The RGB model is used by all computer monitors
to reproduce images on the screen. In the RGB model, each pixel that
composes an image is assigned three intensity values, one for each color,
ranging from 0 to 255; 0 for all three produces black, and 255 for all
three produces white. |
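The figure of 16.7 million comes from 256 possible levels for each of the three colors: 256 x 256 x 256 = 16,777,216. The sketch packs the three levels into one 24-bit number, the form familiar from hexadecimal color codes such as 0xFF0000 for red.

```python
def pack_rgb(r: int, g: int, b: int) -> int:
    """Pack three 0-255 intensities into one 24-bit color value (0xRRGGBB)."""
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError("each intensity must be between 0 and 255")
    return (r << 16) | (g << 8) | b


print(256 ** 3)                       # 16777216 possible colors
print(hex(pack_rgb(255, 255, 255)))   # 0xffffff: white
print(hex(pack_rgb(255, 0, 0)))       # 0xff0000: red
```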
| |
|
|
| Scanner |
|
An optical device that can
be interfaced with a computer to allow for importing a digital image
of a physical object into a computer program. Scanners can be hand-held,
flat-bed, or sheet-fed. On a flat-bed scanner, the object to be scanned
is placed against a glass panel (usually page size) and is scanned
by a scanning head consisting of a linear array of light-sensitive
sensors. The image is illuminated by a high-intensity light and the
reflected light is picked up by the sensors and transmitted as digital
codes to photo-editing or OCR software. |
| |
|
|
| SGML (Standard
Generalized Markup Language) |
|
A syntax for marking up a
document to have certain layout, text display, and image display.
SGML embeds tags describing certain aspects of a document in angle
brackets (<>) and surrounds text elements with tags that identify them;
the software displaying the document decides how each tagged element
will appear. For example, a phrase to be shown in bold might be enclosed
inside a pair of start and end tags such as <b> and </b>, and the resulting
display will show the enclosed text in bold. SGML provides standards for
laying out tables, displaying graphics, and for producing special
characters like the pound sign, the ampersand, and the copyright symbol.
Unlike PDF documents, SGML documents can be rearranged according to
user specifications, producing a document display that may be larger
or smaller than the original. This provides considerable flexibility
from one machine to another. The downside of this is that an author
wanting control over the appearance of his or her document essentially
loses control to the reader. Current SGML standards include the Association
of American Publishers' Electronic Manuscript Standard, the Department
of Defense Continuous Acquisition and Life-Cycle Support rules (CALS),
and the Text Encoding Initiative standard (TEI). |
| |
|
|
| Speech
Compression |
|
Use of algorithms to sample
voice as it is recorded and reduce the size of the resulting digital
file. The most common standard for compressing speech is the GSM
algorithm used for digital cellular phones. GSM can compress speech
by a factor of about 5, producing files that occupy approximately
1600 bytes per second of recorded information. |
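The figures follow from simple arithmetic. The sketch assumes telephone-quality input of 8,000 samples per second at 8 bits per sample, which is an assumption about the uncompressed source. It also uses the 13,000 bits per second rate of the GSM full-rate speech codec.

```python
SAMPLE_RATE = 8000            # samples per second (telephone-quality speech)
BITS_PER_SAMPLE = 8           # assumption: 8-bit samples before compression
GSM_BITS_PER_SECOND = 13_000  # GSM full-rate speech codec

uncompressed = SAMPLE_RATE * BITS_PER_SAMPLE // 8   # 8000 bytes per second
compressed = GSM_BITS_PER_SECOND // 8               # 1625 bytes per second
ratio = uncompressed / compressed                   # about 4.9
print(uncompressed, compressed, round(ratio, 1))    # 8000 1625 4.9
```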
| |
|
|
| TCP/IP
(Transmission Control Protocol/Internet Protocol) |
|
The set of protocols that
allow different computer systems to exchange information over the
Internet. TCP/IP is platform and machine independent and serves as
the standard communication language for all computer systems tied
into the Internet. |
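Programs usually reach TCP/IP through the operating system's socket interface. The sketch opens a TCP connection over IPv4 to a small echo server running on the same machine, on the loopback address 127.0.0.1. It sends some bytes and reads them back. The function name `echo_once` is illustrative, and real Internet traffic would connect to a remote address instead.

```python
import socket
import threading


def echo_once(message: bytes) -> bytes:
    """Send bytes over a TCP/IP connection on this machine and read the echo back."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # IPv4 + TCP
    server.bind(("127.0.0.1", 0))       # loopback address; port 0 = any free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve() -> None:
        conn, _ = server.accept()
        with conn:
            while chunk := conn.recv(1024):
                conn.sendall(chunk)     # echo the bytes back unchanged

    worker = threading.Thread(target=serve)
    worker.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)  # tell the server nothing more is coming
        chunks = []
        while chunk := client.recv(1024):
            chunks.append(chunk)
    worker.join()
    server.close()
    return b"".join(chunks)
```

For example, `echo_once(b"hello")` returns `b"hello"`. TCP guarantees that the bytes arrive complete and in order, and IP carries them between the two addresses.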
| |
|
|
| Telnet |
|
Standard protocol for logging
onto another computer system over the Internet and running programs
from a remote terminal or workstation. Telnet was developed as part
of the ARPAnet project. |
| |
|
|
| TIFF (Tagged
Image File Format) |
|
A format used to exchange
files between computer applications and platforms (Windows, Mac, etc.).
The TIFF format is shared by almost all photo-editing and paint programs
and is supported by most scanner software as an option for saving
scanned images. TIFFs are bitmapped images that support RGB and CMYK
color schemes as well as grayscale. |
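Part of what makes TIFF portable across platforms is its 8-byte header. It begins with "II" (Intel, little-endian) or "MM" (Motorola, big-endian) to state the byte order, followed by the number 42 and the offset of the first image file directory, where the tags describing the image are stored. The sketch reads only that header, using the standard `struct` module.

```python
import struct


def read_tiff_header(data: bytes) -> tuple[str, int]:
    """Return (byte order, offset of the first image file directory) from a TIFF header."""
    if data[:2] == b"II":
        order, prefix = "little-endian", "<"    # "II" = Intel byte order
    elif data[:2] == b"MM":
        order, prefix = "big-endian", ">"       # "MM" = Motorola byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(prefix + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file")
    return order, ifd_offset


print(read_tiff_header(b"II*\x00\x08\x00\x00\x00"))  # ('little-endian', 8)
```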
| |
|
|
| URL (Uniform
Resource Locator) |
|
The digital "address"
of a resource available over the Internet. URLs are prefaced with
the name of the protocol used to access a resource. For example, a
hypertext source is prefaced with the http:// protocol designator
while a telnet system is prefaced with the protocol designator telnet://.
The host name in the address may be a combination of letters and numbers
but is translated into a numeric IP address by a domain name server, a
computer system assigned to resolve addresses on the Internet. |
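Python's standard `urllib.parse` module splits a URL into the parts described above: the protocol, the host name that a domain name server translates to an IP address, and the path to the resource. The address below is a hypothetical example.

```python
from urllib.parse import urlsplit

# Hypothetical example address.
parts = urlsplit("http://www.example.org/glossary/terms.html")
print(parts.scheme)   # http: the protocol used to reach the resource
print(parts.netloc)   # www.example.org: the host name resolved to an IP address
print(parts.path)     # /glossary/terms.html: the resource on that host
```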
| |
|
|
| Vector
Graphic |
|
An image format that uses
geometrical shapes like lines and curves to define images. Vector
graphics are frequently used to define typefaces or high-definition
shapes where the smoothness of the image at all resolution settings
is important. Unlike bitmap images, which can lose their definition
and look blocky when enlarged, vector graphics retain their
definition at any size or resolution. When displayed on a computer monitor,
both bitmaps and vector graphics are displayed in pixels. |
| |
|
|
| Vector
Model |
|
Information retrieval model
introduced by Gerard Salton that represents each document as a vector
of word weights based on word frequency within the document: the more
times a word occurs within a document, the greater that word's weight
in the document's vector. A query is represented the same way, and a search
of the database produces an ordered listing of documents whose vectors most
closely match the query vector. |
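A minimal sketch of vector-model ranking. It assumes raw word counts as the weights and the cosine of the angle between vectors as the measure of closeness, which is a common choice. Practical systems, including Salton's, often use refined weights such as tf-idf. The documents are sample data.

```python
import math
from collections import Counter


def vector(text: str) -> Counter:
    """Term-frequency vector: how many times each word occurs."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term vectors (1.0 = same direction)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(documents: dict[str, str], query: str) -> list[tuple[str, float]]:
    """Order documents by how closely their vectors match the query vector."""
    q = vector(query)
    scores = [(name, cosine(vector(text), q)) for name, text in documents.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = {
    "a": "digital library digital archive",
    "b": "library hours",
    "c": "music scores",
}
print([name for name, _ in rank(docs, "digital library")])  # ['a', 'b', 'c']
```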
| |
|
|
| WAV |
|
Sound file format standard
in the Windows operating system. WAV files usually hold uncompressed audio,
so they use quite a bit of disk space and aren't the most efficient means
for transmitting sound digitally. |
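The size of an uncompressed WAV file follows directly from its sample rate, number of channels, and bytes per sample. The sketch uses Python's standard `wave` module to build one second of silent CD-quality stereo audio in memory (44,100 samples per second, 2 channels, 16-bit samples). The audio data alone amounts to 176,400 bytes, before the short file header is added.

```python
import io
import wave


def silent_wav(seconds: int, rate: int = 44100, channels: int = 2, sample_bytes: int = 2) -> bytes:
    """Build an uncompressed PCM WAV file of silence in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_bytes)      # bytes per sample (2 = 16-bit)
        wav.setframerate(rate)              # samples per second per channel
        wav.writeframes(b"\x00" * (seconds * rate * channels * sample_bytes))
    return buffer.getvalue()


data = silent_wav(1)
with wave.open(io.BytesIO(data), "rb") as check:
    print(check.getnframes())   # 44100 frames: one second of audio
```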
| |
|
|
| World Wide
Web (WWW) |
|
The graphics-rich, sound- and
video-enabled, hyperlinked part of the Internet that many people mistakenly
treat as synonymous with the Internet itself. The communication protocol
that supports the Web is called Hypertext Transfer Protocol (HTTP).
Anyone with access to the Internet and who is running a Web browser
on his or her workstation can access materials on the Web and get
the full benefit of multimedia content and hypertext links to other
resources. The full Internet includes other types of systems including
FTP sites, Gopher servers, and telnet systems, to name a few. |
| |
|
|
| WYSIWYG
(What You See Is What You Get) |
|
An approach to laying out documents
in which the image displayed on the screen is the same as what a user
will ultimately print. Pioneered at Xerox, this approach to layout
and printing is now used by most word processing programs. |
| |
Updated 7 December 1998.