Page updated
April 2003.

    A Virtual Libraries Vocabulary A to Z


Following are some of the most frequently used terms for discussing the Internet and virtual libraries. This is admittedly a limited selection of terminology.

AACR2 (Anglo-American Cataloguing Rules, Second Edition) – A set of standards for describing materials for access via a library catalog or database.
access management – Means by which a system administrator can control who retrieves materials over a computer network and under what circumstances. Although materials available on the Web typically are not restricted, databases carry access restrictions.
Acrobat – Software developed by Adobe Systems to facilitate cross-platform presentation of text and graphics. Acrobat generates documents in PDF (portable document format).
administrative metadata – Such information as rights, permissions, and access data as it applies to a specific data object.
aggregator – A service that gathers publications from a number of publishers and offers the packaged data to users as a single service.
antialiasing – A technique for smoothing bitonal graphic images by using grayscale pixels to provide a gradual transition from the black to the white portions of an image. Without antialiasing, a diagonal line would appear choppy instead of smooth. This is especially useful in text scans, where letters may be composed not only of vertical and horizontal strokes but also of curves and diagonal lines of varying degrees. The overall effect is a smooth outline for all characters.
applet – A small program that can be served by a network computer and run on a computer workstation. Typically applets are thought of as Java programs that are cross-platform and can be initialized by code strings embedded in a Web page.
ARPANET – Developed in 1969 by Larry Roberts and supported by the United States Department of Defense, the ARPANET can be considered the beginning of today's Internet. At the time, the network was developed to facilitate high-speed communication between institutions doing military research.
artificial intelligence – A long pursued goal of computing system designers, a system capable of artificial intelligence would actually be able to approximate human thought processes, understand spoken commands, and respond appropriately. Speech synthesis has already progressed considerably, but the goal of producing a truly intelligent system has still not been achieved.
ASCII (American Standard Code for Information Interchange) – A text representation standard that stores one alphabetic, numeric, or other character in one computer byte. This is the most common means for representing characters and allows plain text to be shared among multiple applications. Saving a document in ASCII format in a word processor like WordPerfect means that the same document can be used in a different word processor without having to be converted. ASCII does not, however, allow for page layout as word processing and page layout programs do. A document originally created with bold and italic text and with special formatting (tables, indentations, etc.) will lose all special formatting and text characteristics when saved to ASCII format.
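The one-character-per-byte idea can be illustrated with a short Python sketch. The sample string and the helper name to_plain_ascii are assumptions made for this example, and dropping non-ASCII characters is only one possible way to handle them.

```python
# Each ASCII character has a numeric code below 128, so it fits in one byte.
text = "Library"
encoded = text.encode("ascii")   # a bytes object, one byte per character
codes = list(encoded)            # the numeric code stored in each byte

def to_plain_ascii(s):
    """Drop any character outside the ASCII range (a lossy, illustrative choice)."""
    return s.encode("ascii", errors="ignore").decode("ascii")
```

Formatting such as bold or italics has no representation in these codes, which is why it is lost when a document is saved as plain ASCII.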
asymmetric cryptography – Use of a pair of different keys to encrypt and decrypt messages. A sender uses the recipient's public key to encrypt a message, and only the recipient's private key can decode it. Because the key that encodes a message cannot decode it, the transaction is one-way, or asymmetric.
authentication – Means by which a computer workstation or computer user establishes a connection to another workstation or to an online system. Authentication may be via a set protocol or via username and password. Many database vendors now offer IP authentication, which works by having a user workstation present its IP address to the database server for verification against valid user IPs. If the IP results in a match in the vendor's authorized user base, the user is allowed access to the database. Otherwise, access is blocked.
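As a rough illustration of the IP authentication check described above, the following Python sketch compares a client address against a list of authorized ranges. The ranges, the list name AUTHORIZED_NETWORKS, and the function name are hypothetical; a real vendor system would be considerably more involved.

```python
import ipaddress

# Hypothetical address ranges a vendor might have on file for one subscriber.
AUTHORIZED_NETWORKS = [
    ipaddress.ip_network("139.62.0.0/16"),
    ipaddress.ip_network("192.0.2.0/24"),
]

def is_authorized(client_ip):
    """Return True if the client's IP address falls inside an authorized range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in AUTHORIZED_NETWORKS)
```

Calling is_authorized("139.62.208.133") returns True with these sample ranges, while an address outside both ranges is refused.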
automatic speech recognition – The ability of a computer system to recognize and interpret human speech by use of software designed to analyze sounds and interpret them according to a programmed vocabulary. The applications of such a system range from enabling a user to command a computer to perform certain tasks (shut down, open file, etc.) to enabling hands-off dictation of entire texts to a computer system to identifying individuals by matching voice and pronunciation to known patterns.
binary search – A means of quickly searching a sorted, sequential data list. A search for a match begins in the middle of the list and continues halving the remaining portion until a match is found or until the portion is small enough to be searched sequentially.
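A minimal Python sketch of the halving procedure follows. It assumes the list is already sorted and, for simplicity, keeps halving until the range is empty rather than switching to a sequential scan. The sample terms are illustrative only.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2          # look at the middle of the remaining range
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1                # discard the lower half
        else:
            high = mid - 1               # discard the upper half
    return -1

terms = ["applet", "cache", "firewall", "gopher", "proxy"]
```

Each pass discards half of the remaining entries, so even a very long list needs only a handful of comparisons.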
bitmap – Also called a raster image, a bitmap uses a grid of pixels to reproduce an image. Each pixel is a small square that is assigned a color value and location within the bitmap grid. The combination of pixels can produce anything from a low resolution bitonal image to a high resolution full color image of photographic quality. Most photo-editing software uses the bitmap format as a standard for editing images. Instead of editing an entire image, a photo-editor can edit each individual pixel to produce desired results.
Bitnet (Because It's There Net) – Network developed by the City University of New York and Yale University for facilitating data transfer between the institutions. The Bitnet helped lay groundwork for the Internet and served as a communications standard between universities for years.
Boolean operators – Logical operators used to facilitate database searching. AND, OR, and NOT are commonly used Boolean operators. Boolean logic is based on the work of British mathematician George Boole, whose work in algebra established the logical principles of set theory.
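The three operators map naturally onto set operations. The Python sketch below assumes a hypothetical index in which each search term points to the set of record numbers containing it; the terms and numbers are made up for illustration.

```python
# Hypothetical index: each term maps to the set of record numbers that contain it.
index = {
    "internet": {1, 2, 3, 5},
    "library": {2, 3, 4},
    "copyright": {3, 6},
}

both = index["internet"] & index["library"]        # internet AND library
either = index["internet"] | index["library"]      # internet OR library
without = index["internet"] - index["copyright"]   # internet NOT copyright
```

AND narrows a search, OR broadens it, and NOT excludes records from the result set.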
browser – Software that allows a user to view materials on the Internet. The three best known browsers are Netscape Navigator, Microsoft Internet Explorer, and NCSA Mosaic. Although typically thought of as an Internet only program, browsers can also be used to provide access to local files and to access information offline.
browsing – Looking through a collection of materials without a particular goal. People often browse the Internet looking for nothing in particular so much as just for something interesting. Browsing physical library collections is also a common user approach. When a user is familiar with the arrangement of a collection (history in one area, literature in another, etc.), browsing can be a productive means of finding specific materials.
cache – A temporary copy of a set of data that is stored on a computer workstation to speed up access times.
catalog – Collection of bibliographic records created according to strict rules.
CCITT Group IV Fax Compression – An international standard for facsimile transmission developed by the International Telegraph and Telephone Consultative Committee (CCITT), now part of the International Telecommunication Union (ITU). The standard supports compression for more efficient image transmission.
CD ROM – (Compact Disk Read Only Memory) A hard plastic disk measuring 12 centimeters (about 4 3/4 inches) in diameter, composed of a very thin layer of metal sandwiched between plastic layers. The metallic layer is imprinted with microscopic pits of varying lengths that can be interpreted as images, sounds, and text. A single CD ROM can hold around 650 megabytes of information, over 450 times the capacity of a single floppy disk (1.44 megabytes). Even larger storage capacities are possible with the newer DVD technology (Digital Versatile Disk).
CGI (Common Gateway Interface) – A programming interface that allows a browser to access information services other than Web pages on a network server. One of the most common uses of CGI is to process data submitted via HTML forms in a Web page. The CGI script processes the user input and transmits the formatted data to the server for interpretation/action.
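The form-processing step can be sketched in Python as follows. This is only the parsing portion, not a complete CGI program: a real script would typically read the submitted data from the QUERY_STRING environment variable or from standard input. The field names author and title and the function name handle_form are hypothetical.

```python
from urllib.parse import parse_qs

def handle_form(query_string):
    """Decode submitted form data and build a response message."""
    fields = parse_qs(query_string)            # e.g. {"author": ["Boole"], ...}
    author = fields.get("author", [""])[0]     # first value, or empty if missing
    title = fields.get("title", [""])[0]
    return f"Searching for author={author!r}, title={title!r}"
```

A form submission such as author=Boole&title=Laws+of+Thought would be decoded into its two fields before being passed along for action.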
CIE (Commission Internationale de l'Eclairage) Standard Colorimetric System – International standard for representing color in three dimensions: lightness, red-green, and yellow-blue. The CIE model represents all colors that can be perceived by the human eye. Colors that can be reproduced on a color monitor or in a color photograph comprise only a subset of the full visual range.
classification – An organizing scheme used for grouping materials. Libraries typically organize materials by subjects. Similarly, databases will have classification schemes that typically assign subject headings or descriptors to articles or materials included.
client – In the server/client model of computing, the "subservient" machine that depends on the server for its information. On an intranet, the client might be the user workstation, which can access programs and data from the organization's file server. On the Internet, a user's workstation is the client to the many Web servers that offer digitized information.
clustering – A technique for retrieving data based on frequency of use. For example, a text that was accessed numerous times and had been scanned in full (rather than being viewed and passed over) would rank higher in a retrieval list than a text that had only been viewed (rather than fully accessed) or that had rarely been accessed at all. The idea behind clustering is that the more useful items in a collection will be more frequently accessed by users.
CMYK (Cyan Magenta Yellow Black) – A color model based on the light-absorbing properties of color inks on paper. In theory, the combination of cyan, magenta, and yellow on paper should produce the color black. Because of impurities in ink, the combination instead produces a brownish color, so the three must be combined with black ink to produce true black on paper. This color model is used in the so-called "four-color" printing process.
collision chain – A collection of entries in a hash table whose keys produce the same hash value. Because words are represented by numeric values, several words may be assigned the same value and are said to collide. Usually a secondary hashing scheme is used to sort through collision chains within a table.
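The Python sketch below shows how a collision chain arises. It uses a deliberately weak toy hash (the sum of character codes) so that collisions are easy to produce, and it walks each chain sequentially instead of applying a secondary hashing scheme. The class and method names are assumptions made for this example.

```python
class ChainedHashTable:
    """Toy hash table in which each slot holds a chain of colliding entries."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _slot(self, word):
        # Toy hash: sum of character codes, so anagrams always collide.
        return sum(ord(ch) for ch in word) % len(self.buckets)

    def put(self, word, value):
        chain = self.buckets[self._slot(word)]
        for i, (key, _) in enumerate(chain):
            if key == word:              # word already stored: replace its value
                chain[i] = (word, value)
                return
        chain.append((word, value))      # new word: extend the collision chain

    def get(self, word):
        for key, value in self.buckets[self._slot(word)]:
            if key == word:
                return value
        return None
```

Storing "listen" and "silent" in this table places both in the same slot, forming a chain of two entries that must be searched to tell them apart.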
compression – A scheme for decreasing the size of a digital file in order to speed retrieval or to save on storage space. Compression may be lossless (the original file can be restored exactly) or lossy (some information is discarded to achieve a smaller file), depending on the algorithm that is used to compress the file. Where space is not so much an issue, lossless compression is favored.
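Lossless compression can be demonstrated with Python's standard zlib module. The sample data is an assumption chosen because it is highly repetitive, and real files will compress by varying amounts.

```python
import zlib

original = b"virtual library " * 200        # repetitive sample data
compressed = zlib.compress(original)        # smaller representation
restored = zlib.decompress(compressed)      # byte-for-byte copy of the original
```

Because the scheme is lossless, restored is identical to original. A lossy scheme such as JPEG would not offer that guarantee.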
controlled vocabulary – A collection of terms used in indexing materials described in a database. For example, the Library of Congress Cataloging System uses a controlled vocabulary to assist catalogers in uniformly describing materials to be included in a library catalog. Most online databases utilize controlled vocabularies to provide access points to the materials indexed in the databases. For each item described in the database, an indexer/cataloger further analyzes the content and assigns specific subject terms to further describe the item. Most systems using controlled vocabularies maintain detailed thesauri that define and cross-reference the terms used in the vocabulary.
copyright – Legal claim to intellectual property. In the United States, the 1976 revision to the copyright law provided a term of the life of the author plus 50 years (75 years for works made for hire). The Berne Convention, which the U.S. joined in 1989, provides for the life of the author plus 50 years as the minimum period of copyright.
cryptography – The process of encoding information using a secret algorithm or key. Cryptography is especially important in electronic communications since it is relatively easy for an outsider to intercept messages being sent over the Internet or even to impersonate someone else when sending messages. People wanting to exchange messages securely can use encryption technology to secure their exchanges.
cryptolope or Secret Envelope – Refers to IBM's encryption technology designed to assist companies in doing business over the Internet. The vendor using cryptolopes can set options which either allow or don't allow users to view, download, copy, print, or otherwise access materials from its Web site. Further information on cryptolopes is available directly from IBM.
CSMA/CD (carrier sense multiple access with collision detection) – Standard for transferring data across an Ethernet LAN. Devices needing access to the network first listen for traffic on the line. If the network is free, the device transmits. If two devices transmit at the same time, both detect the collision, stop, and wait a random amount of time before trying again.
CSS (Cascading Style Sheets) – A W3C standard for describing document formatting elements in an HTML document. The strength of CSS is that numerous pages can be changed on the fly just by editing the CSS style document that controls page formatting.
DARPA (Defense Advanced Research Projects Agency) – Formerly ARPA, this federal government agency has traditionally been a major sponsor of computer science research in the United States. One of the best-known outgrowths of DARPA's funding was the ARPANET, which helped to form the backbone of today's Internet.
data – Information elements that can be gathered and organized and presented in print or via electronic media.
database – A collection of information elements that can be managed with a database management system. A database may be text only or may include media of many types (sound, video, graphics, etc.).
descriptive metadata – In library catalogs or indexes, the bibliographic information for a data object.
descriptor – A subject term assigned from a controlled vocabulary to describe a data object. Most article databases assign multiple descriptors to each article indexed or reproduced in the database in order to provide searchers with subject access to the collections of data.
digital library – Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities. (Source: Digital Library Federation)
Digital Libraries Initiative – Program funded by NSF, NASA, and ARPA and focused on digitizing and providing access to library collections. Six universities are the primary sites for the Initiative: University of California at Berkeley, University of Michigan, University of Illinois, Stanford University, University of California at Santa Barbara, and Carnegie Mellon University.
distributed computing – The use of clusters of computers to serve data and applications over a network.
distributed searching – Searching over multiple systems.
distributed system – A group of computers that work together to provide access to data.
dithering – A process of approximating a color or shade of gray by using dots of varying values. Dithering is used to improve the quality of a compressed digital image by representing subtle changes with varying dot sizes and intensities. Dithering produces more photorealistic images without drastically increasing file size.
DNS (Domain Name System, also called domain name service) – One of the key elements of the Internet, the Domain Name System resolves domain names to IP addresses over the Internet.
domain name – The name of a computer connected to the Internet. For example, the domain name for the White House Web server is whitehouse.gov.
DPI (Dots Per Inch) – A means of describing printed text or image quality based on the number of ink dots that are used to produce a solid image. Standard resolution on paper for black text from an inkjet printer is 300 dpi. Laser printers and some more expensive inkjet printers are capable of 600 dpi or higher. The higher the dpi number, the finer the printed image will appear.
DTD (Document Type Definition) – Document formatting definition file used in SGML. SGML makes use of tags to control the presentation of a document. The tags and their functions are defined in the DTD file.
DVD (Digital Versatile Disk) – High capacity storage medium based on and the same size as a Compact Disk. Where a standard CD can hold up to 650 MB of information, a DVD can hold anywhere from 4 to 9 Gigabytes on a side or up to 17 Gigabytes on a double sided disk. DVD was developed originally as a medium for storing compressed digital movies which were too large to fit on a single CD.
electronic journal – Commonly, a publication that maintains many print characteristics but is produced and distributed online. Term can be used for a journal that is solely digital and for one that exists in both print and digital formats.
electronic library – Often used interchangeably with digital library and virtual library. A collection of resources that users can access over a computer system. In the broadest sense of the term, resources would be available via the Internet. In the most narrow sense, the resources would be served over an Intranet or even on an individual workstation.
email – Electronic mail
Ethernet – The most commonly used local area network (LAN). Developed by Xerox, Digital, and Intel, Ethernet allows connection of up to 1024 nodes over twisted pair, coax, or fiber and is established as an IEEE standard (IEEE 802.3).
fair use – A concept used in copyright to allow for individual use/copying of materials, as long as such use is not for profit or used to produce financial gain.
field searching – Searching in a database for information in specific fields. Metadata searching. In a database that uses author, title, and descriptor indexing, field searching might take the shape of title field searching or author field searching or descriptor field searching. This is in contrast to full text or keyword searching.
field – A specific bit of metadata about a data object. Fields might include author name, title, descriptor, publishing data, publication date, identifiers, etc.
firewall – A machine or software application used to block or limit access to an individual workstation or to a computer network. Just as the firewall of an automobile is designed to prevent damage to the interior of the car and injury to the passengers, a computer firewall is designed to prevent damage to the workstation or file server.
FTP (file transfer protocol) – An Internet communications standard that facilitates the sharing of files from one computer to another. Many Web sites offer file downloads to users using FTP. Indeed, some repositories of digital texts provide users with the option to download full text via an FTP site.
full text searching – Searching through all indexed words in a full-text database. Field searching limits the search for information to metadata. Full text searching is not limited to metadata, but instead probes the data itself.
gamma – A numeric value representing the difference between the input and output of a device. Most commonly, gamma is used in relation to monitors, which may display images either brighter or darker than the original scanned image. Gamma correction is an integral part of photo editing software and can be used to adjust scanned images so that they display and print more like the original.
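Gamma correction is often expressed as a power function applied to each pixel intensity. The Python sketch below assumes one common convention (output = input raised to 1/gamma, on intensities scaled to the range 0 to 1). Photo editing programs may define the adjustment differently, and the function name is hypothetical.

```python
def gamma_correct(value, gamma):
    """Apply gamma correction to one 8-bit intensity (0-255)."""
    normalized = value / 255.0                # scale to the range 0.0-1.0
    corrected = normalized ** (1.0 / gamma)   # assumed convention: exponent 1/gamma
    return round(corrected * 255)
```

Under this convention a gamma above 1 brightens midtones while leaving pure black (0) and pure white (255) unchanged, and a gamma of 1 leaves the image as it was.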
Gecko – The latest rendering engine designed by the Mozilla project. The Gecko engine is the basis of the Netscape browser and is compliant with current Web standards including HTML 4.0, XML, CSS, and DOM.
GIF (Graphics Interchange Format) – A standard image representation format used extensively to make images available over the Internet. A GIF image can have a maximum of 256 colors or shades, which makes it useful for preserving the sharpness of bitonal or grayscale images. Although not as useful as other formats in preserving the richness of natural color scenes (the GIF compression scheme reduces the number of colors in a photograph from millions of colors to a maximum of 256), it is widely used to preserve color images for presentation over the Internet.
gopher – An Internet file sharing system that predates the World Wide Web. Gopher systems allowed users on different computer systems to share information over the Internet using a text-based indexing system.
hash coding – A search technique that eliminates the need to scan an entire database by providing access to key words based on where they appear in a database. Words and their locations may be represented, for example, by numeric values, which are then searchable. Hash coding provides quicker search times than linear searching.
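One way to picture this is an index that maps each word to the places it appears. The Python sketch below relies on the language's built-in dictionary, which is itself a hash table. The sample records and the function name build_index are assumptions made for this example.

```python
from collections import defaultdict

def build_index(records):
    """Map each word to the list of record numbers in which it appears."""
    index = defaultdict(list)
    for record_number, text in enumerate(records):
        for word in text.lower().split():
            index[word].append(record_number)
    return index

records = ["Internet history", "library catalog", "Internet library"]
index = build_index(records)
```

Looking up a word in the index goes straight to its list of locations, with no need to scan every record in turn.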
hierarchical organization – A principle of organization that effectively ranks information from broader to more narrow. Some indexing systems are hierarchical in nature, providing broad, general terms at the top of the hierarchy, with more specific, narrower terms appearing further down the chain of concepts.
hit – Successful retrieval of information from a database. For example, a search of an article database for recent articles on the 2000 Presidential Election might uncover thousands of "hits." In this example, a hit would be a reference in the database to an article about the election.
home page – The starting page for an individual or organization with a Web presence. Typically home pages are named index.html, quite appropriately. The home page typically introduces the Web site and provides links to other content in the site.
HTML (Hypertext Markup Language) – The standard formatting language used on the World Wide Web. Very similar in appearance to SGML, HTML's primary difference is that it supports hypertext links within a document, one of the hallmarks of Web page design. HTML uses tags enclosed in angle brackets (<>) to determine text characteristics, layout, and position of images and sound files.
HTTP (Hypertext Transfer Protocol) – Standard protocol that enables computer systems to exchange hypertext materials over the Internet.
hyperlink – A "clickable" word, phrase, or object in a Web page that "jumps" the viewer to further information or to other Web sites.
identifier – Information that describes an item in lay terms. Identifiers are used in many databases to provide subject-related access points other than descriptors.
IETF (Internet Engineering Task Force) – The primary protocol and standardization development body for the Internet, the IETF is an international task force that combines the expertise of network researchers and designers.
information retrieval – The process of gathering data from an online system. In the case of a library catalog, the act of locating relevant catalog records in the OPAC.
Internet – An "interconnected group of independently managed networks," the Internet is a worldwide telecommunications backbone that supports communication among computer systems around the world. The Internet had its roots in the ARPANET in 1969 but has since grown into a vast communications network utilized not only by government, but by educational institutions, private organizations, companies, and individuals.
Internet Explorer – Microsoft's answer to Netscape Navigator, a graphical Web browser that is incorporated into the Windows operating system.
interoperability – Functionality that allows users access across many systems using similar or same command language.
IP address – Typically used to refer to the numeric address of a computer connected to the Internet. IP addresses can be static (permanently assigned) or dynamic (assigned temporarily to enable computer-to-computer communications) and consist of four numeric strings separated by periods. For example, the IP address 139.62.208.133 might designate an individual workstation at a specific organization. The first three strings of numbers would normally represent a particular network within an organization, while the final string of the address would locate an individual machine attached to the network at that site.
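The structure of the sample address can be examined with Python's standard ipaddress module. The sketch assumes, as the entry does, that the first three numbers identify the network; in practice the dividing point depends on how the organization's network is configured.

```python
import ipaddress

address = ipaddress.ip_address("139.62.208.133")
octets = [int(part) for part in str(address).split(".")]   # four numbers, each 0-255

# Assumption: the first three numbers name the network (a "/24" network).
network = ipaddress.ip_network("139.62.208.0/24")
host_part = octets[3]                                      # the individual machine
```

Here address in network evaluates to True, and host_part singles out one machine on that network.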
Java – Platform-independent programming language created by Sun Microsystems and similar in structure to C++. Java applets can be embedded in HTML documents and enable Web users to run programs on their computers regardless of their operating system (Windows, Macintosh, UNIX, etc.). Both Netscape Navigator and Microsoft Internet Explorer browsers have built-in support for Java.
JavaScript – A Netscape-developed scripting language that can be used to execute commands in a Web browser. JavaScript provides Web pages with more functionality than does HTML code alone.
JPEG (Joint Photographic Experts Group) – A standard image representation format used extensively to make images available over the Internet. A JPEG image preserves natural color more efficiently than the GIF format while still providing image compression sufficient to make it useful as a means of representing images in a computer system. The JPEG compression algorithm selectively discards color information based on the amount of compression desired. JPEGs support RGB, CMYK, and grayscale color modes.
keyword searching – Search capability that allows querying a database by any chosen words. Keyword searching can accommodate the use of Boolean operators and phrase and proximity searching, and it allows inexperienced searchers to find content within a database without knowing its structure or subject organization.
library
linear searching – Accessing a data file by starting at the beginning and going through the entire file looking for a string of data. Linear searching is inherently slower than other means of accessing a file, but it can accommodate searching for multiple strings of data.
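A linear search that looks for several strings in a single pass might be sketched in Python as follows. The function name and the sample lines are assumptions made for this example.

```python
def linear_search(lines, *needles):
    """Return (line_number, line) pairs containing any of the search strings."""
    matches = []
    for number, line in enumerate(lines, start=1):   # start at the beginning
        if any(needle in line for needle in needles):
            matches.append((number, line))
    return matches

sample_lines = ["gopher servers", "web browsers", "ftp sites"]
```

Every line is examined exactly once regardless of how many strings are being sought, which is the trade-off the entry describes: slower than indexed access but flexible.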
LSI (Latent Semantic Indexing) – Technique for condensing vector space into fewer dimensions. LSI uses a matrix to facilitate location of words within documents. LSI also facilitates cross-language retrieval of documents.
MARC (Machine-Readable Cataloging) – Originally developed at the Library of Congress as a format for distributing catalog records on magnetic tape, MARC is a format for describing bibliographic items. In a MARC record, each line of the record begins with a code (tag) indicating the line's content, followed by the content to be displayed. For example, the 245 tag indicates the title of an item, while the 100 tag indicates the author. MARC records are the foundation for library OPACs (Online Public Access Catalogs) and facilitate describing, searching, and displaying information about bibliographic items.
markup languages – High-level computer languages that describe formatting of documents or other types of materials. HTML is one of the most well known markup languages.
metadata – Data about data.
MIME (Multipurpose Internet Mail Extensions) – Originally used to describe information sent via email, MIME uses a two-part coding scheme that provides a generic and a specific description. For example, the MIME type for a GIF is image/gif. Image is the generic description, while gif is the specific description.
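The two-part scheme can be seen with Python's standard mimetypes module, which guesses a type from a file name. The file name logo.gif is a made-up example.

```python
import mimetypes

mime_type, _ = mimetypes.guess_type("logo.gif")   # guess from the file extension
generic, specific = mime_type.split("/")          # split into the two parts
```

For a GIF file the guess is image/gif, which splits into the generic part image and the specific part gif.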
mirror – A digital copy of a collection of data.
mirror site – A network computer system used to store copies of data collections.
Mosaic – NCSA Mosaic was the first widely distributed and used Web browser. Marc Andreessen led development of Mosaic in 1993 and would later found Mosaic Communications Corporation, which would become Netscape Communications in 1994.
Mozilla – Development name for the Netscape browser. Mozilla was to be the monster program that would kill Mosaic, thus the name Mozilla. Currently, the Mozilla project focuses on developing an open source browser through worldwide cooperative efforts. Current versions of Netscape Navigator are based on Mozilla.
MPEG (Moving Picture Experts Group) – A standard for compressing and storing moving images digitally. MPEG-1 is the most common standard and is useful for videoconferencing. MPEG-2 is more commonly used for storing motion pictures on disk.
natural language query – A database search query issued using normal spoken language syntax. One of the more frequently used Internet search services that uses natural language queries is Ask Jeeves.
Netscape – The company and the browser that really supercharged the Web revolution. Begun by Marc Andreessen, one of the original NCSA developers of Mosaic, Netscape once held over 80% of the browser market with its popular Netscape Navigator Web browser. Microsoft's entry into the browser market with its Internet Explorer came late, but with the considerable clout of the company, IE has reversed the market and now holds nearly 80% to Netscape's less than 20%.
network protocols – Procedures for organizing and exchanging data among computers. Data exchange over the Internet is managed with the TCP/IP suite, which includes Transmission Control Protocol and Internet Protocol.
OCR (Optical Character Recognition) – Conversion of printed text into digitally encoded text. A page of print can be scanned using a scanner and is then converted into a digital representation character by character. OCR converted texts typically require editing since character recognition accuracy can vary depending on the quality of the scanned text. Generally speaking, sans serif fonts are more accurately converted during the OCR process than are serif fonts. The darkness of the original type face can also affect OCR accuracy.
OPAC (online public access catalog) – Computerized version of a library's card catalog.
PCL (Printer Control Language) – Developed by Hewlett Packard, PCL is a language designed specifically for printing. PCL is not as portable across printer models as PostScript but it does support faster printing of images than the PostScript language.
PDF (Portable Document Format) – A display format developed by Adobe Systems based on the PostScript printing language which displays an exact image of an original document with all its layout and type characteristics intact. PDF files can be created using Adobe's Acrobat or PageMaker software and can be read using the free Acrobat Reader, which can be used as a plug-in for Netscape Communicator and Microsoft Internet Explorer to share PDF files over the Internet.
PDL (Page Description Language) – Formatting language used in specifying page layouts, including positioning of text and graphics.
PNG (Portable Network Graphics) – An open source Internet graphics compression format designed to replace the Compuserve GIF format. Developed during 1995, PNG was recommended by the W3C as a Web standard the following year. Like the GIF, a PNG supports lossless, cross-platform graphics that are easily viewed over the Internet. Development of the new format was spurred by the Unisys Corporation announcement that it would charge license fees for its patented LZW compression algorithm, which is the basis of the GIF file format. Not all browsers inherently support PNG at this point, but that should be only a temporary predicament.
PostScript – A complex programming language developed by Adobe Systems for generating graphic images that may include multiple fonts, colors, and bitmapped images. Although PostScript is usually thought of primarily in terms of printing, it is a full programming language having its own set of commands that facilitate page layout as well as printing. PostScript is widely accepted as a standard language for interfacing with a variety of printers.
PPI (Pixels Per Inch) – A measure of the dimensions of a graphic image. A pixel is a single small square of color information that can be displayed on a computer monitor or can be printed on a printer. The higher the pixels per inch the finer the printed image will be. Images stored solely for viewing on a computer monitor need not have a high number of pixels since the typical monitor displays at the resolution of 72 pixels per inch.
precision – The extent to which a search of a database produces results that are relevant to a user's query. A search that retrieves 30 documents, 20 of which are considered relevant, has precision of 66.67%. Precision is used along with recall to measure success of an information retrieval system.
protocol – A procedural code. For example, network protocols describe the procedures used by linked computer systems for exchanging data over the network.
proxy – A go-between. In the world of libraries and databases, proxy services are used to link authorized users to research databases.
Public Key/Private Key – The more secure of two encryption methods used for exchanging information over the Internet. Each recipient of a message has both a public and a private encryption key. A sender uses the recipient's public key to encrypt a message. The recipient uses his or her private key to decrypt it.
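The exchange described above can be sketched with a toy key pair in Python. This uses the RSA scheme with deliberately tiny primes so the numbers stay readable. It is an illustration only, not a secure implementation, and the function names and the sample message value are hypothetical.

```python
# Toy key pair built from two small primes.
# Real keys use primes that are hundreds of digits long.
p, q = 61, 53
n = p * q                  # modulus shared by both keys
phi = (p - 1) * (q - 1)    # used only while generating the keys
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)

public_key, private_key = (e, n), (d, n)

def encrypt(message, key):
    """The sender encrypts a number smaller than n with the recipient's public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)

def decrypt(ciphertext, key):
    """The recipient recovers the number with the matching private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)

ciphertext = encrypt(65, public_key)
plaintext = decrypt(ciphertext, private_key)
print(ciphertext, plaintext)
```

Anyone may know the public key, but only the holder of the private key can reverse the encryption.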
PURL – Persistent URL
query – A request for information from a database issued in a specific format. A user looking for information on an author would issue an author query to a database to determine what data is available.
RealAudio/Video – Designed by RealNetworks, RealAudio and RealVideo are generally accepted Internet standards for enabling streaming audio and video over the Internet. Instead of waiting for a sound or video file to download, RealPlayer allows the file to start playing as soon as a user initiates reception. Over a fast Internet connection, RealAudio/Video appears to come across in real time.
recall – The extent to which a search of a database retrieves the relevant documents it contains. A search of a database containing 40 relevant documents that retrieves 30 of them is said to have 75% recall. Recall is used along with precision to measure the success of an information retrieval system.
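A minimal Python sketch of the recall calculation follows, using the figures from the entry above and assuming that all 30 retrieved documents are among the 40 relevant ones. The document identifiers are hypothetical integers.

```python
def recall(relevant_ids, retrieved_ids):
    """Fraction of the relevant documents that the search retrieved."""
    if not relevant_ids:
        return 0.0  # assumption: nothing relevant to find scores zero
    return len(relevant_ids & retrieved_ids) / len(relevant_ids)

relevant = set(range(40))    # 40 relevant documents in the database
retrieved = set(range(30))   # the search finds 30 of them
r = recall(relevant, retrieved)
print(f"{r:.0%}")
```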
render – In the computer world, to render is to take digital data and display it in viewable format on a computer screen. For example, Web browsers receive document and image data from a Web server and render the data in the form of a Web page or Web pages.
RGB (Red Green Blue) – A color model that uses three colors – red, green, and blue – to reproduce up to 16.7 million colors on a computer monitor. The RGB model is used by all computer monitors to reproduce images on the screen. In the RGB model, each pixel that composes an image is assigned three intensity values, one each for red, green, and blue, ranging from 0 to 255. A pixel with all three values at 0 is black, and a pixel with all three at 255 is white.
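The 16.7 million figure follows from the value range: assuming the common arrangement of one 0 to 255 value for each of the red, green, and blue components, there are 256 × 256 × 256 combinations. The Python sketch below shows the count and one familiar way of writing a color, the hexadecimal notation used in Web pages. The function name is hypothetical.

```python
def rgb_to_hex(red, green, blue):
    """Format three 0-255 intensity values as a Web-style hex color."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each component must be between 0 and 255")
    return f"#{red:02X}{green:02X}{blue:02X}"

total_colors = 256 ** 3          # every combination of three components
print(total_colors)
print(rgb_to_hex(0, 0, 0))       # black
print(rgb_to_hex(255, 255, 255)) # white
print(rgb_to_hex(255, 0, 0))     # pure red
```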
RSA Encryption – Considered the standard for encrypting data to be exchanged over the Internet, RSA Encryption is a public key encryption algorithm developed in 1977 by Ron Rivest, Adi Shamir, and Leonard Adleman and commercialized by the security company RSA. In 2000, RSA announced that it was releasing the algorithm to the public domain, which meant that anyone wanting to build secure technology based on RSA could do so without licensing the algorithm from the company.
scanner – An optical device that can be interfaced with a computer to allow for importing a digital image of a physical object into a computer program. Scanners can be hand-held, flat-bed, or sheet-fed. On a flat-bed scanner, the object to be scanned is placed against a glass panel (usually page size) and is scanned by a scanning head consisting of a linear array of light-sensitive sensors. The object is illuminated by a high-intensity light, and the reflected light is picked up by the sensors and transmitted as digital codes to photo-editing or OCR software.
server – A computer that provides or serves access to client computers.
SGML (Standard Generalized Markup Language) – A syntax for marking up a document to have certain layout, text display, and image display. SGML embeds tags describing certain aspects of a document in angle brackets (<>) and surrounds all text elements with tags to describe how the text will be displayed. For example, a line of bold text will be enclosed inside the tags <b> and </b>. The resulting display will show the enclosed text in bold. SGML provides standards for laying out tables, displaying graphics, and producing special characters like the pound sign, the ampersand, and the copyright symbol. Unlike PDF documents, SGML documents can be rearranged according to user specifications, producing a document display that may be larger or smaller than the original. This provides considerable flexibility from one machine to another. The down side is that an author wanting control over the appearance of his or her document essentially loses that control to the reader. Current SGML standards include the Association of American Publishers' Electronic Manuscript Standard, the Department of Defense Continuous Acquisition and Life-Cycle Support rules (CALS), and the Text Encoding Initiative standard (TEI).
shared cataloging – OCLC is a prime example of shared cataloging at work. In the OCLC model, member libraries cooperatively contribute to the cataloging database so that no one library must carry the burden of cataloging every published book.
SMTP (Simple Mail Transfer Protocol) – An Internet standard for establishing and negotiating communications between sender and recipient of an email transaction.
snail mail – Traditional paper-based mail.
speech compression – Use of algorithms to sample voice as it is recorded and reduce the size of the resulting digital file. The most common standard for compressing speech, which can also be used to compress music, is the GSM algorithm used for digital cellular phones. GSM can compress speech by a factor of 5, producing files that occupy approximately 1600 bytes per second of recorded information.
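The figures in the entry above lend themselves to a quick size estimate. The Python sketch below simply multiplies them out. The constants are the approximate values quoted in the entry, and the function names are hypothetical.

```python
GSM_BYTES_PER_SECOND = 1600  # approximate figure quoted in the entry
COMPRESSION_FACTOR = 5       # compression ratio quoted in the entry

def gsm_file_size(seconds):
    """Approximate size in bytes of a GSM-compressed recording."""
    return seconds * GSM_BYTES_PER_SECOND

def uncompressed_size(seconds):
    """Approximate size in bytes of the same recording before compression."""
    return gsm_file_size(seconds) * COMPRESSION_FACTOR

# One minute of recorded speech
print(gsm_file_size(60), uncompressed_size(60))
```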
speech recognition – Means for receiving and interpreting spoken words in a computer system. Using a microphone, a user is able to voice commands to a software application and execute program routines without use of the keyboard.
spider – In Internet search engine technology, the computer system that actually "crawls" the Web looking for information to bring back for indexing. Most search engines use the combination of a spider or crawler to search for live connections, an indexing and storage system for keeping the information and providing access to it, and a user interface system that allows querying of the database.
SSL (Secure Sockets Layer) – A protocol developed by Netscape for securely delivering private information over the Internet. SSL works using encryption technology.
stop word – A word that cannot be searched in a database. Common stop words include short words like prepositions and words like and, or, and not that are used as logical connectors in the database search query language.
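A minimal Python sketch of how a search system might drop stop words from a query before searching follows. The stop word list is a small hypothetical sample, and real systems each define their own.

```python
# Hypothetical stop word list; each database defines its own.
STOP_WORDS = {"a", "an", "and", "in", "not", "of", "or", "the", "to"}

def searchable_terms(query):
    """Return the words of a query that the database would actually search."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]

terms = searchable_terms("The history of libraries and archives")
print(terms)
```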
structural metadata – Information about document formats and structures.
style sheet – An overall guide that specifies document formatting features. Typical elements included in a style sheet might include paragraph formats, font faces, font sizes, and other page layout information.
subscription – Payment for continuing access to a print publication or to an online system.
tag – An element of a markup language that describes formatting or actions in a page or document layout scheme. For example, HTML uses page formatting tags (indicated by enclosure in brackets <>), font formatting tags, and data embedding tags to provide content in a browser readable format.
TCP/IP (Transmission Control Protocol/Internet Protocol) – Two basic protocols that enable communications over the Internet. IP links computers and networks using a standard addressing scheme and routes packets of data to a specific IP address. TCP runs on top of IP and ensures that the packets passed from one computer to another arrive complete and in the correct order.
TCP/IP suite – Set of computer programs shared among computers linking via TCP/IP protocol, including terminal emulation (telnet), file transfer (FTP), electronic mail (Email, SMTP).
telnet – A standard Internet protocol for logging onto another computer system over the Internet and running programs from a remote terminal or workstation. Telnet allows a PC to act as a "dumb terminal" for the remote system. Telnet was developed as part of the ARPANET project.
TeX (LaTeX) – TeX is a macro processor that provides document creators control over typographical formatting. Not the easiest user interface to master, TeX was used as the basis for creating LaTeX and Plain TeX, two typographical formatting tools with easier-to-master user interfaces.
TIFF (Tagged Image File Format) – A format used to exchange image files between computer applications and platforms (Windows, Mac, etc.). The TIFF format is shared by almost all photo-editing and paint programs and is supported by most scanner software as an option for saving scanned images. TIFFs are bitmapped images that support RGB and CMYK color schemes as well as grayscale.
truncation – Use of special characters to search variant word endings in a database. Some common truncation symbols include the asterisk and the question mark. In a database that uses the question mark as a truncation symbol, a searcher could find any form of the word education by using the root of the word followed by the question mark (educat?).
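The educat? example can be sketched in Python as a simple prefix match. This is only an illustration of the idea, assuming a trailing truncation symbol and a hypothetical list of indexed words. Actual database systems implement truncation in their own ways.

```python
def truncation_search(pattern, indexed_words, symbol="?"):
    """Match words by root when the pattern ends with the truncation symbol."""
    if pattern.endswith(symbol):
        root = pattern[:-len(symbol)].lower()
        return [w for w in indexed_words if w.lower().startswith(root)]
    # No truncation symbol: fall back to an exact match.
    return [w for w in indexed_words if w.lower() == pattern.lower()]

# Hypothetical index of words in a database
index = ["education", "educator", "educated", "edit", "educating"]
matches = truncation_search("educat?", index)
print(matches)
```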
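Unicode – A character encoding standard, originally designed as a 16-bit text format, used to represent the characters of most of the world's languages on a computer. UTF-8 is an encoding scheme for Unicode that stores each character as a sequence of one or more 8-bit bytes.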
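To illustrate how UTF-8 represents characters as 8-bit units, the Python sketch below encodes three sample characters and counts the bytes each one needs. The helper name and the choice of characters are only for the example.

```python
def utf8_length(character):
    """Number of 8-bit bytes UTF-8 uses to store one character."""
    return len(character.encode("utf-8"))

samples = {
    "A": "Latin capital letter A",
    "\u00e9": "Latin small letter e with acute",
    "\u20ac": "euro sign",
}

for character, name in samples.items():
    code_point = f"U+{ord(character):04X}"
    print(code_point, name, utf8_length(character), "byte(s)")
```

Plain ASCII letters take a single byte, while accented letters and symbols take two or more.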
union catalog – In a multi-site system, the combined catalogs of multiple libraries, searchable as a single database.
URL (Uniform Resource Locator) – The digital "address" of a resource available over the Internet. URLs are prefaced with the name of the protocol used to access a resource. For example, a hypertext source is prefaced with the http:// protocol designator while a telnet system is prefaced with the protocol designator telnet://. The address may be a combination of letters and numbers but is translated into a numeric value by a domain name server, a computer system assigned to resolve addresses on the Internet.
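The parts of a URL described above can be pulled apart with Python's standard urllib.parse module, as in this brief sketch. The helper name and the example addresses are hypothetical.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Split a URL into its protocol designator, host name, and path."""
    parts = urlsplit(url)
    return {"protocol": parts.scheme, "host": parts.hostname, "path": parts.path}

web = describe_url("http://www.example.org/catalog/index.html")
remote = describe_url("telnet://library.example.org")
print(web)
print(remote)
```

The host name is the part that a domain name server translates into a numeric address.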
URN (Uniform Resource Name) – A locator for a specific Internet resource that is independent of location. In contrast, a URL specifies a specific machine address for locating a resource. The URN is machine-independent.
user interface – The software configuration that allows a user to access features of a program. In a search engine, the user interface would include navigation buttons, an input box (or boxes) for entering data to be located, and various dialog boxes and help screens necessary to effective navigation of the software.
vector graphic – An image format that uses geometrical shapes like lines and curves to define images. Vector graphics are frequently used to define type faces or high definition shapes where the smoothness of the image at all resolution settings is important. Unlike bitmap images, which can lose their definition and look chunky when enlarged, vector graphics retain their definition at any size. When displayed on a computer monitor, both bitmaps and vector graphics are displayed in pixels.
vector model – Information retrieval model introduced by Gerard Salton that represents documents using vectors based on word frequency within a document. The more times a word occurs within a document, the stronger the vector representing the document. A search of the database on a word would produce an ordered listing of documents whose vectors most closely match the search query.
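The idea can be sketched in Python with raw word counts as the vectors and cosine similarity as the measure of how closely a document matches a query. This is a simplified illustration rather than Salton's full weighting scheme, and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter

def term_vector(text):
    """Represent a text as a vector of word frequencies."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two word-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, documents):
    """Return names of matching documents, closest match first."""
    query_vector = term_vector(query)
    scored = [
        (cosine_similarity(query_vector, term_vector(text)), name)
        for name, text in documents.items()
    ]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

# Hypothetical three-document collection
documents = {
    "doc1": "library catalog library records",
    "doc2": "digital library",
    "doc3": "network protocol standards",
}
ranking = rank("library", documents)
print(ranking)
```

The document that uses the search word more heavily relative to its length ranks first, and documents that never use it are left out.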
virtual library – A collection of resources and services similar to those available in a physical library building but available to users over a computer system, typically the Internet.
VRML (Virtual Reality Modeling Language) – Originally released in 1994, VRML is a file format specification that provides for the creation of 3D environments viewable over the Web.
W3C (World Wide Web Consortium) – A cooperative of over 150 organizations whose responsibility is to set standards for exchange of information over the Web. "The World Wide Web Consortium (W3C) develops interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential as a forum for information, commerce, communication, and collective understanding."
WAIS (Wide Area Information Server) – A search and retrieval system used for accessing databases over the Internet. The full text of databases can be searched using WAIS systems.
watermark – In the electronic publishing world, an embedded digital signature that identifies the creator of the file.
WAV – Sound file format standard in the Windows operating system. WAV files typically use quite a bit of disk space and aren't the most efficient means for transmitting sound digitally.
Web crawler – The part of a search engine that scours the Web for pages that can be indexed. Also known as a spider.
Web page – A document that resides on the World Wide Web. Web pages may or may not have print equivalents, may exist only on the Web, and may incorporate graphics, sounds, motion pictures, and other multimedia elements to provide information.
Web server – In the client/server computing model, the file server on which Web pages reside and from which they are served to client workstations.
Web site – A collection of Web pages that are focused on a unified theme or on multiple themes. For example, Microsoft's Web site has information on the company, assistance for using Microsoft products, and an online store. A Web site can consist of a few pages (as is the case with many personal pages) or may be an extensive collection of hundreds of pages and other servable information.
World Wide Web (WWW) – The graphics rich, sound and video enabled, hyperlinked part of the Internet that many people mistakenly treat as synonymous with the Internet. The communication protocol that supports the Web is called Hypertext Transfer Protocol (HTTP). Anyone who has access to the Internet and is running a Web browser on his or her workstation can access materials on the Web and get the full benefit of multimedia content and hypertext links to other resources. The full Internet includes other types of systems including FTP sites, Gopher servers, and telnet systems, to name a few.
WYSIWYG (What You See Is What You Get) – A format for laying out documents where the image displayed on the screen is the same as what a user will ultimately print. Developed by Xerox, this standard for layout and print is used by most word processing programs at this point.
Xerox PARC – The Xerox Corporation's Palo Alto Research Center (PARC). The center has been essential in the development of many computer products, including graphical user interfaces, electronic document delivery systems, printing technologies, and search technologies.
XSL (Extensible Stylesheet Language) – A system of document description, XSL is a specification that allows for the separation of style and content in HTML and XML pages.
XML (eXtensible Markup Language) – A document markup language based on SGML that allows for user defined tags that specify in what format content is delivered. XML has been touted as the replacement for HTML, but widespread support and acceptance have yet to be seen.
Z39.50 – Digital exchange protocol that allows a computer on one system to search collections of information on a remote system using field searching.

For additional Web terminology, visit the Webopedia at http://www.webopedia.com/.

