
Text-To-Speech: Applications Across Abilities and Grade Levels

Terence W. Cavanaugh Ph.D.
University of North Florida, College of Education and Human Services
Florida, USA
Email: tcavanau@unf.edu 
Web: www.unf.edu/~tcavanau 
6/2002

Talking With a Computer

Two important areas that have developed in communication with computers are voice recognition (Speech-To-Text) and speech synthesis (Text-To-Speech). Voice recognition software recognizes human speech and either converts that speech into text in a word processing program or carries out commands on the computer screen. Using such a system, a person uses his or her voice to navigate through menus, start programs, and "write" by dictating into applications. Voice recognition programs use mathematical algorithms to recognize speech patterns that they have been trained on. A Text-To-Speech engine (also known as a screen reader or speech synthesizer) is any one of many software/hardware applications that use phonic rules to convert computer text to artificial speech, which is spoken through a hardware speech synthesizer or the computer's speakers. Some Text-To-Speech programs go so far as to convert textual information into MP3 files that can then be downloaded into players for "remote" listening.

While computer systems are nowhere near the HAL computer from the film 2001: A Space Odyssey, these systems can often seem to be like HAL in the ways that they act, respond, and take voice commands. With current technology capabilities, the use of speech-to-text can provide a number of educational applications and opportunities.

Text-To-Speech

As reading and writing are basic components of most educational activities, providing alternative formats and supports becomes necessary to reach all students. The theory of multiple intelligences suggests that there are a number of distinct forms of intelligence that each individual possesses in varying degrees, with the implication that learning and teaching should focus on the particular intelligences of each person (Gardner, 1983). Text-To-Speech offers users an additional modality for receiving information. This modality applies to learning styles and individual differences in abilities (e.g., Gardner, Guilford, Sternberg). As a component of a writing activity, Text-To-Speech applies to Information Processing Theory (G. Miller) in that students can use the software as a tool for self-evaluation of their work. According to CAST (1998), in order "to reach learners with disparate backgrounds, interests, styles, abilities, disabilities, and levels of expertise," educational materials should be flexible and adaptable for all learning styles.

A Text-To-Speech system is one that reads text aloud through the computer's sound card or other speech synthesis device. Any text that is given the command to be read is analyzed by the software, restructured into a phonetic representation, and then read aloud. The computer looks at each word, calculates its pronunciation (certain systems do this better than others), and then says the word in its context. These systems are limited to a standard phonological structure, so foreign words, special personal pronunciations, and acronyms are often misspoken. Some Text-To-Speech systems allow the user to adjust the pronunciation of words by changing their phonological "spelling" to the desired structure, which requires some knowledge of phonological structure on the user's part. Text-To-Speech programs now exist in multiple languages, using the phonological structure of the target language. Additionally, most of the more extensive (and more expensive) systems will also read the available user options aloud, usually as the mouse passes over them, allowing access to menus and other computer control features.
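The pronunciation adjustment described above can be pictured with a small sketch. It assumes a user-editable lexicon that maps a written word to a phonetic respelling, which is substituted into the text before it is handed to the speech engine. The names `USER_LEXICON` and `respell`, and the respellings themselves, are hypothetical illustrations and do not come from any of the programs discussed in this article.

```python
import re

# Hypothetical user-editable lexicon: written form -> phonetic respelling.
# The entries are illustrative guesses, not taken from a real product.
USER_LEXICON = {
    "unf": "you en eff",        # acronym that should be spelled out
    "cavanaugh": "KAV-uh-naw",  # personal name with a special pronunciation
}

def respell(text, lexicon=USER_LEXICON):
    """Replace every word found in the lexicon with its respelling.

    Words that are not in the lexicon pass through unchanged, so the
    speech engine's standard rules still apply to them.
    """
    def swap(match):
        word = match.group(0)
        return lexicon.get(word.lower(), word)

    return re.sub(r"[A-Za-z']+", swap, text)

if __name__ == "__main__":
    print(respell("Dr. Cavanaugh teaches at UNF."))
```

A lookup table like this is only one possible design; it handles acronyms and names, but it cannot choose between two pronunciations of the same spelling.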

One adaptation of Text-To-Speech software that is now available is the talking word processor. With a talking word processor the user can select what material is read and when. Users can choose to have the system read each letter typed, each word completed, and/or each sentence completed. Such a system allows students to process the information in another format as they create it.
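The three echo options of a talking word processor can be sketched as a function that watches characters arrive and decides what would be spoken. This is a minimal illustration under simple assumptions (a word ends at any non-word character, and a sentence ends at a period, exclamation mark, or question mark); the function name `speech_events` and its parameters are hypothetical, and the sketch returns the strings instead of sending them to a real speech engine.

```python
def speech_events(typed, letters=False, words=True, sentences=True):
    """Return, in order, the strings that would be spoken while the
    characters of `typed` are entered one at a time."""
    events = []
    word = ""       # characters of the word currently being typed
    sentence = ""   # characters of the sentence currently being typed
    for ch in typed:
        sentence += ch
        if ch.isalnum() or ch == "'":
            word += ch
            if letters:
                events.append(ch)  # echo each letter as it is typed
            continue
        # Any non-word character completes the current word, if there is one.
        if word:
            if words:
                events.append(word)
            word = ""
        # Sentence-ending punctuation completes the current sentence.
        if ch in ".!?":
            if sentences and sentence.strip():
                events.append(sentence.strip())
            sentence = ""
    return events

if __name__ == "__main__":
    print(speech_events("Hi there. Ok"))
```

In this sketch a word that has not yet been followed by a space or punctuation mark is not spoken, which mirrors the idea of reading each word only once it is "completed."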

The use of a screen reader can have many advantages over plain printed text. A Text-To-Speech program or screen reader can also give students with poor or no vision, or with reading disabilities or difficulties, access to the information that they need.

Research on students with reading disabilities found that comprehension improved when Text-To-Speech was combined with reading (Leong, 1995; Montali & Lewandowski, 1996; Raskind & Shaw, 2000). Research findings suggest that giving students control of the Text-To-Speech speed while they read along increased performance. Some students benefited from a slower Text-To-Speech reading speed, while others comprehended better at faster rates (Shany & Biemiller, 1995; Skinner et al., 1995). A component of many Text-To-Speech programs is synchronized highlighting of the text being read. The speech with highlighting can aid the student in recognizing the structure of written language. Students can also highlight words that they find difficult to decode, and have the program say the word aloud. This spoken word support has been found to improve reading comprehension for students with reading difficulties (Wise & Olson, 1994).
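The relationship between reading speed and synchronized highlighting can be illustrated with a short sketch. It assumes, for simplicity, that every word takes the same amount of time to speak, so that a rate in words per minute fixes the moment at which each word would be highlighted. Actual programs typically receive word timing from the speech engine itself; the function name `highlight_schedule` is hypothetical.

```python
def highlight_schedule(text, words_per_minute=150):
    """Return (start_time_in_seconds, word) pairs for highlighting.

    Simplifying assumption: each word is spoken for the same duration,
    so the start time depends only on the word's position and the rate.
    """
    seconds_per_word = 60.0 / words_per_minute
    schedule = []
    for index, word in enumerate(text.split()):
        schedule.append((round(index * seconds_per_word, 3), word))
    return schedule

if __name__ == "__main__":
    # A slower rate spreads the same words over more time.
    for rate in (120, 60):
        print(rate, highlight_schedule("the cat sat", rate))
```

Letting the student change `words_per_minute` corresponds to the student-controlled speed discussed in the research above.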

Consider the advantages of using Text-To-Speech software with all students. According to Gardner's theory of multiple intelligences, many people need to have information brought to them in different formats (Interpersonal, Intrapersonal, Linguistic, Logical Mathematical, Spatial, Bodily Kinesthetic, & Musical). Using a screen reader allows the information presented in the text to be brought to the user in multiple learning modalities. First the information is brought in as standard text (linguistic); it then enters the system by sound (interpersonal, musical); and because many screen readers highlight text as it is read, it also brings a sense of movement into reading (kinesthetic, spatial). By allowing the text to be read with the screen reader, the student has that many more chances of "learning" the information.

A Text-To-Speech program has the capability to enhance a wide variety of educational situations. At any level, these programs could be used to assist in student writing, boost reading capabilities, and provide additional support for students who have special needs. Research indicates that many students can improve with the support of such a system. This type of program therefore needs to become one of the tools in a teacher's digital education toolbox.

Activities:

  • Organize a play with voices
  • Writing support
  • Editing of student work (act as first reader in edit process)
  • Revise writing through listening
  • Reading support
  • Assist in the reading of articles, short stories
  • Use to build/test reading speed
  • Read books from web-based libraries
  • Assist in pronunciation
  • Singing
  • Phonetic writing
  • Foreign language discussion

ReadPlease 2002 - freeware - http://www.readplease.com/  
Can be used as a simple word processor that reads what is typed and prints. No method to change pronunciation style.
Plain text (.txt), Rich Text Format (.rtf), Clipboard text

HELP Read (v.92) - freeware - http://www.pixi.com/~reader1/allbrowser/  
Read only; cannot be written in or printed from. Can change the pronunciation (phonics) of words.
Plain text (.txt), Web pages (.htm), Clipboard text

DECtalk - demo - http://www.fonix.com/products/dectalk/demos.php  
Can be used as a simple word processor that reads what is typed
Plain text (.txt), Clipboard text. Has foreign language readers.

TextHELP! - demo - http://www.texthelp.com/  
Works with other programs to read material as it is typed or copied and pasted.

MS Reader - freeware - http://www.microsoft.com/reader 
Used to read texts in the .lit format.

A common test of the pronunciation capabilities of a screen reader is to make it read the sentences: 
"I will read the book later. I have read that book."
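The test works because "read" is spelled the same in both sentences but is pronounced "reed" in the first and "red" in the second, so a program must use the surrounding words to choose. One way a program might make that choice is sketched below. This is a deliberately simple rule that looks only at the preceding word; the cue lists and the function name `pronounce_read` are hypothetical, and nothing here describes how the listed programs actually decide.

```python
import re

# Hypothetical cue words. A helper verb such as "have" suggests the past
# participle ("red"); a word such as "will" suggests the base form ("reed").
PAST_CUES = {"have", "has", "had", "was", "were"}
PRESENT_CUES = {"will", "to", "can", "shall", "must"}

def pronounce_read(sentence):
    """Return a guessed pronunciation for each occurrence of "read"."""
    words = re.findall(r"[a-z']+", sentence.lower())
    guesses = []
    for i, word in enumerate(words):
        if word != "read":
            continue
        previous = words[i - 1] if i > 0 else ""
        if previous in PAST_CUES:
            guesses.append("red")
        elif previous in PRESENT_CUES:
            guesses.append("reed")
        else:
            guesses.append("unknown")  # the rule has no cue to go on
    return guesses

if __name__ == "__main__":
    print(pronounce_read("I will read the book later. I have read that book."))
```

A rule this small fails on sentences such as "I read that book yesterday," where no cue word precedes "read," which is part of why the test is useful for comparing programs.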