EssaysForStudent.com - Free Essays, Term Papers & Book Notes

Voice Recognition

Research Paper • 2,711 Words • May 3, 2010



A Speech Recognition Project

Abstract

Voice recognition is a fascinating field spanning several areas of computer science and mathematics. Reliable speech recognition is a hard problem that requires a combination of many techniques; however, modern methods have achieved an impressive degree of accuracy. This project examines those techniques and attempts to apply them to build a simple voice recognition system. The project was started with three goals in mind: first, to distinguish 'yes' from 'no'; second, to recognize a vocabulary of 20 words spoken individually; and third, to recognize combinations of two or more words from this vocabulary spoken in close succession. The project is implemented in Matlab and achieved the first goal: it distinguished a spoken 'yes' from a spoken 'no' with 100% accuracy on 24 samples taken from 8 different people. The method is simple, relying only on a count of zero crossings, but zero-crossing counts are a useful feature for voice recognition in general.

The Basic Steps

The process of voice recognition is typically divided into several well-defined steps. Systems differ in the nature of these steps and in how each step is implemented, but the most successful systems follow a similar methodology:

1. Divide the sound wave into evenly spaced blocks.
2. Process each block for important characteristics, such as strength across various frequency ranges, number of zero crossings, and total energy.
3. Using this characteristic vector, attempt to associate each block with a phone, the most basic unit of speech, producing a string of phones.
4. Find the word whose model is the most likely match to the string of phones that was produced.
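Steps 1 and 2 can be sketched in a few lines. The following is a minimal Python sketch, not the project's Matlab code: the block length of 256 samples and the hop of 128 samples are hypothetical defaults, and only two of the features named above (zero crossings and total energy) are computed; a fuller front end would add frequency-band or LPC features.

```python
def frame_signal(samples, frame_len, hop):
    """Step 1: split a signal into evenly spaced blocks of frame_len samples.

    A new block starts every `hop` samples (hop < frame_len gives overlapping
    blocks); trailing samples that do not fill a whole block are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def zero_crossings(block):
    """Number of sign changes between consecutive samples (0 counts as positive)."""
    return sum(1 for a, b in zip(block, block[1:]) if (a >= 0) != (b >= 0))


def energy(block):
    """Total energy of a block: the sum of squared sample values."""
    return sum(x * x for x in block)


def block_features(samples, frame_len=256, hop=128):
    """Step 2 (partial): one (zero crossings, energy) vector per block.

    frame_len=256 and hop=128 are hypothetical defaults, not the project's values.
    """
    return [(zero_crossings(b), energy(b))
            for b in frame_signal(samples, frame_len, hop)]
```

For example, `block_features([1, -1] * 4, frame_len=4, hop=4)` splits the eight samples into two blocks and returns `[(3, 4), (3, 4)]`.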

Step 2 typically involves a spectrum analysis of the block. This can be done with a Fast Fourier Transform (FFT) or with a bank of frequency filters, but the most successful technique to date has been Linear Predictive Coding (LPC). Other important features include the total energy, the change in the features over time, and the number of zero crossings.

Step 3 is often done with a decision tree. Each phone usually has very prominent characteristics that narrow the field of consideration, and additional characteristics then separate similar-sounding phones. These decisions are frequently wrong, so the mistakes must be accounted for in a later step.

Step 4 has been implemented with a high degree of success using Hidden Markov Models (HMMs). An HMM is constructed for each word in the vocabulary, and the string of phones is compared against each HMM to determine which model is the most likely match.
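Step 4 can be illustrated with the forward algorithm, which computes the probability that a given HMM produced an observed string of phones. The sketch below assumes discrete HMMs whose states emit ARPAbet-style phone labels directly. The phone inventory, the left-to-right word models for 'yes' and 'no', and all transition and emission probabilities (`stay`, `hit`) are hypothetical values chosen for illustration, not trained from data.

```python
import math

PHONES = ["Y", "EH", "S", "N", "OW"]  # hypothetical phone inventory for this toy example


def left_to_right_model(phones, stay=0.5, hit=0.8):
    """Build a hypothetical left-to-right HMM with one state per phone.

    Each state stays put (probability `stay`) or advances to the next state;
    the last state always stays. A state emits its own phone with probability
    `hit` and every other phone in PHONES with equal leftover probability.
    """
    n = len(phones)
    start = [1.0] + [0.0] * (n - 1)
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i + 1 < n:
            trans[i][i], trans[i][i + 1] = stay, 1.0 - stay
        else:
            trans[i][i] = 1.0
    miss = (1.0 - hit) / (len(PHONES) - 1)
    emit = [{p: (hit if p == own else miss) for p in PHONES} for own in phones]
    return start, trans, emit


def forward_log_likelihood(obs, start, trans, emit):
    """Forward algorithm: log P(obs | HMM) for a sequence of phone labels."""
    if not obs:
        raise ValueError("need at least one observed phone")
    n = len(start)
    # alpha[i] = P(observations so far, current state = i), rescaled each step
    alpha = [start[i] * emit[i].get(obs[0], 0.0) for i in range(n)]
    log_scale = 0.0
    for o in obs[1:]:
        total = sum(alpha)
        if total == 0.0:
            return float("-inf")
        log_scale += math.log(total)  # remember the factor we divide out
        alpha = [a / total for a in alpha]
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j].get(o, 0.0)
                 for j in range(n)]
    total = sum(alpha)
    return log_scale + math.log(total) if total > 0.0 else float("-inf")


def recognize(obs, models):
    """Return the word whose HMM gives the observed phone string the highest likelihood."""
    return max(models, key=lambda word: forward_log_likelihood(obs, *models[word]))


MODELS = {
    "yes": left_to_right_model(["Y", "EH", "S"]),
    "no": left_to_right_model(["N", "OW"]),
}
```

For instance, `recognize(["Y", "EH", "EH", "S"], MODELS)` returns `"yes"`. Rescaling `alpha` at every step is the usual guard against numerical underflow on long phone strings; a real system would also estimate the probabilities from training data (for example with the Baum-Welch algorithm) rather than set them by hand.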

This project implements steps 1 and 2. In step 2, the program extracts the zero crossing count of each block. The maximum count over all blocks is then taken, which is sufficient to detect the presence or absence of an unvoiced consonant. Because 'yes' contains the unvoiced consonant 's' and 'no' contains no unvoiced consonant, this measure distinguishes 'yes' from 'no' with a high degree of accuracy. See zerocross.m for the algorithm used to extract the zero crossing count in a given block.
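The yes/no decision described above can be sketched as follows. This is a Python re-sketch, not the contents of zerocross.m, which is not reproduced here. It assumes an 8 kHz sample rate, non-overlapping 200-sample blocks (25 ms), and a hypothetical threshold of 60 crossings per block; both numbers would have to be tuned on recorded samples.

```python
def zero_crossings(block):
    """Number of sign changes between consecutive samples (0 counts as positive)."""
    return sum(1 for a, b in zip(block, block[1:]) if (a >= 0) != (b >= 0))


def max_block_zero_crossings(samples, block_len):
    """Split the recording into non-overlapping blocks; return the largest per-block count."""
    counts = [zero_crossings(samples[i:i + block_len])
              for i in range(0, len(samples) - block_len + 1, block_len)]
    return max(counts, default=0)


def yes_or_no(samples, block_len=200, threshold=60):
    """Classify a recording as 'yes' or 'no'.

    The unvoiced 's' in 'yes' is noise-like and crosses zero far more often
    than the voiced sounds in 'no'. block_len=200 (25 ms at an assumed 8 kHz
    sample rate) and threshold=60 are hypothetical values, not measured ones.
    """
    return "yes" if max_block_zero_crossings(samples, block_len) >= threshold else "no"
```

Taking the maximum over blocks makes the decision independent of where in the recording the 's' falls; the trade-off is that a single burst of background hiss could also push a 'no' over the threshold.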

A List of Phones

Phone  Example

Vowels
IY     beat
IH     bit
EY     bait
EH     bet
AE     bat
AA     Bob
AH     but
AO     bought
OW     boat
UH     book
AX     about
IX     roses
ER     bird
AXR    butter
AW     down
AY     buy
OY     boy

Consonants
Y      you
W
