Chapter 4.0 Principles of Operation


4.1 Schematic Diagrams and Program Explanations

This section begins by explaining the basic concept of voice recognition. Stephen Cook, writer of "Introduction" and a source that Rahul himself recommended, explains that "Speech Recognition is a process by which a computer (or other type of machine) identifies spoken words." Several terms are needed to understand voice recognition. For instance, an utterance is the vocalization of a word, sentence, or paragraph that represents a single meaning to the computer. A speaker-dependent system is one that is designed to recognize a specific speaker.

In terms of how Rahul's program operates, the first step during code execution is acquiring the speech signal, which is collected using the Acquire Sound Express VI. It runs for three seconds at a sampling rate of 11025 Hz. An array of indicators at the top of the front panel notifies the operator that sound acquisition is in progress.

Next is the Pre-Processing VI, which consists of the following steps: Pre-Emphasis, Division into 20 Millisecond Frames, Windowing, Noise Threshold Calculation, and Utterance Detection.

In the Pre-Emphasis step, the objective is to compensate for the high-frequency content that is suppressed by the human sound production mechanism. The speech signal is passed through a high-pass filter that increases the magnitude of the higher frequencies relative to that of the other frequencies.
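As an illustration only, a pre-emphasis filter is often written in the first-order form y[n] = x[n] - alpha * x[n-1]. The report does not state which filter or coefficient the LabVIEW VI uses, so both the filter form and the value alpha = 0.95 below are assumptions, and the function name is hypothetical.

```python
def pre_emphasize(samples, alpha=0.95):
    """Apply y[n] = x[n] - alpha * x[n-1] (assumed filter form and coefficient).

    The first sample has no predecessor, so it is passed through unchanged.
    """
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


# A constant (zero-frequency) signal is almost cancelled after the first sample,
# while a signal that flips sign every sample (the highest frequency) is boosted.
constant = pre_emphasize([1.0, 1.0, 1.0, 1.0])
alternating = pre_emphasize([1.0, -1.0, 1.0, -1.0])
print(constant)
print(alternating)
```

Comparing the two printed lists shows the high-pass behaviour described above: the slowly varying input shrinks while the rapidly varying input grows.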

In the Framing step, the input speech signal is divided into short frames of 20 ms, each overlapping its adjoining frames by 50% to maintain continuity.
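The framing arithmetic can be sketched as follows. The sampling rate, recording length, frame length, and overlap come from the report; truncating 220.5 samples to 220 per frame and dropping a trailing partial frame are assumptions, since the report does not say how the VI handles them.

```python
SAMPLE_RATE_HZ = 11025  # from the report
DURATION_S = 3          # from the report
FRAME_MS = 20           # from the report
OVERLAP = 0.5           # 50% overlap, from the report


def split_into_frames(samples, sample_rate_hz=SAMPLE_RATE_HZ,
                      frame_ms=FRAME_MS, overlap=OVERLAP):
    """Split a signal into fixed-length frames that overlap their neighbours."""
    # 11025 Hz * 0.020 s = 220.5 samples; truncating to 220 is an assumption.
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    # With 50% overlap, each frame starts half a frame after the previous one.
    hop = int(frame_len * (1 - overlap))
    frames = []
    start = 0
    # Trailing samples that cannot fill a whole frame are dropped (assumption).
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


signal = [0.0] * (SAMPLE_RATE_HZ * DURATION_S)  # 33075 samples of silence
frames = split_into_frames(signal)
print(len(frames), len(frames[0]))  # 299 220
```

Under these assumptions the three-second recording yields 299 frames of 220 samples, with each frame sharing 110 samples with the next.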

In the Windowing step, each frame is multiplied by a Hamming window in the time domain, which reduces the discontinuity at the beginning and end of each frame.
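A minimal sketch of this step, using the standard Hamming definition w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1)). The function names are hypothetical, and the LabVIEW window VI may use slightly different coefficients or a periodic variant.

```python
import math


def hamming_window(length):
    """Return the symmetric Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def apply_window(frame):
    """Multiply a frame by the Hamming window, sample by sample."""
    window = hamming_window(len(frame))
    return [sample * weight for sample, weight in zip(frame, window)]


# A frame of ones simply takes on the window shape: small at both ends,
# largest in the middle, which is what tapers the frame edges.
windowed = apply_window([1.0] * 5)
print(windowed)
```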

In the Noise Threshold Detection step, the energy of each frame of the input speech signal is calculated and stored in an array, which is used to detect the start of the utterance within the three-second input speech signal. The energies are sorted in ascending order, and the average of the first fifteen elements gives the energy of the noise. The Peak Detector VI is then used to identify the indices of the start and end of the utterance.
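The noise estimate described above can be sketched as below. The energy definition (sum of squared samples) is an assumption, and the start and end detection uses a plain threshold comparison as a stand-in for the Peak Detector VI, whose settings the report does not give. The `factor` parameter and all function names are hypothetical.

```python
def frame_energy(frame):
    """Energy of one frame, taken here as the sum of squared samples (assumption)."""
    return sum(sample * sample for sample in frame)


def noise_energy(energies, quietest=15):
    """Average of the fifteen lowest frame energies, as described in the report."""
    if not energies:
        raise ValueError("at least one frame energy is required")
    lowest = sorted(energies)[:quietest]
    return sum(lowest) / len(lowest)


def find_utterance(energies, factor=3.0):
    """Return (start, end) frame indices whose energy exceeds factor * noise.

    This simple threshold replaces the Peak Detector VI for illustration only.
    """
    threshold = factor * noise_energy(energies)
    active = [i for i, energy in enumerate(energies) if energy > threshold]
    if not active:
        return None
    return active[0], active[-1]


# Toy input: 20 quiet frames, 5 loud frames, then 20 quiet frames.
energies = [0.01] * 20 + [1.0] * 5 + [0.01] * 20
print(find_utterance(energies))  # (20, 24)
```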

4.2 How to Operate the Project

Since the program is speaker dependent, merely executing the code was not enough. Multiple steps were needed to reconfigure the program so that it would recognize another person's voice. Because the data available in the Dictionary VI was created by Rahul himself, the system would recognize his voice but no one else's. Fortunately, Rahul provided instructions on how to store one's own words in the Dictionary VI.


Figure 10: Main VI Front Panel

The first step was to execute the main VI once, without looping. To do this, a True constant was attached to the while loop in the block diagram. The desired word is spoken during code execution. If this is done correctly, the values in Match Input should appear on the front panel of the Dictionary VI. From there, the data is copied and pasted into one of the matrices, depending on the command. For instance, if the word spoken was "Left", the data is pasted into the matrices labelled "Left".

Figure 11: Modified Case Structure


To ensure that the same data would remain in the library of matrices, the new data was set as the default by right-clicking the matrix, selecting Data Operations, and then selecting "Make Current Value Default".

When the above steps are completed properly, the indicators activate based on the spoken command.

Modifications were required to send digital outputs to the LEDs. In the case structure of the main block diagram, an array was attached to the wire of the Boolean indicator for each command, followed by an output Express VI.
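In text form, the idea of turning one recognized command into an array of digital outputs might look like the sketch below. Only the "Left" command is named in the report; the other command names, their order, and the one-LED-per-command layout are hypothetical, and the actual wiring lives in the LabVIEW case structure rather than in code like this.

```python
# Only "Left" appears in the report; the remaining names are placeholders.
COMMANDS = ["Left", "Right", "Forward", "Stop"]


def led_states(recognized_command):
    """Return one Boolean per LED: True only for the recognized command."""
    return [command == recognized_command for command in COMMANDS]


print(led_states("Left"))     # [True, False, False, False]
print(led_states("Unknown"))  # [False, False, False, False]
```

An unrecognized word leaves every output low, which mirrors the indicators staying off when no command matches.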

4.3 How to Maintain the Project

Because the program is speaker dependent, the only way to "maintain" the project is to use the same speaker, vocalizing as consistently as possible. If a different speaker wishes to use the program, the words need to be rerecorded.
