We believe that only the technical perfection of speech recognition technology can guarantee a successful speech interface implementation. MicroAsr's large-vocabulary, continuous, speaker-independent speech recognition technology allows for the creation of a truly natural man-machine interface. It is an on-device technology: MicroAsr's software does not need an internet connection to add a speech recognition interface to your device.
MicroAsr's speech recognition offers a number of unique advantages:
- High accuracy: the accuracy is over 99% in real-world conditions.
- Unrivaled speaker independence: users do not need to adapt the system to their speaking manner, dialect, or accent.
- Extensive vocabulary: over 100,000 words and synonyms allow the system to understand spontaneously spoken commands, and the real-time text-to-phonetic engine can add new words to the vocabulary.
- Short latency: a CPU of only 200 DMIPS is sufficient for real-time speech recognition. Cortex-M, Cortex-A, MIPS, Tensilica, and other cores are supported.
- Noise robustness: the system works reliably in noisy environments and inside a car.
- Excellent compatibility: the engine is suitable for most processors, operating systems, and embedded devices.
- Small footprint: the engine requires only 700 KB ROM and 256 KB RAM.
- Multilanguage: English and Russian are currently supported; other languages can be added on demand.
- Ease of use: MicroAsr's speech recognition engine has a simple API to interface with other software on the device.
MicroAsr develops customized, state-of-the-art, speaker-independent speech interfaces and licenses its speech recognition engine, which runs at the edge.
MicroAsr's speech interfaces can greatly contribute to the functionality, accessibility, and innovative appeal of many electronic products and software by making them fully interactive, easier to control, and therefore more productive and enjoyable.
MICROASR'S TEXT-TO-SPEECH
MicroAsr's phonemic synthesizer uses a database of small speech units - phonemes - and can generate any phrase from a large dictionary of more than 100,000 words. Once a language-phoneme database has been created, there is no need to customize the dictionary for each application. TTS currently supports English, Spanish, and Russian. With the MicroAsr TTS SDK, it is easy to add TTS to an embedded application: calling the PlayTTS function with the required text is enough to generate a spoken phrase.
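As a minimal illustration, assuming PlayTTS accepts a plain C string (the exact prototype is not given here and is only an assumption), generating a spoken prompt could look like this:

/* Assumed prototype: the document names PlayTTS but does not give its
   signature, so a plain C string parameter is assumed here for illustration. */
void PlayTTS(const char *text);

/* Speaks a short status message; the phrase itself is arbitrary example text. */
void AnnounceReady(void)
{
    PlayTTS("The device is ready");
}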
- Extensive vocabulary: over 100,000 words and synonyms allow the system to speak any text, and the real-time text-to-phonetic engine can add new words to the vocabulary.
- Low-cost MCU: a CPU of 100 DMIPS is sufficient for real-time text-to-speech; no FPU is required. Cortex-M, Cortex-A, MIPS, Tensilica, and other cores are supported.
- Small footprint: MicroAsr's text-to-speech requires only 2 MB flash and 256 KB RAM.
- Multilanguage: English, Spanish, and Russian are currently supported; other languages can be added on demand.
- Ease of use: MicroAsr's text-to-speech has a simple API to interface with other software on the device.
MICROASR'S ADVANCED COMPRESSION
MicroAsr's real-time compression algorithm can record over 1,500 hours of speech in 1 GB.
- Language-independent technology.
- One minute of speech takes about 10.25 KB of memory, so 1 MB can hold more than 1.5 hours of speech signal (a quick consistency check of these figures follows this list).
- Real-time compression/decompression requires a CPU with a performance of 60 DMIPS and 200 KB of memory.
- Real-time decompression alone requires a CPU with a performance of 40 DMIPS and 200 KB of memory.
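These figures are mutually consistent: at roughly 10.25 KB per minute, 1 MB holds about 100 minutes of speech and 1 GB holds about 1,700 hours, i.e. a bit rate of roughly 1.4 kbit/s. The sketch below works through the arithmetic, assuming binary units (1 MB = 1024 KB, 1 GB = 1024 MB); the exact unit convention is not stated in the original text.

#include <stdio.h>

/* Back-of-the-envelope check of the storage figures quoted above, assuming
   binary units and the stated rate of about 10.25 KB of compressed speech
   per minute. */
int main(void)
{
    const double kb_per_minute  = 10.25;
    const double minutes_per_mb = 1024.0 / kb_per_minute;                    /* ~100 minutes  */
    const double hours_per_gb   = 1024.0 * minutes_per_mb / 60.0;            /* ~1,700 hours  */
    const double kbit_per_s     = kb_per_minute * 1024.0 * 8.0 / 60.0 / 1000.0; /* ~1.4 kbit/s */

    printf("%.0f minutes per MB, %.0f hours per GB, %.2f kbit/s\n",
           minutes_per_mb, hours_per_gb, kbit_per_s);
    return 0;
}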
Source audio: [audio sample]
Unpacked audio: [audio sample]
MICROASR'S SPEECH SDK
Introduction to the MicroAsr™ Speech SDK.
The MicroAsr™ Speech SDK provides tools for developing an on-device speech interface. MicroAsr's Speech SDK enables users to operate an embedded device with speech commands, without the use of ordinary input devices (keyboard, joystick, and others). The main component of the SDK is the MicroAsr™ Speech Engine (SE), which monitors the user's speech input and, when a speech command is pronounced, recognizes it and sends the ID of the recognized command to the application.
To build a speech interface, it is only necessary to define the speech commands to be recognized along with their IDs; MicroAsr's Speech Engine does everything else.
Small footprint
MicroAsr's on-device Speech SDK is compatible with a wide range of modern MCUs. Thanks to its highly optimized algorithms, MicroAsr's SDK runs on IP cores with a performance of 200 DMIPS. The memory requirements are 700 KB ROM and 256 KB RAM. MicroAsr's SDK is available for most modern IDEs and compilers (Keil µVision, NXP LPCXpresso, Freescale CodeWarrior, Atmel Studio, IAR Embedded Workbench, CooCox CoIDE, and others).
List of supported MCUs: STMicroelectronics STM32F4, STM32F7; Microchip (Atmel) SAM S4, SAM S7, SAM V7; Microchip Technology PIC32MZ; NXP (Freescale) Kinetis KV5x, i.MX RT; Analog Devices CM400; Cypress PSoC 6200; Infineon XMC4000; Texas Instruments MSP430; Toshiba TX04; Renesas Synergy S5; Espressif ESP32 (Tensilica LX6);
and other MCUs with a computing performance of 200 DMIPS or more.
Principle of operation of the MicroAsr™ Speech Engine (SE)
The operation of the SE can be divided into two stages:
1. The application defines the operating mode of the SE and, if necessary, sends the SE the list of speech commands.
2. When the user pronounces a phrase (command), the SE determines the most probable phrase from the list of received speech commands and sends its ID to the application.
Speech Engine: Command List Creation
The command list is created by calling the AddPhrase function for each speech command.
void AddPhrase(char *Text, UINT32_t command_Id),
where:
Text is the speech command in orthographic form;
command_Id is the integer identifier of the speech command that the SE returns when the speech command is pronounced.
Commands Definition Sample
AddPhrase ("Open Window", ID_OPEN_WINDOW);
AddPhrase ("Close Window", ID_CLOSE_WINDOW);
In this example, two speech commands ("Open Window" and "Close Window") are passed to the SE with the identifiers ID_OPEN_WINDOW and ID_CLOSE_WINDOW, respectively.
MicroAsr's Speech Engine: Summary
To implement the MicroAsr voice interface, three steps are needed (a minimal sketch follows this list):
1. Initialize the MicroAsr Speech Engine.
2. Define the voice command list.
3. Define the application's reaction to each voice command.
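A minimal sketch of these three steps is shown below. Only AddPhrase and its prototype come from this document; the initialization call (SE_Init), the polling call (SE_GetRecognizedCommand), the UINT32_t typedef, and the command IDs are hypothetical placeholders used for illustration.

#include <stdint.h>

typedef uint32_t UINT32_t;          /* assumed to be a 32-bit unsigned integer */

#define ID_OPEN_WINDOW   1u         /* example command identifiers */
#define ID_CLOSE_WINDOW  2u

/* AddPhrase is documented above; SE_Init and SE_GetRecognizedCommand are
   hypothetical names standing in for the engine's real initialization and
   result-retrieval calls. */
void AddPhrase(char *Text, UINT32_t command_Id);
void SE_Init(void);
int SE_GetRecognizedCommand(UINT32_t *command_Id);

int main(void)
{
    /* 1. Initialize the MicroAsr Speech Engine. */
    SE_Init();

    /* 2. Define the voice command list. */
    AddPhrase("Open Window", ID_OPEN_WINDOW);
    AddPhrase("Close Window", ID_CLOSE_WINDOW);

    /* 3. React to recognized commands. */
    for (;;) {
        UINT32_t id;
        if (SE_GetRecognizedCommand(&id)) {
            switch (id) {
            case ID_OPEN_WINDOW:
                /* application-specific action, e.g. open the window */
                break;
            case ID_CLOSE_WINDOW:
                /* application-specific action, e.g. close the window */
                break;
            default:
                break;
            }
        }
    }
}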
APPLICATIONS OF MICROASR'S SPEECH INTERFACES
Consumer electronics (entertainment, audio systems, set-top boxes, interactive TV, TV / Internet terminals, etc.)
Home appliances
Automated home control (for HVAC, lighting, alarm, etc.)
Automotive (phone dialing, car navigation, telematics, controlling onboard devices)
Vending machines
Elevators
Test and measurement equipment
Software (Web browsers, Web portals; e-mail and voice mail programs; IP telephony; graphics software, CAD; electronic games, entertainment; etc.)
Information desks / kiosks
Other cases where replacing manual input, such as buttons, switches, keys, a mouse, or a touchscreen, is desirable.