We believe that only technically mature speech recognition can guarantee the success of a speech interface. MicroAsr's large-vocabulary, continuous, speaker-independent speech recognition technology allows for the creation of a truly natural man-machine interface. It is an on-device technology: MicroAsr's software needs no internet connection to add a speech recognition interface to your device.

MicroAsr's speech recognition offers a number of unique advantages:

  • High accuracy

    Accuracy exceeds 99% in real-world conditions.

  • Unrivaled speaker-independence

    Users do not need to adapt the system to their speaking manner, dialect, or accent.

  • Extensive vocabulary

    The vocabulary includes over 100,000 words and synonyms, which allows the system to understand spontaneously spoken commands.
    The real-time text-to-phonetic engine can add new words to the vocabulary.

  • Short latency

    A CPU of only 200 DMIPS is sufficient for real-time speech recognition. Cortex-M, Cortex-A, MIPS, Tensilica, and other cores are supported.

  • Noise robustness

    The system works reliably in noisy environments and inside a car.

  • Excellent compatibility

    The engine is suitable for most processors, operating systems, and embedded devices.

  • Small footprint

    The engine requires only 700 KB of ROM and 256 KB of RAM.

  • Multilanguage

    English and Russian are currently supported. Other languages can be supported on demand.

  • Ease of use
    MicroAsr's speech recognition engine has a simple API to interface with other software on the device.

MicroAsr develops customized, state-of-the-art, speaker-independent speech interfaces and licenses its speech recognition engine for use at the edge.

MicroAsr's speech interfaces can greatly contribute to the functionality, accessibility, and innovative appeal of many electronic products and software by making them fully interactive, easier to control, and therefore more productive and enjoyable.



MICROASR'S TEXT-TO-SPEECH


MicroAsr's phonemic synthesizer uses a database of small speech units - phonemes - and can generate any phrase from a dictionary of more than 100,000 words. Once a language's phoneme database has been created, there is no need to customize the dictionary for each application. TTS currently supports English, Spanish, and Russian. With the MicroAsr TTS SDK, it is easy to add TTS to an embedded application: calling the PlayTTS function with the required text is enough to generate a spoken phrase.

  • Extensive vocabulary

    The vocabulary includes over 100,000 words and synonyms, which allows the system to speak any text.
    The real-time text-to-phonetic engine can add new words to the vocabulary.

  • Low-cost MCU

    A CPU of 100 DMIPS is sufficient for real-time text-to-speech; no FPU is required. Cortex-M, Cortex-A, MIPS, Tensilica, and other cores are supported.

  • Small footprint

    MicroAsr's text-to-speech requires only 2 MB of flash and 256 KB of RAM.

  • Multilanguage

    English, Spanish, and Russian are currently supported. Other languages can be supported on demand.

  • Ease of use
    MicroAsr's text-to-speech has a simple API to interface with other software on the device.


MICROASR'S ADVANCED COMPRESSION


MicroAsr's real-time compression algorithm can record over 1,500 hours of speech in 1 GB.

  • Language-independent technology.

  • One minute of speech takes about 10.25 KB of memory.
    In 1 MB it is possible to record more than 1.5 hours of speech signal.

  • In real-time mode, the compression/decompression algorithm requires a CPU with a performance of 60 DMIPS and 200 KB of memory.

  • Decompression alone in real-time mode requires a CPU with a performance of 40 DMIPS and 200 KB of memory.



MICROASR'S SPEECH SDK


Introduction to the MicroAsr™ Speech SDK

The MicroAsr™ Speech SDK provides tools for developing an on-device speech interface. It lets users operate an embedded device with speech commands instead of ordinary input devices (keyboard, joystick, and others). The main component of the SDK is the MicroAsr™ Speech Engine (SE), which monitors the user's speech input and, when a speech command is pronounced, recognizes it and sends the ID of the recognized command to the application.

To build the speech interface, it is only necessary to define the speech commands to be recognized along with their IDs; MicroAsr's Speech Engine does everything else.

Small footprint

MicroAsr's on-device Speech SDK is compatible with a wide range of modern MCUs. Thanks to its highly optimized algorithms, the SDK runs on IP cores with a performance of 200 DMIPS and requires 700 KB of ROM and 256 KB of RAM. It is available for most modern IDEs and compilers (Keil µVision, NXP LPCXpresso, Freescale CodeWarrior, Atmel Studio, IAR Embedded Workbench, CooCox CoIDE, and others).

List of supported MCUs: STM32: STM32F4, STM32F7; Microchip (Atmel) SAM S4, SAM S7, V7; Microchip Technology PIC32MZ; NXP (Freescale) Kinetis KV5x, i.MX RT; Analog Devices CM400; Cypress PSoC 6200; Infineon XMC4000; Texas Instruments MSP430; Toshiba TX04; Renesas Synergy S5; Espressif ESP32 (Tensilica LX6);

and other MCUs with a computing performance of 200 DMIPS or more.


Principle of operation of the MicroAsr™ Speech Engine (SE)

The operation of the SE can be divided into two stages:

1. The application sets the operating mode of the SE and, if necessary, sends the SE the list of speech commands.

2. When the user pronounces a phrase (command), the SE determines the most probable phrase from the list of received speech commands and sends its ID to the application.

Speech Engine: Commands List Creation

The list is created by calling the AddPhrase function for each speech command:

void AddPhrase(char *Text, UINT32_t command_Id),

where:

Text is a speech command in orthographic form;

command_Id is the integer identifier of the speech command that the SE will return when the speech command is pronounced.

Commands Definition Sample

AddPhrase("Open Window", ID_OPEN_WINDOW);

AddPhrase("Close Window", ID_CLOSE_WINDOW);

In this example, two speech commands ("Open Window" and "Close Window") are passed to the SE, with the identifiers ID_OPEN_WINDOW and ID_CLOSE_WINDOW respectively.

MicroAsr's Speech Engine: Summary

To implement the MicroAsr voice interface, one has to take three steps:

1. Initialize the MicroAsr Speech Engine.

2. Define the voice command list.

3. Define the application's reaction to each voice command.


APPLICATIONS OF MICROASR'S SPEECH INTERFACES


Consumer electronics (entertainment, audio systems, set-top boxes, interactive TV, TV / Internet terminals, etc.)

Home appliances

Automated home control (for HVAC, lighting, alarm, etc.)

Automotive (phone dialing, car navigation, telematics, controlling onboard devices)

Vending machines

Elevators

Test and measurement equipment

Software (Web browsers, Web portals; e-mail / voice mail programs; IP telephony; graphic software, CAD; electronic games, entertainment; etc.)

Information desks / kiosks

Other cases where replacing manual input, such as buttons, switches, keys, mouse, or touchscreen, is desirable.