Questions tagged [speech-recognition]

0

votes
0

answer
2

Views

How to read contents of an SPF?

Microsoft publishes the Speech Recognition Profile Manager Tool for importing and exporting Windows Speech Recognition profiles as well as the contents of the Speech Dictionary. During export the tool outputs an SPF (speech profile file?) that contains the relevant data, and the tool can later read...
Exergist
1

votes
1

answer
2.7k

Views

How to plot pyaudio input with matplotlib?

How can I plot the input signal from a microphone with matplotlib? I have tried plt.plot(frames), but frames is for some reason a string. a) Why is the frames variable a string list? b) Why is the data variable a string list? c) Should they represent the energy/amplitude of a single sample and be integers? d) W...
Dusan J.
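On the question above: PyAudio's `stream.read()` returns raw bytes (a `str` in Python 2), so each chunk has to be decoded into integer samples before it can be plotted. A minimal sketch, assuming 16-bit little-endian mono audio (`paInt16`). The `chunk` bytes are synthesized with `struct.pack` to stand in for a real `stream.read()` call, and `decode_chunk` is a hypothetical helper name, not part of PyAudio:

```python
import struct

def decode_chunk(chunk):
    """Decode raw 16-bit little-endian PCM bytes into a list of ints."""
    n_samples = len(chunk) // 2  # two bytes per 16-bit sample
    return list(struct.unpack("<%dh" % n_samples, chunk[:n_samples * 2]))

# Stand-in for `data = stream.read(CHUNK)`: four known samples packed as bytes.
chunk = struct.pack("<4h", 0, 1000, -1000, 32767)
samples = decode_chunk(chunk)
print(samples)
```

The decoded list (or the concatenation of the lists from all chunks) can then be handed to `plt.plot`.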
1

votes
1

answer
849

Views

Voice Activity Detection (VAD/SAR) with LIUM

I wrote a shell script to train several GMMs for some kinds of voice activity and silence, using the LIUM speaker diarization toolkit. I want to use this to do voice activity detection. The following script extracts MFCC features from a WAV audio file by using Sphinx4, trains GMMs on these a...
Gerhard Hagerer
1

votes
1

answer
309

Views

What should be the output after extracting features from an audio signal using DWT (Discrete Wavelet Transform) in MATLAB?

I am working on a speech recognition system (I'm following a research paper). After denoising the signals, I want to extract features from audio signals that are in the form of arrays in MATLAB. Please correct me if I'm wrong, but I think that the size of the features array (after performing decompositio...
Mughees Ismail
1

votes
2

answer
481

Views

How to make sound signal length the same in MATLAB?

I found this speech recognition code that I downloaded from a blog. It works fine, it asks to record sounds to create a dataset and then you have to call a function to train the system using neural networks. I want to use this code to train using my dataset of 20 words that I want to recognise. Prob...
Mughees Ismail
1

votes
1

answer
1.5k

Views

web speech api - speech synthesis .lang property not working

I'm trying to use the Web Speech API to transcribe a word in Portuguese. I set the property to 'pt-BR' (unfortunately European Portuguese is not supported), but it always replies in English. Can someone help? Thanks. Code: var synth = window.speechSynthesis; function falatarea(){ var utteranceY = new Spee...
Bruno
1

votes
2

answer
1.5k

Views

How to use sphinx dictionary as a grammar file

I am developing a Java application for speech-to-text conversion. I have used the Sphinx library, and the demo helloworld works fine. I edited the grammar file and appended more grammar to it, and it works fine. Now what I want is for it to accept all the input words that exist in a real-world dictionary so...
NIket
1

votes
1

answer
507

Views

Speech Recognition feedback loop

I am working on a speech recognition system to talk with my computer. My computer's audio output is set to a surround sound system, which has caused problems for the recognition system. For example, when I say 'test' to see if it's online, the system responds with 'test complete'. The microphone...
TS777
1

votes
1

answer
732

Views

Using Microsoft Project Oxford APIs in Java

I already looked it up, and I found that I can load the API through a .dll file. But I keep thinking I'm overcomplicating everything... Is there an easier way to do this? Thanks in advance.
0gener
1

votes
1

answer
568

Views

Running speech_recognition on tkinter makes it freeze

I'm writing a program where you can talk to it and it responds like Siri. I'm using Google's speech recognition and espeak to listen and talk. The conversation is then printed to a text box. Whenever the program is asked to listen using speech recognition it freezes, and clicking on the GUI again the...
joe woodger
1

votes
1

answer
495

Views

Voice Command API for Android

I just got a new smart device that runs a custom Android build without Google APIs, the PlayStore, etc. My app that I'm developing for this device should feature voice commands as an input method. Since I cannot use RecognizerIntent.ACTION_RECOGNIZE_SPEECH, I am looking for an alternative SDK. This...
1

votes
1

answer
547

Views

Audio Encoding conversion problems with PCM 32-bit to PCM 16-bit

I am using C# in a Universal Windows App to write a Watson Speech-to-text service. For now, instead of using the Watson service, I write to a file and then read it in Audacity to confirm it is in the right format, since the Watson service wasn't returning correct responses to me, and the following exp...
Dima Rudeshko
1

votes
1

answer
1.9k

Views

How does Google Speech to Text work?

I would like to know how Google converts speech to text in their Speech Recognition API. Have they stored almost all sounds and match them at a particular frequency level, or do they have some different audio encoder and decoder algorithm which analyses the voice for different sound patterns like 'A',...
John Cargo
1

votes
1

answer
960

Views

Segment Timestamps in pocketsphinx

I am trying to extract the start and end timestamps of each segment using pocketsphinx. The code below works for extracting the word token. How can I access the timestamps? I've tried looking at the documentation here http://cmusphinx.sourceforge.net/doc/pocketsphinx/index.html but could not find th...
Adam_G
1

votes
1

answer
331

Views

How to implement activation phrase like “Hey Cortana” in SpeechRecognizer?

In the SpeechAndTTS samples in Universal Windows demo apps (link), even the continuous dictation examples require the user to click on a button to start the recognizer. So my question is how can we implement an always listening SpeechRecognizer? Activated when hearing something like 'Hey Cortana' o...
Blaise
1

votes
1

answer
1.1k

Views

MFCC mean normalisation

Related to: 'Are MFCC features required for speech recognition'. Can the mean normalisation be reduced to simple mean subtraction of all the (n,13) MFCCs and be used to train the data? np.subtract(mfcc_feat,np.mean(mfcc_feat))
Ugur
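A note on the call quoted in the question above: `np.mean` without an `axis` argument returns a single scalar over the whole array, so that call subtracts one global value from every coefficient. Cepstral mean normalisation is more commonly done per coefficient, taking the mean over frames. A small sketch comparing the two, assuming frames are rows and coefficients are columns; the 4x3 matrix is made up for illustration (real data would be (n, 13)):

```python
import numpy as np

# Hypothetical MFCC matrix: 4 frames x 3 coefficients.
mfcc_feat = np.array([[1.0, 10.0, 100.0],
                      [2.0, 20.0, 200.0],
                      [3.0, 30.0, 300.0],
                      [4.0, 40.0, 400.0]])

# Global mean: one scalar subtracted everywhere (what the quoted call does).
global_norm = np.subtract(mfcc_feat, np.mean(mfcc_feat))

# Per-coefficient mean over frames (axis=0), broadcast across rows.
cmn = mfcc_feat - np.mean(mfcc_feat, axis=0)

print(cmn.mean(axis=0))          # every column is now zero-mean
print(global_norm.mean(axis=0))  # columns still have different means
```

Which variant is appropriate depends on the training setup; the per-coefficient form is the one that removes a constant offset from each cepstral dimension.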
1

votes
1

answer
707

Views

SpeechRecognizer stopListening() not working

I currently have a speech recognizer listening to voice, it is activated via a button which toggles startListening() and stopListening() respectively. My problem is the fact that stopListening() does not actually stop the speechRecognizer after the first time. When using the button to activate and...
onemandan
1

votes
1

answer
701

Views

Android - Speech Recognition Offline

I am trying to develop an application with some commands & user inputs. Google now provides the extra parameter EXTRA_PREFER_OFFLINE for API 23+ to always use speech recognition in offline mode. I have checked several answers, and there is no specification of how to use the EXTRA_PREFER_OFFLINE constant below API...
Gaurav Vachhani
1

votes
3

answer
675

Views

Is there a difference between Microsoft Speech Platform Runtime 11 and Cortana's speech recognition?

I'm currently doing a research project on speech recognition systems, and I'd like to know if Microsoft Speech Runtime 11 and Cortana are different. Cortana obviously is an assistant, but is the recognition of the voice the same for both systems? I know Cortana doesn't work on W7 and Speech Runtime...
peter
1

votes
1

answer
2.6k

Views

Installing Python SpeechRecognition package [duplicate]

This question already has an answer here: ERROR:'keytool' is not recognized as an internal or external command, operable program or batch file 16 answers 'pip' is not recognized as an internal or external command 24 answers Can someone please advise me how to download and install speech_recognitio...
Blue Berry
1

votes
2

answer
2.4k

Views

Ionic 2 - Speech Recognition [closed]

Is there a way to use speech recognition in an Ionic 2 project? All I came across are possibilities for Ionic 1, like in this post: Speech recognition using ionic framework. Ionic 2 already provides a native API for Text to Speech http://ionicframework.com/docs/v2/native/texttospeech/ but I would ne...
Kniggos
1

votes
2

answer
721

Views

Detect audio from the user and convert to text to command AI bots in Unity

I am making a game where I want to command the AI using words I speak. For example, I can say 'go' and the AI bot moves a certain distance. I am looking for an asset, but no provider gives me a guarantee that this is possible. What are the difficulties in doing it? I am a programmer, so if some one su...
mayur bhagat
1

votes
3

answer
85

Views

How to concatenate the text from ctl file vertically to horizontally and then save in a new ctl file using python?

I have a mlt.ctl file in which the text is arranged like this: znrmi_001/znrmi_001_001 znrmi_001/znrmi_001_002 znrmi_001/znrmi_001_003 zntoy_001/zntoy_001_001 zntoy_001/zntoy_001_002 zntoy_001/zntoy_001_003 zntoy_001/zntoy_001_004 ....................... zntoy_001/zntoy_001_160 .......................
Andy
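For the question above, a minimal sketch of one way to do this, assuming "horizontally" means all entries on a single space-separated line. The question is truncated, so the exact target layout is an assumption, and `join_ctl_lines` and the output file name are hypothetical:

```python
import os
import tempfile

def join_ctl_lines(src_path, dst_path, sep=" "):
    """Read one entry per line from src_path and write them all on one line."""
    with open(src_path) as src:
        entries = [line.strip() for line in src if line.strip()]
    with open(dst_path, "w") as dst:
        dst.write(sep.join(entries) + "\n")
    return entries

# Demo on a temporary file holding three of the entries quoted in the question.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "mlt.ctl")
dst_path = os.path.join(workdir, "mlt_joined.ctl")
with open(src_path, "w") as f:
    f.write("znrmi_001/znrmi_001_001\n"
            "znrmi_001/znrmi_001_002\n"
            "znrmi_001/znrmi_001_003\n")

join_ctl_lines(src_path, dst_path)
with open(dst_path) as f:
    joined = f.read()
print(joined)
```

Grouping entries by their directory prefix (for example one output line per `znrmi_001/`) would need an extra step that this sketch does not attempt.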
1

votes
1

answer
146

Views

Unable to extract delta and delta delta power spectrum computation

I am currently trying to extract the delta + delta-delta using the add-deltas binary provided by Kaldi, but for some reason I am not able to extract it. I usually extract the power spectrum using the make_spectrum.sh script. I modified it a bit to also include deltas, but the output doesn't seem to be an...
Loser
1

votes
1

answer
985

Views

Google speech recognition API not listening

I was trying the below speech recognition code using Google Speech API. #!/usr/bin/env python3 # Requires PyAudio and PySpeech. import speech_recognition as sr # Record Audio r = sr.Recognizer() with sr.Microphone() as source: print('Say something!') audio = r.listen(source) # Speech recognition usi...
jophab
1

votes
1

answer
834

Views

Stop recognition in Swift3 without stop word

I'm attempting to simplify usage of the Speech framework in a textview. I can easily start the speech recognition process with code based on entering the textview, startup code or other actions. However, I also want to END the speech recognition without user touches. I have not been able to find any...
user2698617
1

votes
1

answer
403

Views

How to apply MFCC Coefficients to DTW

I am trying to implement a speech recognition module using Mel Frequency Cepstral Coefficients (MFCC) and Dynamic Time Warping (DTW). I divide the signal x(n) into frames of 25 ms with an overlap of 10 ms and find the MFCC parameters for each frame. My main doubt is how do I perform DTW in this sce...
Debayan Ghosh
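One common way to combine the two in the question above is to treat each utterance as a sequence of MFCC vectors, one per frame, and use a vector distance between frames as the local cost inside DTW. A plain-Python sketch, assuming Euclidean frame distance and the basic unconstrained step pattern. The function names are hypothetical and the toy 2-coefficient "MFCC" frames are made up:

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of frames."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: minimal accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in seq_a only
                                 cost[i][j - 1],      # advance in seq_b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]

# Toy sequences: real utterances would have e.g. 13 coefficients per frame.
template = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
stretched = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
different = [[5.0, 5.0], [6.0, 6.0], [7.0, 7.0]]

print(dtw_distance(template, stretched))
print(dtw_distance(template, different))
```

For recognition, the unknown utterance would be compared against one stored template per word and assigned to the word with the lowest cost. Path-length normalisation and window constraints are left out of this sketch.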
1

votes
1

answer
2.4k

Views

Speech recognition error in Swift 3 & iOS 10

I'm using an iPhone 6s plus, here is the code for the speech recognition viewcontroller: import Speech import UIKit protocol SpeechRecognitionDelegate: class { func speechRecognitionComplete(query: String?) func speechRecognitionCancelled() } class SpeechRecognitionViewController: UIViewController,...
ielyamani
1

votes
1

answer
303

Views

Is there a way to return entire dictionary entry (word + phoneme) in pocketsphinx python?

Here is my code: #!/usr/bin/env python import os import sphinxbase as sb import pocketsphinx as ps MODELDIR = 'deps/pocketsphinx/model' DATADIR = 'deps/pocketsphinx/test/data' # Create a decoder with certain model config = ps.Decoder.default_config() config.set_string('-hmm', os.path.join(MODELDIR,...
P.V.
1

votes
1

answer
1.1k

Views

Spectrograms generated using Librosa don't look consistent with Kaldi?

I generated a spectrogram of a 'seven' utterance using the 'egs/tidigits' code from Kaldi, using 23 bins, 20kHz sampling rate, 25ms window, and 10ms shift. The spectrogram appears as below, visualized via the MATLAB imagesc function: I am experimenting with using Librosa as an alternative to Kaldi. I set up my...
kashkar
1

votes
1

answer
392

Views

How to get the timestamp of when a word was said using Sphinx

I am currently trying to get the timestamp of a word which has been detected using CMU Sphinx. while ((result = recognizer.getResult()) != null) { for(WordResult w : result.getWords()){ if(w.getWord() != Word.UNKNOWN){ System.out.println(w.getTimeFrame().getStart()); System.out.println(w.getWord() +...
Elijah
1

votes
1

answer
263

Views

HTK - What do MFCCs of an HMM model and Input WAV File represent?

While creating MFCCs following Voxforge's tutorial for a Speech to Text System using HTK (Hidden Markov Model Tool Kit), we are required to define a prototype model for our phones. I am trying to wrap my head around this file. ~o 25 ~h 'proto' 5 2 25 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0...
Ajay H
1

votes
2

answer
408

Views

Cordova speech recognition plugin “Invalid action”

I'm trying to use this plugin: https://github.com/pbakondy/cordova-plugin-speechrecognition All set-up ok, but when I call isRecognitionAvailable(), I'm getting a promise rejection with the error msg: 'Invalid action'. I've checked all the permissions and minSDKVersion are correct in the generated...
Dave
1

votes
1

answer
1.9k

Views

Continuous speech recognition in android app without google popup

I am working on a 'Home Automation' project with an Android app and a microcontroller, connected through a Bluetooth module. I have incorporated 'speech to text' for voice commands, and it works well with the built-in Google speech recognition API. All I need is a continuous speech...
Ohidur Rahman Bappy
1

votes
1

answer
237

Views

Pocketsphinx cuts words

I have a problem. When I run pocketsphinx from the console, everything works fine: [email protected]:~ $ pocketsphinx_continuous -hmm /usr/local/share/pocketsphinx/model/en-us/en-us -lm 6764.lm -dict 6764.dic -samprate 16000 -inmic yes -adcdev plughw:0,0 However, when I try to run it from a Python script, pocketsphi...
marmite
1

votes
1

answer
273

Views

Should I use android Speech Recognizer in my App for data collection?

I don't want my app to control and execute tasks on the user's phone, like setting an alarm or calling someone. I have found many tutorials that focus on accomplishing this. I want to collect data using the speech recognizer API that goes to an online server, and later can be requested by multiple us...
Devashish Jaiswal
1

votes
1

answer
296

Views

Unable to parse Podfile.lock error on Pod install/update

Not able to update/ install pods. Error is 'Unable to parse Podfile.lock file'. Last pod installed was: pod 'googleapis', :path => '.' After that the issue started coming. Error Log [!] ERROR: Parsing unable to continue due to parsing error: contained in the file located at /Users/ios/Documents/Proj...
Adarsh Roy Choudhary
1

votes
1

answer
41

Views

Process returned -11 (0x-b) after clicking “Speak” in pocketsphinx UI example

I'm using an exact copy of the code from here. Every time I run the code, the UI appears and things seem to be doing what they should. Except when I click the 'Speak' button, it closes and the terminal prints out the following: Process returned -11 (0x-b) No errors appear. It closes as if it was mea...
Vindicium
1

votes
1

answer
241

Views

Google SpeechML API doesn't work well with noisy audio

I have been trying to develop a Python script to transcribe audio from noisy audio files. My specific use case is to get noisy audio parts transcribed correctly. When I send the files to the SpeechML API for processing, the responses have either omitted or incorrect responses for noisy audio. Is ther...
1

votes
1

answer
781

Views

How to check whether Speech Recognition is Available or not?

When I am initializing a speech recognition app, I use this line of code: Boolean b=SpeechRecognizer.isRecognitionAvailable(cContext); Why does b always equal false on some devices (the emulator, for example)? I understand what the function does from its description in the Android documentation, but the...
Josh
