Siri, Amazon Echo and “S Voice” are all voice recognition programs designed to make life easier. You can also easily build a voice control for the Raspberry Pi yourself, and speech recognition like this is of course very useful for home automation.
In this tutorial, I’ll show you how to digitize speech through a microphone, convert it into text, and then respond to it.
The voice recognition feature can already be tested here (in Chrome).
Hardware for the Raspberry Pi voice control
- Raspberry Pi Model B (requires two USB ports or one USB port and one internet connection)
- USB sound card
- a microphone
Alternatively, you can use a USB microphone. I have not tested this, but it should work.
If you want to control the GPIOs by voice input, for example, a breadboard and jumper cables are helpful for connecting components to the Raspberry Pi.
Preparation
Google provides the speech recognition service. To use the API, you need to join this group with your Google Account.
Next, open the Developer Console and click “Create Project”. Once the project has been created, click “APIs & auth” -> “APIs” on the left and search for “Speech API”. Activate it for your project, then click “Credentials” on the left. There you have to create a new public key (browser key).
Insert the newly created API key into the script.
Note that 50 requests per day are free. If you need more, you can either buy additional requests from Google or create a second project and get another key 😉
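Because only 50 requests per day are free, it can help to count requests locally before sending one. The following is a minimal sketch, not part of the original script: the function name `count_request`, the state file path in `COUNTER_FILE`, and the variable `DAILY_LIMIT` are all assumptions made for illustration.

```shell
# Hypothetical helper: keeps "<date> <count>" in a small state file.
COUNTER_FILE="${COUNTER_FILE:-$HOME/.speech_requests}"
DAILY_LIMIT=50

# Prints the new request count for today.
# Returns 1 without counting if the daily limit is already reached.
count_request() {
  today=$(date +%Y-%m-%d)
  saved_day=""
  count=0
  if [ -f "$COUNTER_FILE" ]; then
    read -r saved_day count < "$COUNTER_FILE" || true
  fi
  # A new day starts again at zero.
  [ "$saved_day" = "$today" ] || count=0
  if [ "${count:-0}" -ge "$DAILY_LIMIT" ]; then
    echo "Daily limit of $DAILY_LIMIT requests reached" >&2
    return 1
  fi
  count=$(( ${count:-0} + 1 ))
  echo "$today $count" > "$COUNTER_FILE"
  echo "$count"
}
```

One way to use it would be to call `count_request > /dev/null || exit 1` directly before the request is sent.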
Raspberry Pi voice control – software
The principle is as follows: an audio file is recorded, sent to Google, and returned as text. So let’s start:
sudo apt-get update
sudo apt-get install flac
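The steps that follow rely on several command line tools (`arecord`, `aplay`, `flac` and `wget`). A small sketch for checking that they are installed is shown here; `missing_tools` is a hypothetical helper name, not a standard command.

```shell
# Hypothetical helper: prints every given command name
# that cannot be found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" > /dev/null 2>&1 || echo "$tool"
  done
}
```

Calling `missing_tools arecord aplay flac wget` prints nothing if all four tools are available.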
Now we check if the USB card has been detected correctly:
lsusb
An entry like this should appear:
Bus 001 Device 004: ID 1130:f211 Tenx Technology, Inc. TP6911 Audio Headset
Now we list the recording devices:
arecord -l
For me, the output looks like this. The card number (here 1) and the device number (here 0) are important, because they are used later as plughw:1,0:
**** List of CAPTURE Hardware Devices ****
card 1: AUDIO [USB AUDIO], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
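If you do not want to read the numbers off by hand, the device string can be derived from this listing. This is only a sketch under the assumption that the output has the format shown above; `capture_device` is a made-up helper name, and it simply takes the first capture device it finds.

```shell
# Hypothetical helper: reads `arecord -l` output on stdin and prints
# the first capture device in the form plughw:<card>,<device>.
capture_device() {
  sed -n 's/^card \([0-9][0-9]*\):.*device \([0-9][0-9]*\):.*/plughw:\1,\2/p' | head -n 1
}
```

It would be used as `arecord -l | capture_device`, and the result can then be passed to `arecord -D`.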
At this point, I recommend testing the microphone. In my case the microphone was muted (which can be reversed with amixer -c 1). So we record a short test file and play it back:
arecord -d 10 -f cd -t wav -D plughw:1,0 test.wav
aplay -f dat test.wav
If you hear your recording, everything works and we can continue.
We create a file that sends and evaluates the request.
nano speech2text.sh
The file has the following content. At the top, you have to enter your API key.
#!/bin/bash

# Replace DEIN_KEY with your own API key
KEY="DEIN_KEY"
URL="https://www.google.com/speech-api/v2/recognize?output=json&lang=de-de&key=$KEY"

echo "Recording... To stop, press CTRL + C and wait."

# Record from card 1, device 0 until CTRL+C and encode the audio as FLAC
arecord -D plughw:1,0 -f cd -t wav -d 0 -q -r 44100 | flac - -s -f --best --sample-rate 44100 -o file.flac

echo ""
echo "Execute..."

# Send the FLAC file to Google and store the JSON response
wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=44100" -O - "$URL" > stt.txt

echo -n "Google answer: "
# Extract the first transcript from the response
OUTPUT=$(sed -e 's/[{}]//g' stt.txt | awk -F":" '{print $4}' | awk -F"," '{print $1}' | tr -d '\n')

echo "$OUTPUT"
echo ""

rm file.flac > /dev/null 2>&1

# Prints the position of $2 within $1, or -1 if it is not contained
strindex() {
  x="${1%%"$2"*}"
  [[ "$x" = "$1" ]] && echo -1 || echo ${#x}
}

# Convert to lowercase so that case is ignored.
# If case matters, comment out the next line
OUTPUT=$(echo "$OUTPUT" | tr '[:upper:]' '[:lower:]')

# The search string must be lowercase
# (otherwise comment out the previous command).
# The phrases are German ("light on" / "light off") to match lang=de-de
if (($(strindex "$OUTPUT" "licht an") != -1)); then
  # Run commands, start scripts, etc.
  echo "Light will switch on"
fi
if (($(strindex "$OUTPUT" "licht aus") != -1)); then
  echo "Light will switch off"
fi
Afterwards, the script has to be made executable.
chmod +x speech2text.sh
Now you can start it and speak.
./speech2text.sh
I’ve included two example queries in the lower part of the script that can be used to respond to speech input. For example, another script or command could be executed. What you use this for is up to you.
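When the number of phrases grows, a `case` statement can be easier to extend than a chain of `if` blocks. The following is a sketch of that alternative, not the method used in the script: `handle_command`, `light_on` and `light_off` are hypothetical names, and the two action functions only print a message where your own command would go.

```shell
# Placeholder actions: replace the bodies with your own commands or scripts.
light_on()  { echo "Light will switch on"; }
light_off() { echo "Light will switch off"; }

# Hypothetical dispatcher: matches the recognized text against known
# phrases and runs the first action that fits. Returns 1 if nothing matches.
handle_command() {
  # Lowercase the input so that matching ignores case.
  cmd_text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$cmd_text" in
    *"licht an"*)  light_on ;;
    *"licht aus"*) light_off ;;
    *)             return 1 ;;
  esac
}
```

Inside the script it could be called as `handle_command "$OUTPUT"`.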
For completeness, here is my test:
pi@raspberrypi ~ $ ./speech2text.sh
Recording... To stop, press CTRL + C and wait.
^C
Execute...
Google answer: "turn on the light"
Light will switch on
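The quoted answer in this test comes from the sed/awk pipeline in the script. To make it easier to follow, here is the same pipeline wrapped in a function and paired with a sample response. The JSON in `SAMPLE` is an assumed response shape written for illustration, not output captured from the service, and `extract_transcript` is a hypothetical name.

```shell
# Same pipeline as in the script: strip the braces, take the fourth
# colon-separated field, then keep the part before the first comma.
extract_transcript() {
  sed -e 's/[{}]//g' | awk -F":" '{print $4}' | awk -F"," '{print $1}' | tr -d '\n'
}

# Assumed sample response (two JSON lines, the first with an empty result).
SAMPLE='{"result":[]}
{"result":[{"alternative":[{"transcript":"turn on the light","confidence":0.9}],"final":true}],"result_index":0}'
```

Running `printf '%s\n' "$SAMPLE" | extract_transcript` prints `"turn on the light"`, including the quotation marks. Because the pipeline relies on field positions, it only works as long as the response keeps this layout and the transcript itself contains no colon or comma.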
If there is interest, I could also create and post a version that uses a smartphone.
5 Comments
Hi!
Thank you for posting this interesting article! Is it possible to do this using Python and Google Assistant, since it’s available everywhere? And can only one device be connected to a Raspberry Pi, or can multiple devices (fans, lights) be connected to a single Raspberry Pi?
Hi,
I am new at this but could you please tell me where I have to paste my API key?
Do I replace the “DEIN_KEY” in the script and have the ” ” around my API key or do I paste it somewhere else? Thanks in advance.
“DEIN” is German for “your”, so DEIN_KEY is YOUR_KEY
How do you manually rotate the proxy?
This doesn’t actually seem to go over how you would build a Raspberry Pi with voice control. It just tells you how you can transcribe speech to text with your Raspberry Pi. If you want to voice control your raspberry pi you wouldn’t say info it and run some scripts for it to start listening just so once (or if) the text contains a certain word then it can run another script.