Drone control through a vocal interface

Sep 5, 2022
Matthias Pirlet
Abstract
Today the use of drones is widespread across many tasks, but for some of these, such as firefighting, it is vital that the operator’s hands remain free to do the job properly. Fortunately, speech recognition is a rapidly advancing discipline within deep learning. This work therefore focuses on finding a deep learning model that recognises spoken commands, drawn from a pre-defined set, to control a drone. The first part of the work was to build a training dataset of commands by combining complete commands generated with text-to-speech APIs and hand-crafted commands. The latter were created by concatenating words from an open-source spoken-words dataset with a further set of spoken words acquired through a web platform built to fill in the missing words of the command vocabulary. The test set was acquired by asking people to record complete commands under real conditions. The second part of the work focuses on the different models that could be developed and the techniques that can be applied. These are presented as an ablation study aimed at improving the results on a test set recorded in real conditions. Several methods were applied to achieve the final goal: the first uses computer vision models whose input is a simple spectrogram of the different commands. The results with these models were not as good as those of newer models that take the raw waveform directly as input and combine vision, attention and self-supervised learning. The best version of this model obtains an F1-score of 0.9973 on a real-conditions dataset.
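As a minimal illustration of the spectrogram input mentioned above, the sketch below turns a raw waveform into a magnitude spectrogram image via a windowed short-time Fourier transform. The frame length, hop size, and window choice here are assumptions for the example, not the parameters used in the thesis.

```python
import numpy as np

def spectrogram(waveform, frame_len=256, hop=128):
    """Magnitude spectrogram of a 1-D signal (hypothetical parameters).

    A vision model would consume the result as a 2-D image:
    one frequency axis, one time axis.
    """
    # Slice the signal into overlapping frames and apply a Hann window
    frames = [
        waveform[i:i + frame_len] * np.hanning(frame_len)
        for i in range(0, len(waveform) - frame_len + 1, hop)
    ]
    # One magnitude spectrum per frame; transpose to (freq_bins, time)
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

# Example: a 1-second 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (frame_len // 2 + 1, number of frames)
```

The newer models in the ablation study skip this step entirely and learn their features straight from the raw waveform.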
Type
Publication
Master’s thesis, University of Liège