
SpeechNet

While working as a caregiver, I lived with a woman who communicated through unique vocal signs. This project uses a convolutional neural network (CNN) to translate several of her signs into English text. I built the CNN with Python and Keras and created a user interface with Tkinter.

One of the major challenges was recording enough audio samples to train the network. I didn't want to spend months recording thousands of samples, so I needed a way to augment a small dataset into a larger one. The usual augmentation methods (changing pitch, adding white noise, etc.) weren't working, so it was time to get creative. I decided to run the audio samples (only about one hundred per sign) through the Logic Pro X music editing software. Its 'Space Designer' feature adds reverb to audio, mimicking the acoustics of different spaces (a concert hall, a small room, a barn, etc.). This increased my dataset tenfold, and the CNN's accuracy jumped to 95%.

This project taught me about CNNs, about generating and augmenting my own data, and about tackling real-life problems with machine learning.
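The same reverb trick can be reproduced offline by convolving each recording with a room impulse response. Below is a minimal sketch of that idea in Python; the file paths, the impulse-response set, and the 16 kHz sample rate are assumptions for illustration, not the actual Logic Pro X workflow described above.

```python
# Minimal sketch of reverb-based audio augmentation (assumed paths and sample rate).
# Each dry recording is convolved with a set of room impulse responses to simulate
# different acoustic spaces, analogous to Space Designer's reverb presets.
import glob
import numpy as np
import librosa
import soundfile as sf

SR = 16000  # assumed sample rate for all clips


def add_reverb(audio: np.ndarray, impulse_response: np.ndarray) -> np.ndarray:
    """Convolve a dry recording with a room impulse response to add reverb."""
    wet = np.convolve(audio, impulse_response)[: len(audio)]
    # Rescale so the augmented clip has roughly the same peak level as the original
    peak = np.max(np.abs(wet))
    return wet / peak * np.max(np.abs(audio)) if peak > 0 else wet


samples = sorted(glob.glob("samples/*.wav"))                 # hypothetical paths
impulses = sorted(glob.glob("impulse_responses/*.wav"))      # hypothetical paths

for sample_path in samples:
    dry, _ = librosa.load(sample_path, sr=SR, mono=True)
    for i, ir_path in enumerate(impulses):
        ir, _ = librosa.load(ir_path, sr=SR, mono=True)
        wet = add_reverb(dry, ir)
        out_path = sample_path.replace(".wav", f"_reverb{i}.wav")
        sf.write(out_path, wet, SR)  # each impulse response yields one new sample
```

With ten impulse responses, each original clip produces ten reverberant variants, matching the tenfold increase described above.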

View code on GitHub
