Creating AI Datasheet Parser

Introduction

This project began while I was taking SYSEN 6888, Deep Learning. Because my master’s project involved voice-related models, I wanted to take on voice cloning as the final project for the class. I proposed the project and worked on it in a group of five, but we did not have time to finish it. This page is devoted to my own developments on the project, which I have continued to work on in my free time. I give credit where it is due to the work of others, but it should be noted that 45% of the project, in the state it was in when we turned it in, was done by me, including my dataset augmentation code and all of the embedding generation code.

Project Goal

The goal of the project is to train a model that can listen to a few utterances from a person in real time and actively try to imitate their voice. Ideally, the system improves its imitation as we provide more samples of the person’s voice in real time. The scope of the project is restricted to English voices with minimal background noise.
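
One way to picture the "improves with more samples" idea is a speaker embedding that is refined as each new utterance arrives. The sketch below is only an illustration under assumptions, not the project's actual code: it assumes each utterance has already been turned into a fixed-length embedding vector by some encoder, and the class name RunningSpeakerEmbedding and its methods are hypothetical. It keeps an incremental mean of the embeddings seen so far and returns a unit-length version after every update.

```python
import math


class RunningSpeakerEmbedding:
    """Hypothetical helper: incremental mean of per-utterance embeddings."""

    def __init__(self, dim):
        self.dim = dim
        self.count = 0            # number of utterances seen so far
        self.mean = [0.0] * dim   # running mean of the raw embeddings

    def update(self, embedding):
        """Fold one new utterance embedding into the running mean."""
        if len(embedding) != self.dim:
            raise ValueError("embedding has the wrong dimension")
        self.count += 1
        # Incremental mean: mean_n = mean_(n-1) + (x_n - mean_(n-1)) / n
        self.mean = [m + (e - m) / self.count
                     for m, e in zip(self.mean, embedding)]
        return self.normalized()

    def normalized(self):
        """Return the running mean scaled to unit length (or as-is if all zeros)."""
        norm = math.sqrt(sum(m * m for m in self.mean))
        if norm == 0.0:
            return list(self.mean)
        return [m / norm for m in self.mean]


# Usage: feed embeddings one at a time as utterances are captured.
speaker = RunningSpeakerEmbedding(dim=2)
speaker.update([1.0, 0.0])
estimate = speaker.update([0.0, 1.0])
```

A plain average is the simplest choice here; a real system might instead weight recent or cleaner utterances more heavily, which this sketch does not attempt.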

Check out the project's GitHub here