AI systems and products are becoming increasingly ubiquitous thanks to advances in machine learning and big data. From virtual assistants to in-car navigation, all sound-activated machine learning systems rely on large sets of audio data. However, no publicly available audio dataset is large enough to train a new AI system that analyzes sounds in the home environment.

Our team is taking a game-with-a-purpose approach to this problem: players contribute audio data through a game, and that data is used to build an annotated public audio database. The database will help students and researchers train AI systems, advancing sound research and development.

Our project began at the end of August, when our client gave us the project constraints: use the Twitch platform to collect home sounds, including but not limited to a running dishwasher, sneezing, footsteps, and a fire alarm. The client's key point was to focus on quantity over quality. They required an audio collection experience that gathers large amounts of annotated audio data related to sounds at home.