Newsletter – #3 – What would you say?

Hello again from Kotodama Fruit Juice!

It’s been a very insightful and productive week for us: we met with an incredibly diverse group of researchers to get a better sense of the current state of speech recognition and where our project could be most impactful.

Heather started out the week by presenting us with a demo of the Microsoft Speech API working within Unity3D — through the demo, we were able to control a small orchestra with our voice commands!

We met with CMU computational language pioneer Alex Rudnicky, who had the following suggestions for us:

  • We should know what kind of acoustics we will be working with
  • We have to know which language we are using
  • We should do something manageable in terms of domain, and know the context
  • If you add the context of a previous request to the new request, it might work better
  • If we record people we need a disclaimer – “Agree to be recorded, or don’t participate”
  • Paranoid Eliza – it is very limited technologically, but makes a strong emotional impression, and we could do something in the same vein – limited, but powerful
  • It should act as if it’s understanding.
  • Reasoning should be under control: It should ‘Do the right thing’.
  • The vocabulary – the dictionary – is the data.
  • Gibberish: you can get the pitch but nothing more, so you would have to do everything from scratch.
  • Interaction: (1) reacting by changing the direction of the story based on the speech recognized; (2) letting players choose the continuation.
  • Eliminating other people’s speech picked up by the microphone: headsets might be a solution, but the drawback is putting something on the guest’s head.
  • Kinect’s voice recognition might help here. (Two mics can help recognize who is talking.)
  • Another option besides using a grammar to extract semantics would be to grab what the guest said and do a keyword search on what was returned (this works for Q&A with a bot – see the sketch after this list).
  • Immediate feedback on what the guest says (parroting back what the system picked up on) may be distracting, and may make the system look silly since it will often make mistakes.
  • Creating an NPC you can interact with that has distinct personality traits may be interesting
  • Having the audience interact with the application will cause tech issues; better that it be a personal (although not necessarily private) experience.
  • Communication is an exchange of ideas
  • “Make the system ACT like it is understanding…”
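One way to picture Alex’s keyword-search suggestion is below. This is purely illustrative – the keywords and canned replies are made up, not anything we’ve built – but the idea is to take whatever text the recognizer returns and scan it for keywords that map to bot responses, rather than pulling semantics out of a grammar:

```csharp
using System.Collections.Generic;

// Illustrative only: keyword spotting over whatever text the recognizer returns,
// as suggested for a Q&A-style bot. The keywords and replies are placeholders.
public static class KeywordResponder
{
    static readonly Dictionary<string, string> Replies = new Dictionary<string, string>
    {
        { "name",      "They call me Eliza." },
        { "orchestra", "Shall I cue the strings?" },
        { "weather",   "Lovely day for a demo, isn't it?" }
    };

    public static string Respond(string recognizedText)
    {
        // Scan each word of the utterance for a known keyword.
        foreach (var word in recognizedText.ToLowerInvariant().Split(' '))
        {
            string reply;
            if (Replies.TryGetValue(word.Trim('.', ',', '?', '!'), out reply))
                return reply;
        }
        // No keyword hit: fall back to a line that at least ACTS like it understood.
        return "Tell me more about that.";
    }
}
```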

This meeting turned out to be exactly what we needed, as it allowed us to focus on which problems are already “solved” for the experiences we are exploring, and which problems are out of our reach.

This meeting, combined with an introduction to the problem from Mike Christel and Scott Stevens, and a blue-sky meeting with designer Jesse Schell, allowed us to zero in on our strategy as follows:

We will be spending the first half of our project semester developing prototypes around long-standing pain points in the technology; then, using those prototypes, we will create a polished user experience that surprises and delights users through both interaction and inventive storytelling.

Targets for the team are currently:

  • Interrupting characters
  • Environmental Awareness
  • Exchange of Ideas
  • Natural Reactivity
  • Ambiguous Input
  • Volume / Intention Reactivity

Using a pre-defined character, we will build prototypes which will attempt to address the above problems with either a) a true solution or b) a strategy for creating the specific illusion.

Next week, our artist Momo will begin developing a single character to explore these virtual spaces, and our programmers Dilara and Heather will be working with Bryan Maher to develop a Unity 3D C# Wrapper for Microsoft’s Speech API so that we can quickly and easily iterate on our grammar and language detection. A nice side effect of this process will be that we will be able to provide the wrapper for ETC and BVW projects in the future. SO AWESOME.
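We haven’t written the wrapper yet, so treat the following as a rough sketch of the shape it might take rather than the real thing. It assumes the managed System.Speech.Recognition layer over Microsoft’s Speech API is reachable from Unity’s scripting runtime (exactly the kind of thing Bryan will help us sort out), and the class name and command words are placeholders:

```csharp
using System.Speech.Recognition;   // managed layer over the Microsoft Speech API
using UnityEngine;

// Rough sketch only: the real wrapper will be built with Bryan around whatever
// API surface Unity's runtime actually allows.
public class SpeechRecognizerBehaviour : MonoBehaviour
{
    SpeechRecognitionEngine recognizer;

    void Start()
    {
        recognizer = new SpeechRecognitionEngine();

        // A tiny command grammar; swapping in a new grammar is one line,
        // which is what should make fast iteration possible.
        var commands = new Choices("strings", "brass", "percussion", "stop", "louder", "softer");
        recognizer.LoadGrammar(new Grammar(new GrammarBuilder(commands)));

        recognizer.SetInputToDefaultAudioDevice();
        recognizer.SpeechRecognized += (sender, e) =>
            Debug.Log("Heard: " + e.Result.Text + " (confidence " + e.Result.Confidence + ")");

        // Keep listening for multiple utterances rather than a single phrase.
        recognizer.RecognizeAsync(RecognizeMode.Multiple);
    }

    void OnDestroy()
    {
        if (recognizer != null)
        {
            recognizer.RecognizeAsyncStop();
            recognizer.Dispose();
        }
    }
}
```

If the wrapper ends up looking anything like this, iterating on grammar and language detection really does come down to editing the list of phrases we load, which is the whole point of building it before the experience prototypes.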

It feels good to have a plan.