Peter Plantec is an incredible salesman. He doesn’t desire that you purchase expensive software, specialized hardware or servos and actuators to build robots. He is not attempting to sell you anything more than an idea: the idea that the future interface to the technology which surrounds our lives must become more human. He is not seeking to improve the interface to our devices through more easily interpreted affordances, as Norman and Nielsen desire, but by making our interactions with technology more literally humanlike. He demands that his readers become active participants in building the first generation of this world, and that is what sets Virtual Humans apart from other books imagining the future. You are not alone in this endeavor. The book, along with its accompanying CD, contains the necessary software and guided inspiration that you will need to make it happen. This is not a book meant to be read casually while curled up in a cozy chair, unless you also have room for a laptop. It is meant to be delved into while at your computer, installing the included software packages and playing with this technology. It is only through working with the agents, as both user and creator, that you can start to appreciate the zeal that permeates the author’s words on the subject.
The foundation of this work is the realization that it is not presently possible to create anything on the order of a true personality or intelligence. Furthermore, given current research into the quantum nature of consciousness (Plantec 2003) and the incredible complexity of our language faculties (Pinker 1999), the likelihood of ever being able to accomplish this task is slim. The author permits, in fact encourages, his readers to skip over sections of the chapter that lays out this case to prevent them from getting bogged down early in the book. My admonition to you is to ignore this advice and to cling to the argument presented, by your fingernails if need be. The argument is difficult, but it has a remarkable liberating effect. Since there is no way to realize an actual virtual consciousness, we are free to employ artifice instead.
The bulk of the book focuses on the incredible complexity of developing “the illusion of personality”. The face we present to the world is a product of genetics combined with an amalgamation of all of the experiences of our lives and our choices in dealing with events, big and small. Through our experiences, knowledge of cultural idioms, capacity for empathy, and knowledge of how to emote, behave and self-censor as situations warrant, we possess a shared vocabulary that transcends language. How daunting a task is it to approximate this in a series of rules, starting from nearly nothing? Fortunately, the technical “heavy lifting” has been done through years of research, experimentation and programming. The CD-ROM included with the book and the referenced websites are loaded with tools that handle the daunting task of interpreting and generating speech, but they are just tools. What stands before those who accept Plantec’s challenge is most intimidating: a white canvas, a blank page, a tabula rasa, software devoid of any recognizable humanity.
Virtual Humans focuses on assisting you in developing the character you want to create. While there are sample files provided to jumpstart your explorations, it is clear that Plantec’s intent is that the reader will move beyond working with pre-built agents and begin to let her own creativity manifest. This is a difficult task for the uninitiated. Thankfully, the pages are filled with exercises and techniques that encourage her to step back and challenge her perceptions of her interactions with others, to contemplate the nature of conversation, to observe the language of the body as well as the spoken word, to become a student of humanity that she might more accurately represent it in her creation.
It is interesting that when confronted with the problem of synthetic actors, science fiction author Neal Stephenson opted to place real actors behind virtual characters in his novel The Diamond Age, allowing the communication to take place person to person and relegating the technology to the role of virtual make-up artist and set designer (Stephenson 1995). Confronted with the idea of virtual actors able to respond to a person, Stephenson rejected the idea as too farfetched for his world. The current levels in our technology require us to be bolder, to attempt to anticipate every conceivable response, and like game level designers, to try to prevent users from seeing the edge of our imaginary space while providing the illusion of limitless exploration.
The path the creator walks is more like that of a novelist or screenwriter than that of a technologist. Imagine a situation where a woman, Silvia, wishes to develop a character, Andrew, as an interface to her personal information management software. Following Plantec’s advice, she wants her character to be more than a software interface. Andrew needs depth. Life for a character begins with motivation. Who is he? Where did he come from? What does he enjoy? What does he dislike? Are there any mannerisms that he has? How will he represent them? How does he respond to a compliment or to rudeness? The author devotes over half of his book to equipping Silvia to inhabit the mind space of her creation, to get to know Andrew as well as she knows herself. In this way she can start to understand how he will speak and how he might respond to a given question. This is of utmost importance, because she will be providing every comment that will issue from his mouth in the underlying speech database.
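To make the idea of a hand-authored speech database concrete, here is a minimal sketch in Python. It is not the format used by the software on the book's CD; the rule list, the patterns, and Andrew's lines are all invented for illustration. It assumes only that each authored reply is keyed to a pattern and that the first matching pattern wins.

```python
import re

# Hypothetical persona data for the review's example character, Andrew.
# Every reply is hand-written by the designer; nothing here is generated.
ANDREW_RULES = [
    (r"\b(thanks|thank you|well done|good job)\b",
     "Kind of you to say. I do try to keep your calendar civilized."),
    (r"\b(stupid|useless|idiot)\b",
     "I'll pretend I didn't hear that. Shall we get back to your schedule?"),
    (r"\bwhere (are|were) you from\b",
     "A small server closet in Ohio. I don't miss it."),
]

# Used when no authored rule applies (also an invented line).
FALLBACK = "I'm not sure I follow. Could you put that another way?"


def respond(utterance: str) -> str:
    """Return the first authored reply whose pattern matches the utterance."""
    text = utterance.lower()
    for pattern, reply in ANDREW_RULES:
        if re.search(pattern, text):
            return reply
    return FALLBACK
```

Even at this toy scale the design burden is visible: how Andrew takes a compliment or an insult is decided by the writer of the table, not by the matching code.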
While the bulk of the book focuses upon the textual, there are nods toward the importance of the visual aspects of virtual humans as well. To invoke The Diamond Age once again, Stephenson envisions salons where actors have grids embedded in their flesh that allow their performance to be read and instantly mapped to the control points of virtual characters, controlling not just gross body movement but also nuanced facial expressions. It is a permanent, real-time version of the performance capture exemplified in The Polar Express, but without the unnatural creepiness evoked by that film’s descent into Masahiro Mori’s uncanny valley (Stephenson 1995, Clinton 2004, Bryant). At present it is difficult to imagine actually reading body language and facial expressions of a virtual human, but the technical and cost barriers will fall over time. For now, the software included with the book allows static images to be mapped as textures to 3D models, and the resulting images to be morphed between key frames. This allows our aforementioned creator, Silvia, to select Andrew’s facial expressions, providing the targets through which his image will morph to visually represent pleasure or dissatisfaction. The next likely evolution will be to model facial muscles to convey emotion more convincingly. Valve Software is pushing the envelope of expressiveness in game characters with the animation capabilities of Half-Life 2. It has been reported that the facial animation tools allow for such nuanced manipulation that they are being considered as a way to teach autistic children to recognize facial expressions of emotive states.
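Morphing between key frames can be pictured as interpolation between expression targets. The sketch below assumes simple linear interpolation over a handful of named 2D control points; the point names and coordinates are invented, and the book's bundled software may well work differently.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Expression = Dict[str, Point]

# Hypothetical control points (x, y) for two expression targets.
NEUTRAL: Expression = {
    "mouth_left": (40.0, 80.0),
    "mouth_right": (60.0, 80.0),
    "brow": (50.0, 30.0),
}
PLEASED: Expression = {
    "mouth_left": (38.0, 76.0),
    "mouth_right": (62.0, 76.0),
    "brow": (50.0, 28.0),
}


def morph(start: Expression, end: Expression, t: float) -> Expression:
    """Linearly interpolate each control point; t=0 gives start, t=1 gives end."""
    t = max(0.0, min(1.0, t))  # clamp so callers cannot overshoot a target
    return {
        name: (
            sx + (end[name][0] - sx) * t,
            sy + (end[name][1] - sy) * t,
        )
        for name, (sx, sy) in start.items()
    }
```

Calling `morph(NEUTRAL, PLEASED, t)` with `t` stepping from 0 to 1 over successive frames yields the in-between faces. The creator's real work is choosing the targets, which is the point the book makes about expression design.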
As costs drop and computing and rendering power continue to increase, virtual humans will need to learn to emote, possibly with entire virtual bodies, and as such will need to learn this from people skilled in this area. This brings to mind the technical expertise that Weta Digital brought to bear in creating Gollum for The Lord of the Rings trilogy. The technical and artistic know-how was immense, but it wasn’t sufficient. There is a great gulf between the initial visualization of Gollum visible for a moment in the mines of Moria and the menacing, emotive Gollum of the later movies. The major difference between the two was not more advanced technology, but increased collaboration between accomplished technical artists and a gifted actor, Andy Serkis. While most developers will be unable to afford the impressive talents of Mr. Serkis (calls to his agent went unreturned), and most users are unlikely to want Gollum as a trusted advisor, it would be nice to be able to examine and interact with a character that was as emotive in real time. In developing agents that must act it is important to draw upon the experience of actors, and in this Plantec steps aside and encourages his readers to explore the work of Ed Hooks, author of Acting for Animators and an acknowledged expert in helping animators derive true performances from their creations.
When all of the techniques described by Plantec are implemented to their fullest capability, the result is still a very elaborate virtual doll. How might virtual humans be incorporated into society for its benefit? Some ideas espoused in Plantec’s work include providing corporate directory and transactional assistance, aiding teachers by being able to provide individual attention to one child, providing companionship for shut-ins and lonely individuals, acting as a personal assistant, interfacing to domestic control systems, and providing deeper, more engaging entertainment experiences. It is already possible to complete a transaction via the phone with some corporations, relying solely upon speech as your input device, and the author shares anecdotal evidence of virtual bots being used in education and of seniors who have developed a strong affinity for virtual personalities with which they have interacted. Home automation systems are starting to be more widely deployed, and the costs are dropping for newcomers to enter and experiment with computer-controlled lighting, heating and appliances. With minimal additional programming it is possible to interface the A.I. bot engines to feed commands into the control software.
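The "minimal additional programming" might look something like the following sketch, which turns a recognized phrase into a command string for a home controller. The device names, the addresses, and the `send_to_controller` stand-in are all assumptions made for illustration; a real installation would call whatever interface its control software actually exposes.

```python
import re
from typing import Optional, Tuple

# Hypothetical device names mapped to made-up controller addresses.
DEVICES = {"porch light": "A1", "living room lamp": "A2", "heater": "B1"}


def parse_command(utterance: str) -> Optional[Tuple[str, str]]:
    """Return (address, action) for phrases like 'turn on the porch light'."""
    text = utterance.lower()
    match = re.search(r"\b(?:turn|switch) (on|off)\b", text)
    if match is None:
        return None  # not a control request; leave it to the chat engine
    for name, address in DEVICES.items():
        if name in text:
            return address, match.group(1).upper()
    return None  # a control verb was heard, but no known device was named


def send_to_controller(address: str, action: str) -> str:
    """Stand-in for the real control software: just format the command."""
    return f"{address} {action}"
```

When `parse_command` returns `None`, the utterance would fall through to the bot's ordinary conversational rules, so the character can still answer in its own voice.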
From the reviewer’s perspective perhaps the most intriguing, or perhaps it is merely the safest, possibility for virtual humans is in the realm of entertainment. In this environment the user/player is entering the world of the virtual rather than asking the virtual to be accepted into our world. Within the “magic circle” of play we are more forgiving and willing to accept some of the limitations of a virtual character. Sadly, most current games that have non-player characters (NPCs) suffer from abysmally short dialog trees with a minimal number of responses (Spector 1999). The NPCs exist solely as dispensers of information and as mild plot devices, not as living entities within a realistic world. Providing interesting motivation for the characters is in direct conflict with their role as a device for furthering game play. It is imperative that they not stray from their scripted spot, their virtual feet either nailed to the floor or placed upon some predetermined path, cursed to walk until their job is complete.
Massively multiplayer games attempt to mitigate the arbitrary nature of virtual people by allowing real players to interact with one another. Unfortunately the world and the NPCs that inhabit it are still dispensers of items ("phat lewt"), quests and experience. Players are incapable of using their avatars to express real emotion, locked in by predetermined animation keys and inexpressive facial textures. Consequently players have little incentive to buy in to the fiction that there is anything grander occurring in the world apart from the quest for advancement. Also, the act of introducing other human players to the game world introduces conflicting play styles, which may diminish the entertainment experience (Yee). Where is the drama, the humanity, the pathos? As Warren Spector famously stated at the Game Developers Conference, “I haven’t cried because of a video game since Floyd died.” Can fully realized virtual humans provide the bridge to something other than physics- and projectile-based gaming experiences?
Michael Mateas and Andrew Stern have attempted to answer this question through the creation of Façade, an interactive drama in which you find yourself in the wrong place (the apartment of your friends Grace and Trip) at the wrong time (the moment when their relationship begins to crumble). It crumbles not just before you, but in response to each action you take, each place you look, and each word you speak during your visit (Mateas & Stern 2003). Under development for more than five years, it is ambitious in the way it breaks new ground in the genre. Here are two characters that interact and emote not only with the player, but with each other in ways determined by the multilayered speech and emotion engines. When all systems are working well it is possible to forget that these characters are not real. Of course, the sense of immersion one feels in interacting with Grace and Trip owes itself, in part, to our willing participation in meeting them in their world. In this virtual apartment, Mateas and Stern have imbued the virtual objects of their creations’ virtual lives with meaning, allowing not just our words but our actions in this space to alter which conversational and emotive rules will fire and drive the arc of the story. If only this kind of depth could be found in the world of Neverwinter or Norrath. The complex interaction of systems provides a compelling example of what will be possible, if not commercially viable, within entertainment software in the near term, though it would be extremely difficult to implement if the virtual actors needed to break through the ‘fourth wall’ of the monitor and interact in our space.
A critic’s eye
The possibilities do appear fascinating, but these are fanciful dreams of a future which may be long in coming. The present reality is that there are flaws in current implementations of agents, not just in games, but across the spectrum. A perusal of the sites on the Microsoft Agent web ring revealed that we are still at a cautious experimental stage. The agent websites were mildly interesting, but none of them used its agent in the capacity of an agent. They instead had the virtual human (or parrot or robot) speaking text which could have been read more quickly. Speaking the text in a mechanized voice did nothing to improve the delivery of the information. The majority of the sites were also using agents generated by someone else. This led to a limited palette of actions and expressions. The agents were forced to display scripted animations that often did not fit the actual sentiment of the text and resulted in distraction rather than deeper engagement with the agent and the information the agent was providing.
In these sites the agent didn’t seem to have a life of its own at all, and this is Plantec’s point. If these things are ever to become more than interesting technical toys on the fast track to obscurity, then developers need to get to the point where the agents are actually acting and interacting with one another. Incorporating Plantec’s ideas about generating personality for the bots is necessary, but not sufficient, to realize his goals. The challenge is to provide a level of interaction similar to that provided by Mateas and Stern in Façade, but in an environment where the agents have to perceive and interact with the “real world.” The difficulty is that for this to happen the systems must have the capacity to interchange data via a common language or protocol, and an acceptable level of trust must exist across the systems at the back end of the interactions.
In his book, Plantec blithely envisions a situation in which he is traveling to a city in a country he hasn’t visited before. Upon arriving at his hotel he uses his personal virtual assistant to handle the act of checking in, his software interfacing directly with the hotel’s systems. The virtual guide then suggests local restaurants that he might like based upon the restaurants he has visited in other locations and reminds him of gifts to bring home to his family, including suggestions for what they might like and where he might find the items. Instead of having to build up a database of preferences entered manually, as with software that is currently available, this information was entered and indexed through conversations with the virtual agent. The book provides the tools to handle the generation and storage of his local preferences. Plantec doesn’t, however, describe the technology behind the scenes that allows the agent to interface with the hotel’s systems to select a room that will be to his liking, or to know the details of the restaurants and shops of this unknown city. It is likely that it would have to be of similar size and scope to the Semantic Web specification drafted by the World Wide Web Consortium (W3C).
Plantec’s vision ties back to Tim Berners-Lee’s desire for software agents to communicate smoothly via a “Semantic Web”. In a 2001 Scientific American article of that name, Berners-Lee posits the following situation:
"The entertainment system was belting out the Beatles' "We Can Work It Out" when the phone rang. When Pete answered, his phone turned the sound down by sending a message to all the other local devices that had a volume control. His sister, Lucy, was on the line from the doctor's office: "Mom needs to see a specialist and then has to have a series of physical therapy sessions. Biweekly or something. I'm going to have my agent set up the appointments." Pete immediately agreed to share the chauffeuring."
"At the doctor's office, Lucy instructed her Semantic Web agent through her handheld Web browser. The agent promptly retrieved information about Mom's prescribed treatment from the doctor's agent, looked up several lists of providers, and checked for the ones in-plan for Mom's insurance within a 20-mile radius of her home and with a rating of excellent or very good on trusted rating services. It then began trying to find a match between available appointment times (supplied by the agents of individual providers through their Web sites) and Pete's and Lucy's busy schedules." (Berners-Lee, 2001).
This is clearly in line with Plantec’s goals for intelligent software assistants that can act as agents for their master. The author removes from this picture the necessity of the handheld Web browser and replaces it with a virtual majordomo who will act intelligently in its stead, freeing up Pete and Lucy, and us if we allow it, to work on other matters. Honestly, when compared with the scope of the W3C’s vision for how agents will communicate (phones anticipating the needs of the user and controlling the surrounding audio devices, automatic filtering of data based upon agent-to-agent interaction), the human-agent interface seems somewhat trivial, but this initial impression is incorrect. I am certain that there are many computer users who don’t fully understand the technological environment in which they find themselves and would appreciate a human interface, and others who are simply too busy to enter all of their details and preferences into a database manually, but who would love to learn that the information had been recorded and was available after a brief series of conversations that took place at their convenience. The addition of a human interface allows users to interact with and query data stores in a more natural mode. Many users would welcome the appearance of a trusted guide. But would there be real trust? And what happens when that trust is breached?
Suppose a forward-thinking company decided to implement some of the suggestions espoused by Plantec and modified its automatic voice system to respond with some sassiness, perhaps flirting with the customer or making small talk during the lag times. Such a system, properly implemented, could generate interest in the company, the elusive buzz of viral marketing. It is conceivable that people would start calling just to interact with the system because it is amusing and engaging. This is innocuous enough, but what if responses were being recorded and analyzed to determine if you fit a desired target demographic? Would your opinion change if the analysis was being performed to determine the ease with which the caller could be persuaded to purchase the desired product? What if the caller were singled out for a meaningful “heart to heart” with a witty, engaging corporate representative who bantered with and flattered him as an important customer? Would it matter if he couldn’t tell that he was speaking to a machine?
While this sounds alarmist, there are already implementations of bots that interact with humans and use the resulting conversation to determine their personality type (ALICE Artificial Intelligence Foundation). While this bot is explicit in its purpose, others could be created where the interaction is recorded and analyzed secretly. Plantec doesn’t ignore these issues in his book. As we start down this path of blurring the real and the virtual, he wants us to be aware of the pitfalls that arise. The technology is not predetermined to lead us to a desirable end, and for every good application of engaging virtual agents there are unscrupulous and exploitative possibilities to consider.
“Virtual Humans” appears at a critical juncture in the development of agent technology. The crux of the matter is that many present implementations, while interesting on the technical level, are not terribly interesting on the human level. Plantec wishes to advance the field further by encouraging designers to get creative, to have fun in developing back stories and personalities for their creations, to see them as characters and not technical implementations, and as a result to create characters that people will want to not just use, but interact with. This is the right encouragement at the right time. I am looking forward to adopting his design methodology and introducing my students to my first bot, but I am more eager to see in what directions they will take this technology.
Once you get past the “gee whiz” factor that the virtual person is in fact responding to your typing, or even your voice, you begin to realize that the creative work has been done by the design team. This is what makes the promise of virtual humans intriguing for education. It affords us the opportunity to interweave technology, logic, programming, psychology, art, creative writing and linguistics in a compelling package. This technology shines when the designers surprise an unsuspecting user by anticipating a thread of conversation and allowing for it, even though it may only be tripped one time in a thousand. When a character responds in this way it catches the user off guard with its humanness. The creators know that it is a testament to their masochistic tendencies, their willingness to spend long hours digging into their character’s psyche, and their production of a rich, deep database of conversational possibilities. To the user, however, it is one more step along the path to believing in the ghost in the machine.
References
Berners-Lee, T. (2001). The Semantic Web, from http://www.scientificamerican.com/article.cfm?articleID=00048144-10D2-1C...
Bryant, Dave. The Uncanny Valley: Why are monster-movie zombies so horrifying and talking animals so fascinating? http://www.arclight.net/%7Epdb/glimpses/valley.html
Clinton, Paul (2004). 'Polar Express' a creepy ride – (Nov 10, 2004) http://edition.cnn.com/2004/SHOWBIZ/Movies/11/10/review.polar.express/
Mateas, Michael, Stern, Andrew (2003). Façade: An Experiment in Building a Fully-Realized Interactive Drama http://www.quvu.net/interactivestory.net/papers/MateasSternGDC03.pdf
Plantec, P. (2003). Virtual Humans A Build it Yourself Kit. New York, NY: Amacom.
Pinker, Steven (1999). Words and Rules The Ingredients of Language. New York, NY. Harper Collins
Stephenson, Neal (1995). The Diamond Age. New York, NY: Bantam Books
Spector, Warren (1999). Remodeling RPGs for the New Millennium http://www.gamasutra.com/features/game_design/19990115/remodeling_01.htm
Yee, Nick The Norrathian Scrolls http://www.nickyee.com/eqt/home.html
Recommended Websites & Books:
Agent Technology
Microsoft Agent components - http://www.microsoft.com/msagent/default.asp
AIML language and award-winning chat bots - http://www.alicebot.org/
Hosting service for ALICE chatbots - http://www.pandorabots.com
Add a virtual human with lip synching to your website - http://vhost.oddcast.com
Peter Plantec’s v-people website - http://www.ordinarymagic.com/v-people/
Façade (Michael Mateas and Andrew Stern) - http://interactivestory.net
X10 Corporation’s home management system - http://www.x10.com/activehomepro/
The Uncanny Valley
Dr. Mori's Uncanny Valley http://amos.indiana.edu/library/scripts/valley.html
The Buddha in the Robot http://www.kosei-shuppan.co.jp/english/text/books/robot.html
Pixar and the Uncanny Valley (2004) http://www.robotjohnny.com/archives/2004/10/pixar_and_the_u.php
VFXWorld - Feature - All Aboard the CG Polar Express http://vfxworld.com/?sa=adv&code=57c5ed8a&atype=articles&id=2289
Acting and Animation
Hooks, Ed Acting for Animators and http://www.edhooks.com
Andy Serkis Official Website http://www.serkis.com/
The Semantic Web
Semantic Web at W3C http://www.w3.org/2001/sw/
Mon, 11/02/2009 - 16:38
I can see virtual humans being used in everyday life in the very near future, such as for news reading, TV presenting, etc. We are very close to this age.