Cutting Edge: Week Sixteen

The Work This Week

Finals

A big portion of this week was focused on our finals presentation, which we presented on Tuesday:

Final Narrative Experience

We also finished up our narrative demo experience, making some last polishes:

Documentation Videos

In addition to this experience, this week saw us finish the second part of our deliverable: our documentation. We constructed four detailed documentation videos detailing our process with each key transition, which can be viewed below.

(NOTE: These videos can also be viewed on YouTube. The corresponding text reproduces the narration of each video.)

Millennium Actress Transition

Our Millennium Actress transition came out of our initial research stage of the project, when we were doing a deep dive into the filmography of Japanese animator Satoshi Kon.

We also wanted to try and develop prototypes based on some of Kon’s most evocative kinds of transitions, which utilized match cuts, jump cuts, and dissolves in ways that were practically impossible to achieve in live-action.

We eventually settled on adapting a particularly noteworthy transition from Kon’s 2001 film Millennium Actress, which centers around an aging actress recalling events from when she was younger. The transition itself is essentially the visual thesis of the movie, taking the character on screen backward in time without necessarily cutting to an earlier time period. Kon does this by melding what is essentially a match cut with a dissolve, morphing the “model” of the actress as she shifts from older to younger.

In first trying to translate this kind of transition into VR, we had to think about how the character was actually going to move in a 3D space. The original sequence had the advantage of being essentially on a 2D plane, and so Kon was free to morph the central character from one side of the frame to the other without much movement, especially since the shot itself was a close-up. However, VR doesn’t have the benefit of shot lengths, and the transition itself seemed to require movement, which made applying the transition in VR more challenging.

To sidestep this, we made the design decision to map the movement of the character who “initiates” the transition to a guest’s gaze, so when the guest moves their head left or right, the character “follows.” Additionally, we mapped the degree of the transition to the character’s position along her path, meaning that as the character moves from left to right based upon the guest’s gaze, the scene also shifts from one environment to the next.
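As a rough illustration of this mapping, the sketch below converts a head yaw angle into a 0 to 1 progress value, and uses that same value to place the character along a straight path. The function names, the yaw range of -45 to 45 degrees, and the straight-line path are all assumptions made for the example, not values taken from our project.

```python
def transition_progress(yaw_deg, start_yaw=-45.0, end_yaw=45.0):
    """Map head yaw in degrees to a transition progress clamped to [0, 1].

    start_yaw and end_yaw are hypothetical values chosen for illustration.
    """
    t = (yaw_deg - start_yaw) / (end_yaw - start_yaw)
    return max(0.0, min(1.0, t))


def character_position(progress, path_start, path_end):
    """Place the character on a straight path using the same progress value."""
    return tuple(a + (b - a) * progress for a, b in zip(path_start, path_end))


# Looking straight ahead puts both the character and the blend at the midpoint.
progress = transition_progress(0.0)
position = character_position(progress, (0.0, 0.0, 0.0), (2.0, 0.0, 4.0))
```

Because one value drives both the character and the environment blend, the two can never drift out of step with each other.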

There are three cameras in the scene. The first two are slave cameras whose movements are synced with the headset. Each has its own camera rig, with its initial position and facing direction set according to the needs of the story. We then capture the two slave cameras’ rendered output to temporary stereo textures, and use special shaders to reconstruct the world coordinates of each texture’s pixels and clip pixels out where the transition effect requires it. The third camera is the main camera. It doesn’t do any actual rendering; it only blends the two processed textures together according to the progress of the transition, and then displays the final result back to the HMD.
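The blending step performed by the main camera can be sketched in a few lines. This is only a CPU-side illustration in Python, assuming each texture is a flat list of RGB tuples and that the transition supplies one blend weight per pixel; our actual implementation runs in shaders, and the names below are hypothetical.

```python
def blend_pixel(pixel_a, pixel_b, weight):
    """Linearly blend two RGB pixels: weight 0 shows A, weight 1 shows B."""
    return tuple((1.0 - weight) * a + weight * b for a, b in zip(pixel_a, pixel_b))


def composite(texture_a, texture_b, weights):
    """Blend two equally sized textures using a per-pixel weight list.

    A pixel that is fully clipped out of texture A can be expressed
    as a weight of 1.0, so only texture B remains visible there.
    """
    return [blend_pixel(a, b, w) for a, b, w in zip(texture_a, texture_b, weights)]


black = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
white = [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0)]
frame = composite(black, white, [0.5, 1.0])
```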

The motion of the character herself simply moved at the same speed as the guest’s head rotation, controlled directly by the gaze of the guest.

We made a VR prototype of Kon’s original transition by creating an interior space reminiscent of the first part of the transition, and an outdoor train station that was similar to the last part of the final transition. We also iterated on the kind of transition that takes place when the character moves from point A to point B, these being:

A dissolve
A wipe effect, like the one used in the Star Wars films
A “spreading” transition, in which the transition begins on the character herself rather than the environment, and then spreads radially outward.
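One way to think about these three variants is as three different rules for computing a per-pixel blend weight from the same overall progress value. The sketch below is an assumption-laden illustration: the softness parameters, the normalized horizontal coordinate, and the distance-from-character input are all hypothetical, and it is not the shader code we shipped.

```python
def clamp01(value):
    """Clamp a value to the range [0, 1]."""
    return max(0.0, min(1.0, value))


def dissolve_weight(progress):
    """Dissolve: every pixel blends by the same amount."""
    return clamp01(progress)


def wipe_weight(progress, x_norm, softness=0.05):
    """Wipe: an edge sweeps across the view; x_norm is 0 at the left, 1 at the right."""
    edge = progress * (1.0 + softness)
    return clamp01((edge - x_norm) / softness)


def spread_weight(progress, distance, max_distance, softness=1.0):
    """Spreading: the new scene grows radially outward from the character.

    distance is the pixel's world-space distance from the character.
    """
    radius = progress * (max_distance + softness)
    return clamp01((radius - distance) / softness)
```

At a progress of 0.5, for example, the dissolve gives every pixel a weight of 0.5, while the wipe and the spreading transition give most pixels a weight of either 0 or 1, with only a narrow soft band in between.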

When playtesting this transition, we wanted to answer the following questions:

How fast does the character need to move in order for the transition to feel natural? Should we adjust her speed linearly or exponentially based upon the speed of the guest’s head rotation?
Which kind of transition works best given the central mechanic of the prototype, a traditional dissolve, the wipe, or our “spreading” transition?
How sensitive should the trigger be for the gaze on the character? Should we make it larger or smaller?

From our playtests, we discovered that:

Directly linking the speed of the character to a guest’s head rotation gave some people the impression that they were “throwing” the character using their head, rather than the character moving with their gaze.
Most playtesters seemed to prefer the usage of the dissolve over the other kinds of transitions.
While we found that the overall sensitivity of the gaze on the character was actually okay, we did adjust the initial trigger for the transition to be a bit more sensitive than its initial iteration.

We fixed these issues not only by narrowing down our transition to simply a dissolve and adjusting the sensitivity of the initial trigger, but also by redesigning how the character herself moves in relation to a guest’s gaze and head rotation.

Because this transition involves a moving object, the movement is authored as an animation clip. The animation plays back at its normal speed when we believe we have the guests’ attention, and at a much slower speed when we don’t, in order to reduce the chance of guests missing important story moments. We use the guests’ head direction to estimate their attention: if the direction of their gaze falls within a certain threshold of the animated character’s position, we register this as having the guests’ attention. The further the guests are looking away from their natural forward direction, the more forgiving the threshold becomes.
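The attention check described above can be sketched as follows. The specific numbers (a 15 degree base threshold that widens by up to another 15 degrees, and a slow playback speed of 0.2) are placeholders invented for the example, as are the function names; the sketch also assumes the character is never at exactly the same position as the guest’s head.

```python
import math


def angle_between(a, b):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))


def attention_threshold(gaze_dir, forward_dir, base_deg=15.0, extra_deg=15.0):
    """Widen the threshold as the gaze moves away from the natural forward direction."""
    off_axis = angle_between(gaze_dir, forward_dir)
    return base_deg + extra_deg * min(off_axis / 90.0, 1.0)


def playback_speed(head_pos, gaze_dir, forward_dir, character_pos,
                   normal_speed=1.0, slow_speed=0.2):
    """Return the animation speed based on whether we have the guest's attention."""
    to_character = tuple(c - h for c, h in zip(character_pos, head_pos))
    error = angle_between(gaze_dir, to_character)
    has_attention = error <= attention_threshold(gaze_dir, forward_dir)
    return normal_speed if has_attention else slow_speed
```

Slowing the clip down rather than pausing it keeps the character visibly alive while the guest is looking elsewhere.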

We were able to translate our prototype into the one found in our final version without too much iteration, though we did adjust the sensitivity of the overall transition and made the threshold for when the guests look away a bit more forgiving.

So in summary, here are a few takeaways we learned about executing this transition properly:

Dynamically adjusting the speed of the character’s movement based on “guest’s attention” seemed to work best, as it made both the character’s movement and the transition appear natural and relatively seamless.
And with this kind of transition, dissolves worked best, as the transition lends itself to emotionally fraught scenarios due to the character “walking away from you” by design.

Match on Action and Object Cuts

Our Match on Action and Object cuts were inspired by much of the material we researched before creating our prototypes earlier this semester, as it seemed to be a key editing tool utilized by many of the films and experiences we looked at, from the filmography of Satoshi Kon to the gameplay editing of Virginia.

For those unfamiliar with film terminology, a match cut is a kind of edit where either the shape or form of an object, or a similar motion, is “matched” from one scene to the next, with perhaps the most famous of them being this one from 2001: A Space Odyssey.

While we didn’t build a prototype for this kind of cut in the first half of the semester, we did include it in our final experience. We based our match on action and/or object transition off of a cut from Ari Aster’s short film Munchausen (2013). In this scene, one of the main characters picks up a suitcase, the camera remaining fixed and tracking on the suitcase while the environment shifts from the character’s bedroom to the trunk of a car.

Similarly, for our final experience, we built a scene in which you were seated on your bed, with a suitcase placed next to you. We assumed that the natural inclination for most people would be to pick up the suitcase, and so we initially based our trigger for the cut around a guest picking up and dropping the suitcase on the floor.

To achieve this effect, we first tried loading a new scene on collision, triggered when guests drop the suitcase. A new environment is loaded in, and the suitcase subsequently appears in the next scene.

However, internal playtesting of this transition made us rethink how this match on action and object should function in our final experience. We made the following changes:

We found that if you bind a cut to an action, a delay in loading can break the transition. So to fix this, we put both environments in one Unity scene and loaded new assets instead of a new scene so that the transition appeared more seamless.
We also made the position of the suitcase fixed instead of making it fully interactable, meaning that you can only open and close the suitcase rather than being able to pick it up. This meant redesigning this interaction entirely. So now instead of having a closed suitcase situated next to your bed, we now had an open suitcase and a few other items positioned to your left on the bed. The actual cut was then linked to the act of closing the suitcase rather than lifting it up.
Making the position of the suitcase fixed also allowed us to let it occupy the majority of the frame while the cut occurs, which allowed us to ground the guest just enough into the next scene while still making it apparent that a cut had happened. We also added environmental sound to make the transition more obvious than it had been in its initial iteration.

At this point in development, there was still a minor frame freeze during the transition. So for the final iteration, we decided to set both environments active from the beginning, but placed far apart from each other in the scene. The suitcase was parented under the camera rig, which allowed us to “teleport” the guest and the suitcase to the other location when the cut was triggered without a noticeable delay. In other words, we optimized the performance of this transition by replacing the loading step with a simple translation.
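The idea of cutting by translation can be illustrated with a tiny transform hierarchy. This is a minimal sketch with a made-up Node class and a made-up offset of 1000 units between environments; it only shows why parenting the suitcase under the rig means a single move carries both the guest and the suitcase across.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    """A bare-bones transform node holding a position relative to its parent."""
    local_pos: Vec3
    parent: Optional["Node"] = None

    def world_pos(self) -> Vec3:
        if self.parent is None:
            return self.local_pos
        parent_pos = self.parent.world_pos()
        return tuple(p + l for p, l in zip(parent_pos, self.local_pos))


def cut_to(rig: Node, offset: Vec3) -> None:
    """Perform the cut by translating the rig; children follow automatically."""
    rig.local_pos = tuple(p + o for p, o in zip(rig.local_pos, offset))


rig = Node((0.0, 0.0, 0.0))
suitcase = Node((0.5, 0.8, 0.3), parent=rig)
cut_to(rig, (1000.0, 0.0, 0.0))  # hypothetical distance to the second environment
```

After the call, the suitcase sits in the same place relative to the guest, but both are now inside the second environment.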

In its final implementation, we were surprised at how well match on action and object cuts worked within the VR format, and we got similar comments from those that playtested this transition during development. Out of all the other transitions we developed for the project, this one seemed to be one of the most promising transitions we created, as it showcased a unification of interactivity and editing that seemed uniquely suited for VR.

So in summary, here are a few takeaways we learned about executing this transition properly:

To keep the cut seamless, you should not only keep both environments within the same scene, but opt to translate the position of the player rather than load in new assets to minimize delay.
Keeping the focus object fixed within both scenes is key, as having it remain static allows it to take up the majority of the guest’s FOV and ground them into the next scene.
Changing sound cues are helpful in communicating that a cut has occurred.
And finally, not only can match cuts be executed in VR, but they seem to be well-suited for the medium, linking VR-specific interactions and traditional film editing together in a way that felt idiosyncratic.

Montage Editing and Quick Cuts

Our Montage Editing sequence was inspired by a similar driving montage from the game Virginia. In Virginia, you embody the role of an FBI agent whose latest investigation often necessitates driving around rural Virginia as you search for clues regarding the disappearance of a young boy.

Because Virginia is unique due to its cinematic usage of editing, during these driving scenes the “shots” often cut forward in time and space so you don’t have to watch the entirety of the distance travelled, in a way that’s reminiscent of montage editing in cinema.

For those unfamiliar with the term, montage editing is a technique in film editing that stitches together a succession of short shots in order to condense space, time, and information in a brief sequence.

Since montage editing is a very common technique in cinema, and was used successfully even within an interactive experience like Virginia, we wanted to try to extend this to VR.

As a prototype, we created a sequence similar to the one that’s used in Virginia, where while you are driving down a road, the environments – such as from a well-lit daytime road to a forest at night – and the songs on the radio change to indicate the passage of time.

Given that the kinds of edits found in these sequences are mostly hard cuts, our major goal for implementing montage editing in VR was to make sure the transitions were not jarring for the guest. Since guests will view this experience from a first-person perspective, we wanted the positions of the near-space objects to be consistent while the environment is changing, in order to keep the guest “grounded” even though the environment around them was in flux. To do this, we intentionally made the car as the reference frame in our driving montage sequence so that the base action of driving remains through different time and locations.

Fortunately, we found that when playtesting this transition, the actual cuts themselves didn’t prove to be jarring for people who tried this prototype, although they did note that there was a bit of a lag in cutting from one environment to the next, which we subsequently worked to minimize.

Due to the success of this montage editing prototype, we also wanted to test whether quick, hard cuts were possible in VR in the absence of a grounding object like the car. While we didn’t build a prototype for what we later termed our “quick cuts” sequence, we did build one for our final product.

This Quick Cuts sequence in our final experience was also inspired by Virginia, especially during a sequence where the protagonist of the game imagines a future where she informs on a long list of fellow agents, and time passes quickly before her eyes through a quick series of cuts. This sequence uses the same environments and spaces (such as your boss’s office and the FBI bullpen), but mixes some of the details up just enough to convey the feeling that change is happening. In essence, montage editing, but without any grounding object to pull you from one scene to the next.

In our final experience, we created a similar sequence where you, the protagonist, watch your relationship deteriorate over a period of time. The sequence uses a series of quick cuts between two specific environments, the open road and your apartment, to show this passage of time. We hoped that since quick cuts had worked well enough for our “montage editing” prototype, they could extend to this sequence as well.

For quick cuts in general, we implemented dramatic lighting differences to help indicate the changes in environments.

Our playtesting of our quick cuts sequence revealed very quickly that fast edits like this were a lot more jarring in VR than we anticipated.

While in film, these kinds of edits are both very basic (moving the camera from one position to the next) and often unnoticeable, in VR, quickly changing position feels like teleportation and is disorienting to the guest, especially since the “camera” in VR is the guest’s FOV and where they decide to look; whisking them away to another scene without warning proves to be very uncomfortable. So while quick, hard cuts in our montage editing sequence worked relatively seamlessly, in the absence of a grounding object, these quick, hard cuts do not work quite as well.

To fix this sequence, we did the following:

We removed the driving interludes in the sequence, thereby containing the action to one scene in the apartment.
From the lessons we learned with our gaze interaction-triggered cuts, we decided to change the quick edits in this sequence to “softer,” fast dissolves where your girlfriend, from within the room, moves farther and farther away from you.
This also solved our storytelling issue of getting across the right emotional affect: with the addition of the dissolves and the confinement to one location, playtesters have remarked that this sequence allows them both to feel the emotion of the scene and to digest the edits seamlessly.

In this altered sequence, we also incorporated details from the famous montage seen in Citizen Kane. In the film, Kane’s first marriage devolves, which is represented visually through a series of edits where his wife moves from being positioned right next to him at a small, intimate dinner table to being across the room from him at a massive yet alienating dining table.

So in summary, here are a few takeaways we learned about executing montage editing in VR successfully:

It is incredibly important to have one relatively large grounding object if your montage requires quick, hard cuts, as the guest will be uncomfortable and disoriented without one.
In the absence of a grounding object, you are much more limited as to what you can achieve with the montage. Neglecting to use a grounding object for us meant confining the action to a single environment and replacing hard cuts with “softer” dissolves to make the sequence more easily digestible.

Gaze Interaction

Our gaze interaction-triggered cuts were inspired by a few other VR short films we had researched that utilized gaze detection within the experience, such as Chris Milk’s SightLine: The Chair (2014). However, while the central conceit for Milk’s experience was that your environment changed based upon where you were NOT looking, we decided instead to try and invert this mechanic, and design a cut that was triggered by what the guest was directly focusing on instead.

Since one of our original design pillars was to give guests greater agency over when a cut or transition would be activated, we designed our gaze interaction-triggered cuts such that the transition itself would depend on whether or not the guest looked at an object in their hand.

We wanted the trigger for these cuts to be based on the information of whether a person’s gaze is entering or exiting the target object, and so we developed a tool to help detect what we’ve termed “guest’s attention.” The tool itself is built around Unity’s Raycasting system. The script is attached to the main camera in the scene, and the ray is generated from the center eye anchor. From this ray, we defined a range of interactivity such that when the guest picks up an object and moves it within this range, the transition is triggered.
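To make the enter and exit detection concrete, here is a small sketch of the idea outside of Unity. It casts a ray from the eye position and tests it against a sphere around the target object, then reports when the gaze enters or exits that sphere. The sphere-shaped target, the class name, and the event strings are assumptions for illustration; the real tool relies on Unity’s raycasting against colliders.

```python
import math


def ray_hits_sphere(origin, direction, center, radius):
    """Return True if a ray from origin along direction passes through the sphere.

    Simplification: a sphere behind the ray origin counts as a miss.
    """
    length = math.sqrt(sum(d * d for d in direction))
    unit = tuple(d / length for d in direction)
    to_center = tuple(c - o for c, o in zip(center, origin))
    along = sum(a * b for a, b in zip(to_center, unit))
    if along < 0:
        return False
    closest_sq = sum(a * a for a in to_center) - along * along
    return closest_sq <= radius * radius


class GazeTracker:
    """Report 'enter' or 'exit' when the gaze starts or stops hitting the target."""

    def __init__(self):
        self.was_hit = False

    def update(self, origin, direction, center, radius):
        hit = ray_hits_sphere(origin, direction, center, radius)
        event = None
        if hit and not self.was_hit:
            event = "enter"
        elif self.was_hit and not hit:
            event = "exit"
        self.was_hit = hit
        return event
```

Calling update once per frame yields at most one event per change of state, which is the information a cut trigger needs.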

We believed that, due to the greater sense of “being” in VR experiences, guests would interact with the environment in ways that mirrored their actions in real life. So because people generally look at the objects they pick up in the real world, we naturally assumed that this would extend into VR.

However, we wanted to test these assumptions to refine this transition into something usable for the VR format. To do so, we mocked up an environment inspired by Marcel Proust’s famous “Madeleine Moment,” creating two different dining table environments to establish a distinct before and after. We next tested a few different parameters:

How long do guests usually gaze at an object?
Do guests actually look at their hands when they pick up an object, and if not, would it be better to cue the transition off of the action of grabbing the object instead (meaning that gaze as a mechanic was not as effective as we hoped it would be)?
Is it more natural to cue the transition AFTER the guest looks away from the object (meaning that the guest must first pick up an object, look at it, and then look at the environment), or can we cue the transition after some fixed amount of time when the guest looks at the object in their hand?
Lastly, is it jarring for the actual cut itself to be a jump cut, or would a dissolve work better?

Upon both our own internal playtesting and playtesting with people from outside the project, we made the following determinations:

We quickly discovered that people’s gaze times wildly diverged, and that there was no true mean gaze time. Some guests never looked away from the object at all.
Not only do guests look at their hands when they pick up an object, but they also spend so much time staring at the object that they often don’t notice that the environment has changed unless the changes happen to small details directly in front of them.
Triggering a transition off of when they were NOT looking at the objects proved difficult, because this hardly ever happened or happened too fast, as guests were content with playing with and staring at whatever it was that they were holding. So setting a fixed time of around 2 seconds for the transition to occur seemed to work best relative to our other parameters.
From our playtesting sessions, there was only a slight preference for dissolves over a traditional jump cut.
Finally, we didn’t expect that people would want to interact with the object in other ways, such as throwing it across the room or bringing it up to their face to try and “eat it” within the experience.

Based off of these observations and feedback from our playtests, we changed the mechanics of the gaze-triggered interactions in a few key ways.

We modified our transition so that we had gaze-interactable transitions that occurred when you looked both at something in your hand, AND when you looked at an object further away in the environment. This helped alleviate some of the issues of guests not noticing when a transition was taking place.
For objects that are in your hand, we added a fixed transition time of around 2 seconds when a guest looks at the correct object.
Additionally, for these kinds of transitions, we darkened the rest of the environment when the gaze-interactable object appears, so that the actual transition from this “black” environment to the next is a lot more obvious.
While the version of the transition that was linked to looking at an object in your hand guaranteed that a guest’s attention would be focused on that object, we had to add motion to versions of this transition that used some stationary object in the environment. This was done in order to help pull attention towards what we wanted the guest’s gaze to interact with, without forcing their perspective.
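The fixed delay for held objects can be sketched as a simple dwell timer. The class below is hypothetical: it fires once after the guest has looked at the object for about two seconds, and it assumes that looking away resets the timer, which is a design choice made for the example rather than something our playtests settled.

```python
class DwellTrigger:
    """Fire once after the guest has looked at the target for dwell_seconds."""

    def __init__(self, dwell_seconds=2.0):
        self.dwell_seconds = dwell_seconds
        self.elapsed = 0.0
        self.fired = False

    def update(self, is_looking, delta_time):
        """Call once per frame; returns True only on the frame the cut should start."""
        if self.fired:
            return False
        if is_looking:
            self.elapsed += delta_time
        else:
            self.elapsed = 0.0  # assumption: looking away restarts the count
        if self.elapsed >= self.dwell_seconds:
            self.fired = True
            return True
        return False


trigger = DwellTrigger()
results = [trigger.update(True, 0.5) for _ in range(5)]
```

With half-second steps, the trigger fires on the fourth update and stays quiet afterwards, so the cut is only started once.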

The implementation of these new versions of the gaze interaction-triggered transitions in our final product also revealed that, though there was no preference between hard jump cuts and dissolves in our prototype, playtesters vastly preferred the use of softer cuts in an actual narrative experience, as jump cuts made people relatively uncomfortable.

So in summary, here are a few major things we learned about executing this transition properly:

Gaze Interaction-Triggered Cuts that are based off of holding an object in your hand need to have very obvious environmental changes in order for guests to perceive a cut, and should trigger about two seconds after the guest looks at the object.
Gaze Interaction-Triggered Cuts that are based off of looking at objects in the environment need some kind of indirect control to motivate their usage, though you can cue these instantaneously.
And softer cuts are favored over harder jump cuts.

Technical Tools

Finally, the last aspect of our deliverable: the technical tools we used for our project are all available to explore on GitHub at https://github.com/sherryfan/CuttingEdge-VR-TransitionTool

Going Forward

With our semester complete, we can look back on the work we did. It wasn’t always easy, but we stayed true to our core goal (develop a narrative experience showcasing our cuts and transitions) from beginning to end. We utilized a detailed production process in prototyping first, then developing the plan for our final narrative, then making all of our assets, then polishing at the end.

We are currently in talks to try and showcase our work at conferences in the future, like SXSW or GDC. We’ve learned a lot this semester and we want it to continue.
Keep on the lookout for more videos and updates on where we’ll go next. Thank you all for a wonderful project.