Author Archives: Eamonn Bell

Perceptual Bases for Virtual Reality: Part 2, Video

This is Part 2 of a post about the perceptual bases for virtual reality. Part 1 deals with the perceptual cues related to the spatial perception of audio.

The chief goal of the most recent virtual reality hardware is to simulate depth perception in the viewer. Depth perception in humans arises when the brain reconciles the images from each eye, which differ slightly as a result of the separation of the eyes in space. VR headsets position either a real or a virtual display (in the latter case, a single LCD panel showing a split screen) in front of each eye. Software sends the headset a pair of views onto the same 3D scene, rendered from the perspective of two cameras in the virtual space separated by the distance between the user’s eyes. Accurate measurement and propagation of this inter-pupillary distance (IPD) is important for effective immersion. The optics inside the Oculus Rift, for instance, are designed to tolerate software changes to the effective IPD within a certain operational range without requiring physical recalibration. With all these factors taken into consideration, when the user allows their eyes to focus on a point beyond the surface of the headset display, they will hopefully experience a perceptually fused view of the scene with the appropriate sense of depth arising from stereopsis.
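
To make the stereo-pair idea concrete, here is a minimal Processing sketch (the language I use later in this project) that renders the same spinning cube from two virtual cameras offset horizontally by an assumed IPD and shows the results side by side. It is only an illustration of the principle; real headset rendering is handled by the vendor SDK, which also applies the lens correction discussed below.

    // A minimal split-screen stereo pair (illustration only; a real headset's
    // vendor SDK handles this, along with lens correction).
    PGraphics leftEye, rightEye;
    float ipd = 64;  // assumed inter-pupillary distance, in scene units

    void setup() {
      size(800, 400, P3D);
      leftEye  = createGraphics(width / 2, height, P3D);
      rightEye = createGraphics(width / 2, height, P3D);
    }

    void draw() {
      renderEye(leftEye,  -ipd / 2);   // camera shifted half the IPD to the left
      renderEye(rightEye, +ipd / 2);   // ...and half the IPD to the right
      image(leftEye, 0, 0);            // left half of the split screen
      image(rightEye, width / 2, 0);   // right half of the split screen
    }

    void renderEye(PGraphics pg, float eyeX) {
      pg.beginDraw();
      pg.background(0);
      // Both cameras look at the same point; only the eye position differs.
      pg.camera(eyeX, 0, 300, 0, 0, 0, 0, 1, 0);
      pg.lights();
      pg.rotateY(frameCount * 0.01);   // a slowly spinning cube as a test scene
      pg.box(80);
      pg.endDraw();
    }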

However, stereoscopic cues are not the only perceptual cues that contribute to depth perception. For example, the widely understood motion parallax effect is a purely monocular depth cue: as we move our head, we expect objects closer to us to appear to move faster than those farther away. Many of these cues are experiential truisms: objects farther away seem smaller, opaque objects occlude those behind them, and so on. Father Ted explains it best to his perennially hapless colleague Father Dougal in this short clip.

Others are less obvious, though well known to 2D artists, like the effect that texture and shading have on depth perception. Nevertheless, each of these cues needs to be activated by a convincing VR rendering, implemented either in the client application code or in the helper libraries provided by the device vendor (for instance, the Oculus Rift SDK). Here, I discuss three additional contingencies that affect the sense of VR immersion and go beyond the typical depth-perception cues, to show the importance of carefully understanding human perception when producing convincing virtual worlds.

Barrel distortion

As this photograph, taken from the perspective of the user of an Oculus Rift, shows, the image rendered to each eye is radially distorted.
Image from inside Oculus Rift
This kind of bulging distortion is known as barrel distortion. It is intentionally applied (using a special shader) by either the client software or the vendor SDK to increase the effective field of view (FOV) of the user. The lenses used in the Oculus Rift correct for this distortion. The net result is an effective FOV of about 110 degrees in the case of the Oculus Rift DK1. This approaches the effective stereoscopic FOV of humans, which is between 114 and 130 degrees. Providing visual stimuli in the remaining visual field (our peripheral vision) is important for the perception of immersion, so other VR vendors are working on solutions that increase the effective FOV. One solution is to provide high-resolution display panels that are curved or tilted in such a way as to encompass more of the real FOV of the user (e.g. StarVR). Another is to exploit Fresnel lenses (e.g. the Wearality Sky), which can provide a larger effective FOV than regular lenses in a more compact package suitable for use with a smartphone. Both methods have drawbacks: larger panels increase the total cost of the ‘wrap-around’ approach, while Fresnel lenses produce ‘milky’ images and their optical effects are more difficult to model in software than those of regular lenses.
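
To give a sense of what such a distortion shader does, here is a rough sketch of the radial scaling it applies to each point, using the common polynomial model. The coefficients below are invented for illustration; the real values are lens-specific and come from the vendor SDK, as does the actual shader.

    // Rough sketch of polynomial radial (barrel) distortion. The coefficients
    // below are illustrative only; real values are lens-specific and supplied
    // by the vendor SDK.
    float[] k = { 1.0, 0.22, 0.24 };

    // (x, y) is a point in viewport coordinates centered on the lens center,
    // with the visible radius normalized to roughly 1.0.
    PVector barrelDistort(float x, float y) {
      float rSq = x * x + y * y;                         // squared distance from the lens center
      float scale = k[0] + k[1] * rSq + k[2] * rSq * rSq;
      return new PVector(x * scale, y * scale);          // points are pushed outward more near the edges
    }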

‘Smoothness’

An exceptionally important factor in the perception of immersion in virtual reality is the sense of smoothness of scene updates in response to both user movement in the real world and avatar movements in the virtual world. Perhaps the most significant bottleneck in this process is the rendering pipeline. For this reason, high-end gaming setups are the norm in the recommended system specifications for virtual reality. Builds with more than 8GB of system RAM, a processor at least on par with an Intel Core i5, and a mid- to high-range PCIe graphics card with at least 4GB of on-board VRAM are de rigueur. NVIDIA has partnered with component and system manufacturers to develop a commerce-led set of informal standards known as ‘VR Ready’.

Even if the rendering pipeline is able to deliver frames to the display at a rate and reliability sufficient for the perception of fluid motion, the motion-tracking subsystem must also provide feedback to the game at a sufficiently high rate, so that movements in the real world can be translated into movements in the virtual scene in good time. The Oculus Rift has an innovative and very high-resolution head-tracking system that fuses accelerometer, gyroscope, and magnetometer data with computer vision data from a head-tracking camera, which infers the position of an array of infrared markers in real space. Interestingly, even very smooth motions in the virtual world can induce nausea and break the perception of immersion if those motions cannot be reconciled with normal human behavior. So, for instance, in cutscene animations, care must be taken not to move the virtual viewpoint in ways that do not correspond to the constraints of human body motion. For example, rotating the viewpoint around the axis of the neck by more than 360 degrees causes disorientation and confusion.
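
The Rift’s sensor fusion is proprietary and considerably more sophisticated (it works in 3D with quaternions and uses the camera to correct drift), but the core idea of combining a fast but drifting gyroscope with a slower, drift-free reference can be sketched, for a single rotation axis, with a simple complementary filter.

    // Toy complementary filter for one rotation axis (illustration only).
    float fusedYaw = 0;     // current estimate of head yaw, in radians
    float alpha = 0.98;     // how much we trust the integrated gyroscope per step

    // gyroRate: angular velocity reported by the gyroscope, in rad/s
    // referenceYaw: a slower but drift-free estimate (e.g. from a tracking camera)
    // dt: time since the last update, in seconds
    float updateYaw(float gyroRate, float referenceYaw, float dt) {
      // Integrate the gyroscope for responsiveness, then pull the estimate
      // gently toward the absolute reference to cancel accumulated drift.
      fusedYaw = alpha * (fusedYaw + gyroRate * dt) + (1 - alpha) * referenceYaw;
      return fusedYaw;
    }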

Contextual depth cues can remedy confounding aspects of common game mechanics

Apart from those aspects of rendering that are application-invariant, VR game programming poses special problems for the maintenance of user immersion and comfort because of the visual conventions of video game user interfaces. This excellent talk by video game developer and UI designer Riho Kroll outlines some solutions to potentially problematic representations of certain popular game mechanics.

Kroll gives the example of objective markers in a first-person game, which are designed to guide the player to a target location on the map corresponding to the current game objective. Normally, objective markers are scale-invariant and unshaded, and therefore lack some of the important cues that allow the player to locate them along the virtual z-axis. Furthermore, objective markers tend to be excluded from occlusion calculations. The consequence is that if the player’s view of the spatial context of an objective marker is completely occluded by another game object, almost all of the depth-perception cues for the location of the marker are unavailable. Kroll describes an inelegant but well-implemented solution: under such conditions of extreme occlusion, an orthogonal grid describing the z-plane is superimposed, with blending, over the viewport. This grid recedes into the distance, behaving as expected according to the conventions of perspective and thereby providing a crucial and sufficient depth-perception cue in otherwise adversarial circumstances.
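
I have not seen Kroll’s implementation, but the gist of such a fallback grid is easy to sketch in Processing: a set of ground-plane lines that converge under perspective, faded in with an alpha value whenever the marker’s surroundings are occluded.

    // Guess at the idea, not Kroll's code: a reference grid on the ground plane
    // that can be blended over the scene when an objective marker is occluded.
    void drawDepthGrid(float spacing, int halfCount, float gridAlpha) {
      stroke(255, gridAlpha);              // alpha lets the grid fade in and out
      float extent = halfCount * spacing;
      for (int i = -halfCount; i <= halfCount; i++) {
        float offset = i * spacing;
        // Lines running away from the viewer converge under perspective,
        // restoring a depth cue even when the scene itself is hidden.
        line(offset, 0, -extent, offset, 0, extent);
        line(-extent, 0, offset, extent, 0, offset);
      }
    }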

Motivated Object-oriented programming: Build something from scratch!

This semester, I am putting together a 3D data visualization tool using Processing, to demonstrate the usefulness of this language for the rapid prototyping of ideas for 3D graphics applications, including virtual reality. The code, documentation, and issue tracking are now hosted on GitHub here.

If you’ve ever taken a programming course or worked through an intermediate online tutorial in a language that encourages object-oriented programming (OOP), you have probably seen the language’s features introduced by way of a motivating example (or two). You might have designed Dogs that subclass Animals, which emit ‘woof!’s; Cars with max_speeds; Persons with boolean genders, and so on…

But these toy examples are underwhelming and too straightforward to help in practice. Furthermore, they’re not all that interesting. Once you’ve completed the typical OOP introduction, you often end up with code that performs a function already implemented elsewhere a hundred times over, with greater efficiency. That in itself is not so bad: we all need pedagogical examples simple enough to introduce in a 75-minute lecture. What’s worse is that you’ve written code that you simply don’t care about, and that’s a recipe for demoralization. For the beginning programmer (yours truly), OOP becomes a convenient abstraction that bored you to death once or twice (and went ‘woof!’).

Nothing motivates design like the task of modeling a system with which you are otherwise quite familiar. Daniel Shiffman’s free online book, The Nature of Code, focuses on the simulation of natural (physical) systems and gently introduces OOP as a means of modeling “the real world” (rather than a toy example of a Car with an Engine). Of course, Shiffman’s models are simplistic too, relying on basic mechanics and vector math to animate their construction. Nevertheless, his examples and exercises leverage your best guesses about how the world works and challenge you to implement them in code, which is the nature of (many kinds of) programming: to take the big world and, in code, make a small world that — invariably imperfectly — reflects the large.

My advice, then, is to dive right in with a project that you care about, preferably a project that requires many “moving parts”, such as interdependent entities (nodes that talk to and consume others), user extensibility, and large amounts of object reuse: a model of a mini-universe of sorts. The model doesn’t have to be physical: it can be of social relationships, knowledge, or data. All it has to do is matter.

In the remainder of this post I will sketch the design of the project I am currently working on.

Project design

The goal of the software is to generate 3D data visualizations from quantitative and qualitative data ingested from a CSV file. The display of the visualization should be separate from its construction, so that ultimately different display ‘engines’ can be swapped in and out to allow for the presentation of the visualization on, for example, the computer screen, a VR headset, a smartphone, or even in the form of a 3D-printed model.

As it stands, the engine has the structure depicted below. Incidentally, there is a well-documented and ‘popular’ domain-specific language for describing the relationships between objects, the Unified Modeling Language (UML), which can generate similar-looking diagrams from code, but taking it on would require a post of its own. So the picture below is a rough approximation of the design of the engine rather than a reproducible blueprint (such as that provided by UML and the like).

There is exactly one Scene in the application, which contains a list of PrimitiveGroups, which themselves contain Primitives. A Primitive corresponds to a single data point: one row of the CSV input. A Primitive has a location in 3D space, as well as a velocity, which allows for the animated restructuring of the Primitives on the fly. Some simple primitives are included: a sphere (PrimitiveSphere), a cube (PrimitiveCube). New Primitives must subclass the Primitive class (which should never be instantiated: it is an abstract class). Primitives must have display() and update() methods. The display() method contains the calls to Processing’s draw functions (e.g. box()). At this point, you realize that Primitive should (and can) be implemented as a Java interface. After all, Processing.org is Java at base. The Scene also contains an Axis object which can be switched on or off.
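
A simplified sketch of how these classes fit together (not a verbatim copy of the repository code, which has more properties and bookkeeping) might look like this:

    // Simplified sketch of the core hierarchy. Primitive is abstract: only its
    // subclasses are ever instantiated.
    abstract class Primitive {
      PVector location;   // position of the data point in 3D space
      PVector velocity;   // allows animated restructuring on the fly

      Primitive(PVector location) {
        this.location = location;
        this.velocity = new PVector(0, 0, 0);
      }

      void update() {
        location.add(velocity);   // simple motion integration, once per frame
      }

      abstract void display();    // subclasses put their Processing draw calls here
    }

    class PrimitiveCube extends Primitive {
      float size;

      PrimitiveCube(PVector location, float size) {
        super(location);
        this.size = size;
      }

      void display() {
        pushMatrix();
        translate(location.x, location.y, location.z);
        box(size);                // one cube per data point
        popMatrix();
      }
    }

Declaring display() as an abstract method (or specifying it in a Java interface, as suggested above) forces every new kind of Primitive to say how it should be drawn.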

How does the engine generate the Primitives from the contents of the data file? And according to what rules? In many ways, this is the heart of any visualization engine. This is where the concept of a DataBinding comes in.

A DataBinding realizes a one-way mapping from the columns of a data source (i.e. kinds of data) to the properties of a Primitive, returning a PrimitiveGroup that contains one Primitive for every row in the data source (read by the DataHandler, which is a very thin wrapper around Processing’s Table object).

The mapping is specified by the contents of a DataBindingSchema, which is a hashmap (read in from a YAML file, see examples 1 2) in which the keys are the properties of Primitives and the values are column names in the data source. As a consequence, the DataBindingSchema specifies how the visual properties of Primitives respond to the data stored in the CSV file that is being read in. The DataBinding also has a validation method which throws a custom exception when the DataBindingSchema refers to column names and/or primitive properties which do not exist. It will ultimately also do type-checking.
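
In outline, and ignoring validation and type-checking for the moment, the binding step amounts to a loop like the one below. The schema keys (“x”, “y”, “z”) and the PrimitiveGroup.add() method are stand-ins for whatever the real schema and classes use, not the project’s actual names.

    // Outline of the binding step. The schema keys and PrimitiveGroup.add()
    // are placeholders for illustration, not the project's real names.
    PrimitiveGroup bind(Table table, HashMap<String, String> schema) {
      PrimitiveGroup group = new PrimitiveGroup();
      for (TableRow row : table.rows()) {
        // One Primitive per row: look up which column drives each visual property.
        float x = row.getFloat(schema.get("x"));
        float y = row.getFloat(schema.get("y"));
        float z = row.getFloat(schema.get("z"));
        group.add(new PrimitiveCube(new PVector(x, y, z), 10));
      }
      return group;
    }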


Perceptual Bases for Virtual Reality: Part 1, Audio

An important part of creating a truly immersive VR experience is the accurate representation of sounds in space to the user. If a sound source is in motion in virtual space, it stands to reason that we ought to hear the sound source moving.

One solution to this problem is to use an array of loudspeakers arranged in space around the user. This technique – so-called ‘ambisonics’ – is not only expensive but also requires space far in excess of the footprint of the average user seated at a consumer-grade computer. For example, Tod Machover’s (MIT) setup, shown below, is typical of some ambisonic setups. The 5.1 standard for surround sound in home theatres (and related extensions, such as 7.1, meaning 7 speakers plus a subwoofer) is consumer-grade technology that operates on a similar principle. Clever mixing and editing of movie soundtracks aims to trick the listener into perceiving tighter sound-image associations by cueing sounds, whose sources are apparent from the visual content being displayed or projected, at locations in the sonic field corresponding to their virtual sources.

Ambisonic sound setup with a circular array of Bowers and Wilkins loudspeakers surrounding a listener

Tod Machover’s Ambisonic Setup (Source: http://blog.bowers-wilkins.com/sound-lab/tod-machovers-ambisonic-sound-system/)

It might seem counterintuitive, but most of the psycho-acoustical cues that humans use to localize sounds in space can be replicated using headphones. This follows from the unsubtle observation that we have only two ears, and the slightly more subtle reflection on the results of experiments designed to establish precisely which sources of information our brains depend on in determining the perceived location of a sound source. This behavior is known in the related psychological literature as acoustic (or sound) source localization.

Jobbing programmers, however, don’t have to wade through the reams of scientific research that substantiate the details of the various mechanisms of acoustic source localization, as well as their limitations and contingencies. The 3D Audio Rendering and Evaluation Guidelines (Level 1) spec provides baseline requirements for a minimally convincing 3D audio rendering, along with physiological and psychological justifications for those requirements. Whilst it is now quite outdated, it still provides a useful overview of the important perceptual bases for VR audio simulation. In particular, this specification is one of the motivating documents in the design of the (erstwhile) open-source OpenAL 3D audio API and its descendants. In the remainder of this post, I briefly describe the most important binaural (i.e. stereo) audio cues thought to facilitate acoustic source localization in the human brain.

Interaural Intensity Difference

In plain terms: the intensity of the sound entering your ears will be different for each ear, depending on the location of the sound with respect to your head. This is due to two factors:

  1. sound attenuates in intensity as it travels through a medium, and because your ears are a non-zero distance apart, each ear lies at a slightly different distance from an off-center source
  2. (more significantly) your head may ‘shadow’ the source of the sound when the source is off-center

You might think that you don’t have a big head, but it’s big enough to make a difference!
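
A very crude way to fake this effect in code is to derive a per-ear gain from the source’s azimuth and distance. This is a drastic simplification (a real head-related transfer function filters different frequencies differently and accounts for head shadowing properly), but it conveys the principle.

    // Crude interaural intensity difference: equal-power panning by azimuth,
    // plus simple distance attenuation. A drastic simplification of a real
    // head-related transfer function, for illustration only.
    // azimuth: angle of the source in radians (-HALF_PI = hard left, 0 = ahead,
    // +HALF_PI = hard right); distance: meters from the head.
    float[] earGains(float azimuth, float distance) {
      float pan = map(azimuth, -HALF_PI, HALF_PI, 0, HALF_PI);
      float attenuation = 1.0 / (1.0 + distance * distance);   // simple falloff with distance
      float leftGain  = cos(pan) * attenuation;
      float rightGain = sin(pan) * attenuation;
      return new float[] { leftGain, rightGain };
    }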

Interaural Time Difference

Since sound travels at a roughly constant speed through the most common media we might wish to model virtually, the time it takes for a sound to propagate from its source to each ear differs very slightly. Our minds are sensitive to these differences, perhaps owing to the evolutionary utility of knowing the location of noisy predators (or prey). Knowing that the speed of sound is roughly constant, the mind performs a rudimentary triangulation to locate the sound source in the relevant plane, relative to the listener.
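
For a roughly spherical head, the size of this difference is commonly approximated with Woodworth’s formula, which is easy to compute; the head radius below is an assumed average.

    // Approximate interaural time difference via Woodworth's spherical-head
    // formula: ITD = (a / c) * (theta + sin(theta)), where a is the head radius,
    // c the speed of sound, and theta the source azimuth (0 = straight ahead).
    float HEAD_RADIUS = 0.0875;      // assumed average head radius, in meters
    float SPEED_OF_SOUND = 343.0;    // meters per second in air at about 20 degrees C

    float interauralTimeDifference(float azimuthRadians) {
      return (HEAD_RADIUS / SPEED_OF_SOUND) * (azimuthRadians + sin(azimuthRadians));
    }
    // A source 90 degrees to one side gives (0.0875 / 343) * (PI/2 + 1), about 0.66 ms.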

Audio-Visual Synergy

Finally, a less physiological cue: the coincidence of aural and visual stimuli tricks the brain into attributing the contemporaneous sound to the source denoted or signified by the visual stimulus. By keeping the latency between aural and visual stimuli low, we improve the likelihood of the perception of audio-visual synergy. This, in combination with careful modeling of the phenomena above (amongst many others), contributes to a more immersive aural experience. In turn, this improves the credibility of VR simulations that have an aural component.

Introducing VR and the Processing programming language

My name is Eamonn Bell. I’m a third-year Ph.D. student (GSAS) in Music Theory at Columbia University, and I have a particular interest in the application of mathematics and computational methods in music research.

I’m delighted to be working with Jennifer Brown at the Digital Science Center, at the Science and Engineering Library for the 2015/16 academic year. This semester, I will be developing a virtual reality data visualization for the Center’s VR equipment based on the music listening habits of library users. This first blog post is about the use of the Processing programming language as a tool for the rapid prototyping of VR experiences. More information about the project can be found at my website, http://www.columbia.edu/~epb2125/#!listening.md.

Press coverage of the latest virtual reality (VR) technologies is becoming unavoidable. Oculus, bolstered by the PR talents of the infamous John Carmack (legendary programmer of the 90s shooter Doom), is beginning production of its consumer head-mounted virtual reality solution.

In 2014, at an industry conference, Google introduced a low-cost and self-assembled VR headset which leverages the sensors and high-resolution displays of the now-ubiquitous smartphone, named after its construction medium: Cardboard. Only yesterday, the New York Times announced that it was partnering with Google to deliver one million Cardboard headsets to its subscribers in early November, with the intention of co-distributing immersive 3D video content to be consumed using the device (October 20, 2015).

In the space of just over a year, Cardboard will have made the jump from dorky tech conference swag to middlebrow info-/edutainment content delivery system.

In this abbreviated summary of VR tech, mention must also be made of Microsoft’s near-production HoloLens; the enigmatic but transformative Magic Leap; and this recent work from Disney Research.

In short, for better or worse, VR is hot right now. But the technical barrier to entry in terms of development is quite high.

So, what’s a graduate student with a basic grasp of a scripting language traditionally underutilized in graphics development (Python), a data visualization project, and an Oculus Rift DK2 at hand to do?

Fortunately, the Processing programming language provides a great platform for getting started with programming in 3D. Created in 2001 by Casey Reas and Benjamin Fry, Processing provides a simplified Java-like interface to a slew of static and animated 2D and 3D graphics functions in a graphical IDE.

One advantage of Processing is its minimal syntax and built-in draw routines. Most Processing sketches contain two blocks of code, the bodies of two functions: setup(), which is called once when the Processing “sketch” (project) is first run; and draw(), which is called once per frame. Changes made to the canvas in the draw() function implement animation and/or interactivity. Another nice aspect of Processing is the recent development of a JavaScript interpreter for a restricted subset of the language, which allows interactive, dynamic sketches to be embedded and run in modern browsers without the need for a Java plugin. My example code listed here (http://www.openprocessing.org/sketch/224525) shows some of the basic drawing and interactivity functionality of Processing.
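
For instance, a complete sketch that animates a spinning cube in 3D (a separate toy example from the one linked above) needs nothing more than these two functions:

    // A complete Processing sketch: setup() runs once, draw() runs every frame.
    void setup() {
      size(640, 360, P3D);                   // open a 3D-capable canvas
    }

    void draw() {
      background(32);
      lights();
      translate(width / 2, height / 2, 0);   // move the origin to the center of the canvas
      rotateY(frameCount * 0.02);            // a little more rotation each frame
      box(100);                              // draw a cube at the (translated) origin
    }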

Processing has a large community of users and developers, and a number of introductory texts have been written which step the reader through the most common features of the language, sometimes with clear applications in mind. I can recommend Fry’s now-dated Visualizing Data (http://shop.oreilly.com/product/9780596514556.do), or the excellent The Nature of Code (http://natureofcode.com/), which focuses on physics simulation in Processing.

Processing also has extensive support for 3D rendering, which I use to prototype the immersive data visualization that is the goal of this project. Since Processing is built on Java, the Java standard libraries and third-party classes can be used in Processing sketches. This opens the door to a vast number of applications beyond simple drawing and visualization, even to VR applications. This Processing library (https://github.com/kougaku/OculusRiftP5) exposes the Oculus Rift in a very straightforward way and can be used to rapidly prototype VR experiences in Processing. The example code included runs well and, though it has not been updated recently, seems stable enough. The class implemented by this library will be the basis for my first experiments with the Oculus Rift. More to come!