Original English text of article that appeared in French as: Buxton, W. (2000). L'infographie en pleine évolution. Pixel, 53 Septembre/Octobre, 14-15.

The Changing Coiffure of CG

Bill Buxton
Chief Scientist
Alias|Wavefront, Inc.

Introduction

Imagine that you go to the barber and get a buzz cut. Now your hair is really short. Day-to-day there is no visible change, and yet, at a certain point, your hair is long again. You can braid it or tie it in a ponytail. But when did this happen? How did you miss the day when your hair switched from short to long, especially when you face yourself (and your hair) in the mirror every morning?

This is a lot like the computer graphics industry. We know that things are changing and that the changes occur day-to-day. Likewise, at a certain point, they have changed enough to change the nature of the beast. But unlike hair, missing when the change occurred might mean more than missing an early opportunity for a new hairstyle. It might mean losing the whole thing. (There must be an equivalent in graphics to baldness!)

So, by way of providing a kind of industrial-strength Rogaine for CG, I thought that it might be worthwhile to articulate some of the significant trends that I see from the luxury of my position as a researcher in the industry.

The Rant Before the Storm

To begin with, we should qualify where the change is, and is not, taking place.

Despite progress on some fronts, I perceive the industry as caught up in the inertia of past practice. Stand back 2 metres and look at your machine and your way of working with it. How much change has there been in the past 10 years? Are not the icons, the windows, the mouse, the keyboard, and virtually every aspect of the user interface the same as on the first Macintosh that you used, perhaps as early as 1984?

And if you look at how we think about the workflow in computer animation, for example, how much has that changed? Do we not still model, animate, light, composite and render in much the same way? How much has our basic thinking changed? Is not the basic 2D-3D workflow about the same? Do we do things the way that we do and think of them the way we do because it is the right way, or because that was the only way that we knew how to do them with computers in the '80s when the habits were established?

Machines are faster, cheaper, and there are more of them. There are also new features in the software. Nevertheless, my view is that we are not doing things the best way, we can improve on the status quo, and the resources to significantly improve things have been available for a while, and are ever improving.

What follows are a few thoughts on the nature of some of the emerging trends that will take us beyond the predictable "faster, smaller, cheaper" type of change, and lead to a new generation of tools that truly improve the state of the art for computer graphics artists.

The Quick Overview

Much of my thinking is guided by seven observations, or trends, that I see affecting the industry.

The rise of strong specialized systems and the demise of weak general purpose ones
The society of appliances
A movement from construction to assembly
A movement from "telling" to "showing"
Lessons from music: sampling as equal partner to synthesis
A redefinition of the 2D/3D relationship
Moving up and downstream in the workflow

None of these is independent of the others, and certainly my list is not complete. But we have to start somewhere. For me, an important aspect of this starting point is to remain focussed on workflow and the user. For anyone who knows me, this should not be a surprise. The only consistent thing in my life is a dogged obsession with a human-centric, rather than techno-centric, view of things. Technology is important in that it is the prosthesis which enables us to augment human skill, but it is the human skill, not the technology, which is the key.

Having finished that particular editorial, let's look at each of these in more detail.

From weak general to strong specialized systems

What I find funny is that habits and ways of working that we would never accept in our day-to-day life are seemingly accepted without question when it comes to computers. Think about your kitchen, for example. You may have gotten a food processor such as a Cuisinart for a wedding present. Nevertheless, I suspect that you do not use it as the primary appliance in preparing your daily nourishment. Rather, it is far more likely that you use specialized tools such as spoons, blenders, knives, graters, etc.

The same could be said for those really fat Swiss Army knives that you see in stores and your old aunt may have given you for your birthday when you were 16. These have integrated tools for pulling corks, opening cans, and beer bottles, not to mention a screw driver, nail file, saw, and toothpick. Nevertheless, I suspect that in your normal life you do these things with much simpler specialized devices, each of which may well be kept in a different location.

I mention this, since the design of our computers has far more in common with food processors and Swiss Army knives, than with the knives, spoons, screw drivers, etc. that see us through the rest of our lives. Up to a point, having all of this capability integrated is fine, just like that Swiss Army knife might be great on a camping trip. But is there not a huge sense of déjà vu when you are on that trip, and try to find the wood saw on that 3 cm thick knife? Doesn't it seem like the last time you tried to find that fillet function that you know is buried somewhere in the modeling toolkit of your favorite software?

My view is that computers are going to go the way of the kitchen, and we are increasingly going to find ourselves using a suite of relatively simple, specialized, powerful tools, all of which work well together, but each of which knows what it is (and is not) for. Because of this, each can have embedded into it a far deeper knowledge of the task domain and how to apply it. Hence, it will be both simpler and more powerful than the solutions that are offered today.

If you need an example, just think of your digital camera. It is a specialized computer that has light for input (as opposed to a mouse) and pixels out. Since it knows that it is a camera, it can have all kinds of knowledge about exposure, aperture and focus built into it. It is useless for doing spreadsheets, but who cares? And it is so good at what it does that you call it a camera rather than a computer, despite it likely having more power than your first desktop machine.

Finally, it is usable, respects most of the skills that you learned using your old camera, and it is (hopefully) really easy to integrate the data (viz. pictures) that it generates into other applications.

It is a law of evolution that specialized organisms win out over general purpose ones. The reason is that through specialization comes strength, while general organisms sacrifice strength for their breadth. The days of monolithic general purpose systems are numbered.

The society of appliances

A key consequence of the movement to more specialized tools is a movement away from the general purpose workstation itself. We have already seen this in the example of the digital camera. It shows how we will stop thinking of applications as different software modules that run on a common platform, and begin to think about the application as a purpose-designed device which consists of software running on specialized hardware.

Digital light meters are another example. In the not too distant future, these specialized computers will have GPS-like location sensing capability. By walking around, one will be able to construct a map of the light levels on the set. This data will then be used to set the light values in the 3D CG software. Hence, by working together, the light meter will become a remote user interface to the renderer.

As this example illustrates, not only will there be a range of specialized devices, but the power and value of each will be augmented by its cooperation and interaction with the other devices on the job.

Movement from construction to assembly

Whether you are an industrial designer or an animator, the tradition in 3D graphics is to construct your design, model or animation from scratch. The tradition is characterized by a nearly ubiquitous workflow of defining points, from which we construct lines, from which we construct patches, from which we construct surfaces, from which we construct objects, from which we construct characters, from which we construct scenes, etc.

Like traditional drawing or watercolor, computer tools tend to have a bias which assumes that you start each project with a blank sheet of paper. This is going to change.

If you are an automobile designer, for example, the point of departure could likely better be what is called a speedform, a well constructed piece of geometry that is the canonical form of a car. Hence, in this case, conceptual modeling is more an exercise in transforming than constructing. Both are creative, but the former lets one begin from a much higher level point of departure.

Likewise, in animation, there seems no need to create all of one's characters from scratch. Many of you will have seen Chris Landreth's stunning animation, Bingo. One of the most compelling things about this piece for me was the richness, depth and diversity of the different characters, especially the clown, the "balloon girl" and the poor protagonist, "he who was not Bingo." What was hopefully not obvious, yet extremely significant, was that all three of these characters were the same model, just transformed in different ways!

Companies such as Viewpoint have pioneered the provision of ready-made models to designers and animators. So there is already the core of a tradition to what I am talking about. But what I am suggesting is that what we have seen is only the tip of the iceberg.

The change will really happen when three things occur:

Highly parameterized models. Models will have versatility beyond that demonstrated by Landreth's characters. They will go way beyond mere geometry, and incorporate behavior, expression, and most significantly, high-level parameterization that enables them to be customized along dimensions that correspond to the animator's or designer's way of thinking about their "role."
Tools are designed such that they are oriented more towards the transformation and blocking of predefined characters and designs than towards constructing them from scratch.
In character animation, for example, the skills shift to being more akin to casting, makeup, and directing, rather than doing the equivalent of stitching together the DNA of the models and building them gene-by-gene, cell-by-cell.

My examples fit into instances of specialized tools as discussed in the preceding section. But be clear that the idea is more general. The computer science equivalents are "component based architectures", "object oriented programming", and "cloning rather than programming." Regardless of what you call it, across the board we are going to see far more assemblage and modification of components and presets than we see today, and this is going to have a marked impact on the tools as well as usage.

From "tell" to "show"

Having started with the hair metaphor, indulge me and consider the following. Assume that you walk into a friend's apartment and they are so taken by your great coiffure that they beg you to enlighten them as to how they too could look so good. Now I will give you two choices as to how to oblige them. You choose which you think would be most effective:

Show them: take out a comb, and redo your hair so that they can see how you did it.
Tell them: write down on a piece of paper the instructions for how to comb their hair to look like yours.

My guess is that almost everyone would choose the former rather than the latter. If I am right, then what I want you to do is this: consider how this contrasts with how you communicate most things to your computer.

Regardless of whether you are programming, modeling or doing animation, the main way that we communicate with computers is by telling them what to do. The ubiquitous graphical user interface (GUI) has taken us some distance in letting us show things, but things are still very primitive.

Consider the example of animating the wag of a dog's tail. You could tell the computer how to perform the animation by drawing an animation curve using the mouse. Alternatively, you could grab an IK handle on the tip of the tail and set the pose for each of a number of key frames. But notice that here you are really specifying the position of the tail, and only indirectly controlling its motion.

To show the computer how you want the animation to behave, the all too rarely used direct approach would be to manually "grab" the tail with the mouse and wag it the way that you want it to move, and have that recorded. What you have here could best be described as "desk-top motion capture" or "desk-top performance animation."

Consider the following question:

When computational power is sufficient to enable us to manipulate complex shaded models in real time, what is the difference between manipulation and animation? The answer is pretty simple: the only difference is whether you have the record pedal down or not.
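To make the "record pedal" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the class name ManipulationSession, its fields, and the choice of linear interpolation on playback are hypothetical and do not describe any shipping product. The point is only that the same manipulate call serves both purposes, and the recording flag alone decides whether the gesture is kept as animation.

```python
from dataclasses import dataclass, field


@dataclass
class ManipulationSession:
    """Hypothetical sketch: one manipulated value, e.g. a tail angle in degrees."""
    recording: bool = False                    # the "record pedal"
    value: float = 0.0                         # live pose, always updated
    keys: list = field(default_factory=list)   # recorded (time, value) samples

    def manipulate(self, time, value):
        # Manipulation always updates the live pose.
        self.value = value
        # It becomes animation only while the pedal is down.
        if self.recording:
            self.keys.append((time, value))

    def sample(self, time):
        """Play back the recorded gesture, linearly interpolating between samples."""
        if not self.keys:
            return self.value
        if time <= self.keys[0][0]:
            return self.keys[0][1]
        for (t0, v0), (t1, v1) in zip(self.keys, self.keys[1:]):
            if t0 <= time <= t1:
                if t1 == t0:
                    return v1
                u = (time - t0) / (t1 - t0)
                return v0 + u * (v1 - v0)
        return self.keys[-1][1]
```

Wagging the tail with the pedal up changes only the live pose. Setting recording to True and wagging again leaves behind a list of timed samples that sample() can replay.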

Faster systems need not just imply doing the same old things more quickly. They can (yet still too rarely do) enable fundamentally different ways of working at the desktop and elsewhere.

Lessons from music: sampling as equal partner to synthesis

One of the most reliable techniques that I have found to predict trends in the computer graphics industry is to simply look at what has already happened in computer music, which is where I began my career. Because music is computationally less complex than graphics, many things happen there first.

In musical terms, you can think of most computer graphics companies as synthesizer companies, since they provide tools to synthesize images using geometry. We are now at a point similar to where the music industry was in the mid-1980s, when inexpensive samplers started to emerge and forever changed music technology.

With these new devices one no longer had to make sounds from scratch. Rather, you just digitally recorded sounds from the physical world and transformed them to suit your purposes. That is exactly what we are starting to see with digital imaging.

Like music today, computer graphics will transform over the next 2-3 years such that the resource materials used will be about an equal mix between those which are synthesized and those which are sampled. Furthermore, unlike today, the graphics tools used will provide an integrated way to work with both classes of material.

Sampled materials are rich and varied, and include:

Textures: such as captured using 2D scanners and digital cameras
Objects: such as captured using 3D scanners
Animation Curves: such as captured using motion capture techniques
Camera Location: such as with match-move software
Sets and Locations: such as captured with LIDAR scanners and photogrammetry
Lighting: such as extracted using photogrammetry.

The growing importance of Image Based Rendering is a strong indicator that this move to integrating sampled data into the workflow is well under way. What is missing still is an environment that lets us smoothly integrate these resources with the more traditional synthetic ones, but this will change soon. And with that change will come the freedom for artists to more conveniently use the most appropriate materials, rather than restrict themselves to those supported by the tools.

Redefinition of the 2D/3D relationship

One of the areas where we are most rooted in the old way of doing things is in how we integrate data from a 3D CG package with 2D film and video images. This is the familiar process of compositing.

I would argue that we have now taken compositing to the level of the absurd, and we must change how we do things simply to save ourselves from being driven to insanity. The industry has gotten to the point where we consider shots with 80-150 levels of compositing as not out of the ordinary.

But what is causing us to go to such extremes? My view is that it is the result of us trying to do 3D effects using 2D technology. The way to return to sanity is simply to do 3D in 3D. The good news is that some of the sampling technologies discussed earlier now make this possible.

To begin with we now have a number of products, like MayaLive, RealViz, and 3D Equalizer, that let us sample the camera position in a live action shot and then use the data to drive a CG camera. Hence we can at least have a hybrid CG/live action matched move shot.

That is a start, but the live action image is still 2D, so there will not be proper interaction (such as obscuring or shadows) between the live action and CG elements. But what if we had a Z-buffer for each frame of the live action footage? Then, the interactions between the CG and live action could be determined in the 3D domain.
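The obscuring case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any product: the function name depth_composite is hypothetical, images are plain nested lists, and both depth maps are assumed to be registered to the same camera and expressed in the same units. Shadows are not handled; the sketch only decides, pixel by pixel, which element is in front.

```python
def depth_composite(live_rgb, live_z, cg_rgb, cg_z):
    """Per pixel, keep whichever element is nearer to the camera.

    All four arguments are row-major nested lists of the same size.
    Smaller depth means closer to the lens (an assumption of this sketch).
    """
    out = []
    for live_row, lz_row, cg_row, cz_row in zip(live_rgb, live_z, cg_rgb, cg_z):
        out.append([
            cg if cz < lz else live
            for live, lz, cg, cz in zip(live_row, lz_row, cg_row, cz_row)
        ])
    return out


# A one-row, three-pixel example. float("inf") marks pixels the CG
# element does not cover, so the live action always wins there.
live = [["L", "L", "L"]]
live_depth = [[5.0, 2.0, 5.0]]
cg = [["C", "C", "C"]]
cg_depth = [[3.0, 3.0, float("inf")]]
result = depth_composite(live, live_depth, cg, cg_depth)
```

In the example, the CG element sits 3 units from the camera. It hides the live action pixel at depth 5, is hidden by the one at depth 2, and is absent from the third pixel, with no hand-drawn matte involved.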

There are a few emerging technologies that will increasingly provide us with this depth information. These are what I previously referred to as set and location samplers. These include:

LIDAR scanners: such as those being used by Alan Lasky and his team at Panavision. These are essentially laser scanners that can scan up to 360 degrees around them, up to an effective distance of about 70 metres. The technology has now been used on a number of movie productions to gather the geometry of the location where filming was taking place. Consider the technique as a form of 3D location photography that can then be registered with the live action footage in post production.
Photogrammetry: this is a form of computer vision. What it does is compare views of the same location from different viewpoints. By knowing the relative position of each viewpoint, the technology can use differences in the views to infer the geometry of the scene. This data can come from separate still photographs or, if the film camera is moving, sometimes from the live action footage itself. An example of the former is Modelmaker from RealViz. An example of the latter is available from Synapix.
Z-Cam: this is a technology available from a company called 3DV Systems. It is an adapter to a professional video camera that captures a second, registered gray-scale video image where the level of gray indicates the distance of that pixel from the lens. Thus, the Z-Cam provides, in real time, a pixel-by-pixel depth value for the video image. This enables a range of real-time effects, many of which previously required blue-screen, but can now be done by yon clipping, using the depth map.
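For readers unfamiliar with the term, yon clipping simply discards everything beyond a far plane. The Python sketch below shows how a depth map could replace a blue screen in that way. It is a hedged illustration only: depth_key is a hypothetical name, and the depth map is assumed to have already been converted from gray levels to distances, since how any particular camera encodes distance is not specified here.

```python
def depth_key(video_rgb, depth, yon, background_rgb):
    """Keep video pixels at or nearer than `yon`; replace the rest.

    `video_rgb`, `depth` and `background_rgb` are row-major nested lists
    of the same size. `depth` and `yon` share the same units (assumed).
    """
    return [
        [fg if z <= yon else bg for fg, z, bg in zip(fg_row, z_row, bg_row)]
        for fg_row, z_row, bg_row in zip(video_rgb, depth, background_rgb)
    ]


# The actor stands 2 units from the lens and the studio wall 9 units away.
# With the yon plane at 4 units, the wall is replaced by the new background.
keyed = depth_key(
    [["actor", "wall"]],
    [[2.0, 9.0]],
    4.0,
    [["sky", "sky"]],
)
```

Unlike a chroma key, the decision depends only on distance, so the colour of the wall and of the actor's clothing is irrelevant.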

These technologies are just the beginning, but they give a good indication of what is to come. As a result, the days of mindless rotoscoping and absurd numbers of layers in compositing will soon be over, and digital film making will reflect the fact that the world is not flat.

Moving up and downstream in the workflow

Behind everything that I have been saying is an emphasis on usage and workflow rather than technology, per se. In human terms, my characterization of what the major changes are going to be has to do with who is doing what, where and when. That is, the changes are mainly sociological and contextual.

In film making, for example, what this means is that much of the work will move out of post production houses where it currently resides and be done on set. The "workstation" (actually, to be consistent with our society of appliances, I should say "one of our workstations") will be our movie camera itself. Furthermore, the user will be the cinematographer, not one of the VFX team, and due to the computational power of the digital camera, we will have a return to "in camera" effects. In effect, at least at previsualization resolution, we will get to the point of "what you see is what you get" (WYSIWYG) cinematography. As a consequence, creative decision making will be largely returned to the director and the director of photography, and will take place on location when there is still a chance to correct mistakes.

Likewise, in computer aided industrial design (CAID) and computer aided design (CAD), we will see computer graphics being used further and further downstream. Up to the present, CG has been used mainly to create digital assets, such as 3D models. The new generation of digital tools will address the questions of "Why did you create that model?", "Who needs to see it?", "What are they going to do with it?" and "Where is all of this best done?". Hence, this will result in a broad new set of tools for visualization, supporting design reviews, and product marketing and retailing.

Fundamental in all such cases is the notion of providing the right solution tailored for the right person in the right form at the right place at the right time, which finally recognizes that the activities that we are trying to support have different people, locations, and goals associated with them.

Conclusions

It is always a risky thing to predict the future. While I do not mind taking a risk from time to time, writing this article is not one of them. The reason is perhaps best explained by a quote attributed to William Gibson:
The future is already here. It is just not uniformly distributed.

The reason that I feel so confident in what I am telling you is that, for the most part, this is a future in which I live. I live it at a primitive level, but I live it just the same. The best thing about it is that it is a future where the artist, not the technology, reigns supreme. And it is a future that we, as a company that builds tools for artists, are proud to help bring to fruition. Finally, it is a future that takes us from tools to the art that is made with them. Now that is something worth working for.