Cinematic Metadata
I recently had the opportunity to spend a long lunch with seminal New Media theorist and practitioner Lev Manovich, discussing all manner of topics and ideas related to New Media aesthetics.
In particular, Lev prompted me towards thinking of cinema in broader IT terms - of visual information as 'data-sets'. This idea really set my intellectual hamster among the pigeons, and the result that sprang forth was a new framework for reconsidering the taxonomy by which we discuss cinema aesthetics.
"There is a misconception we are surrendering something of art to a technology that will do it for us. That is never the case, Cinema IS technology." - Francis Ford Coppola
Whilst this statement has always been true, the digital age has made the distinction between Cinema Technology and the broader category of Information and Communication Technology (ICT) distinctly vague and arguably irrelevant.
Simply put, where once the tools and technology of cinema production were unique and specialized, they are now largely one and the same with those of ICT at large. The tools and concepts used to create and distribute cinematic media are the same ones used to build databases, control systems and all manner of other products that have nothing to do with either cinema or art.
Any Editor worth their salt in the 21st century is as well versed in RAID storage, Local Area Networks (LAN) and File Transfer Protocol (FTP) as they are in EDLs, Timecode and Video Formats.

With this in mind it stands to reason that the language by which we discuss, analyze and understand cinematic process has drawn ever closer, whether we want it to or not, to the common taxonomy and discourse of IT.
Whilst many who identify as Filmmakers or Artists will run away from this idea screaming in terror, I find I am drawn to the possibilities for re-defining what cinema is by shifting the language used to discuss and understand it.
Over more than 100 years of cinema, the core idioms by which we discuss and examine it as an artform have been derived from Painterly and Theatrical constructs - the Frame and the Stage - referred to as the Mise en scene, meaning literally to Put on the Stage or Put in the Scene.

But in the digital age of ever diversified and hybridized cinematic form, and deeper technological base, these aesthetic yardsticks of the Frame and the Stage simply cannot account for the myriad of new creative aesthetic variables and perspectives.
What we need is a new metaphor... We need a new language framework to serve new cinema forms built on an ever deepening ICT base of creative process. What's needed is a metaphor that can be applied flexibly as a functional tool of analysis; not just to cinema as it is now but also to whatever it may become in the future. To do that, such a metaphor must sit beyond the staid paradigms of the existing theatre and painting arts, else it becomes bound to their largely fixed and outmoded parameters.
Just such a metaphor may lie in the technical structure of digital media files themselves.
From a media technology perspective a Digital Media file, such as the open exchange format MXF, is made up of three parts:
- Essence, the central stream of raw visual/aural data.
- Format, the wrapper around the Essence that dictates how the Essence is moved, transmitted and delivered.
- Metadata, the information parameters that dictate how the essence is interpreted. A data repository of detail that governs the articulation of the Essence.
From this we might view cinema aesthetics in the same light, following a parallel metaphor. If we take a cinematic work, a movie (be it a shot, scene or entire film), we can similarly break it into three distinct parts:
- Essence, the raw indexical audio/visual content. The 'things' and 'events' in the scene.
- Format, the delivery medium that conveys the work (tv, theatrical, online, game, mobile) as well as the media form from which the work is shaped (video, animation, 3D, interactive, linear and so on)
- Metadata, the creative articulation of the Essence in context and concert of the Format.
So if we take for example the simple scene below we might break it down like this:



- the Essence is the pure indexical image content - a Corridor and a Running Girl; object, verb and noun.
- the Format, in this case, is Online but might easily be any number of delivery mediums each one having an impact on form, experience and mode of the work.
- the Metadata, the information that articulates the Essence; e.g. mid-shot, progressive montage sequence, dark lighting, dolly back, brooding sound with reverb etc. The Metadata is all the information that imbues the Essence with meaning, metaphor and engagement.
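To make the breakdown above concrete, here is a minimal sketch of the corridor scene as a simple data record. The class name, the field names and the metadata keys are my own illustrative choices, not part of MXF or any real media library, and the 'bright' variant is purely hypothetical.

```python
from dataclasses import dataclass, field, replace


@dataclass
class CinematicWork:
    """Hypothetical three-part record: Essence, Format and Metadata."""
    essence: tuple[str, ...]                 # raw indexical content: the 'things' and 'events'
    format: str                              # delivery medium / media form
    metadata: dict[str, str] = field(default_factory=dict)  # how the Essence is articulated


corridor_scene = CinematicWork(
    essence=("corridor", "running girl"),
    format="online",
    metadata={
        "shot": "mid-shot",
        "editing": "progressive montage sequence",
        "lighting": "dark",
        "camera_move": "dolly back",
        "sound": "brooding, with reverb",
    },
)

# Hypothetical variant: same Essence and Format, but different Metadata values.
bright_variant = replace(
    corridor_scene,
    metadata={**corridor_scene.metadata, "lighting": "bright", "sound": "upbeat"},
)
```

Both records share exactly the same Essence and Format; only the Metadata differs, and that is where the framework says the meaning of the scene is shaped.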
In the 1920s Russian film theorist Lev Kuleshov conducted what was to become the seminal experiment in cinema history - the Kuleshov experiment clearly illustrated how montage constructs context purely through sequence. The idea being that the Image unto itself is without concerted meaning and it is through sequence and arrangement that context is constructed in the mind of the viewer.

This idea can be seen clearly in the Metaphor of Essence and Metadata. The raw content is of itself largely lacking meaning and context. It is the Metadata of composition, lighting, sound, camera movement, performance that crafts cinematic meaning.
Effectively this 3-part conceptualization of Essence, Format and Metadata serves to re-pose traditional ideas of Mise en scene into a coherent triumvirate of:
Content (the Thing)
Delivery (how the Thing is conveyed)
and
Context (how the Thing is interpreted and contextualized).
Thus far this framework simply serves the same purpose and much the same conclusions as Mise en scene. However, when we add virtual cameras, 3D, layered space, compositing, interactivity, non-linearity, machinima and motion graphics - just some of the plethora of new media influences on the broad domain of cinema - the Essence/Format/Metadata framework metaphor takes on new significance and functionality as a tool of analysis. One that is able to accommodate and account for everything cinema is and could become without being trapped in the limited painterly and theatrical paradigms of what cinema has been.
Let's take for example a narrative-driven video game such as Bioshock. A grand and gloriously realized experience, Bioshock draws its narrative and stylistic building blocks from Art Deco and Art Nouveau, the nihilistic Utopian writings of Ayn Rand and the rampant historical ramifications of industrialization in the first half of the 20th century. Amid this landscape Bioshock weaves a story invested with politics, greed and megalomaniac madness - Rapture's freedom gone wrong.

If viewed through the prism of the Mise en scene we see distinct auteur choices in the muted neon colour palette, the processed sound of analogue audio technology, the filtering of vision and perspective to drive backstory. All these elements are certainly a part of the Mise en scene, but of themselves they cannot account holistically for the cinematic engagement of Bioshock. By its reliance on the Frame and the Stage, mise en scene cannot account for the Player perspective, for the fact that this perspective is infinitely variable and dynamic, or for the compositional process involved which, by virtue of free-form interactivity, is not built on fixed frame-based sensibilities at all.
But Cinema Metadata can account for these elements and variables by defining Essence, Format and Metadata for Bioshock as a moving image artwork.
Take this part of the Bioshock opening sequence:

The 'Essence' of this scene is an Undersea City of spires and towers. A slow moving Whale. A series of Glass Tube structures connecting buildings. All these are indexically represented in the scene as scenic contents.
The 'Format' has two elements - the format of production and the format of delivery. The former is that of 3D graphics and 3D modeling. The levels of the game, the rooms, corridors and spaces, are designed and built as virtual 3D architectural structures. The avatars are likewise built of polygon, vector-based graphics and photo-textures. The Format of delivery is as a computer game, viewed via a screen and manipulated by keyboard or game controller.
Subsequently the 'Metadata' is that which shapes and articulates the Essence built from the Format. In the case of this scene from Bioshock the core Metadata is obviously the First-Person perspective of viewership. From that we have various arrangement and compositional details and auteur choices: the framed view from inside the bathysphere that vignettes the scene; the determined forward movement of the player's view along a fixed path; the use of dynamic lighting filtered through the undersea water beyond the glass; and the musical score and sound design, which draw upon the period but grow darker as the bathysphere approaches its destination, introducing cries and directions from unseen people in panic and action. All this is the Metadata that creates context for the Essence content which is created and delivered via the Format.
Both Essence and Format contribute the raw ingredients for the cinematic cake that is then mixed and baked by the selected Metadata. The key element to this construct is the role of the metadata in processing the Essence and Format. Without the Metadata the work is just content and a designated technology. It's the Metadata that creates the Cinema. But likewise the Metadata is variable and the same Essence and Format but with new or different Metadata would produce an entirely alternative Cinema work.
We can see this idea in purely practical terms in common elements of contemporary cinematic production. Digital Intermediate Colour Grading and RAW format cinematography, for example, make it increasingly common for the image to be shot as a blank set of data, with the traditionally camera-based mechanical elements of Exposure, Aperture and White Balance decided after the fact in post-production. This is directly a process of acquiring an Essence and then using infinitely variable Metadata to manipulate that Essence into meaning and context.
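As a rough sketch of that idea, the snippet below treats a handful of linear RGB pixel values as the captured Essence and applies exposure and white balance afterwards as parameters. The function name, the parameter names and the sample values are assumptions made for illustration; a real RAW pipeline (demosaicing, colour matrices, tone curves) is far more involved.

```python
def grade(raw_rgb, exposure_stops=0.0, wb_gains=(1.0, 1.0, 1.0)):
    """Return a graded copy of linear RGB pixels; the raw data is left untouched."""
    gain = 2.0 ** exposure_stops  # each +1 stop doubles the linear light value
    return [
        tuple(min(1.0, channel * gain * wb) for channel, wb in zip(pixel, wb_gains))
        for pixel in raw_rgb
    ]


# The captured Essence: made-up linear pixel values in the range 0..1.
raw = [(0.10, 0.20, 0.30), (0.25, 0.25, 0.25)]

# Two different sets of Metadata applied to the same Essence after the fact.
warm_and_bright = grade(raw, exposure_stops=1.0, wb_gains=(1.2, 1.0, 0.8))
cool_and_dark = grade(raw, exposure_stops=-1.0, wb_gains=(0.8, 1.0, 1.2))
```

The `raw` list never changes; each call simply renders a different image from the same captured data.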
Numerous other technical processes of cinema employ this same fundamental Metadata idea - a 3D animated film will design complete 3D environments and then occupy those spaces with virtual cameras. The Essence is the indexical environment of content. The Metadata is the placement of the virtual cameras into that environment; a process that, unlike live-action cinema, leaves the manipulation of that Metadata open right through the process. This too can be applied to live-action where decisions about perspective, angle, placement and camera movement can be decided and enacted as Metadata manipulations after the fact of acquiring the raw image Essence.
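A minimal sketch of the virtual camera idea, assuming a simple pinhole camera that always looks down the +z axis: the environment is a fixed set of 3D points, and the camera position is a parameter that can be changed at any time. The point names, coordinates and camera positions are invented for illustration and are not drawn from any real 3D package.

```python
def project(point, camera_pos, focal_length=1.0):
    """Project a 3D point onto the image plane of a pinhole camera at camera_pos."""
    x, y, z = (p - c for p, c in zip(point, camera_pos))
    if z <= 0:
        return None  # the point is behind the camera, so it is not in frame
    return (focal_length * x / z, focal_length * y / z)


# The Essence: a fixed environment of named 3D points (hypothetical values).
environment = {
    "doorway": (0.0, 1.0, 10.0),
    "lamp": (2.0, 3.0, 6.0),
}

# The Metadata: camera placements, changeable without touching the environment.
wide_camera = (0.0, 1.0, 0.0)
close_camera = (1.0, 2.0, 4.0)

wide_view = {name: project(p, wide_camera) for name, p in environment.items()}
close_view = {name: project(p, close_camera) for name, p in environment.items()}
```

The same environment produces two different frames purely because the camera placement changed.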
Whenever there is a change in the technology of creative process there are unavoidable ramifications for visual aesthetics. We see this throughout the history of cinema. When celluloid film companies such as Eastman Kodak introduced much faster film stock in the very early 1940s, cinematographers immediately found themselves with new options, namely the ability to shoot with the same exposure from a smaller iris aperture. The result was deep focus cinema, where the small iris allows objects in both the near and far distance to remain in sharp focus; a deep depth of field. This simple technical advancement resulted in a new addition to the cinematic language.
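The arithmetic behind that trade-off can be sketched as follows. The film speeds, the 25mm lens and the 0.025mm circle of confusion are assumed figures chosen only to illustrate the relationship, not historical measurements; the hyperfocal distance uses the standard approximation H = f²/(N·c) + f.

```python
import math


def stops_gained(old_speed, new_speed):
    """Stops of sensitivity gained when film speed goes from old_speed to new_speed."""
    return math.log2(new_speed / old_speed)


def equivalent_f_number(f_number, stops):
    """f-number giving the same exposure after closing the iris by `stops` stops."""
    return f_number * math.sqrt(2) ** stops  # each stop multiplies the f-number by sqrt(2)


def hyperfocal_mm(focal_mm, f_number, coc_mm=0.025):
    """Hyperfocal distance in mm: focus here and everything from H/2 to infinity is sharp."""
    return focal_mm ** 2 / (f_number * coc_mm) + focal_mm


# Hypothetical example: a stock four times faster gains two stops.
stops = stops_gained(32, 128)
old_f, new_f = 2.8, equivalent_f_number(2.8, stops)

# A smaller iris pulls the hyperfocal distance closer, deepening the zone of sharp focus.
before = hyperfocal_mm(25, old_f)
after = hyperfocal_mm(25, new_f)
```

With these assumed numbers the lens can be stopped down from f/2.8 to roughly f/5.6 at the same exposure, and the hyperfocal distance roughly halves.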
Likewise as purely digital technologies deliver unprecedented elements to cinema - composited layers, virtual cameras and 3D environments - there is inevitable impact upon cinematic language and aesthetics. It is to this ever widening mediascape that we must apply new theoretical frameworks that can account for and articulate the meanings carved from the cinematic medium.
What Mise en scene also fails to account for in its restricted paradigm is Format of Delivery. In traditional cinema, drawn from the standard of theatrical release, this simply wasn't an issue. Delivery was predictable and singular, presented in a unified way within a defined and dedicated space; a space that for all intents and purposes was consistent from venue to venue. From a filmmaker's perspective the process of Composing did not need to consider or take into account the means and mode of Delivery as having dynamic and variable impact upon form and visual/aural aesthetics.
In the 21st century this is certainly not the case. Cinema has evolved rapidly in the latter half of its short history to become a multi-platform, scalable environment where a singular work can be processed to be delivered in any number of ways. Moreover each of these delivery modes (from the mobile phone screen to the stadium, from gaming to DVD) carries unique visual and aural aesthetics that tangibly impact upon the experience. Mise en scene fails because it has no way to account for these variables.
Cinema Metadata by contrast is a much more holistic framework that incorporates all the elements of mise en scene but applies them in a more functional analytical tool, one that accounts for acquisition, delivery and production process, all of which have tangible, direct and profound impact on the cinema aesthetic.
For too long both filmmakers and theorists have baulked at the notion of mixing cinematic language and concept with generic ICT discourse and terminology. The division at a technical level has quickly dissipated and now it's perhaps time to dispense with the division at a theoretical and analytical level as well.
Posted at 12:00AM Jan 03, 2008
by Mike Jones in moving image theory