Abstract
Photographs are flat objects, but we have no difficulty perceiving spatial depth in them. What makes this possible are embedded perspectival cues in the form of changes in color and brightness. There is a rich history of thought about perspective in the context of painting and drawing, as well as, more recently, computer graphics and machine vision. The relevant photographic thinking, by contrast, is often confused and ill-informed. Revisiting the Renaissance discovery of perspective nearly six hundred years ago may help improve the situation. A simple idea from Leon Battista Alberti's essay on painting from 1435–36 can be developed into an intuitive understanding of how photographs depict space and of the effects of different lenses and camera angles. The idea also serves to frame the problem of tonal reproduction, i.e. how to translate the rich colors and tones of the world into the restricted palette of a print or a computer screen. Digital image processing offers powerful tools to shape this translation, with important consequences for perceived depth.
Prologue: Parallax
How is it that our surroundings look 3-dimensional to us? Much of it comes down to binocular depth perception or stereopsis. Hold your head still, close one eye, and much of your visual sense of space is lost. Two eyes provide two vantage points from which the world looks ever so slightly different. The displacement of features from one retinal image to the other is called parallax. Your visual system can measure the parallax and reconstruct spatial relationships from it, in effect triangulating the lay of the land in front of you.
Hold out a hand and look at it first with one eye closed, then with the other eye closed. You see it shift sideways relative to the background. The direction and size of the shift enables your brain to compute a spatial representation of the scene before your eyes.
Parallax makes stereo photography possible. Two photographs, taken from two sideways displaced vantage points, are served up separately to each eye in order to create the impression of a world seen through eyes placed at these vantage points. The stereo pair below is prepared for cross-eyed viewing. You look at it with your eyes crossed so that the left eye looks at the right photograph and the right eye looks at the left photograph. Your eyes are toed in correctly when you see three images float side by side. The one in the middle will gradually gain depth, at which point you can begin to relax and scan it. It may help to hold up a finger in such a way that, when seen with the right eye, it points at the left photo, and when seen with the left eye, it points at the right photo. Focus on the finger, then let your gaze drift to the image floating behind it.
In stereopsis, we experience parallax simultaneously. Our visual system is also able, to some degree, to process parallax information that is presented sequentially. The impression of space is less palpable than with simultaneous parallax, but it is there. You can verify this by moving your head side-to-side with one eye closed. The movie industry understood this early on. It took the camera off the tripod and put it on moving dollies, booms, cars, helicopters, drones, etc. The spatial impression from these devices can be very strong, so strong, in fact, that true 3D-photography doesn’t add enough extra pop to justify the expense. This, I suspect, is the main reason why 3D-films never really caught on.
A single photograph contains no parallax information. So it should be a poor device for conveying space. But this is not the case. Even a flat photograph can convey a sense of depth, albeit less vivid than a stereo pair. How is this possible? The answer lies in a whole range of cues: geometry, shading, occlusion, texture gradients, depth-of-field, fading colors, etc. Many, perhaps all, of these cues can be gathered under the rubric of perspective.
Perspective
Perspective was (re)discovered in the early 1400s by Filippo Brunelleschi in Florence and codified into a set of drawing techniques by Leon Battista Alberti. Everything you need to know about photographic perspective flows from Alberti’s insight that realistic depiction of space is like painting on a window, and that viewing the result is like looking through that window from the painter's point of view:
A “truthful” picture (Alberti’s term) is like a window on the world, and such a picture can be made by tracing the view of the world as if on a windowpane.
It’s there already in the name: perspicere, to look through, from per- ‘through’ + specere ‘to look’. Alberti’s window is not just a clever metaphor. It is a precise model of how to construct 2D-pictures that, under the right viewing conditions, are indistinguishable from the 3D-configurations they represent. Geometrically speaking, Alberti’s eye-window-world arrangement is a linear projection in which the window defines an area on the flat projection or image plane and the eye of the painter or viewer is the so-called “central” or projection point.
In recent years, “central point” perspective has come to epitomize Western “hegemonic” business, for reasons that I cannot fathom. The “central” point here isn't even central, unlike in photography, where it lies between the object and the image plane. To avoid distraction, I will mostly use the less charged expression “projection point” in what follows.
In Alberti’s setup, the image of a point O on the far side of the image plane is the intersection of the image plane with the sightline that connects O to the projection point, where the painter's eye is located. The image of an object is the set of images of the points that constitute the object and are visible from the projection point. (This needs a little refining to deal with partially transparent objects, but it will do for now.) Because object points and corresponding image points lie on the same sightlines when the image is viewed with the eye placed at the projection point, the image will be perfectly aligned with the object world, hence “truthful”. This holds whether or not the projection plane is flat.
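For readers who like to see the geometry in code, here is a minimal sketch of the projection rule just described, assuming the eye sits at the origin and Alberti's window is the plane z = d (the coordinates and names are mine, not Alberti's):

```python
# A minimal sketch of Alberti's projection, assuming the eye sits at the
# origin and the window is the plane z = d.

def project(point, d=1.0):
    """Image of an object point: intersect the sightline from the eye
    (at the origin) to the point with the window plane z = d."""
    x, y, z = point
    if z <= d:
        raise ValueError("object point must lie beyond the window")
    s = d / z                    # similar-triangles scale factor
    return (x * s, y * s)        # 2D coordinates on the window

# A point twice as far away is imaged at half the scale:
print(project((1.0, 2.0, 4.0)))  # (0.25, 0.5)
print(project((1.0, 2.0, 8.0)))  # (0.125, 0.25)
```

The scale factor d/z is just the similar-triangles ratio that reappears in the next paragraph.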
If a camera is positioned with its lens at the projection point and its sensor parallel to Alberti’s window, it will produce a photograph that is similar to the image on the window in the geometric sense that the shapes in the two images are the same, though other properties like color or scale may be different. If you want to verify this, think about similar triangles spanned by the sightlines. (Talk about lens location is a bit loose. What counts is the location of the center of the entrance pupil of the lens; ditto for the painter's and viewer's eyes.)
The similarity between photograph and painting means that Alberti’s perspectival projection shares all its geometric properties with photography. The next few paragraphs will lay out some basic facts about the projection of straight lines and planes. If you don't like geometry, just skim over the details.
Straight lines are imaged as straight lines except when they go through the projection point, in which case they are imaged as points. This can be gleaned from the diagram below. The straight line L, together with the projection point P not on L, defines a flat plane, G (green). The (dotted) sightlines from P to any point on L all lie in G. Therefore L’s image, consisting of the intersection points of the sightlines with the image plane, also lies in G, namely at the place where G and the image plane intersect. And the place where two (flat) planes intersect is always a straight line. In the special case where L goes through the projection point, the image is the single point where L intersects the image plane.
As a straight line recedes from the image plane, its image approaches a “vanishing” point beyond which it does not extend. The image lies between the vanishing point and the line’s intersection with the image plane. In the diagram, L recedes from the image plane towards the right. We draw a line parallel to L through P, which lies in plane G and intersects the image plane at some point V. The image L' of L lies between V and Q, the point where L intersects the image plane. The bounds exist because sightlines must first pass through the image plane before they meet L, and no sightline passing through the image plane to the right of V or to the left of Q meets L behind the image plane. If L is infinitely long in both directions, then its image is the entire line VQ; if L is a finite line segment behind the image plane, then its image is a segment of VQ. If L goes through the projection point, V and Q coincide and the line VQ is reduced to a point.
Vanishing points become interesting when more than one line is involved. The image of a family of straight lines that are parallel to each other but not parallel to the image plane is a family of straight lines that converge on the same vanishing point. This is because the property of L that determines the vanishing point is L’s direction, not its location in space: we found the vanishing point by drawing a line through P in the same direction as L. Any line L* parallel to L has the same direction and therefore the same vanishing point. You may think of generating the image of L* by rotating the green plane around the line PV until it meets L*.
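To make the convergence concrete, here is a short continuation of the sketch above (it reuses the project() helper): three lines sharing the direction (1, 0, 2) but starting from different base points are traced far out along their length, and their images all approach the same point.

```python
# Three parallel lines, receding from the window, traced far out.

def point_on_line(base, direction, t):
    bx, by, bz = base
    dx, dy, dz = direction
    return (bx + t * dx, by + t * dy, bz + t * dz)

direction = (1.0, 0.0, 2.0)
for base in [(0.0, 0.0, 3.0), (0.0, 1.0, 3.0), (2.0, -1.0, 5.0)]:
    print(project(point_on_line(base, direction, t=1e6)))
# All three print (0.5, 0.0) to within rounding: the common vanishing
# point (dx/dz * d, dy/dz * d) given by the line through P.
```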
The floor corrugations in this photograph converge on a vanishing point on the horizon, but columns and girders do not converge. They represent the special case of families of straight lines that are parallel, not only to each other, but also to the image plane. Such families are imaged as families of parallel straight lines. Pick a line L parallel to the image plane and again draw a line A parallel to L through the projection point P. A does not intersect the image plane, so there is no vanishing point. But the plane AL, defined by A and L, does intersect the image plane, and the image L' of L will lie on this intersection. (In the limiting case where AL is parallel to the image plane, the distance between L and A is infinite and there is no image.) L' and A must be parallel. If they weren’t, then A would intersect the image plane, and so would L, in virtue of being parallel to A, but this contradicts the assumption that L is parallel to the image plane. For any other line L* parallel to L, the argument is exactly the same: its image L*' is parallel to A, and thus all images are parallel to each other.
To conclude this excursion into geometry, I want to mention without proof a couple of important facts about planes that echo the facts about lines that we just established. First, the vanishing points for coplanar lines (lines that lie in the same plane in the object space) are located on the same image line. This image line is called the vanishing line for the plane in question. The most prominent example is the horizon line. The horizon line in an image is the locus of the vanishing points for all lines in the ground plane that aren't parallel to the image plane and thus have vanishing points. (The ground plane is the plane on which the painter stands.) The horizon line is the vanishing line for the ground plane. In fact, the horizon line is also the vanishing line for all planes parallel to the ground plane. More generally, for any family of parallel planes that are not parallel to the image plane, there is a common vanishing line whose location in the image is determined by one distinguished member of the family, namely the plane that contains the projection point. The vanishing line for the entire family is the intersection of this distinguished plane with the image plane, just as the vanishing point for a family of parallel lines was the intersection with the image plane of a distinguished member of that family, namely the line that runs through the projection point.
One often encounters talk about one-, two-, and three-point perspective, as if there were significant distinctions to be made between kinds of central point perspective based on the number of vanishing points. (What gets bandied about as four-, five-, or six-point perspective involves altogether different kinds of projection from Alberti's.) This tripartite classification is really a classification of drawing techniques for certain simple spatial arrangements, not a typology of perspective. As a typology of perspective, it would be woefully incomplete because there is no upper limit on the number of vanishing points. There are as many vanishing points in the image plane as there are families of parallel lines in the object space that are not parallel to each other or to the image plane. The following photograph has no fewer than seven vanishing points.
If we placed a cube on the floor of the building in such a way that its sides aren’t aligned with the walls, we would bring the number of vanishing points to nine. And so on. The number of vanishing points tells us something about the depicted scene, not about the manner of depiction.
Alberti and others devised techniques for producing perspectival images that combine geometric construction with physical machinery. Some of it gets rather involved. One may wonder if the engraving on the right from Albrecht Dürer's Unterweysung der Messung (Measurement Tutorial) from 1525 was meant to illustrate a practical method or to embellish the logic of projection.
You’ll notice that the subject in Dürer’s engraving is treated as if it were made entirely of outlines: a wireframe model whose image is a set of points connected by lines. But of course the visual world is not a wireframe model, and paintings aren't line drawings. What is missing is color and tone.
Our model of projection is an idealization with distinct theoretical advantages, but here we confront one of its shortcomings. The model deals with points and lines in space and their images – points and lines again – in the projection plane. But points and lines lack extension and therefore color. What we see and paint and photograph are not mathematical points but tiny yet still extended colored flecks. To conceptualize this, we need to replace sightlines with steep visual pyramids. These pyramids, with their apexes at the projection point, intersect the image plane in square pixels and the visible surfaces of objects in messy cross-sections.
Working this out in detail is tedious and doesn’t matter here. What matters is that a conceptual shift from points to pixels is a natural way to incorporate tonal reproduction into the treatment of perspective. Tonal reproduction is the translation of the subject's colors into the colors of the painting (or representation, more generally). If a painting is supposed to be like a window onto the world, then it needs to get not only the world's geometry right but also the world's colors and tones—and not as they are (whatever that means, exactly), but as they appear from the painter's point of view.
There is dramatic tonal variation in the concrete floor in this photograph, from dark brown to almost white. The real floor is greyish brown, but how it looks depends very much on the point of view. It acts as a dull mirror, and mirrors show different things from different angles. For example, the highlights move across the floor as you wander around the space (looking at the real floor, not at the photograph). All surfaces are like that, to a greater or lesser extent. Therefore color and tone are a matter of the projection point and thus a matter of perspective. So-called “aerial perspective” – a brightening of colors and a shift towards blue with increasing distance, caused by atmospheric light scattering – becomes part of perspective proper. I will have more to say about tonal reproduction towards the end of this piece. (Homework assignment: generalize to reproduction with a restricted color palette, with black-and-white as the limiting case.)
Let's return to the techniques and devices proposed for constructing perspectives. In Dürer’s engraving, we got a glimpse of some of the complications involved. There is an elegant shortcut that avoids all the complications, at least in principle: the camera obscura (literally, dark chamber, which, by the way, goes back at least to the 11th century Egyptian scholar Alhazen).
A small hole projects an upside-down image on the back of the dark chamber which is then traced by the painter.
The camera obscura offers an elegant solution to the problem of perspectival representation—at least in principle. In practice, things are less elegant. The camera is called “obscura” for a reason: inside of it, it is very dark, so dark, in fact, that tracing outlines is a challenge, and getting colors right is nigh impossible. You’d have to mix your pigments in the dark and paint each shade of color under very dim light of exactly that same color. This really messes with your judgment. Much better to evict the painter from the camera and instead place light-sensitive material at the back. This move of course had to wait for the invention of the right chemistry in the 19th century.
The image recorded at the back of a photographic camera is an upside-down version of Alberti’s window. In order to see it properly, we need to turn it right-side up, switch it from negative to positive (unless we use slide or Polaroid film), and place our eye in the correct viewing position, like so:

You may think of what is going on here as running the camera in reverse, turning it into a projector. We project the image from the back of the camera through the hole or lens outwards onto a screen. When one looks at that projected image from the point where the light enters the camera, which is the central point of the projection, then the image is exactly superimposed onto the world: the semblance of truth is as good as it gets. (You will have to be clever to avoid blocking the projection with your head when you peek through the hole.)
I have glossed over the role of the photographic lens in all this. This is because the lens has no bearing whatsoever on perspective. All it does is project a brighter and sharper image than the image projected by a little hole. Some lenses project larger image circles that include more subject matter, others project smaller image circles that include less subject matter. But their images all match up perfectly when scaled to the same magnification (provided they are focused on the same object plane and exposed through the same size lens opening—not f-stop).

For the painter inside the camera obscura, a lens trades one problem—darkness—for another: focus. In order to let in the maximum amount of light, the lens opening has to be quite large. This means that only a thin object plane will be in focus at any given time because depth of field is inversely related to aperture. To paint the out-of-focus parts of the scene, the painter has to refocus the camera. This can be done by moving the back of the camera, which changes the magnification of the image. Or it can be done by moving the lens, which changes the projection point and hence the perspective. A headache either way, and something to ponder in the debate about whether painters like Vermeer actually used lenses in this way.
More on the correct viewpoint
Alberti’s idea implies that, in order for a perspectival picture to look right, the viewer’s eye must be in the same place relative to the picture as the camera lens was relative to the film or the digital sensor. He writes: “Know that a painted thing can never appear truthful where there is not a definite distance for seeing it” (De Pictura, Book 1).
When you take a photo or draw a perspective, you know where the correct viewpoint is because you got to choose it. But what if you are handed a picture for which you don’t know the viewpoint?
One option is to go to the original site and superimpose the picture on it until it blends in seamlessly. There is a whole cottage industry devoted to this.
If you are sufficiently obsessive, you can fine-tune the alignment and mix older and newer photography:
Without the benefit of the original site behind the picture, we can often still determine the right viewpoint experimentally by closing one eye and moving our head around in front of the picture until it looks spatially most compelling. Closing one eye is essential. With both eyes open, there is a conflict between stereopsis telling us that we are looking at a flat surface and perspective cues suggesting depth. Closing one eye turns off the conflicting stereoptic information.
I highly recommend this exercise. You can perform it with images on a reasonably sized computer screen or with prints in books or magazines. But the real fun is with large pictures in a gallery or a museum. It doesn't work with your phone unless you are extremely near-sighted because you cannot see the screen in focus from up close enough.
In some cases, the viewpoint can be constructed geometrically. The construction is always based on certain plausible but unproven assumptions. This is different from stereopsis where depth information can be extracted from the visual input without any further assumptions. (Up to a point. When you move your head around in front of the stereogram in the prologue, the bridge will appear to follow you. The depth information remains ambiguous without fixed eye positions.)
What kinds of assumptions have to be made? The simplest one is that certain lines in the image that converge on a point represent parallel lines in the world, i.e. that the point is their common vanishing point. Another assumption is that certain angles in the image represent right angles in the world. These kinds of assumption are a robust guide to the architectural world surrounding us, which is full of rectilinear structure.
Let’s try it for Panini’s painting of the Pantheon. Certain lines defined by the floor pattern converge on a point V. On the assumption that the floor pattern is a flat rectangular grid, V is their vanishing point. By implication, E is the vanishing point for diagonals drawn on the floor grid. On the further assumption that the grid is square and at right angles to the image plane, the situation looks like this from above:
We can run the earlier vanishing point construction in reverse. Earlier, we found the desired vanishing points on ancillary lines drawn through the projection point. Now we find the desired projection point on the same lines drawn through the vanishing points. Like so: we draw a parallel to the floor orthogonals through V and a parallel to the floor diagonals through E. These lines must pass through the projection point P which is thus found at their intersection. In the resulting triangle PVE, the angles at P and E are equal, and so the sides PV and VE are equal. The viewpoint is therefore at distance |VE| from the painting, on the perpendicular through V.
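If you prefer to verify the construction by computer rather than straightedge, it collapses to almost nothing. In this sketch, V and E are assumed to have been measured on the canvas; the numbers are invented for illustration:

```python
# The viewpoint construction for the Panini example, in code.
# V: vanishing point of the floor orthogonals; E: vanishing point of the
# floor diagonals; both measured on the canvas (made-up numbers below).

import math

def viewpoint(V, E):
    """Viewpoint sits |VE| in front of V, on the perpendicular through V."""
    dist = math.hypot(E[0] - V[0], E[1] - V[1])   # |VE|
    return (V[0], V[1], dist)                      # x, y on canvas; z in front

print(viewpoint(V=(0.0, 0.0), E=(1.3, 0.0)))       # eye 1.3 canvas units out
```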
Putting your eye at the right viewpoint would be quite uncomfortable. You would be so close to the painting that you could not make out its upper parts without tilting your head. But as soon as you tilt your head, you will lose the viewpoint. I will return to this observation shortly when I talk about wide-angle lenses.
The viewpoint for Cranach’s Melancholy Allegory is actually to the right of the frame, a hint that the painting at one point probably extended much further to the right and was subsequently cropped. The painting is now hung so high that one cannot put one’s eye in the right place. Sadly, this happens all too often in museums. For example, Canaletto’s vedute of Venice are almost always hung so high that one cannot appreciate their hypnotic spatiality. Besides, when paintings are high up on the wall, they tend to be overpowered by reflections from overhead light. Even paintings hung at the right height are all too often impossible to view from the right vantage point because of reflections. I, for one, am mystified by this common disregard for viewing conditions in the very temples of art.
The assumptions underlying the viewpoint reconstruction for the Panini painting would be of little help if our built environment looked like this:
For Renaissance artists, depiction of space and architectural regularity went hand in hand. It is tempting to infer from this that architectural regularity is required to sustain depth in pictures. But this inference would be mistaken, as the photograph of the Earth House demonstrates. What is true is that constructing a perspectival view of the idealized Renaissance city by geometric means is quite straightforward, compared to the tedious work required to construct a perspective of the Earth House. And determining the correct viewpoint for the Earth House photograph is difficult as well without the help of vanishing points and the like.
The simple assumptions underlying the reconstruction in the Panini example and others like it are a special case and not universally applicable. But where they are applicable, they can be very powerful, so powerful that they override other, contravening depth cues and give rise to strong optical illusions. A famous example is the Ames room which exploits our proclivity to perceive right angles and parallel lines where in reality there aren't any:
There are many other delightful ways in which we are led astray by setups that defy our assumptions about their geometry. The Japanese mathematician and artist Kokichi Sugihara has turned this into an art form:
Notice how motion parallax in Sugihara's video enables an intuitive grasp of space. The illusions rub in the point that inferences about space based on observation from a single viewpoint always go out on a limb. The pièce de résistance in this regard is Bernard Pras’s anamorphic sculpture of postman Ferdinand Cheval:
Wide-angle and telephoto lenses
Being clear about viewpoints can resolve a lot of confusion in the photography world about how different lenses affect perspective. Wide-angle lenses are said to introduce “perspective distortion”, telephoto lenses are said to “collapse perspective” or “foreshorten distances”. What people have in mind is the following.
The building wings in this telephoto image look “telescoped together”; it’s hard to tell how deep the space between them is.
A wide-angle lens appears to exaggerate depth and distort objects near the edges of the frame. The red oblong table in the foreground is circular in real life.
A wide-angle example bordering on the grotesque is a White House photograph in which President and Mrs. Biden appear to dwarf President and Mrs. Carter. The New York Times, upon consulting with experts, attributes the effect to the “perspective distortion” inherent in wide-angle lenses.
The clout of the Times notwithstanding, what we see here are not distortions of perspective brought on by certain types of lenses, for the good reason that such distortions do not exist. This is not to say that nothing noteworthy is going on, it’s just to say that lens properties aren't the issue. Lenses do not affect perspective unless they are badly flawed or of an unusual design, e.g. fisheye or anamorphic. Therefore there cannot be any wide-angle distortion or telephoto collapse of perspective. What lenses of different focal lengths do is this: they draw dimmer or brighter, smaller or larger, more or less inclusive circular images on the projection plane. That's all there is to it. Aside from differences in brightness and scale, all lenses depict the world in the same way. They just take in a wider or narrower view. A wide-angle lens draws a circle whose center portion exactly matches the circle drawn by a telephoto lens; the photo taken with the telephoto lens is simply a small part of the photo taken with the wide-angle lens.
To illustrate, here is a wide-angle shot.
And here is a moderate telephoto shot from the same camera position. Except for scale, it is indistinguishable from the central portion of the wide-angle shot.
Actually, that was a lie. The second photo is not a telephoto shot; it is a crop from the wide-angle shot. But a telephoto shot would look identical. If I showed it to you, you wouldn’t know the difference, which is why I didn't bother.
Don’t take my word for it. Do the experiment. Photograph a scene with a wide-angle and a telephoto lens from the same vantage point. Then resize and overlay one shot on the other.
The exercise raises the obvious question: what is a telephoto lens good for if we can get the same effect by cropping a wide-angle photograph? The answer is resolution. The more we crop, the less of the information captured by the film or sensor we retain, and soon we reach the limit of acceptable resolution. A telephoto lens remedies the situation by projecting a narrow view of the world onto an image circle as large as that of the wide-angle lens, thus filling the entire sensor instead of just a small part of it with the relevant information.
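The arithmetic is sobering. Here is a back-of-the-envelope sketch with illustrative numbers (a 24-megapixel sensor, a 24mm wide-angle, a 200mm telephoto) rather than figures for any particular camera:

```python
# Cropping a wide shot down to a long lens's view keeps the fraction
# (f_wide / f_tele) of the frame in each linear dimension, hence its
# square in pixel count. Numbers are illustrative.

def megapixels_after_crop(megapixels, f_wide, f_tele):
    return megapixels * (f_wide / f_tele) ** 2

print(megapixels_after_crop(24, f_wide=24, f_tele=200))   # ~0.35 MP
# A 200mm view cropped from a 24mm frame retains a third of a megapixel,
# which is why the big lens earns its keep.
```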
We choose a lens for a given sensor format depending on the angle of view we want to capture. And then we present the images captured with different lenses at the same output size and look at them from the same comfortable medium distance, forgetting that their viewpoints are at very different distances, one far removed, the other uncomfortably close. From a medium viewing distance, the telephoto lens indeed seems to compress depth, and the wide-angle seems to exaggerate depth and squeeze objects near the edges. But these impressions are entirely the result of inappropriate viewing. When each image is viewed from the correct vantage point, it does not show any perspectival distortion.
A painting from the 15th century is very instructive in this context:
Many art historians consider Christ's feet too small. They reckon that Mantegna departed from perspective to make the feet “look right”. But are they really too small? From close up, it surely looks that way. But that’s the least of Mantegna’s problems: from close up, the entire body looks ill-proportioned. As we move away, however, things begin to look more proportionate.
From somewhere around 30 times the image width, the body parts fall into place, the figure looks natural. The way to convince yourself of this is not by shrinking the painting, because it just gets too small, but by photographing a real person from far away and comparing the photograph to the painting.
The comparison suggests that Mantegna painted the figure with the narrow angle of view of a super-telephoto lens. He did this a century before the invention of the telescope. How did he do it? I have no idea. It is like painting your tennis partner from the other end of the court, for which you need a hawk's vision. And why would Mantegna have done this when it makes the painting look strange from any more reasonable viewing distance?
There is another puzzle, to do with the marble slab on which the body lies. From afar, the slab sticks up way too much for a horizontal object the length of Christ. If the slab is indeed rectangular, horizontal (we ignore its slight downslope to the right), and twice as long as it is wide, then it is painted with a very different perspective from the figure, one that induces a viewing distance of only 3.5 times the image width. As my friend Richard Holton points out, the perspective of the women lamenters on the left appears to match that of the slab.
The painting was found in Mantegna’s studio after his death, and nobody knows how he would have displayed it. In some of his other works, Mantegna chose quite unusual viewpoints, some even outside the frame, but never anything this radical. With those other works, which are frescoes, the unusual viewpoints match the natural position of the viewer. Perhaps he was looking to express some idea of divine remoteness in the Lamentation?
So much for telephoto effects. Before I say more about wide-angle “distortion”, let me address what lies between wide-angles and telephotos.
The idea of the normal lens
Before zoom lenses came of age, camera bodies were often sold in conjunction with a “normal” lens. A normal lens is conventionally understood as having a fixed focal length about equal to the diagonal of the film or sensor onto which it projects. The diagonal of 35mm film and so-called full-frame sensors is 43mm. The most common normal lens happens to be a bit longer, 50mm, a discrepancy that continues to irk many photographers. But what is normal about such a lens in the first place, and what does it have to do with perspective?
The answer that is repeated ad nauseam is that a normal lens produces an image that roughly matches what the human eye sees and therefore makes for natural viewing. Is this true?
This is a sketch drawn by the physicist Ernst Mach at the end of the 19th century (the Mach familiar from airspeed measurement in terms of the speed of sound). The sketch shows Mach’s study as seen out of his left eye, with his nose framing the image on the right side.
I reclined on my couch at home and took a photo in the direction of my feet with a normal lens. Here is part of that photo overlaid on Mach's sketch, with me stepping into his right shoe to make sure I get the scale right.
And here is the entire frame captured by the normal lens.
As you can see, the frame includes only a fraction of what the eye sees. The normal (50mm-equivalent) lens covers an angle of about 46 degrees, measured diagonally across the frame. The angle of view of the human eye is on the order of 210 degrees without moving the fixation point.
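These figures follow from the standard pinhole formula: the angle of view is twice the arctangent of half the diagonal over the focal length. A quick sketch:

```python
# Diagonal angle of view in the thin-lens approximation, focus at infinity.

import math

def angle_of_view(focal_mm, diagonal_mm=43.3):   # full-frame diagonal
    return math.degrees(2 * math.atan(diagonal_mm / (2 * focal_mm)))

print(round(angle_of_view(50), 1))    # 46.8 -- the "normal" lens
print(round(angle_of_view(14), 1))    # 114.2 -- the 14mm wide-angle used later
```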
You might say: the claim about normal lenses was meant to be restricted to our sharp central vision. But that cannot be right either. The angle of foveal sharp vision is only 1.5–2 degrees. The angle of macular vision, where things are still somewhat sharp, is only about 17 degrees. What makes a lens normal must therefore be something else.
A normal lens, I submit, is simply one that follows the lead of the lens designed in the early 1920s by Max Berek for the first commercially available “miniature” camera, the Leica I. The camera was such a success that its lens set the standard for subsequent 35mm cameras and other formats as well. Berek’s choice of 50mm had, I suspect, nothing to do with musings about how the eye sees and everything to do with optical and mechanical design constraints. One motivation behind the development of the Leica was that it would be used for test exposures of 35mm motion picture stock. The usable area between the perforations of 35mm film is about 24mm wide. In a movie camera running the film vertically, this defines the width of the frame; the frame height at the time was 18mm. Oskar Barnak, the Leica's designer, decided to run the film horizontally, for a frame height of 24mm, and settled on 36mm as the width for a new 2:3 aspect ratio. The new format could record more detail than the movie frame, but it remained small by still photography standards and required a high-quality lens if it was to be enlarged to any reasonable print size. Motion picture lenses only covered 18mm x 24mm, and other lenses with larger image circles were too unwieldy or didn’t have the required quality. A new lens design was needed. Besides covering the new negative format, the lens had to be usable in many different settings and therefore couldn't be too wide or too narrow; it had to be small enough to suit the camera; and it had to be fast enough for handheld shooting on film stock that was very slow by today’s standards. A focal length in the vicinity of the negative diagonal best met these constraints. To this day, lenses whose focal lengths are near the film or sensor diagonal offer the best combination of application range, speed, size, optical performance, and price.
So a “normal” lens is one that best meets a list of design constraints. It turns out that this lens also yields images that “work” perspectivally under quite natural and once very common viewing conditions. Let me explain.
These days, nobody fusses over viewing conditions. Photos taken with very different lenses all get presented at a range of sizes and viewed from various distances—in an album, a book, a gallery, or on a smartphone—without any consideration for the focal lengths involved. Earlier, when darkroom prints were the preferred currency of photography, much photographic work was appreciated in the form of 8" x 10" prints held at reading distance. And for that condition, the normal lens turns out to be just right to produce photos that look spatially natural because they are viewed from the perspectivally correct distance.
8" x 10" used to be the most versatile photographic paper size; it is what darkroom students to this day are told to buy first. The size is convenient for contact printing and archiving many small negative formats. It yields the most gorgeous contact prints from 8" x 10" negatives which reigned for much of the 20th century in studio and landscape photography. It is considered by many to be the smallest size for a respectable but not ostentatious print and has been the default for work prints and portfolios (now gradually ceding this role to letter-size inkjet paper). The press standardized its operations for 8" x 10", as did most organizations that needed to archive prints (for example the Farm Security Administration which commissioned one of the most significant photo documentary projects of the last century).
When you hold an 8" x 10" print at comfortable reading distance, which for most people is in the 14"–16" range, the distance between eye and print slightly exceeds the print diagonal, which is 13". Likewise, the focal length of a normal lens slightly exceeds the film diagonal. Therefore the geometric relationship between eye and print is about the same as that between normal lens and film (or sensor), so that the print works like Alberti's window—as long as it shows the entire image without much cropping or empty space. If we let the reading distance fall in the middle of the range, at 15”, then it exceeds the 8" x 10" print diagonal by a factor of 15/13. A perfectly normal lens should thus have a focal length of 15/13 of its format diagonal. The diagonal of 4" x 5" film, whose aspect ratio matches 8" x 10" paper, is 163mm. 163mm x 15/13 = 188mm. The 180mm lens widely considered normal for 4" x 5" film comes very close. The 35mm negative with its 2:3 aspect ratio yields an image diagonal of about 12" on 8" x 10" paper. So its “normal” lens should have a focal length of 15/12 x 43mm = 53.75mm, again very close to the customary 50mm. Some fabulous normal lenses by Zeiss are even closer at 55mm focal length.
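For anyone who wants to check the arithmetic, here it is once more in executable form, with all numbers taken from the paragraph above:

```python
# The paragraph's "normal lens" arithmetic, verbatim.

ratio = 15 / 13                # reading distance over the 8"x10" diagonal

print(round(163 * ratio))      # 188 -- close to the usual 180mm for 4"x5"
print(round(43 * 15 / 12, 2))  # 53.75 -- close to the usual 50mm for 35mm
# (the 35mm frame fills only a 12" diagonal on 8"x10" paper, hence 15/12)
```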
To recap: if you hold an 8" x 10" print created with a normal lens at reading distance, then everything will look spatially natural. The print is a perspectivally sound representation because it works just like Alberti’s window. This accord of perspective and lens optimization will surely please Alberti's ghost.
Wide-angle lenses
As I have stressed, we tend to view photos taken with wide-angle and telephoto lenses in the same manner as we view photos taken with a normal lens. Consequently, we view wide-angle photos from farther away than their viewpoint, and we view long-lens photos from too close up. This is where the lore of perspectival distortion originates. Mantegna’s Christ looks distorted, all right, but as we saw, this is no flaw of the painting but a consequence of our looking at it from too close.
If you want to use a telephoto lens and show convincing depth, you must place your viewers in the right position. This means, you must either show very small prints or else keep people at a distance. But this is not what you bought that big expensive lens for. So the lens is a poor choice for conveying space. A normal lens is good with respect to perspective under normal viewing conditions. But it is often still too narrow to capture what we are interested in. So we are forced to use wide-angles. What about their distortions?
Let’s begin by convincing ourselves that, as with telephoto-lenses, the distortion is not a lens defect but the consequence of looking at the photo from the wrong vantage point.
The objects in this photograph, taken with a 14mm wide-angle lens on a full-frame sensor, especially the ones near the edges, look seriously deformed. The only ones that look normal are those that have no significant depth and are oriented parallel to the image plane: the photographs and window on the back wall, the bicycle, the yellow envelope next to the computer. Could this really be just a matter of the viewpoint?
Consider the two oranges, which in reality are about the same size and shape. What happens to round objects, in particular to spheres, in perspectival rendering? Remember that the image of an object is the intersection of the image plane with the sightlines from the projection point to the object. The sightlines to a sphere always form a cone with the projection point at the apex. The image plane intersects this cone in a generally oblong cross-section, more precisely, an ellipse. The major (longer) axis of the ellipse points in the direction of the image’s principal point, the point where a perpendicular dropped from the projection point intersects the image plane. The ellipse reduces to a circle if the sphere is located on that perpendicular behind the principal point. It becomes more elongated the further off to the side the sphere is located because the angle at which the image plane slices the cone becomes ever more acute towards the periphery.
And this is exactly what we see in the photo. The orange below the image center, which happens to be the principal point, is slightly elongated in the vertical direction; the orange at the lower left is strongly elongated in a diagonal direction.
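For the quantitatively inclined, here is a rough sketch of the elongation; the small-sphere approximation is my own, so treat the numbers as approximate. For a sphere seen at angle θ off the lens axis, the radial extent of its image grows like sec²θ while the tangential extent grows like sec θ, so the axis ratio of the ellipse comes to about 1/cos θ.

```python
# Approximate axis ratio of a small sphere's image at angle theta
# off the lens axis (my approximation, not an exact formula).

import math

def elongation(theta_deg):
    return 1 / math.cos(math.radians(theta_deg))

for theta in (0, 20, 40, 57):   # 57 deg ~ corner of a 14mm full-frame shot
    print(theta, round(elongation(theta), 2))
# 0 -> 1.0 (a circle), 40 -> 1.31, 57 -> 1.84: a visibly oblong orange
```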
An ellipse looks oblong from pretty much anywhere: you can move your head around in front of your device, and the orange at the lower left will continue to look oblong. But you are not looking at it the way it wants to be seen. The wide-angle lens renders the world as if seen from close behind a large window. The viewpoint of a wide-angle image is correspondingly close to the image, in this case at a distance of about half the image height. This comes to a little less than 5" from the center of my 15" laptop screen when the photo takes up the entire screen. My eyesight fails me at that distance. But I can hold my phone camera there, and here is how it sees the orange in the left bottom corner:
The orange now looks round, the distortion is gone. From the correct viewpoint, we see the elongated image at an angle that exactly compensates for the elongation. This has to be the case: from the apex of a cone, any cross-section of the cone looks circular. (You may notice that this phone picture has its own wide-angle “distortions” towards the edges. This is because the phone’s lens is itself a moderate wide-angle, corresponding to a 28mm lens on a full-frame sensor.)
You can easily verify this by clicking on the photo to enlarge it on your computer screen and looking at it through your phone. Keep in mind that the phone lens must be centered on the screen and at a distance of half the image height.
The following video is my attempt to scan the entire wide-angle photo from the correct viewpoint.
As you can see, there is nothing wrong with the 14mm-lens. Spheres look appropriately spherical when viewed from the right vantage point, and all the other objects look the way they should as well. There is no wide-angle distortion. What you can also see is how difficult it is for me to hold the phone in the right place. We encountered this problem of maintaining the viewpoint already in the discussion of Panini’s painting of the Pantheon.
The fact that spheres are imaged as ellipses (with a circle as the limiting case) can be used similarly to our earlier use of facts about the images of straight lines to find the viewpoint of a picture. For a picture with at least two distinct ellipses representing spheres, the viewpoint is the unique point from which both ellipses look circular. Two ellipses are needed because any one ellipse looks circular not just from one but from infinitely many points, all located on a hyperbola whose vertices are the ellipse's two focal points. The viewpoint is then the point in front of the picture where the two hyperbolas corresponding to the two ellipses intersect. I spare you the mathematics. If you want to get a feel for what is going on, use Dandelin spheres to construct the family of cones that intersect the image plane in a given ellipse.
Having dispatched perspectival distortion as a matter of inappropriate viewing, it is time for me to say a word about real distortions that can beset photographic images: barrel and pincushion. These are distortions caused by physical properties of the lens, and they look different than what we have seen so far. Let me show you the barrel distortion that is common in wide-angle lenses.
You'll notice the outward bend of the table’s edge in the foreground. Longer lenses tend to exhibit the opposite pincushion distortion. Barrel and pincushion distortions do violate perspective because straight lines are imaged as curves. Both distortions can be removed in software, which mostly happens automatically these days. The version of the photo you saw earlier had the distortion automatically removed.
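These software corrections typically assume a radial model: each point is displaced toward or away from the image center by a polynomial function of its distance from it. A toy sketch, with an invented coefficient (real values come from lens profiles or calibration, and production code inverts the model more carefully):

```python
# Toy radial distortion correction. k1 < 0 models barrel distortion;
# the coefficient here is invented for illustration.

def undistort_point(x, y, k1=-0.05):
    """Crude one-step inverse of the barrel model r' = r * (1 + k1 * r^2),
    for normalized coordinates and small k1."""
    r2 = x * x + y * y
    return (x / (1 + k1 * r2), y / (1 + k1 * r2))

# A corner point moves outward, straightening the table's bent edge:
print(undistort_point(0.9, 0.6))    # roughly (0.956, 0.637)
```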
The offensive wide-angle “distortion”, by contrast, cannot be removed in software. The only way to tackle it is by getting the viewer into the right place. How can this be achieved? By printing large, really large.
This photo was made on 4" x 5" sheet film with the equivalent of a 14mm lens. The right viewing distance is again half the image height, too close for a normal print. When I had the photo in a show (about a decommissioned coal-fired power plant), it was printed at 8 ft x 10 ft. At that size, it can be viewed comfortably and correctly from 4 ft away, without danger of losing the right viewpoint by tilting one's head. The spatial impression thus created is strong and convincing (e.g. in the appearance of the stairs top-left or the fire extinguisher).
What if you cannot print large enough? Maybe software can at least fudge things for a more pleasing look. There is a “Volume Anamorphosis” tool by a well-known French software company that claims to correct wide-angle distortion. Let's give it a try. First the original photo, then the results of running two different correction algorithms.
In the output from the first algorithm, the drum, and especially the drummer, look better from a distance, but now we have noticeably bent lines on the left wall and ceiling that should be straight.
In the output from the second algorithm, the drum is nicely round, the drummer looks good, but we have barrel distortion on the right wall and the floor.
It is like trying to straighten out a bulging carpet. All the software can do is push the bulge around; “correcting” one part of the image messes up another part. How obvious this is depends on the image. In most architectural photographs, I wouldn't consider the tradeoff worthwhile. I might paste one of the “corrected” versions of drum and drummer into the original photo, but only in the context of architectural photography, where marketing is the name of the game. I wouldn’t do it in the context of documentary photography, where I am willing to live with the “distortions”. I try to minimize the problem by not placing round objects near the edges of the frame.
Since wide-angle photos are hardly ever viewed from sufficiently close up, it remains an interesting question to ask what people make of the perceived “distortions”. The architectural historian Claire Zimmerman summarizes her misgivings about a set of architectural photographs from the early 1930s depicting the famous Villa Tugendhat as follows:
“the relative dimensions of the space were altered in the photographs. This is an effect of the wide-angle lens. [...] the wide-angle lens stretches unevenly, elongating the space closest to the camera and compressing the space behind.” (Zimmerman, Photographic Architecture in the Twentieth Century, University of Minnesota Press, 2014, pp. 111–12)
This sounds to me like a deformation in one dimension, perpendicular to the image plane. Later, Zimmerman says:
“the wide-angle lens [...] distorts perspectival space into a trumpet shape” (ibid. p. 227).
This sounds like a different, more complex deformation.
Let’s try to figure out how exactly the space in front of the camera might be stretched or squeezed in order to give rise to the offending visual impression. For this exercise, we assume that the (too distant) viewpoint is in fact the projection point and work backwards, from the image towards shapes in the world, but not just any of the infinitely many possible shapes, rather those that can be produced from the actual shapes by fairly simple squeezing and stretching. We ask how the object space might be transformed in order to look like that.
The following sketch illustrates two possible transformations, one a stretch perpendicular to the image plane, the other my best stab at Zimmerman's trumpet shape. We look at the situation from above. The point E on the bottom right is the correct viewpoint for the sphere in the object space on the left, E' is the actual viewpoint. The sphere with E as the projection point yields the same image as the ellipsoid or the squished form with E' as the projection point.
Here is a video to illustrate the simpler of the two transformations. It shows how a regular and an orthogonally stretched scene can look exactly alike when viewed from different viewpoints. It starts out slow to give you time to contemplate the situation.
If stretching in depth is the relevant transformation, then it has to be uniform, not “uneven” as Zimmerman claims. I wish I could show you the trumpet-shaped distortion. But the architectural modeling program we used couldn't compute the more complex transformation.
This concludes what I want to say about the effects of using different types of lenses. If you still dislike the way wide-angle photographs look, then at least you know now that you should blame geometry, not the lenses. The same goes for telephotos.
For what it's worth, my own preferred general-purpose focal length is 35mm. At normal viewing distance, the slightly exaggerated depth compared to a normal lens is not too obvious, and the lens allows me to include more of the context surrounding the main subject, which is something that I often like to do.
Straight Verticals
With the examples I have shown so far, I have smuggled in a convention that has been in force in architectural photography from the very beginning. In fact, the convention predates photography by centuries. It stipulates that real-world verticals align with the sides of the picture. The origin of the convention lies, as far as I can tell, in architectural practice and ultimately in gravity.
Let’s take Alberti by his word and envisage a real window, cut into a real wall. Real walls are mostly vertical because the ones that aren’t tend to have a short lifespan. A practical opening in a vertical wall has vertical sides. These sides run parallel to all other verticals in the vicinity (we may ignore the Earth’s curvature in this context). In a central projection with the window as the projection plane, all these verticals are parallel (remember the remarks about the behavior of families of parallel lines).
Here I am pointing the camera horizontally, so that its sensor is parallel to the window whose verticals are parallel to the building verticals. The convention is satisfied.
Trouble arises as soon as I point the camera up or down. Maybe I want to show less of the foreground and more of the buildings. After all, I am violating the compositional rule of thirds, if that’s the sort of thing about which you care. So I tilt the camera up:
I tilt the window together with the camera; the two continue to be aligned. This means that the sides of the window remain parallel to the sides of the image, but they are no longer parallel to the verticals in the world. There is nothing perspectivally wrong here. This is the way things look through a slanted window that is perpendicular to the raised line of sight. But it is not what we want. What we want is a view through a vertical window, like this:
The building verticals are once again aligned with the window verticals (the grid lines). But neither are parallel to the photo’s sides. What can be done?
First option: crop. To do that, we need a wider lens to begin with, one that allows us to include the building tops while the camera is level, like so:
Same camera position as before, focal length now 14mm instead of 24mm. We crop around the window.
In the old world of view cameras with flexible bellows, the cropping was done in camera. That’s what the fabled “perspective controls” of these cameras amount to. View camera lenses project image circles that are considerably larger than the film area. This gives the leeway to frame the image by sliding the front or back of the camera sideways and up-and-down, thereby moving the film holder across the image circle.
This is the image circle of a lens designed for a 4" x 5" view camera projected on 8" x 10" film. We can slide a 4" x 5" frame around in it and pick the part of the image that we want. Tilt-shift lenses sold for digital cameras are more expensive tools to accomplish the same thing.
Second option: transform in software, then crop. Every serious image editing package offers the tools.
This is geometrically equivalent to the first option. Make sure during the shoot to give yourself enough of a margin for cropping.
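Geometrically, the software transform is a projective remap: every sightline of the tilted camera is rotated back to a level camera and re-projected. Here is a sketch under simplifying assumptions (centered pixel coordinates, y pointing up, an invented focal length in pixel units):

```python
# Vertical "perspective correction" as a projective remap. The focal
# length f is in pixel units and invented; sign conventions simplified.

import math

def correct_verticals(x, y, tilt_deg, f=1000.0):
    t = math.radians(tilt_deg)
    # sightline of pixel (x, y) in the tilted camera is (x, y, f);
    # express it in the level camera's coordinates...
    y_level = y * math.cos(t) + f * math.sin(t)
    z_level = -y * math.sin(t) + f * math.cos(t)
    # ...and re-project onto the vertical image plane at distance f
    return (x * f / z_level, y_level * f / z_level)

print(correct_verticals(200, 800, tilt_deg=10))
# points high in the frame spread apart, restoring parallel verticals
```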
The “corrected” photos are still perspectivally sound. The only difference is that the viewpoint is no longer centered on the photo but has moved down with the horizon. (The viewpoint always lies on a plane that includes the horizon line, and this plane is at right angles to the image plane as long as the image plane is vertical.)
After years of working with view cameras, I now prefer the transformation in software. The reason is that I can use the smaller image circles of less extreme wide-angle lenses to their full potential, instead of using the outer parts of the larger image circles of more extreme wide-angle lenses, where things tend to get quite dark and blurry. The images from most digital cameras have enough pixels to survive the transformation without much loss of definition.
The straight-verticals convention works nicely with the preferred display mode for photographs: in tasteful frames on a vertical wall. The display ensures that real-world verticals and verticals in the photograph are parallel and that the image horizon is level. This makes it possible to view the photographic space as continuous with the real space inhabited by the viewer. The same continuity would obtain if we hung perspectivally “uncorrected” photos at the appropriate angle, like skylight windows in a pitched roof. But I am not aware of anyone ever having done this.
There are some bits of practical advice for depicting space that can be drawn from the discussion.
1. Use the least extreme wide-angle lens you can.
2. When you are excited about a sense of space and want to photograph the scene, stop moving (to avoid motion parallax) and close one eye (to avoid stereoptic parallax). Then see if the remaining depth cues are powerful enough to work in a photograph.
3. Inspect the scene on your camera’s back screen. The screen being so small, your actual viewing distance is much greater than the perspectivally correct distance. This will accentuate any “distortion” problems towards the edges and corners.
4. If you want to be informative about a space, choose a projection point that is robust in the sense that changing it slightly doesn’t change the depth cues dramatically. This is violated in the illusions referenced above. On the other hand, a specific projection point may give a preferred 2D composition or juxtaposition.
Tonal Reproduction
You have no problem discerning the dominant shape in this picture, and it is not because of geometric cues like lines and angles. Maybe you think this is easy because you see a familiar object. (By the way, this is a medical teaching model.) But look, even Photoshop without AI can figure it out.
I asked Photoshop to make a 3D model from the original photo and then render it. I doubt that Photoshop has a shape library that contains buttocks. If it did, it shouldn’t make the mistake of turning specular reflections into protrusions.
Much of the information that makes the 3D elaboration possible lies in the gradients of light and shade. There are plenty of tutorials, in books and on the web, both for photographers and painters, that teach how the direction and quality of light affects the modeling of form. It's fun to experiment by taking a simple object like an egg and casting different kinds of light on it. (You want to play with a light gray or white object to better see the shadows and not be distracted by colors.) Harsh on-axis front lighting is the worst for modeling form; soft but still directional side or back lighting is the best.
The tea-sieve figure in the cup showcases soft window light coming from the side and back and being reflected from the inside of the cup. The spatial information is all in the tonal gradients. I won't go into much detail here. I only want to talk briefly about one aspect that specifically concerns architectural spaces.
Architectural interiors are technically challenging to photograph because they present an enormous brightness range, or dynamic range, from the darkest shadow inside to the brightest highlight outside. This range outstrips the recording capacity of both traditional film and digital sensors: you cannot capture it all in one exposure. You have to make a decision. You can base your exposure on the interior light, but then the exterior is blown out.
Or you can base the exposure on the exterior, but then the interior will be too dark and shadows all blocked up.
Not good if what we aim for is something like this, which is much closer to what the place looks like:
A further complication is that architectural interiors are mostly painted in light colors, so they should look reasonably bright in a photograph. Yet just as often they receive too little light and hence come out looking murky.
In the olden days, solving these problems required some heavy lifting. In order to balance interior and exterior light levels and color temperature, architectural photographers (or their assistants) replaced incandescent lightbulbs with brighter and bluer ones, put colored gels on fluorescent tubes, covered the windows with dark transparent foil, and brought in studio lighting to brighten the interior. The results can look very slick, but often we sense the artifice.
Things have become easier with digital technology. We can now make a number of different exposures to capture a wide dynamic range and then stack them in software. This allows us to take the highlights from one frame and the shadows from another (so-called High Dynamic Range processing). One could do this in principle with film, but in practice it is impossible to line up the frames precisely enough because film always buckles a little.
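For readers who want to try the stacking themselves, the open-source OpenCV library ships the whole pipeline: frame alignment, merging into a radiance map, and tone mapping back down. A rough sketch, with invented filenames and exposure times:

```python
import cv2
import numpy as np

# A hypothetical 5-frame bracket, 2 stops apart; names and times invented.
files = ["brk_m4.tif", "brk_m2.tif", "brk_0.tif", "brk_p2.tif", "brk_p4.tif"]
times = np.array([1/500, 1/125, 1/30, 1/8, 1/2], dtype=np.float32)  # seconds
imgs = [cv2.imread(f) for f in files]

# Align the frames (the step film could never do precisely enough).
cv2.createAlignMTB().process(imgs, imgs)

# Merge into one floating-point radiance map covering the full range.
hdr = cv2.createMergeDebevec().process(imgs, times)

# Compress the radiance map into display range with a global tone mapper.
ldr = cv2.createTonemapDrago(gamma=2.2).process(hdr)
cv2.imwrite("kitchen_hdr.png", np.clip(ldr * 255, 0, 255).astype("uint8"))
```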
Of course there is no free lunch. We are trying to squeeze a vast dynamic range into the much smaller range that can be displayed on a computer screen or printed on paper. For those of you with a feel for f-stops (remember: each f-stop amounts to a doubling or halving of the light level): an architectural interior often encompasses 15 stops from shadows to highlights. A print can hold about 5 stops. So we are talking about a compression by 10 stops or a factor of 2^10 = 1024. No matter how you do the compression, you will lose contrast in the process. And contrast is what models shape.
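To make the contrast loss concrete, here is a toy calculation (an illustration, not a recipe): the gentlest global tone curve simply scales log-luminance, which shrinks every local contrast by the same 5/15 factor.

```python
import numpy as np

scene_stops, print_stops = 15, 5
print(2 ** (scene_stops - print_stops))  # the squeeze factor: 1024

def global_compress(lum):
    """Scale log-luminance (linear input, normalized so max = 1)
    from a 15-stop scene range into a 5-stop print range."""
    log_l = np.log2(np.clip(lum, 2.0 ** -scene_stops, 1.0))  # [-15, 0] stops
    return 2.0 ** (log_l * print_stops / scene_stops)        # [2^-5, 1]

# A full stop of local contrast in the scene survives as a third of a stop:
a, b = global_compress(np.array([0.5, 0.25]))
print(np.log2(a / b))  # ~0.33 stops
```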
This kitchen provides a pedestrian example. Above is the best overall exposure straight from the camera. The shadows are okay, but the highlights lack detail or are entirely blown out: backsplash, grill, countertop, and pendants.
This is after HDR processing. We now have highlight detail, but the image looks very flat. I could have tried to jazz it up in the HDR software, but that tends to yield the kind of horrible result with which the web is flooded:
Besides the kitchen looking flat, the view through the screen doors is way too blue; it was pouring rain outside. So let’s adjust the white balance accordingly:
Bad! There is no setting that makes both exterior and interior color look acceptable. I had no choice but to adjust the color locally. This was especially tricky in the reflections off the chairs and floor and in the mauve wall paint in the alcove (which was of great importance to the architect). I also adjusted the overall contrast and colors and brightened up the ceiling. Never mind the details; the end result is this:
We finally have decent modeling of shapes by tonal gradients, and the colors look plausible. Coloring in Alberti’s window turned out to be a non-trivial exercise.
An interesting alternative to HDR processing is a technique that sometimes goes by the name "flambient", where the interior is photographed twice, once under ambient light, and once lit up with a powerful flash. The exposures are then cleverly blended in software. The ambient light provides the atmosphere, the flash gives neutral color information. The blending needs to be done very gingerly lest the flash impose an unnatural, almost clinical look on the interior, visible in a lot of real estate photography. The technique has another trick for recovering exterior detail. A flash overexposure of the interior is used as a mask for pasting in the "correctly" exposed exterior. This is very effective but can make the exterior look like a landscape photograph stuck in the window frame. One telltale sign of a poor flambient job is too close a match in brightness, contrast, and white balance between interior and exterior.
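A sketch of the blending logic, stripped to its bones; the frame names, the blend weight, and the mask threshold are all invented, and a real job is done with far more finesse in layers and local masks:

```python
import cv2
import numpy as np

def load(name):
    """Read a registered 16-bit frame and normalize to [0, 1]."""
    return cv2.imread(name, cv2.IMREAD_UNCHANGED).astype(np.float32) / 65535

ambient = load("ambient.tif")        # atmosphere, warm mixed light
flash = load("flash.tif")            # neutral color information
flash_over = load("flash_over.tif")  # interior deliberately blown out
window = load("dark.tif")            # dark frame holding the exterior

# Gentle blend: mostly ambient. Push the weight down toward the flash
# frame and you get the clinical real-estate look.
w = 0.7
blend = w * ambient + (1 - w) * flash

# Window pull: the flash overexposure drives the interior to near-white,
# but the flash cannot reach the exterior, so the panes stay darker.
# Threshold and feather that frame to mask in the exterior exposure.
lum = flash_over.mean(axis=2)
mask = cv2.GaussianBlur((lum < 0.9).astype(np.float32), (31, 31), 0)[..., None]
out = (1 - mask) * blend + mask * window
cv2.imwrite("flambient.tif", (np.clip(out, 0, 1) * 65535).astype(np.uint16))
```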
Starting with Alberti’s thoughts about the “truthful” representation of space, I urged that we include tonal reproduction under the heading of perspective. Now we see that in order to make a photo look right, we need to deviate in subtle and not-so-subtle ways from straightforward, globally defined tonal mappings. Put bluntly, we need to cheat locally in order to approach a more global semblance of truth. It is this massaging of tones, together with the choice of a good vantage point, that makes shapes come alive in a flat picture.
Painters have known this since at least the Renaissance. The interior in Carlo Crivelli’s Annunciation is almost as bright as the exterior, which wouldn’t be the case in reality. In fact, Crivelli is so concerned about enlivening local contrast that he is forced to give up much of the more global modeling through light. Many of his contemporaries worked similarly, relying more on geometric form than light for overall spatial impact.
Compare this treatment by Pieter Janssens 200 years later. Janssens presents light in an almost photorealistic way, some 250 years before color film. This realism is hard-won. If you have ever tried to photograph a dark interior lit by a few sunny spots, you know how extensively you need to rework local color and contrast to arrive at such a vivid result. It would be interesting to study the dramatic shadows for verisimilitude. Hint: the chair’s shadow on the back wall is wrong.
Classical painters are known to have purposefully manipulated perspective not just with respect to color but also with respect to shape. The sphere at the bottom of Cranach’s Melancholy Allegory may be a case in point. Another example may be the apple and cucumber on the foreground ledge in Crivelli’s painting. They are certainly placed prominently enough to draw attention.
When centered in the image, my apple resembles Crivelli’s more closely. Is this the look he preferred? Did he purposely distort his apple to make it look better, in the way in which I might paste a "corrected" version of the drum and drummer into an otherwise perspectivally sound photo?
Apple distortions aside, Crivelli’s cucumber is massive for a pickling variety. He was apparently obsessed with cucumbers.