The process of hearing a sound and being able to locate it in space is called “psychoacoustic localization.” Just like your visual system, your hearing system evaluates and compares two inputs (the left and right ears) to determine where a sound is coming from. Where your eyes evaluate angle and focus to determine distance, your ears evaluate time delay and frequency response.
Imagine you are sitting in front of a pair of speakers and you turn off the left speaker. Both ears continue to hear the right speaker, but the left ear, because it’s farther away, hears the sound slightly after the right ear. This difference in arrival time for sound sources that are off-center is called the “interaural time difference” (ITD). In addition, because the left ear is somewhat in the shadow of the head, it hears sound from the right speaker at a slightly lower volume and with a slightly altered frequency response. This is called the “interaural amplitude difference” (IAD). These two cues work well from about 400 Hz to about 2 kHz and are the most important psychoacoustic cues for left-to-right placement of sound sources (lateralization).
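To get a feel for the size of the ITD described above, here is a minimal sketch that assumes a very simple path-difference model: the source is far away and the extra distance to the far ear is the ear spacing multiplied by the sine of the source angle. The ear spacing (0.18 m), the speed of sound (343 m/s), and the function name `itd_seconds` are illustrative assumptions, not values taken from the text, and real heads bend sound around them in ways this model ignores.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly room temperature
EAR_SPACING_M = 0.18        # assumed: illustrative ear-to-ear distance


def itd_seconds(azimuth_deg, ear_spacing_m=EAR_SPACING_M,
                speed_m_s=SPEED_OF_SOUND_M_S):
    """Approximate interaural time difference for a distant source.

    azimuth_deg: 0 is straight ahead, +90 is directly to the right.
    A positive result means the sound reaches the right ear first.
    """
    extra_path_m = ear_spacing_m * math.sin(math.radians(azimuth_deg))
    return extra_path_m / speed_m_s


if __name__ == "__main__":
    # Print the delay in microseconds for a few source angles.
    for azimuth in (0, 15, 30, 60, 90):
        print(f"{azimuth:>3} deg -> {itd_seconds(azimuth) * 1e6:6.0f} microseconds")
```

Under these assumptions the delays come out as fractions of a millisecond, which gives a sense of how fine the timing resolution of the hearing system has to be.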
Once a sound's frequency rises above about 2 kHz, it becomes more difficult to tell where it’s coming from using ITD because the wavelength is shorter than the distance between the ears. High-frequency localization is instead performed by listening to the effects of the short reflections that happen in the outer ear. If you look at your ear, around the entrance to the ear canal is a small basin, created by a ridge that takes up about half the outer ear. This is called the “concha ridge” and acts to focus sound into the ear canal. As a sound approaches the ear, some of it goes directly into the ear canal and some reflects off the concha ridge before entering the ear canal. The direct signal is mixed with the reflected signal after a short time delay, causing a frequency response phenomenon called a “comb filter” to occur. As the source of the sound moves, or, more importantly, as the head moves, the peaks and notches of the comb filter shift, allowing your brain to decode changes that your perceptual system understands as sound source position. This information is used to fine-tune the information from ITD and to detect front/back position and elevation.
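The comb filter effect can be sketched numerically. The example below assumes the simplest possible model: one direct signal plus one reflected copy that arrives after a fixed delay at reduced level. The delay values, the reflection gain of 0.5, and the function names are hypothetical choices for illustration only; the reflections in a real outer ear are more numerous and depend on direction.

```python
import cmath
import math


def comb_magnitude(freq_hz, delay_s, reflection_gain=0.5):
    """Magnitude response of direct sound plus one delayed reflection.

    Models y(t) = x(t) + g * x(t - delay), an assumed single-reflection case.
    """
    phase = -2j * math.pi * freq_hz * delay_s
    return abs(1 + reflection_gain * cmath.exp(phase))


def notch_frequencies(delay_s, max_hz=20_000.0):
    """Frequencies up to max_hz where the reflection cancels the direct sound.

    Cancellation happens when the delay equals an odd number of half periods,
    that is, at f = (2k + 1) / (2 * delay) for k = 0, 1, 2, ...
    """
    notches = []
    k = 0
    while True:
        freq = (2 * k + 1) / (2 * delay_s)
        if freq > max_hz:
            break
        notches.append(freq)
        k += 1
    return notches


if __name__ == "__main__":
    # Two hypothetical reflection delays, standing in for two head positions.
    for delay_us in (100, 80):
        first_notch = notch_frequencies(delay_us * 1e-6)[0]
        print(f"delay {delay_us} us -> first notch near {first_notch:.0f} Hz")
```

In this sketch, shortening the assumed delay from 100 to 80 microseconds moves the first notch upward in frequency, which is the kind of shift the paragraph above describes when the source or the head moves.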
It is important to understand that the brain localizes mainly from changes in the perceived sound, not from the static cues themselves. Scientists have observed that people make small left-to-right movements of the head when localizing sound. This subconscious act is believed to let people listen for the subtle changes in the psychoacoustic cues needed for accurate sound localization.
In addition to the primary psychoacoustic cues discussed above, your listening system also uses sound reflections off your shoulders and torso, along with correlations between visual cues and reflections off walls and objects in the room. It uses these not only to determine the locations of sound sources but also to determine your physical orientation in the room. The listening system serves primarily as an alert and support system for the visual sense, and it operates largely at the subconscious level. This is one of the reasons that acute and accurate listening skills are difficult to develop.