We keep an ongoing glossary of terms and explanations, and have shared this glossary with a number of clients. We thought it worthwhile to publish to Mosaic for easier reference and sharing at large.
The following explanations of various general terms, assistive and accessible technologies, and media affordances provide an overview and a shared context in which to think about and design specific solutions, initiatives, and inclusive undertakings.
These are not recommendations, but instead aim to serve as informal working definitions. Some terms, like Zoom, are not defined, as their definition is obvious; instead an explanation offering some additional nuance is provided to help the reader understand how these approaches work together. Terms are grouped semantically and are not explicitly ordered alphabetically.
We keep this page updated and iterate upon it as our work reveals new insights, thoughts, and use cases. Please feel free to use, cite, and communicate with us as needed.
Words like affordance, accessibility, and othering come up a lot in this guide. To ensure consistency and clarity in these terms, we are defining each as follows:
Affordance refers to the interaction between a person and a physical or digital interface. An affordance is an attribute indicating how something is used and what actions can be taken. For example, a button may have a depressed area indicating it should be pushed or a light indicating its status. Understanding affordances in the context of accessibility and inclusive design can facilitate greater access by examining how affordances are surfaced using multimodal tactics, including but not limited to those defined in this document, such as captions, braille, and on-screen ASL.
Accessibility is the ability of all people to use a product, place, or service regardless of disability. It is critical to understand that accessibility is intended to mitigate the results of disability.
Disability is the result of a variance that may be physical, cognitive, mental, sensory, emotional, developmental, or some combination of these. Two models are often referenced when discussing disability: the medical model and the social model. It’s critical to understand that there are many more models.
Medical Model of Disability
In the medical model, if one has a disability, it is treated as something to be fixed or healed. When the perceived impairment can’t be fixed, society treats the individual as being broken, as having the impairment, as different, “the other.” This shifts the burden to the individual because the individual is viewed as the problem.
Social Model of Disability
The social, or environmental, model considers the environment as disabling, rather than the individual as being disabled. This means that when a wheelchair user can’t activate a door, it’s not the fault of the wheelchair user, but instead is the environment that is disabling and othering.
Inclusive design is a design process in which the fact that all individuals interact with the world differently is placed at the heart of the process. Individual people, with their own lived experience, prior knowledge, and differences, will interact with what we make and put into the world, so we should relax our assumptions about the abilities of the user and instead design with compassion, flexibility, and inclusion at the heart of our practice.
Othering is any act that makes a person or group of people feel essentially different. Intent is important, but it matters far less than the effect of the act when discussing othering.
The accessible zone is a volume of space in which all primary content must be displayed so as not to disadvantage those who do not have access to the viewing angles afforded by having a nominal height. This zone especially benefits those who are of small stature or are in a seated position.
The universal keypad (UKP) is a device first conceived by Corey Timpson and Sina Bahram and implemented at the Canadian Museum for Human Rights (CMHR). It is a physical device that offers alternative modalities, such as a headphone jack and rubberized buttons, to control the interface it is paired with, turn accessibility features on or off, and control output volume.
Static media is analogue or digital media with no time component. Static media includes images, photographs, paintings, non-moving projections, and more.
Visual description is a textual representation of static media most commonly intended for those who may not be able to see the media, but it is now generally accepted to be helpful to all people. There are many forms of visual description, including short, long, and poetic/interpretative descriptions, to name a few.
Alt text is mentioned in this glossary because it is very common to see alt text and visual description conflated. To disambiguate them, alt text is a visual description that is associated with an image, most commonly on the web or in other electronic formats. Alt is short for alternative.
Please note, the term “alt tag” is completely meaningless and thus should never be used. When people use “alt tag,” they almost certainly mean the “alt attribute” (alt=) of the “img” tag of HTML and related technologies to specify the alt text to be rendered.
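To make the distinction concrete, here is a minimal sketch that scans a snippet of HTML for `img` elements that have no `alt` attribute at all. The class name `AltAudit`, the function `images_missing_alt`, and the sample markup are our own illustrations, not part of any standard tool.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Collects the src of every <img> that has no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # An empty alt="" is valid for decorative images, so only a
        # missing attribute is flagged here.
        if "alt" not in attr_map:
            self.missing_alt.append(attr_map.get("src", "(no src)"))


def images_missing_alt(html):
    """Return the src values of images that lack an alt attribute."""
    audit = AltAudit()
    audit.feed(html)
    return audit.missing_alt


# Hypothetical sample markup for illustration only.
sample = (
    '<img src="portrait.jpg" alt="A woman in a red coat smiles at the camera">'
    '<img src="divider.png" alt="">'
    '<img src="chart.png">'
)
print(images_missing_alt(sample))  # ['chart.png']
```

A check like this only confirms that the attribute exists. Whether the text is a useful visual description still needs a human reviewer.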
Touch is an effective modality to convey spatial information when used correctly. Explanations of tactile alternatives, manipulatives, and reproductions are below.
Tactile alternatives are touchable representations, but not full reproductions, of a piece of source media. This can be an embossed version of a graph, a simplified architectural model as a tactile diagram, a touchable piece of fabric from a garment that cannot be touched due to conservation/preservation reasons, and much more.
A tactile manipulative is an object that is typically held or placed on a flat surface for examination. These are often components of a touch tour or other similar service offering.
A tactile reproduction is a tactile alternative that significantly prioritizes accuracy, fidelity, and the comprehensive presentation of the nuances of an object.
Guided Tactile Description
Guided tactile description builds on and resembles visual description; however, it explicitly assumes that the consumer of the description is touching something while the description is being consumed. This is important because the guided tactile description uses references to how something feels or where a tactile landmark is located as helpful wayfinding information.
Sonification is the use of non-speech-based audio to convey spatial information. Sonification is a reasonably well-established field with a body of academic research that is ever-evolving. An example would be an audible tone whose pitch is mapped to the Y-values on a graph, facilitating greater understanding for someone who is unable to see the image.
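The pitch-mapping example can be sketched in a few lines. This is one possible mapping, not an established standard: the function names and the 220 Hz to 880 Hz range are assumptions chosen for illustration.

```python
def y_to_frequency(y, y_min, y_max, f_low=220.0, f_high=880.0):
    """Map a data value to a pitch in hertz; higher values sound higher."""
    if y_max == y_min:
        return f_low
    # 0.0 at the lowest data value, 1.0 at the highest.
    position = (y - y_min) / (y_max - y_min)
    # Interpolate exponentially so equal data steps become equal
    # musical intervals rather than equal steps in hertz.
    return f_low * (f_high / f_low) ** position


def sonify(values):
    """Return one frequency (Hz) per data point, in order."""
    lo, hi = min(values), max(values)
    return [round(y_to_frequency(y, lo, hi), 1) for y in values]


print(sonify([0, 5, 10]))  # [220.0, 440.0, 880.0]
```

The resulting frequencies would then be handed to whatever audio system the interactive uses. Playing them in sequence lets a listener hear the line on the graph rise and fall.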
Time-Based Media (Non Navigable)
Time-based media refers to any media that has a non-zero duration, e.g., animated GIFs, audio, video, animations, and more, presented without user controls.
Sign languages (also known as signed languages) use the visual-manual modality to convey meaning. Sign languages are expressed through manual articulations in combination with non-manual elements. Museum experiences can be challenging for native users of American Sign Language (ASL) as their reading levels in English may be much lower than their ability to communicate in their official language. The official working language of the American Deaf community is ASL. This language has equal status and priority within the National Association of the Deaf and its activities. All time-based media in the exhibition program can support the needs of this audience by incorporating ASL.
Subtitles are a translation affordance for a visitor. They are offered when the language of the viewer differs from the language of the media. For example, if a video is in German, English subtitles shall appear on screen for English speakers. It is important to understand that subtitles are not captions. In addition to being in a translated, and therefore different, language than the source media, subtitles do not indicate non-speech audio such as sounds, music, and more.
Captions are the real-time textual representation of what is being said as well as any non-speech audio. If English is being spoken or if any sounds occur in a video, English words appear as a textual alternative to the audio. It is critical to understand that captions are not subtitles. Captions are displayed in the same language as the language of the media.
Closed captions are the form of captions that are not burned into a video. This means they are stored digitally as text, either in a separate file from the source media or as part of a container format. They are preferred because they can be read by screen readers—think of a deaf-blind individual accessing captions via braille.
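As an illustration of captions stored as text in a separate file, the sketch below writes cues in WebVTT, one common caption format on the web. The helper names and the sample cues are hypothetical; note that the first cue carries non-speech audio, which is what distinguishes captions from subtitles.

```python
def format_timestamp(seconds):
    """Format seconds as the HH:MM:SS.mmm timestamp used by WebVTT."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues):
    """Build a WebVTT document from (start, end, text) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line separates cues
    return "\n".join(lines)


# Hypothetical cues for illustration only.
cues = [
    (0.0, 2.5, "[door creaks open]"),
    (2.5, 5.0, "Welcome to the gallery."),
]
print(to_webvtt(cues))
```

Because the result is plain text, it stays available to a screen reader or braille display instead of being locked into the pixels of the video.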
Open captions are captions that are burned into the video. They are not stored as text, but as pixels that comprise the frames of the media instead. They are not preferred in most situations (see Closed Captions), but they are sometimes necessary—think of an uncomplicated video playback device in a public setting that doesn’t support closed captions.
Transcripts are the static textual representation of the source media. It is convenient and accurate to think of a transcript for a piece of time-based media as a concatenation of all the captions of that media. Transcripts can be consumed at an independent rate of speed from the source media, as they are static text.
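The idea of a transcript as a concatenation of captions can be sketched as follows. The `(start, end, text)` cue shape and the function name are assumptions made for this example.

```python
def captions_to_transcript(cues):
    """Join caption text in time order, dropping the timing information."""
    ordered = sorted(cues, key=lambda cue: cue[0])
    return " ".join(text for _start, _end, text in ordered)


# Hypothetical cues, deliberately out of order to show the sort.
cues = [
    (2.5, 5.0, "Welcome to the gallery."),
    (0.0, 2.5, "[door creaks open]"),
]
print(captions_to_transcript(cues))
# [door creaks open] Welcome to the gallery.
```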
An enhanced transcript is similar to a transcript with the addition that it also contains the text of the audio description of the media. This concept is important so that there is an affordance for those who would benefit from both a transcript as well as audio description.
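A minimal sketch of assembling an enhanced transcript is below. It assumes captions and audio description are each available as `(start_seconds, text)` pairs; the function name and the `[Description: ...]` labelling are our own illustrative choices, not a standard.

```python
def enhanced_transcript(captions, descriptions):
    """Interleave caption text and audio description text by start time."""
    entries = [(start, text) for start, text in captions]
    # Label description text so readers can tell it apart from what
    # is spoken or heard in the source media.
    entries += [(start, f"[Description: {text}]") for start, text in descriptions]
    entries.sort(key=lambda entry: entry[0])
    return "\n".join(text for _start, text in entries)


# Hypothetical content for illustration only.
captions = [(0.0, "Welcome to the gallery."), (6.0, "This way, please.")]
descriptions = [(3.0, "A guide gestures toward a tall doorway.")]
print(enhanced_transcript(captions, descriptions))
```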
Audio description is an additional audio track, often phrased as an additional language track. Just as captions are in the same language as the source media, so too is audio description. Audio description is the narrated visual descriptions of the contents of the source media. The style and amount of audio description is heavily dependent upon the source media’s contents.
Audio description is often delivered via audio ducking, a simple acoustic treatment easily achievable in virtually all editing workflows. Audio ducking is when the source audio is lowered but not eliminated, and the audio description track is played at the source media volume. The audio of the source media is said to duck underneath the narration of the visual description, hence the name audio ducking.
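Ducking can be illustrated on raw sample values. The sketch below assumes both tracks are lists of samples at the same rate and that the narration spans are known as `(start_index, end_index)` pairs; the function names and the 0.25 gain are hypothetical.

```python
def duck(source, narration_spans, duck_gain=0.25):
    """Lower the source audio, without muting it, wherever narration plays."""
    ducked = list(source)
    for start, end in narration_spans:
        for i in range(start, min(end, len(ducked))):
            ducked[i] = source[i] * duck_gain
    return ducked


def mix(ducked_source, narration):
    """Add the narration on top of the ducked source, sample by sample."""
    return [s + n for s, n in zip(ducked_source, narration)]


# Hypothetical six-sample tracks; narration occupies samples 2 and 3.
source = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
narration = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]
ducked = duck(source, [(2, 4)])
print(ducked)                  # [1.0, 1.0, 0.25, 0.25, 1.0, 1.0]
print(mix(ducked, narration))  # [1.0, 1.0, 0.75, 0.75, 1.0, 1.0]
```

The gain here switches abruptly for simplicity. An editing tool would typically ramp it over a short fade so the change is not heard as a click.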
Navigable media refers to media, static or time-based, that can be interacted with or navigated in any way. A video playing on a screen is time-based media, but a video with transport controls, e.g., play/pause, rewind, and fast-forward, is navigable media, as are video games, digital interactives, websites, mobile apps, and much more.
Access technology is an additional affordance, mode of operation, or other accommodation to a system that enhances the experience to ensure inclusivity and accessibility. The below-defined concepts of screen reader, zoom, and high-contrast mode are all access technologies and/or technological implementations.
A screen reader is an application on a platform, often running with elevated privileges. It keeps track of the user’s focus, or point of regard, and announces, via synthetic speech, what the user is currently interacting with, what can be done from this point, and how to perform any desired actions. Synthetic speech is produced by software called a text-to-speech (TTS) engine. Screen readers can also drive a braille display, a tactile interface that presents braille on what is often a single line of between 10 and 80 characters (20 to 40 characters is most common). Those who are deaf-blind rely primarily or solely on a braille display for all aspects of communication; therefore, thinking about braille support, either via a provided braille display or by following standards such that a third-party display can be used, is necessary so that we do not exclude this often-ignored population.
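To give a feel for what a single-line display means for content length, the sketch below splits a message into successive lines for a 40-character display. It is a simplification under a stated assumption: it counts one print character per braille cell, whereas real braille translation (contracted braille, for example) changes the cell count. The function name and message are hypothetical.

```python
import textwrap


def braille_display_lines(text, cells=40):
    """Split text into successive lines that each fit a display of `cells` characters."""
    # Assumption: one print character occupies one braille cell.
    return textwrap.wrap(text, width=cells)


message = (
    "Press the centre button to hear the description "
    "of the artwork in front of you."
)
for line in braille_display_lines(message, cells=40):
    print(line)
```

Each printed line is one "page" the reader must advance through, which is one reason concise interface text helps braille users.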
Text to Speech
TTS refers to the mechanism by which any digital system uses synthetic speech to inform the user of something. This should not be confused with speech-to-text, which is when the user speaks and a computer system transcribes what was said.
Different users of TTS listen at different words per minute (WPM) rates. By providing control of the speed of the TTS output, agency is returned to the visitor over how fast they choose to consume content. This has many implications, from helping those with cognitive differences to accommodating those who are first learning how to use a screen reader to power users who wish to rapidly progress through the content.
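The effect of speech rate on listening time is simple arithmetic, sketched below. The function name and the 150 and 300 WPM figures are illustrative assumptions, not measured listening rates.

```python
def listening_time_seconds(text, words_per_minute):
    """Estimate how long synthetic speech takes to read text at a given rate."""
    word_count = len(text.split())
    return word_count / words_per_minute * 60


# A hypothetical 300-word label.
label = "word " * 300
print(listening_time_seconds(label, 150))  # 120.0
print(listening_time_seconds(label, 300))  # 60.0
```

Doubling the rate halves the time spent on the same label, which is why a speed control matters both to a newcomer who needs a slower pace and to a power user who wants a faster one.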
Much like how controlling speed is helpful, so too is controlling volume. Volume control is obviously beneficial for ensuring that a continuum of visitors with various hearing abilities are able to successfully consume content, but also to help accommodate unforeseen noise in the environment such as a group of young children visiting a gallery at the same time as someone who depends on speech to navigate through the space and content. Additionally, different headphones and earbuds have various impedance ratings, which means that the same output voltage results in different volumes; therefore, a wide volume range helps accommodate the maximum number of devices. Some visitors may also be sensitive to volume and therefore prefer to lower the volume from a preset nominal level.
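The impedance point follows from the relationship between power, voltage, and resistance (P = V² / R). The sketch below compares two hypothetical headphones driven by the same voltage; the function name and the 32 Ω and 250 Ω values are assumptions for illustration.

```python
def power_milliwatts(voltage_rms, impedance_ohms):
    """Electrical power delivered to a load, from P = V^2 / R, in milliwatts."""
    return round(voltage_rms ** 2 / impedance_ohms * 1000, 2)


# The same 1 V output driving two hypothetical headphones.
for impedance in (32, 250):
    print(impedance, "ohms:", power_milliwatts(1.0, impedance), "mW")
# 32 ohms: 31.25 mW
# 250 ohms: 4.0 mW
```

Perceived loudness also depends on each headphone's sensitivity, so these figures show only why the same volume setting cannot be assumed to sound the same on every device.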
A visual highlight or focus rectangle provides additional visual affordance with the highlight indicating either the content being explored or the interface elements being used. The highlight/rectangle has a transparent interior so as not to obscure the specific content and an opaque outline. A soft glow around the exterior of the rectangle can also be used to provide additional visual feedback. This highlight is critical when using the interface via an external device such as a keypad, mobile app, keyboard, or other affordance, and is useful for a sighted companion as it shows where a screen reader’s point of regard is in the user interface (UI).
The ability to zoom in on both text and images (the entire interface) needs to be thought about ahead of time. This is critical for those with low vision but also for anyone who may have forgotten their reading glasses that day, is standing farther from the screen than originally assumed, or for a variety of other reasons. When zoom is engaged on a system, the implication is that a gesture or other affordance exists for panning. This is because if a fixed height and width interface needs to display information at 200% or greater, then both vertical and horizontal panning becomes necessary to explore all the information. A common practice is to reserve single-finger gestures for screen-reading functionality and two-finger gestures for things like pan for a zoom mode. Lastly, visible focus-tracking is critical for zoom on digital interactives. This means that as the user advances across UI elements, the view auto-scrolls to ensure their point of regard is always in view. This is another reason the aforementioned focus highlight is important.
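Focus tracking under zoom can be sketched as a small geometry problem: scale the focused element's rectangle by the zoom factor, then move the visible window just far enough to contain it. All names, coordinates, and screen sizes below are hypothetical.

```python
def zoom_rect(rect, factor):
    """Scale an (x, y, width, height) rectangle by the zoom factor."""
    x, y, w, h = rect
    return (x * factor, y * factor, w * factor, h * factor)


def pan_to_show(focus, viewport, pan):
    """Return the (x, y) pan offset that brings the focused rectangle into view.

    focus    -- (x, y, width, height) in zoomed content coordinates
    viewport -- (width, height) of the physical screen
    pan      -- current (x, y) of the viewport's top-left corner
    """
    fx, fy, fw, fh = focus
    vw, vh = viewport
    px, py = pan
    # Move horizontally only as far as needed.
    if fx < px:
        px = fx
    elif fx + fw > px + vw:
        px = fx + fw - vw
    # Same rule vertically.
    if fy < py:
        py = fy
    elif fy + fh > py + vh:
        py = fy + fh - vh
    return (px, py)


viewport = (1920, 1080)
# A button near the bottom-right corner, at 200% zoom.
button = zoom_rect((1500, 900, 200, 100), 2)
print(pan_to_show(button, viewport, pan=(0, 0)))  # (1480, 920)
```

At 200% the content is twice as wide and twice as tall as the screen, so without this adjustment the focused button would sit entirely outside the visible area.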
A control for adjusting brightness is helpful in many situations. Not only does it assist those with light sensitivity, but it is also helpful for those who rely on high contrast.
If the interface can be placed in monochromatic mode (either black/white or, more commonly, grayscale), this inversion of colors drastically assists with contrast issues and readability of text as well as exploration of graphics. This will also help those with various forms of color-blindness, but it is not a solution in and of itself for those populations. Making sure to never use color alone as the sole way of conveying information is the conceptual assumption and prerequisite that allows this affordance to be even more helpful for visitors. If nothing relies solely on color (e.g., text and iconography are also used to convey meaning) then allowing the interface colors to be switched into monochromatic mode is a powerful win (consider dark mode on most modern devices to illustrate this point).
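The reason color must never be the sole carrier of meaning becomes visible once colors are converted to grayscale. The sketch below uses the common Rec. 601 luma weights; the function name and the two status colors are hypothetical choices for illustration.

```python
def to_gray(rgb):
    """Convert an (r, g, b) color to a single 0-255 gray level (Rec. 601 luma)."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


# Hypothetical status colors: a red for "error" and a green for "ok".
error_red = (255, 0, 0)
ok_green = (0, 130, 0)
print(to_gray(error_red))  # 76
print(to_gray(ok_green))   # 76
```

Both statuses collapse to the same gray, so in monochromatic mode they can be told apart only if text or iconography also conveys the meaning.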
When an interface can be used by a keyboard or other physical inputs, it not only helps those who cannot or do not prefer to use a touchscreen, but this extensibility also lays the groundwork for supporting switch users in the future.
Adjustable height provides much-needed flexibility. By allowing the height of screens or other presentation-media to be adjusted, the environment becomes more usable to those of small stature, children, those who are taller than nominal levels, etc.
The angle of view is important when considering visitors at all different positions, seated and otherwise. By allowing for even a small range of motion in the vertical and horizontal direction, orthogonal to the visitor’s face, the display of visual information can be made much easier to consume by a variety of visitors. Often, adjusting height can present a logistical challenge, whereas tilt control is easier to achieve and can resolve many similar challenges. Both are preferred, but if height is not adjustable, tilt is a great fallback.