While preparing Knowbility training materials on how to code for screen reader accessibility, I decided to double-check some of the finer details about how assistive technology actually gets hold of information from web pages. This process quickly took me down a rabbit hole resembling Journey to the Center of the Earth when sources across the A11Y community each told a somewhat different story. This post is the first in a three-part series that explains the technical details in a way that mops up the discrepancies. Predictably, all the MVPs turned out to have had keys to the Knowbility offices at some point, but I took the scenic route.
In this first installment, we'll start from absolute scratch and recount the widely available basic story. It's written for beginners, but the fact that I describe web accessibility in the broader context of software accessibility might offer a different twist for current practitioners as well. Parts 2 and 3, on the other hand, are more technical. Although those discussions may largely be of interest only to "inquiring minds that want to know", understanding the history and variety of the specialized protocols at the heart of this process sheds light on why screen readers sometimes perform differently in the same application or on the same web site while also providing a glimpse of a likely future.
What Is an Accessibility API?
First, let's be clear about what we mean by interface. A graphical user interface (GUI) provides the means for a user to interact with the software internals: windows, menus, buttons, text, methods of interaction like mouse clicks and keyboard input, etc. The task of assistive technology (AT) is to build another UI on top of the GUI in order to support alternative input and output methods that weren't provided for but darn well should have been.
Meanwhile, an application programming interface (API) provides a protocol for conveying information between applications. Whatever needs to be communicated, there's an API for that. For example, the DOM standard is a programming interface for HTML and XML documents that lets platform-independent JavaScript read and change them.
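To make that concrete, here's a minimal, hypothetical page (the element id and text are invented for illustration) showing the DOM acting as a programming interface: the script finds a node in the document tree and changes it, and the same calls work in any standards-compliant browser.

```html
<!DOCTYPE html>
<html lang="en">
  <body>
    <button id="save-button">Save</button>
    <script>
      // The DOM lets the script locate a node in the document tree...
      const button = document.getElementById("save-button");
      // ...and read or change it, regardless of operating system or browser.
      button.textContent = "Save draft";
    </script>
  </body>
</html>
```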
Finally, an accessibility API provides the protocol for conveying necessary information to assistive technology or whatever else wants to use it, like automation software. That information includes an object, its role (sometimes referred to as its type), current state, properties, supported event notifications, actions the AT can perform on it, the object's children, how the children are doing in school, etc. The state of a button can be "pressed" or "unpressed"; a checkbox, checked or unchecked; an object, focused or unfocused. One of the button's properties might be that it activates a popup. These make up the programmatic “semantics” that a user agent can understand.
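As a rough sketch—the exact names differ from one platform API to another, and the attribute values here are purely illustrative—this is the kind of information a browser derives from ordinary HTML and hands to the accessibility API:

```html
<!-- A button whose semantics announce that it opens a popup menu. -->
<button aria-haspopup="menu" aria-expanded="false">Options</button>
<!-- Exposed to the accessibility API (roughly):
     role: button
     state: collapsed, focusable
     property: has popup (menu)
     action: press
     event: a state-change notification when aria-expanded flips to "true" -->

<!-- A checkbox whose checked state and accessible name come along for free. -->
<input type="checkbox" id="newsletter" checked>
<label for="newsletter">Subscribe to the newsletter</label>
<!-- role: checkbox; state: checked; name: "Subscribe to the newsletter" -->
```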
We almost invariably discuss accessibility APIs in terms of "screen readers and other assistive technologies", largely because blind users have little or no access to the informal cultural semantics of color, size, position, and so forth. Accordingly, screen readers are glued to every inch of the accessibility API in a way no other AT is.
The object-oriented structure of accessibility APIs was designed from the beginning in the 1990s to mirror the Windows Component Object Model (COM) and, a few years later, the core Mac OS X Cocoa API. In between, the web's new Document Object Model (DOM) took on the hierarchical tree structure that likewise came to characterize the accessibility models. When Windows builds its accessibility tree, apps are child elements of the desktop. Each app exposes its child elements the same way: a dialog, the buttons within the dialog, a web page, the elements on the page, and so on.
Standard operating system components are, like standard HTML elements, accessible right out of the box—their semantics are baked in. Additionally, any application that doesn't want to be a jerk adds appropriate semantics to its non-standard controls and puts them on its accessibility tree. A screen reader moves through a forest of these accessibility trees, which make up the sum total of the world it can pass along to its users through its own UI. …Unless you are JAWS or NVDA, but we'll come to that in Part 2.
When something important happens, like the display of new content, the application is responsible for posting an event notification to the platform API, including the type of event, the object, and other relevant information. The AT, meanwhile, is responsible for registering which types of events it wants to listen for. The same responsibilities work the other way as well when AT sends actions to the application. An article about the accessibility model on Mac references an older API, but it remains a good description of how accessibility APIs work from a software development perspective. When the user moves focus into a dialog box, for instance, the Apple VoiceOver screen reader uses the API to query the app for that object and all its children, which would include things like buttons, a text node, and a list box. Moving to the list box exposes its children, which are the list items.
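Here's a web-flavored analogue of that dialog, purely for illustration (the ids, labels, and content are made up); the comments sketch what the screen reader queries as focus moves in.

```html
<div role="dialog" aria-modal="true" aria-labelledby="export-title">
  <h2 id="export-title">Export report</h2>
  <p>Choose a format for the exported file.</p>

  <!-- A native select with size greater than 1 is exposed as a listbox. -->
  <select size="3" aria-label="Format">
    <option>PDF</option>
    <option>CSV</option>
    <option>HTML</option>
  </select>

  <button>Export</button>
  <button>Cancel</button>
</div>
<!-- On entering the dialog, the AT queries it and its children: a heading,
     a text node, a listbox, and two buttons. Moving into the listbox exposes
     its children—the three options. -->
```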
The process of passing messages between running applications by way of various APIs is called inter-process communication (IPC). It has always been a defining feature of multitasking operating systems. We'll frequently come back to this concept in parts 2 and 3, so consider this a bit of foreshadowing.
With accessibility protocols in place, assistive technology doesn't need to build in custom code for communicating with each application in its own way, or to rewrite that code every time a software update breaks accessibility. That was very often the story through the early 2000s. While some of that still happens today, progress in the APIs and their adoption has made for an increasingly smooth ride from an accessibility perspective. Good user experience for AT users is often an entirely different story, but that's a rant for another day.
The Web Accessibility Tree
A web browser's accessibility tree also has to include a web document object whose children include all the accessible objects from the DOM.
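As a rough sketch—exact roles and pruning vary by browser, and the content is invented—here's a small page and approximately what ends up on its accessibility tree:

```html
<main>
  <h1>Orders</h1>
  <div class="toolbar">
    <button>New order</button>
    <span>3 items</span>
  </div>
</main>
<!-- Accessibility tree (approximate):
     document
     └─ main
        ├─ heading, level 1: "Orders"
        ├─ button: "New order"
        └─ static text: "3 items"
     The <div> and <span> contribute no roles of their own. -->
```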
Standard HTML elements already have implicit roles if they contribute to the meaning or functionality of the web page, so they are loaded onto the accessibility tree. But a few other DOM elements aren't. For example, <div> and <span> are empty boxes for putting meaningful objects into, so they have no meaningful role. The browser strips them out but spares their children and adopts them into the tribe—text inside the element will be presented, for instance. The <html> element has a role of document, but, honestly, so what? It gets voted off the island. <svg> opens a portal to a parallel universe of vector graphics that could represent anything. Until you tell me your role, SVG, you are dead to me. Other examples exist. Out they go, unless the developers add semantics to them, such as when they've put a non-standard control into a <div> element—which they have an unhealthy fondness for doing even when standard HTML would do. Since that's what the Accessible Rich Internet Applications standard (WAI-ARIA) is for, we need to talk about it.
ARIA is a collection of HTML attributes that contributes semantics to the accessibility tree.
Well, that was easier than I thought it would be. Except, of course, for the fact that the first rule printed on the wall of the ARIA dojo is "Don't use ARIA", followed by "unless you have to" in a very small font. You must first find a master to teach you all the rules. Until then, Grasshopper, it's wax on, wax off. And yes, those are two different pop culture references—it's mixed martial arts!
ARIA is the web counterpart to the very easy process of specifying accessibility attributes in native software applications. Either way, the accessibility API is the fate of all. Ben Myers puts it best: the responsibility of web developers is "to be good stewards of the accessibility tree". I like that. So go plant a tree.
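As a sketch of what that stewardship looks like—everything here is hypothetical, and a native <button> would provide all of it for free—here's a custom toggle built on a <div> that's been given back the semantics the accessibility tree needs:

```html
<div role="button" tabindex="0" aria-pressed="false" id="mute-toggle">Mute</div>
<script>
  const toggle = document.getElementById("mute-toggle");

  // Keep the ARIA state in sync with what the control actually does,
  // so the accessibility API can report "pressed" or "not pressed".
  function flip() {
    const pressed = toggle.getAttribute("aria-pressed") === "true";
    toggle.setAttribute("aria-pressed", String(!pressed));
  }

  toggle.addEventListener("click", flip);

  // role="button" promises keyboard support, so honor Enter and Space too.
  toggle.addEventListener("keydown", (event) => {
    if (event.key === "Enter" || event.key === " ") {
      event.preventDefault();
      flip();
    }
  });
</script>
```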
Wrapping Up
Firefox and Chrome expose the page's entire accessibility tree through their developer tools, and all browsers will show the accessibility attributes for the currently selected DOM element. You can monitor the accessibility API on Windows and Android using Microsoft's Accessibility Insights or, on Mac, through the Accessibility Inspector built into Xcode.
Here's the list of APIs we'll be discussing. Microsoft Active Accessibility (MSAA) came first, shipping with Windows 98. Starting with Windows Vista, UI Automation largely took a fresh start, in part to account for newer types of objects and events. The open standard IAccessible2 is built into Chrome and Firefox on Windows. The API built into Mac and iOS is called UIAccessibility, which on Mac replaced NSAccessibility as part of Apple's effort to unify software development across the two platforms. Honestly, where do they come up with these names? We're skipping other platforms, but the same principles discussed above apply.
The discussion so far describes how the screen readers built into operating systems work—VoiceOver, Narrator, and TalkBack, as well as Linux's Orca. JAWS, on the other hand, has run the mean streets of Windows since 1995, and NVDA since XP. They wear leather jackets, carry concealed weapons, and go around saying things like "I'm not here to make friends. I'm here to win". Windows still has some rough neighborhoods, and we blind folks need allies who don't always play by the rules, as you can read about in Part 2. This is your trigger warning for violence: things are going to get hacked.