[Image: laptop with flames shooting out of the screen]

This post is the second in a three-part series about how screen readers obtain what users need from applications and operating systems. Part 1 introduced accessibility APIs (application programming interfaces) and the accessibility tree. I'll conclude the series with a discussion of the IAccessible2 API, which is key to how modern Windows-based screen readers communicate with browsers. First, though, we need to jump back to the point in time when screen readers learned to cope with graphical user interfaces (GUIs), because the tricks learned back then remain critical today, at least for JAWS and NVDA. We will not discuss the web this time, and VoiceOver, TalkBack, and Narrator have never used the techniques described here.

Most introductions to accessibility APIs include a similar but shorter and simplified history lesson. This one collects details found in only one or another of them, or in none at all, throws in some trivia for fun, and cites what my ex-professor brain thinks of as primary sources. Grab your machete, because we're heading into weeds so tall that a tiger couldn't find its way home from an Indian buffet without GPS.

Caveat: I have relied on screen readers since 1988 and was a computer geek decades before being a geek became cool, but I am not a software developer. This post is the product of research. An extended email conversation with veteran accessibility software developer Matt Campbell steered me toward many of the avenues explored in this series. There's much more from our conversation next time.

When Screen Readers Got GUI

Screen readers have never literally read the screen, of course. Only something on your side of the thirteen-inch amber CRT monitor could do that. However, as the earliest screen readers emerged during the mid-1980s, such as Textalker for the Apple II and Jim Thatcher's in-house IBM screen reader, they did read video memory, which consisted of a couple of easily accessed kilobytes of values representing text characters (i.e., the ASCII standard), plus additional memory addresses that stored color values, once that became a thing. A twelve-year-old could read and write to it: I was, and I did during the early '80s. Screen readers' tasks all revolved around this simple UI. They echoed typed text, automatically read lines scrolling up from the bottom, automatically announced text with a specific background color that designated a selected menu item, provided a special review mode and review cursor that could explore the entire screen, sent their output text string to a hardware speech synthesizer, and granted the user control over all of it through more buttons and switches than a NASA manned space vehicle. When a program switched into "graphics mode" in order to draw the screen as a "bitmap" pixel by pixel, screen readers clammed up.
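
For the technically curious, here's what "reading video memory" amounted to in practice. The following is a minimal, modern C++ sketch of the 80-by-25 text mode layout on IBM-compatible adapters; the simulated buffer and helper names are mine for illustration, since peeking at the real text buffer (segment 0xB800 on color adapters, 0xB000 on monochrome ones) was a DOS-era trick rather than anything you would compile today.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// The 80x25 text screen: each cell is two bytes, an ASCII character code
// followed by an attribute byte holding foreground and background colors.
// On real hardware this buffer sat in video memory; here it is simulated.
constexpr int kCols = 80;
constexpr int kRows = 25;

struct TextScreen {
    std::vector<std::uint8_t> vram =
        std::vector<std::uint8_t>(kRows * kCols * 2, 0);

    char charAt(int row, int col) const {
        return static_cast<char>(vram[(row * kCols + col) * 2]);
    }
    std::uint8_t attrAt(int row, int col) const {
        return vram[(row * kCols + col) * 2 + 1];
    }
};

// "Read the current line" is just a walk across one row of cells; spotting a
// selected menu item is the same walk while watching the attribute byte.
std::string readLine(const TextScreen& screen, int row) {
    std::string line;
    for (int col = 0; col < kCols; ++col) line += screen.charAt(row, col);
    return line;
}
```

From the screen reader's point of view, nearly everything in that feature list, from echoing typed characters to announcing the highlighted menu item to the review cursor, was some variation on walking those four thousand bytes.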

Trivia alert: Examples of early screen readers included Flipper, named for its ability to flip between two speech profiles (which was fun for about a day); and JAWS, officially standing for Job Access With Speech, but which its creator Ted Henter admitted actually began as a predatory smirk at the name of its competitor. Add in the Linux Orca screen reader and the Dolphin screen reader, and the take-away is that these folks have a thing about fish and fishy mammals. If you or someone you love has thought about developing a screen reader, consider calling it Tilapia.

The various graphics-based operating systems emerging at the same time represented an entirely different kettle of fish, and I promise that is the only pun in this series. "Graphics mode" became the whole UI.

The solution was to intercept low-level drawing instructions from applications and the operating system itself in order to build a database with which the screen reader could interact as if the display were still in text mode. This approach was dubbed an off-screen model (OSM).

Berkeley Systems, best known for developing the most fun screen savers ever, got there first in 1989 with OutSpoken for Mac. The mouse being of course totally inaccessible for blind users, OutSpoken's entire function set consisted of mouse emulation using the numeric keypad. The UX therefore required users to visualize and explore the screen layout. I and others who have written about using it found it clunky and tiring, in large part because the old Mac OS included almost no keyboard shortcuts that would alleviate the need to tap around the window with one's virtual white cane. Suffice it to say, brilliant as OutSpoken was, it did not draw a significant number of screen reader users to Mac, and no other Mac screen reader emerged until VoiceOver became part of the all-new OS X in 2005. Meanwhile, nearly a dozen mostly short-lived screen readers attempted to tackle Windows 3.1. Here, too, the vast majority of us successfully ignored Windows by simply not typing "win" at the DOS prompt.

But no one could ignore Windows 95, the looming widespread adoption of which posed an existential threat to the employability of screen reader users. Microsoft needed to do something about it. So declared numerous disability advocacy organizations that, according to a 2000 AccessWorld retrospective and the recollections of Peter Korn, began to enlist state governments in a potential boycott of the product.

I don't know and don't want to imply that this pressure was the primary impetus, but Microsoft pursued several avenues. First, it hired Chuck Oppermann, then VP of JAWS development at Henter-Joyce, to lead its accessibility efforts. Second, it licensed the JAWS OSM technology. To its surprise, this upset rival screen reader developers like Berkeley Systems and GW Micro—makers of the Window-Eyes screen reader that would ultimately be eaten by JAWS. So, this idea was shelved.

Microsoft also created the IAccessible specification (I for interface), which put in place the now-familiar tree structure of semantic information about UI objects. This enabled screen readers to make sense of a new type of dialog box in Microsoft Office, as well as custom widgets built with the ActiveX framework found in Microsoft's own applications or embedded on web pages—the Flash plugin, for instance. This interface was the basis for what Microsoft first announced as ActiveX Accessibility, released as Microsoft Active Accessibility (MSAA). The "X" was dropped by release time, perhaps because they got tired of telling people ActiveX was a separate brand name and the X in MSAA was silent. That comes from a fascinating 1996 CSUN committee meeting, during which Chuck Oppermann and representatives from around the accessibility field floated ideas for platform-independent OSMs and accessibility APIs (alas, OutSpoken remains the only screen reader to have had both Mac and Windows versions).
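
To give a flavor of what MSAA looks like from the client side, here's a small sketch of my own, not anyone's production code: it asks the foreground window for its IAccessible object and reads the name and role a screen reader would speak. It assumes a Windows build environment with oleacc.lib available and skips the error handling and child enumeration that real assistive technology can't skip.

```cpp
// Minimal MSAA client: get the foreground window's IAccessible and read the
// name and role a screen reader would announce.
#include <windows.h>
#include <oleacc.h>
#include <cstdio>
#pragma comment(lib, "oleacc.lib")
#pragma comment(lib, "ole32.lib")

int main() {
    CoInitialize(nullptr);

    IAccessible* acc = nullptr;
    if (SUCCEEDED(AccessibleObjectFromWindow(GetForegroundWindow(),
                                             static_cast<DWORD>(OBJID_CLIENT),
                                             IID_IAccessible,
                                             reinterpret_cast<void**>(&acc)))) {
        VARIANT self;
        VariantInit(&self);
        self.vt = VT_I4;
        self.lVal = CHILDID_SELF;   // ask about the object itself, not a child

        BSTR name = nullptr;
        if (SUCCEEDED(acc->get_accName(self, &name)) && name) {
            wprintf(L"Name: %s\n", name);
            SysFreeString(name);
        }

        VARIANT role;
        VariantInit(&role);
        if (SUCCEEDED(acc->get_accRole(self, &role)) && role.vt == VT_I4) {
            wchar_t roleText[64] = {};
            GetRoleTextW(role.lVal, roleText, 64);   // e.g. "client", "push button"
            wprintf(L"Role: %s\n", roleText);
        }
        acc->Release();
    }
    CoUninitialize();
    return 0;
}
```

A real screen reader does this constantly in response to focus changes and other events, and recurses through child objects to build its picture of the window.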

I can't think of any major consumer products ever to support MSAA other than Microsoft's own and Acrobat Reader. For that matter, I don't think I ever encountered Flash content tagged for accessibility. MSAA began to make a significant difference in Internet Explorer 5; but, as I'll come back to in Part 3, it only provided a portion of the information a user required. So, MSAA in no way meant the end of OSMs.

The Off-Screen Model

Good technical overviews of how OSMs work come from a 1991 article by Rich Schwerdtfeger and from Peter Korn's description of the Berkeley model during the CSUN meeting cited earlier. I'll simplify these and other sources for the sake of a wider audience that can extend to, for instance, me. It may come as a surprise to some readers that, as discussed in the next section, off-screen models remain in limited use today.

The ability to construct an off-screen model relies on the presence of an operating system graphics layer that turns drawing instructions sent by software into the bitmap sent to the video card driver. These interfaces include the Windows Graphics Device Interface (GDI) and the now-extinct QuickDraw from the original Mac OS. The screen reader inserts "hooks" into other running applications in order to intercept calls to those drawing functions. It then inspects those function calls for the information it is programmed to process. Finally, it sends them on to the graphics interface, much like raiding your neighbor's mailbox and steaming open their outgoing mail before resealing it—which I have never done, and you can't prove it.

Except, replace "mailbox" with "house," because I… I mean code… breaks directly into their space. At the height of OSM-building, Window-Eyes and JAWS, as well as screen magnification programs, actually inserted their own mirror display drivers that grabbed literally everything before passing it on to the actual video driver.
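
To put some code behind the burglary metaphor, here's a sketch, invented for illustration and not lifted from any shipping screen reader, of what the interception looks like once a hook is in place. However the hook gets installed, whether through an injected DLL that patches the API or a mirror display driver sitting beneath everything, the intercepted call lands in a wrapper that copies the text and its position into the off-screen model and then forwards it untouched. The installation step itself is omitted here.

```cpp
// Conceptual GDI interception: record text before letting the real
// ExtTextOutW draw it. Only the "steam open, copy, reseal" step is shown.
#include <windows.h>
#include <string>
#include <vector>

struct OsmTextRun {
    std::wstring text;   // what was drawn
    POINT origin;        // where GDI was told to draw it
    HDC dc;              // which device context, so the OSM can find the window
};

std::vector<OsmTextRun> g_offScreenModel;   // a wildly simplified "database"

// Pointer to the genuine ExtTextOutW, replaced when a real hook is installed.
static decltype(&ExtTextOutW) g_realExtTextOutW = &ExtTextOutW;

BOOL WINAPI HookedExtTextOutW(HDC hdc, int x, int y, UINT options,
                              const RECT* clip, LPCWSTR str, UINT len,
                              const INT* dx) {
    if (str && len > 0) {
        // Copy the text and its position into the model...
        g_offScreenModel.push_back({std::wstring(str, len), POINT{x, y}, hdc});
    }
    // ...then reseal the envelope and let the original call draw as usual.
    return g_realExtTextOutW(hdc, x, y, options, clip, str, len, dx);
}
```

Call HookedExtTextOutW directly on a screen device context and both halves happen: the text is drawn, and a copy lands in g_offScreenModel. The hard part, as the next paragraph describes, is everything the model must then do with those copies.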

The work of an OSM is extremely complex. It starts with reading strings of ASCII characters from text-drawing functions. The OSM also needs to keep track of positions on the bitmap in order to insert text at the right place in the model when, for example, typed characters are being drawn on a line one instruction at a time as part of the same word. It has to keep track of what's visible or not, too, such as when areas of the bitmap are transferred off screen and replaced in order to convey the illusion of 3D pull-down menus or layered windows. Graphical icons have to be recognized and labelled (the legacy of which is the fact that JAWS and NVDA continue to call web images "graphics"). In older Windows versions, the user could reclass a non-standard control, trying out one control type after another in hopes that the OSM might make sense of its appearance if given a hint. Finally, an OSM can integrate information from sources other than GDI hooking, including MSAA or other APIs.
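
As an illustration of just one of those chores, here's a hedged sketch of the "does this run continue an existing line?" decision. The structures and tolerances are invented for the example; a real OSM also juggles fonts, clipping, scrolling, and all the rest.

```cpp
// One OSM bookkeeping chore in isolation: when a newly intercepted text run
// arrives, decide whether it extends an existing line (same baseline, starts
// roughly where the previous run ended) or begins a new one. Tolerances are
// invented; a production OSM would derive them from font metrics.
#include <cstdlib>
#include <string>
#include <vector>

struct Run {
    int baselineY;     // vertical position of the text baseline
    int left, right;   // horizontal extent in pixels
    std::wstring text;
};

struct Line {
    int baselineY;
    std::vector<Run> runs;
};

constexpr int kBaselineSlop = 2;   // pixels of vertical wiggle room
constexpr int kGapSlop = 4;        // pixels allowed between adjacent runs

void addRun(std::vector<Line>& model, const Run& run) {
    for (Line& line : model) {
        if (std::abs(line.baselineY - run.baselineY) <= kBaselineSlop &&
            std::abs(run.left - line.runs.back().right) <= kGapSlop) {
            line.runs.push_back(run);   // e.g. the next typed character of a word
            return;
        }
    }
    model.push_back(Line{run.baselineY, {run}});   // otherwise, start a new line
}
```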

The developers who created the first OSMs were, in essence, called upon to reproduce a sighted user's visual process of interpreting the screen. Truly, those were the days of wooden ships and iron gender-identifying men.

The process couldn't be perfect. Developers had to slowly add special code for each application that didn't exclusively use standard UI elements, which meant only the most popular or critical ones ever became accessible. Screen readers—at least, mine—sometimes had difficulty tracking the cursor across text in tiny font sizes. They might fail to correctly report words or lines under the cursor in a text document (this remains true in some applications). Anything scrolling off the right edge didn't exist until I thought to maximize the window and set zoom to page width. Edit boxes on web pages sometimes went silent as I tried to review what I'd typed, because part of the box had scrolled off screen or otherwise failed to make it into the OSM. Misrecognitions, memory leaks, and accumulated garbage in the OSM were constants as well. Anybody else remember the crash dialog sporting a white X in a red circle? I attached a car crash sound to mine so that a smile could descend with me into General Protection Fault and Illegal Operation Hell. Troubleshooting was a major life activity, and we end-users had to learn many technical details we mortals were never meant to know.

Then again, in all fairness, Windows at the time famously brought the Blue Screen of Death indiscriminately down upon all its users.

Even though most published recollections of the OSM era focus on the problems, the fact is that I used Windows 9x all day, every day, with really no more annoyance than Windows 11 gives me now, because screen readers, newer Windows components, "universal" Windows apps, and the UI Automation accessibility API are still hashing things out.

The Death of Screen Reading

Old code dies hard. GDI began life as part of Windows 1.0. Its successor, DirectX, has slowly been taking over since Windows 95. Yet, GDI hangs on and is still used by older applications, because who can afford to sink resources into a drastic rewrite while things are working fine now?

Consequently, rumors of the death of off-screen models have been slightly exaggerated. According to Matt Campbell, who wrote the System Access screen reader, some screen readers continued to rely in part on their OSMs in Internet Explorer until IE 9 switched to one of the DirectX APIs in 2011, at which point Microsoft offered a scaled-down version of UI Automation as a substitute. Although built around accessibility APIs from the outset, NVDA nevertheless added an OSM—which the user explores in what it calls screen review mode—for the sake of GDI-reliant applications. The manual warns us not to be surprised when it doesn't work. Meanwhile, JAWS continues to ship its mirror display driver, though power users were widely reporting on message boards by 2015 that it rarely worked under Windows 10. The JAWS cursor, which utilizes an OSM to track the mouse, only announces "blank" in most contexts today; but, again, the feature will remain until the last trace of GDI vanishes from our universe.

At the outset, I noted that the screen review mode was one of the defining features that gave screen readers that name. Ironically, then, the one thing that screen readers no longer do is read the screen.

In fact, that's good. As I noted regarding OutSpoken, I shouldn't be forced to contend with visual layout in order to grasp meaning. A new array of cursors—including NVDA's object navigation, the VoiceOver cursor, the JAWS scan cursor, and all touch navigation—navigates the UI object by object using the accessibility tree. MSAA hangs on in older Windows components like the desktop and taskbar, UI Automation works increasingly well in components introduced since Windows 8, and IAccessible2 provides the accessibility tree in Firefox and Chromium-based browsers, as we'll discuss next time. Thirty-four years after the introduction of the first GUI screen reader, the DOS-era task of interpreting a visual UI has finally been replaced by "official" access to the underlying semantics that the user actually needs, courtesy of accessibility APIs.
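
For contrast with the GDI-hooking sketches above, here is roughly what navigating by object looks like against a modern API. This is a minimal UI Automation client, my illustration rather than any screen reader's actual code: it asks for the root of the accessibility tree and walks its top-level children, printing the name and control type each one exposes. It assumes a Windows build environment and keeps error handling to a minimum.

```cpp
// Minimal UI Automation client: get the accessibility tree's root (the
// desktop) and walk its immediate children in the control view.
#include <windows.h>
#include <uiautomation.h>
#include <cstdio>
#pragma comment(lib, "ole32.lib")
#pragma comment(lib, "oleaut32.lib")

int main() {
    CoInitialize(nullptr);

    IUIAutomation* uia = nullptr;
    if (FAILED(CoCreateInstance(__uuidof(CUIAutomation), nullptr,
                                CLSCTX_INPROC_SERVER, __uuidof(IUIAutomation),
                                reinterpret_cast<void**>(&uia)))) {
        CoUninitialize();
        return 1;
    }

    IUIAutomationElement* root = nullptr;
    IUIAutomationTreeWalker* walker = nullptr;
    uia->GetRootElement(&root);
    uia->get_ControlViewWalker(&walker);

    IUIAutomationElement* child = nullptr;
    if (root && walker) walker->GetFirstChildElement(root, &child);

    while (child) {
        BSTR name = nullptr;
        CONTROLTYPEID type = 0;
        child->get_CurrentName(&name);
        child->get_CurrentControlType(&type);
        wprintf(L"%s (control type %d)\n", name ? name : L"(no name)", type);
        if (name) SysFreeString(name);

        IUIAutomationElement* next = nullptr;
        walker->GetNextSiblingElement(child, &next);
        child->Release();
        child = next;
    }

    if (walker) walker->Release();
    if (root) root->Release();
    uia->Release();
    CoUninitialize();
    return 0;
}
```

Not a pixel or drawing call in sight; the tree hands over roles and names directly, which is exactly the "official" access to underlying semantics described above.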

As much as I want to end on that note, we ultimately need to ask: is desktop computing more accessible today? The fact that you have made it to the end of this lengthy post means you are probably a serious accessibility professional familiar with the profound increase in awareness, remediation efforts, and training surrounding web and native mobile app accessibility, as well as the miles yet to go before we sleep. On the other hand, the only resources I am familiar with for desktop applications include published specifications, user forums like Stack Exchange, and now Matt Campbell's AccessKit—which provides a one-stop, cross-platform framework for supporting all the various accessibility APIs (was that an advertisement? …Maybe). The greatest impact has in fact come from the expanding number of built-in apps by platform developers like Microsoft, Google, and Apple. When it comes to accessible third-party apps, however, we continue to search like a band of foragers, sharing our findings with one another when we can. As in 1995, accessibility continues to rely for the most part on developers' use of standard operating system components. The threat is that, also like 1995, software accessibility has the potential to dictate what we can do and what we can't. Blind people are apparently not supposed to edit video, for instance, or be able to use many "smart" home appliances that likewise lack tactile controls. When developers not only provide a simple, standard UI, but also tag their UI elements properly and even leverage special screen reader functionality, I get a warm fuzzy. And, in the final analysis, that should be the goal of every software developer: to give verbose middle-age blind randos like me a warm fuzzy.

Come to think of it, that's exactly right.