PDF/UA examples, Accessible HTML
FallMT 2022 PDF/UA and HTML – Accessible
Ross Moore
July 2023
FallMT-2022: Management Track Assessments Fall 2022
- PDF/UA: a pre-publication version of FallMT-2022: PDF version;
- HTML: a derivation to Accessible HTML.
Some features of these PDF and HTML documents will be discussed at the TUG 2023 meeting, in Ross Moore’s presentation video.
The method of production follows these steps.
1. First, the LaTeX source is processed to build a Tagged PDF valid for the PDF/UA-1 format.
2. The resulting PDF is derived into an HTML version using ngPDF.
3. The HTML page, along with any images, is then downloaded and copied to this location.
4. Some post-processing with sed commands is done on the HTML to remove some incorrect structures, invalid attributes, and the ‘- ’ remnants of hyphenation within the PDF.
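The post-processing in the last step could be sketched as follows. This is only an illustration in Python, not the sed commands actually used: the attribute name `data-example` is a hypothetical placeholder (the invalid attributes are not listed here), and the rule for spotting hyphenation remnants is an assumption.

```python
import re

# Hypothetical attribute names; the real list of invalid attributes is not given.
INVALID_ATTRS = ("data-example",)

# A hyphen followed by a space between two lowercase letters is treated as a
# line-break remnant, unless the next word is "and"/"or" (a suspended hyphen,
# as in "row- and column-header").
HYPHEN_REMNANT = re.compile(r"(?<=[a-z])- (?!(?:and|or)\b)(?=[a-z])")

def clean_html(html: str) -> str:
    # Rejoin words split by PDF hyphenation: "acces- sible" -> "accessible".
    html = HYPHEN_REMNANT.sub("", html)
    # Drop each listed attribute together with its double-quoted value.
    for name in INVALID_ATTRS:
        html = re.sub(rf'\s+{re.escape(name)}="[^"]*"', "", html)
    return html
```

A pattern like this can still misfire on legitimate text, so any real rule set would need checking against the actual output of the conversion.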
Accessibility: WCAG 2.1 and ARIA Success Criteria
The following images and explanatory notes indicate how WCAG and ARIA recommendations and requirements are handled. We use the ‘AInspector’ software, which operates as a plugin to the Firefox browser, to algorithmically check many aspects of these recommendations.
The table headings are V: violation, W: warning, MC: manual check, and P: pass.
Note that there are no violations and no warnings, only 28 possible issues requiring manual checking, which are dealt with below. In addition to the 37 rules that are algorithmically verified as Pass, a further 57 rules/recommendations are deemed not applicable, making 122 tests in all. For instance, since there are no interactive Form fields in the document, all 17 checks under the ‘Forms’ category are deemed inapplicable; similarly for 15 under ‘Widgets/Scripts’ and 12 under ‘Audio/Video’, as well as a few in other categories.
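As a quick check on the figures just quoted, the tally can be reproduced in a few lines. The variable names are illustrative only; the numbers are those stated in the text.

```python
# Counts from the AInspector summary quoted in the text.
violations, warnings, manual_checks, passes, not_applicable = 0, 0, 28, 37, 57
total = violations + warnings + manual_checks + passes + not_applicable

# Inapplicable rules attributed to named categories in the text.
named = {"Forms": 17, "Widgets/Scripts": 15, "Audio/Video": 12}
remaining_na = not_applicable - sum(named.values())

print(total)         # 122 tests in all
print(remaining_na)  # 13 inapplicable rules in the other categories
```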
- The ARIA concept of ‘Landmark’ has no direct analog in PDF/UA, but corresponds to the highest-level nodes in the Tagged PDF ‘structure tree’. In this document the correspondence is appropriate, as follows: BANNER ↔ Cover page(s), COMPLEMENTARY ↔ \frontmatter, MAIN ↔ \mainmatter, CONTENTINFO ↔ Back cover pages. Furthermore, REGION ↔ sectioning with \section or \subsection, etc. All content is contained in such regions, appearing structurally under landmarks.
- The HTML site is just a single (very long) page, with the PDF/Title occurring as H1 on the front cover, which is the ‘BANNER’ landmark.
All top-level (H2 or H3) headings in REGION sections have unique textual titles. Lower-level headings may include the fish name, to ensure uniqueness; e.g., ‘Terms of Reference: Pollock’. Where a heading does not include it, as with ‘References’, the title is still unique within its fish-stock section; that is, within its branch of the structure tree.
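The uniqueness of top-level heading titles can be checked mechanically. The following is a minimal sketch using only Python's standard `html.parser`; the class and function names are hypothetical, and it is not part of the production workflow described here.

```python
from collections import Counter
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collect the text of headings at the given levels."""

    def __init__(self, levels=("h2", "h3")):
        super().__init__()
        self.levels = levels
        self.current = None  # tag name of the heading currently open
        self.buffer = []
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag in self.levels:
            self.current = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.current:
            # Normalise whitespace before storing the title.
            self.titles.append(" ".join("".join(self.buffer).split()))
            self.current = None

def duplicate_headings(html):
    """Return the sorted list of H2/H3 titles that occur more than once."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return sorted(t for t, n in Counter(parser.titles).items() if n > 1)
```

An empty result means that every H2 and H3 title on the page is distinct. Lower-level headings are deliberately left out, since their titles need only be unique within a branch.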
- Most lists are tagged as such. The exceptions, such as lists of authors or panellists, have semantic markup conveyed through user-defined environments and commands. Changes in text sizing occur with headings or captions, all of which have meaningful semantic markup. Document divisions without header text are marked using an aria-label attribute; e.g., ‘Coverpage’, ‘Front-matter’, ‘Series Information’, etc. Generally, reading order follows the order of appearance in the LaTeX source; where it may differ, CSS rules are defined and associated through structure tag attributes. The only language changes are for Latin names of fish species, which are clearly tagged as anchor text for links to a Glossary page.
- All the images, whether graphical plots, photographs or line-art, have alternative text. Some have appropriate long descriptions provided as structural attribute string-values. There is no mathematical content presented as an image.
- Each internal hyperlink has an aria-label attribute which is unique to the link’s target. This has been constructed from the string keys used within the LaTeX processing of regular structures. These are phrases such as: ‘starts on page v’, ‘section 1 starts on page 1’, ‘Table 2 on page 5’, ‘Figure 13 on page 46’, ‘Glossary:NOAA’, ‘1st AOP on page 4’, etc., according to context.
As there is only a single HTML page, focus remains on that page after following an internal hyperlink.
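The claim that each internal link's aria-label is unique to its target can be tested with a short script. This is a sketch under stated assumptions: internal links are taken to be `<a>` elements whose `href` begins with `#`, and the class and function names are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

class InternalLinkCollector(HTMLParser):
    """Record (target, aria-label) for every same-page hyperlink."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        if href.startswith("#"):
            self.links.append((href[1:], attr.get("aria-label")))

def check_link_labels(html):
    """Return (targets of unlabelled links, labels shared by several targets)."""
    parser = InternalLinkCollector()
    parser.feed(html)
    unlabelled = [target for target, label in parser.links if not label]
    targets_by_label = defaultdict(set)
    for target, label in parser.links:
        if label:
            targets_by_label[label].add(target)
    ambiguous = {label: sorted(targets)
                 for label, targets in targets_by_label.items()
                 if len(targets) > 1}
    return unlabelled, ambiguous
```

Both return values being empty indicates that every internal link carries a label and that no label points at more than one target. Several links to the same target may share a label.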
- All the tabular material has a caption describing the purpose and a summary description describing the shape; i.e., number of rows and columns as well as how many have header- or data-cells. All (non-blank) data cells have a Headers array specifying the appropriate row- and column-header. Some tables have hierarchical headers.
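A check of the header associations might look like the sketch below. It assumes that the PDF Headers array arrives in the derived HTML as the standard `headers` attribute on `<td>` elements, referring to `id` values on `<th>` elements; that mapping is an assumption about the conversion, and the names used are hypothetical.

```python
from html.parser import HTMLParser

class TableHeaderChecker(HTMLParser):
    """Collect <th> ids and, for each <td>, its headers list and text."""

    def __init__(self):
        super().__init__()
        self.header_ids = set()
        self.cells = []  # one [headers list, text parts] entry per <td>
        self.in_td = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "th" and attr.get("id"):
            self.header_ids.add(attr["id"])
        elif tag == "td":
            self.in_td = True
            self.cells.append([(attr.get("headers") or "").split(), []])

    def handle_data(self, data):
        if self.in_td:
            self.cells[-1][1].append(data)

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

def cells_with_bad_headers(html):
    """Return text of non-blank data cells lacking valid header references."""
    checker = TableHeaderChecker()
    checker.feed(html)
    bad = []
    for headers, parts in checker.cells:
        text = "".join(parts).strip()
        if not text:
            continue  # blank cells are exempt
        if not headers or any(h not in checker.header_ids for h in headers):
            bad.append(text)
    return bad
```

The sketch treats all tables on the page as sharing one pool of header ids, which is consistent with ids being unique within an HTML page; it does not verify that the referenced headers are the appropriate ones for each cell.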
- There are no interactive Forms within this document, in either the PDF or the HTML, so none of the rules/recommendations in this category are applicable.
- There are no interactive scripts. ARIA roles are supplied where appropriate. Where custom roles or aria-labels may have been produced by the translation software, these have been removed from the HTML by postprocessing scripts.
There are no other kinds of active scripts embedded within either the PDF or the HTML version of the document.
- None of the images in this document contains embedded audio or video, so most of the rules do not apply.
- A ‘skip to Main’ link is anchored on the logo at the top of the first page, both in PDF and HTML. This anchor is first in the keyboard focus order; indeed it has ‘autofocus’ in the derived HTML. The focus order includes the ARIA landmarks, as well as (in some browsers) all active hyperlinks.
- There are no images or other content that ‘blinks’, ‘flashes’ or otherwise changes with time. With all content known to be static, manual checks are unnecessary.
- The visible document ‘Title’ is located within the initial ‘BANNER’ landmark. This string is also used for the HTML <title> content in the <head> section. With the Bookmarks, Table of Contents, List of Tables, List of Figures, Glossaries and Tags tree, there are multiple ways to navigate to particular parts of the document. Keyboard navigation is supported by providing taborder attribute values for primary structural components.
Since there is just a single HTML page, the ‘consistency’ requirements of SC 3.2.3 and 3.2.4 do not apply. Nevertheless, due to the production using LaTeX macro methods, documents built from similar sources and processed using the same modules will have a consistent ‘look and feel’.