Captioning Video


Updated: 02/03/2022

Laws related to the accessibility of information technology (IT) have been on the books for many years. In the United States, regulations related to Section 508 of the Rehabilitation Act have clearly stated the requirement that all video content that represents human dialog must be provided with synchronized captioning. In addition, Section 508 requires a written transcript for pre-recorded audio-only content. The requirements of Section 508 have been interpreted to cover all federal government IT services, as well as IT services provided by state governments and various entities (including state colleges and universities) in states that receive federal funds.

The international guidelines for web accessibility, the W3C's Web Content Accessibility Guidelines (WCAG v2.0, Level AA), have similarly required synchronized captioning for video content containing spoken human language and written transcription for pre-recorded audio-only content. WCAG v2.0 also requires that video content have an audio description of the actions taking place in the recording.

In 2017, the U.S. Access Board approved the update (or "refresh") of Section 508. The refreshed Section 508 "harmonizes" its requirements with other guidelines and standards both in the U.S. and abroad, including standards issued by the European Commission and the WCAG.

As required by the 21st Century Communications and Video Accessibility Act (CVAA) of 2010, the Federal Communications Commission (FCC) established rules for the closed captioning of video programming delivered via Internet protocol (i.e., IP video). The rules govern TV stations, cable systems, broadcast and cable networks, and virtually every other professional video program producer who is now, or will be in the future, making programming available online (Tremaine, D. W. JDSupra January 18, 2012).

Benefits of Creating Accessible Audio and Video Content

Apart from the legal requirements to provide captioning for video content, Google and other search engines index captioned content from websites that host captioned video. The captions provide a rich source of keyword data, thus improving search engine optimization (SEO).

As web and internet-based content is increasingly used by other types of digital technologies, it is also beneficial to make sure your content meets industry standards and guidelines so that it can be used on these devices.

Accessible podcasts and other pre-recorded audio content

When pre-recorded audio is provided as part of online content, the accessibility guidelines (at conformance Level AA) require that a written transcript be provided. Note that Level AAA also requires live audio content to have a text alternative, similar to the captioning guidelines for video content. Current Section 508 guidelines require Level AA conformance.

Written transcripts allow anyone who cannot access web audio to read a text version instead. Transcripts do not have to be verbatim accounts of the spoken word, and they should include additional descriptions, explanations, or comments that may be beneficial, such as indications of laughter or an explosion.

Methods for creating accessible transcripts

Pre-production – create a script. When planning a podcast, it makes sense to create, at the very least, an outline of what you plan to talk about. From there, you can add to your outline the specific text you plan to use. Work carefully on this text so that the timing and pacing are efficient and enjoyable to listen to, and make sure you cover everything you want to say. The resulting script does not need to be read exactly as written, although you should not stray too far from it. The final script, in digital form, becomes the basis of the transcript used to make the podcast accessible: simply edit the script to match the final audio, and your podcast is accessible.

Post-production – there are a number of options for creating the transcript after recording:

  • Manually transcribe the content using a word processor. Caution: when performing manual transcription, it helps to be able to type fast, ideally fast enough to keep up with the speakers. Typical podcast speakers talk at roughly 150–200 words per minute, while average-to-good typists manage about 40–80 words per minute. Because speakers talk two to five times faster than most people type, manual transcription requires frequent pausing and rewinding.
  • Transcription software. This is specialized software that combines a media player with a text editor: you play the media, type what you hear (as fast as you can), pause, rewind a bit and repeat. See Resources for a list.
  • Speech-to-text software. This is specialized software in which the computer attempts to transcribe the spoken words automatically. Results vary with the quality of the audio, the speed of the speech and the presence of accents or other speech variations. Again, there are many resources to choose from.
  • Transcription service. A professional service where you send your audio content to a transcription business and pay to have the content transcribed. See resources.
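To put that speed gap in perspective, the short Python sketch below estimates the time needed just to type a transcript. It uses the rate ranges quoted above and assumes a hypothetical 30-minute episode; the function name typing_minutes is illustrative only, and the estimate ignores pausing, rewinding and proofreading, so real effort will be higher.

```python
def typing_minutes(episode_minutes: float, speech_wpm: float, typing_wpm: float) -> float:
    """Minutes of pure typing needed to transcribe an episode.

    Ignores pauses, rewinds and proofreading, so real effort is higher.
    """
    words = episode_minutes * speech_wpm  # total words spoken in the episode
    return words / typing_wpm             # minutes to type that many words


if __name__ == "__main__":
    episode = 30  # hypothetical 30-minute episode
    fastest = typing_minutes(episode, speech_wpm=150, typing_wpm=80)
    slowest = typing_minutes(episode, speech_wpm=200, typing_wpm=40)
    print(f"Typing alone: {fastest:.0f} to {slowest:.0f} minutes")
```

With these assumptions the script prints "Typing alone: 56 to 150 minutes", i.e., roughly one to two and a half hours of typing for a half-hour episode before any editing.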

Delivering accessible podcasts

To meet accessibility guidelines, audio content must be served up using the following:

  • Accessible controls. While modern content management systems, current HTML and advanced browser technology make posting an audio file to a website generally very easy (it usually involves adding a single short line of code), accessibility guidelines call for ensuring the audio "player" has accessible controls. This means that people using assistive technology (AT), such as screen readers and switches, can start and stop the recording as necessary. The native HTML5 audio player is accessible. If you are using a podcasting service or another method of serving up your audio content, you will need to test the delivery system to ensure it is accessible to AT.
  • Posting the transcript. Posting the transcript is very easy: it can be either an attached, downloadable accessible file (e.g., a word processor file) or HTML content at the same location as the audio file.
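As an illustration of both points, the Python sketch below generates an HTML fragment that pairs the native HTML5 audio player with the transcript posted as HTML at the same location. It is a sketch under stated assumptions: the function name podcast_html and the file name episode1.mp3 are hypothetical, and paragraphs in the transcript are assumed to be separated by blank lines. Text is passed through html.escape so that characters such as & and < display correctly.

```python
from html import escape


def podcast_html(audio_url: str, title: str, transcript: str) -> str:
    """Build an HTML fragment with a native audio player and its transcript.

    The controls attribute asks the browser to supply its own player
    controls; the transcript is posted as HTML right after the player.
    """
    # One <p> per blank-line-separated paragraph of the transcript.
    paragraphs = "\n".join(
        f"  <p>{escape(p.strip())}</p>"
        for p in transcript.split("\n\n")
        if p.strip()
    )
    return (
        f"<h2>{escape(title)}</h2>\n"
        f'<audio controls src="{escape(audio_url)}">\n'
        # Fallback shown only by browsers without <audio> support.
        f'  <a href="{escape(audio_url)}">Download the audio</a>\n'
        "</audio>\n"
        "<h3>Transcript</h3>\n"
        f"{paragraphs}\n"
    )


if __name__ == "__main__":
    print(podcast_html("episode1.mp3", "Episode 1",
                       "Welcome to the show.\n\n[laughter] Thanks for listening."))
```

The resulting markup can be pasted into a CMS page; the player should still be tested with AT as described above.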

Examples of Accessible Podcasts

Accessible Video Content

When video content (i.e., synchronized media) is distributed via the web, accessibility guidelines require synchronized captioning of the audio portion of the recording and a descriptive narration of the actions taking place in the video.

What is captioning?

Captions are text versions of the spoken word used in various forms of multimedia such as movies, television and digital video files. Captions may also be found on videotape, CD-ROM or DVD recordings, as well as occasionally on the broadcast of live events (e.g., closed captioning on television programs). Captions not only display words as the textual equivalent of spoken dialogue or narration, but they also include speaker identification, sound effects, and music description.

The captioning of multimedia enables those who are deaf or hard of hearing to have full access to media materials that otherwise would not be readily available. Though captioning is primarily intended for those who have a disability related to hearing, it has also been found to help those who can hear but are in situations where it is difficult to hear. In addition, captioning has been found to be helpful to individuals who may not be fluent in the language in which the audio is presented.

Common accessibility guidelines indicate that captions should be:

  • Synchronized – the text content should appear at approximately the same time that audio would be available.
  • Equivalent – content provided in captions should be equivalent to that of the spoken word.
  • Available – caption content should be readily available to those who need it.

(Source: Adapted from WebAIM – Web Captioning Overview)

Open Captioning versus Closed Captioning

Open Captioning. Open captions, also known as burned-in, baked-in or hard-coded captions, are seen by everyone who watches the video (or film). Open captions are a permanent part of the video and can't be turned on and off. They are often used for videos played on website video players that don't support closed captioning.

Closed Captioning. The term “closed” (versus “open”) indicates that the captions are not visible until activated by the viewer, usually via the remote control or menu option. The caption content resides in a separate file and is added or layered on top of, or next to, the video content during playback.

The differences. While both versions are acceptable for accessibility purposes, the Closed Captioning method provides the user with greater usability as they are often able to control the font type, size and color of the caption. In some video playback systems, the user may also be able to control the location of the closed captions.

Synching with actions in the video. Unlike with a podcast, creating captions is not limited to producing a transcript of all the spoken text. For video content, the transcript must be broken into sections, and each section must be synched with the audio and action in the video. This is done with timestamps.
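To make the idea of timestamps concrete, here is a minimal Python sketch that turns timed transcript sections into a caption file. It assumes the WebVTT format (one common caption format; SRT is another, and this article does not prescribe one), and the Cue class and the function names are hypothetical. Each cue pairs a start and end time with the caption text, which can include speaker identification or sound effects.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str     # caption text, may include speaker IDs or [sound effects]


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Render timed transcript sections as the text of a WebVTT file."""
    blocks = ["WEBVTT"]  # required header line
    for cue in cues:
        if cue.start < 0 or cue.end <= cue.start:
            raise ValueError(f"invalid cue timing: {cue}")
        timing = f"{vtt_timestamp(cue.start)} --> {vtt_timestamp(cue.end)}"
        blocks.append(f"{timing}\n{cue.text}")
    # Cues are separated from the header and from each other by blank lines.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(to_webvtt([
        Cue(0.5, 3.2, "NARRATOR: Welcome to the course."),
        Cue(3.5, 5.0, "[upbeat music]"),
    ]))
```

The first cue renders as the timing line 00:00:00.500 --> 00:00:03.200 followed by its text; the validation check rejects cues whose end time does not come after their start.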

Creating Captions

There are several methods for creating captioned content:

Manual transcription. Methods for creating closed captions are similar to those for transcribing audio-only/podcast content. Because of the need to timestamp and synch the caption content, the process is usually done in post-production (see CART below). Caption transcription can be done manually with a simple word processor or with specialized software. As with podcasts, there are many options and resources for the DIY (do-it-yourself) captioner, but given the need for timestamps, the software and techniques are more complicated. One advantage of specialized software is the ability to get the captions synched correctly. Sometimes this becomes an art form!
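Part of that craft is deciding where each caption breaks. As a rough starting point, the Python sketch below splits a transcript into caption-sized blocks using the standard library's textwrap module. The limits of 32 characters per line and 2 lines per caption are assumptions for illustration (check the captioning style guide you follow), and the function name caption_blocks is hypothetical. A human captioner would still move the breaks to natural phrase boundaries and assign each block its timestamps.

```python
from __future__ import annotations

import textwrap


def caption_blocks(transcript: str, max_chars: int = 32, max_lines: int = 2) -> list[str]:
    """Split a transcript into caption-sized blocks.

    Words are wrapped to at most max_chars per line (long words are not
    broken), and consecutive lines are grouped max_lines at a time.
    """
    lines = textwrap.wrap(transcript, width=max_chars, break_long_words=False)
    return [
        "\n".join(lines[i:i + max_lines])
        for i in range(0, len(lines), max_lines)
    ]


if __name__ == "__main__":
    text = ("Welcome to today's lecture on accessible media. "
            "We will start with the basics of captioning.")
    for block in caption_blocks(text):
        print(block, end="\n\n")
```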

Automated systems. There are many automated captioning systems (a Google search will reveal many) that use speech-to-text technology to convert spoken content into written text; however, results will vary. The best-known example is the automatic captioning feature built into YouTube (YT). With the click of a switch, YouTube's servers will create a synced text transcript of the audio portion of any video uploaded to the site. Depending on the quality of the audio, the results may be very good or very poor. Fortunately, the owner of the YT channel can download the automated caption file and edit it offline, or use the built-in caption editing tools to perfect the transcript on the YT site. YT also gives content developers the option of uploading a prepared caption file or simply a transcript; the YT system will then sync the new captions with the video automatically. See also Captioning Live Video, a discussion of new automatic and AI-generated captioning technology.
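When editing downloaded captions offline, the file may be in a different format from the one your video player expects. The Python sketch below converts SubRip (.srt) caption text to WebVTT; it assumes the input is well-formed SRT, and the function name srt_to_webvtt is hypothetical. It is not a YouTube tool and calls no YouTube API; it only operates on caption text you have already downloaded.

```python
import re

# A timestamp as written in SRT, e.g. "00:01:02,345" (comma before milliseconds).
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text: str) -> str:
    """Convert SubRip (.srt) caption text to WebVTT.

    SRT and WebVTT cues are nearly identical; this adds the "WEBVTT"
    header and changes the millisecond separator on timing lines from a
    comma to a period. Cue numbers are kept, because WebVTT allows them
    as optional cue identifiers. Commas in the caption text are untouched.
    """
    body = srt_text.replace("\r\n", "\n").lstrip("\ufeff").strip()
    converted = []
    for line in body.split("\n"):
        if "-->" in line:  # only timing lines are rewritten
            line = _SRT_TIME.sub(r"\1.\2", line)
        converted.append(line)
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"


if __name__ == "__main__":
    sample = "1\n00:00:00,500 --> 00:00:03,200\nHello, world.\n"
    print(srt_to_webvtt(sample))
```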

Outsourced methods.

  • Combination of manual and automated. Similar to the method described above, which uses the YouTube automatic captioning system followed by manual editing, many commercial captioning firms use their own combination of automatic and manual transcription.
  • Manual only. Some captioning vendors use only specially trained professional transcriptionists to create the caption content and timestamps.



Communication Access Realtime Translation (CART) is live captioning of audio-only or video content. CART is primarily used during live events, but the recorded text can also be edited and used in post-production for video that will be archived and made available for later viewing. CART may also be used in synchronous (real-time) webinars and webcasts, with the text provided as a separate feed for participants. Many of the major webinar applications (e.g., Zoom, Cisco WebEx) allow CART content to be streamed and integrated synchronously.

Creating Audio Description

As noted, to meet current accessibility guidelines, all pre-recorded video content must provide something called "audio description" in addition to captioning of the audio portion of the recording. Audio description is sometimes referred to as "video description" or "descriptive narration," so we will refer to it simply as "description" to avoid confusion.

Although description has been around for many years, the requirements in web accessibility guidelines were elevated in WCAG v 2.0 and are now required even to meet Level A of the guidelines. Description is defined as: “narration added to the soundtrack to describe important visual details that cannot be understood from the main soundtrack alone. In standard audio description, narration is added during existing pauses in dialogue.” It is noted that, “where all of the video information is already provided in existing audio, no additional audio description is necessary.”

Challenges are many. Description is difficult to produce without sufficient expertise and training. Currently, there are few free description service providers, and we could find no free editing tools. Standards for the provision of description are still in draft form, even though Section 508 has now adopted WCAG v2.0. Because description is difficult for the layperson to do well, training is strongly encouraged; several training programs around the country teach people how to create good-quality description (see resources below). This is one aspect of accessibility that you may wish to outsource to professional services. Vendors such as 3Play Media and AST/CaptionSync charged from $12 to $20 per minute to provide both captioning and description (as of summer 2019; check for current rates).
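For budgeting, the quoted per-minute range turns into a simple estimate. The Python sketch below assumes a hypothetical 45-minute video and uses the summer 2019 rates as defaults; the function name is illustrative, actual pricing will differ, and you should substitute current vendor quotes.

```python
from __future__ import annotations


def description_cost_range(video_minutes: float,
                           low_rate: float = 12.0,
                           high_rate: float = 20.0) -> tuple[float, float]:
    """Return (low, high) cost in dollars for captioning plus description.

    Default per-minute rates are the summer 2019 range; check vendors
    for current pricing.
    """
    if video_minutes < 0:
        raise ValueError("video length cannot be negative")
    return video_minutes * low_rate, video_minutes * high_rate


if __name__ == "__main__":
    low, high = description_cost_range(45)  # hypothetical 45-minute lecture
    print(f"Estimated cost: ${low:,.0f} to ${high:,.0f}")
```

For the 45-minute example this prints "Estimated cost: $540 to $900".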

To incorporate the description effectively into the final production, it is often necessary to pause sections of the video to "fit in" the description, which requires re-editing the video and audio tracks. Because this can change the experience for some viewers, producers will often create two versions of the final video: one with the original video/audio and closed captioning, and one with the edited video/audio and description.
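Standard description depends on finding pauses in the dialogue, as the definition quoted above notes. As a hypothetical sketch, the Python function below uses caption cue times as a stand-in for when dialogue occurs and lists the pauses long enough to hold narration; the 2-second minimum is an assumed threshold, not a guideline value, and the function name is illustrative. Where a needed description does not fit in any listed pause, that is where a producer would consider pausing the video.

```python
from __future__ import annotations


def description_gaps(dialogue: list[tuple[float, float]],
                     video_length: float,
                     min_gap: float = 2.0) -> list[tuple[float, float]]:
    """Find pauses in dialogue long enough to hold description narration.

    dialogue: (start, end) times in seconds, e.g. taken from caption cues.
    Returns (start, end) pauses of at least min_gap seconds, including any
    pause before the first line of dialogue and after the last one.
    """
    gaps = []
    cursor = 0.0  # latest end time of dialogue seen so far
    for start, end in sorted(dialogue):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # handles overlapping dialogue
    if video_length - cursor >= min_gap:
        gaps.append((cursor, video_length))
    return gaps


if __name__ == "__main__":
    cues = [(1.0, 4.0), (4.5, 9.0), (12.0, 15.0)]
    print(description_gaps(cues, video_length=20.0))
```

For the sample cues this prints [(9.0, 12.0), (15.0, 20.0)]: the half-second pause between the first two lines is too short, while the pause after 9 seconds and the closing stretch could each hold a description.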

Integrating the final video production with both captioning and description presents additional technological hurdles. 3Play Media does have a proprietary player that will serve up all three strands of content (video, captioning and description).

Delivering Accessible Video

Video player and controls. This is a controversial area. As with audio-only/podcast content, the player used to view the video content, along with the synched caption content, must be accessible to people using assistive technology (AT). Unlike in the audio-only realm, many different video players are in use, and many of them are not accessible. HTML5 does define a track element for attaching caption files to the native video player, but support for displaying and controlling captions, and for access by AT, varies from browser to browser. I have provided a link to some articles on the topic with some video players that have been reported to be "fully accessible." However, other articles on the web suggest that all is not perfect. FMI – read "Accessible HTML5 Media Players & Resources" on DigitalA11y.

Captioned content and controls. If closed captioning is used, the video player must also allow captions to be turned on and off. Ideally, the video player also lets the user resize the captions and change their color, background or font; however, this is not currently required by the accessibility guidelines.

Examples of Accessible Captioned Video

Examples of Audio Description

Final thoughts

Captioning and describing audio/visual content is a necessary part of meeting the needs of people with disabilities, but it is something that requires some cost and time. When businesses, organizations and institutions wish to use audiovisual content which contains human dialog, they must plan and budget the necessary resources to ensure the final product is of high quality.


Tools and Resources – Podcasts

Resources and Tools – Video Captioning

Resources – Description

Caption Video – Live and Post-Production

Captioning Resources

Businesses Providing Closed Captioning Services

The following is a list of captioning service providers from the Bureau of Rehabilitation Services (BRS), including resources close to Maine.

Automatic Sync Technologies
Haywood, CA
(877) 278-7962
Web site:

3 Play Media
34 Farnsworth Street
Boston, MA 02210
(617) 764-5189
Web site:

Karasch & Associates
1646 West Chester Pike, Suite 4
West Chester, PA 19382
1-800-621-5689 (V)
(619) 696-2008 (FAX)

Closed Caption Maker
Walter Gallant
1955 Kensington Street
Harrisburg, PA 17104
1-800-527-0551 (V)
Website: Closed Caption Maker

Custom Captions
Alice Durrant
458 South 2470 West
Provo, UT 84601
(801) 370-9878 (V)
P.O. Box 835215
Richardson, TX 75083
214-801-7606 (V)
Michael Ward

Video Caption Corporation
88 Hunns Lake Road
Stanfordville, New York 12581
800-705-1203 (V)
800-705-1207 (FAX)

Video Production Services
Carol Lane
North Monmouth, ME 04265
933-3896 (V)
1-800-848-8550 (V)
Website: Video Production Services

Frameweld/National Captioning Institute
Recap’d Captioning Service
44-02 23rd St., Suite 420
Long Island City, NY 11101

LNS Captioning
Carol Studenmund, RDR, CRR, CBC, CCP
1123 SW Yamhill Street
Portland, OR 97205
503-299-6200 (V)
800-366-6201 (V)
Web site:

Verbit – Transcription and Automatic Subtitling and Real Time CART
Israel and New York

For more information on captioning and captioning service vendors, see the website for Captioned Media Program.

Businesses Providing Communication Access Real-time Translation (CART)

Caption Logic
Shari Majeski, RMR, CCP, CBC
(952) 388-1546 (V)
Web site:

Maine CART & Captioning Service
Marsha Dulac-Swain
660 South Belfast Avenue
Augusta, ME 04330
207-242-9378 (V & Text)
Website: Maine CART & Captioning Service

Karasch & Associates
1646 West Chester Pike, Suite 4
West Chester, PA 19382
1-800-621-5689 (V)
(619) 696-2008 (FAX)

Jennifer M. Rodrigues
P.O. Box 20278
Castro Valley, CA 94546
(510) 888-9825 (V)

Dayette J. Zampolin, RMR, CRR, CCP
697 Jug Tavern Road
Downsville, NY 13755
(607) 363-7808 (V)
Website: Captionears

Verbit – Transcription and Automatic Subtitling and Real Time CART
Israel and New York


Access Captioning Technology (ACT)
Lisa Sorenson
P.O. Box 614
Gorham, ME 04038
222-2882 (V/TTY/FAX)
