Note: This information has not been revised since April 2013 and is scheduled for revision soon.
As laws, rules, and technologies have changed in recent years, there has been a great deal of interest in the captioning of video content, particularly web-based video content. This resource was created in response to that interest.
Laws related to the accessibility of information technology (IT) have been on the books for many years. In the United States, regulations related to Section 508 of the Rehabilitation Act have clearly stated the requirement that all video content that represents human dialog must be provided with synchronized captioning. The requirements of Section 508 have been interpreted to cover all federal government IT services as well as IT services provided by state government and various entities (including state colleges and universities) in states that receive funds through the Assistive Technology Act. (For more information see “Implications for Higher Education of the Americans with Disabilities Act and Section 508 of the Rehabilitation Act Amendments of 1998”.)
In recent rulemaking, the Federal Communications Commission (FCC) has established rules for the closed captioning of video programming delivered via Internet protocol (i.e., IP video), as required by the 21st Century Communications and Video Accessibility Act (CVAA). The new rules govern TV stations, cable systems, broadcast and cable networks, and virtually every other professional video program producer who is now, or will be in the future, making programming available online (Tremaine, D. W. JDSupra January 18, 2012).
Apart from the legal requirements to provide captioning for video content, Google recently announced that it has begun indexing caption content from websites that host captioned video. Since caption content provides a rich source of keyword data, captioned videos will be easier to find in online searches. Read more about closed captioning and search engine optimization…
What is captioning?
Captions are text versions of the spoken word used in various forms of multimedia such as movies, television and digital video files. Captions may also be found on videotape, CD-ROM or DVD recordings, as well as occasionally on the broadcast of live events (e.g., closed captioning on television programs). Captions not only display words as the textual equivalent of spoken dialogue or narration, but they also include speaker identification, sound effects, and music description.
The captioning of multimedia enables those who are deaf or hard of hearing to have full access to media materials that otherwise would not be readily available. Though captioning is primarily intended for those who have a disability related to hearing, it has also been found to help those who can hear but are in situations where it is difficult to hear. In addition, captioning has been found to be helpful to individuals who may not be fluent in the language in which the audio is presented.
Common accessibility guidelines indicate that captions should be:
- Synchronized – the text content should appear at approximately the same time that audio would be available.
- Equivalent – content provided in captions should be equivalent to that of the spoken word.
- Available – caption content should be readily available to those who need it.
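These three properties can be seen in the structure of a typical caption file. Below is a short excerpt in the widely used SubRip (.srt) format; the dialog and timings are invented purely for illustration:

```
1
00:00:01,000 --> 00:00:04,200
[upbeat theme music]

2
00:00:05,500 --> 00:00:08,200
NARRATOR: Welcome to our introduction
to video captioning.
```

Each numbered cue carries a start and end time (hours:minutes:seconds,milliseconds), which keeps the text synchronized with the audio, while the speaker identification and the bracketed music description help make the captions equivalent to the full soundtrack.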
For video on the web, synchronized, equivalent captions need to be provided any time audio content containing human dialog is present. This pertains to the use of audio and video played through multimedia players such as Quicktime, RealMedia, or Windows Media Player, but can also pertain to such animation technologies as Flash or Shockwave when audio content is a part of the multimedia presentation. (Source: Adapted from WebAIM – Web Captioning Overview)
Open Captioning versus Closed Captioning
There are two types of captioning used: open captioning and closed captioning. Simply put, open captions are always in view and cannot be turned off, whereas closed captions can be controlled (turned on and off) by the viewer.
For video that is displayed on television sets, special devices called decoders must be available in order to view closed captions. Since 1993, decoders have been required to be built into television sets 13 inches or larger sold for use in the United States. Generally, this means that the viewer is able to turn closed captioning on and off through a setting in the TV’s menu or through a button on their remote control.
It should be noted, however, that the recent arrival of High Definition Television (HDTV) has created a number of challenges in the use of closed captioning. Because of the various ways an HDTV signal can be provided, the ability to turn closed captioning on and off varies and is often complicated. For a discussion of, and directions on, how to turn captioning on and off on an HDTV, please see the Wikipedia entry “HDTV interoperability issues.”
AccessIT, the accessibility resource located at the University of Washington, provides the following useful information about open and closed captioning:
When videos are accessed on the World Wide Web, they also may have captions that are open or closed. Closed captions appear only when the user agent (e.g., a media viewer/player) supports them. (Update: All of the major media viewer software applications (e.g., QuickTime, Microsoft Media Player, Real Media, Adobe Flash) now support closed captions; however, there are about 30 different formats of caption files, and which one to use depends on the type and version of the media player being used.)
Delivering video products with closed captions places responsibility on the user to understand how to turn captions on, either on their television sets or in their media viewer software. So that the user isn’t faced with this burden, some people argue in favor of delivering video products with open captions. Open-captioning proponents also argue that captioning has universal design benefits for people other than those with hearing impairments (e.g., people whose first language is not English; people in noisy airports, health clubs, sports bars). Also, when the spoken word of all speakers is open-captioned, additional translation for speakers who have speech impairments is not required.
Despite the advantages of open captions, there also are disadvantages. Some disadvantages stem from the fact that open captions are an actual part of the video stream, whereas closed captions exist as a separate text stream. If captions are preserved as text, video content can be archived and indexed, allowing users to search for specific video content within these archives; this ability is lost with open captions. Also, open captions, unlike closed captions, are subject to loss of quality when the encoded video is compressed.
With the mix of advantages and disadvantages of open and closed captioning, it is important for the video producer to evaluate the use of the video product and make an informed decision about what type of captioning to use. For example, if a training videotape is specifically designed for individuals with disabilities, for large audiences, or for use in noisy conference exhibits, open captioning might be a good choice. The most important decision is to choose to caption the product; the choice to make it open-captioned or closed-captioned should occur after consideration of all factors regarding its use. (From “What is the difference between open and closed captioning?” – AccessIT)
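As a small illustration of the many caption file formats mentioned above, here is an invented cue in WebVTT, a format supported by modern web players. It differs from the older SubRip (.srt) format mainly in small details: WebVTT begins with a WEBVTT header, does not require numbered cues, and uses a period rather than a comma as the millisecond separator.

```
WEBVTT

00:00:05.500 --> 00:00:08.200
NARRATOR: Welcome to our introduction
to video captioning.
```

Because the differences between many of these formats are this small, caption editing tools commonly convert between them, which is why a single edited transcript can often be reused across different players.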
How to Provide Captioning
Captioning can be provided in a variety of ways depending on the “medium” being used. In broadcast television in the United States, closed captions are typically created in advance for pre-recorded programming and included in the broadcast signal. For live television broadcast events, the captioned information is created remotely by trained transcriptionists and then mixed with the television feed. For more information about captioning on television please see: Wikipedia – Closed Captioning and National Institutes of Health – Captions For Deaf and Hard-of-Hearing Viewers.
For live, face-to-face events or events being broadcast via closed circuit to large audiences (e.g., in the classroom, at meetings, workshops, and other presentations including live theater), the use of a CART (Communication Access Realtime Translation) transcriptionist is recommended. While it is possible to have the CART transcriptionist in the same room as the event, very often the CART transcriptionist is located elsewhere (i.e., Remote CART). In either case, the CART transcriptionist listens to the audio from the event source and creates the transcription, which is sent as a digital signal to:
- For live events, a computer screen or projector for live display on a larger screen;
- For closed-circuit video events, equipment that mixes the signal with the video feed using the appropriate method for the equipment being used;
- For webinar events, a pod or window in the webinar application.
CART may also be used in synchronous (real-time) webinars and webcasts, with the text provided as a separate feed for participants. Many of the major webinar applications (Adobe Connect Pro, Cisco WebEx) allow for the synchronous streaming and integration of CART content in their products.
For pre-recorded audio or video content used on the Web, the choices for providing captioning are far greater. A number of captioning software applications exist for all operating systems, including some that are free and open source. In addition, there are a number of on-line (server-based) captioning programs and services available. (See Desktop and on-line captioning services below for more information.)
Professional Captioning Services
In situations where professional transcription services are desired, three types of captioning services are available. In the Human Translation method, a transcriptionist views and listens to the audiovisual recording and creates a digital document of the dialog content adding time markings (time-code) and descriptions of important non-dialog audio.
In the Machine Translation method, the digital document containing the dialog content is created with the use of special speech-to-text software. Unfortunately, despite major advances in speech-to-text technology, these systems are still far from perfect, and Machine Translations tend to be the least accurate of all captioning methods. In addition, descriptions of non-dialog audio cannot be produced by the Machine Translation method, as these features can only be added by a human.
The Hybrid Translation method begins with a Machine Translation from an automated speech-to-text system; the results are then edited and finished by humans, who fix errors and add the non-dialog audio and other descriptive information to the transcript.
Desktop and on-line captioning services
While it is probably best to seek the services of a trained professional transcriptionist when high-quality captions are needed, there are software alternatives that will allow anyone with good listening and typing skills to create a caption file. Currently, there are a number of free or inexpensive desktop captioning applications for Windows OS and Macintosh OS computers. These applications allow the user to view short sections of the video/audio content and type in the transcription. The software has tools that replay the clip repeatedly so the user can hear the audio content as many times as necessary before moving on. After the captioned dialog content is created, the application offers the user options to save or convert the content into several different formats. In addition, these applications also allow users to take existing captioning files, edit them, and convert them into different formats.
There are also several on-line services available for creating captions for video. These websites work essentially the same way as the desktop applications but usually require that the video content be loaded onto, and served from, the service’s own website. While these on-line services will provide users with code to embed the video on another website, the video and captioning content continue to be served from the service provider’s website and cannot be moved to another location. This can prove problematic, particularly when copyrighted materials are being used.
In March of 2010, Google’s YouTube service deployed a free service called “automatic captioning,” which allows the creation of a machine translation caption file for uploaded video. This feature must first be activated in the user’s YouTube settings, and viewers must then click on the “CC” icon on the individual video to create and view the “subtitles.”
As a Machine Translation method, YouTube’s automatic captioning leaves much to be desired. The system uses the experimental Google Voice service, and the quality of the resulting captions improves only if one person is speaking, the audio is of high quality, and the rate of speech is slow. YouTube does provide a means for the account owner to edit the transcription on the service or upload an edited transcript. The owner can also elect to download a text copy of the machine translation, edit it with a desktop captioning application, and then re-upload it to YouTube.
Captioning audiovisual content is a necessary part of meeting the needs of people with disabilities, but it requires some cost and time. When businesses, organizations, and institutions wish to use audiovisual content that contains human dialog, they must plan and budget the necessary resources to ensure the final product is of high quality.
Here is a list of desktop captioning applications and on-line captioning services:
- MAGpie (open source from CPB/WGBH National Center for Accessible Media).
- MovieCaptioner (Mac and Windows applications available. Single use license around $100).
- CapScribe Mac-based video editor for captioning and description. Free for non-profits and schools.
- Captionate (Single use license around $60, educational and site license discounts available).
- Jubler (open source).
- Subtitle Workshop (open source).
- Universal Subtitles.
- Easy YouTube Caption Creator from Accessify.
The following is from the Maine.gov Bureau of Rehabilitation Services (BRS) and lists resources located in or near Maine.
Businesses Providing Closed Captioning Services
Carol Studenmund, RDR, CRR, CBC, CCP
1123 SW Yamhill Street
Portland, OR 97205
Web site: lnscaptioning.com
Businesses Providing Communication Access Real-time Translation (CART)
Shari Majeski, RMR, CCP, CBC
Nationally certified provider of remote CART and nonbroadcast captioning
60 Starlight Drive
Brewer, ME 04412
(952) 388-1546 (V)
Web site: captionlogic.com
Carol Studenmund, RDR, CRR, CBC, CCP
1123 SW Yamhill Street
Portland, OR 97205
Web site: lnscaptioning.com
Photo credit: Image licensed through Creative Commons by Daniel Oines