CCAligner : Word by Word Audio Subtitle Synchronisation

Developed under Google Summer of Code 2017 with CCExtractor Development, by Saurabh Shrivastava

Typical subtitle files (such as SubRip files) are synchronised line by line, i.e. the subtitle containing a dialogue appears when the person starts talking and disappears when the dialogue finishes. This continues for the whole video. For example :

1274
01:55:48,484 --> 01:55:50,860
The Force is strong with this one

In the above example, dialogue #1274, "The Force is strong with this one", appears at 1:55:48, remains on screen for about two seconds and disappears at 1:55:50.
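To make the timing line concrete, here is a minimal Python sketch that turns the timing line of a cue like the one above into start and end times in milliseconds. The function names `to_milliseconds` and `parse_timing_line` are made up for this illustration and are not part of CCAligner or of my subtitle parser library. The separator pattern accepts both `-->` (the usual SubRip arrow) and the shorter `->`.

```python
import re
from typing import Tuple

# One SubRip timestamp: hours:minutes:seconds,milliseconds
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def to_milliseconds(timestamp: str) -> int:
    """Convert 'HH:MM:SS,mmm' to milliseconds from the start of the video."""
    match = TIMESTAMP.fullmatch(timestamp.strip())
    if match is None:
        raise ValueError(f"not a SubRip timestamp: {timestamp!r}")
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_timing_line(line: str) -> Tuple[int, int]:
    """Split a cue timing line on its arrow and return (start_ms, end_ms)."""
    start, end = re.split(r"\s*-+>\s*", line.strip())
    return to_milliseconds(start), to_milliseconds(end)


start_ms, end_ms = parse_timing_line("01:55:48,484 --> 01:55:50,860")
print(start_ms, end_ms, end_ms - start_ms)  # 6948484 6950860 2376
```

The difference of 2376 ms is the "about two seconds" that the whole line stays on screen, regardless of when each word in it is actually spoken.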

The aim of my GSoC project was to build a tool for word by word synchronisation of subtitles with the audio present in a video, by tagging each individual word as it is spoken, similar to karaoke systems. I have named my project CCAligner as the name conveniently lays out its basic functionality.


The           [6948484:6948500]
Force         [6948501:6948633]
is            [6948634:6948710]
strong        [6948711:6949099]
with          [6949100:6949313]

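In the above example, each word from the subtitle is tagged with its beginning and ending timestamps, based on the audio. The values are in milliseconds from the start of the video: 6948484 ms corresponds to 01:55:48,484, the moment the line appears.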
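As a rough illustration of how such word tags relate to ordinary subtitle timestamps, the following Python sketch stores a few of the tags above and prints each word with SubRip-style times. It assumes the bracketed values are milliseconds from the start of the video, which matches the start of the line in the earlier example. `WordTiming` and `format_timestamp` are hypothetical names used only here, not CCAligner's API or output format.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    """One subtitle word with its start and end time in milliseconds."""
    word: str
    start_ms: int
    end_ms: int


def format_timestamp(ms: int) -> str:
    """Format milliseconds as a SubRip timestamp, 'HH:MM:SS,mmm'."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


# The first three word tags from the example above.
words = [
    WordTiming("The", 6948484, 6948500),
    WordTiming("Force", 6948501, 6948633),
    WordTiming("is", 6948634, 6948710),
]

for w in words:
    # Each word gets its own time span, as in a karaoke display.
    print(f"{w.word:<8}{format_timestamp(w.start_ms)} --> {format_timestamp(w.end_ms)}")
```

The first word starts at 01:55:48,484, exactly where the original line starts, and every later word falls inside the span of that line.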

CCAligner uses automatic speech recognition to analyse the audio and recognise words, and uses the recognised words to perform the alignment. The project comprises both a user-friendly tool and a developer-friendly API.
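The general idea of recognition-based alignment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CCAligner's actual algorithm: it assumes the recogniser has already produced a list of (word, start_ms, end_ms) tuples, and it uses `difflib.SequenceMatcher` from the Python standard library to match them against the subtitle words. The `align` function and the misrecognised word "horse" are made up for the example.

```python
from difflib import SequenceMatcher


def align(subtitle_words, recognised):
    """Copy timings from recognised (word, start_ms, end_ms) tuples onto
    the subtitle words. Words with no matching recognised word get None."""
    sub_norm = [w.lower().strip(".,!?\"'") for w in subtitle_words]
    rec_norm = [w.lower() for w, _, _ in recognised]
    timings = [None] * len(subtitle_words)

    matcher = SequenceMatcher(a=sub_norm, b=rec_norm, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block is a run of identical words in both sequences.
        for offset in range(block.size):
            _, start, end = recognised[block.b + offset]
            timings[block.a + offset] = (start, end)
    return list(zip(subtitle_words, timings))


subtitle = ["The", "Force", "is"]
# Hypothetical recogniser output; "horse" stands in for a recognition error.
recognised = [
    ("the", 6948484, 6948500),
    ("horse", 6948501, 6948633),
    ("is", 6948634, 6948710),
]

for word, timing in align(subtitle, recognised):
    print(word, timing)
```

Here "The" and "is" receive timings, while "Force" is left as None because the recogniser heard a different word. A real aligner needs a strategy for such gaps, for example estimating the missing times from neighbouring words.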

I built the project on my own. All the external libraries and code used are credited wherever due.

The technical details are explained in comments in the code, and the documentation is available in the readme of the repository (linked above). Variables, classes and other components are named in camel case to make the code easier to understand. Find compiling, installing and usage instructions here :

In addition to my main project, I also created a single header SubRip subtitle parser library in C++ and contributed to various open source projects, including, but not limited to, CCExtractor, Sample-Platform, AutoEdit2, Rhubarb Lip Sync and CMUSphinx.

1. Created a single header SubRip subtitle parser library in C++. It serves as the core of CCAligner's subtitle handling, offers a large number of options and is very simple to use.

  • Documentation : Complete documentation is in the readme file located in the repository.

2. Improved existing CCExtractor features, fixed issues and helped with PR and code reviews.

3. Improved CCExtractor's sample-platform, fixed and reported issues, and helped with PR and code reviews.

4. Link to my GitHub profile :

The project is at a very early stage and is constantly evolving. The available functions, usage instructions et cetera are expected to change over time. Feel free to contribute and improve the project. Currently, only US English is officially supported. For other languages and accents, a suitably trained acoustic model can be supplied and experimented with. Text tokenisation within the program needs improvement. Feel free to raise any issue in the repository's issue tracker :

  • Last modified: 2017/08/20 22:09
  • by saurabhshri