Differences

This shows you the differences between two versions of the page.

public:gsoc:2017:saurabh [2017/08/24 16:54] saurabhshri
public:gsoc:2017:saurabh [2018/01/06 20:03] (current) willem ↷ Page moved and renamed from public:gsoc:gsoc2017:ccaligner_word_by_word_audio_subtitle_synchronisation_saurabh_shrivastava_gsoc_2017 to public:gsoc:2017:saurabh
Line 3: Line 3:
 //
  
 +//Blog entry for final submission: (https://saurabhshri.github.io/gsoc-final-submission/) //
 ----
  
Line 31: Line 32:
 In the above example, each word from the subtitle is tagged with beginning and ending timestamps based on the audio.
  
-{{ :public:gsoc:gsoc2017:karaoke-480.gif?nolink |}}
+{{ youtube>38_27E1PxXA?large }}
 +\\ 
  
 CCAligner uses automatic speech recognition to analyse audio and recognise words in order to perform alignment. The project comprises both a user-friendly tool and a developer-friendly API.
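
As a rough illustration of the idea described above (tagging each subtitle word with timestamps taken from recognised speech), here is a minimal Python sketch. It is not CCAligner's actual algorithm or API: the `RecognisedWord` and `AlignedWord` types, the `align_words` function and the `window` parameter are all hypothetical names introduced for this example, and the recogniser output is assumed to already be available as a list of words with millisecond timestamps.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecognisedWord:
    """One word as reported by a speech recogniser (hypothetical shape)."""
    text: str
    start_ms: int
    end_ms: int


@dataclass
class AlignedWord:
    """A subtitle word, tagged with timestamps when a match was found."""
    text: str
    start_ms: Optional[int] = None
    end_ms: Optional[int] = None


def normalise(word: str) -> str:
    # Lower-case and drop punctuation so that "Hello," matches "hello".
    return re.sub(r"[^\w']", "", word).lower()


def align_words(subtitle_text: str,
                recognised: List[RecognisedWord],
                window: int = 3) -> List[AlignedWord]:
    """Match subtitle words, in order, against recognised words.

    For each subtitle word, only the next `window` recognised words are
    searched, which lets the matcher skip fillers the recogniser heard
    but the subtitle omits. Unmatched words keep `None` timestamps.
    """
    aligned = []
    cursor = 0  # index of the first recognised word not yet consumed
    for raw in subtitle_text.split():
        entry = AlignedWord(text=raw)
        target = normalise(raw)
        for i in range(cursor, min(cursor + window, len(recognised))):
            if normalise(recognised[i].text) == target:
                entry.start_ms = recognised[i].start_ms
                entry.end_ms = recognised[i].end_ms
                cursor = i + 1
                break
        aligned.append(entry)
    return aligned
```

A real aligner would also need a strategy for words the recogniser misses entirely, for example interpolating timestamps between the nearest matched neighbours; this sketch simply leaves them untagged.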
Line 80: Line 82:
  
 4. Link to my GitHub profile: https://github.com/saurabhshri
 +
 +
 +===== Some Demonstrations =====
 +
 +{{youtube>6VnhC8u_d40?small}}
 +  * Karaoke Demo 2 [Ted Talk]
 +
 +\\
 +
 +{{youtube>j_zeixo-zJY?small}}
 +  * Karaoke Demo 3 [Cartoon Show]
 +
 +\\
 +
 +{{youtube>8tTDX6NZGsU?small}}
 +  * Karaoke Demo 4 [Discussion Video]
 +
 +\\
 +
 +{{youtube>tFrf0TVnqIQ?small}}
 +  * Transcription Demo [Reality Show]
 +
 +
 +\\   
 +
 +===== Third party libraries and dependencies =====
 +
 +All third-party libraries are located in `src/lib_ext`, along with their individual licenses.
 +
 +1. PocketSphinx : PocketSphinx is a lightweight, portable speech recognition engine. It is used in ASR-based alignment. //(https://github.com/cmusphinx/pocketsphinx)//
 +
 +2. SphinxBase : SphinxBase provides basic libraries and common utilities for manipulating acoustic feature and audio files. It is used by PocketSphinx. //(https://github.com/cmusphinx/sphinxbase)//
 +
 +3. srtparser.h : srtparser.h is a simple and powerful single-header C++ srt subtitle parsing library that makes it easy to handle, process and manipulate srt subtitle files. //(https://github.com/saurabhshri/simple-yet-powerful-srt-subtitle-parser-cpp)//
 +
 +4. WebRTC : WebRTC is a free, open project that provides browsers and mobile applications with Real-Time Communications (RTC) capabilities via simple APIs. It is used to perform voice activity detection (VAD) in the project. //(https://webrtc.org)//
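
To give a feel for the kind of work an srt parsing library does, the following Python sketch reads the standard SubRip layout (cue number, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then text) into simple records. It is only an illustration under stated assumptions: the `Cue` type and the `parse_srt` function are hypothetical names for this example and do not reflect the interface of srtparser.h, which is a C++ library.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    """One subtitle cue (hypothetical record for this sketch)."""
    index: int
    start_ms: int
    end_ms: int
    text: str


# Timing line, e.g. "00:00:01,500 --> 00:00:04,000".
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_ms(hours: str, minutes: str, seconds: str, millis: str) -> int:
    # Convert the four captured fields of a timestamp to milliseconds.
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(content: str) -> List[Cue]:
    """Parse srt text into cues, skipping blocks that are malformed."""
    cues = []
    normalised = content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalised):
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        match = _TIMING.search(lines[1])
        if match is None:
            continue
        fields = match.groups()
        cues.append(Cue(
            index=int(lines[0]),
            start_ms=_to_ms(*fields[:4]),
            end_ms=_to_ms(*fields[4:]),
            text="\n".join(lines[2:]),
        ))
    return cues
```

Storing times as integer milliseconds keeps later arithmetic, such as shifting or comparing cues against recognised word timestamps, free of floating-point rounding.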
  
 ===== Known Issues / Future Work Needed =====
  • public/gsoc/2017/saurabh.1503593680.txt.gz
  • Last modified: 2017/08/24 16:54
  • by saurabhshri