Differences

This shows you the differences between two versions of the page.

public:gsoc:python_extension_module_technical_documentation_gsoc_17 [2017/09/04 17:22]
skrill [Workflow of Python extension module]
public:gsoc:python_extension_module_technical_documentation_gsoc_17 [2017/09/04 17:42] (current)
skrill [Wrappers for the extension module]
Line 100: Line 100:
  
 ==== Callback Function architecture ==== ==== Callback Function architecture ====
-With the extension module, when a C function is called from Python, control is transferred to C and returns to Python only after the function finishes executing. However, under this architecture, a single function would process the entire sample and extract all the caption frames before control passed back to Python. For further processing in Python, the user would therefore have had to wait until all the caption frames had been extracted from the sample. This would violate the basic idea that the module should be able to process caption frames in Python as they are extracted by CCExtractor, rather than waiting until extraction from the entire sample has finished. +  * With the extension module, when a C function is called from Python, control is transferred to C and returns to Python only after the function finishes executing. However, under this architecture, a single function would process the entire sample and extract all the caption frames before control passed back to Python. For further processing in Python, the user would therefore have had to wait until all the caption frames had been extracted from the sample. This would violate the basic idea that the module should be able to process caption frames in Python as they are extracted by CCExtractor, rather than waiting until extraction from the entire sample has finished.
-As a result, the callback function architecture was adopted. Its main advantage is that the moment a line of a caption frame is extracted, the line is passed to Python via a callback function and can be processed there. +  * As a result, the callback function architecture was adopted. Its main advantage is that the moment a line of a caption frame is extracted, the line is passed to Python via a callback function and can be processed there.
-In the present architecture, the user has the flexibility to tell CCExtractor which Python function acts as the callback function, and a mechanism has been designed to convey this function to CCExtractor. This is done through the my_pythonapi function, as discussed in the previous sections. +  * In the present architecture, the user has the flexibility to tell CCExtractor which Python function acts as the callback function, and a mechanism has been designed to convey this function to CCExtractor. This is done through the my_pythonapi function, as discussed in the previous sections.
-NOTE: In api_testing.py, I have named the callback function callback. However, the user is free to give the callback function any name. The user should note that the callback function receives nothing but a line from the caption frame extracted by CCExtractor. Further processing of the extracted line is the responsibility of the user. +  * NOTE: In api_testing.py, I have named the callback function [[https://github.com/CCExtractor/ccextractor/blob/master/api/api_testing.py#L25|callback]]. However, the user is free to give the callback function any name. The user should note that the callback function receives nothing but a line from the caption frame extracted by CCExtractor. Further processing of the extracted line is the responsibility of the user.
-After defining the callback function, the user needs to make sure that it is passed from Python to CCExtractor so that it can be used as the callback. To do so, the user sets the callback function as the second argument of the my_pythonapi function. This is done in the api_testing.py script, which the user can refer to for an example. +  * After defining the callback function, the user needs to make sure that it is passed from Python to CCExtractor so that it can be used as the callback. To do so, the user sets the callback function as the second argument of the my_pythonapi function. This is done in the api_testing.py script, which the user can refer to for an [[https://github.com/CCExtractor/ccextractor/blob/master/api/api_testing.py#L16|example]].
-Why a single line of the caption frame, and not the entire frame, is passed via the callback function is described in detail in later sections. +  * Why a single line of the caption frame, and not the entire frame, is passed via the callback function is described in detail in later sections.
-Also, when the user passes the callback function from Python to CCExtractor, the my_pythonapi function saves a pointer to this function as an element of a global structure, array, defined and declared in ccextractor.h. The element reporter holds the callback function passed by the user via Python. +  * Also, when the user passes the callback function from Python to CCExtractor, the [[https://github.com/CCExtractor/ccextractor/blob/master/api/wrappers/wrapper.c#L10|my_pythonapi function]] saves a pointer to this function as an element of a global structure, array, defined and declared in ccextractor.h. The element [[https://github.com/CCExtractor/ccextractor/blob/master/src/ccextractor.h#L37|reporter]] holds the callback function passed by the user via Python.
-Whenever the user wants to pass a line to the callback function, the user needs to call the function run, which is defined in ccextractor.c. +  * Whenever the user wants to pass a line to the callback function, the user needs to call the function [[https://github.com/CCExtractor/ccextractor/blob/master/src/ccextractor.c#L553|run]], which is defined in ccextractor.c.
-run +
-Function declaration- void run(PyObject * reporter, char * line, int encoding)  +
- The run function is declared with three parameters; the first two are described as follows (the third, encoding, is not described in this section): +
-The first argument is the callback function which the user passes via Python. According to present architecture,​ this callback function is contained by the element reporter contained in the global structure named array. So the first argument is array.reporter. +
-The second argument to the run function is the line which needs to be passed to Python. +
-This is how the callback mechanism works for passing the lines from C to Python in real time.+
  
-Processing output ​in Python+=== run === 
 +Function declaration- //void run(PyObject * reporter, char * line, int encoding)//  
 +  * The run function is declared with three parameters; the first two are described as follows (the third, encoding, is not described in this section):
 +      * The first argument is the callback function which the user passes via Python. According to present architecture,​ this callback function is contained by the element reporter contained ​in the global structure named array. So the first argument is array.reporter. 
 +      * The second argument to the run function is the line which needs to be passed to Python
 +This is how the callback mechanism works for passing the lines from C to Python in real time.
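The flow described above can be illustrated with a minimal pure-Python sketch. It does not use the ccextractor module: ''fake_extractor'' is a hypothetical stand-in for the C side, which in the real module calls run with array.reporter once per extracted line, and the single-argument callback signature and the sample lines are assumptions made for illustration.

```python
from typing import Callable, Iterable, List

# Lines collected on the Python side, in the order they arrive.
received: List[str] = []


def callback(line: str) -> None:
    """User-defined callback: receives exactly one extracted line per call."""
    received.append(line)


def fake_extractor(lines: Iterable[str], reporter: Callable[[str], None]) -> None:
    """Hypothetical stand-in for the C extraction loop.

    Instead of buffering every line until the sample is finished, it hands
    each line to the registered callback as soon as it is available.
    """
    for line in lines:
        reporter(line)


# The callback is registered once, then invoked line by line.
fake_extractor(["text[1]:HELLO", "text[1]:WORLD"], callback)
```

Because the callback is an ordinary Python callable, any function name can be used, which matches the note above that the name callback is only a convention of api_testing.py.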
  
-As described in the previous sections, the extension modules just return a single line from the caption frames. The processing of the caption frames to generate the output subtitle file is done in Python. +==== Processing output in Python ==== 
-A script to generate an output subtitle file from the extracted caption frames in Python has been written. api_testing.py has a function named callback, which acts as the callback function and receives the extracted caption lines in Python. These lines are then passed to generate_output_srt in api_support.py, located in the api/ directory. The function then checks whether the line contains specific identifiers, which decide how the output is generated. A detailed section on the nomenclature used for processing the different lines of CE-608 format caption fields is included in this documentation (see the Support for only CE-608 captions section). The main reason for doing so is to avoid any buffering in C to hold the caption lines until the entire caption frames are extracted. This facilitates real-time processing of the extracted caption frames. +  * As described in the previous sections, the extension module returns just a single line from the caption frames. The processing of the caption frames to generate the output subtitle file is done in Python.
-For getting the output filename from CCExtractor, which is then used to write the output srt file from Python, the first line passed via the callback function whenever the code is run from the extension module is the output filename generated by CCExtractor. This is done by calling the callback function from the init_write function defined in the src/lib_ccx/output.c file. The line passed to the callback function has the format filename-<name of the output file to be generated>, and this is then used to generate the output file. This line is captured in the generate_output_srt function defined in api_support.py. +  * A script to generate an output subtitle file from the extracted caption frames in Python has been written. api_testing.py has a function named callback, which acts as the callback function and receives the extracted caption lines in Python. These lines are then passed to [[https://github.com/CCExtractor/ccextractor/blob/master/api/api_support.py#L7|generate_output_srt]] in api_support.py, located in the api/ directory. The function then checks whether the line contains specific identifiers, which decide how the output is generated. A detailed section on the nomenclature used for processing the different lines of CE-608 format caption fields is included in this documentation (see the Support for only CE-608 captions section). The main reason for doing so is to avoid any buffering in C to hold the caption lines until the entire caption frames are extracted. This facilitates real-time processing of the extracted caption frames.
-However, if the user wants the flexibility of defining the filename in a different manner, then for such outputs, the user must make changes in the generate_output_srt function to set the filename and ignoring the first line that appears in Python via the callback function.+  ​* ​For [[https://​github.com/​CCExtractor/​ccextractor/​blob/​master/​api/​api_support.py#​L10|getting the output filename]] from CCExtractor which would then be used to write the output srt file from Python, whenever the code is run from the extension module the first line that is passed via the callback function is the output filename generated by CCExtractor. This is incorporated by [[https://​github.com/​CCExtractor/​ccextractor/​blob/​master/​src/​lib_ccx/​output.c#​L58|calling the callback function from init_write]] function defined in the src/​lib_ccx/​output.c file. The line passed to the callback function is of the format filename-<​name of the output file to be generated>​ and this is then used to generate the output file. This line is then captured in the generate_output_srt function defined in the api_support.py. 
 +  ​* ​However, if the user wants the flexibility of defining the filename in a different manner, then for such outputs, the user must make changes in the generate_output_srt function to set the filename and ignoring the first line that appears in Python via the callback function.
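As a rough illustration of the filename handling described above, the following sketch recognises the filename-<name> line and lets the caller override it. It is not the code from api_support.py: the helper names ''extract_filename'' and ''resolve_filename'' and the ''override'' parameter are hypothetical.

```python
from typing import Optional

FILENAME_PREFIX = "filename-"


def extract_filename(line: str) -> Optional[str]:
    """Return the output filename if the line has the form filename-<name>."""
    if line.startswith(FILENAME_PREFIX):
        return line[len(FILENAME_PREFIX):].strip()
    return None


def resolve_filename(first_line: str, override: Optional[str] = None) -> Optional[str]:
    """Pick the output filename for the subtitle file.

    If the caller supplies an override, the filename sent by the extractor
    is ignored; otherwise the name carried by the first line is used.
    """
    if override is not None:
        return override
    return extract_filename(first_line)
```

A user who wants a custom output name would pass it as ''override'' and simply skip the first line received through the callback.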
  
-Support for only CE-608 captions: +==== Support for only CE-608 captions ​==== 
-For understanding the CE-608 caption format, the user is advised to refer to this documentation on CE-608.+//For understanding the CE-608 caption format, the user is advised to refer to this [[https://​github.com/​CCExtractor/​ccextractor/​blob/​master/​docs/​G608.TXT|documentation on CE-608]].//
  
-The Python extension module is so far able to extract the caption frames from CE-608 samples. In CE-608 samples, the caption frames extracted by CCExtractor take the form of a 15x32 grid which represents the screen. The font of the captions, their colour on the screen and their alignment on the screen are captured in the font, color and text grids respectively. +  * The Python extension module is so far able to extract the caption frames from CE-608 samples. In CE-608 samples, the caption frames extracted by CCExtractor take the form of a 15x32 grid which represents the screen. The font of the captions, their colour on the screen and their alignment on the screen are captured in the font, color and text grids respectively.
-Using Python modules each of such grids can be accessed in Python. However, as described in the previous section the callback function gets a single line and not the entire grid from CCExtractor,​ some processing needs to be done in Python for getting the user required grids per caption frames. +  ​* ​Using Python modules each of such grids can be accessed in Python. However, as described in the previous section the callback function gets a single line and not the entire grid from CCExtractor,​ some processing needs to be done in Python for getting the user required grids per caption frames. 
-The functions which would be acting as the processing and buffering functions for grid generations are present in the ccx_to_python_g608.py. The two major functions are return_g608_grid and g608_grid_former. The g608_grid_former is mainly used to form the grid from lines obtained at the callback function. +  ​* ​The functions which would be acting as the processing and buffering functions for grid generations are present in the [[https://​github.com/​CCExtractor/​ccextractor/​blob/​master/​api/​ccx_to_python_g608.py|ccx_to_python_g608.py]]. The two major functions are [[https://​github.com/​CCExtractor/​ccextractor/​blob/​master/​api/​ccx_to_python_g608.py#​L15|return_g608_grid]] and [[https://​github.com/​CCExtractor/​ccextractor/​blob/​master/​api/​ccx_to_python_g608.py#​L1|g608_grid_former]]. The g608_grid_former is mainly used to form the grid from lines obtained at the callback function. 
-The main advantage of the return_g608_grid function is that the user can generate whatever pattern the user desires to process in Python. For accessing various different combinations of the font, color and text grids in CE-608, a help_string has been defined in the return_g608_grid function in the ccx_to_python_g608.py file which describes on the value of mode to be passed to this function to get proper combination of the grids. +  ​* ​The main advantage of the return_g608_grid function is that the user can generate whatever pattern the user desires to process in Python. For accessing various different combinations of the font, color and text grids in CE-608, a [[https://​github.com/​CCExtractor/​ccextractor/​blob/​master/​api/​ccx_to_python_g608.py#​L17|help_string]] has been defined in the return_g608_grid function in the ccx_to_python_g608.py file which describes on the value of mode to be passed to this function to get proper combination of the grids. 
-In the earlier sections it has been stated that the callback function in Python is not passed with the entire caption frame but just one single line from the frame, a particular nomenclature has been devised to make sure that the lines belonging to the same caption frames are identified in the Python interface. The nomenclature is as follows: +  ​* ​In the earlier sections it has been stated that the callback function in Python is not passed with the entire caption frame but just one single line from the frame, a particular nomenclature has been devised to make sure that the lines belonging to the same caption frames are identified in the Python interface. The nomenclature is as follows: 
-For every frame, the first line that is passed to the callback function is the srt_counter which indicates the identifier value of the caption frame that would be extracted next. +      ​* ​For every frame, the [[https://​github.com/​CCExtractor/​ccextractor/​blob/​master/​api/​extractors/​extractor.c#​L88|first line]] that is passed to the callback function is the srt_counter which indicates the identifier value of the caption frame that would be extracted next. 
-Following the srt_counter,​ the next line would contain a conjunction of the start time and end time of the caption frame with respect to the timings when the captions would be visible on the screen. The start_time and end_time would be conjuncted as start_time-<​start time>\t end_time-<​end time>\n and the user needs to process this line to get the timings. This processing in case of getting a srt file as an output has been done in the generate_output_srt function. +      ​* ​Following the srt_counter,​ the next line would contain a conjunction of the [[https://​github.com/​CCExtractor/​ccextractor/​blob/​master/​api/​extractors/​extractor.c#​L96|start time and end time]] of the caption frame with respect to the timings when the captions would be visible on the screen. The start_time and end_time would be conjuncted as start_time-<​start time>\t end_time-<​end time>\n and the user needs to process this line to get the timings. This processing in case of getting a srt file as an output has been done in the [[https://​github.com/​CCExtractor/​ccextractor/​blob/​master/​api/​api_support.py#​L18|generate_output_srt function]]
-After the timings have been sent via the callback function, until the next srt_counter is extracted, the lines containing information about the color, font or text grids of CE-608 samples are passed via the callback ​ function to Python. +      ​* ​After the timings have been sent via the callback function, until the next srt_counter is extracted, the lines containing information about the color, font or text grids of CE-608 samples are passed via the callback ​ function to Python. 
-For processing the grids separately, the color grid could be identified by identifying the presence of color[<​srt_counter value>​]:<​color grid line> in the line obtained from the callback function. Similarly, for the font and text grids, the nomenclatures are font[<​srt_counter value>​]:<​font grid line> and text[<​srt_counter value>​]:<​text grid line> respectively. Processing a grid on the basis of such a nomenclature has been done in the g608_grid_former in the ccx_to_python_g608.py file. +      ​* ​For processing the grids separately, the color grid could be identified by identifying the presence of color[<​srt_counter value>​]:<​color grid line> in the line obtained from the callback function. Similarly, for the font and text grids, the nomenclatures are font[<​srt_counter value>​]:<​font grid line> and text[<​srt_counter value>​]:<​text grid line> respectively. Processing a grid on the basis of such a nomenclature has been done in the g608_grid_former in the ccx_to_python_g608.py file. 
-After the entire caption frame has been sent via the callback function to Python for further processing, when the extraction of present caption frames finishes and CCExtractor shifts to a new frame, then a line containing ***END OF FRAME*** is passed via the callback function from C to Python. The user needs to catch this line in order to get the signal that from the next line onwards a new caption frame would begin. Similar approach has been implemented in the function generate_output_srt in the api_support.py file.+      ​* ​After the entire caption frame has been sent via the callback function to Python for further processing, when the extraction of present caption frames finishes and CCExtractor shifts to a new frame, then a line containing ***END OF FRAME*** is passed via the callback function from C to Python. The user needs to catch this line in order to get the signal that from the next line onwards a new caption frame would begin. ​[[https://​github.com/​CCExtractor/​ccextractor/​blob/​master/​api/​api_support.py#​L28|Similar approach]] has been implemented in the function generate_output_srt in the api_support.py file.
 This is how the entire CE-608 is transmitted to Python and the user needs to follow the nomenclature in order to get the caption frames in Python. This is how the entire CE-608 is transmitted to Python and the user needs to follow the nomenclature in order to get the caption frames in Python.
-However, if the user thinks to modify the nomenclature in accordance with some other nomenclature that suits their use case, then the user can do so by editing the python_extract_g608_grid function in the extractor.c file. In this file, the user needs to find the lines where the function run is called with its first parameter being the callback function that is passed from Python and the second parameter being the line which is to be passed to Python.+  * However, if the user thinks to modify the nomenclature in accordance with some other nomenclature that suits their use case, then the user can do so by editing the [[https://​github.com/​CCExtractor/​ccextractor/​blob/​master/​api/​extractors/​extractor.c#​L3|python_extract_g608_grid]] function in the extractor.c file. In this file, the user needs to find the lines where the function run is called with its first parameter being the callback function that is passed from Python and the second parameter being the line which is to be passed to Python.
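The nomenclature above can be consumed with a small state machine on the Python side. The sketch below is an illustration, not the code of g608_grid_former or generate_output_srt: the class ''FrameAssembler'' is hypothetical, the srt_counter-<value> prefix is an assumption (the text above does not spell out the exact format of the counter line), and the sample grid contents in the usage are made up.

```python
import re
from typing import Any, Callable, Dict

# color[<srt_counter>]:<line>, font[<srt_counter>]:<line>, text[<srt_counter>]:<line>
GRID_LINE = re.compile(r"^(color|font|text)\[(\d+)\]:(.*)$")
# start_time-<start time>\t end_time-<end time>
TIMING_LINE = re.compile(r"start_time-(.*?)\t\s*end_time-(.*)")
END_OF_FRAME = "***END OF FRAME***"
COUNTER_PREFIX = "srt_counter-"  # assumed format of the counter line


def new_frame() -> Dict[str, Any]:
    return {"counter": None, "start": None, "end": None,
            "color": [], "font": [], "text": []}


class FrameAssembler:
    """Collects callback lines and emits one dict per caption frame."""

    def __init__(self, on_frame: Callable[[Dict[str, Any]], None]) -> None:
        self.on_frame = on_frame
        self.frame = new_frame()

    def callback(self, line: str) -> None:
        line = line.rstrip("\n")  # keep trailing spaces: they are grid cells
        if END_OF_FRAME in line:
            # The frame is complete: hand it over and start a fresh one.
            self.on_frame(self.frame)
            self.frame = new_frame()
            return
        if line.startswith(COUNTER_PREFIX):
            self.frame["counter"] = int(line[len(COUNTER_PREFIX):])
            return
        timing = TIMING_LINE.search(line)
        if timing:
            self.frame["start"] = timing.group(1)
            self.frame["end"] = timing.group(2).strip()
            return
        grid = GRID_LINE.match(line)
        if grid:
            # Append the row to the color, font or text grid of this frame.
            self.frame[grid.group(1)].append(grid.group(3))


frames = []
assembler = FrameAssembler(frames.append)
for raw in ["srt_counter-1\n",
            "start_time-00:00:01,000\t end_time-00:00:03,000\n",
            "text[1]:HI  \n",
            "***END OF FRAME***\n"]:
    assembler.callback(raw)
```

Only complete frames reach ''on_frame'', so no buffering is needed on the C side, which is the design goal stated earlier.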
  
-Wrappers for the extension module +==== Wrappers for the extension module ​==== 
-In case of using an API, it is highly desired to set the parameters desired by the user not via command line but as call to built-in functions. This gave rise to the necessity of wrapper functions which can be called to set certain parameters for directing the functioning of the bindings. +  ​* ​In case of using an API, it is highly desired to set the parameters desired by the user not via command line but as call to built-in functions. This gave rise to the necessity of wrapper functions which can be called to set certain parameters for directing the functioning of the bindings. 
-The wrappers have been defined in the wrapper.c file in the api/wrappers/ directory. The user can simply call the wrappers to set some parameters. More wrappers can be defined following the architecture used in wrapper.c. +  * The wrappers have been defined in the [[https://github.com/CCExtractor/ccextractor/blob/master/api/wrappers/wrapper.c|wrapper.c]] file in the api/wrappers/ directory. The user can simply call the wrappers to set some parameters. More wrappers can be defined following the architecture used in wrapper.c.
-The user needs to note that the wrappers can be called anytime in between adding parameters to CCExtractor instance (as done in api_testing.py) and before calling the compile_params function from the CCExtractor module. +  ​* ​The user needs to note that the wrappers can be called anytime in between adding parameters to CCExtractor instance (as done in api_testing.py) and before calling the compile_params function from the CCExtractor module. 
-Another thing to note about the wrapper is that, the my_pythonapi wrapper function is a very important wrapper function. It tells CCExtractor that the call has been made using the Python module and thus the functioning of CCExtractor is altered. Hence, if the user intends to use the Python module the user is always advised to call this wrapper function with its first argument to be the object returned by api_init function from CCExtractor module and second argument being the callback function which would be called by the CCExtractor to pass the extracted caption lines back to Python.+  ​* ​Another thing to note about the wrapper is that, the my_pythonapi wrapper function is a very important wrapper function. It tells CCExtractor that the call has been made using the Python module and thus the functioning of CCExtractor is altered. Hence, if the user intends to use the Python module the user is always advised to call this wrapper function with its first argument to be the object returned by api_init function from CCExtractor module and second argument being the callback function which would be called by the CCExtractor to pass the extracted caption lines back to Python.
  
- Test Script+==== Test Script ​==== 
 +  * Once the Python modules are generated, the user can use them by importing the ccextractor module in Python.
 +  * For testing the output of the bindings, a test script, [[https://github.com/CCExtractor/ccextractor/blob/master/api/api_testing.py|api_testing.py]], is provided. Note that the module at this stage supports generating a subtitle file from CE-608 standard samples only.
 +  * Another testing feature that has been added is that the user can use [[https://github.com/CCExtractor/ccextractor/blob/master/api/recursive_tester.py|recursive_tester.py]] to generate the subtitle files for all the samples in a directory. The only parameter needed to run this script is the location of the samples.
  
-Once the Python modules are generated, the user can use them by importing the ccextractor module in Python. +==== Silent API ====
-For testing the output of the bindings, a test script, api_testing.py, is provided. Note that the module at this stage supports generating a subtitle file from CE-608 standard samples only. +  * The Python bindings have been designed in such a way that the API is silent, both in itself and in terms of output generation. Silent in itself means that the API doesn’t write any output to STDOUT, and the entire output of CCExtractor is silenced when the module is used for extraction of caption frames. This is made possible by passing the parameter -pythonapi internally in api_testing.py using the my_pythonapi() function from the ccextractor module. Internally, -pythonapi makes CCExtractor silence all the output that would otherwise have been generated.
-Another testing feature that has been added is that the user can use recursive_tester.py to generate the subtitle files for all the samples in a directory. The only parameter needed to run this script is the location of the samples. +  * If the user wants to add some print functionality from within CCExtractor, defining the prints with the C printf function could be an option. Note that the user cannot use the mprint function, which CCExtractor otherwise uses for its STDOUT prints, to print from inside the CCExtractor C code when it runs through the extension module, as these prints are silenced via -pythonapi.
  
-Silent API+==== Work status ==== 
 +  * The proposal made by me for this project had a major component of multi-threading, to let CCExtractor’s Python bindings run CCExtractor’s extraction process in multiple threads.
 +  * However, the end goal was modified during the GSOC 2017 coding period. After the Second Phase Evaluation, the main aim was to create a Python extension module for CCExtractor which could process CE-608 video samples, extract the caption information present in them and pass this information to Python for further processing. The module was expected to be silent, and the output generation from the caption information present in the video sample had to be done via Python.
 +  * The present status of the extension module is that it can extract caption information from CE-608 standard video samples and pass the caption information to Python. Further work has also been done to process this caption information to generate an output subtitle (srt) file (the user is advised to check the Completion of comparing_text_font_grids function sub-section under the Future Work section).
  
-The Python bindings have been designed in such a way that the API is silent, both in itself and in terms of output generation. Silent in itself means that the API doesn’t write any output to STDOUT, and the entire output of CCExtractor is silenced when the module is used for extraction of caption frames. This is made possible by passing the parameter -pythonapi internally in api_testing.py using the my_pythonapi() function from the ccextractor module. Internally, -pythonapi makes CCExtractor silence all the output that would otherwise have been generated. +==== Future Work ====
-If the user wants to add some print functionality from within CCExtractor, defining the prints with the C printf function could be an option. Note that the user cannot use the mprint function, which CCExtractor otherwise uses for its STDOUT prints, to print from inside the CCExtractor C code when it runs through the extension module, as these prints are silenced via -pythonapi. +=== Identifying the input format and raising errors if unsupported ===
 +  * CCExtractor does not process any non-video files. Similarly, the processing of non-video files is not supported by the extension module. However, since the API has been designed to be silent, the module doesn’t output any error log stating that the input file is a non-video file and cannot be processed.
 +  * This is a much desired feature and the present version of the CCExtractor extension module lacks it. I would be working on this feature post GSOC 2017, but if any user finds that it has not been added by the time they start contributing to CCExtractor’s extension module, their work on this feature would be highly appreciated.
 +  * For adding this feature, the extension module must be extended to process the return value from CCExtractor, as done in the [[https://github.com/CCExtractor/ccextractor/blob/master/src/ccextractor.c#L71|api_start function]]. When a non-video sample is processed via CCExtractor’s binary, the processing is stopped by raising an ‘Invalid option to CCExtractor Library’ error. However, since the extension module has been designed to be silent, this error message is suppressed. Hence, the user should extend the test scripts to process the return value of the api_start function in the Python extension module according to the constants defined in [[https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ccx_common_common.h|ccx_common_common.h]].
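One possible shape for such return-value handling in a test script is sketched below. Everything in it is an assumption made for illustration: the numeric values and messages are placeholders and must be replaced with the constants actually defined in ccx_common_common.h, and ''check_return_value'' and ''ExtractionError'' are hypothetical names that do not exist in the extension module.

```python
from typing import Dict

# Placeholder values for illustration only; the real constants must be
# copied from ccx_common_common.h.
EXIT_OK = 0
HYPOTHETICAL_MESSAGES: Dict[int, str] = {
    1: "placeholder: input could not be processed",
}


class ExtractionError(RuntimeError):
    """Raised when the extraction call reports a non-success return value."""


def check_return_value(code: int,
                       messages: Dict[int, str] = HYPOTHETICAL_MESSAGES) -> None:
    """Turn a silent non-zero return value into a visible Python exception."""
    if code == EXIT_OK:
        return
    reason = messages.get(code, "unrecognised return value")
    raise ExtractionError(f"extraction failed with code {code}: {reason}")
```

The test script would pass the value returned by the extraction call to ''check_return_value'', so that an unsupported input surfaces as an exception even though the C side stays silent.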
  
-Work status +=== Callback class mechanism === 
-The proposal made by me for this project had a major component of multi-threading, to let CCExtractor’s Python bindings run CCExtractor’s extraction process in multiple threads. +  * The present architecture uses a callback mechanism to pass the extracted caption lines from the caption frames of CE-608 captions to Python for further processing. In the callback mechanism, a callback function is supplied to CCExtractor in C via the my_pythonapi function, which stores the callback function as a PyObject* in the global variable array. However, according to the Python documentation on the C-API, everything in Python is a PyObject, be it a function, a tuple or a class.
-However, the end goal was modified during the GSOC 2017 coding period and, after the Second Phase Evaluation, the main aim was to create a Python extension module for CCExtractor which could process CE-608 video samples, extract the caption information present in them and pass this information to Python for further processing. The module was expected to be silent and the output generation from the caption information present in the video sample has to be done via Python. +  * So, the idea is to replace the present callback function with a class which can have many methods that the user can use for different use cases.
-The present status of the extension module is that the module can extract caption information from CE-608 standard video samples and pass the caption information to Python. Further work has also been done to process this caption information to generate an output subtitle (srt) file (the user is advised to check the ‘Completion of comparing_text_font_grids function’ sub-section under the future work section). +  * An example of such an implementation has been done [[https://github.com/Diptanshu8/ccextractor/blob/callback_class/api/api_testing.py#L27|here]]. The user needs to note that for accessing the Python class in C, some modifications need to be made to the run function defined in ccextractor.c; a sample example of calling a class method named ‘callback’ can be found [[https://github.com/Diptanshu8/ccextractor/blob/callback_class/src/ccextractor.c#L553|here]].
 +  * Also, an important point to note in this case is that the user needs to pass the callback function’s name to the run function in C so that the corresponding callback method of the class passed via my_pythonapi can be called from C. As an example, the callback method’s name has been provided [[https://github.com/Diptanshu8/ccextractor/blob/callback_class/src/ccextractor.c#L562|here]].
 +  * For understanding the exact implementation of this approach, I would recommend that the user study the C-API for Python, as its documentation is quite extensive for every use case.
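To make the callback class idea above more concrete, here is a minimal Python-only sketch. Only the method name ‘callback’ comes from the text; the `CaptionCallbacks` class and the `dispatch` helper are hypothetical and merely imitate, in Python, what the C side does when it looks up a method by name on the stored PyObject*. One C-API call that can perform such a lookup-and-call is `PyObject_CallMethod`; whether the linked branch uses exactly that call is not stated here.

```python
class CaptionCallbacks:
    """Hypothetical callback object handed to the extension module."""

    def __init__(self):
        self.lines = []

    def callback(self, line):
        # Collect each caption line as soon as it is delivered.
        self.lines.append(line)


def dispatch(callback_object, method_name, line):
    """Imitate the C side: look the method up by name, then call it."""
    method = getattr(callback_object, method_name, None)
    if not callable(method):
        raise AttributeError(
            "callback object has no callable method {!r}".format(method_name)
        )
    return method(line)
```

Because the method is resolved by name at call time, the same object can expose several methods for different use cases, and the name passed to the run function selects which one receives the caption lines.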
  
-Future Work +=== Completion ​of comparing_text_font_grids function === 
-Identifying the input format and raising errors if unsupported +  * The Python extension module for CCExtractor is able to pass lines of the caption frames for different grids of CE-608 captions. However, for generating the subtitle file from the caption grids, the text grid needs to be modified according to the color grid as well as the font grid. In CCExtractor, this job is done in the function [[https://github.com/Diptanshu8/ccextractor/blob/callback_class/src/lib_ccx/ccx_encoders_helpers.c#L234|get_decoder_line_encoded]].
-CCExtractor does not process any non-video files. Similarly, the processing of non-video files is not supported by the extension module. However, since the API has been designed to be silent, the module doesn’t output any error log stating that the input file is a non-video file and it cannot be processed. +  * For generation of subtitle files (.srt files) from Python, an equivalent version of get_decoder_line_encoded has been implemented in Python and has been defined as [[https://github.com/CCExtractor/ccextractor/blob/master/api/python_srt_generator.py#L56|comparing_text_font_grids]] in python_srt_generator.py.
-This is a much desired feature and the present version of the CCExtractor extension module lacks this feature. I will be working on this feature after GSOC 2017, but if any user finds that this feature has not been added by the time they start contributing to CCExtractor’s extension module, their work on this feature would be highly appreciated. +  * However, as the user can note, this function is not a complete implementation of the get_decoder_line_encoded function; completing this function’s definition is a matter of future work.
-For adding this feature to extension module, the extension module must be extended to process the return value from CCExtractor as done in the api_start function. When the sample (non-video) ​is processed via CCExtractor’s binary, then the processing is stopped by raising an ‘Invalid option to CCExtractor Library’ error. However, since the extension module has been designed to be silent, this error message is suppressed. Hence, the user should extend the test scripts to process the return value of api_start function in python extension module according to the constants defined in ccx_common_common.h +
-Callback class mechanism +
-The present architecture uses a callback mechanism to pass the extracted caption lines from the caption frames of CE-608 captions to Python for further processing. In the callback mechanism, a callback function is supplied to CCExtractor in C via the my_pythonapi function which stores the callback function as a PyObject* in the global variable array. However, according to Python documentation on C-API, everything in Python is a PyObject; be it a function, a tuple or a class. +
-So, the ideology is to replace the present callback function by a class which can have many methods that the user can use for different use cases. +
-An example of such an implementation has been done here. The user needs to note that for accessing the Python class in C, some modifications need to be done to the run function defined in ccextractor.c and a sample example for calling a class method named ‘callback’ could be found here. +
-Also, an important point to be noted in this case is that the user needs to pass the callback function’s name to run function in C so that the corresponding callback method of the class passed via my_pythonapi could be called via C. As an example, the callback method’s name has been provided here. +
-For understanding the exact implementation ​of this approach, I would recommend the user to understand C-API for Python as the documentation is quite extensive to every use case.+
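For the ‘Completion of comparing_text_font_grids function’ item, the kind of work that remains can be illustrated with a small sketch that merges one text line with its font line. This is not the code from python_srt_generator.py: `apply_font_grid` is a hypothetical helper, the single-letter font codes (R regular, I italics, U underlined, B both) are an assumption about how the font grid is encoded, and the color grid is left out entirely.

```python
# Assumed font codes: R = regular, I = italics, U = underlined, B = both.
OPEN_TAGS = {"R": "", "I": "<i>", "U": "<u>", "B": "<u><i>"}
CLOSE_TAGS = {"R": "", "I": "</i>", "U": "</u>", "B": "</i></u>"}


def apply_font_grid(text_line, font_line):
    """Wrap runs of styled characters in SRT-style tags."""
    out = []
    current = "R"
    for char, font in zip(text_line, font_line):
        if font not in OPEN_TAGS:
            font = "R"  # treat unknown codes as regular text
        if font != current:
            # Close the previous run before opening the next one.
            out.append(CLOSE_TAGS[current])
            out.append(OPEN_TAGS[font])
            current = font
        out.append(char)
    out.append(CLOSE_TAGS[current])
    return "".join(out)
```

For example, `apply_font_grid("Hello world", "RRRRRRIIIII")` gives `Hello <i>world</i>`. A complete equivalent of get_decoder_line_encoded would also have to consult the color grid and emit the corresponding color tags.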
  
-Completion of comparing_text_font_grids function +=== Adding more wrapper functions === 
-The Python extension module for CCExtractor is able to pass lines of the caption frames for different grids of CE-608 captions. However, for generating the subtitle file from the caption grids, the text grid needs to be modified according to the color grid as well as font grid. In CCExtractor, this job is done at the function, get_decoder_line_encoded. +  * As described in the ‘Wrappers for the extension module’ section, more wrapper functions need to be declared in the [[https://github.com/CCExtractor/ccextractor/blob/master/api/wrappers/wrapper.c|wrapper.c]] file. For example, a few wrappers have been defined. More wrapper functions can be defined in a similar manner.
-For generation of subtitle files (.srt files) from Python, an equivalent version of get_decoder_line_encoded has been implemented in Python and has been defined as comparing_text_font_grids in python_srt_generator.py. +
-However, as the user can note that this function is not a complete implementation of get_decoder_line_encoded function, completion of this function’s definition is matter of future work.+
  
-Adding more wrapper functions +=== Extending the module to support other caption formats ​=== 
-As described in the ‘Wrappers for the extension module’ section, more wrapper functions are needed to be declared in the wrapper.c file. For example, few wrappers have been defined. More wrapper functions can be defined in a similar manner. +  ​* ​In this version, CCExtractor’s extension module supports processing of video samples having CE-608 standard captions in them and writing these captions to output subtitle (.srt) files. 
-Extending the module to support other caption formats +  ​* ​However, CCExtractor in itself has support for other caption standards like DVB, 708 etc. So, extension of module to extract of caption information from samples containing the caption information in these formats is a future task. 
-In this version, CCExtractor’s extension module supports processing of video samples having CE-608 standard captions in them and writing these captions to output subtitle (.srt) files. +  * However, CCExtractor itself has support for other caption standards like DVB, 708 etc. So, extending the module to extract caption information from samples containing captions in these formats is a future task.
-However, CCExtractor itself has support for other caption standards like DVB, 708 etc. So, extending the module to extract caption information from samples containing captions in these formats is a future task. +  * The user should note that the information passed from CE-608 to Python is in raw form, as lines which are then used to form the 608 grids. Similarly, the extension to other formats must consider passing the raw caption information in the respective format and then processing the information extracted by CCExtractor in Python.
-The user should note that the information passed from CE-608 to Python is in raw form, as lines which are then used to form the 608 grids. Similarly, the extension to other formats must consider passing the raw caption information in the respective format and then processing the information extracted by CCExtractor in Python. +  * While extending, the architecture to be followed for ccx_encoders_python should be consistent with the other encoders in the codebase to maintain uniformity. Thus, for DVB samples a function named pass_cc_bitmap_to_python and for 708 samples a function named pass_cc_subtitle_to_python need to be declared in ccx_encoders_python.c.
-While extending, the architecture to be followed for ccx_encoders_python should be consistent with the other encoders in the codebase to maintain uniformity. Thus, for DVB samples a function named pass_cc_bitmap_to_python and for 708 samples a function named pass_cc_subtitle_to_python need to be declared in ccx_encoders_python.c.
  
  • Last modified: 2017/09/04 17:22
  • by skrill