
Activity Extractor Technical Documentation

This page describes how the service modules are implemented and how to add support for a new service.

Module Documentation

Main Module

This module is responsible for processing the parameters passed through the command line and calling the appropriate streaming service.

It passes the streaming service a dictionary containing the credentials and settings required to complete the process.
The dictionary is formatted like this:

parameters = {
  'url': self.url,
  'email': self.email,
  'password': self.password,
  'chrome_args': self.chrome_args,
  'user': self.user
}

url: The URL the driver initially navigates to.
email: The email required to log into the service.
password: The password associated with the email.
chrome_args: Optional arguments that are passed to chromedriver when it is initialized.
user: Only required for Netflix. The name of the profile whose viewing activity should be retrieved.
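
As an illustration, the dictionary above could be assembled from the command line roughly as follows. This is only a sketch: the flag names, the `build_parameters` helper, and the use of `argparse` are assumptions made for this example and may not match the actual Main Module.

```python
import argparse


def build_parameters(argv):
    """Parse command-line arguments into the dictionary handed to a service module.

    All flag names here are hypothetical; only the dictionary keys come from
    the documentation above.
    """
    parser = argparse.ArgumentParser(description="Activity extractor (sketch)")
    parser.add_argument("--url", required=True)
    parser.add_argument("--email", required=True)
    parser.add_argument("--password", required=True)
    # May be repeated, e.g. --chrome-arg=--headless --chrome-arg=--no-sandbox
    parser.add_argument("--chrome-arg", action="append", default=[], dest="chrome_args")
    # Only needed for Netflix, so it is optional here.
    parser.add_argument("--user", default=None)
    args = parser.parse_args(argv)
    return {
        "url": args.url,
        "email": args.email,
        "password": args.password,
        "chrome_args": args.chrome_args,
        "user": args.user,
    }
```

Passing chromedriver arguments in the `--chrome-arg=--headless` form keeps `argparse` from mistaking the value for another option.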


This module retrieves the viewing activity from Hulu.


This function is called from the Main Module. Its main purpose is to initialize the process and call loginHulu().


First, this function creates an instance of Chrome and passes any supplied arguments to the driver.
It then navigates to the configured URL and logs in with the user's credentials.


We first require the page source of the video.
The function createSoupObject() is responsible for this: it fetches the page with the requests module and parses the HTML with the BeautifulSoup library. The getTitle() function returns the title of the video, which is also used to name the output file.

<title>VIDEO NAME - YouTube</title>
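
A minimal sketch of the title extraction is shown here. It uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra dependencies. The `TitleParser` class and the removal of the trailing " - YouTube" suffix are assumptions for this example, not necessarily what getTitle does.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def get_title(page_source):
    """Return the video name, dropping a trailing ' - YouTube' if present."""
    parser = TitleParser()
    parser.feed(page_source)
    title = parser.title.strip()
    suffix = " - YouTube"
    if title.endswith(suffix):
        title = title[: -len(suffix)]
    return title
```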

The function getRawSubtitleLink() returns the raw link, which is in encoded form and is still an incomplete URL. The variable UglyString contains the complete URL. The link is present in the parsed page (the BeautifulSoup object). We then prompt the user to choose a language from the available choices, which are extracted from UglyString.
Based on the chosen language, the corresponding language code is looked up in the language dictionary and appended to the decoded link.
This final URL returns the subtitles as an XML file, which is then converted to an .srt file using BeautifulSoup function calls.
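
The XML-to-.srt step can be sketched as below. The sketch assumes the XML consists of `<text start="..." dur="...">` cues with times in seconds, and it uses `xml.etree.ElementTree` rather than BeautifulSoup. The function names are illustrative only.

```python
import html
import xml.etree.ElementTree as ET


def srt_timestamp(seconds):
    """Format a number of seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def xml_to_srt(xml_text):
    """Convert <text start=".." dur="..">..</text> cues into SRT blocks."""
    root = ET.fromstring(xml_text)
    blocks = []
    for index, cue in enumerate(root.iter("text"), start=1):
        start = float(cue.get("start", "0"))
        end = start + float(cue.get("dur", "0"))
        # Assumption: cue text may be HTML-escaped a second time, so unescape once more.
        text = html.unescape(cue.text or "")
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)
```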


The subtitle URL for Amazon is present in this URL -


		"asin"                              : "" ,
		"consumptionType"                   : "Streaming" ,
		"desiredResources"                  : "SubtitleUrls" ,
		"deviceID"                          : "b63345bc3fccf7275dcad0cf7f683a8f" ,
		"deviceTypeID"                      : "AOAGZA014O5RE" ,
		"firmware"                          : "1" ,
		"marketplaceID"                     : "ATVPDKIKX0DER" ,
		"resourceUsage"                     : "ImmediateConsumption" ,
		"videoMaterialType"                 : "Feature" ,
		"operatingSystemName"               : "Linux" ,
		"customerID"                        : "" ,
		"token"                             : "" ,
		"deviceDrmOverride"                 : "CENC" ,
		"deviceStreamingTechnologyOverride" : "DASH" ,
		"deviceProtocolOverride"            : "Https" ,
		"deviceBitrateAdaptationsOverride"  : "CVBR,CBR" ,
		"titleDecorationScheme"             : "primary-content"

The primary parameters we need to fill in are the ASIN, customerID and token. The customerID and token are obtained from the config file.
The config file is generated from the file. The file takes the user's login and password and generates the config file. The ASIN is taken directly from the video URL.

Now, add the parameters to the dictionary and generate the final URL. The final URL will look something like this -,CBR&operatingSystemName=Linux&deviceProtocolOverride=Https&deviceID=b63345bc3fccf7275dcad0cf7f683a8f&deviceStreamingTechnologyOverride=DASH&asin=B0141BACGU&desiredResources=SubtitleUrls&customerID=A1234GH2343&deviceDrmOverride=CENC
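
A sketch of this step, using `urllib.parse.urlencode` to turn the parameter dictionary into a query string. The base URL is passed in by the caller because the real endpoint is not reproduced here. The function name and the sample values are hypothetical.

```python
from urllib.parse import urlencode

# Fixed values copied from the parameter list above; the three empty
# entries are filled in per request.
DEFAULT_PARAMETERS = {
    "asin": "",
    "consumptionType": "Streaming",
    "desiredResources": "SubtitleUrls",
    "deviceID": "b63345bc3fccf7275dcad0cf7f683a8f",
    "deviceTypeID": "AOAGZA014O5RE",
    "firmware": "1",
    "marketplaceID": "ATVPDKIKX0DER",
    "resourceUsage": "ImmediateConsumption",
    "videoMaterialType": "Feature",
    "operatingSystemName": "Linux",
    "customerID": "",
    "token": "",
    "deviceDrmOverride": "CENC",
    "deviceStreamingTechnologyOverride": "DASH",
    "deviceProtocolOverride": "Https",
    "deviceBitrateAdaptationsOverride": "CVBR,CBR",
    "titleDecorationScheme": "primary-content",
}


def build_request_url(base_url, asin, customer_id, token):
    """Return base_url followed by the encoded query string."""
    params = dict(DEFAULT_PARAMETERS, asin=asin, customerID=customer_id, token=token)
    # safe="," leaves the comma in "CVBR,CBR" unescaped, as in the sample URL.
    return base_url + "?" + urlencode(params, safe=",")
```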

This is where the subtitle URL is found. The URL returns a JSON response that contains a subtitle URL in .dfxp format. We request that subtitle URL and download the subtitles.
With BeautifulSoup and Python regex we convert this dfxp to .srt format. (File -
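
The conversion idea can be sketched as follows, under two assumptions: the cues are `<p begin="..." end="...">` elements, and the times use the HH:MM:SS.mmm clock format. The sketch uses `xml.etree.ElementTree` in place of BeautifulSoup and regex, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def cue_text(element):
    """Flatten a cue to plain text, turning <br/> into line breaks."""
    parts = [element.text or ""]
    for child in element:
        parts.append("\n" if local_name(child.tag) == "br" else cue_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def dfxp_to_srt(dfxp_text):
    """Convert every <p> cue in a DFXP document into an SRT block."""
    root = ET.fromstring(dfxp_text)
    cues = (el for el in root.iter() if local_name(el.tag) == "p")
    blocks = []
    for index, cue in enumerate(cues, start=1):
        # SRT uses a comma before the milliseconds; DFXP clock times use a dot.
        begin = cue.get("begin", "").replace(".", ",")
        end = cue.get("end", "").replace(".", ",")
        blocks.append(f"{index}\n{begin} --> {end}\n{cue_text(cue).strip()}\n")
    return "\n".join(blocks)
```

Files that express times as ticks or frames would need an extra conversion step that this sketch does not cover.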


We first need to extract the episode ID from the URL. Sample URL -

The episode ID is p03rkqcv.
The episode PID and the episode title (used for naming the file) are present in the URL -<episode_id>.xml

The subtitle URL is present in the following link -<pid>

The PID is the episode PID obtained above. Several PIDs are present, so we try each URL in turn until a page request succeeds.
When a request succeeds, we get the subtitle link by parsing the XML page with BeautifulSoup. The subtitles obtained are in XML format. They are converted to .srt using BeautifulSoup function calls and regex. The conversion takes place in the file
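
The "try every PID until one request succeeds" step could look like the sketch below. It uses the standard library's `urllib` instead of the requests module, and the URL template is supplied by the caller. The helper names `fetch` and `first_successful` are hypothetical.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, TimeoutError):
        return None


def first_successful(pids, url_template, fetch=fetch):
    """Try each PID in order and return (pid, body) for the first that loads.

    url_template must contain a '{pid}' placeholder.
    Returns (None, None) if every request fails.
    """
    for pid in pids:
        body = fetch(url_template.format(pid=pid))
        if body is not None:
            return pid, body
    return None, None
```

Passing the fetch function as a parameter makes the loop easy to exercise without network access.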


This is one way to get the subtitle IDs. In the page source parsed by BeautifulSoup, every video has the following element.

		  <span class="showmedia-subtitle-text">
		    <img src=""/> 
		    <a href="/naruto-shippuden/episode-464-ninshu-the-ninja-creed-696237?ssid=206027" title="English (US)">English (US)</a>,
		    <img src=""/> 
		    <a href="/naruto-shippuden/episode-464-ninshu-the-ninja-creed-696237?ssid=206015" title="العربية">العربية</a>,
		    <img src=""/> 
		    <a href="/naruto-shippuden/episode-464-ninshu-the-ninja-creed-696237?ssid=206733" title="Italiano">Italiano</a>, 
		    <img src=""/>
		    <a href="/naruto-shippuden/episode-464-ninshu-the-ninja-creed-696237?ssid=206033" title="Deutsch">Deutsch</a>

We need to obtain all the SSIDs. We return them as a list, each paired with its language title.
For the above HTML we should have this - [['206027', 'English (US)'], ['206015', 'العربية'], ['206733', 'Italiano'], ['206033', 'Deutsch']]
We prompt the user to choose a language and, based on the choice, append the corresponding ID from the list obtained above to the base URL. A sample subtitle URL, where a script_id (206027) has been appended to the base URL:
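
A minimal sketch of this extraction is shown here. It uses the standard library's `html.parser` in place of BeautifulSoup; the class and function names are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit


class SubtitleLinkParser(HTMLParser):
    """Collect [ssid, language] pairs from links inside the subtitle <span>."""

    def __init__(self):
        super().__init__()
        self._depth = 0  # > 0 while inside span.showmedia-subtitle-text
        self.subtitles = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span":
            classes = (attrs.get("class") or "").split()
            if self._depth or "showmedia-subtitle-text" in classes:
                self._depth += 1
        elif tag == "a" and self._depth:
            query = parse_qs(urlsplit(attrs.get("href") or "").query)
            if "ssid" in query:
                self.subtitles.append([query["ssid"][0], attrs.get("title") or ""])

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1


def get_subtitle_ids(page_source):
    """Return a list of [ssid, language title] pairs found in the page."""
    parser = SubtitleLinkParser()
    parser.feed(page_source)
    return parser.subtitles
```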

The encrypted subtitles are extracted from the above URL. The decryption code for these subtitles has been taken from another open source project: youtube-dl.


The user needs to enter their Netflix username and password in the userconfig.ini file. Netflix requires a login to download the subtitles.

We use Selenium (via its Python bindings) to automate the browser. The first step is to log in to Netflix with the information from the config file. Chrome WebDriver is used as the driver for Selenium.
After a successful login from the Selenium browser, we request the video URL.
The Network tab in Chrome lists the resources fetched from the server. To get the same list from the driver, we use the command:

return window.performance.getEntries();

This command returns all the fetched URLs. It was observed that the subtitle URLs of all Netflix videos share one unique sub-string: /?o
So we search for /?o and let the browser keep fetching resources until we find such a URL. If we do not find the URL before the timeout, we exit the application. If such a URL is found, we save it and follow the standard procedure.
We request the URL using the requests module and save the file.
The module is used to convert XML to .srt format.
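
The polling loop can be sketched as follows. The function accepts any object that provides Selenium's `execute_script` method, and it assumes each returned entry behaves like a dictionary with a `name` key holding the resource URL. The function name, the timeout, and the polling interval are illustrative choices.

```python
import time


def find_resource_url(driver, marker="/?o", timeout=30.0, poll_interval=1.0):
    """Poll the browser's performance entries until a URL contains `marker`.

    Returns the matching URL, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        entries = driver.execute_script("return window.performance.getEntries();")
        for entry in entries or []:
            name = entry.get("name", "")
            if marker in name:
                return name
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

A caller would exit the application when the function returns None, and otherwise download the returned URL.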


We first require the page source of the video.
The function createSoupObject() is responsible for this: it fetches the page with the requests module and parses the HTML with the BeautifulSoup library.

The video URL follows a specific standard throughout. 

We need to split and return “684171331973”. This is the required contentID.

This is the alternative method to obtain the contentID. In the soup text there is a meta tag which also contains the video URL. This is helpful in case the user inputs a shortened URL.

 <meta content="" property="og:url"/> 

As stated above, we split the URL and return the required contentID, 684171331973. The other parameters required for obtaining the subtitle URL are also present in the HTML page source.

The required script content looks like this:

		jQuery.extend(Drupal.settings, {"":...............}); 
  • We copy everything from the first “{” onwards into a new string.
  • We remove the trailing ');' (the closing parenthesis and the semicolon) so that the string is valid JSON.

The JSON has a standard format and the required parameters always use the same names. The JSON content:

"foxAdobePassProvider": {......,"videoGUID":"2AYB18"}}

We use the json module to parse the JSON and extract the parameters showid, showname and videoGUID.
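
A sketch of these two steps is shown here. Because the exact nesting of the settings object is not shown above, the hypothetical `find_key` helper searches the whole structure for a key instead of assuming where it lives. The function names are made up for this example.

```python
import json


def extract_settings(script_text):
    """Return the object passed to jQuery.extend(Drupal.settings, {...});

    Keeps everything from the first '{' to the last '}', which drops the
    leading call and the trailing ');'.
    """
    start = script_text.index("{")
    end = script_text.rindex("}")
    return json.loads(script_text[start : end + 1])


def find_key(node, key):
    """Depth-first search for `key` in nested dicts and lists; None if absent."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None
```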

Sample Subtitle Links -

The standard followed is -[showid]/[showid]/showname_videoGUID_contentID.dfxp

Some subtitle URLs follow this standard instead -[showid]/showname_videoGUID.dfxp[showid]/

So we build both variants of the URL and request each of them, saving the subtitle file from whichever request succeeds.

General rules

Each service has its own way of fetching subtitles from the server. The method for a given service can usually be worked out with the following steps -

  • The easiest way to start is to open the Developer Tools in Chrome or Firefox and check the XHR requests. The subtitle URLs can generally be found here.
  • The next step is to find a general pattern in the subtitle URLs of that particular service.
  • If a pattern is found, we can most likely request the subtitle page by building the URLs from the required parameters.
  • Generally, the parameters can be found in the HTML page source. We need to search for them and then query the URL.
  • Sometimes the required parameters are found in other resources that are fetched as JSON. A quick check of the fetched JSON resources will show whether they are available.
  • For services such as Netflix, the parameters are hashed or obfuscated in a way that is difficult to reverse. In such cases we can use a Selenium browser and search the fetched resources for keywords like .srt, .dfxp, cc, sub
  • By checking multiple videos we can find sub-strings that the subtitle URLs have in common. These common sub-strings (which must be unique to subtitle URLs) can be used to pick out the resources from the Selenium browser.
  • In most cases, the subtitle URL is fetched only if the user is logged in. So we first need to set up the login and then go to the video URL in the WebDriver.
  • The subtitles can then be downloaded from the URLs.
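
The keyword search mentioned in the list above can be sketched as a simple filter over the fetched resource URLs. The function name is hypothetical. Short keywords such as cc and sub will also match unrelated URLs, so the result is only a list of candidates to inspect by hand.

```python
SUBTITLE_KEYWORDS = (".srt", ".dfxp", "cc", "sub")


def candidate_subtitle_urls(urls, keywords=SUBTITLE_KEYWORDS):
    """Return the URLs that contain any of the keywords (case-insensitive)."""
    return [url for url in urls if any(k in url.lower() for k in keywords)]
```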
If you are a developer and want to add support for new services or fix bugs, please feel free to send a pull request or contact me for further assistance.
  • public/codein/activity_extractor_technical_docs.1482104289.txt.gz
  • Last modified: 2016/12/18 23:38
  • by manveer_b