The list of accepted organizations has not been published yet. You can of course start working on these ideas at any time, and in fact doing so and making friends in our community will greatly increase your chances of being accepted, assuming we are accepted as an organization.
We are a small org, which means that your contribution is expected to have a very large impact. It's not going to mean a 0.5% improvement on a big project; it's supposed to be more like 10% on a medium-sized one. So please come only if you like the challenge and feel that you are ready to take on serious responsibility.
We have (we think) statistically amazing continuity in the team: most GSoC students from past years are still involved, even if they are no longer eligible as students. They still contribute code, and they mentor in both GSoC and Code-in. So while you don't have to, we really prefer students that consider their GSoC participation the starting point of joining a developer community, not a summer gig.
We have mentors in North America, Europe, Asia and Australia. Time zones are never a problem. We hang out in a Slack channel to which everyone is welcome. If you get accepted we expect to see you there often. Even if you don't need to talk to your mentor, please try to be around when working.
About the projects and getting accepted
Qualification: In order to qualify you need to achieve a minimum of 7 points. You get points:
1) By solving issues in our GitHub issue tracker (CCExtractor) and sample platform issues (1 point per issue by default unless specified in the issue page; double points for issues solved before the accepted orgs are announced). Most issues have an explicit number of points that you can find in a comment.
2) By joining the community in Slack (1 point if you do it after we've been accepted to GSoC, 2 points if you do it before). You can invite yourself here.
3) If you are a former Code-in finalist you start with 1 point. If you were a winner, you start with 2 points. Note that there are just a few developers that meet this, so don't be discouraged if you aren't one of them. Almost no one is, but we'd love to hear from those that are.
4) By sending us a TV sample containing something we don't support. It doesn't have to be from your own country (since hopefully we already support that), but if it is, so much the better. These are probably hard to get, since we already got all the low-hanging fruit. But if your local TV has subtitles you can turn on and off, we'd love a recording.
Getting 7 points doesn't guarantee that you will be accepted, as that also depends on the quality of your proposal (which needs to be good) and the number of slots Google allocates to us.
Students without 7 points will not be accepted, no matter what. If we have more slots than students with the minimum score, we will give those slots back to the pool so other orgs can use them.
It goes without saying that everyone in the community has to be polite and respectful, and consider everyone else a member of a team and not a competitor.
All developers are part of the team, by the way. Our Slack channel has mentors, Code-in participants, other students, developers, and users that are none of the above but that play some kind of role in CCExtractor.
Part of being respectful is giving consideration to everyone else's time. For example, asking things that are answered on the website or in the software's help screen shows little respect. We don't want to seem unfriendly, but asking in the Slack channel something like “isn't there a GUI?”, “how do I run this”, etc., is not a great way to start. This doesn't mean you can't ask questions, but remember that being a clueless user and being a lazy developer are two very different things. If you ask those questions you will probably get an answer as if you were a clueless user (polite no matter what), but if you apply to GSoC you will be considered a lazy developer.
You can propose to do any of the following ideas, or you can bring your own. In any case, make sure you run them by us before you actually submit your proposal.
Important: The first two weeks must be allocated to solve bugs listed in GitHub. Yes, we know it's a chore and that you would rather work immediately on the new great thing. But experience has proven that these two weeks are extremely useful to bond with the rest of the community, get you introduced to the existing code base, and of course the bonus that bugs will actually be fixed. If you really don't want to spend any time on this we will waive this requirement for students with 15 qualification points (see above).
At the very least, your proposal needs to:
- Explain what you want to do, why it is important to you, and why it is important or useful to us.
- Explain how you intend to accomplish the goal, in enough detail to make it clear that you know what you are talking about. For example, “I will modify the CCExtractor binary so that it's able to convert audio to text with perfect accuracy” is the same thing as sending your proposal to the trash. You need to have a plan.
- Detail the timeline, week by week, explaining the deliverables for each week (pay special attention to the milestones within the GSoC timeline itself, of course) and how we should validate the results.
- Detail what kind of support you will need from us. For example, if you are going to need test streams, hardware, access to a server, etc., let us know so we can plan ahead.
- Detail your expected working hours in UTC.
- Detail your planned absences. We don't need you to detail what you will be doing when you are not working, of course, but if you are going away for any reason we need to know, so we don't think you've abandoned the project.
- Link to your GitHub profile, if you have one, so we can take a look at your previous work.
- GSoC is a coding program: This means that ideas that are about testing, website design, etc, are out.
- However, we want to have good documentation: Make sure you have time to write a good technical article explaining your work.
- Be realistic and honest with the timeline. Assume you should work around 40 hours each week. If your timeline reserves a lot of time for minor things, we'll think that you are not going to be working full-time on GSoC. On the other hand, if you promise to do things in a lot less time than seems realistic to us, it will look like you don't really know how much work things take.
- If you are going to use third-party libraries (that's OK), make sure to validate that their license is compatible with GPLv2 (which is ours). List the libraries in your proposal. Check that they are multiplatform. If you will need to extend those libraries in any way, please explain how. In this case, your proposal should include time to get that extension submitted to the maintainers (we love to contribute to other projects).
Something else: Mentors often have their fingers in several pies. If you send the same proposal to several orgs everyone will know. So do yourself a favor and don't do that. You can apply to several organizations and that's totally fine, but each organization will want to see that you have put the time to write a great proposal that is focused on them.
The ideas we currently have
Important: If you have something else in mind that relates to subtitles and accessibility please get in touch. We prefer that you do something that you are passionate about even if it's something we hadn't considered.
Write high speed subtitle synchronization tools
Tool A - sync between two versions of the same footage: This is a very common use case. Suppose you have a raw recording of a TV show, with commercials, etc., and use CCExtractor to get the subtitles from it. Then you remove the commercials and have a really clean recording, but the subtitles are out of sync, since editing the video changed the timing.
The project is to write a tool that takes:
a) The original video
b) The edited video
c) The subtitles for the original video
and produces:
d) The subtitles for the edited video
We recommend you use FFmpeg to do the heavy lifting for the video processing and DejaVu as a reference for the audio fingerprinting, which you will need for the synchronization.
A really important requirement is that this be a fast tool. This means that writing a script that first calls FFmpeg to generate a .wav file and then calls DejaVu to locate each segment will definitely not work (and besides, that's not a Summer of Code task). You need to write a C program that uses the FFmpeg libraries and reimplements the audio fingerprinting in C. This should be “easy”, since for DejaVu you have the source code and an amazing explanation of how everything works, and the FFmpeg libraries have FFT functions, so luckily you don't need to implement those yourself.
You can also come up with a totally different solution that doesn't follow our suggestion as long as it achieves the goal.
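To make the fingerprinting idea concrete, here is a minimal sketch of the DejaVu-style approach (spectrogram peaks hashed in pairs). It's in Python with numpy for brevity; the actual tool would be C using FFmpeg's FFT, and all function names and parameters here are our own illustration, not DejaVu's API:

```python
import numpy as np

def spectrogram(samples, window=1024, hop=512):
    """Magnitude spectrogram via a short-time FFT with a Hann window."""
    win = np.hanning(window)
    frames = [samples[i:i + window] * win
              for i in range(0, len(samples) - window, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def peaks(spec, neighborhood=10):
    """Time/frequency bins that are local maxima of the spectrogram."""
    out = []
    for t in range(spec.shape[0]):
        for f in range(spec.shape[1]):
            t0, f0 = max(0, t - neighborhood), max(0, f - neighborhood)
            region = spec[t0:t + neighborhood + 1, f0:f + neighborhood + 1]
            if spec[t, f] > 0 and spec[t, f] == region.max():
                out.append((t, f))
    return out

def fingerprints(peak_list, fan_out=5):
    """Pair each peak with a few successors; the hash key is the two
    frequencies plus their time delta, stored with the anchor time."""
    hashes = []
    for i, (t1, f1) in enumerate(peak_list):
        for (t2, f2) in peak_list[i + 1:i + 1 + fan_out]:
            hashes.append(((f1, f2, t2 - t1), t1))
    return hashes
```

Matching hash keys between the original and edited audio, then voting on the anchor-time offset, gives the time mapping needed to re-time each subtitle.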
Tool B - Suppose you don't have the original video, but you do have the original subtitles from it, so what you have is:
a) The subtitles for the original video, which contain subtitles for commercials and possibly a few minutes of the previous and following programs.
b) The edited version.
Doing the sync now is more difficult, as you don't have the original audio or video to compare. But you do have the audio for the edited version, from which you can obtain timing for voice. For example, if the subtitles for the original video contain three consecutive cues that last 3.45 seconds, 1.54 seconds and 2.34 seconds respectively, and audio analysis of the edited video finds three voice segments with similar durations, it's likely that they are a match.
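As a toy illustration of that duration-matching heuristic (the function name, window size and tolerance are ours, not a spec), you can slide a window of consecutive cue durations over the detected voice-segment durations and score each alignment:

```python
def best_alignment(cue_durations, voice_durations, window=3, tol=0.5):
    """Find where a run of `window` consecutive subtitle cue durations
    best matches a run of detected voice-segment durations.
    Returns (cue_index, voice_index) of the best match, or None if the
    best total mismatch exceeds `tol` seconds per cue."""
    best, best_score = None, float("inf")
    for i in range(len(cue_durations) - window + 1):
        cues = cue_durations[i:i + window]
        for j in range(len(voice_durations) - window + 1):
            segs = voice_durations[j:j + window]
            score = sum(abs(c - s) for c, s in zip(cues, segs))
            if score < best_score:
                best_score, best = score, (i, j)
    if best is not None and best_score <= tol * window:
        return best
    return None
```

With the durations from the example above, `best_alignment([5.0, 3.45, 1.54, 2.34, 4.0], [3.40, 1.60, 2.30])` matches the run starting at cue 1 to the voice segments starting at 0. A real tool would do this at scale with a proper alignment algorithm rather than brute force.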
1) You cannot use any non-open-source dependency. For example, MATLAB is out, even if the runtime is free.
2) Your program needs to be usable from a script, so it should be command line based. If there's time, you can definitely provide a GUI, but that's secondary to the main program.
3) High speed is really a priority. Prepare to spend time coming up with a good algorithm.
4) While GSoC is about coding, you will have to prepare really good documentation. As an example, check out DejaVu's explanation on how everything works (even if you don't use it at all, use it as a baseline of really good technical documentation).
5) Must be as portable as the libraries you use. For example, FFmpeg builds on Linux, Windows, etc., so if you use FFmpeg then your program must also build on those platforms.
We will provide a fast server on which you can work. You don't have to use it, but keep in mind that video files are generally very large; you will need to deal with files that are several gigabytes in size. If you have the bandwidth, great. Otherwise you can just work remotely on our development server.
Add support for DTMB countries
DTMB is the Chinese TV standard, adopted by other countries such as Cuba. We still don't know much about it. Because of this, your proposal must include:
a) A link to the relevant standard documents. We don't know if they exist in English. If they don't but you speak the language they are in, that's fine. If you locate the documents but they require payment (as is often the case for technical specifications) send us a link to buy and we'll allocate organization funds to purchase them.
b) Some TV samples. Or, if you cannot get them directly, an explanation of how you will get them, for example by purchasing a capture card that is known to be compatible (send us an exact link), plugging it to an antenna or dish, etc, that you have access to (detail), etc.
In short, this is an “adventure” task. We'll go all the way with the student that tries it, but we want to make sure the chances of success are reasonable.
This is what we (think we) know so far:
DTMB regulates the physical transmission standard (signals, frequencies, etc). It seems to be available (for purchase) here.
The reason Cuba is interesting is that their subtitles will have Latin characters, which will make life a lot easier for most team members. Also, the Cuban government has a good website about their TV regulations.
Apparently the subtitles themselves follow the European DVB standard. We can see that in this document from the Hong Kong regulatory body, which says:
Subtitles: Receivers shall include provisions to decode and display subtitles conforming to ETSI EN 300 743.
The Cuban government says the same thing:
That document (in English) says:
The Brand and Model TV Set is intended for the reception of DTMB Digital Terrestrial Television in 6MHz bandwidth, according to the specifications GB 20600-2006.
DVB subtitles (ETSI EN 300 743) – The DUT must support DVB subtitles (ETSI EN 300 743).
Important: Since Chinese is by far the most widespread language among DTMB countries, supporting it is essential. We have some preliminary support for it, but you will need to add whatever is missing. This applies in particular to the .srt writer: since .srt is text based, you need to OCR the bitmaps. This is already done but almost untested for Chinese. Don't assume it's going to work; it probably won't. Give yourself time for this in your proposal.
Automatically detect the most interesting bits of sample videos
Write software that is able to detect, for some kinds of videos, the most interesting bits (highlights).
At a minimum, the following must be detected:
- Goals in soccer (previous work exists; you can build on it or reimplement)
- Three pointers in basketball
- Jokes in sitcoms
Plus any other 5 use cases you want to work on.
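One very crude baseline for cases like goals and jokes (labeled as such: this is a naive sketch with made-up names and thresholds, nowhere near a full solution) is to flag outliers in short-term audio energy, since crowd noise and laughter spike the volume:

```python
import numpy as np

def audio_highlights(samples, rate, window_s=1.0, z=2.5):
    """Flag moments whose short-term audio energy is a statistical
    outlier, a crude proxy for crowd noise after a goal or laughter
    after a joke. Returns timestamps (seconds) of candidate highlights."""
    win = int(window_s * rate)
    n = len(samples) // win
    # Root-mean-square energy of each non-overlapping window.
    rms = np.array([np.sqrt(np.mean(samples[i * win:(i + 1) * win] ** 2))
                    for i in range(n)])
    threshold = rms.mean() + z * rms.std()
    return [i * window_s for i in np.nonzero(rms > threshold)[0]]
```

A serious proposal would combine several signals (audio, scene cuts, on-screen graphics, subtitle text) and, for cases like jokes, likely a learned model.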
A useful reference: Deep Learning of Audio and Language Features for Humor Prediction.
Do word by word subtitle-audio sync
The usual subtitle files, such as .srt, do a line-by-line sync, meaning a subtitle line appears when the person starts talking, stays on screen while they say a few words, then disappears and a new one appears, etc.
00:02:17,440 --> 00:02:20,375
Senator, we're making
our final approach into Coruscant.
In this .srt example, at minute 2, second 17, those two lines of text appear, and they disappear at 2:20.
The task is to tag each individual word as it is being spoken. This implies audio analysis. While in principle it doesn't seem terribly hard (since you just need to distinguish between individual words, for which you at least have an ordered list), keep in mind that sometimes subtitles don't match the audio 100%. For those words that do match, you need to provide a perfect audio-subtitle sync. For those words in the subtitle file that don't appear in the audio (a corner case, yet possible), add some indicator. Finally, for those words in the audio that don't appear in the subtitles, add a different indicator.
Focus on the challenging part of the project, which is the sync itself. You can assume that the subtitle format is always .srt and not deal with additional formats, since conversion tools exist. Similarly, you can assume that the audio is a .wav file and forget about dealing with video formats; FFmpeg can deliver a raw wav from almost any stream, which is more than enough.
The solution needs to work in real time, meaning that it must be possible to pipe the subtitles and audio data into your program and get the word-by-word synced version as it happens. So things like a double pass are out of the question.
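To show the expected output shape, here is a minimal sketch that parses an .srt timestamp and distributes a cue's time span across its words proportionally to word length. This naive distribution is only a placeholder (the function names are ours); the actual project replaces it with real audio analysis:

```python
def parse_srt_time(ts):
    """'00:02:17,440' -> seconds as a float."""
    h, m, rest = ts.split(":")
    s, ms = rest.split(",")
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def naive_word_times(start, end, text):
    """Spread a cue's [start, end] span over its words, proportionally
    to word length. Returns a list of (word, start_s, end_s) tuples,
    which is the shape a real word-level sync would also produce."""
    words = text.split()
    total = sum(len(w) for w in words)
    out, t = [], start
    for w in words:
        dur = (end - start) * len(w) / total
        out.append((w, round(t, 3), round(t + dur, 3)))
        t += dur
    return out
```

Words missing from the audio, or spoken but absent from the subtitles, would carry the extra indicators described above instead of plain timestamps.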
As a suggestion, take a look at this. You don't have to use it (you can if you want), but it's worth checking out for ideas and concepts.
Write Python bindings for CCExtractor
Extend Python to use CCExtractor's library to access subtitles. You should export as much of CCExtractor as possible. At a minimum, it should be able to:
- Open and close input video streams.
- For an open stream, get the list of programs.
- For a selected program, get the subtitles in various easy-to-use structures. You need to provide access to the original representation (for example, a CEA-608 grid for US TV subtitles, or a bitmap for European DVB subtitles) as well as conversions to the usual formats such as .srt.
While CCExtractor itself uses its own library (lib_ccx), we are not aware of any other program using the library directly (as opposed to running CCExtractor and consuming the generated file). This means it's likely you will also need to modify the library itself to make it “sane enough” for this project.
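One possible shape of the Python-facing API, purely as a sketch: every class, method and behavior below is hypothetical (the real binding would wrap lib_ccx, e.g. via ctypes or a C extension), but it illustrates the minimum surface described above:

```python
class SubtitleTrack:
    """One program's subtitles: raw cues plus conversion to .srt.
    Hypothetical API; the real binding would pull cues from lib_ccx."""
    def __init__(self, cues):
        self.cues = cues  # list of (start_seconds, end_seconds, text)

    def to_srt(self):
        def ts(sec):
            ms = int(round(sec * 1000))
            h, rem = divmod(ms, 3600000)
            m, rem = divmod(rem, 60000)
            s, ms = divmod(rem, 1000)
            return "%02d:%02d:%02d,%03d" % (h, m, s, ms)
        blocks = []
        for n, (start, end, text) in enumerate(self.cues, 1):
            blocks.append("%d\n%s --> %s\n%s\n" % (n, ts(start), ts(end), text))
        return "\n".join(blocks)

class Stream:
    """Sketch of a context-manager API around an input video stream."""
    def __init__(self, path):
        self.path = path      # real binding: open the stream via lib_ccx
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        return False          # real binding: close the stream via lib_ccx
    def programs(self):
        return []             # real binding: enumerate programs in the stream
```

The context-manager design maps naturally onto the open/close lifecycle of a stream, which matters here because lib_ccx holds native resources.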
Create an integrated GUI, replacing what we have
The suggested library allows creating amazing GUIs that are portable. We currently have separate GUIs (different binaries that run the command-line main program) for each supported platform. The job is to create a GUI that is part of the main binary and that works on Windows / Linux / OSX.
The library above is a suggestion. It may or may not be exactly what we need, even though it's really promising. It's your job to do the research (this should be part of your proposal, i.e. you need to tell us what library you plan to use), and come up with a good plan.
We will assist with integration and you probably can get away without going too deep into the current code. However, when it comes to the GUI itself, you are in the driver's seat. We haven't used this library ourselves so we're there for moral support and general peer support (when you run into problems we'll look into them together, but we won't have guru-level answers ready).
Also check this out.
Complete 708 support
708 (CEA-708) is the closed-captioning standard for digital TV in the US and a few other countries. We have preliminary support, but the goal is a 100% accurate implementation. This means:
a) Perfect timing.
b) Perfect rendering, limited only by the output format.
c) Full support for all languages for which samples are available.
We will provide hundreds of samples (for which you must complete support, no exceptions) and, if needed, access to a high-speed Linux server for you to work on. These samples are usually very large (gigabytes each), so working locally may not be feasible if you don't have a great internet connection.
This is a high value task and we'd love to have it done, but in order to qualify you need to fix the issues in any of the existing samples. Please don't ask what the issues are. Download the samples and you will see.
Enable automated testing on Windows and other general sample platform improvements
The sample platform has been a good way to run regression tests, but it still lacks Windows support. That support was foreseen but never finished; it should be completed so we can ensure CCExtractor works on both Linux and Windows. Besides that, there are some other things that need finishing. This task thus encompasses:
a) Windows support
b) FTP upload support
c) Improved error detection
d) Other small listed improvements on the issue tracker
A website to view captions in real time
There is a website which allows viewing a caption stream in real time from a web browser. To fetch the CC stream from a TV tuner and send it to the server, it uses a separate application that parses CCExtractor's output. This project has just passed the proof-of-concept stage, so at a minimum you will have to implement the following:
Project Nephos: Cloud based storage for a massive collection of TV recordings
Both CCExtractor and Red Hen (our sister organization, which you should check out) store massive archives of TV recordings. By massive we mean hundreds of terabytes. Until now these archives have been handled in-house, but we're approaching a point at which it's financially more sensible to use cloud storage.
This summer we want to approach the migration to cloud storage, specifically using Google Drive (but your code should allow extending to other services).
Some of the must-have features are easy; for example, when a recording is complete (and exists as a local file) it needs to be moved to the cloud.
Other things will need more work. Specifically:
- Indexing. This is not an “index by date” or other trivial scheme; we index by content.
- Sharing, which needs to be as flexible as possible. In general, everything needs to be automatic, driven by configuration such as “share all Spanish TV content with these American universities”, etc.
- Duplication, meaning that content shared with us from another instance of Nephos can be copied to our own cloud storage.
- Pre- and post-processing, for example to convert the original format to smaller versions, or to extract subtitles.