This blog is subject to the DISCLAIMER below.
Showing posts with label Python. Show all posts

Tuesday, May 18, 2010

Python Script to download TED Talks translations :))

Well, I admit it: I am a TED lover. I love the passion and the "new" factor TED brings to me :)

I was watching this presentation by Simon Sinek, and someone I know had downloaded the video but wasn't able to view the translation locally. So I thought I might download the translations for him and convert them to SRT format, which can be displayed in almost any video player. Here is how I did it :)

If you have visited the site before, you will have seen that it has a Flash control where you can view the video and choose a translation if needed.

First, to investigate how the Flash control gets the translation, you have to open a network sniffer, which lets you view all the packets going to and from your network card. A good packet sniffer is Wireshark; if you don't know it, check these links to learn more about how to use it :)
Packet Sniffing using Wireshark Tutorial (Video)
Fifteen Minute Wireshark Tutorial - Wheeler Software


Second, you need to add a filter for "HTTP" requests only (in Wireshark, the display filter http), then navigate to any video (like the one above) to see how the Flash control communicates with the server :)

When you choose a translation, you will find a request to a URL like this: www.ted.com/talks/subtitles/id/848/lang/eng , where 848 is the Talk ID and eng is the chosen language, "English".
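
As a minimal sketch, the URL pattern observed in the capture can be wrapped in a small helper. The name `build_subtitle_url` is hypothetical (it is not part of any TED API), and the endpoint is simply the one seen at the time of writing, so it may no longer respond. The sketch is written to run on Python 3.

```python
def build_subtitle_url(talk_id, language):
    """Build a subtitle URL following the pattern seen in the capture.

    build_subtitle_url is a hypothetical helper name used for illustration.
    """
    return "http://www.ted.com/talks/subtitles/id/%s/lang/%s" % (talk_id, language)


# Talk ID 848 and language code "eng" are the values from the example URL.
print(build_subtitle_url(848, "eng"))
```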

So there are two questions now, given the URL of a TED Talk:

In what format are the translation subtitles returned?
How do I get the Talk ID?


To answer the first question, just click the link above, and you will find the translation is returned in JSON (JavaScript Object Notation) format. So this is good news. We just need to find a good library to handle JSON and convert the result to SRT format, so it can be used in almost any video player, like VLC media player.
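
To make the conversion concrete, here is a small sketch of turning such a JSON response into SRT cues. The sample string is made up for illustration; only its field names (captions, startTime, duration, content) are taken from what the script in this post reads, and the times are assumed to be in milliseconds. The helper names are hypothetical, and the sketch is written to run on Python 3.

```python
import json

# Hypothetical sample shaped like the subtitle response; the values are invented.
SAMPLE = '{"captions": [{"startTime": 1500, "duration": 2250, "content": "Hello"}]}'


def to_srt_time(ms):
    """Convert milliseconds to the SRT timestamp form HH:MM:SS,mmm."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, millis)


def captions_to_srt(json_string):
    """Turn the captions list into numbered SRT cues separated by blank lines."""
    cues = []
    for index, caption in enumerate(json.loads(json_string)["captions"], 1):
        start = caption["startTime"]
        end = start + caption["duration"]
        cues.append("%d\n%s --> %s\n%s\n" % (
            index, to_srt_time(start), to_srt_time(end), caption["content"]))
    return "\n".join(cues)


print(captions_to_srt(SAMPLE))
```

Each SRT cue is a running index, a line with the start and end timestamps joined by " --> ", and the caption text, followed by a blank line.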

And the answer to the second question is to do a simple "View Source" and search for this number. You will find it in several places in the page's source code, so simply parsing the page's HTML should do the job.
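
As a rough sketch of that parsing step, a regular expression can pull out the number that follows `;ti=`, which is the marker the script in this post splits on (it reads `;introDuration=` the same way). The HTML fragment below is invented for illustration, the real markup may differ, and `extract_int` is a hypothetical helper name. The sketch is written to run on Python 3.

```python
import re

# Hypothetical fragment of page source; the real markup may differ.
SAMPLE_HTML = 'flashvars="vw=432&amp;ti=848&amp;introDuration=16500&amp;adDuration=4000"'


def extract_int(page_source, key):
    """Return the integer following ';<key>=' in the page, or None if absent."""
    match = re.search(r";%s=(\d+)" % re.escape(key), page_source)
    return int(match.group(1)) if match else None


print(extract_int(SAMPLE_HTML, "ti"))
print(extract_int(SAMPLE_HTML, "introDuration"))
```

Returning None when the marker is missing makes it easy to report a clear error instead of failing with an index error when the page layout changes.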
Here is the full script to do the job, given the talk's URL on ted.com and the language code, which is eng for English and ara for Arabic. I will try to provide the rest of the language codes later.

This is a simple command to test the script:

python TEDSubtitles.py "http://www.ted.com/talks/simon_sinek_how_great_leaders_inspire_action.html" "eng"

And here is the full source code; I will try to upload it somewhere soon.

Hope it's worth spreading :))

Updates:
You can get the script here .
You need at least Python 2.6 for the json module to be available.

Update on 22/09/2010:
I was informed in the comments by Mohammad that he created a program, http://sourceforge.net/projects/tedgrabber/ , to download TED Talks and subtitles. He also created another one, http://sourceforge.net/projects/timecovergrabbe/ , which allows you to grab Time Magazine covers. This is another example that illustrates the concept, and of course it can be applied everywhere. Good luck to Mohammed.
Although it's hosted on SourceForge, the source code is not available. It would be better if it were shared.

Update on 24/04/2011:
A Google App Engine application providing the same functionality was created at http://tedsubtitles.appspot.com

You can view the source below, but for better colorized viewing check this link.
import os
import sys
import json
import urllib2

# Format time from TED subtitles format (milliseconds) to SRT time format (HH:MM:SS,mmm)
def formatTime ( time ) :
    milliseconds = time % 1000
    seconds = (time // 1000) % 60
    minutes = (time // 60000) % 60
    hours = time // 3600000
    # SRT expects zero-padded fields: two digits each, three for milliseconds
    formatedTime = '%02d:%02d:%02d,%03d' % ( hours , minutes , seconds , milliseconds )
    return formatedTime

# Convert TED Subtitles to SRT Subtitles
def convertTEDSubtitlesToSRTSubtitles ( jsonString , introDuration ) :
    jsonObject = json.loads( jsonString )

    srtContent = ''
    captionIndex = 1

    for caption in jsonObject['captions'] :
        startTime = str ( formatTime ( introDuration + caption['startTime'] ) )
        endTime = str ( formatTime ( introDuration + caption['startTime'] + caption['duration'] ) )

        srtContent += ( str ( captionIndex ) + os.linesep )
        srtContent += ( startTime + ' --> ' + endTime + os.linesep )
        srtContent += ( caption['content'] + os.linesep )
        srtContent += os.linesep

        captionIndex = captionIndex + 1
    return srtContent

# Download the subtitles JSON for a given Talk ID and language code
def getTEDSubtitlesByTalkID ( talkId , language ) :
    tedSubtitleUrl = 'http://www.ted.com/talks/subtitles/id/' + str(talkId) + '/lang/' + language
    req = urllib2.Request(tedSubtitleUrl)
    response = urllib2.urlopen(req)
    result = response.read()
    return ( result )

tedTalkUrl = sys.argv[1]
language = sys.argv[2]

req = urllib2.Request(tedTalkUrl)
response = urllib2.urlopen(req)
result = response.read()

## Get Talk ID value
splits = result.split ( ';ti=' )
talkId = splits[1].split ( '&' )[0]
print talkId

## Get Talk Intro Duration value
splits = result.split ( ';introDuration=' )
talkIntroDuration = splits[1].split ( '&' )[0]
talkIntroDuration = int ( talkIntroDuration )
print talkIntroDuration

jsonString = getTEDSubtitlesByTalkID ( talkId , language )

srtContent = convertTEDSubtitlesToSRTSubtitles ( jsonString , talkIntroDuration )

# Generate SRT file name
splits = tedTalkUrl.split ( '/' )
srtFilename = splits[len ( splits )-1].split ('.')[0]

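# Open in binary mode so the os.linesep line endings are written unchanged
# (text mode on Windows would turn '\r\n' into '\r\r\n').
srtFile = open ( './' + srtFilename + '.srt' , 'wb' )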
srtFile.write ( srtContent.encode ( "utf-8" ) )
srtFile.close ()



Monday, August 25, 2008

Playlist creation Mobile App in Python !

This is a simple program I made for my mobile phone (Nokia N70).

To cut a long story short, I have a 1GB memory card, more than half of which is filled with MP3s, and I wanted a simple program to organize them (a playlist for each folder). At the same time, I wanted to play with something new and unusual (not J2ME), so I wanted to write it in Python :) , and it turned out to be very easy.



  • At first I needed to install the suitable Python runtime for my phone, and found this link (it provides some information about your device, including the development platform). Then I had to download it from here . You just need to install it on your cell phone like any other program.

  • Then I wrote the program in Python and tested it on my Ubuntu PC; here is the code. I forgot to mention that playlist files have a ".m3u" extension and are just text files containing a list of mp3 (or m4a) files, one per line.


    import os
    import sys

    # Test folder
    searchDirectory="/media/sda6/Songs/fdsfsd"

    if not os.path.isdir ( searchDirectory ) :
        print searchDirectory + " is not a directory."
        sys.exit ( 0 )

    m3uFilename = os.path.split ( searchDirectory )[1] + '.m3u'

    # Get all files.
    dirList=os.listdir(searchDirectory)

    fileList = []

    # Get mp3 or m4a files only.
    for fname in dirList:
        if fname.lower().endswith ( '.mp3' ) or fname.lower().endswith ( '.m4a' ) :
            fileList.append ( fname )

    # Begin : Generate the m3u file.
    m3uFile = open( searchDirectory + os.path.normcase('/') + m3uFilename ,'w')

    filesAdded = 0

    for aFile in fileList:
        print 'Adding ' + aFile
        m3uFile.write ( aFile + '\n' )
        filesAdded = filesAdded + 1

    print 'Files Added : ' + str ( filesAdded )

    m3uFile.close()
    # End : Generate the m3u file.

  • Third, I needed a simple control to select the folder visually. After five minutes of googling I found this fileselector control, and I needed to add only two extra lines of code :D , these -->

    import fileselector
    searchDirectory = fileselector.fileselect()

  • Fourth, put them in a folder on your memory card, then from your cell phone install fileselector.py (remember, you got it in the third step) as a Python lib, and install m3ucreator as a script. Then run Python on your phone, select Options --> Run Script, and you will find it as my\m3ucreator.py . When you run it, the file selector should open; after you select your folder, it will report how many files were added to the playlist. The next time you open Nokia Music Manager, you will find it in the list of playlists.

  • Fifth, there is no fifth step :) . I just want to note that there are no code changes from the desktop version except in the folder selection (THIS IS GREAT). Here is the code listing:

    import os
    import sys

    # Begin : Mobile Specific Code.
    import fileselector
    searchDirectory = fileselector.fileselect()
    # End : Mobile Specific Code.

    if not os.path.isdir ( searchDirectory ) :
        print searchDirectory + " is not a directory."
        sys.exit ( 0 )

    m3uFilename = os.path.split ( searchDirectory )[1] + '.m3u'

    # Get all files.
    dirList=os.listdir(searchDirectory)

    fileList = []

    # Get mp3 or m4a files only.
    for fname in dirList:
        if fname.lower().endswith ( '.mp3' ) or fname.lower().endswith ( '.m4a' ) :
            fileList.append ( fname )

    # Begin : Generate the m3u file.
    m3uFile = open( searchDirectory + os.path.normcase('/') + m3uFilename ,'w')

    filesAdded = 0

    for aFile in fileList:
        print 'Adding ' + aFile
        m3uFile.write ( aFile + '\n' )
        filesAdded = filesAdded + 1

    print 'Files Added : ' + str ( filesAdded )

    m3uFile.close()
    # End : Generate the m3u file.

Enjoy it :)
Note that Python code blocks depend on indentation, so if the code is not displayed properly here, restore the indentation before running it.
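
For readers on a current desktop Python, the playlist logic from the steps above can be sketched as a reusable function. This is only a sketch for Python 3, not the script that ran on the phone: `create_playlist` is a hypothetical name, the entries are sorted (the original keeps whatever order os.listdir returns), and the folder name is taken after normalizing the path so a trailing slash does not produce an empty playlist name.

```python
import os
import tempfile


def create_playlist(directory, extensions=(".mp3", ".m4a")):
    """Write <folder name>.m3u inside directory and return its path."""
    names = sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith(extensions)
    )
    folder_name = os.path.basename(os.path.normpath(directory))
    playlist_path = os.path.join(directory, folder_name + ".m3u")
    with open(playlist_path, "w") as playlist:
        for name in names:
            playlist.write(name + "\n")  # one file name per line
    return playlist_path


# Demo in a throwaway directory with empty placeholder files.
demo_dir = tempfile.mkdtemp()
for name in ("b.mp3", "a.M4A", "notes.txt"):
    open(os.path.join(demo_dir, name), "w").close()

playlist_path = create_playlist(demo_dir)
with open(playlist_path) as playlist:
    print(playlist.read())
```

The demo creates two audio placeholders and one text file, so only the two audio names end up in the playlist.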
