Showing posts with label Java. Show all posts
Showing posts with label Java. Show all posts

Friday, 25 September 2009

Download all SMBC Comics

Simple regex-based bruteforce program to save all comics from http://www.smbc-comics.com/. You'll need http://commons.apache.org/io/ and my source code.

Sunday, 26 July 2009

Blagh

1. Blagh:
  • I read the Post "Crying" again and must say that Java is definitely my favourite programming language.
  • I totally lost every knowledge of Python which's a bad thing since I'll have to use it in a exam in a few months.
  • I got attached to Twitter. It's so convenient and you can watch it on the panel of my blog.
  • I read Russell's Introduction to Mathematical Philosophy. Very interesting.

2. Pathfinder:

I had to write a small Java program that implements Dijkstra's shortest path algorithm in order to find the shortest path between two cities. You can download the source code here. You can add nodes and edges to the data.txt and the user interface is console-based.


3. Imageboardsave:

Currently, I'm writing a program which downloads every image and Rapidshare URL in a thread of an imageboard. The program works for AnonIB at the moment and the user can specify how many pages he want to search. The program scans the page for threads and download the content to:

/threadID/picturefilename.something


and the Rapidshare-URLs to:

/threadID/rapidshare.txt


Subthreads, i.e. more than a one-page thread, are considered as well. It even has a graphical user interface programmed in Java Swing. Actually, it runs pretty good right now, but I don't want to release it yet.

Wednesday, 20 May 2009

Language Guesser and OpenNLP Pipe

1. The first program I wrote estimates the language of a document, based on a simple statistical bigram model. It takes 5 arguments from commandline based on following syntax:

java Main trainfile_lang_1 trainfile_lang_2 trainfile_lang_3 trainfile_lang_4 testfile_lang

e.g.

java Main ./europarl/train/de/ep-01-de.train ./europarl/train/en/ep-01-en.train ./europarl/train/es/ep-01-es.train ./europarl/train/fr/ep-01-fr.train ./europarl/test/de/ep-01-de.test

and writes to the system standard output: "Die Sprache ist wahrscheinlich Deutsch." (The language is probably German.)

You see, this is very static and perhaps I'll make it more dynamic in the future. It was originally created to estimate the language of a protocol from the European parliament. You can download it here, but please be aware that the comments are in German.


2. The second program is more interesting. It takes a XML file with tags and everything and writes the sentences to a file in following format:

token TAB tag TAB chunk

e.g. for the sentence "Is this love?"

Is VBZ O
this DT B-NP
love NN O
? . O

It just takes the path to the XML file as commandline argument. To run the program you'll need 2 things. Ant and the OpenNLP models. You can download the program here

/*
* To ensure the functionality of the program, the models EnglishChunk.bin.gz,
* EnglishSD.bin.gz, EnglishTok.bin.gz, tag.bin.gz and tagdict.htm have to be
* in the models directory.
*
* IMPORTANT:
* The models have to be downloaded and placed in the models directory
* otherwise the program won't work. The download links can be found in
* ./model/readme.txt.
*
* Install:
*
* 1. Download the models at OpenNLP: http://opennlp.sourceforge.net/
* 2. Run Ant
* 3. Start the program with java -jar NLPipe.jar XMLfile
* 4. Optionally you can run the program with: ant -Dargs="XMLfile" run
* 5. Optionally you can archive the basedir with: ant tar
*/