Last Friday I gave a presentation on Zipf's law, primarily concerned with processing frequency lists and spectra in R with zipfR. The scripts contain two Python programs for extracting frequencies from NLTK's internal Gutenberg selection corpus and from section J of the ACL Anthology corpus. If you don't have access to the ACL corpus, I provide the processed TFL and SPC files for both corpora in the ZIP file.
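For readers unfamiliar with the two file types: a frequency list (TFL) records how often each word type occurs, and a frequency spectrum (SPC) records how many types occur exactly m times. The following is only a rough sketch of those two counts, not the scripts from the ZIP file. It uses a toy token list instead of NLTK or the ACL corpus, the function names are made up for illustration, and it is written for Python 3.

```python
from collections import Counter


def frequency_list(tokens):
    """Return (type, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()


def frequency_spectrum(tokens):
    """Return (m, V_m) pairs: V_m is the number of types occurring exactly m times."""
    type_freqs = Counter(tokens)
    spectrum = Counter(type_freqs.values())
    return sorted(spectrum.items())


# Toy input; a real run would tokenize a corpus instead.
tokens = "the cat saw the dog and the dog saw a bird".split()

for word, freq in frequency_list(tokens):
    print(freq, word)

for m, v_m in frequency_spectrum(tokens):
    print(m, v_m)
```

Summing m * V_m over the spectrum gives back the number of tokens, which is a handy sanity check before loading the data into zipfR.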
Showing posts with label Python.
Sunday, 13 December 2009
Sunday, 11 October 2009
Some NLP-related Python code
1. A program that computes the Flesch score of a text. Code here. I don't know whether the syllables are counted correctly.
2. A program that searches a wordlist for minimal pairs. Code here. Example here. The format of the wordlist is restrictive and the minimal pairs are printed twice!
3. A program that obfuscates the input: the first and last letters of each word stay the same, but everything in between is shuffled. Code here.
4. A program that constructs a tree from a file and searches for the lowest common ancestor of two nodes. Code here. Example here.
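To give an idea of what the obfuscation in item 3 does, here is a minimal sketch. It is not the linked program: the names scramble_word and obfuscate are hypothetical, it assumes that a "word" is a run of ASCII letters, and it is written for Python 3.

```python
import random
import re


def scramble_word(word, rng):
    """Shuffle the inner letters of a word; first and last letter stay put."""
    if len(word) <= 3:
        return word  # nothing to mix around
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def obfuscate(text, rng=None):
    """Scramble every run of ASCII letters; punctuation and spaces are untouched."""
    if rng is None:
        rng = random.Random()
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)


print(obfuscate("Reading scrambled words is surprisingly easy."))
```

Passing a seeded random.Random instance as rng makes the output reproducible, which is useful for testing.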
Labels: Computational Linguistics, Linguistics, Programming, Python
Monday, 28 September 2009
Python-based RPN Evaluator
This program evaluates logic expressions in Reverse Polish Notation (RPN) syntax against the truth values defined in a text file (the "world file").
Example world file:
wind
/sun
/rain
red
wind and red have the value 1; sun and rain have the value 0 because they are prefixed with "/".
Here's the syntax to run the program: "python log.py myworld.world".
The program quits when an empty expression is entered.
Example usage:
C:\Python26>python log.py myworld.world
Logical Expression: rain sun &
0
Logical Expression: sun red |
1
Logical Expression: sun wind ^
True
Logical Expression: winter sun &
*** Error while evaluating: Bad name: 'winter'.
Logical Expression: sun red
*** Error while evaluating: Unbalanced expression: 'sun red'.
Logical Expression: sun red red |
*** Error while evaluating: Unbalanced expression: 'sun red red |'.
Logical Expression:
C:\Python26>
Find the source code here.
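The linked source is not reproduced on this page, so the following is only a minimal sketch of how such a stack-based evaluator could work, not the actual log.py. The names load_world and evaluate are hypothetical, the operators are assumed to be just &, | and ^ as in the session, and the code is written for Python 3. Unlike the session, where ^ printed True, this sketch returns 0 or 1 for every operator.

```python
OPERATORS = {
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
}


def load_world(lines):
    """Map each name to 1, or to 0 when it is prefixed with '/'."""
    world = {}
    for line in lines:
        name = line.strip()
        if not name:
            continue
        if name.startswith("/"):
            world[name[1:]] = 0
        else:
            world[name] = 1
    return world


def evaluate(expression, world):
    """Evaluate an RPN expression with a stack; raise ValueError on bad input."""
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            if len(stack) < 2:
                raise ValueError("Unbalanced expression: %r" % expression)
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        elif token in world:
            stack.append(world[token])
        else:
            raise ValueError("Bad name: %r" % token)
    if len(stack) != 1:
        # operands left over (or nothing at all) means operators are missing
        raise ValueError("Unbalanced expression: %r" % expression)
    return stack[0]


world = load_world(["wind", "/sun", "/rain", "red"])
for expr in ("rain sun &", "sun red |", "sun wind ^"):
    print(expr, "->", evaluate(expr, world))
```

Operands are pushed onto the stack and each operator pops two of them, so an expression is balanced exactly when a single value remains at the end.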
Tuesday, 28 July 2009
These little annoying and surprising things...
...concerning Python as a language with "very clear syntax which emphasizes code readability" (Wikipedia):
A = Annoying == Anti-Python
1. The naming conventions of...:
a) ...package-files/magic members: __init__.py, def __del__, __name__
b) ...visibility modifiers: _protected and __private
So _ and __ in general. Really, why did Guido do this? Is there an explanation? Perhaps it's inherited from another language?
2. The verbose...:
a) ...object inheritance declaration of each class: class Standard(object)
b) ..."self"-reference of each class-value/-constructor/-function just to indicate that it's non-static: def compute(self, number), self.Radius, def __init__(self)
(3. Multiple inheritance: It's no coincidence that most languages don't support multiple inheritance. Normally, you don't need it and it is a trap which makes debugging almost impossible. It is definitely not a feature for a language which emphasizes code readability and clear syntax.)
S = Surprising
(Powerful ability to handle and process strings in general.)
1. Lambda/Anonymous functions: (lambda x, y : x + y)
2. Managed Attributes: property([fget[, fset[, fdel[, doc]]]])
3. Great modification abilities due to magic members/methods and type-emulation.
4. List comprehension, generator expressions and yield.
5. Function decorators. Java has this one too.
6. Localization module. It's neat and easy to localize your programs.
7. Parallel computing module!
8. Awesome network protocol capabilities.
9. Unit tests.
10. The best documentation I've ever seen.
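A few of the "surprising" items are easier to appreciate in code. The snippet below is just an illustrative sketch with made-up names (logged, Circle, squares), written for Python 3. It touches lambdas, managed attributes via property, list comprehensions, generator expressions, yield and function decorators.

```python
import functools


def logged(func):
    """Decorator: record every call's arguments and result on the wrapper."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        wrapper.calls.append((args, result))
        return result
    wrapper.calls = []
    return wrapper


class Circle(object):
    """Managed attribute: radius is validated through a property."""

    def __init__(self, radius):
        self.radius = radius  # goes through the setter below

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("radius must be non-negative")
        self._radius = value


def squares(limit):
    """Generator function: yields one square at a time."""
    for n in range(limit):
        yield n * n


@logged
def add(x, y):
    return x + y


double = lambda x: 2 * x                      # anonymous function
evens = [n for n in range(10) if n % 2 == 0]  # list comprehension
total = sum(n * n for n in range(4))          # generator expression

print(add(2, 3), double(21), evens, total, list(squares(4)))
```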
Wednesday, 18 February 2009
Crying
Vacation! Last semester was very intense, no time for anything. Gosh, and now I'm trying to learn Java... It's far more complicated than I imagined. Python was neat and easy - in comparison - but Java is far more potent, so they say. I don't quite understand why I have to write code that seems superfluous - at least to a beginner like me, e.g.
public class HelloWorld
{
    public static void main(String argv[])
    {
        System.out.println("Hello World!");
    }
}
is equivalent to
print "Hello World!"
in Python.
And next semester I'm going to die for sure... Here are my courses:
Formal Syntax: I'm more the Morphology, Phonology, Semantics type. Actually, Syntax is the only thing that gives me problems.
Java: Well, it's going to be hard, I'd say.
Logic: I like formal logic, really.
Artificial Intelligence: Very, very interesting. Inference, neural networks, genetic algorithms and that stuff - you know...
Acoustic Phonetics: This sounds good: reading spectrograms and getting in touch with VoiceXML.
Labels: About, Academic, Computational Linguistics, Programming, Python
Monday, 22 December 2008
Python Madness
My night at 3 AM: hacking Python code to learn the language. That's the life! Here's what I've done so far:
1. Decimal to binary (dual) conversion:
import sys

def bd(x):
    """Return the binary digits of x as a list, most significant first."""
    n = []
    if x < 0:
        return "Positive integer required"
    elif x == 0:
        return [0]
    else:
        while x > 0:
            n.insert(0, x % 2)  # prepend the current lowest bit
            x = x // 2          # integer division drops that bit
        return n

if __name__ == "__main__":
    try:
        number = int(raw_input("Number: "))
        print bd(number)
    except ValueError:
        sys.stderr.write("Integer required\n")
2. Basic truth tables:
def logicalAnd():
    for valueOne in range(2):
        for valueTwo in range(2):
            print "%d %d %d" % (valueOne, valueTwo, valueOne and valueTwo)

def logicalOr():
    for valueOne in range(2):
        for valueTwo in range(2):
            print "%d %d %d" % (valueOne, valueTwo, valueOne or valueTwo)

def logicalConditional():
    for valueOne in range(2):
        for valueTwo in range(2):
            print "%d %d %d" % (valueOne, valueTwo, not valueOne or valueTwo)

def logicalBiconditional():
    for valueOne in range(2):
        for valueTwo in range(2):
            # compare by value (==); "is" tests object identity
            print "%d %d %d" % (valueOne, valueTwo, valueOne == valueTwo)

if __name__ == "__main__":
    op = raw_input("Connective: ")
    if op == "and":
        logicalAnd()
    elif op == "or":
        logicalOr()
    elif op == "conditional":
        logicalConditional()
    elif op == "biconditional":
        logicalBiconditional()
    else:
        print "Connective not known"
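The four functions differ only in the connective, so the same tables can also be produced from a single lookup table. This is just a sketch of that idea, written for Python 3 rather than the 2.6 used in this post. CONNECTIVES and truth_table are names chosen here for illustration.

```python
from itertools import product

# One entry per connective; each takes two truth values (0 or 1).
CONNECTIVES = {
    "and": lambda p, q: p and q,
    "or": lambda p, q: p or q,
    "conditional": lambda p, q: (not p) or q,
    "biconditional": lambda p, q: p == q,
}


def truth_table(name):
    """Return rows (p, q, result) for the named connective, result as 0 or 1."""
    op = CONNECTIVES[name]
    return [(p, q, int(bool(op(p, q)))) for p, q in product((0, 1), repeat=2)]


for row in truth_table("conditional"):
    print("%d %d %d" % row)
```

Adding another connective then only takes one more dictionary entry instead of a new function and a new elif branch.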
3. ASCII table. First column is the ASCII value, second column is the local interpretation, third column is the raw UTF-8 interpretation, fourth column is the hexadecimal value:
for element in xrange(256):
    print "%s \t %s \t %s \t %s" % (
        element,
        chr(element),
        str(tuple(chr(element))).strip("()'',"),
        chr(element).encode("hex"))
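4. A perhaps overcomplicated binary (dual) to decimal program:
def reverseRange(digits):
    """Return the exponents len(digits)-1 ... 0, one per digit."""
    n = []
    for i in range(len(digits) - 1, -1, -1):
        n.append(i)
    return n

def singleValues(digits):
    """Split the input string into a list of single characters."""
    m = []
    for i in digits:
        m.append(i)
    return m

if __name__ == "__main__":
    digits = raw_input("Number: ")
    rR = reverseRange(digits)
    sV = singleValues(digits)
    dN = 0
    for i in range(len(sV)):
        # each digit contributes digit * 2**position
        dN += int(sV[i]) * 2 ** int(rR[i])
    print dN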
This code works in Python 2.6.1.
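The two conversion programs can be cross-checked against Python's built-ins bin() and int(text, 2). The helper names below are hypothetical, and the sketch is written for Python 3 rather than 2.6.1.

```python
def to_binary_digits(number):
    """Return the binary digits of a non-negative integer, most significant first."""
    if number < 0:
        raise ValueError("non-negative integer required")
    return [int(digit) for digit in bin(number)[2:]]  # strip the '0b' prefix


def from_binary_string(digits):
    """Interpret a string of 0s and 1s as a base-2 number."""
    return int(digits, 2)


print(to_binary_digits(10))
print(from_binary_string("1010"))
```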
