Showing posts with label python3. Show all posts

25 May 2014

Python and cmd.exe on Windows - a world of pain.

As I mentioned a few days ago on Hacker News in a Ruby thread, CPython support for Windows is, overall, extraordinarily good for a runtime with clear Unix roots. That said, sooner or later you'll hit a wall and find yourself cursing Guido & Bill under your breath. Yesterday was one of those times for me.

I'm currently working on a project using Python 3.3.5 on Windows Server 2008 R2, building a program that will talk to Weblogic 10.3 via the Jython-powered WLST interface (it actually does more than that: it also leverages Jython to instantiate several complex Java classes, launch VBS scripts and so on). In order to correctly set up WLST/Jython, I have to launch a batch file which in turn calls several other batch files to set up all sorts of environment variables. These are all pure-DOS batch files doing very little except creating or reading environment variables, but they're nested two or three levels deep from the entry-point batch.

For some reason, when I launched this batch with Popen(), variables were not set correctly. In fact, it looked like "call" statements in the batch were just silently ignored. I tried using shell=True, and it made no difference whatsoever. I put it down to some weird cmd.exe behaviour; tried to switch extension from .bat to .cmd and things started to move a bit more (so much for all those posts saying there is no difference between the two) but still some stuff wouldn't work, so I eventually settled for reimplementing the whole batch chain in Python (which is terrible and will likely bite me a year down the line as the version of Weblogic changes, but beggars can't be choosers).
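A batch file launched as a child process can only change its own environment, so whatever it sets is gone once it exits. One widely used workaround is to have cmd.exe run the batch and then dump its environment with "set" in the same session, and to parse that dump in Python. What follows is a minimal sketch under that assumption, not what this project ended up using: the function names are made up for illustration, and a batch chain that prints its own output would add noise lines to the dump.

```python
import subprocess


def parse_env_dump(text):
    """Turn the NAME=value lines printed by cmd.exe's `set` into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:  # skip blank lines and anything without NAME=
            env[name] = value
    return env


def env_after_batch(batch_path):
    """Windows only: run a batch file, then dump the resulting environment.

    `call` runs the (possibly nested) batch chain inside one cmd.exe
    session; `set` then prints every variable that session ended up with.
    """
    dump = subprocess.check_output(
        'call "{}" && set'.format(batch_path),
        shell=True,  # on Windows this goes through cmd.exe /c
        universal_newlines=True,
    )
    return parse_env_dump(dump)
```

The resulting dict could then be passed as the env argument of a later Popen() call, so the real work runs with the variables the batch chain produced.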

The most frustrating thing, however, was that opening an interactive pipe to test and do exploratory programming was just too difficult. There are a lot of examples out there talking about .send(), .communicate() and stdin=subprocess.PIPE, but nobody seems to mention what I experienced: as soon as you call communicate() on a cmd.exe launched with Popen(), all pipes are closed and there is no obvious way to reopen them. I don't think this is due to cmd.exe, because the last output I got was always ">More ?"; I think this is just CPython being too eager to clean up.
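For what it's worth, the subprocess documentation describes communicate() as a one-shot call: it sends the input, closes stdin and waits for the process to exit, so the closed pipes are by design. An interactive session has to write to and read from the pipe objects directly. The sketch below shows that pattern with a small Python child standing in for cmd.exe (an assumption, made so the example runs anywhere). cmd.exe itself is harder to drive this way, because its prompt is not terminated by a newline and readline() can block waiting for one.

```python
import subprocess
import sys

# Stand-in child process: echoes each input line back in upper case.
# (Assumption for portability; the child in the post was cmd.exe.)
CHILD_CODE = (
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)


def open_session():
    """Start the child with both pipes kept open for repeated use."""
    return subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode, so str in and str out
        bufsize=1,                # line-buffered pipes
    )


def ask(proc, text):
    """Send one line and read one line back, without closing any pipe."""
    proc.stdin.write(text + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().rstrip("\n")


def close_session(proc):
    """Closing stdin signals end-of-input; then wait for the child to exit."""
    proc.stdin.close()
    code = proc.wait()
    proc.stdout.close()
    return code
```

The key point is that ask() can be called as many times as needed; nothing is closed until close_session().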

Luckily, I found a solution in WinPexpect, a fork of Pexpect that actually deals with Windows weirdness. Processes launched with winpexpect.winspawn() actually keep their stdin pipes up long enough for me to figure out enough stuff to fully re-implement the batch chain.

The result is a Python script three times as long as the original batch and likely to break the first time Oracle changes a line here or there. It will do for now, but the experience left a sour taste in my mouth, so to speak; cmd.exe is a crappy shell, but it shouldn't be that hard to open a long-running prompt-like process piping stuff to it. If I'm missing an obviously-better solution, please let me know and I'll happily blog about it, because clearly Google and DuckDuckGo need to know about it.

17 December 2012

Why I've never really liked the Facebook API

The other day, I got an email from Virgin Media stating that my connection had been "upgraded to 100Mb/s". I went to a bunch of speed-testing websites, and reported speeds were indeed much higher than in the past. I was tempted to brag about it on Facebook, but then I remembered that, last time I did something similar, I was humbled by a bunch of Dutch friends with "big pipes". I wondered what sort of speed they reported then, so I went to Facebook to search for that old status. And that's where my problems started.

The standard FB search UI failed to return anything even vaguely related, as usual. So I started googling for apps that would allow me to search my previous posts, and found a few which just wanted to gather all my personal data (on FB -- you don't say!). Then I found that you can actually request a complete download of all your data from FB (under Settings) and launched the request, but it looked like it would take a long time (for the record, I finally got it about 24 hours later). So I thought "hey, surely I can work with the FB API". How naive of me!

There is, in fact, a straightforward API call to get your statuses: /me/statuses. By default, it will return some 25 records, and links to further paginated results. Except pagination is ridiculously buggy: after the first 50 records, it will just return a blank page. If you try to use the limit parameter, it will return a maximum of 100 records per page, and again it will stop after the second page (i.e. max 200 results, which is actually 199 because everybody knows "there are only two hard things in computer science"). Time-based parameters (until, since) didn't seem to work at all. Using wrappers rather than direct calls didn't seem to make any difference. It being very late, I gave up and went to sleep.

A day later, still incredulous and obviously fairly frustrated, I googled harder and finally found a relevant question on StackOverflow, which pointed to a Facebook bug about pagination. As the bug says, you can work around the problem by always using offset rather than relying on 'next' and 'previous' links returned in JSON responses. I verified and that's actually the case. By now, my export was available for download anyway. You can imagine how happy I am (not).
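The offset workaround can be sketched as a small generator. This is only an illustration: fetch_page is a hypothetical callable standing in for whatever performs the actual GET on /me/statuses with limit and offset parameters and returns the decoded JSON, and the only thing assumed about the response is that records sit under a "data" key.

```python
def iter_statuses(fetch_page, limit=100):
    """Yield every record by stepping `offset` ourselves.

    The 'next'/'previous' links in the response are deliberately ignored,
    since those are what the pagination bug affects.
    """
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        records = page.get("data", [])
        if not records:
            break  # an empty page is taken to mean there is nothing left
        for record in records:
            yield record
        # Advance by what was actually returned: a page may come back
        # short (e.g. 99 records instead of 100).
        offset += len(records)
```

Advancing by len(records) rather than by limit is a design choice: it avoids skipping a record when a page comes back one short.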

Lessons learnt from this whole debacle:

  • The unofficial facebook-sdk for Python doesn't work with Python 3. There is an experimental fork with very limited testing (i.e. it passes 2to3 and that's it).
  • The json module in the Python 3 Standard Library, as used by facebook-sdk, chokes on Facebook data. Don't even ask me how I found out. Trying with a more up-to-date version from the original upstream doesn't help. There is a Python 3 fork which didn't help either. Juggling between json.load and json.loads didn't seem to help, and I didn't want to rip the guts out of facebook-sdk for fear of dropping compatibility with 2.x (although I cringed at times: using "file" as variable name? Really?). No wonder @kennethreitz rolled his own JSON parser in Requests.
  • facebook-sdk should probably be rewritten from scratch in Python 3 using Requests. Not that I'll ever do it.
  • After so many years and botched revamps, the Facebook API is still terrible. For something reportedly so essential to "2.0" internet infrastructure, and with so many uber-smart people on their payroll, the whole thing still feels incredibly hackish.

19 August 2012

How to compile PyObjC for Python 3 on OSX 10.8 Mountain Lion

Another one for teh Google...

It so happens that I am curious about PyObjC, the Python bindings for Objective-C, which is the "native" language of choice for OSX/iOS.

As usual, my timing is completely wrong: recent versions of Xcode dropped support for PyObjC, and the project has shrunk to basically one person (that Ronald Oussoren I previously mentioned). The version on PyPI seems to work with Python 2.x only. Even the official page on Sourceforge is basically abandoned, and packages available from there are obsolete. This is a problem because I'm really trying hard to do everything with Python 3 these days, and the PyObjC version shipped with OSX 10.8 "Mountain Lion" is for 2.7 (the only Python version Apple ships and supports).

Luckily, from my past tribulations I knew that Ronald had his own repository on BitBucket, so I tried that and it worked fine. However, the documentation on how to build PyObjC from source is quite scarce (in fine geek tradition), and I had to figure out the following principles the hard way:

  • Ronald's repository is split into many separate packages that have to be individually built. This is very fine-grained, but a bit cumbersome for the general case.
  • Do not use the setup.py script you'll find under /pyobjc. It is just for people pulling from PyPI, i.e. post-release.
  • /pyobjc-xcode is obsolete, and there's nothing to build there.
  • /pyobjc-framework-XgridFoundation simply refuses to build under ML. Xgrid is a somewhat obscure, proprietary Apple technology for highly-parallel computation. If you don't know what it is, chances are that you won't need it. I personally don't care about it.
  • /pyobjc-core is a requirement for all other packages, so it should be built first.
  • In order of importance, /pyobjc-framework-Cocoa, -Quartz and -CoreData are dependencies of other packages, so they should be built in this order before any other pyobjc-framework-*.
  • Python 3 support is occasionally shaky. On one occasion, one file had to be patched to remove unicode literals (the u'mystring' notation from Python 2 that was dropped in Python 3.0), but that's just a temporary snag: Python 3.3 will reintroduce that syntax as a compatibility hack for exactly this type of situation. I've submitted a patch anyway, but if you can't wait for Ronald to consider it, it is available in the below-mentioned repository.
  • Looking at BitBucket, I noticed there's at least one significant fork that is arguably targeting Python 3 more consistently. You might want to try that if Ronald's version is not good enough for you.

Because I don't plan to do this sort of work every day, I've put together a script so that I won't have to remember all this stuff when starting a new virtualenv environment. It's now available from my utils repo on BitBucket. There is no documentation but OMG IT'S FULL OF COMMENTS so there. As usual, any feedback is more than welcome.
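For readers who just want the ordering rules from the list above in executable form, here is a rough sketch. It is not the script from the utils repo: the function names are made up, the sample directory names in the usage note are only examples, and running "setup.py install" in each package directory is an assumption about how each package gets built.

```python
import os
import subprocess
import sys

# Directories the list above says to leave alone.
SKIP = {"pyobjc", "pyobjc-xcode", "pyobjc-framework-XgridFoundation"}

# Packages that the others depend on, in the order they must be built.
FIRST = [
    "pyobjc-core",
    "pyobjc-framework-Cocoa",
    "pyobjc-framework-Quartz",
    "pyobjc-framework-CoreData",
]


def build_order(names):
    """Return the package directories in a safe build order."""
    wanted = [n for n in names if n.startswith("pyobjc") and n not in SKIP]
    head = [n for n in FIRST if n in wanted]
    tail = sorted(n for n in wanted if n not in FIRST)
    return head + tail


def build_all(repo_root, python=sys.executable):
    """Build each package in order (assumes `setup.py install` per package)."""
    dirs = [d for d in os.listdir(repo_root)
            if os.path.isdir(os.path.join(repo_root, d))]
    for name in build_order(dirs):
        subprocess.check_call([python, "setup.py", "install"],
                              cwd=os.path.join(repo_root, name))
```

Given a checkout containing, say, pyobjc-framework-WebKit and pyobjc-framework-AddressBook next to the core packages, build_order() puts pyobjc-core, Cocoa, Quartz and CoreData first and the remaining frameworks after them in alphabetical order.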

13 August 2012

How to run py2app on OSX 10.8 Mountain Lion and Python 3.2

This post is for teh Google and all poor souls trying to use py2app on Mountain Lion.

To make it short, the latest official release of py2app does not work with ML and Python 3.2; you have to get the current development snapshot. Unfortunately, py2app requires a number of smaller libraries written by its developer, Ronald Oussoren, and most of them have to be upgraded as well (and before you curse his name: he's single-handedly maintaining py2app, pyObjC and virtualenv-mac; what have you done recently for the community?).

So here's my recipe:

  1. Clone all required repos.
    Oussoren uses Bitbucket, which is better accessed through Mercurial (hg); you can get hg from your favourite package manager (Homebrew/MacPorts/Fink/whatever).
    Then:
    hg clone https://bitbucket.org/ronaldoussoren/altgraph
    hg clone https://bitbucket.org/ronaldoussoren/macholib
    hg clone https://bitbucket.org/ronaldoussoren/py2app
    
  2. Install the packages. Since you're basically tracking trunk, you should probably use the develop mode of setuptools:
    cd altgraph && python setup.py develop && cd ..
    cd macholib && python setup.py develop && cd ..
    cd py2app && python setup.py develop && cd ..
    Note that this means you'll have to keep these "source" folders available forever. If you don't like that, you should create an egg (e.g. python setup.py bdist_egg), then install it (easy_install dist/your-resulting.egg).

    For the record, altgraph will present itself as version 0.10, macholib as 0.7, and py2app as 1.5.
  3. Now you should be able to run your python setup.py py2app

Bonus achievement: if you're using PyQt, this version of py2app will give you Retina-ready packages, by automatically adding the NSPrincipalClass key to the generated Info.plist and setting it to NSApplication. Nice one, Ronald!
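To confirm the key made it into a built bundle, the Info.plist can be read back with the standard plistlib module. This is a small sketch that needs Python 3.4 or newer (plistlib.loads is not available in 3.2), and the bundle path in the usage note is a placeholder rather than something py2app is guaranteed to produce under that name.

```python
import plistlib


def principal_class(plist_bytes):
    """Return the NSPrincipalClass value of an Info.plist, or None if absent."""
    info = plistlib.loads(plist_bytes)
    return info.get("NSPrincipalClass")
```

Typical use would be to open something like dist/YourApp.app/Contents/Info.plist in binary mode, pass the bytes to principal_class() and check that the result is "NSApplication".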

19 March 2012

Simple Python script to clean up HTML produced by Excel

Here's a throwaway Python script to clean up HTML produced by Microsoft Excel 2010. I leave it here just so that I can find it later, or if anybody else has the same problem -- for some reason, I couldn't google an easy solution anywhere. I'm sure this doesn't cover all the corner cases and complex layouts, but it's a starting point showing most of the techniques you'll ever need: tag stripping, attribute stripping (either en-masse or selective), and handling crappy declarations ("<!if" tags).

It's for Python 3 (although I think it'll work almost unmodified in 2.7; you'll just have to replace open() calls with codecs.open()) and requires BeautifulSoup 4+, which really does all the magic. I don't know if it's the power of Py3k or BS getting better and better, but it's gone through a dozen files in a blink.

06 December 2011

State of Python 3

I want to start a little side-project, basically a few lines of HTTP automation and mail sending; nothing that some Taco Bell Programming couldn't handle, but I don't particularly like shell scripts and I figured it'd be cleaner in Python.

It's 2011 and One Should Really Use Python 3, right? The box I'll work on is a brand new Debian Testing where Python 2.x is not even installed, perfect! I'll just grab virtualenv... Waitamminute, "python" doesn't even run, but I'm sure I had installed the python3 package...? Oh, "python3" in Debian doesn't give you /usr/bin/python but rather /usr/bin/python3!
Thank you, Debian Project, for your consistent inconsistency dealing with anything Python-related. Ok, so a quick run of update-alternatives and that's sorted.

Now I'll

easy_install virtualenv
... sorry, easy_install3 virtualenv, of course... oops, syntax error, clearly a Python 2.x package there. Does easy_install discriminate between 2.x and 3.x packages? Er, no. Joy.

Whatever, I'll just grab virtualenv.py and drop it in /usr/local/bin, right? Cool, works a treat. Activated my virtualenv, let's try again to download a couple of libs... paramiko: syntax error, clearly a 2.x package. funkload: syntax error, again a 2.x package. I know, I'll use pip! ... no difference.

Basically I can choose whether to handle libraries like it was 2001, website after website, setup.py after setup.py; or I can develop like it was 2009, i.e. with Python 2.x.

I think for now I'll choose life, and drop Python 3.