Subclassed: python


12 March 2022

Detecting Badger2040 boards and automating uploads

I recently bought a bunch of Pimoroni Badger2040 boards, and they are a lot of fun.

The Badger is basically a small microcontroller board (built around the RP2040, the same chip used in the Raspberry Pi Pico) with an eInk display, the size of a typical office badge. It has a few buttons you can interact with while it's powered but, because of eInk, it doesn't actually need to be powered all the time: you can just set it to the desired screen, switch off the battery, and the screen will stay as it is more or less forever.

The fun bit is that it can run MicroPython, so programming it is a breeze. You don't have to deal with all the scary vagaries of C/C++; just write your Python scripts, save them to the board, and run them. Sweet!

There is already a fairly comprehensive tutorial on how to get started with Badger2040, but (like most Pico-related documentation out there) it assumes you're happy to use Thonny, an editor focused on the MicroPython ecosystem, to move files to the board. With all due respect, Thonny is a very limited editor, and it gets recommended only because it's the most intuitive when it comes to managing files on the Pico. I'm much happier living in my beloved PyCharm, but its MicroPython plugin is somewhat limited and requires manual interaction, so I looked for a way to automate the basic stuff directly from Python on my laptop.

The first step is detecting the board. It appears to the operating system as a serial port, so we have to list the available ports and find the one that looks like our guy.

# badgerutils.py
import sys

import serial.tools.list_ports as list_ports
from serial.tools.list_ports_common import ListPortInfo


def is_badger(port: ListPortInfo) -> bool:
    """ decide if the port looks like a Badger2040 """
    # mac, but other systems will probably be similar,
    # just add other "if" blocks for windows etc
    if sys.platform.startswith('darwin'):
        # you should be more thorough,
        # might want to check VID etc, but this will do for dev
        if port.manufacturer and \
                port.manufacturer.lower().startswith('micropython'):
            return True
    return False


def get_badger():
    """ loop through all the ports and find our board """
    for p in list_ports.comports():
        if is_badger(p):
            return p
    return None  # no board found
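The comments above suggest checking the USB vendor ID for a more thorough, platform-independent test. Here is a minimal sketch, assuming that MicroPython on the RP2040 (which the Badger firmware builds on) enumerates with the Raspberry Pi vendor ID 0x2E8A and product ID 0x0005; looks_like_badger and find_badger are hypothetical names. pyserial's ListPortInfo does expose vid, pid and manufacturer attributes, but verify the actual IDs on your own machine (python -m serial.tools.list_ports -v prints them).

```python
from typing import Optional

# Assumption: MicroPython on RP2040 boards reports these USB IDs.
# Check yours with `python -m serial.tools.list_ports -v`.
RP2040_MICROPYTHON_IDS = {(0x2E8A, 0x0005)}


def looks_like_badger(port) -> bool:
    """Check any pyserial ListPortInfo-like object, on any OS."""
    if (port.vid, port.pid) in RP2040_MICROPYTHON_IDS:
        return True
    # Fall back to the manufacturer string used in the mac-only version.
    manufacturer = (port.manufacturer or "").lower()
    return manufacturer.startswith("micropython")


def find_badger() -> Optional[object]:
    """Return the first serial port that looks like a Badger, or None."""
    import serial.tools.list_ports as list_ports  # requires pyserial

    return next((p for p in list_ports.comports() if looks_like_badger(p)), None)
```

Keeping the manufacturer check as a fallback means the function still works if the firmware reports different IDs.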

The next step is where things get a bit hairy. Interacting over the serial port is not everyone's idea of fun, so we'd better stand on the shoulders of geeky giants if possible. We could dig through Thonny's code, but it's long and complicated and meant to support a lot of scenarios we don't really care about. Instead, we can reuse a little utility called ampy, which is slightly old but fairly robust and (more importantly) self-contained and easy to understand.

Ampy includes a couple of modules to interact with a MicroPython board. You can look at the functions in its cli module to figure out how to wrap them, but here's a simple approach to start pushing files to the board. Some of the code is lifted almost entirely from ampy.cli, but it's MIT-licensed, so you can do that (just mention the original copyright notice somewhere if you publish it!).

# BadgerManager.py
import os
import posixpath
from pathlib import Path, PurePosixPath

from serial.tools.list_ports_common import ListPortInfo
from ampy.files import Files, DirectoryExistsError
from ampy.pyboard import Pyboard


class MyBadger(Pyboard):

    def __init__(self, port: ListPortInfo):
        super().__init__(port.device)
        self.files = Files(self)

    def _mkdir(self, remote_dir: str):
        """ create a directory on the board, ignoring it if it already exists """
        try:
            self.files.mkdir(remote_dir)
        except DirectoryExistsError:
            pass

    def upload(self, file_path, dest_path):
        """ upload file or directory to board """
        file_path = Path(file_path)          # accept str or Path
        dest_path = PurePosixPath(dest_path)  # board paths are always POSIX
        if file_path.is_dir():
            # Directory copy: create the directory and walk all children
            # to copy over the files.
            for parent, child_dirs, child_files in os.walk(file_path):
                # Create board filesystem absolute path to parent directory.
                rel = Path(os.path.relpath(parent, file_path)).as_posix()
                remote_parent = posixpath.normpath(
                    posixpath.join(str(dest_path), rel))
                self._mkdir(remote_parent)
                # Loop through all the files and put them on the board too.
                for filename in child_files:
                    with open(os.path.join(parent, filename), "rb") as infile:
                        remote_filename = posixpath.join(remote_parent, filename)
                        self.files.put(remote_filename, infile.read())
        else:
            # File copy: if a subfolder was specified, each parent has to be
            # created individually, because of ampy limitations
            for d in sorted(dest_path.parents)[1:]:  # first is the root, discard
                self._mkdir(str(d))

            # Put the file on the board.
            with open(file_path, "rb") as infile:
                self.files.put(str(dest_path), infile.read())

    def ls(self, dirname='/', recurse=True):
        """ List files on board """
        return self.files.ls(str(PurePosixPath(dirname)),
                             long_format=False, recursive=recurse)

Putting both things together, we can interact with the board very easily, like this:

from badgerutils import get_badger
from BadgerManager import MyBadger

# Note: in real life, remember to handle error conditions!
port = get_badger()
if port is None:
    raise SystemExit("No Badger2040 found, is it plugged in?")
board = MyBadger(port)
board.upload("./something.txt", "/something.txt")
assert '/something.txt' in board.ls()
board.close()  # release the serial port

Happy hacking!

15 August 2018

How to load initial data and test data in Django 2+

There are two ways to automatically load data in Django:

for data you need while running tests, place XML/JSON/YAML files in yourapp/fixtures (and list them in the fixtures attribute of your test cases).
for data you need while setting up the database from scratch, or at specific points in time, you must create a Migration

This is a bit annoying, because chances are these two locations will get out of sync sooner or later, and it duplicates effort if you do reproducible builds, Docker, and stuff like that.

The solution is to create a migration that actually loads fixtures. So:

Create your fixtures: manage.py dumpdata --output yourapp/fixtures/yourmodel.json yourapp.YourModel
Create an empty Migration: manage.py makemigrations --empty yourapp

Edit the resulting migration (the last file created under yourapp/migrations), making it look like this:

from django.db import migrations

def load_fixtures(apps, schema_editor):
    # This is what will be executed by the migration
    from django.core.management import call_command
    # this is the equivalent of running manage.py loaddata yourmodel.json
    for fixture_name in ['yourmodel']: # add any additional model here
        call_command("loaddata", fixture_name)
    # add other calls if you have multiple models

def rollback(apps, schema_editor):
    # This will be executed if you rollback the migration, so you want to clean up
    for model_name in ["YourModel"]:  # add any additional model here
        model = apps.get_model("yourapp", model_name)
        model.objects.all().delete()

class Migration(migrations.Migration):
    dependencies = [
      # ... don't touch anything here ...
    ]

    operations = [
        migrations.RunPython(load_fixtures, rollback),
    ]

Profit

Note that this does not remove the option to have data that is available only in certain situations: just don't list in the migration the fixtures you don't want there, and vice versa.

16 April 2018

Nespresso Blends - a comprehensive spreadsheet

As a faithful Nespresso user, I was a bit shocked last month when I discovered one of my favourite blends contained Robusta. I had always assumed all blends were 100% Arabica and alas, that was not the case. So I started looking up what is what, but I was quickly overwhelmed: I wasn't going to browse through 24 pages of slow-loading images to read all the blurbs. Enter Python.

As often happens, the situation degenerated quickly. The result is an Excel spreadsheet (also available in Google Docs) listing all attributes of all blends, so that you can filter out what you need. Decaffeinated varieties are highlighted, because that's not real coffee 😁

08 December 2016

Best way to get locale info and localized strings on Python

One area where Windows often beats the Unix tradition is internationalisation. Traditional POSIX interfaces assume there will be One And Only One set of internationalised conventions (display language, date format etc.) at runtime, and are mostly concerned with displaying data formatted for that One True Set. When you want something "international" on POSIX, you switch your locale with setlocale() and do your business. This approach unfortunately percolates into Unix-born tools and languages, in this case Python. There is no "pure-Python" alternative to good ol' GetLocaleInfo, so there is no way to retrieve, say, the French name for January without switching the whole process locale with setlocale(). This is pretty insane and likely dangerous, since setlocale() affects every thread in the process.
Most libraries hack their way around this by packing an arbitrary subset of i18n strings, or just go full-YOLO by switching process locale back and forth. It's a sad state of affairs.
However, there is a better way. The Unicode Consortium, in its infinite wisdom, maintains a big database of localisation metadata, the CLDR (Common Locale Data Repository). You can either download the full set yourself and parse a bunch of XML files, or you can use the Babel library, which basically does it for you.

$> pip install babel
...
$> python
>>> import babel
>>> locale = babel.Locale('fr')
>>> locale.months['format']['wide'][1]
'janvier'

Et voilà.
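To show why this matters, here is a short sketch using Babel's babel.dates helpers, get_month_names() and format_date(). Both take the locale as an explicit argument, so several locales can be used side by side in the same process without ever calling setlocale(). The month_name() wrapper is just an illustrative name, not part of Babel.

```python
from datetime import date

from babel.dates import format_date, get_month_names


def month_name(month: int, locale_id: str) -> str:
    """Full ('wide') month name for a locale, with no process-wide side effects."""
    return get_month_names("wide", locale=locale_id)[month]


if __name__ == "__main__":
    # Three locales in one loop: the process locale is never touched.
    for loc in ("fr", "de", "it"):
        print(loc, month_name(1, loc), format_date(date(2016, 12, 8), "long", locale=loc))
```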

18 June 2016

Python SDK for Azure Basic Tutorial

As Spider-Man would say, from great enterprise comes great complexity. Microsoft cloud services are very, very enterprisey, which means they're also absurdly overcomplicated. One can probably spend most of a 30-day trial simply wandering around their dozens of different "portals" and "account management" screens. So here's a simple tutorial on going from zero to spinning up a VM with the Python SDK. (This is a work in progress, but hopefully it saves you the headaches I got.)

Sign up for an Azure free trial. You'll need a phone and a credit card, because MS requires verification like pr0n sites of yore.
WAIT! DON'T DO ANYTHING! After the signup is successful and you're sent to the dashboard, chances are that your account is not actually fully formed, and you might be getting a lot of prompts about signing up for a Pay As You Go subscription. Wait 10 to 15 minutes. Grab a coffee; check Hacker News; live the enterprise life.
Close your browser and go back to the portal.
Go to your Active Directory.
Create a Global Admin user by clicking on ADD USER (not the giant NEW, that would be too easy!). Write down the temporary password. (Note: I've no idea whether it has to be a global admin, but we're just trying to keep things simple here.)
Now you have to associate the user with your Azure subscription because, although you created it and it's in your AD, it's obviously completely unrelated to your resources. Enterprise life! Go back to the Azure portal and click on Subscriptions. NOTE DOWN YOUR SUBSCRIPTION ID, you'll need it later.
Click on the subscription then Settings
Click on Users (bottom right)
Click on Add, select the Owner role, then add the new user to it. (Note: again, Owner is probably a bit too powerful, but we're trying to keep things simple.) Reference here.
Now open a Private Window in your browser, or sign out of your account, because you have to log on to the same portal as the new user.
After logging on, you'll be forced to change the password. Done? Good; log out, close the window, the web-based ordeal is officially over.

Create and activate a virtualenv (this procedure will differ depending on your platform/setup, reference here):

mkdir azure_test && cd azure_test
python3 -m venv env   # older installs: pyvenv-3.5 env
source env/bin/activate
pip install --upgrade pip   # this is optional but good practice

Install the Azure SDK:
```
pip install --pre azure
```

Launch python and get cracking:

sub_id = 'your-sub-id'  # you should have got this earlier, it's visible in "Subscriptions"
# authentication reference at
# http://azure-sdk-for-python.readthedocs.io/en/latest/resourcemanagementauthentication.html#using-ad-user-password
from azure.common.credentials import UserPassCredentials
credentials = UserPassCredentials('yourADuser@youraccount.onmicrosoft.com', 'yourpassword')
from azure.mgmt.resource.resources import ResourceManagementClient
resource_client = ResourceManagementClient(credentials, sub_id)

# one-off registrations, supposedly you won't need them next time
resource_client.providers.register('Microsoft.Compute')
resource_client.providers.register('Microsoft.Network')
resource_client.providers.register('Microsoft.Storage')

# create the clients
from azure.mgmt.compute import ComputeManagementClient
compute_client = ComputeManagementClient(credentials, sub_id)
from azure.mgmt.network import NetworkManagementClient
network_client = NetworkManagementClient(credentials, sub_id)
from azure.mgmt.storage import StorageManagementClient
storage_client = StorageManagementClient(credentials, sub_id)

Now follow the code to create a VM here, skipping the 4 lines that define resource_client, storage_client etc., because you already have them.

07 November 2015

Link feeds about Weblogic, EPM, Hyperion etc

I've recently started using the excellent Pinboard bookmark manager (a wonderful throwback to the glory days of del.icio.us) to keep track of interesting posts. While I think about the best way to syndicate those links across all my public accounts, you can check out these feeds:

Everything -- https://pinboard.in/u:toyg/ [ RSS ]
Python -- https://pinboard.in/u:toyg/t:python/ [ RSS ]
Weblogic -- https://pinboard.in/u:toyg/t:weblogic/ [ RSS ]
EPM -- https://pinboard.in/u:toyg/t:epm/ [ RSS ]
OSX -- https://pinboard.in/u:toyg/t:osx [ RSS ]

19 July 2014

Oracle ODBC Connection Strings - how I learnt to stop googling and RTFM

I just wasted four hours on the most idiotic thing, so I thought I'd document it here as self-reference.

Background: to connect to some Oracle db, I'm using the excellent pypyodbc module, which is a pure-Python ODBC implementation - basically a not-so-thin layer on top of your installed ODBC providers - that works great with Python 3. If you have to support multiple database vendors (in my case, Oracle, MSSQL, DB2 and maybe others), it makes sense to avoid packing a module for each product and just let ODBC work its magic.

The main problem with ODBC has always been the dark magic involved in crafting connection strings. Each driver provides different options, and when the syntax is not correct, in most cases there is precious little feedback. This is why we have sites like connectionstrings.com.

In my case, the connection string I was using worked fine with TNS names (the stuff in tnsnames.ora) like this:

Driver={Oracle in OraClient11g_home1};DBQ=myTnsServiceName; Uid=myUsername; Pwd=myPassword;

However, I did not want to rely on that particular catalog (which is often misconfigured/broken in the real world), and would rather specify the usual host, port and sid trimurti. So I went on connectionstrings.com and found the following:

Driver={Oracle in OraClient11g_home1}; Server=serverSID; Uid=myUsername; Pwd=myPassword;

... and then I spent four hours figuring out why it wasn't working. I turned on all tracing options, spent ages reading tracing logs, tried umpteen different values for SERVER... all for nought: from logs, it was clear that my SERVER option was completely disregarded and replaced with some default "orcl" values.

Desperate, I eventually thought of braving the (usually unwieldy) original driver documentation from Oracle. And lo, in the FAQ doc for Oracle ODBC, on page 13, I found a very helpful table listing all the options you can specify in a connection string. "SERVER" was nowhere to be seen. Ouch.

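It turns out the trick was to keep using "DBQ" and just replace its value with Oracle's standard Easy Connect syntax, host:port/service (strictly speaking, the last part is a service name, which is often but not always the same as the SID):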

Driver={Oracle in OraClient11g_home1}; DBQ=myserver.mydomain.com:1521/mySid; Uid=myUsername; Pwd=myPassword;
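If you build these strings in code, it helps to keep the host/port/service triple in one place. A minimal sketch with a hypothetical helper, oracle_odbc_conn_str(); the default driver name is the one used in the examples above, so adjust it to whatever your ODBC manager lists. The resulting string is what you would then pass to pypyodbc.connect().

```python
def oracle_odbc_conn_str(host: str, port: int, service: str,
                         user: str, password: str,
                         driver: str = "Oracle in OraClient11g_home1") -> str:
    """Build an Oracle ODBC connection string that bypasses tnsnames.ora.

    Host, port and service all go into DBQ as host:port/service;
    the Oracle driver does not document a SERVER keyword.
    """
    options = {
        "Driver": "{" + driver + "}",  # driver names are wrapped in braces
        "DBQ": f"{host}:{port}/{service}",
        "Uid": user,
        "Pwd": password,
    }
    for key, value in options.items():
        # A bare ';' inside a value would split the string; refuse it
        # rather than attempting ODBC brace-quoting here.
        if key != "Driver" and ";" in value:
            raise ValueError(f"{key} must not contain ';'")
    return "".join(f"{key}={value};" for key, value in options.items())
```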

In the end, I wasted 4 hours because I thought googling would have been faster than Reading The Fine Manual. Lesson learnt.

06 June 2014

Dash docset for Python 2.2.1 (i.e. Jython for Weblogic / Websphere)

I use Dash quite a bit, so I just spent a little bit of time creating a docset from Python 2.2.1 documentation. This old Python version matches the Jython implementation shipped with Oracle WebLogic ("WebLogic Scripting Tool", or WLST) and IBM WebSphere.

To install it in your Dash, just click on this link: dash-feed://https%3A%2F%2Fraw.githubusercontent.com%2Ftoyg%2Fpy221dashdocs%2Fmaster%2Ffeed.xml

The source script is in my GitHub repo, and you can manually download resulting packages on the Release page.

As tempting as it is, the idea of repackaging webapp-specific documentation (e.g. for connect(), startEdit() etc.) is a non-starter, because Oracle and IBM are quite trigger-happy with their copyright lawyers.

25 May 2014

Python and cmd.exe on Windows - a world of pain.

As I mentioned a few days ago on Hacker News in a Ruby thread, CPython support for Windows is, overall, extraordinarily good for a runtime with clear Unix roots. That said, sooner or later you'll hit a wall and find yourself cursing Guido & Bill under your breath. Yesterday was one of those times for me.

I'm currently working on a project using Python 3.3.5 on Windows Server 2008 R2, building a program that will talk to Weblogic 10.3 via the Jython-powered WLST interface (it actually does more than that: it leverages Jython to also instantiate several complex Java classes, launch VBS scripts and so on and so forth). In order to correctly set up WLST/Jython, I have to launch a batch file which in turn calls several other batch files to set up all sorts of environment variables. These are all pure-DOS batch files doing very little except creating or reading environment variables, but they're nested two or three levels deep from the entry-point batch.

For some reason, when I launched this batch with Popen(), variables were not set correctly. In fact, it looked like "call" statements in the batch were just silently ignored. I tried using shell=True, and it made no difference whatsoever. I put it down to some weird cmd.exe behaviour and tried switching the extension from .bat to .cmd, and things started to move a bit more (so much for all those posts saying there is no difference between the two), but some stuff still wouldn't work, so I eventually settled on reimplementing the whole batch chain in Python (which is terrible and will likely bite me a year down the line as the version of Weblogic changes, but beggars can't be choosers).

The most frustrating thing, however, was that opening an interactive pipe to test and do exploratory programming was just too difficult. There are a lot of examples out there talking about .send(), .communicate() and stdin=subprocess.PIPE, but nobody seems to mention what I experienced: as soon as you call communicate() on a cmd.exe launched with Popen(), all pipes are closed and there is no obvious way to reopen them. I don't think this is due to cmd.exe, because the last output I got was always ">More ?"; I think this is just CPython being too eager to clean up.
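For reference, communicate() is documented as a one-shot call: it sends the input, reads until end-of-file and waits for the process to terminate, so closed pipes afterwards are by design. A stdlib-only alternative is to write to proc.stdin and read from proc.stdout directly, line by line, keeping the process alive between exchanges. The sketch below is not what this post ended up using (that was WinPexpect), and the child is a hypothetical stand-in that upper-cases each line; it has not been tested against cmd.exe, where you would also have to skip its prompt and echoed commands (e.g. by echoing a marker line after each command and reading until you see it).

```python
import subprocess
import sys

# Hypothetical stand-in for a long-running interactive program: it echoes
# every input line back in upper case and flushes after each one.
CHILD = (
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    print(line.strip().upper(), flush=True)\n"
)


def start_session() -> subprocess.Popen:
    # text=True gives str pipes; -u disables the child's output buffering
    return subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


def ask(proc: subprocess.Popen, command: str) -> str:
    """Send one line and read one line back, leaving both pipes open."""
    proc.stdin.write(command + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().rstrip("\n")


def close_session(proc: subprocess.Popen) -> int:
    """Close stdin so the child sees EOF, then wait for it to exit."""
    proc.stdin.close()
    code = proc.wait(timeout=10)
    proc.stdout.close()
    return code
```

The key difference from communicate() is that ask() can be called as many times as needed; only close_session() ends the conversation.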

Luckily, I found a solution in WinPexpect, a fork of Pexpect that actually deals with Windows weirdness. Processes launched with winpexpect.winspawn() actually keep their stdin pipes up long enough for me to figure out enough stuff to fully re-implement the batch chain.

The result is a Python script three times as long as the original batch and likely to break the first time Oracle changes a line here or there. It will do for now, but the experience left a sour taste in my mouth, so to speak; cmd.exe is a crappy shell, but it shouldn't be that hard to open a long-running prompt-like process piping stuff to it. If I'm missing an obviously-better solution, please let me know and I'll happily blog about it, because clearly Google and DuckDuckGo need to know about it.

18 July 2013

Unicode URL-handling in web.py

Web.py is a lovely tool I'm currently using for a silly project (warning: explicit Italian language). Unfortunately, it does some clever things to support URLs containing Unicode, but then drops the ball when it comes to actually doing anything with them (i.e. dispatching/routing them as expected, using regular expressions that actually match Unicode objects).

This was a real problem in my app, so I came up with a quick and dirty patch, which may or may not work for you and may or may not break other things. Basically, I tracked down the regex operations on URLs and added Python's re.UNICODE flag to them, so that Unicode characters will be matched by "\w" etc.
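The effect of the flag is easy to see in isolation. This is not the actual web.py patch, just a hypothetical route table and dispatch() function. Note that in Python 3, str patterns are Unicode-aware by default (so re.UNICODE is redundant but harmless there), while passing re.ASCII reproduces the ASCII-only "\w" that Python 2 used unless re.UNICODE was given.

```python
import re

# Hypothetical routes in the web.py spirit: (regex, handler name) pairs.
ROUTES = [(r"/word/(\w+)", "word_page")]


def dispatch(path: str, flags: int = re.UNICODE):
    """Return (handler, captured groups) for the first route matching path."""
    for pattern, handler in ROUTES:
        match = re.fullmatch(pattern, path, flags)
        if match:
            return handler, match.groups()
    return None, ()  # no route matched
```

With re.ASCII, "\w+" stops at "perch" in "/word/perché", the full match fails and the URL falls through to "not found"; with Unicode matching the accented word is captured whole.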

Feel free to tell me where I'm going wrong -- I'm not a web.py guru by any means -- but this little patch significantly improved my quality of life today, so to speak.

17 December 2012

Why I've never really liked the Facebook API

The other day, I got an email from Virgin Media stating that my connection had been "upgraded to 100Mb/s". I went to a bunch of speed-testing websites, and reported speeds were indeed much higher than in the past. I was tempted to brag about it on Facebook, then I remembered that, last time I did something similar, I was humbled by a bunch of Dutch friends with "big pipes". I wondered what sort of speed they reported then, so I went to Facebook to search for that old status. And that's where my problems started.

The standard FB search UI failed to return anything even vaguely related, as usual. So I started googling for apps that would allow me to search my previous posts, and found a few which just wanted to gather all my personal data (on FB -- you don't say!). Then I found that you can actually request a complete download of all your data from FB (under Settings) and launched the request, but it looked like it would take a long time (for the record, I finally got it about 24 hours later). So I thought "hey, surely I can work with the FB API". How naive of me!

There is, in fact, a straightforward API call to get your statuses: /me/statuses. By default, it will return some 25 records, and links to further paginated results. Except pagination is ridiculously buggy: after the first 50 records, it will just return a blank page. If you try to use the limit parameter, it will return a maximum of 100 records per page, and again it will stop after the second page (i.e. max 200 results, which is actually 199, because everybody knows "there are only two hard things in computer science"). Time-based parameters (until, since) didn't seem to work at all. Using wrappers rather than direct calls didn't seem to make any difference. As it was very late, I gave up and went to sleep.

A day later, still incredulous and obviously fairly frustrated, I googled harder and finally found a relevant question on StackOverflow, which pointed to a Facebook bug about pagination. As the bug says, you can work around the problem by always using offset rather than relying on the 'next' and 'previous' links returned in JSON responses. I verified it, and that's indeed the case. By then, my export was available for download anyway. You can imagine how happy I was (not).
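The offset workaround is easy to wrap in a generator. A minimal sketch, assuming a hypothetical fetch_page(offset, limit) callable that performs the actual request (e.g. /me/statuses with offset and limit parameters) and returns the list of records from the response's data field. It never follows the 'next' links, and stops at the first empty page.

```python
from typing import Callable, Iterator, List


def paginate_by_offset(fetch_page: Callable[[int, int], List[dict]],
                       limit: int = 25) -> Iterator[dict]:
    """Yield every record, computing offsets ourselves instead of
    trusting the 'next' links returned by the API."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        if not page:  # an empty page means we've run out of records
            return
        yield from page
        offset += len(page)  # advance by what we actually received
```

Advancing by len(page) rather than by limit keeps the offsets right even if the server returns short pages.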

Lessons learnt from this whole debacle:

The unofficial facebook-sdk for Python doesn't work with Python 3. There is an experimental fork with very limited testing (i.e. it passes 2to3 and that's it).
the json module in Python 3 Standard Library, as used by facebook-sdk, chokes on Facebook data. Don't even ask me how I found out. Trying with a more up-to-date version from the original upstream doesn't help. There is a Python 3 fork which didn't help. Juggling between json.load and json.loads didn't seem to help, and I didn't want to rip the guts out of facebook-sdk in fear of dropping compatibility with 2.x (although I cringed at times: using "file" as variable name? Really?). No wonder @kennethreitz rolled his own JSON parser in Requests.
facebook-sdk should probably be rewritten from scratch in Python 3 using Requests. Not that I'll ever do it.
After so many years and botched revamps, the Facebook API is still terrible. For something reportedly so essential to "2.0" internet infrastructure, and with so many uber-smart people on their payroll, the whole thing still feels incredibly hackish.

19 August 2012

How to compile PyObjC for Python 3 on OSX 10.8 Mountain Lion

Another one for teh Google...

It so happens that I am curious about PyObjC, the Python bindings for Objective-C, which is the "native" language of choice for OSX/iOS.

As usual, my timing is completely wrong: recent versions of Xcode dropped support for PyObjC, and the project has shrunk to basically one person (that Ronald Oussoren I previously mentioned). The version on PyPI seems to work with Python 2.x only. Even the official page on Sourceforge is basically abandoned, and packages available from there are obsolete. This is a problem because I'm really trying hard to do everything with Python 3 these days, and the PyObjC version shipped with OSX 10.8 "Mountain Lion" is for 2.7 (the only Python version Apple ships and supports).

Luckily, from my past tribulations I knew that Ronald had his own repository on BitBucket, so I tried that and it worked fine. However, the documentation on how to build PyObjC from source is quite scarce (in fine geek tradition), and I had to figure out the following principles the hard way:

Ronald's repository is split into many separate packages that have to be individually built. This is very fine-grained, but a bit cumbersome for the general case.
Do not use the setup.py script you'll find under /pyobjc. That one is just for people pulling from PyPI, i.e. post-release.
/pyobjc-xcode is obsolete, and there's nothing to build there.
/pyobjc-framework-XgridFoundation simply refuses to build under ML. Xgrid is a somewhat obscure, proprietary Apple technology for highly-parallel computation. If you don't know what it is, chances are that you won't need it. I personally don't care about it.
/pyobjc-core is a requirement for all other packages, so it should be built first.
In order of importance, /pyobjc-framework-Cocoa, -Quartz and -CoreData are dependencies of other packages, so they should be built in this order before any other pyobjc-framework-*.
Python 3 support is occasionally shaky. On one occasion, one file had to be patched to remove unicode literals (the u'mystring' notation from Python 2 that was dropped in Python 3.0), but that's just a temporary snag: Python 3.3 will reintroduce that syntax as a compatibility hack for exactly this type of situation. I've submitted a patch anyway, but if you can't wait for Ronald to consider it, it is available in the below-mentioned repository.
Looking at BitBucket, I noticed there's at least one significant fork that is arguably targeting Python 3 more consistently. You might want to try that if Ronald's version is not good enough for you.
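The ordering rules above can be captured in a few lines. This is a hypothetical sketch, not the script mentioned below: build_order() encodes the rules (core first, then Cocoa, Quartz and CoreData, then the other frameworks; skip /pyobjc, /pyobjc-xcode and XgridFoundation), and build_all() assumes each package is installed with `setup.py install` into the active virtualenv, printing the commands instead of running them unless dry_run is False.

```python
import subprocess
import sys
from pathlib import Path

# Must be built first, in this order.
FIRST = ["pyobjc-core", "pyobjc-framework-Cocoa",
         "pyobjc-framework-Quartz", "pyobjc-framework-CoreData"]
# Post-release setup.py, obsolete Xcode bits, and a framework that
# won't build on Mountain Lion.
SKIP = {"pyobjc", "pyobjc-xcode", "pyobjc-framework-XgridFoundation"}


def build_order(dirs):
    """Return package directory names in a safe build order."""
    present = set(dirs) - SKIP
    ordered = [d for d in FIRST if d in present]
    rest = sorted(d for d in present
                  if d.startswith("pyobjc-framework-") and d not in FIRST)
    return ordered + rest


def build_all(repo: Path, dry_run: bool = True):
    """Install every package under repo with the current interpreter."""
    names = [p.name for p in repo.iterdir() if p.is_dir()]
    for name in build_order(names):
        cmd = [sys.executable, "setup.py", "install"]  # assumption: install step
        print("would run" if dry_run else "running", " ".join(cmd), "in", name)
        if not dry_run:
            subprocess.check_call(cmd, cwd=repo / name)
```

Using sys.executable means the packages get built for whichever Python 3 interpreter runs the script.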

Because I don't plan to do this sort of work every day, I've put together a script so that I won't have to remember all this stuff when starting a new virtualenv environment. It's now available from my utils repo on BitBucket. There is no documentation but OMG IT'S FULL OF COMMENTS so there. As usual, any feedback is more than welcome.

13 August 2012

How to run py2app on OSX 10.8 Mountain Lion and Python 3.2

This post is for teh Google and all poor souls trying to use py2app on Mountain Lion.

To make it short, the latest official release of py2app does not work with ML and Python 3.2; you have to get the current development snapshot. Unfortunately, py2app requires a number of smaller libraries written by its developer, Ronald Oussoren, and most of them have to be upgraded as well (and before you curse his name: he's single-handedly maintaining py2app, pyObjC and virtualenv-mac; what have you done recently for the community?).

So here's my recipe:

Clone all required repos.
Oussoren uses Bitbucket, which is better accessed through Mercurial (hg); you can get hg from your favourite package manager (Homebrew/MacPorts/Fink/whatever).
Then:
```
hg clone https://bitbucket.org/ronaldoussoren/altgraph
hg clone https://bitbucket.org/ronaldoussoren/macholib
hg clone https://bitbucket.org/ronaldoussoren/py2app
```
Install the packages. Since you're basically tracking trunk, you should probably use the develop mode of setuptools:
```
cd altgraph && python setup.py develop && cd ..
cd macholib && python setup.py develop && cd ..
cd py2app && python setup.py develop && cd ..
```
Note that this means you'll have to keep these "source" folders available forever. If you don't like that, you should create an egg (e.g. python setup.py bdist_egg), then install it (easy_install dist/your-resulting.egg).

For the record, altgraph will present itself as version 0.10, macholib as 0.7, and py2app as 1.5.
Now you should be able to run python setup.py py2app.

Bonus achievement: if you're using PyQt, this version of py2app will give you Retina-ready packages, by automatically adding the NSPrincipalClass key to the generated Info.plist and setting it to NSApplication. Nice one, Ronald!
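For completeness, a minimal setup.py for py2app, in the shape py2app's own templates use. main.py is a hypothetical entry point; the plist option merges extra keys into the generated Info.plist, so setting NSPrincipalClass explicitly should only matter with older py2app releases that don't add it themselves. This is a configuration sketch, run as `python setup.py py2app`.

```python
# setup.py: run with `python setup.py py2app`
from setuptools import setup

APP = ["main.py"]  # hypothetical entry-point script; use your own

OPTIONS = {
    # Explicitly set the key newer py2app snapshots add on their own.
    "plist": {"NSPrincipalClass": "NSApplication"},
}

setup(
    app=APP,
    options={"py2app": OPTIONS},
    setup_requires=["py2app"],
)
```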

27 March 2012

RDP Quick Screenshots, Or: How I've Learnt To Stop Worrying And Reverse The Problem

My work involves installing stuff on customers' servers, mostly running Windows. I usually have very limited access to them, often having to go through the customers' own computers, and what I can or cannot install is regulated by strict policies (which is good practice). And of course, one wants to minimize potential problems and maximize performance, so only the minimum number of necessary applications and tools is installed. This would all be fine if I didn't have to take lots and lots of screenshots in order to document (and prove) what I'm doing and how I'm doing it.

This is not a problem if I can work from my laptop, where I can run a powerful app like SnagIt or Camtasia, but it's a real pain if I have to use other hardware. If it's a simple environment with a handful of machines, I can make do with the default Remote Desktop client (mstsc.exe); if I'm lucky, it'll be a modern version that supports the CTRL-ALT-+ shortcut, which takes a screenshot of the active window inside the RDP session. That's not ideal: the resulting images are large BMP files, and you have to manually paste each one into a document right after taking the screenshot; it breaks your flow and there's a good chance you'll forget to paste it right away and lose the image after some careless CTRL-C... but I guess I could live with it.

Unfortunately, I mostly have to work on environments including dozens of machines, so the only practical approach is to use an RDP manager; since I cannot install any fancy app, it usually means I have to make do with the Remote Desktop Console (tsmmc.msc) or its modern equivalent, Remote Desktop Manager. That means saying bye-bye to CTRL-ALT-+ and hello to PrintScreen and mspaint.exe/Edit/Crop. Argh.

Today I thought I'd solve this problem once and for all. As Bruno Oliveira eloquently illustrated in his chart, automation is The Way of The Geek, and I am a goddamn geek. Embracing my Google-fu, I set off to find The One True Tool for this task.

My first stop was QuickScreenShots. It's a simple screenshotting app that doesn't require installation; just unzip it on the server and off you go. It features shortcuts to take screenshots of an active window, arbitrary region or full desktop; images can be automatically saved to a specific folder; best of all, it's written in (ta-daaa!) Python! w00t!

Unfortunately, it doesn't feature anything similar to CTRL-ALT-+. Not a problem, I thought: where there's Python, there's a way. Except that it didn't turn out to be the case here. RDP deals in graphic screens, not desktop widgets, and it has no concept of something like "the active window"; this is what Raymond Chen himself told me, and Raymond knows a thing or two about Windows (euphemism of the month). Mstsc.exe probably uses an undocumented extension (I guess through the Virtual Channel interfaces for RDP "plugins") to get the active window, and as far as I can see, it doesn't expose the feature through automation objects (although I haven't looked very hard, to be honest; at the end of the day, I figured it would probably be inaccessible when run through tsmmc.msc anyway). At one point I even tried to hack it by using WshShell.SendKeys to fake a CTRL-ALT-+, but somehow it didn't work (I find SendKeys quite "temperamental" and very dependent on the Windows version; on one XP image, for example, the documented {PRTSC} keycode simply wouldn't work for me).

Sad and lonely, I was almost resigned to long, intimate sessions with mspaint, when I had the most classic epiphany. I realized my problem could be easily solved by reversing the approach: instead of trying to pull screenshots through the RDP client, I could run QuickScreenShots on all machines (after all, it's portable!), inside the RDP server sessions. I just need to point the "autosave folder" to a network share and lo, all my screenshots of the active window should end up there, nicely saved as PNG. It's so easy it almost hurts, considering I've wasted a couple of hours going through MSDN, but I'm happy I've found a decent solution anyway.

19 March 2012

Simple Python script to clean up HTML produced by Excel

Here's a throwaway Python script to clean up HTML produced by Microsoft Excel 2010. I leave it here just so that I can find it later, or if anybody else has the same problem -- for some reason, I couldn't google an easy solution anywhere. I'm sure this doesn't cover all the corner cases and complex layouts, but it's a starting point showing most of the techniques you'll ever need: tag stripping, attribute stripping (either en-masse or selective), and handling crappy declarations ("<!if" tags).

It's for Python 3 (although I think it'll work almost unmodified in 2.7; you'll just have to replace open() calls with codecs.open()) and requires BeautifulSoup 4+, which really does all the magic. I don't know if it's the power of Py3k or BS getting better and better, but it's gone through a dozen files in a blink.
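As a rough illustration of those three techniques (tag stripping, selective attribute stripping, and getting rid of the "<![if ...]>" declarations), here is a dependency-free sketch. Note that it swaps BeautifulSoup 4 for the standard library's html.parser, so it is not the original script, and the tag/attribute lists (DROP_WITH_CONTENT, UNWRAP, KEEP_ATTRS) are assumptions about typical Excel 2010 output rather than an exhaustive catalogue.

```python
from html import escape
from html.parser import HTMLParser

# Assumed lists: tune them against your own Excel exports.
DROP_WITH_CONTENT = {"style", "script", "xml"}  # remove the element and everything inside it
UNWRAP = {"font", "span", "o:p"}                # remove the tag but keep its text
KEEP_ATTRS = {"a": {"href"}, "td": {"colspan", "rowspan"}, "th": {"colspan", "rowspan"}}
VOID = {"br", "img", "hr", "meta", "link", "col", "input"}  # never have a closing tag


class ExcelHTMLCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside an element from DROP_WITH_CONTENT

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag in UNWRAP:
            return
        # Selective attribute stripping: keep only whitelisted attributes for this tag;
        # tags missing from KEEP_ATTRS lose all of theirs (class, style, width...).
        keep = KEEP_ATTRS.get(tag, set())
        kept = "".join(f' {name}="{escape(value or "")}"'
                       for name, value in attrs if name in keep)
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in UNWRAP or tag in VOID:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments, <!DOCTYPE> and Office conditionals such as <![if ...]> / <![endif]>
    # end up in handle_comment / handle_decl / unknown_decl, which HTMLParser
    # leaves as no-ops, so they are simply dropped from the output.


def clean_excel_html(markup: str) -> str:
    parser = ExcelHTMLCleaner()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

To use it, read the exported .htm file with the charset declared in its meta tag and pass the text to clean_excel_html().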

04 January 2012

IntelliJ PyCharm 2.0 and Jython 2.2 don't really go together

UPDATE: Dmitry Jemerov from JetBrains responded, explaining what the situation is. TL;DR: 2.2 is simply too old; other features might come if there is demand. I've amended the post to reflect this.

Let me preface this rant by saying that I've been happily using JetBrains PyCharm for a few months, and it's certainly one of the best Python IDEs out there. The price is ridiculously low and if you're serious about Python, buying PyCharm is one of the best investments you can make. It can be used for free for 30 days, so you really should give it a shot.

This said, if you happen to work with Jython 2.2, you'll probably want to use something else. The claim that Jython is fully supported as a runtime, while literally true, ~~is somehow stretched~~ is only valid for 2.5+.

Let's say you work on Windows, and you have your Jython installed under C:\jython-2.2 (yes, it's damn old, but it's still the most widely-deployed release out there -- just ask IBM and Oracle).

You create a new project in PyCharm, then go to Settings -> Python interpreter, remove the preconfigured CPython runtime, then click on Add and point to your C:\jython-2.2\jython.bat. Bang, "SDK is not valid". The list of library paths, which is supposed to be automatically generated by the IDE picking up the environment configuration, is now empty.

Still, PyCharm should be smart enough to parse arbitrary .py files in specific directories, right? So let's click on Add... to point to C:\jython-2.2\Lib, then OK.
Now let's create a .py file containing "from pprint import pprint": the module is recognised. "Run" the script: output is correct, life is good, Jython is indeed supported as a Python runtime.

Ok, let's do some file I/O: "import os"... uh, os is not recognised as a valid module. Same for sys. Apparently, they are somewhat special in Jython and are implemented directly in the main jar, so PyCharm can't see them for autocompletion or any other smart feature. I don't know what else is "special" in Jython 2.2, but I'd rather not have to find out.
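The same distinction can be observed from inside an interpreter: an IDE that indexes .py files on the library path has nothing to parse for a module whose code is compiled into the runtime itself. The sketch below is for modern CPython 3, not Jython 2.2 (which has no importlib.util), where sys plays the same role and reports "built-in" instead of a file path; the helper name is made up for illustration.

```python
import importlib.util


def module_origin(name: str) -> str:
    """Where a module's code lives: a file path, 'built-in', 'frozen', or 'not found'."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    # Namespace packages have no single origin file, hence the fallback.
    return spec.origin or "unknown"


if __name__ == "__main__":
    for name in ("sys", "json", "pprint"):
        print(f"{name:8} {module_origin(name)}")
```

Only the modules that come back with a file path give a static analyser something to read.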

Which brings us to the main shortcoming of PyCharm as a Jython IDE: it simply won't recognise or parse any Java jar. This is somewhat surprising, considering that the program is basically a spin-off of IntelliJ IDEA, a Java IDE, and is completely built on Java. In fact, it shares the codebase with the Python Plugin for IDEA. One would think PyCharm would be ideally suited to the task of handling the "Python on Java" mesh that is Jython, but alas that's not the case. A quick search on the IntelliJ forum brings up recent posts stating that full Jython support for autocompletion is simply not on the cards; Jython is supported as a runtime and nothing more. In fact, the Python plugin for IDEA ~~probably~~ handles a Jython setup better than PyCharm, and that's not going to change any time soon. The main targets for PyCharm are clearly Django/web developers, not integrators. UPDATE: see this blog post for more details on the real situation.

This state of things is a bit saddening. I don't know if this is a way for JetBrains to avoid cannibalizing its main cash-cow (IDEA), or simply a commercial oversight; the fact is that we have a product, ideally positioned to completely own a niche, which simply refuses to do so and actually delivers a second-rate experience. I hope JetBrains will re-evaluate their stance at some point, because it's a bit of a shame really.

21 December 2011

VBScript, PowerShell, Python, IronPython, Jython, or... ?

In my job, more and more I find that we end up building distributed architectures, i.e. multi-server installations where services are routinely spread across 20+ Windows boxes, plus the occasional Unix. These services have specific startup and shutdown sequences and dependencies, so they can't just be set to Automatic startup; we usually provide batch files to manage them, but this is quickly becoming ugly and unreliable as the number of machines goes up every year. It's also tricky to test actual service availability -- some are binary-based, some are HTTP-based, some are WebLogic services, etc etc etc... So I'm investigating alternatives.
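Roughly what those batch files end up encoding, sketched in Python under a few assumptions: the service names are hypothetical, graphlib needs Python 3.9+, and the TCP probe only proves that something accepts connections on a port (HTTP or WebLogic services would need a protocol-level check on top). Start order falls out of a topological sort of the dependency map, and stop order is simply its reverse.

```python
import socket
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical services, each mapped to the set of services it depends on.
DEPENDENCIES = {
    "database": set(),
    "message_queue": set(),
    "app_server": {"database", "message_queue"},
    "web_frontend": {"app_server"},
}


def startup_order(deps: dict[str, set[str]]) -> list[str]:
    """Dependencies before dependents; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(deps).static_order())


def shutdown_order(deps: dict[str, set[str]]) -> list[str]:
    """Dependents before dependencies: the startup order reversed."""
    return startup_order(deps)[::-1]


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Crude probe for TCP-based services: does anything accept a connection?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, unknown host...
        return False


if __name__ == "__main__":
    print("start:", startup_order(DEPENDENCIES))
    print("stop: ", shutdown_order(DEPENDENCIES))
```

A circular dependency makes graphlib raise CycleError up front, which is arguably what you want from a deployment script rather than a hang halfway through a startup sequence.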

The first, natural choice for me was obviously Python: it does everything I need, it's flexible etc. However, distributing scripts to customers is a bitch; either you compile everything with cx_freeze (crashy) or py2exe (no python 3! no 64bit bundling! party like it was 2004! have fun tracking down which un-redistributable DLL or manifest you need on each release...), or you drop an entire environment and teach the customer what python is -- not ideal.

The traditional "native" approach to these problems in the Windows world is VBScript. It's fairly flexible, doesn't need to be deployed on Windows 2003 and later (yes, we still deal with loads of Win2003 servers), documentation is extensive and there are plenty of resources out there.
The problem with VBScript, apart from the ugly syntax and pseudo-OOP quirkiness, is that it's clearly seen as legacy by Microsoft. Year after year, running scripts becomes more cumbersome, security checks increase and new technologies don't expose the necessary interfaces. Does it make sense, in 2011, to invest time and effort building solutions that are, de-facto, already obsolete?

So we come to PowerShell, Microsoft's "future of scripting", which must be the least intuitive shell I've ever had the pleasure to deal with. I simply can't get my head around the way it deals with input and output; it doesn't seem to have reference assignment, so you have to retrieve an object on every line before you can use it; the syntax seems to combine the worst of Perl and Bash, and it quickly becomes unreadable. Also, deploying it on anything older than Windows 7 / Server 2008 R2 has to be done manually and has to be justified to customers.

I honestly can't see a good solution here. I keep looking at IronPython, but its infrastructure baffles me and I wouldn't know where to start redistributing programs (I don't use Visual Studio). It's clearly a second-class citizen in the .Net world, with all that it entails.

Maybe Jython? After all, the products we install will drop JREs absolutely everywhere, so I could leverage that. I'd like to avoid going full-Java if I can, I hate the whole byzantine toolchain and I'm not really up to speed with post-1.4 changes; plus, there's always a degree of customization required in each environment, so I'd like to keep stuff as easily-editable as possible.

Please feel free to drop any suggestions in comments, I could really do with them!

06 December 2011

State of Python 3

I want to start a little side-project, basically a few lines of HTTP automation and mail sending; nothing that some Taco Bell Programming couldn't handle, but I don't particularly like shell scripts and I figured it'd be cleaner in Python.

It's 2011 and One Should Really Use Python 3, right? The box I'll work on is a brand new Debian Testing where Python 2.x is not even installed, perfect! I'll just grab virtualenv... Waitamminute, "python" doesn't even run, but I'm sure I had installed the python3 package...? Oh, "python3" in Debian doesn't give you /usr/bin/python but rather /usr/bin/python3!
Thank you, Debian Project, for your consistent inconsistency dealing with anything Python-related. Ok, so a quick run of update-alternatives and that's sorted.

Now I'll

easy_install virtualenv

... sorry, easy_install3 virtualenv, of course... oops, syntax error, clearly a Python 2.x package there. Does easy_install discriminate between 2.x and 3.x packages? Er, no. Joy.

Whatever, I'll just grab virtualenv.py and drop it in /usr/local/bin, right? Cool, works a treat. Activated my virtualenv, let's try again to download a couple of libs... paramiko: syntax error, clearly a 2.x package. funkload: syntax error, again a 2.x package. I know, I'll use pip! ... no difference.
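For what it's worth, the "clearly a 2.x package" diagnosis can be made before installing anything, by asking the running interpreter to compile every source file in an unpacked tarball. A minimal sketch (the function name is made up for illustration); passing this check only means the code parses under Python 3, not that it actually works there.

```python
import pathlib


def py3_syntax_errors(root: str) -> dict[str, str]:
    """Compile every .py file under root with the running interpreter.

    Returns {path: "line N: message"} for each file that fails to parse;
    an empty dict means everything at least parses.
    """
    failures = {}
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        source = path.read_bytes()  # bytes, so compile() honours any coding cookie
        try:
            compile(source, str(path), "exec")
        except SyntaxError as exc:
            failures[str(path)] = f"line {exc.lineno}: {exc.msg}"
    return failures


if __name__ == "__main__":
    # Placeholder path: point it at an unpacked source distribution.
    for path, error in py3_syntax_errors("path/to/unpacked/package").items():
        print(path, error)
```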

Basically I can choose whether to handle libraries like it was 2001, website after website, setup.py after setup.py; or I can develop like it was 2009, i.e. with python 2.x.

I think for now I'll choose life, and drop python 3.

16 September 2011

Python WMI, services and UAC

In this day and age, security-conscious Windows users "enjoy" the protection of User Account Control (UAC), a feature introduced with Vista and now a stalwart of the Microsoft world in Windows Server 2008 and Windows 7. This is why you get prompted to "allow this program to make changes to your computer" every time you install a program, or when you run many "legacy" Win32 applications.

Under UAC, even when you are a local administrator, the programs you launch will not, by default, enjoy all system privileges you would expect. Until you get prompted by UAC to allow for system access, processes will run under a basic user profile.

This is a problem when you try to interoperate with Windows from Python. As soon as you try to manipulate system objects, for example to start/stop services, you'll get a lot of "Access Denied" return codes.

One solution, obviously, is to use the "Run as Administrator" option to launch python.exe, so that you get prompted by UAC and the resulting process will effectively run under elevated privileges. Note that it's not enough to launch Command Prompt (cmd.exe) with "run as an administrator" and to then call python.exe from there. You must explicitly hunt down python.exe in explorer and right-click on it, nothing less will do.

This is fine and dandy for the basic REPL interpreter, but what if you need to do it from a script? How do you elevate the process from inside a running python.exe?

The answer is to use ShellExecute with the "runas" verb, like this:

import win32api
win32api.ShellExecute( 0, # parent window
    "runas", # need this to force UAC to act
    "C:\\python27\\python.exe", 
    "c:\\path\\to\\script.py", 
    "C:\\python27", # base dir
    1 ) # window visibility - 1: visible, 0: background

UAC will then prompt the user and elevate the process. For an example script that you can only run under elevated privileges, check this:

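import wmi
c = wmi.WMI()
serviceToStart = 'aspnet_state' # example
for service in c.Win32_Service(Name=serviceToStart):
    result, = service.StartService()  # WMI methods return a tuple of out-parameters
    # 0 = request accepted, 2 = access denied (typically: not elevated), 10 = already running
    print(service.Name, result)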
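A variation on the ShellExecute trick is to let the script check whether it is already elevated and relaunch itself through the same "runas" verb only when it isn't. This sketch uses ctypes instead of pywin32, and the helper names are made up for illustration; IsUserAnAdmin is an old shell32 call that still works, although Microsoft's documentation warns it may be altered or unavailable in later Windows versions.

```python
import ctypes
import os
import subprocess
import sys


def is_admin() -> bool:
    """True if this process already runs with an elevated (administrator) token."""
    try:
        return bool(ctypes.windll.shell32.IsUserAnAdmin())
    except AttributeError:  # ctypes.windll only exists on Windows
        return False


def relaunch_args(argv: list[str]) -> str:
    """Quote the script path and its arguments into a single Windows command line."""
    return subprocess.list2cmdline(argv)


def relaunch_elevated() -> bool:
    """Ask UAC to start an elevated copy of this script; True if Windows accepted the request."""
    # Absolute script path: the elevated process may not start in our working directory.
    argv = [os.path.abspath(sys.argv[0])] + sys.argv[1:]
    # ShellExecuteW(hwnd, verb, file, params, directory, show); results <= 32 signal failure.
    rc = ctypes.windll.shell32.ShellExecuteW(
        None, "runas", sys.executable, relaunch_args(argv), None, 1)
    return rc > 32


if __name__ == "__main__" and sys.platform == "win32":
    if not is_admin():
        relaunch_elevated()
        sys.exit(0)  # the elevated copy takes over; this unelevated one just exits
    print("running elevated: starting/stopping services should work now")
```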

Note: I haven't tested this with py2exe yet, but at worst you should be able to right-click on the py2exe-generated binary and select to Run as Administrator anyway.

23 March 2011

Social experiments with Facebook IDs

Having just watched "The Social Network", I stumbled on a post on Twitter pointing to graph.facebook.com, the free API you can use to scrape the shit out of FB (well, almost).

Turns out the API works with IDs. Since FB started as a Harvard-only site, the first few hundred users were all Harvard students, obviously. So I started thinking about simple experiments like finding the most popular surnames, certain of having my class-based prejudices reinforced by loads of Winklevoss-style "aristonames". Turns out the most common names are actually Asian -- the elites of tomorrow, of course.

That's the issue, isn't it? Harvard is (supposedly) a top institution, churning out the "elites of tomorrow"; they won't all become Mark Zuckerberg, but they probably won't be homeless either.

So, as a joke, I wrote a script looking for Wikipedia pages dedicated to the first 1000 users of Facebook. Turns out there are a lot of very common names, which obviously result in false positives; unfortunately Wikipedia doesn't give you easily parsed metadata (here's a new project for Jimbo Wales and friends), so I couldn't do things like discarding everyone born before 1970. With a bit of patience, I narrowed the number down to roughly 6%. Some of them are (or were) Facebook employees, of course, but there are also young poets, writers and comedians.
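One way to do the Wikipedia side is the MediaWiki API, which can tell you whether a page with an exact title exists. This is a sketch, not the original script: it assumes the public en.wikipedia.org endpoint and formatversion=2 JSON responses, and the User-Agent string is a placeholder to replace with your own contact details. Disambiguation pages count as hits, which is exactly where very common names produce false positives.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"


def lookup_url(title: str) -> str:
    """MediaWiki query asking whether a page with this exact title exists (following redirects)."""
    params = {"action": "query", "titles": title, "redirects": 1,
              "format": "json", "formatversion": 2}
    return API + "?" + urllib.parse.urlencode(params)


def page_exists(response: dict) -> bool:
    """Interpret a formatversion=2 response: absent pages carry a "missing" (or "invalid") flag."""
    pages = response.get("query", {}).get("pages", [])
    return any(not page.get("missing") and not page.get("invalid") for page in pages)


def has_wikipedia_page(name: str) -> bool:
    """Network call; Wikimedia asks API clients to send a descriptive User-Agent."""
    req = urllib.request.Request(
        lookup_url(name), headers={"User-Agent": "fb-id-experiment/0.1 (placeholder contact)"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return page_exists(json.load(resp))
```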

You would probably get better results by replacing Wikipedia with LinkedIn, which would include more successful businesspeople and professionals -- Harvard's bread and butter. Obviously you could also start digging across the entire FB userbase, beyond the first lucky Harvardites.

These web APIs are a great tool for smart researchers; you now have a lot of data to be correlated with a little bit of programming glue and very little time. The result might not be scientifically exact, but could still unearth surprising insights.