:tocdepth: 2

=========================
Library and Extension FAQ
=========================

.. only:: html

   .. contents::

General Library Questions
=========================

How do I find a module or application to perform task X?
--------------------------------------------------------

Check :ref:`the Library Reference <library-index>` to see if there's a relevant
standard library module. (Eventually you'll learn what's in the standard
library and will be able to skip this step.)

For third-party packages, search the `Python Package Index
<http://pypi.python.org/pypi>`_ or try `Google <http://www.google.com>`_ or
another Web search engine. Searching for "Python" plus a keyword or two for
your topic of interest will usually find something helpful.


Where is the math.py (socket.py, regex.py, etc.) source file?
-------------------------------------------------------------

If you can't find a source file for a module, it may be a built-in or
dynamically loaded module implemented in C, C++ or another compiled language.
In this case you may not have the source file, or it may be something like
:file:`mathmodule.c`, somewhere in a C source directory (not on the Python Path).

There are (at least) three kinds of modules in Python:

1) modules written in Python (.py);
2) modules written in C and dynamically loaded (.dll, .pyd, .so, .sl, etc.);
3) modules written in C and linked with the interpreter; to get a list of these,
   type::

      import sys
      print sys.builtin_module_names

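The three kinds of modules can be told apart at run time. The following is only
a sketch: ``module_kind`` is a hypothetical helper (not part of the standard
library), and it assumes that a file extension other than ``.py``, ``.pyc`` or
``.pyo`` means a dynamically loaded extension module.

```python
import os.path
import sys


def module_kind(name):
    """Return a rough label for where a top-level module's code lives."""
    if name in sys.builtin_module_names:
        return "built-in"        # linked into the interpreter, no file at all
    module = __import__(name)
    path = getattr(module, "__file__", None)
    if path is None:
        return "unknown"         # no file recorded for this module
    extension = os.path.splitext(path)[1].lower()
    if extension in (".py", ".pyc", ".pyo"):
        return "python"          # ordinary Python source (or its bytecode)
    return "extension"           # dynamically loaded: .so, .pyd, .dll, ...


for name in ("sys", "json"):
    print("%s: %s" % (name, module_kind(name)))
```

For a module labelled ``python``, the ``__file__`` attribute tells you where the
source file is; for the other kinds there is no ``.py`` file to find.

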
How do I make a Python script executable on Unix?
-------------------------------------------------

You need to do two things: the script file's mode must be executable and the
first line must begin with ``#!`` followed by the path of the Python
interpreter.

The first is done by executing ``chmod +x scriptfile`` or perhaps ``chmod 755
scriptfile``.

The second can be done in a number of ways. The most straightforward way is to
write ::

   #!/usr/local/bin/python

as the very first line of your file, using the pathname for where the Python
interpreter is installed on your platform.

If you would like the script to be independent of where the Python interpreter
lives, you can use the :program:`env` program. Almost all Unix variants support
the following, assuming the Python interpreter is in a directory on the user's
:envvar:`PATH`::

   #!/usr/bin/env python

*Don't* do this for CGI scripts. The :envvar:`PATH` variable for CGI scripts is
often very minimal, so you need to use the actual absolute pathname of the
interpreter.

Occasionally, a user's environment is so full that the :program:`/usr/bin/env`
program fails, or there's no :program:`env` program at all. In that case, you
can try the following hack (due to Alex Rezinsky)::

   #! /bin/sh
   """:"
   exec python $0 ${1+"$@"}
   """

The minor disadvantage is that this defines the script's ``__doc__`` string.
However, you can fix that by adding ::

   __doc__ = """...Whatever..."""



Is there a curses/termcap package for Python?
---------------------------------------------

.. XXX curses *is* built by default, isn't it?

For Unix variants the standard Python source distribution comes with a curses
module in the :source:`Modules` subdirectory, though it's not compiled by default.
(Note that this is not available in the Windows distribution -- there is no
curses module for Windows.)

The :mod:`curses` module supports basic curses features as well as many additional
functions from ncurses and SYSV curses such as colour, alternative character set
support, pads, and mouse support. This means the module isn't compatible with
operating systems that only have BSD curses, but there don't seem to be any
currently maintained OSes that fall into this category.

For Windows: use `the consolelib module
<http://effbot.org/zone/console-index.htm>`_.


Is there an equivalent to C's onexit() in Python?
-------------------------------------------------

The :mod:`atexit` module provides a register function that is similar to C's
:c:func:`onexit`.

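A minimal sketch of registering exit handlers follows. The handler name
``record_exit`` and the ``messages`` list are made up for illustration; only
``atexit.register`` comes from the module itself.

```python
import atexit

messages = []


def record_exit(message):
    """Hypothetical clean-up handler: note the message and print it."""
    messages.append(message)
    print(message)


# Handlers run in reverse order of registration when the interpreter exits.
atexit.register(record_exit, "registered first, runs last")
atexit.register(record_exit, "registered second, runs first")
```

Extra arguments given to ``atexit.register`` are passed on to the handler when
it is eventually called.

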
Why don't my signal handlers work?
----------------------------------

The most common problem is that the signal handler is declared with the wrong
argument list. It is called as ::

   handler(signum, frame)

so it should be declared with two arguments::

   def handler(signum, frame):
       ...

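To see the two-argument signature in action, here is a small sketch. It assumes
a Unix platform (``SIGUSR1`` and sending a signal to yourself with ``os.kill``
are not available everywhere) and that it runs in the main thread; the
``received`` list is just an illustration.

```python
import os
import signal

received = []


def handler(signum, frame):
    """Signal handlers always receive the signal number and a stack frame."""
    received.append(signum)


# Install the handler, send ourselves the signal, then restore the old one.
previous = signal.signal(signal.SIGUSR1, handler)
os.kill(os.getpid(), signal.SIGUSR1)
signal.signal(signal.SIGUSR1, previous)
print("signals received: %r" % (received,))
```

A handler declared with the wrong number of arguments would raise a
:exc:`TypeError` when the signal arrives instead of doing its work.

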
Common tasks
============

How do I test a Python program or component?
--------------------------------------------

Python comes with two testing frameworks. The :mod:`doctest` module finds
examples in the docstrings for a module and runs them, comparing the output with
the expected output given in the docstring.

The :mod:`unittest` module is a fancier testing framework modelled on Java and
Smalltalk testing frameworks.

To make testing easier, you should use good modular design in your program.
Your program should have almost all functionality
encapsulated in either functions or class methods -- and this sometimes has the
surprising and delightful effect of making the program run faster (because local
variable accesses are faster than global accesses). Furthermore, the program
should avoid depending on mutating global variables, since this makes testing
much more difficult to do.

The "global main logic" of your program may be as simple as ::

   if __name__ == "__main__":
       main_logic()

at the bottom of the main module of your program.

Once your program is organized as a tractable collection of functions and class
behaviours, you should write test functions that exercise the behaviours. A test
suite that automates a sequence of tests can be associated with each module.
This sounds like a lot of work, but since Python is so terse and flexible it's
surprisingly easy. You can make coding much more pleasant and fun by writing
your test functions in parallel with the "production code", since this makes it
easy to find bugs and even design flaws earlier.

"Support modules" that are not intended to be the main module of a program may
include a self-test of the module. ::

   if __name__ == "__main__":
       self_test()

Even programs that interact with complex external interfaces may be tested when
the external interfaces are unavailable by using "fake" interfaces implemented
in Python.

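As an illustration of a support module with a self-test, here is a small
sketch. The function ``average`` and the test class are invented for the
example; ``self_test`` plays the role of the function called from the
``if __name__ == "__main__"`` block.

```python
import unittest


def average(values):
    """Return the arithmetic mean of a non-empty sequence.

    >>> average([1, 2, 3])
    2.0
    """
    return sum(values) / float(len(values))


class AverageTest(unittest.TestCase):
    def test_single_value(self):
        self.assertEqual(average([5]), 5.0)

    def test_empty_sequence(self):
        self.assertRaises(ZeroDivisionError, average, [])


def self_test():
    """Run the unittest cases above; return True if they all pass."""
    suite = unittest.TestLoader().loadTestsFromTestCase(AverageTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

The example in the docstring of ``average`` is written in the form that
:mod:`doctest` looks for, so the same function can be checked by either
framework.

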
How do I create documentation from doc strings?
-----------------------------------------------

The :mod:`pydoc` module can create HTML from the doc strings in your Python
source code. An alternative for creating API documentation purely from
docstrings is `epydoc <http://epydoc.sf.net/>`_. `Sphinx
<http://sphinx.pocoo.org>`_ can also include docstring content.


How do I get a single keypress at a time?
-----------------------------------------

For Unix variants there are several solutions. It's straightforward to do this
using curses, but curses is a fairly large module to learn. Here's a solution
without curses::

   import termios, fcntl, sys, os
   fd = sys.stdin.fileno()

   oldterm = termios.tcgetattr(fd)
   newattr = termios.tcgetattr(fd)
   newattr[3] = newattr[3] & ~termios.ICANON & ~termios.ECHO
   termios.tcsetattr(fd, termios.TCSANOW, newattr)

   oldflags = fcntl.fcntl(fd, fcntl.F_GETFL)
   fcntl.fcntl(fd, fcntl.F_SETFL, oldflags | os.O_NONBLOCK)

   try:
       while 1:
           try:
               c = sys.stdin.read(1)
               print "Got character", repr(c)
           except IOError: pass
   finally:
       termios.tcsetattr(fd, termios.TCSAFLUSH, oldterm)
       fcntl.fcntl(fd, fcntl.F_SETFL, oldflags)

You need the :mod:`termios` and :mod:`fcntl` modules for any of this to work,
and I've only tried it on Linux, though it should work elsewhere. In this code,
characters are read and printed one at a time.

:func:`termios.tcsetattr` turns off stdin's echoing and disables canonical mode.
:func:`fcntl.fcntl` is used to obtain stdin's file descriptor flags and modify
them for non-blocking mode. Since reading stdin when it is empty results in an
:exc:`IOError`, this error is caught and ignored.


Threads
=======

How do I program using threads?
-------------------------------

.. XXX it's _thread in py3k

Be sure to use the :mod:`threading` module and not the :mod:`thread` module.
The :mod:`threading` module builds convenient abstractions on top of the
low-level primitives provided by the :mod:`thread` module.

Aahz has a set of slides from his threading tutorial that are helpful; see
http://www.pythoncraft.com/OSCON2001/.


None of my threads seem to run: why?
------------------------------------

As soon as the main thread exits, all threads are killed. Your main thread is
running too quickly, giving the threads no time to do any work.

A simple fix is to add a sleep to the end of the program that's long enough for
all the threads to finish::

   import threading, time

   def thread_task(name, n):
       for i in range(n): print name, i

   for i in range(10):
       T = threading.Thread(target=thread_task, args=(str(i), i))
       T.start()

   time.sleep(10) # <----------------------------!

But now (on many platforms) the threads don't run in parallel, but appear to run
sequentially, one at a time! The reason is that the OS thread scheduler doesn't
start a new thread until the previous thread is blocked.

A simple fix is to add a tiny sleep to the start of the run function::

   def thread_task(name, n):
       time.sleep(0.001) # <---------------------!
       for i in range(n): print name, i

   for i in range(10):
       T = threading.Thread(target=thread_task, args=(str(i), i))
       T.start()

   time.sleep(10)

Instead of trying to guess a good delay value for :func:`time.sleep`,
it's better to use some kind of semaphore mechanism. One idea is to use the
:mod:`Queue` module to create a queue object, let each thread append a token to
the queue when it finishes, and let the main thread read as many tokens from the
queue as there are threads.

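The queue-of-tokens idea can be sketched as follows. The names ``done``,
``results`` and ``NUM_THREADS`` are chosen for the example, and the import
fallback is only there so that the sketch also runs where the module is called
``queue`` rather than ``Queue``.

```python
import threading

try:
    import queue             # name of the module in Python 3
except ImportError:
    import Queue as queue    # name of the module in Python 2

NUM_THREADS = 5
done = queue.Queue()
results = []


def thread_task(name, n):
    results.append((name, sum(range(n))))   # stand-in for real work
    done.put(name)                          # token: "this thread has finished"


for i in range(NUM_THREADS):
    threading.Thread(target=thread_task, args=(str(i), i)).start()

# Block until every thread has reported in; no sleep() guesswork is needed.
finished = [done.get() for _ in range(NUM_THREADS)]
print("finished: %s" % sorted(finished))
```

Each ``done.get()`` call blocks until some thread has put its token on the
queue, so the main thread waits exactly as long as the slowest worker.

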
How do I parcel out work among a bunch of worker threads?
---------------------------------------------------------

Use the :mod:`Queue` module to create a queue containing a list of jobs. The
:class:`~Queue.Queue` class maintains a list of objects and has a ``.put(obj)``
method that adds items to the queue and a ``.get()`` method to return them.
The class will take care of the locking necessary to ensure that each job is
handed out exactly once.

Here's a trivial example::

   import threading, Queue, time

   # The worker thread gets jobs off the queue. When the queue is empty, it
   # assumes there will be no more work and exits.
   # (Realistically workers will run until terminated.)
   def worker():
       print 'Running worker'
       time.sleep(0.1)
       while True:
           try:
               arg = q.get(block=False)
           except Queue.Empty:
               print 'Worker', threading.currentThread(),
               print 'queue empty'
               break
           else:
               print 'Worker', threading.currentThread(),
               print 'running with argument', arg
               time.sleep(0.5)

   # Create queue
   q = Queue.Queue()

   # Start a pool of 5 workers
   for i in range(5):
       t = threading.Thread(target=worker, name='worker %i' % (i+1))
       t.start()

   # Begin adding work to the queue
   for i in range(50):
       q.put(i)

   # Give threads time to run
   print 'Main thread sleeping'
   time.sleep(5)

When run, this will produce the following output:

.. code-block:: none

   Running worker
   Running worker
   Running worker
   Running worker
   Running worker
   Main thread sleeping
   Worker <Thread(worker 1, started)> running with argument 0
   Worker <Thread(worker 2, started)> running with argument 1
   Worker <Thread(worker 3, started)> running with argument 2
   Worker <Thread(worker 4, started)> running with argument 3
   Worker <Thread(worker 5, started)> running with argument 4
   Worker <Thread(worker 1, started)> running with argument 5
   ...

Consult the module's documentation for more details; the :class:`~Queue.Queue`
class provides a featureful interface.


What kinds of global value mutation are thread-safe?
----------------------------------------------------

A :term:`global interpreter lock` (GIL) is used internally to ensure that only
one thread runs in the Python VM at a time. In general, Python offers to switch
among threads only between bytecode instructions; how frequently it switches can
be set via :func:`sys.setcheckinterval`. Each bytecode instruction, and
therefore all the C implementation code reached from each instruction, is
atomic from the point of view of a Python program.

In theory, this means an exact accounting requires an exact understanding of the
PVM bytecode implementation. In practice, it means that operations on shared
variables of built-in data types (ints, lists, dicts, etc.) that "look atomic"
really are.

For example, the following operations are all atomic (L, L1, L2 are lists, D,
D1, D2 are dicts, x, y are objects, i, j are ints)::

   L.append(x)
   L1.extend(L2)
   x = L[i]
   x = L.pop()
   L1[i:j] = L2
   L.sort()
   x = y
   x.field = y
   D[x] = y
   D1.update(D2)
   D.keys()

These aren't::

   i = i+1
   L.append(L[-1])
   L[i] = L[j]
   D[x] = D[x] + 1

Operations that replace other objects may invoke those other objects'
:meth:`__del__` method when their reference count reaches zero, and that can
affect things. This is especially true for the mass updates to dictionaries and
lists. When in doubt, use a mutex!

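A minimal sketch of the "use a mutex" advice, applied to the non-atomic
``i = i+1`` case. The names ``counter``, ``counter_lock`` and ``increment`` are
chosen for the example.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def increment(times):
    global counter
    for _ in range(times):
        # "counter = counter + 1" is a read, an add and a write; holding
        # the lock keeps other threads from running between those steps.
        with counter_lock:
            counter = counter + 1


threads = [threading.Thread(target=increment, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```

Without the lock, two threads could both read the same old value and one of the
increments would be lost.

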
Can't we get rid of the Global Interpreter Lock?
------------------------------------------------

.. XXX mention multiprocessing
.. XXX link to dbeazley's talk about GIL?

The :term:`global interpreter lock` (GIL) is often seen as a hindrance to Python's
deployment on high-end multiprocessor server machines, because a multi-threaded
Python program effectively only uses one CPU, due to the insistence that
(almost) all Python code can only run while the GIL is held.

Back in the days of Python 1.5, Greg Stein actually implemented a comprehensive
patch set (the "free threading" patches) that removed the GIL and replaced it
with fine-grained locking. Unfortunately, even on Windows (where locks are very
efficient) this ran ordinary Python code about twice as slow as the interpreter
using the GIL. On Linux the performance loss was even worse because pthread
locks aren't as efficient.

Since then, the idea of getting rid of the GIL has occasionally come up but
nobody has found a way to deal with the expected slowdown, and users who don't
use threads would not be happy if their code ran at half the speed. Greg's
free threading patch set has not been kept up-to-date for later Python versions.

This doesn't mean that you can't make good use of Python on multi-CPU machines!
You just have to be creative with dividing the work up between multiple
*processes* rather than multiple *threads*. Judicious use of C extensions will
also help; if you use a C extension to perform a time-consuming task, the
extension can release the GIL while the thread of execution is in the C code and
allow other threads to get some work done.

It has been suggested that the GIL should be a per-interpreter-state lock rather
than truly global; interpreters then wouldn't be able to share objects.
Unfortunately, this isn't likely to happen either. It would be a tremendous
amount of work, because many object implementations currently have global state.
For example, small integers and short strings are cached; these caches would
have to be moved to the interpreter state. Other object types have their own
free list; these free lists would have to be moved to the interpreter state.
And so on.

And I doubt that it can even be done in finite time, because the same problem
exists for 3rd party extensions. It is likely that 3rd party extensions are
being written at a faster rate than you can convert them to store all their
global state in the interpreter state.

And finally, once you have multiple interpreters not sharing any state, what
have you gained over running each interpreter in a separate process?

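As a rough illustration of dividing work between processes instead of threads,
the sketch below starts several child interpreters with the :mod:`subprocess`
module and adds up their partial results. This is only one possible approach,
chosen because it needs no shared state; the helper name
``parallel_sum_of_squares`` and the ``CHILD_CODE`` string are invented for the
example. The :mod:`multiprocessing` module offers a higher-level interface for
the same idea.

```python
import subprocess
import sys

# Code run by each child interpreter: sum the squares in range(lo, hi).
CHILD_CODE = (
    "import sys\n"
    "lo, hi = int(sys.argv[1]), int(sys.argv[2])\n"
    "print(sum(i * i for i in range(lo, hi)))\n"
)


def parallel_sum_of_squares(n, workers=4):
    """Sum i*i for i in range(n), splitting the range across processes."""
    step = max(1, (n + workers - 1) // workers)
    children = []
    for lo in range(0, n, step):
        hi = min(lo + step, n)
        # Each child is a separate interpreter with its own GIL.
        children.append(subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE, str(lo), str(hi)],
            stdout=subprocess.PIPE))
    # All children are already running; collect their partial sums.
    return sum(int(child.communicate()[0]) for child in children)


print(parallel_sum_of_squares(1000))
```

For a job this small the cost of starting the processes outweighs any gain;
the pattern only pays off when each piece of work is substantial.

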
Input and Output
================

How do I delete a file? (And other file questions...)
-----------------------------------------------------

Use ``os.remove(filename)`` or ``os.unlink(filename)``; for documentation, see
the :mod:`os` module. The two functions are identical; :func:`unlink` is simply
the name of the Unix system call for this function.

To remove a directory, use :func:`os.rmdir`; use :func:`os.mkdir` to create one.
``os.makedirs(path)`` will create any intermediate directories in ``path`` that
don't exist. ``os.removedirs(path)`` will remove intermediate directories as
long as they're empty; if you want to delete an entire directory tree and its
contents, use :func:`shutil.rmtree`.

To rename a file, use ``os.rename(old_path, new_path)``.

To truncate a file, open it using ``f = open(filename, "r+")``, and use
``f.truncate(offset)``; offset defaults to the current seek position. There's
also ``os.ftruncate(fd, offset)`` for files opened with :func:`os.open`, where
*fd* is the file descriptor (a small integer).

The :mod:`shutil` module also contains a number of functions to work on files
including :func:`~shutil.copyfile`, :func:`~shutil.copytree`, and
:func:`~shutil.rmtree`.

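The calls above can be tried out safely inside a throwaway directory. This is
just a sketch; the function name ``file_operations_demo`` and the file names
are invented for the example.

```python
import os
import shutil
import tempfile


def file_operations_demo():
    """Exercise the calls described above inside a temporary directory."""
    root = tempfile.mkdtemp()
    try:
        nested = os.path.join(root, "a", "b")
        os.makedirs(nested)                    # creates "a" and "a/b"

        path = os.path.join(nested, "data.txt")
        with open(path, "w") as f:
            f.write("0123456789")

        with open(path, "r+") as f:
            f.truncate(4)                      # keep only the first 4 bytes
        size = os.path.getsize(path)

        renamed = os.path.join(nested, "renamed.txt")
        os.rename(path, renamed)
        copy = os.path.join(root, "copy.txt")
        shutil.copyfile(renamed, copy)
        with open(copy) as f:
            copied_text = f.read()

        os.remove(renamed)                     # same as os.unlink(renamed)
        os.rmdir(nested)                       # works because "b" is now empty
        return size, copied_text, os.path.exists(nested)
    finally:
        shutil.rmtree(root)                    # delete whatever is left


print(file_operations_demo())
```

``os.rmdir`` is used for the nested directory here because
``os.removedirs`` would go on to remove the empty parent directories as well.

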
How do I copy a file?
---------------------

The :mod:`shutil` module contains a :func:`~shutil.copyfile` function. Note
that on MacOS 9 it doesn't copy the resource fork and Finder info.


How do I read (or write) binary data?
-------------------------------------

To read or write complex binary data formats, it's best to use the :mod:`struct`
module. It allows you to take a string containing binary data (usually numbers)
and convert it to Python objects; and vice versa.

For example, the following code reads two 2-byte integers and one 4-byte integer
in big-endian format from a file::

   import struct

   f = open(filename, "rb")  # Open in binary mode for portability
   s = f.read(8)
   x, y, z = struct.unpack(">hhl", s)

The '>' in the format string forces big-endian data; the letter 'h' reads one
"short integer" (2 bytes), and 'l' reads one "long integer" (4 bytes) from the
string.

For data that is more regular (e.g. a homogeneous list of ints or floats),
you can also use the :mod:`array` module.

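The "vice versa" direction uses ``struct.pack`` with the same format string.
The sketch below packs three example values (chosen arbitrarily) and unpacks
them again, then shows a small :mod:`array` of short integers for comparison.

```python
import array
import struct

# Two 2-byte integers and one 4-byte integer, big-endian: 8 bytes in total.
packed = struct.pack(">hhl", 1, -2, 70000)
x, y, z = struct.unpack(">hhl", packed)
print("%d bytes -> %r" % (len(packed), (x, y, z)))

# calcsize() reports how many bytes a format string describes.
record_size = struct.calcsize(">hhl")

# For a homogeneous sequence, an array stores the values compactly;
# the type code "h" means signed short integers.
samples = array.array("h", [1, -2, 3])
print("%d items of %d bytes each" % (len(samples), samples.itemsize))
```

``struct.calcsize`` is handy for deciding how many bytes to pass to
``f.read()`` before unpacking a record.

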
I can't seem to use os.read() on a pipe created with os.popen(); why?
---------------------------------------------------------------------

:func:`os.read` is a low-level function which takes a file descriptor, a small
integer representing the opened file. :func:`os.popen` creates a high-level
file object, the same type returned by the built-in :func:`open` function.
Thus, to read *n* bytes from a pipe *p* created with :func:`os.popen`, you need to
use ``p.read(n)``.

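The difference can be seen side by side in the following sketch. It assumes an
``echo`` command is available to the shell; the variable names are chosen for
the example.

```python
import os

# os.popen() returns a file object, so use its read() method.
p = os.popen("echo hello")
text = p.read(5)              # at most 5 characters
p.close()

# os.read() belongs with raw file descriptors, such as those from os.pipe().
read_fd, write_fd = os.pipe()
os.write(write_fd, b"hello")
os.close(write_fd)
data = os.read(read_fd, 5)    # at most 5 bytes
os.close(read_fd)

print("%r %r" % (text, data))
```

Both reads return the same five characters; only the kind of object being read
from differs.

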
How do I run a subprocess with pipes connected to both input and output?
------------------------------------------------------------------------

.. XXX update to use subprocess

Use the :mod:`popen2` module. For example::

   import popen2
   fromchild, tochild = popen2.popen2("command")
   tochild.write("input\n")
   tochild.flush()
   output = fromchild.readline()

Warning: in general it is unwise to do this because you can easily cause a
deadlock where your process is blocked waiting for output from the child while
the child is blocked waiting for input from you. This can be caused by the
parent expecting the child to output more text than it does or by data being
stuck in stdio buffers due to lack of flushing. The Python parent
can of course explicitly flush the data it sends to the child before it reads
any output, but if the child is a naive C program it may have been written to
never explicitly flush its output, even if it is interactive, since flushing is
normally automatic.

Note that a deadlock is also possible if you use :func:`popen3` to read stdout
and stderr. If one of the two is too large for the internal buffer (increasing
the buffer size does not help) and you ``read()`` the other one first, there is
a deadlock, too.

Note on a bug in popen2: unless your program calls ``wait()`` or ``waitpid()``,
finished child processes are never removed, and eventually calls to popen2 will
fail because of a limit on the number of child processes. Calling
:func:`os.waitpid` with the :data:`os.WNOHANG` option can prevent this; a good
place to insert such a call would be before calling ``popen2`` again.

In many cases, all you really need is to run some data through a command and get
the result back. Unless the amount of data is very large, the easiest way to do
this is to write it to a temporary file and run the command with that temporary
file as input. The standard module :mod:`tempfile` exports a
:func:`~tempfile.mktemp` function to generate unique temporary file names. ::

   import tempfile
   import os

   class Popen3:
       """
       This is a deadlock-safe version of popen that returns
       an object with errorlevel, out (a string) and err (a string).
       (capturestderr may not work under windows.)
       Example: print Popen3('grep spam','\n\nhere spam\n\n').out
       """
       def __init__(self, command, input=None, capturestderr=None):
           outfile = tempfile.mktemp()
           command = "( %s ) > %s" % (command, outfile)
           if input:
               infile = tempfile.mktemp()
               open(infile, "w").write(input)
               command = command + " <" + infile
           if capturestderr:
               errfile = tempfile.mktemp()
               command = command + " 2>" + errfile
           self.errorlevel = os.system(command) >> 8
           self.out = open(outfile, "r").read()
           os.remove(outfile)
           if input:
               os.remove(infile)
           if capturestderr:
               self.err = open(errfile, "r").read()
               os.remove(errfile)

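Another way to run some data through a command is the :mod:`subprocess` module,
whose ``Popen.communicate()`` method sends the input and collects both output
streams without the deadlocks described above. The sketch below is an
illustration only: ``run_filter`` is a made-up helper, and the child command is
a one-line Python filter so that the example does not depend on any external
program.

```python
import subprocess
import sys


def run_filter(argv, data):
    """Feed *data* (bytes) to a command; return (status, stdout, stderr)."""
    child = subprocess.Popen(argv,
                             stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE)
    # communicate() writes the input, reads both output pipes and then
    # waits for the child, so neither side is left blocked on a full pipe.
    out, err = child.communicate(data)
    return child.returncode, out, err


# Hypothetical child command: a Python one-liner that upper-cases its input.
upper = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
status, out, err = run_filter(upper, b"spam and eggs\n")
print("%r %r" % (status, out))
```

Because ``communicate()`` waits for the child to exit, finished child
processes do not pile up as they can with ``popen2``. It holds all of the
output in memory, so it is best suited to modest amounts of data.
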
Note that many interactive programs (e.g. vi) don't work well with pipes
substituted for standard input and output. You will have to use pseudo ttys
("ptys") instead of pipes. Or you can use a Python interface to Don Libes'
"expect" library. A Python extension that interfaces to expect is called "expy"
and available from http://expectpy.sourceforge.net. A pure Python solution that
works like expect is `pexpect <http://pypi.python.org/pypi/pexpect/>`_.


How do I access the serial (RS232) port?
----------------------------------------

For Win32, POSIX (Linux, BSD, etc.), Jython:

   http://pyserial.sourceforge.net

For Unix, see a Usenet post by Mitch Chapman:

   http://groups.google.com/groups?selm=34A04430.CF9@ohioee.com


Why doesn't closing sys.stdout (stdin, stderr) really close it?
---------------------------------------------------------------

Python file objects are a high-level layer of abstraction on top of C streams,
which in turn are a medium-level layer of abstraction on top of (among other
things) low-level C file descriptors.

For most file objects you create in Python via the built-in ``file``
constructor, ``f.close()`` marks the Python file object as being closed from
Python's point of view, and also arranges to close the underlying C stream.
This also happens automatically in ``f``'s destructor, when ``f`` becomes
garbage.

But stdin, stdout and stderr are treated specially by Python, because of the
special status also given to them by C. Running ``sys.stdout.close()`` marks
the Python-level file object as being closed, but does *not* close the
associated C stream.

To close the underlying C stream for one of these three, you should first be
sure that's what you really want to do (e.g., you may confuse extension modules
trying to do I/O). If it is, use os.close::

   os.close(0)   # close C's stdin stream
   os.close(1)   # close C's stdout stream
   os.close(2)   # close C's stderr stream


Network/Internet Programming
============================

What WWW tools are there for Python?
------------------------------------

See the chapters titled :ref:`internet` and :ref:`netdata` in the Library
Reference Manual. Python has many modules that will help you build server-side
and client-side web systems.

.. XXX check if wiki page is still up to date

A summary of available frameworks is maintained by Paul Boddie at
http://wiki.python.org/moin/WebProgramming .

Cameron Laird maintains a useful set of pages about Python web technologies at
http://phaseit.net/claird/comp.lang.python/web_python.


How can I mimic CGI form submission (METHOD=POST)?
--------------------------------------------------

I would like to retrieve web pages that are the result of POSTing a form. Is
there existing code that would let me do this easily?

Yes. Here's a simple example that uses httplib::

   #!/usr/local/bin/python

   import httplib, sys, time

   ### build the query string
   qs = "First=Josephine&MI=Q&Last=Public"

   ### connect and send the server a path
   httpobj = httplib.HTTP('www.some-server.out-there', 80)
   httpobj.putrequest('POST', '/cgi-bin/some-cgi-script')
   ### now generate the rest of the HTTP headers...
   httpobj.putheader('Accept', '*/*')
   httpobj.putheader('Connection', 'Keep-Alive')
   httpobj.putheader('Content-type', 'application/x-www-form-urlencoded')
   httpobj.putheader('Content-length', '%d' % len(qs))
   httpobj.endheaders()
   httpobj.send(qs)
   ### find out what the server said in response...
   reply, msg, hdrs = httpobj.getreply()
   if reply != 200:
       sys.stdout.write(httpobj.getfile().read())

Note that in general for percent-encoded POST operations, query strings must be
quoted using :func:`urllib.urlencode`. For example, to send
``name=Guy Steele, Jr.``::

   >>> import urllib
   >>> urllib.urlencode({'name': 'Guy Steele, Jr.'})
   'name=Guy+Steele%2C+Jr.'


What module should I use to help with generating HTML?
------------------------------------------------------

.. XXX add modern template languages

You can find a collection of useful links on the `Web Programming wiki page
<http://wiki.python.org/moin/WebProgramming>`_.


How do I send mail from a Python script?
----------------------------------------

Use the standard library module :mod:`smtplib`.

Here's a very simple interactive mail sender that uses it. This method will
work on any host that supports an SMTP listener. ::

   import sys, smtplib

   fromaddr = raw_input("From: ")
   toaddrs = raw_input("To: ").split(',')
   print "Enter message, end with ^D:"
   msg = ''
   while True:
       line = sys.stdin.readline()
       if not line:
           break
       msg += line

   # The actual mail send
   server = smtplib.SMTP('localhost')
   server.sendmail(fromaddr, toaddrs, msg)
   server.quit()

A Unix-only alternative uses sendmail. The location of the sendmail program
varies between systems; sometimes it is ``/usr/lib/sendmail``, sometimes
``/usr/sbin/sendmail``. The sendmail manual page will help you out. Here's
some sample code::

   import os

   SENDMAIL = "/usr/sbin/sendmail"  # sendmail location
   p = os.popen("%s -t -i" % SENDMAIL, "w")
   p.write("To: receiver@example.com\n")
   p.write("Subject: test\n")
   p.write("\n")  # blank line separating headers from body
   p.write("Some text\n")
   p.write("some more text\n")
   # close() returns None when sendmail exits with status 0,
   # and the encoded exit status otherwise
   sts = p.close()
   if sts is not None:
       print "Sendmail exit status", sts


How do I avoid blocking in the connect() method of a socket?
------------------------------------------------------------

The :mod:`select` module is commonly used to help with asynchronous I/O on
sockets.

To prevent the TCP connect from blocking, you can set the socket to non-blocking
mode. Then when you do the ``connect()``, you will either connect immediately
(unlikely) or get an exception that contains the error number as ``.errno``.
``errno.EINPROGRESS`` indicates that the connection is in progress, but hasn't
finished yet. Different OSes will return different values, so you're going to
have to check what's returned on your system.

You can use the ``connect_ex()`` method to avoid creating an exception. It will
just return the errno value. To poll, you can call ``connect_ex()`` again later
-- 0 or ``errno.EISCONN`` indicate that you're connected -- or you can pass this
socket to :func:`select.select` to check if it's writable.


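The non-blocking approach described above can be sketched as follows. This is
only an illustration under stated assumptions: the helper name
``nonblocking_connect`` and its ``timeout`` parameter are made up for this
example, and the set of "still in progress" error codes is a guess that you
should check against what your own platform returns::

   import errno
   import os
   import select
   import socket

   # Error codes assumed to mean "connection attempt still under way".
   IN_PROGRESS = (errno.EINPROGRESS, errno.EWOULDBLOCK, errno.EALREADY)

   def nonblocking_connect(address, timeout=5.0):
       """Connect without blocking in connect(); hypothetical helper."""
       sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
       sock.setblocking(False)
       # connect_ex() returns an errno value instead of raising an exception.
       err = sock.connect_ex(address)
       if err not in (0, errno.EISCONN) and err not in IN_PROGRESS:
           sock.close()
           raise socket.error(err, os.strerror(err))
       # The socket becomes writable once the connection attempt finishes.
       _, writable, _ = select.select([], [sock], [], timeout)
       if not writable:
           sock.close()
           raise socket.timeout("connect timed out")
       # SO_ERROR tells whether the attempt succeeded (0) or failed.
       err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
       if err != 0:
           sock.close()
           raise socket.error(err, os.strerror(err))
       return sock

A real program would normally do other work between ``connect_ex()`` and
``select.select()`` instead of waiting right away; the wait is kept inline here
only to keep the sketch short.

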
Databases
=========

Are there any interfaces to database packages in Python?
--------------------------------------------------------

Yes.

.. XXX remove bsddb in py3k, fix other module names

Python 2.3 includes the :mod:`bsddb` package which provides an interface to the
BerkeleyDB library. Interfaces to disk-based hashes such as :mod:`DBM <dbm>`
and :mod:`GDBM <gdbm>` are also included with standard Python.

Support for most relational databases is available. See the
`DatabaseProgramming wiki page
<http://wiki.python.org/moin/DatabaseProgramming>`_ for details.


How do you implement persistent objects in Python?
--------------------------------------------------

The :mod:`pickle` library module solves this in a very general way (though you
still can't store things like open files, sockets or windows), and the
:mod:`shelve` library module uses pickle and (g)dbm to create persistent
mappings containing arbitrary Python objects. For better performance, you can
use the :mod:`cPickle` module.

A more awkward way of doing things is to use pickle's little sister, marshal.
The :mod:`marshal` module provides very fast ways to store noncircular basic
Python types to files and strings, and back again. Although marshal does not do
fancy things like store instances or handle shared references properly, it does
run extremely fast. For example, loading a half megabyte of data may take less
than a third of a second. This often beats doing something more complex and
general such as using gdbm with pickle/shelve.


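As a rough illustration of the three modules mentioned above, the following
sketch round-trips one small dictionary through :mod:`pickle`, :mod:`shelve`
and :mod:`marshal`. The ``record`` data, the key ``"record"`` and the file name
``example_shelf`` are invented for this example, and the shelf is written to a
temporary directory that is removed at the end::

   import marshal
   import os
   import pickle
   import shelve
   import shutil
   import tempfile

   record = {"name": "spam", "sizes": [1, 2, 3]}

   # pickle: serialize to a string and rebuild an equal object from it.
   from_pickle = pickle.loads(pickle.dumps(record))

   # shelve: a persistent, dictionary-like object stored on disk.
   workdir = tempfile.mkdtemp()
   path = os.path.join(workdir, "example_shelf")
   db = shelve.open(path)
   db["record"] = record
   db.close()                      # flush the data to disk

   db = shelve.open(path)          # reopen, as a later run of a program would
   from_shelf = db["record"]
   db.close()

   # marshal: fast, but limited to basic built-in types.
   from_marshal = marshal.loads(marshal.dumps(record))

   shutil.rmtree(workdir)          # remove the temporary shelf files

All three copies compare equal to ``record``; only the shelf keeps the data
around between runs of a program.

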
Why is cPickle so slow?
-----------------------

.. XXX update this, default protocol is 2/3

By default :mod:`pickle` uses a relatively old and slow format for backward
compatibility. You can however specify other protocol versions that are
faster::

   import cPickle

   largeString = 'z' * (100 * 1024)
   myPickle = cPickle.dumps(largeString, protocol=1)


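To see what the protocol argument changes, the sketch below pickles the same
list with the oldest protocol (0) and with the newest one the interpreter
supports. It uses :mod:`pickle` rather than :mod:`cPickle` only so that it also
runs where :mod:`cPickle` is unavailable; ``cPickle.dumps`` accepts the same
protocol argument. The sample ``data`` list is arbitrary::

   import pickle

   data = list(range(1000))

   # Protocol 0 is the original ASCII-based format.
   text_pickle = pickle.dumps(data, 0)

   # HIGHEST_PROTOCOL selects the newest binary format available.
   binary_pickle = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

   print("protocol 0: %d bytes" % len(text_pickle))
   print("highest protocol: %d bytes" % len(binary_pickle))

Both strings unpickle to the same list. The binary form is more compact for
this input; how much faster it is depends on the data and the Python version,
so measure with your own objects.

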
If my program crashes with a bsddb (or anydbm) database open, it gets corrupted. How come?
------------------------------------------------------------------------------------------

Databases opened for write access with the bsddb module (and often by the anydbm
module, since it will preferentially use bsddb) must explicitly be closed using
the ``.close()`` method of the database. The underlying library caches database
contents which need to be converted to on-disk form and written.

If you have initialized a new bsddb database but not written anything to it
before the program crashes, you will often wind up with a zero-length file and
encounter an exception the next time the file is opened.


I tried to open Berkeley DB file, but bsddb produces bsddb.error: (22, 'Invalid argument'). Help! How can I restore my data?
----------------------------------------------------------------------------------------------------------------------------

Don't panic! Your data is probably intact. The most frequent cause for the error
is that you tried to open an earlier Berkeley DB file with a later version of
the Berkeley DB library.

Many Linux systems now have all three versions of Berkeley DB available. If you
are migrating from version 1 to a newer version use db_dump185 to dump a plain
text version of the database. If you are migrating from version 2 to version 3
use db2_dump to create a plain text version of the database. In either case,
use db_load to create a new native database for the latest version installed on
your computer. If you have version 3 of Berkeley DB installed, you should be
able to use db2_load to create a native version 2 database.

You should move away from Berkeley DB version 1 files because the hash file code
contains known bugs that can corrupt your data.


Mathematics and Numerics
========================

How do I generate random numbers in Python?
-------------------------------------------

The standard module :mod:`random` implements a random number generator. Usage
is simple::

   import random
   random.random()

This returns a random floating point number in the range [0, 1).

There are also many other specialized generators in this module, such as:

* ``randrange(a, b)`` chooses an integer in the range [a, b).
* ``uniform(a, b)`` chooses a floating point number in the range [a, b]; whether
  the endpoint ``b`` is included depends on floating-point rounding.
* ``normalvariate(mean, sdev)`` samples the normal (Gaussian) distribution.

Some higher-level functions operate on sequences directly, such as:

* ``choice(S)`` chooses a random element from a given sequence.
* ``shuffle(L)`` shuffles a list in-place, i.e. permutes it randomly.

There's also a ``Random`` class you can instantiate to create multiple
independent random number generators.
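
A short sketch of the ``Random`` class and a few of the functions listed above
follows. The seeds 42 and 7 and the variable names are arbitrary choices made
for this example; two generators created with the same seed produce the same
sequence, and each generator advances independently of the others::

   import random

   # Two generators with the same seed, plus one unrelated generator.
   gen_a = random.Random(42)
   gen_b = random.Random(42)
   other = random.Random(7)

   first_a = gen_a.random()        # float in the range [0, 1)
   first_b = gen_b.random()        # same value as first_a (same seed)

   roll = gen_a.randrange(1, 7)    # integer from 1 through 6

   deck = list(range(10))
   other.shuffle(deck)             # permutes the list in place
   pick = other.choice(deck)       # one element of the shuffled list

Seeding is mainly useful for reproducible tests and simulations; leave the seed
out when you do not need repeatable results.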