libcgi.tex

Last change on this file was 3225, checked in by bird, 18 years ago
Python 2.5
File size: 23.8 KB

1	\section{\module{cgi} ---
2	Common Gateway Interface support.}
3	\declaremodule{standard}{cgi}
4
5	\modulesynopsis{Common Gateway Interface support, used to interpret
6	forms in server-side scripts.}
7
8	\indexii{WWW}{server}
9	\indexii{CGI}{protocol}
10	\indexii{HTTP}{protocol}
11	\indexii{MIME}{headers}
12	\index{URL}
13
14
15	Support module for Common Gateway Interface (CGI) scripts.%
16	\index{Common Gateway Interface}
17
18	This module defines a number of utilities for use by CGI scripts
19	written in Python.
20
21	\subsection{Introduction}
22	\nodename{cgi-intro}
23
24	A CGI script is invoked by an HTTP server, usually to process user
25	input submitted through an HTML \code{<FORM>} or \code{<ISINDEX>} element.
26
27	Most often, CGI scripts live in the server's special \file{cgi-bin}
28	directory. The HTTP server places all sorts of information about the
29	request (such as the client's hostname, the requested URL, the query
30	string, and lots of other goodies) in the script's shell environment,
31	executes the script, and sends the script's output back to the client.
32
33	The script's input is connected to the client too, and sometimes the
34	form data is read this way; at other times the form data is passed via
35	the ``query string'' part of the URL. This module is intended
36	to take care of the different cases and provide a simpler interface to
37	the Python script. It also provides a number of utilities that help
38	in debugging scripts, and the latest addition is support for file
39	uploads from a form (if your browser supports it).
40
41	The output of a CGI script should consist of two sections, separated
42	by a blank line. The first section contains a number of headers,
43	telling the client what kind of data is following. Python code to
44	generate a minimal header section looks like this:
45
\begin{verbatim}
print "Content-Type: text/html"     # HTML is following
print                               # blank line, end of headers
\end{verbatim}
50
The second section is usually HTML, which allows the client software
to display nicely formatted text with headers, in-line images, etc.
Here's Python code that prints a simple piece of HTML:
54
\begin{verbatim}
print "<TITLE>CGI script output</TITLE>"
print "<H1>This is my first CGI script</H1>"
print "Hello, world!"
\end{verbatim}
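
The two-section layout described above can be sketched as a small
helper. This is only an illustration: it is written for Python 3
rather than the Python 2 syntax used in the rest of this section, and
the name \code{build_response} is hypothetical, not part of the
\module{cgi} module.

```python
def build_response(body, content_type="text/html"):
    """Return a CGI response: header section, blank line, then the body."""
    headers = "Content-Type: %s\n" % content_type
    # The extra "\n" produces the blank line that ends the header section.
    return headers + "\n" + body


response = build_response("<TITLE>CGI script output</TITLE>\n<H1>Hello</H1>\n")
# Split at the first blank line to recover the two sections.
header_section, _, body_section = response.partition("\n\n")
print(response, end="")
```

Building the response as one string makes it easy to check that the
blank line separating the two sections is present before anything is
written to standard output.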
60
61	\subsection{Using the cgi module}
62	\nodename{Using the cgi module}
63
64	Begin by writing \samp{import cgi}. Do not use \samp{from cgi import
65	*} --- the module defines all sorts of names for its own use or for
66	backward compatibility that you don't want in your namespace.
67
68	When you write a new script, consider adding the line:
69
\begin{verbatim}
import cgitb; cgitb.enable()
\end{verbatim}
73
74	This activates a special exception handler that will display detailed
75	reports in the Web browser if any errors occur. If you'd rather not
76	show the guts of your program to users of your script, you can have
77	the reports saved to files instead, with a line like this:
78
\begin{verbatim}
import cgitb; cgitb.enable(display=0, logdir="/tmp")
\end{verbatim}
82
83	It's very helpful to use this feature during script development.
84	The reports produced by \refmodule{cgitb} provide information that
85	can save you a lot of time in tracking down bugs. You can always
86	remove the \code{cgitb} line later when you have tested your script
87	and are confident that it works correctly.
88
To get at submitted form data,
it's best to use the \class{FieldStorage} class. The other classes
defined in this module are provided mostly for backward compatibility.
Instantiate \class{FieldStorage} exactly once, without arguments. This
reads the form contents from standard input or the environment
(depending on the value of various environment variables set according
to the CGI standard). Because this may consume standard input, a
second instance might find no data left to read.
97
98	The \class{FieldStorage} instance can be indexed like a Python
99	dictionary, and also supports the standard dictionary methods
100	\method{has_key()} and \method{keys()}. The built-in \function{len()}
101	is also supported. Form fields containing empty strings are ignored
102	and do not appear in the dictionary; to keep such values, provide
103	a true value for the optional \var{keep_blank_values} keyword
104	parameter when creating the \class{FieldStorage} instance.
105
106	For instance, the following code (which assumes that the
107	\mailheader{Content-Type} header and blank line have already been
108	printed) checks that the fields \code{name} and \code{addr} are both
109	set to a non-empty string:
110
\begin{verbatim}
form = cgi.FieldStorage()
if not (form.has_key("name") and form.has_key("addr")):
    print "<H1>Error</H1>"
    print "Please fill in the name and addr fields."
    return    # assumes this fragment is the body of a function
print "<p>name:", form["name"].value
print "<p>addr:", form["addr"].value
...further form processing here...
\end{verbatim}
121
122	Here the fields, accessed through \samp{form[\var{key}]}, are
123	themselves instances of \class{FieldStorage} (or
124	\class{MiniFieldStorage}, depending on the form encoding).
125	The \member{value} attribute of the instance yields the string value
126	of the field. The \method{getvalue()} method returns this string value
127	directly; it also accepts an optional second argument as a default to
128	return if the requested key is not present.
129
130	If the submitted form data contains more than one field with the same
131	name, the object retrieved by \samp{form[\var{key}]} is not a
132	\class{FieldStorage} or \class{MiniFieldStorage}
133	instance but a list of such instances. Similarly, in this situation,
134	\samp{form.getvalue(\var{key})} would return a list of strings.
If you expect this possibility
(when your HTML form contains multiple fields with the same name), use
the \method{getlist()} method, which always returns a list of values (so that you
do not need to special-case the single item case). For example, this
139	code concatenates any number of username fields, separated by
140	commas:
141
\begin{verbatim}
value = form.getlist("username")
usernames = ",".join(value)
\end{verbatim}
146
If a field represents an uploaded file, accessing the value via the
\member{value} attribute or the \method{getvalue()} method reads the
entire file into memory as a string. This may not be what you want.
150	You can test for an uploaded file by testing either the \member{filename}
151	attribute or the \member{file} attribute. You can then read the data at
152	leisure from the \member{file} attribute:
153
\begin{verbatim}
fileitem = form["userfile"]
if fileitem.file:
    # It's an uploaded file; count lines
    linecount = 0
    while 1:
        line = fileitem.file.readline()
        if not line: break
        linecount = linecount + 1
\end{verbatim}
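
The same idea, reading an upload one line at a time instead of loading
it into memory, can be tried without a web server. The following
Python 3 sketch uses an \code{io.BytesIO} object as a stand-in for the
\member{file} attribute of an uploaded field; the function name
\code{count_lines} is hypothetical.

```python
import io


def count_lines(fileobj):
    """Count lines by iterating over the file, never reading it whole."""
    linecount = 0
    for _line in fileobj:
        linecount += 1
    return linecount


# io.BytesIO stands in for the file attribute of an uploaded field.
upload = io.BytesIO(b"first\nsecond\nthird\n")
print(count_lines(upload))
```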
164
165	The file upload draft standard entertains the possibility of uploading
166	multiple files from one field (using a recursive
167	\mimetype{multipart/*} encoding). When this occurs, the item will be
168	a dictionary-like \class{FieldStorage} item. This can be determined
169	by testing its \member{type} attribute, which should be
170	\mimetype{multipart/form-data} (or perhaps another MIME type matching
171	\mimetype{multipart/*}). In this case, it can be iterated over
172	recursively just like the top-level form object.
173
174	When a form is submitted in the ``old'' format (as the query string or
175	as a single data part of type
176	\mimetype{application/x-www-form-urlencoded}), the items will actually
177	be instances of the class \class{MiniFieldStorage}. In this case, the
178	\member{list}, \member{file}, and \member{filename} attributes are
179	always \code{None}.
180
181
182	\subsection{Higher Level Interface}
183
184	\versionadded{2.2} % XXX: Is this true ?
185
186	The previous section explains how to read CGI form data using the
187	\class{FieldStorage} class. This section describes a higher level
188	interface which was added to this class to allow one to do it in a
189	more readable and intuitive way. The interface doesn't make the
190	techniques described in previous sections obsolete --- they are still
191	useful to process file uploads efficiently, for example.
192
193	The interface consists of two simple methods. Using the methods
194	you can process form data in a generic way, without the need to worry
195	whether only one or more values were posted under one name.
196
In the previous section, you learned to write the following code any
time you expected a user to post more than one value under one name:
199
\begin{verbatim}
item = form.getvalue("item")
if isinstance(item, list):
    # The user is requesting more than one item.
    pass
else:
    # The user is requesting only one item.
    pass
\end{verbatim}
207
This situation is common, for example, when a form contains a group of
multiple checkboxes with the same name:
210
\begin{verbatim}
<input type="checkbox" name="item" value="1" />
<input type="checkbox" name="item" value="2" />
\end{verbatim}
215
216	In most situations, however, there's only one form control with a
217	particular name in a form and then you expect and need only one value
218	associated with this name. So you write a script containing for
219	example this code:
220
\begin{verbatim}
user = form.getvalue("user").upper()
\end{verbatim}
224
The problem with the code is that you should never expect that a
client will provide valid input to your scripts. For example, if a
curious user appends another \samp{user=foo} pair to the query string,
then the script would crash, because in this situation the
\code{getvalue("user")} method call returns a list instead of a
string. Calling the \method{upper()} method on a list is not valid
(since lists do not have a method of this name) and results in an
\exception{AttributeError} exception.
233
234	Therefore, the appropriate way to read form data values was to always
235	use the code which checks whether the obtained value is a single value
236	or a list of values. That's annoying and leads to less readable
237	scripts.
238
239	A more convenient approach is to use the methods \method{getfirst()}
240	and \method{getlist()} provided by this higher level interface.
241
242	\begin{methoddesc}[FieldStorage]{getfirst}{name\optional{, default}}
This method always returns only one value associated with form field
\var{name}. If more than one value was posted under that name, the
method returns only the first one. Please note that the order
246	in which the values are received may vary from browser to browser
247	and should not be counted on.\footnote{Note that some recent
248	versions of the HTML specification do state what order the
249	field values should be supplied in, but knowing whether a
250	request was received from a conforming browser, or even from a
251	browser at all, is tedious and error-prone.} If no such form
252	field or value exists then the method returns the value specified by
253	the optional parameter \var{default}. This parameter defaults to
254	\code{None} if not specified.
255	\end{methoddesc}
256
257	\begin{methoddesc}[FieldStorage]{getlist}{name}
258	This method always returns a list of values associated with form
259	field \var{name}. The method returns an empty list if no such form
260	field or value exists for \var{name}. It returns a list consisting
261	of one item if only one such value exists.
262	\end{methoddesc}
263
264	Using these methods you can write nice compact code:
265
\begin{verbatim}
import cgi
form = cgi.FieldStorage()
user = form.getfirst("user", "").upper()    # This way it's safe.
for item in form.getlist("item"):
    do_something(item)
\end{verbatim}
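
The behaviour described for \method{getfirst()} and \method{getlist()}
can be imitated on a plain dictionary of lists, which makes it easy to
experiment with repeated fields. The sketch below is an assumption-laden
illustration in Python 3: it uses \code{urllib.parse.parse_qs} to
build the dictionary, and the helper names \code{get_first} and
\code{get_list} are hypothetical, not the \class{FieldStorage}
implementation.

```python
from urllib.parse import parse_qs


def get_list(fields, name):
    """Always return a list; it is empty when the field is absent."""
    return list(fields.get(name, []))


def get_first(fields, name, default=None):
    """Return only the first value for name, or default when absent."""
    values = fields.get(name)
    return values[0] if values else default


# A query string in which a curious user repeated the "user" field.
fields = parse_qs("user=joe&user=foo&item=1&item=2")
user = get_first(fields, "user", "").upper()
items = get_list(fields, "item")
print(user, items)
```

Because \code{get_first} always yields a single string (or the
default), calling \method{upper()} on its result cannot fail the way it
does on a list.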
273
274
275	\subsection{Old classes}
276
277	These classes, present in earlier versions of the \module{cgi} module,
278	are still supported for backward compatibility. New applications
279	should use the \class{FieldStorage} class.
280
\class{SvFormContentDict} stores single value form content as a
dictionary; it assumes each field name occurs in the form only once.
283
284	\class{FormContentDict} stores multiple value form content as a
285	dictionary (the form items are lists of values). Useful if your form
286	contains multiple fields with the same name.
287
Other classes (\class{FormContent}, \class{InterpFormContentDict}) are
present for backwards compatibility with really old applications only.
If you still use these and would be inconvenienced if they
disappeared from a future version of this module, drop me a note.
292
293
294	\subsection{Functions}
295	\nodename{Functions in cgi module}
296
297	These are useful if you want more control, or if you want to employ
298	some of the algorithms implemented in this module in other
299	circumstances.
300
301	\begin{funcdesc}{parse}{fp\optional{, keep_blank_values\optional{,
302	strict_parsing}}}
303	Parse a query in the environment or from a file (the file defaults
304	to \code{sys.stdin}). The \var{keep_blank_values} and
305	\var{strict_parsing} parameters are passed to \function{parse_qs()}
306	unchanged.
307	\end{funcdesc}
308
309	\begin{funcdesc}{parse_qs}{qs\optional{, keep_blank_values\optional{,
310	strict_parsing}}}
311	Parse a query string given as a string argument (data of type
312	\mimetype{application/x-www-form-urlencoded}). Data are
313	returned as a dictionary. The dictionary keys are the unique query
314	variable names and the values are lists of values for each name.
315
316	The optional argument \var{keep_blank_values} is
317	a flag indicating whether blank values in
318	URL encoded queries should be treated as blank strings.
319	A true value indicates that blanks should be retained as
320	blank strings. The default false value indicates that
321	blank values are to be ignored and treated as if they were
322	not included.
323
324	The optional argument \var{strict_parsing} is a flag indicating what
325	to do with parsing errors. If false (the default), errors
326	are silently ignored. If true, errors raise a \exception{ValueError}
327	exception.
328
329	Use the \function{\refmodule{urllib}.urlencode()} function to convert
330	such dictionaries into query strings.
331
332	\end{funcdesc}
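
The effect of the two flags can be tried out directly. The sketch
below assumes Python 3, where a function with the same name and
parameters is available as \code{urllib.parse.parse_qs} and
\function{urlencode()} lives in \code{urllib.parse}; the query strings
are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode

query = "name=Joe+Blow&item=1&item=2&note="

default = parse_qs(query)                        # blank "note" is dropped
kept = parse_qs(query, keep_blank_values=True)   # blank "note" is retained

try:
    # "broken" has no "=", which counts as a parsing error.
    parse_qs("name=Joe&broken", strict_parsing=True)
    strict_error = False
except ValueError:
    strict_error = True

# doseq=True makes urlencode() emit one pair per list element.
rebuilt = urlencode(default, doseq=True)
print(default, kept, strict_error, rebuilt)
```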
333
334	\begin{funcdesc}{parse_qsl}{qs\optional{, keep_blank_values\optional{,
335	strict_parsing}}}
336	Parse a query string given as a string argument (data of type
337	\mimetype{application/x-www-form-urlencoded}). Data are
338	returned as a list of name, value pairs.
339
340	The optional argument \var{keep_blank_values} is
341	a flag indicating whether blank values in
342	URL encoded queries should be treated as blank strings.
343	A true value indicates that blanks should be retained as
344	blank strings. The default false value indicates that
345	blank values are to be ignored and treated as if they were
346	not included.
347
348	The optional argument \var{strict_parsing} is a flag indicating what
349	to do with parsing errors. If false (the default), errors
350	are silently ignored. If true, errors raise a \exception{ValueError}
351	exception.
352
353	Use the \function{\refmodule{urllib}.urlencode()} function to convert
354	such lists of pairs into query strings.
355	\end{funcdesc}
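
A list of pairs, unlike a dictionary, keeps repeated names in the order
in which they appeared. The following sketch again assumes Python 3,
where the function is available as \code{urllib.parse.parse_qsl}; the
query string is made up.

```python
from urllib.parse import parse_qsl, urlencode

pairs = parse_qsl("name=Joe+Blow&addr=At+Home&name=Jane")
# The repeated "name" field stays a separate pair in its original position.
rebuilt = urlencode(pairs)
print(pairs, rebuilt)
```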
356
357	\begin{funcdesc}{parse_multipart}{fp, pdict}
358	Parse input of type \mimetype{multipart/form-data} (for
359	file uploads). Arguments are \var{fp} for the input file and
360	\var{pdict} for a dictionary containing other parameters in
361	the \mailheader{Content-Type} header.
362
Returns a dictionary just like \function{parse_qs()}: keys are the
field names, each value is a list of values for that field. This is
easy to use but not much good if you are expecting megabytes to be
uploaded; in that case, use the \class{FieldStorage} class instead,
which is much more flexible.
368
369	Note that this does not parse nested multipart parts --- use
370	\class{FieldStorage} for that.
371	\end{funcdesc}
372
373	\begin{funcdesc}{parse_header}{string}
374	Parse a MIME header (such as \mailheader{Content-Type}) into a main
375	value and a dictionary of parameters.
376	\end{funcdesc}
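
The split into a main value and a dictionary of parameters can be
illustrated as follows. This is a sketch under stated assumptions, not
the implementation of \function{parse_header()}: it is written for
Python 3, it relies on the standard \code{email.message.Message} class
to do the parsing, and the name \code{parse_header_sketch} is
hypothetical.

```python
from email.message import Message


def parse_header_sketch(line):
    """Split a header value into (main value, dict of parameters)."""
    msg = Message()
    msg["Content-Type"] = line
    params = msg.get_params()     # list of (name, value) pairs
    main_value = params[0][0]     # the first pair holds the main value
    return main_value, dict(params[1:])


print(parse_header_sketch('text/html; charset="utf-8"'))
```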
377
378	\begin{funcdesc}{test}{}
379	Robust test CGI script, usable as main program.
380	Writes minimal HTTP headers and formats all information provided to
381	the script in HTML form.
382	\end{funcdesc}
383
384	\begin{funcdesc}{print_environ}{}
385	Format the shell environment in HTML.
386	\end{funcdesc}
387
388	\begin{funcdesc}{print_form}{form}
389	Format a form in HTML.
390	\end{funcdesc}
391
392	\begin{funcdesc}{print_directory}{}
393	Format the current directory in HTML.
394	\end{funcdesc}
395
396	\begin{funcdesc}{print_environ_usage}{}
397	Print a list of useful (used by CGI) environment variables in
398	HTML.
399	\end{funcdesc}
400
401	\begin{funcdesc}{escape}{s\optional{, quote}}
402	Convert the characters
403	\character{\&}, \character{<} and \character{>} in string \var{s} to
404	HTML-safe sequences. Use this if you need to display text that might
405	contain such characters in HTML. If the optional flag \var{quote} is
406	true, the quotation mark character (\character{"}) is also translated;
407	this helps for inclusion in an HTML attribute value, as in \code{<A
408	HREF="...">}. If the value to be quoted might include single- or
409	double-quote characters, or both, consider using the
410	\function{quoteattr()} function in the \refmodule{xml.sax.saxutils}
411	module instead.
412	\end{funcdesc}
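
The translation described above amounts to a few string replacements.
The sketch below mirrors that description in Python 3; the name
\code{escape_sketch} is hypothetical, and the function is an
illustration rather than the module's own code.

```python
def escape_sketch(s, quote=False):
    """Replace &, < and > (and " when quote is true) with HTML-safe sequences."""
    s = s.replace("&", "&amp;")   # must come first, or later output is re-escaped
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s


print(escape_sketch('<A HREF="page.html">Tom & Jerry</A>', quote=True))
```

The ampersand is replaced first because the other replacements
introduce ampersands of their own.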
413
414
415	\subsection{Caring about security \label{cgi-security}}
416
417	\indexii{CGI}{security}
418
There's one important rule: if you invoke an external program (via the
\function{os.system()} or \function{os.popen()} functions, or others
with similar functionality), make very sure you don't pass arbitrary
strings received from the client to the shell. This is a well-known
security hole whereby clever hackers anywhere on the Web can exploit a
gullible CGI script to invoke arbitrary shell commands. Even parts of
the URL or field names cannot be trusted, since the request doesn't
have to come from your form!
427
428	To be on the safe side, if you must pass a string gotten from a form
429	to a shell command, you should make sure the string contains only
430	alphanumeric characters, dashes, underscores, and periods.
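
One way to apply this rule is a whitelist check performed before the
string gets anywhere near a shell. The following Python 3 sketch is
only one possible approach; the name \code{is_shell_safe} is
hypothetical, and the check is restricted to ASCII letters and digits
by assumption.

```python
import re

# Only ASCII letters, digits, dashes, underscores and periods are allowed.
_SAFE = re.compile(r"[A-Za-z0-9._-]+")


def is_shell_safe(value):
    """Return True if value consists solely of the allowed characters."""
    return _SAFE.fullmatch(value) is not None


print(is_shell_safe("report-2024_v1.txt"), is_shell_safe("x; rm -rf /"))
```

Note that a string passing this check may still begin with a dash, so
the command receiving it could interpret it as an option.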
431
432
433	\subsection{Installing your CGI script on a \UNIX\ system}
434
435	Read the documentation for your HTTP server and check with your local
436	system administrator to find the directory where CGI scripts should be
437	installed; usually this is in a directory \file{cgi-bin} in the server tree.
438
439	Make sure that your script is readable and executable by ``others''; the
440	\UNIX{} file mode should be \code{0755} octal (use \samp{chmod 0755
441	\var{filename}}). Make sure that the first line of the script contains
442	\code{\#!} starting in column 1 followed by the pathname of the Python
443	interpreter, for instance:
444
\begin{verbatim}
#!/usr/local/bin/python
\end{verbatim}
448
449	Make sure the Python interpreter exists and is executable by ``others''.
450
451	Make sure that any files your script needs to read or write are
452	readable or writable, respectively, by ``others'' --- their mode
453	should be \code{0644} for readable and \code{0666} for writable. This
454	is because, for security reasons, the HTTP server executes your script
455	as user ``nobody'', without any special privileges. It can only read
456	(write, execute) files that everybody can read (write, execute). The
457	current directory at execution time is also different (it is usually
458	the server's cgi-bin directory) and the set of environment variables
459	is also different from what you get when you log in. In particular, don't
460	count on the shell's search path for executables (\envvar{PATH}) or
461	the Python module search path (\envvar{PYTHONPATH}) to be set to
462	anything interesting.
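
What each of these modes grants to ``others'' can be read off the
permission bits. The sketch below, in Python 3, uses the constants of
the standard \code{stat} module; the helper name \code{others_can} is
hypothetical.

```python
import stat


def others_can(mode):
    """Return the (read, write, execute) permissions a mode grants to others."""
    return (bool(mode & stat.S_IROTH),
            bool(mode & stat.S_IWOTH),
            bool(mode & stat.S_IXOTH))


print(others_can(0o755), others_can(0o644), others_can(0o666))
```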
463
464	If you need to load modules from a directory which is not on Python's
465	default module search path, you can change the path in your script,
466	before importing other modules. For example:
467
\begin{verbatim}
import sys
sys.path.insert(0, "/usr/home/joe/lib/python")
sys.path.insert(0, "/usr/local/lib/python")
\end{verbatim}
473
474	(This way, the directory inserted last will be searched first!)
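
The resulting search order can be checked on an ordinary list, without
touching the real \code{sys.path}. In this Python 3 sketch the initial
entry \file{/usr/lib/python} is a made-up stand-in for the default
search path.

```python
search_path = ["/usr/lib/python"]    # stand-in for the default sys.path
search_path.insert(0, "/usr/home/joe/lib/python")
search_path.insert(0, "/usr/local/lib/python")
# Each insert at index 0 pushes earlier entries back by one position.
print(search_path)
```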
475
476	Instructions for non-\UNIX{} systems will vary; check your HTTP server's
477	documentation (it will usually have a section on CGI scripts).
478
479
480	\subsection{Testing your CGI script}
481
482	Unfortunately, a CGI script will generally not run when you try it
483	from the command line, and a script that works perfectly from the
484	command line may fail mysteriously when run from the server. There's
485	one reason why you should still test your script from the command
486	line: if it contains a syntax error, the Python interpreter won't
487	execute it at all, and the HTTP server will most likely send a cryptic
488	error to the client.
489
490	Assuming your script has no syntax errors, yet it does not work, you
491	have no choice but to read the next section.
492
493
494	\subsection{Debugging CGI scripts} \indexii{CGI}{debugging}
495
First of all, check for trivial installation errors; reading the
section above on installing your CGI script carefully can save you a
lot of time. If you wonder whether you have understood the
installation procedure correctly, try installing a copy of this module
file (\file{cgi.py}) as a CGI script. When invoked as a script, the file
will dump its environment and the contents of the form in HTML form.
Give it the right mode, etc., and send it a request. If it's installed
in the standard \file{cgi-bin} directory, it should be possible to send it a
request by entering a URL of the following form into your browser:
505
\begin{verbatim}
http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home
\end{verbatim}
509
If this gives an error of type 404, the server cannot find the script;
perhaps you need to install it in a different directory. If it
gives another error, there's an installation problem that
you should fix before trying to go any further. If you get a nicely
formatted listing of the environment and form content (in this
example, the fields should be listed as ``addr'' with value ``At Home''
and ``name'' with value ``Joe Blow''), the \file{cgi.py} script has been
installed correctly. If you follow the same procedure for your own
script, you should now be able to debug it.
519
520	The next step could be to call the \module{cgi} module's
521	\function{test()} function from your script: replace its main code
522	with the single statement
523
\begin{verbatim}
cgi.test()
\end{verbatim}
527
528	This should produce the same results as those gotten from installing
529	the \file{cgi.py} file itself.
530
When an ordinary Python script raises an unhandled exception (for
whatever reason: a typo in a module name, a file that can't be
opened, etc.), the Python interpreter prints a nice traceback and
exits. While the Python interpreter will still do this when your CGI
script raises an exception, most likely the traceback will end up in
one of the HTTP server's log files, or be discarded altogether.
537
538	Fortunately, once you have managed to get your script to execute
539	\emph{some} code, you can easily send tracebacks to the Web browser
540	using the \refmodule{cgitb} module. If you haven't done so already,
541	just add the line:
542
\begin{verbatim}
import cgitb; cgitb.enable()
\end{verbatim}
546
547	to the top of your script. Then try running it again; when a
548	problem occurs, you should see a detailed report that will
549	likely make apparent the cause of the crash.
550
551	If you suspect that there may be a problem in importing the
552	\refmodule{cgitb} module, you can use an even more robust approach
553	(which only uses built-in modules):
554
\begin{verbatim}
import sys
sys.stderr = sys.stdout
print "Content-Type: text/plain"
print
...your code here...
\end{verbatim}
562
563	This relies on the Python interpreter to print the traceback. The
564	content type of the output is set to plain text, which disables all
565	HTML processing. If your script works, the raw HTML will be displayed
566	by your client. If it raises an exception, most likely after the
567	first two lines have been printed, a traceback will be displayed.
568	Because no HTML interpretation is going on, the traceback will be
569	readable.
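
A variation on this approach catches the exception explicitly and
writes the traceback to the same stream as the headers. The Python 3
sketch below is an illustration only: the names \code{run_and_report}
and \code{failing_main} are hypothetical, and an \code{io.StringIO}
buffer stands in for standard output so that the result can be
inspected.

```python
import io
import traceback


def run_and_report(main, out):
    """Write plain-text headers, run main(out), and report any exception to out."""
    out.write("Content-Type: text/plain\n\n")
    try:
        main(out)
    except Exception:
        # Send the traceback to the client instead of the server's error log.
        out.write(traceback.format_exc())


def failing_main(out):
    out.write("starting\n")
    raise RuntimeError("something went wrong")


buffer = io.StringIO()        # stand-in for sys.stdout
run_and_report(failing_main, buffer)
report = buffer.getvalue()
print(report)
```

In a real script, \code{sys.stdout} would be passed as the \code{out}
argument.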
570
571
572	\subsection{Common problems and solutions}
573
574	\begin{itemize}
575	\item Most HTTP servers buffer the output from CGI scripts until the
576	script is completed. This means that it is not possible to display a
577	progress report on the client's display while the script is running.
578
579	\item Check the installation instructions above.
580
581	\item Check the HTTP server's log files. (\samp{tail -f logfile} in a
582	separate window may be useful!)
583
584	\item Always check a script for syntax errors first, by doing something
585	like \samp{python script.py}.
586
587	\item If your script does not have any syntax errors, try adding
588	\samp{import cgitb; cgitb.enable()} to the top of the script.
589
590	\item When invoking external programs, make sure they can be found.
591	Usually, this means using absolute path names --- \envvar{PATH} is
592	usually not set to a very useful value in a CGI script.
593
594	\item When reading or writing external files, make sure they can be read
595	or written by the userid under which your CGI script will be running:
596	this is typically the userid under which the web server is running, or some
597	explicitly specified userid for a web server's \samp{suexec} feature.
598
599	\item Don't try to give a CGI script a set-uid mode. This doesn't work on
600	most systems, and is a security liability as well.
601	\end{itemize}
602
