1\input texinfo @c -*-texinfo-*-
2@c %**start of header (This is for running Texinfo on a region.)
3@setfilename gawkinet.info
4@settitle TCP/IP Internetworking With @command{gawk}
5@c %**end of header (This is for running Texinfo on a region.)
6@c FIXME: web vs. Web
7
8@dircategory Network applications
9@direntry
10* Gawkinet: (gawkinet). TCP/IP Internetworking With `gawk'.
11@end direntry
12
13@iftex
14@set DOCUMENT book
15@set CHAPTER chapter
16@set SECTION section
17@set DARKCORNER @inmargin{@image{lflashlight,1cm}, @image{rflashlight,1cm}}
18@end iftex
19@ifinfo
20@set DOCUMENT Info file
21@set CHAPTER major node
22@set SECTION node
23@set DARKCORNER (d.c.)
24@end ifinfo
25@ifhtml
26@set DOCUMENT web page
27@set CHAPTER chapter
28@set SECTION section
29@set DARKCORNER (d.c.)
30@end ifhtml
31
32@set FSF
33
34@set FN file name
35@set FFN File Name
36
37@c merge the function and variable indexes into the concept index
38@ifinfo
39@synindex fn cp
40@synindex vr cp
41@end ifinfo
42@iftex
43@syncodeindex fn cp
44@syncodeindex vr cp
45@end iftex
46
47@c If "finalout" is commented out, the printed output will show
48@c black boxes that mark lines that are too long. Thus, it is
49@c unwise to comment it out when running a master in case there are
50@c overfulls which are deemed okay.
51
52@iftex
53@finalout
54@end iftex
55
56@smallbook
57
58@c Special files are described in chapter 6 Printing Output under
59@c 6.7 Special File Names in gawk. I think the networking does not
60@c fit into that chapter, thus this separate document. At over 50
61@c pages, I think this is the right decision. ADR.
62
63@set TITLE TCP/IP Internetworking With @command{gawk}
64@set EDITION 1.1
65@set UPDATE-MONTH January, 2004
66@c gawk versions:
67@set VERSION 3.1
68@set PATCHLEVEL 4
69
70@copying
71This is Edition @value{EDITION} of @cite{@value{TITLE}},
72for the @value{VERSION}.@value{PATCHLEVEL} (or later) version of the GNU
73implementation of AWK.
74@sp 2
75Copyright (C) 2000, 2001, 2002, 2004 Free Software Foundation, Inc.
76@sp 2
77Permission is granted to copy, distribute and/or modify this document
78under the terms of the GNU Free Documentation License, Version 1.2 or
79any later version published by the Free Software Foundation; with the
80Invariant Sections being ``GNU General Public License'', the Front-Cover
81texts being (a) (see below), and with the Back-Cover Texts being (b)
82(see below). A copy of the license is included in the section entitled
83``GNU Free Documentation License''.
84
85@enumerate a
86@item
87``A GNU Manual''
88
89@item
90``You have freedom to copy and modify this GNU Manual, like GNU
91software. Copies published by the Free Software Foundation raise
92funds for GNU development.''
93@end enumerate
94@end copying
95
96@ifinfo
97This file documents the networking features in GNU @command{awk}.
98
99@insertcopying
100@end ifinfo
101
102@setchapternewpage odd
103
104@titlepage
105@title @value{TITLE}
106@subtitle Edition @value{EDITION}
107@subtitle @value{UPDATE-MONTH}
108@author J@"urgen Kahrs
109@author with Arnold D. Robbins
110
111@c Include the Distribution inside the titlepage environment so
112@c that headings are turned off. Headings on and off do not work.
113
114@page
115@vskip 0pt plus 1filll
116@sp 2
117Published by:
118@sp 1
119
120Free Software Foundation @*
12151 Franklin Street, Fifth Floor @*
122Boston, MA 02110-1301 USA @*
123Phone: +1-617-542-5942 @*
124Fax: +1-617-542-2652 @*
125Email: @email{gnu@@gnu.org} @*
126URL: @uref{http://www.gnu.org/} @*
127
128ISBN 1-882114-93-0 @*
129
130@insertcopying
131
132@c @sp 2
133@c Cover art by ?????.
134@end titlepage
135
136@iftex
137@headings off
138@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @|
139@oddheading @| @| @strong{@thischapter}@ @ @ @thispage
140@end iftex
141
142@ifnottex
143@node Top, Preface, (dir), (dir)
144@top General Introduction
145@comment node-name, next, previous, up
146
147This file documents the networking features in GNU Awk (@command{gawk})
148version 3.1 and later.
149
150@insertcopying
151@end ifnottex
152
153@menu
154* Preface:: About this document.
155* Introduction:: About networking.
156* Using Networking:: Some examples.
157* Some Applications and Techniques:: More extended examples.
158* Links:: Where to find the stuff mentioned in this
159 document.
160* GNU Free Documentation License:: The license for this document.
161* Index:: The index.
162
163@detailmenu
164* Stream Communications:: Sending data streams.
165* Datagram Communications:: Sending self-contained messages.
166* The TCP/IP Protocols:: How these models work in the Internet.
167* Basic Protocols:: The basic protocols.
168* Ports:: The idea behind ports.
169* Making Connections:: Making TCP/IP connections.
170* Gawk Special Files:: How to do @command{gawk} networking.
171* Special File Fields:: The fields in the special file name.
172* Comparing Protocols:: Differences between the protocols.
173* File /inet/tcp:: The TCP special file.
174* File /inet/udp:: The UDP special file.
175* File /inet/raw:: The RAW special file.
176* TCP Connecting:: Making a TCP connection.
177* Troubleshooting:: Troubleshooting TCP/IP connections.
178* Interacting:: Interacting with a service.
179* Setting Up:: Setting up a service.
180* Email:: Reading email.
181* Web page:: Reading a Web page.
182* Primitive Service:: A primitive Web service.
183* Interacting Service:: A Web service with interaction.
184* CGI Lib:: A simple CGI library.
185* Simple Server:: A simple Web server.
186* Caveats:: Network programming caveats.
187* Challenges:: Where to go from here.
188* PANIC:: An Emergency Web Server.
189* GETURL:: Retrieving Web Pages.
190* REMCONF:: Remote Configuration Of Embedded Systems.
191* URLCHK:: Look For Changed Web Pages.
192* WEBGRAB:: Extract Links From A Page.
193* STATIST:: Graphing A Statistical Distribution.
194* MAZE:: Walking Through A Maze In Virtual Reality.
195* MOBAGWHO:: A Simple Mobile Agent.
196* STOXPRED:: Stock Market Prediction As A Service.
197* PROTBASE:: Searching Through A Protein Database.
198@end detailmenu
199@end menu
200
201@contents
202
203@node Preface, Introduction, Top, Top
204@unnumbered Preface
205
206In May of 1997, J@"urgen Kahrs felt the need for network access
207from @command{awk}, and, with a little help from me, set about adding
208features to do this for @command{gawk}. At that time, he
209wrote the bulk of this @value{DOCUMENT}.
210
211The code and documentation were added to the @command{gawk} 3.1 development
212tree, and languished somewhat until I could finally get
213down to some serious work on that version of @command{gawk}.
214This finally happened in the middle of 2000.
215
216Meantime, J@"urgen wrote an article about the Internet special
217files and @samp{|&} operator for @cite{Linux Journal}, and made a
218networking patch for the production versions of @command{gawk}
219available from his home page.
220In August of 2000 (for @command{gawk} 3.0.6), this patch
221also made it to the main GNU @command{ftp} distribution site.
222
223For release with @command{gawk}, I edited J@"urgen's prose
224for English grammar and style, as he is not a native English
225speaker. I also
226rearranged the material somewhat for what I felt was a better order of
227presentation, and (re)wrote some of the introductory material.
228
229The majority of this document and the code are his work, and the
230high quality and interesting ideas speak for themselves. It is my
231hope that these features will be of significant value to the @command{awk}
232community.
233
234@sp 1
235@noindent
236Arnold Robbins @*
237Nof Ayalon, ISRAEL @*
238March, 2001
239
240@node Introduction, Using Networking, Preface, Top
241@chapter Networking Concepts
242
243This @value{CHAPTER} provides a (necessarily) brief introduction to
244computer networking concepts. For many applications of @command{gawk}
245to TCP/IP networking, we hope that this is enough. For more
246advanced tasks, you will need deeper background, and it may be necessary
247to switch to lower-level programming in C or C++.
248
249There are two real-life models for the way computers send messages
250to each other over a network. While the analogies are not perfect,
251they are close enough to convey the major concepts.
252These two models are the phone system (reliable byte-stream communications),
253and the postal system (best-effort datagrams).
254
255@menu
256* Stream Communications:: Sending data streams.
257* Datagram Communications:: Sending self-contained messages.
258* The TCP/IP Protocols:: How these models work in the Internet.
259* Making Connections:: Making TCP/IP connections.
260@end menu
261
262@node Stream Communications, Datagram Communications, Introduction, Introduction
263@section Reliable Byte-streams (Phone Calls)
264
265When you make a phone call, the following steps occur:
266
267@enumerate
268@item
269You dial a number.
270
271@item
272The phone system connects to the called party, telling
273them there is an incoming call. (Their phone rings.)
274
275@item
276The other party answers the call, or, in the case of a
277computer network, refuses to answer the call.
278
279@item
280Assuming the other party answers, the connection between
281you is now a @dfn{duplex} (two-way), @dfn{reliable} (no data lost),
282sequenced (data comes out in the order sent) data stream.
283
284@item
285You and your friend may now talk freely, with the phone system
286moving the data (your voices) from one end to the other.
287From your point of view, you have a direct end-to-end
288connection with the person on the other end.
289@end enumerate
290
291The same steps occur in a duplex reliable computer networking connection.
292There is considerably more overhead in setting up the communications,
293but once it's done, data moves in both directions, reliably, in sequence.
294
295@node Datagram Communications, The TCP/IP Protocols, Stream Communications, Introduction
296@section Best-effort Datagrams (Mailed Letters)
297
298Suppose you mail three different documents to your office on the
299other side of the country on two different days. Doing so
300entails the following.
301
302@enumerate
303@item
304Each document travels in its own envelope.
305
306@item
307Each envelope contains both the sender and the
308recipient address.
309
310@item
311Each envelope may travel a different route to its destination.
312
313@item
314The envelopes may arrive in a different order from the one
315in which they were sent.
316
317@item
318One or more may get lost in the mail.
319(Although, fortunately, this does not occur very often.)
320
321@item
322In a computer network, one or more @dfn{packets}
323may also arrive multiple times. (This doesn't happen
324with the postal system!)
325
326@end enumerate
327
328The important characteristics of datagram communications, like
329those of the postal system, are thus:
330
331@itemize @bullet
332@item
333Delivery is ``best effort;'' the data may never get there.
334
335@item
336Each message is self-contained, including the source and
337destination addresses.
338
339@item
340Delivery is @emph{not} sequenced; packets may arrive out
341of order, and/or multiple times.
342
343@item
344Unlike the phone system, overhead is considerably lower.
345It is not necessary to set up the call first.
346@end itemize
347
348The price the user pays for the lower overhead of datagram communications
349is exactly the lower reliability; it is often necessary for user-level
350protocols that use datagram communications to add their own reliability
351features on top of the basic communications.
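
As an illustration of such a user-level reliability feature, the following
sketch numbers each message so that the receiver can discard duplicates and
restore the sending order. The function names @code{receive} and
@code{in_order}, the global array @code{pending}, and the ``sequence number,
space, payload'' message format are all assumptions made for this example;
none of them is part of @command{gawk} or of any protocol described here.

```awk
# Hypothetical helper: record one received datagram of the form "SEQ PAYLOAD".
# Returns 1 if the datagram was new, 0 if it was a duplicate.
function receive(datagram,    sp, seq, payload) {
    sp = index(datagram, " ")
    seq = substr(datagram, 1, sp - 1) + 0
    payload = substr(datagram, sp + 1)
    if (seq in pending)
        return 0                # already seen: drop the duplicate
    pending[seq] = payload
    return 1
}

# Hypothetical helper: return the payloads in sending order, one per line,
# stopping at the first sequence number that has not arrived yet.
function in_order(    seq, result) {
    result = ""
    for (seq = 1; seq in pending; seq++)
        result = result (seq > 1 ? "\n" : "") pending[seq]
    return result
}

BEGIN {
    receive("2 world")          # arrives out of order
    receive("1 hello")
    receive("2 world")          # arrives a second time
    print in_order()
}
```

A lost datagram shows up here as a gap in the sequence numbers. A real
protocol would also have to ask the sender to retransmit, which this sketch
leaves out.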
352
353@node The TCP/IP Protocols, Making Connections, Datagram Communications, Introduction
354@section The Internet Protocols
355
356The Internet Protocol Suite (usually referred to as just TCP/IP)@footnote{
357It should be noted that although the Internet seems to have conquered the
358world, there are other networking protocol suites in existence and in use.}
359consists of a number of different protocols at different levels or ``layers.''
360For our purposes, three protocols provide the fundamental communications
361mechanisms. All other defined protocols are referred to as user-level
362protocols (e.g., HTTP, used later in this @value{DOCUMENT}).
363
364@menu
365* Basic Protocols:: The basic protocols.
366* Ports:: The idea behind ports.
367@end menu
368
369@node Basic Protocols, Ports, The TCP/IP Protocols, The TCP/IP Protocols
370@subsection The Basic Internet Protocols
371
372@table @asis
373@item IP
374The Internet Protocol. This protocol is almost never used directly by
375applications. It provides the basic packet delivery and routing infrastructure
376of the Internet. Much like the phone company's switching centers or the Post
377Office's trucks, it is not of much day-to-day interest to the regular user
378(or programmer).
379It happens to be a best-effort datagram protocol.
380
381@item UDP
382The User Datagram Protocol. This is a best-effort datagram protocol.
383It provides a small amount of extra reliability over IP, and adds
384the notion of @dfn{ports}, described in @ref{Ports, ,TCP and UDP Ports}.
385
386@item TCP
387The Transmission Control Protocol. This is a duplex, reliable, sequenced
388byte-stream protocol, again layered on top of IP, and also providing the
389notion of ports. This is the protocol that you will most likely use
390when using @command{gawk} for network programming.
391@end table
392
393All other user-level protocols use either TCP or UDP to do their basic
394communications. Examples are SMTP (Simple Mail Transfer Protocol),
395FTP (File Transfer Protocol), and HTTP (HyperText Transfer Protocol).
396@cindex SMTP (Simple Mail Transfer Protocol)
397@cindex FTP (File Transfer Protocol)
398@cindex HTTP (Hypertext Transfer Protocol)
399
400@node Ports, , Basic Protocols, The TCP/IP Protocols
401@subsection TCP and UDP Ports
402
403In the postal system, the address on an envelope indicates a physical
404location, such as a residence or office building. But there may be
405more than one person at a location; thus you have to further qualify
406the recipient by putting a person or company name on the envelope.
407
408In the phone system, one phone number may represent an entire company,
409in which case you need a person's extension number in order to
410reach that individual directly. Or, when you call a home, you have to
411say, ``May I please speak to ...'' before talking to the person directly.
412
413IP networking provides the concept of addressing. An IP address represents
414a particular computer, but no more. In order to reach the mail service
415on a system, or the FTP or WWW service on a system, you must have some
416way to further specify which service you want. In the Internet Protocol suite,
417this is done with @dfn{port numbers}, which represent the services, much
418like an extension number used with a phone number.
419
420Port numbers are 16-bit integers. Unix and Unix-like systems reserve ports
421below 1024 for ``well known'' services, such as SMTP, FTP, and HTTP.
422Numbers 1024 and above may be used by any application, although there is no
423promise made that a particular port number is always available.
424
425@node Making Connections, , The TCP/IP Protocols, Introduction
426@section Making TCP/IP Connections (And Some Terminology)
427
428Two terms come up repeatedly when discussing networking:
429@dfn{client} and @dfn{server}. For now, we'll discuss these terms
430at the @dfn{connection level}, when first establishing connections
431between two processes on different systems over a network.
432(Once the connection is established, the higher level, or
433@dfn{application level} protocols,
434such as HTTP or FTP, determine who is the client and who is the
435server. Often, it turns out that the client and server are the
436same in both roles.)
437
438@cindex servers
439The @dfn{server} is the system providing the service, such as the
440web server or email server. It is the @dfn{host} (system) which
441is @emph{connected to} in a transaction.
442For this to work though, the server must be expecting connections.
443Much as there has to be someone at the office building to answer
444the phone@footnote{In the days before voice mail systems!}, the
445server process (usually) has to be started first and be waiting
446for a connection.
447
448@cindex clients
449The @dfn{client} is the system requesting the service.
450It is the system @emph{initiating the connection} in a transaction.
451(Just as when you pick up the phone to call an office or store.)
452
453In the TCP/IP framework, each end of a connection is represented by an
454(@var{address}, @var{port}) pair. For the duration of the connection,
455the ports in use at each end are unique, and cannot be used simultaneously
456by other processes on the same system. (Only after closing a connection
457can a new one be built up on the same port. This is contrary to the usual
458behavior of fully developed web servers which have to avoid situations
459in which they are not reachable. We have to pay this price in order to
460enjoy the benefits of a simple communication paradigm in @command{gawk}.)
461
462@cindex blocking
463@cindex synchronous communications
464Furthermore, once the connection is established, communications are
465@dfn{synchronous}.@footnote{For the technically savvy, data reads
466block---if there's no incoming data, the program is made to wait until
467there is, instead of receiving a ``there's no data'' error return.} That is,
468each end waits on the other to finish transmitting, before replying. This
469is much like two people in a phone conversation. While both could talk
470simultaneously, doing so usually doesn't work too well.
471
472In the case of TCP, the synchronicity is enforced by the protocol when
473sending data. Data writes @dfn{block} until the data have been received on the
474other end. For both TCP and UDP, data reads block until there is incoming
475data waiting to be read. This is summarized in the following table,
476where an ``X'' indicates that the given action blocks.
477
478@ifnottex
479@multitable {Protocol} {Reads} {Writes}
480@item TCP @tab X @tab X
481@item UDP @tab X @tab
482@item RAW @tab X @tab
483@end multitable
484@end ifnottex
485@tex
486\centerline{
487\vbox{\bigskip % space above the table (about 1 linespace)
488% Because we have vertical rules, we can't let TeX insert interline space
489% in its usual way.
490\offinterlineskip
491\halign{\hfil\strut# &\vrule #& \hfil#\hfil& \hfil#\hfil\cr
492Protocol&&\quad Reads\quad &Writes\cr
493\noalign{\hrule}
494\omit&height 2pt\cr
495\noalign{\hrule height0pt}% without this the rule does not extend; why?
496TCP&&X&X\cr
497UDP&&X&\cr
498RAW&&X&\cr
499}}}
500@end tex
501
502@node Using Networking, Some Applications and Techniques, Introduction, Top
503@comment node-name, next, previous, up
504@chapter Networking With @command{gawk}
505
506@c STARTOFRANGE netgawk
507@cindex networks, @command{gawk} and
508@c STARTOFRANGE gawknet
509@cindex @command{gawk}, networking
510The @command{awk} programming language was originally developed as a
511pattern-matching language for writing short programs to perform
512data manipulation tasks.
513@command{awk}'s strength is the manipulation of textual data
514that is stored in files.
515It was never meant to be used for networking purposes.
516To exploit its features in a
517networking context, it's necessary to use an access mode for network connections
518that resembles the access of files as closely as possible.
519
520@cindex Perl
521@cindex Python
522@cindex Tcl/Tk
523@command{awk} is also meant to be a prototyping language. It is used
524to demonstrate feasibility and to play with features and user interfaces.
525This can be done with file-like handling of network
526connections.
527@command{gawk} trades the lack
528of many of the advanced features of the TCP/IP family of protocols
529for the convenience of simple connection handling.
530The advanced
531features are available when programming in C or Perl. In fact, the
532network programming
533in this @value{CHAPTER}
534is very similar to what is described in books such as
535@cite{Internet Programming with Python},
536@cite{Advanced Perl Programming},
537or
538@cite{Web Client Programming with Perl}.
539
540@cindex Perl, @command{gawk} networking and
541@cindex Python, @command{gawk} networking and
542@cindex Tcl/Tk, @command{gawk} and
543However, you can do the programming here without first having to learn object-oriented
544ideology; underlying languages such as Tcl/Tk, Perl, Python; or all of
545the libraries necessary to extend these languages before they are ready for the Internet.
546
547@cindex Transmission Control Protocol, See TCP
548@cindex TCP (Transmission Control Protocol)
549This @value{CHAPTER} demonstrates how to use the TCP protocol. The
550other protocols are much less important for most users (UDP) or even
551intractable (RAW).
552
553@menu
554* Gawk Special Files:: How to do @command{gawk} networking.
555* TCP Connecting:: Making a TCP connection.
556* Troubleshooting:: Troubleshooting TCP/IP connections.
557* Interacting:: Interacting with a service.
558* Setting Up:: Setting up a service.
559* Email:: Reading email.
560* Web page:: Reading a Web page.
561* Primitive Service:: A primitive Web service.
562* Interacting Service:: A Web service with interaction.
563* Simple Server:: A simple Web server.
564* Caveats:: Network programming caveats.
565* Challenges:: Where to go from here.
566@end menu
567
568@node Gawk Special Files, TCP Connecting, Using Networking, Using Networking
569@comment node-name, next, previous, up
570@section @command{gawk}'s Networking Mechanisms
571
572The @samp{|&} operator introduced in @command{gawk} 3.1 for use in
573communicating with a @dfn{coprocess} is described in
574@ref{Two-way I/O, ,Two-way Communications With Another Process, gawk, GAWK: Effective AWK Programming}.
575It shows how to do two-way I/O to a
576separate process, sending it data with @code{print} or @code{printf} and
577reading data with @code{getline}. If you haven't read it already, you should
578detour there to do so.
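
As a brief reminder of how that mechanism looks in practice, here is a
minimal sketch of two-way I/O with an ordinary coprocess. It assumes that a
@command{sort} utility can be found on the search path. The function name
@code{sort_lines} is invented for this example.

```awk
# Hypothetical helper: send TEXT (lines separated by "\n") to a sort
# coprocess and return the sorted lines, again separated by "\n".
function sort_lines(text,    command, line, n, result) {
    command = "sort"
    print text |& command       # write to the coprocess
    close(command, "to")        # signal end of input so sort can start
    result = ""
    while ((command |& getline line) > 0)   # read its output back
        result = result (n++ ? "\n" : "") line
    close(command)
    return result
}

BEGIN {
    print sort_lines("pear\napple")
}
```

Closing only the @code{"to"} end matters here, because @command{sort}
produces no output until it has seen all of its input.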
579
580@command{gawk} transparently extends the two-way I/O mechanism to simple networking through
581the use of special @value{FN}s. When a ``coprocess'' that matches
582the special files we are about to describe
583is started, @command{gawk} creates the appropriate network
584connection, and then two-way I/O proceeds as usual.
585
586@c last comma is part of see-also
587@cindex input/output, two-way, See Also @command{gawk}, networking
588@cindex TCP/IP, sockets and
589At the C, C++, and Perl level, networking is accomplished
590via @dfn{sockets}, an Application Programming Interface (API) originally
591developed at the University of California at Berkeley that is now used
592almost universally for TCP/IP networking.
593Socket level programming, while fairly straightforward, requires paying
594attention to a number of details, as well as using binary data. It is not
595well-suited for use from a high-level language like @command{awk}.
596The special files provided in @command{gawk} hide the details from
597the programmer, making things much simpler and easier to use.
598@c Who sez we can't toot our own horn occasionally?
599
600@c STARTOFRANGE filenet
601@cindex filenames, for network access
602@c STARTOFRANGE gawnetf
603@cindex @command{gawk}, networking, filenames
604@c STARTOFRANGE netgawf
605@cindex networks, @command{gawk} and, filenames
606The special @value{FN} for network access is made up of several fields, all
607of which are mandatory:
608
609@example
610/inet/@var{protocol}/@var{localport}/@var{hostname}/@var{remoteport}
611@end example
612
613@cindex @code{/inet/} files (@command{gawk})
614@cindex files, @code{/inet/} (@command{gawk})
615@cindex localport field
616@cindex remoteport field
617The @file{/inet/} field is, of course, constant when accessing the network.
618The @var{localport} and @var{remoteport} fields do not have a meaning
619when used with @file{/inet/raw} because ``ports'' only apply to
620TCP and UDP. So, when using @file{/inet/raw}, the port fields always have
621to be @samp{0}.
622
623@menu
624* Special File Fields:: The fields in the special file name.
625* Comparing Protocols:: Differences between the protocols.
626@end menu
627
628@node Special File Fields, Comparing Protocols, Gawk Special Files, Gawk Special Files
629@subsection The Fields of the Special @value{FFN}
630This @value{SECTION} explains the meaning of all the other fields,
631as well as the range of values and the defaults.
632All of the fields are mandatory. To let the system pick a value,
633or if the field doesn't apply to the protocol, specify it as @samp{0}:
634
635@table @var
636@cindex protocol field
637@c last comma is part of secondary
638@cindex TCP/IP, protocols, selecting
639@item protocol
640Determines which member of the TCP/IP
641family of protocols is selected to transport the data across the
642network. There are three possible values (always written in lowercase):
643@samp{tcp}, @samp{udp}, and @samp{raw}. The exact meaning of each is
644explained later in this @value{SECTION}.
645
646@item localport
647@cindex networks, ports, specifying
648Determines which port on the local
649machine is used to communicate across the network. It has no meaning
650with @file{/inet/raw} and must therefore be @samp{0}. Application-level clients
651usually use @samp{0} to indicate they do not care which local port is
652used---instead they specify a remote port to connect to. It is vital for
653application-level servers to use a number different from @samp{0} here
654because their service has to be available at a specific publicly known
655port number. It is possible to use a name from @file{/etc/services} here.
656
657@item hostname
658@cindex hostname field
659@cindex servers, as hosts
660Determines which remote host is to
661be at the other end of the connection. Application-level servers must fill
662this field with a @samp{0} to indicate their being open for all other hosts
663to connect to them and enforce connection level server behavior this way.
664It is not possible for an application-level server to restrict its
665availability to one remote host by entering a host name here.
666Application-level clients must enter a name different from @samp{0}.
667The name can be either symbolic
668(e.g., @samp{jpl-devvax.jpl.nasa.gov}) or numeric (e.g., @samp{128.149.1.143}).
669
670@item remoteport
671Determines which port on the remote
672machine is used to communicate across the network. It has no meaning
673with @file{/inet/raw} and must therefore be @samp{0}.
674For @file{/inet/tcp} and @file{/inet/udp},
675application-level clients @emph{must} use a number
676other than @samp{0} to indicate to which port on the remote machine
677they want to connect. Application-level servers must fill this field with
678a @samp{0}. Instead they specify a local port to which clients connect.
679It is possible to use a name from @file{/etc/services} here.
680@end table
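
To show how the four fields fit together, the following sketch assembles a
special @value{FN} from its parts. The function name @code{inet_name} is
invented for this example and is not something @command{gawk} provides.

```awk
# Hypothetical helper: assemble a special file name from its four fields.
# Returns "" if the protocol is not one of the three lowercase values.
function inet_name(protocol, localport, hostname, remoteport) {
    if (protocol != "tcp" && protocol != "udp" && protocol != "raw")
        return ""
    return "/inet/" protocol "/" localport "/" hostname "/" remoteport
}

BEGIN {
    print inet_name("tcp", 0, "localhost", 8888)    # a client's view
    print inet_name("tcp", 8888, 0, 0)              # a server's view
}
```

The two calls print @file{/inet/tcp/0/localhost/8888} and
@file{/inet/tcp/8888/0/0}. The client leaves its local port to the system,
and the server leaves the remote host and port open.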
681
682@cindex networks, @command{gawk} and, connections
683@cindex @command{gawk}, networking, connections
684Experts in network programming will notice that the usual
685client/server asymmetry found at the level of the socket API is not visible
686here. This is for the sake of simplicity of the high-level concept. If this
687asymmetry is necessary for your application,
688use another language.
689For @command{gawk}, it is
690more important to enable users to write a client program with a minimum
691of code. What happens when first accessing a network connection is seen
692in the following pseudocode:
693
694@smallexample
695if ((name of remote host given) && (other side accepts connection)) @{
696    rendez-vous successful; transmit with getline or print
697@} else @{
698    if ((other side did not accept) && (localport == 0))
699        exit unsuccessful
700    if (TCP) @{
701        set up a server accepting connections
702        this means waiting for the client on the other side to connect
703    @} else
704        ready
705@}
706@end smallexample
707
708The exact behavior of this algorithm depends on the values of the
709fields of the special @value{FN}. When in doubt, @ref{table-inet-components}
710gives you the combinations of values and their meaning. If this
711table is too complicated, focus on the three lines printed in
712@strong{bold}. All the examples in
713@ref{Using Networking, ,Networking With @command{gawk}},
714use only the
715patterns printed in bold letters.
716
717@float Table,table-inet-components
718@caption{/inet Special File Components}
719@multitable @columnfractions .15 .15 .15 .15 .40
720@headitem @sc{protocol} @tab @sc{local port} @tab @sc{host name}
721@tab @sc{remote port} @tab @sc{Resulting connection-level behavior}
722@item @strong{tcp} @tab @strong{0} @tab @strong{x} @tab @strong{x} @tab
723 @strong{Dedicated client, fails if immediately connecting to a
724 server on the other side fails}
725@item udp @tab 0 @tab x @tab x @tab Dedicated client
726@item raw @tab 0 @tab x @tab 0 @tab Dedicated client, works only as @code{root}
727@item @strong{tcp, udp} @tab @strong{x} @tab @strong{x} @tab @strong{x} @tab
728 @strong{Client, switches to dedicated server if necessary}
729@item @strong{tcp, udp} @tab @strong{x} @tab @strong{0} @tab @strong{0} @tab
730 @strong{Dedicated server}
731@item raw @tab 0 @tab 0 @tab 0 @tab Dedicated server, works only as @code{root}
732@item tcp, udp, raw @tab x @tab x @tab 0 @tab Invalid
733@item tcp, udp, raw @tab 0 @tab 0 @tab x @tab Invalid
734@item tcp, udp, raw @tab x @tab 0 @tab x @tab Invalid
735@item tcp, udp @tab 0 @tab 0 @tab 0 @tab Invalid
736@item tcp, udp @tab 0 @tab x @tab 0 @tab Invalid
737@item raw @tab x @tab 0 @tab 0 @tab Invalid
738@item raw @tab 0 @tab x @tab x @tab Invalid
739@item raw @tab x @tab x @tab x @tab Invalid
740@end multitable
741@end float
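
The table can also be read as a small decision procedure. The following
sketch classifies a special @value{FN} by checking which of its fields are
@samp{0}. The function name @code{inet_behavior} and the strings it returns
are assumptions made for this example; the function only restates the table
and opens no connection.

```awk
# Hypothetical helper: look up the connection-level behavior of a special
# file name, following the rows of the table above.
function inet_behavior(name,    n, f, l, h, r) {
    n = split(name, f, "/")     # "/inet/tcp/0/host/80" gives 6 pieces
    if (n != 6 || f[1] != "" || f[2] != "inet")
        return "invalid"
    l = (f[4] != "0")           # 1 stands for "x" in the table
    h = (f[5] != "0")
    r = (f[6] != "0")
    if (f[3] == "raw") {
        if (!l && h && !r)  return "dedicated client"
        if (!l && !h && !r) return "dedicated server"
        return "invalid"
    }
    if (f[3] != "tcp" && f[3] != "udp")
        return "invalid"
    if (!l && h && r)   return "dedicated client"
    if (l && h && r)    return "client, switches to dedicated server if necessary"
    if (l && !h && !r)  return "dedicated server"
    return "invalid"
}

BEGIN {
    print inet_behavior("/inet/tcp/0/localhost/8888")
    print inet_behavior("/inet/tcp/8888/0/0")
    print inet_behavior("/inet/udp/0/0/8888")
}
```

The three calls correspond to the first bold row, the third bold row, and
one of the invalid combinations.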
742
743In general, TCP is the preferred mechanism to use. It is the simplest
744protocol to understand and to use. Use the others only if circumstances
745demand low overhead.
746
747@node Comparing Protocols, , Special File Fields, Gawk Special Files
748@subsection Comparing Protocols
749
750This @value{SECTION} develops a pair of programs (sender and receiver)
751that do nothing but send a timestamp from one machine to another. The
752sender and the receiver are implemented with each of the three protocols
753available and demonstrate the differences between them.
754
755@menu
756* File /inet/tcp:: The TCP special file.
757* File /inet/udp:: The UDP special file.
758* File /inet/raw:: The RAW special file.
759@end menu
760
761@node File /inet/tcp, File /inet/udp, Comparing Protocols, Comparing Protocols
762@subsubsection @file{/inet/tcp}
763@cindex @code{/inet/tcp} special files (@command{gawk})
764@cindex files, @code{/inet/tcp} (@command{gawk})
765@cindex TCP (Transmission Control Protocol)
766Once again, always use TCP.
767(Use UDP when low overhead is a necessity, and use RAW for
768network experimentation.)
769The first example is the sender
770program:
771
772@example
773# Server
774BEGIN @{
775    print strftime() |& "/inet/tcp/8888/0/0"
776    close("/inet/tcp/8888/0/0")
777@}
778@end example
779
780The receiver is very simple:
781
782@example
783# Client
784BEGIN @{
785    "/inet/tcp/0/localhost/8888" |& getline
786    print $0
787    close("/inet/tcp/0/localhost/8888")
788@}
789@end example
790
791TCP guarantees that the bytes arrive at the receiving end in exactly
792the same order that they were sent. No byte is lost
793(except for broken connections), doubled, or out of order. Some
794overhead is necessary to accomplish this, but this is the price to pay for
795a reliable service.
796It does matter which side starts first. The sender/server has to be started
797first, and it waits for the receiver to read a line.
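
If the receiver is started before the sender, its attempt to connect fails.
The following variation on the receiver is a minimal sketch of how a program
might notice this. It assumes that a failed open makes @code{getline} return
a value that is not positive and that @command{gawk} leaves a description of
the error in @code{ERRNO}; check the behavior of your version before relying
on it:

@example
# Client that reports a failed connection instead of printing nothing
BEGIN @{
  Service = "/inet/tcp/0/localhost/8888"
  if ((Service |& getline) > 0)
    print $0
  else
    print ("cannot read from " Service ": " ERRNO) > "/dev/stderr"
  close(Service)
@}
@end example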
798
799@node File /inet/udp, File /inet/raw, File /inet/tcp, Comparing Protocols
800@subsubsection @file{/inet/udp}
801@cindex @code{/inet/udp} special files (@command{gawk})
802@cindex files, @code{/inet/udp} (@command{gawk})
803@cindex UDP (User Datagram Protocol)
804@cindex User Datagram Protocol, See UDP
805The server and client programs that use UDP are almost identical to their TCP counterparts;
806only the @var{protocol} has changed. As before, it does matter which side
807starts first. The receiving side blocks and waits for the sender.
808In this case, the receiver/client has to be started first:
809
810@page
@example
# Server
BEGIN @{
  print strftime() |& "/inet/udp/8888/0/0"
  close("/inet/udp/8888/0/0")
@}
@end example
818
819The receiver is almost identical to the TCP receiver:
820
@example
# Client
BEGIN @{
  "/inet/udp/0/localhost/8888" |& getline
  print $0
  close("/inet/udp/0/localhost/8888")
@}
@end example
829
UDP cannot guarantee that the datagrams arrive at the receiving end in
the same order they were sent. Some datagrams could be
lost, some doubled, and some out of order. In exchange, UDP avoids the
overhead that TCP spends on its guarantees. This unreliable behavior is
good enough for tasks
such as data acquisition, logging, and even stateless services like NFS.
835
836@node File /inet/raw, , File /inet/udp, Comparing Protocols
837@subsubsection @file{/inet/raw}
838@cindex @code{/inet/raw} special files (@command{gawk})
839@cindex files, @code{/inet/raw} (@command{gawk})
840@cindex RAW protocol
841
842This is an IP-level protocol. Only @code{root} is allowed to access this
843special file. It is meant to be the basis for implementing
844and experimenting with transport-level protocols.@footnote{This special file
845is reserved, but not otherwise currently implemented.}
846In the most general case,
847the sender has to supply the encapsulating header bytes in front of the
848packet and the receiver has to strip the additional bytes from the message.
849
850@cindex dark corner, RAW protocol
851RAW receivers cannot receive packets sent with TCP or UDP because the
852operating system does not deliver the packets to a RAW receiver. The
853operating system knows about some of the protocols on top of IP
854and decides on its own which packet to deliver to which process.
855@value{DARKCORNER}
856Therefore, the UDP receiver must be used for receiving UDP
857datagrams sent with the RAW sender. This is a dark corner, not only of
858@command{gawk}, but also of TCP/IP.
859
860@cindex SPAK utility
861For extended experimentation with protocols, look into
862the approach implemented in a tool called SPAK.
863This tool reflects the hierarchical layering of protocols (encapsulation)
864in the way data streams are piped out of one program into the next one.
865It shows which protocol is based on which other (lower-level) protocol
866by looking at the command-line ordering of the program calls.
867Cleverly thought out, SPAK is much better than @command{gawk}'s
868@file{/inet} for learning the meaning of each and every bit in the
869protocol headers.
870
871The next example uses the RAW protocol to emulate
872the behavior of UDP. The sender program is the same as above, but with some
873additional bytes that fill the places of the UDP fields:
874
@example
@group
BEGIN @{
  Message = "Hello world\n"
  SourcePort = 0
  DestinationPort = 8888
  MessageLength = length(Message)+8
  RawService = "/inet/raw/0/localhost/0"
  printf("%c%c%c%c%c%c%c%c%s",
         SourcePort/256, SourcePort%256,
         DestinationPort/256, DestinationPort%256,
         MessageLength/256, MessageLength%256,
         0, 0, Message) |& RawService
  fflush(RawService)
  close(RawService)
@}
@end group
@end example
893
894Since this program tries
895to emulate the behavior of UDP, it checks if
896the RAW sender is understood by the UDP receiver but not if the RAW receiver
897can understand the UDP sender. In a real network, the
898RAW receiver is hardly
899of any use because it gets every IP packet that
900comes across the network. There are usually so many packets that
901@command{gawk} would be too slow for processing them.
902Only on a network with little
903traffic can the IP-level receiver program be tested. Programs for analyzing
904IP traffic on modem or ISDN channels should be possible.
905
906Port numbers do not have a meaning when using @file{/inet/raw}. Their fields
907have to be @samp{0}. Only TCP and UDP use ports. Receiving data from
908@file{/inet/raw} is difficult, not only because of processing speed but also
909because data is usually binary and not restricted to ASCII. This
910implies that line separation with @code{RS} does not work as usual.
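
The RAW sender shown earlier writes each 16-bit header field as two bytes,
most significant byte first. As a worked example of that arithmetic (the
helper names @code{HighByte} and @code{LowByte} are ours, not part of
@command{gawk}), the destination port 8888 splits into the bytes 34 and 184,
because 34 * 256 + 184 = 8888:

@example
function HighByte(n) @{ return int(n / 256) @}
function LowByte(n)  @{ return n % 256 @}

BEGIN @{
  DestinationPort = 8888
  # prints "34 184"
  print HighByte(DestinationPort), LowByte(DestinationPort)
@}
@end example

The explicit @code{int} call spells out the truncation rather than leaving
it to the @samp{%c} conversion.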
911
912@node TCP Connecting, Troubleshooting, Gawk Special Files, Using Networking
913@section Establishing a TCP Connection
914
915@c STARTOFRANGE tcpcon
916@cindex TCP (Transmission Control Protocol), connection, establishing
917@c STARTOFRANGE netcon
918@cindex networks, @command{gawk} and, connections
919@c STARTOFRANGE gawcon
920@cindex @command{gawk}, networking, connections
921Let's observe a network connection at work. Type in the following program
922and watch the output. Within a second, it connects via TCP (@file{/inet/tcp})
923to the machine it is running on (@samp{localhost}) and asks the service
924@samp{daytime} on the machine what time it is:
925
926@cindex @code{getline} command
@example
BEGIN @{
  "/inet/tcp/0/localhost/daytime" |& getline
  print $0
  close("/inet/tcp/0/localhost/daytime")
@}
@end example
934
935Even experienced @command{awk} users will find the second line strange in two
936respects:
937
938@itemize @bullet
939@item
940A special file is used as a shell command that pipes its output
941into @code{getline}. One would rather expect to see the special file
942being read like any other file (@samp{getline <
943"/inet/tcp/0/localhost/daytime")}.
944
945@item
946@cindex @code{|} (vertical bar), @code{|&} operator (I/O)
947@cindex vertical bar (@code{|}), @code{|&} operator (I/O)
948The operator @samp{|&} has not been part of any @command{awk}
949implementation (until now).
950It is actually the only extension of the @command{awk}
951language needed (apart from the special files) to introduce network access.
952@end itemize
953
954@cindex pipes, networking and
955The @samp{|&} operator was introduced in @command{gawk} 3.1 in order to
956overcome the crucial restriction that access to files and pipes in
957@command{awk} is always unidirectional. It was formerly impossible to use
958both access modes on the same file or pipe. Instead of changing the whole
959concept of file access, the @samp{|&} operator
960behaves exactly like the usual pipe operator except for two additions:
961
962@itemize @bullet
963@item
964Normal shell commands connected to their @command{gawk} program with a @samp{|&}
965pipe can be accessed bidirectionally. The @samp{|&} turns out to be a quite
966general, useful, and natural extension of @command{awk}.
967
968@item
969Pipes that consist of a special @value{FN} for network connections are not
970executed as shell commands. Instead, they can be read and written to, just
971like a full-duplex network connection.
972@end itemize
973
974In the earlier example, the @samp{|&} operator tells @code{getline}
975to read a line from the special file @file{/inet/tcp/0/localhost/daytime}.
976We could also have printed a line into the special file. But instead we just
977read a line with the time, printed it, and closed the connection.
978(While we could just let @command{gawk} close the connection by finishing
979the program, in this @value{DOCUMENT}
980we are pedantic and always explicitly close the connections.)
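
The first of the two additions can be tried without any network at all.
The following sketch uses the ordinary @command{sort} utility as a two-way
partner; any filter that reads standard input and writes standard output
would do. It assumes the two-argument form of @code{close}, which closes
only the sending direction so that @command{sort} sees the end of its input:

@example
BEGIN @{
  Sorter = "sort"
  print "pear"  |& Sorter
  print "apple" |& Sorter
  close(Sorter, "to")          # end of input; sort can now produce output
  while ((Sorter |& getline Line) > 0)
    print Line                 # apple, then pear
  close(Sorter)
@}
@end example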
981
982@node Troubleshooting, Interacting, TCP Connecting, Using Networking
983@section Troubleshooting Connection Problems
984@cindex advanced features, network connections
985@c last comma is part of secondary
986@cindex troubleshooting, networks, connections
987It may well be that for some reason the program shown in the previous example does not run on your
988machine. When looking at possible reasons for this, you will learn much
989about typical problems that arise in network programming. First of all,
990your implementation of @command{gawk} may not support network access
991because it is
992a pre-3.1 version or you do not have a network interface in your machine.
993Perhaps your machine uses some other protocol, such as
994DECnet or Novell's IPX. For the rest of this @value{CHAPTER},
995we will assume
996you work on a Unix machine that supports TCP/IP. If the previous example program does
997not run on your machine, it may help to replace the name
998@samp{localhost} with the name of your machine or its IP address. If it
999does, you could replace @samp{localhost} with the name of another machine
1000in your vicinity---this way, the program connects to another machine.
Now you should see the date and time being printed by the program;
otherwise, your machine may not support the @samp{daytime} service.
1003Try changing the service to @samp{chargen} or @samp{ftp}. This way, the program
1004connects to other services that should give you some response. If you are
1005curious, you should have a look at your @file{/etc/services} file. It could
1006look like this:
1007
1008@ignore
1009@multitable {1234567890123} {1234567890123} {123456789012345678901234567890123456789012}
1010@item Service @strong{name} @tab Service @strong{number}
@item echo @tab 7/tcp @tab echo sends back each line it receives
1012@item echo @tab 7/udp @tab echo is good for testing purposes
1013@item discard @tab 9/tcp @tab discard behaves like @file{/dev/null}
1014@item discard @tab 9/udp @tab discard just throws away each line
1015@item daytime @tab 13/tcp @tab daytime sends date & time once per connection
1016@item daytime @tab 13/udp
1017@item chargen @tab 19/tcp @tab chargen infinitely produces character sets
1018@item chargen @tab 19/udp @tab chargen is good for testing purposes
1019@item ftp @tab 21/tcp @tab ftp is the usual file transfer protocol
1020@item telnet @tab 23/tcp @tab telnet is the usual login facility
1021@item smtp @tab 25/tcp @tab smtp is the Simple Mail Transfer Protocol
1022@item finger @tab 79/tcp @tab finger tells you who is logged in
1023@item www @tab 80/tcp @tab www is the HyperText Transfer Protocol
1024@item pop2 @tab 109/tcp @tab pop2 is an older version of pop3
1025@item pop2 @tab 109/udp
1026@item pop3 @tab 110/tcp @tab pop3 is the Post Office Protocol
1027@item pop3 @tab 110/udp @tab pop3 is used for receiving email
1028@item nntp @tab 119/tcp @tab nntp is the USENET News Transfer Protocol
1029@item irc @tab 194/tcp @tab irc is the Internet Relay Chat
1030@item irc @tab 194/udp
1031@end multitable
1032@end ignore
1033
@smallexample
# /etc/services:
#
# Network services, Internet style
#
# Name    Number/Protocol  Alternate name  # Comments

echo        7/tcp
echo        7/udp
discard     9/tcp    sink null
discard     9/udp    sink null
daytime     13/tcp
daytime     13/udp
chargen     19/tcp   ttytst source
chargen     19/udp   ttytst source
ftp         21/tcp
telnet      23/tcp
smtp        25/tcp   mail
finger      79/tcp
www         80/tcp   http            # WorldWideWeb HTTP
www         80/udp                   # HyperText Transfer Protocol
pop-2       109/tcp  postoffice      # POP version 2
pop-2       109/udp
pop-3       110/tcp                  # POP version 3
pop-3       110/udp
nntp        119/tcp  readnews untp   # USENET News
irc         194/tcp                  # Internet Relay Chat
irc         194/udp
@dots{}
@end smallexample
1064
1065@cindex Linux
1066@cindex GNU/Linux
1067@cindex Microsoft Windows, networking
1068Here, you find a list of services that traditional Unix machines usually
1069support. If your GNU/Linux machine does not do so, it may be that these
1070services are switched off in some startup script. Systems running some
1071flavor of Microsoft Windows usually do @emph{not} support these services.
1072Nevertheless, it @emph{is} possible to do networking with @command{gawk} on
1073Microsoft
Windows.@footnote{Microsoft preferred to ignore the TCP/IP
1075family of protocols until 1995. Then came the rise of the Netscape browser
1076as a landmark ``killer application.'' Microsoft added TCP/IP support and
1077their own browser to Microsoft Windows 95 at the last minute. They even back-ported
1078their TCP/IP implementation to Microsoft Windows for Workgroups 3.11, but it was
1079a rather rudimentary and half-hearted implementation. Nevertheless,
1080the equivalent of @file{/etc/services} resides under
1081@file{C:\WINNT\system32\drivers\etc\services} on Microsoft Windows 2000.}
1082The first column of the file gives the name of the service, and
1083the second column gives a unique number and the protocol that one can use to connect to
1084this service.
1085The rest of the line is treated as a comment.
1086You see that some services (@samp{echo}) support TCP as
1087well as UDP.
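
Because the file has such a regular layout, @command{awk} itself can look
up a service. The following sketch prints the port number and protocol
recorded for one service name. The variable @code{Wanted} and the script
name in the usage line are our own choices; @code{Wanted} can be set on the
command line with @option{-v}:

@example
# Usage (assumed file name): gawk -v Wanted=daytime -f service.awk
BEGIN @{
  if (Wanted == "") Wanted = "daytime"
  while ((getline Line < "/etc/services") > 0) @{
    sub(/#.*/, "", Line)               # drop comments
    n = split(Line, Field)             # name, number/protocol, aliases
    if (n >= 2 && Field[1] == Wanted)
      print Field[1], Field[2]
  @}
  close("/etc/services")
@}
@end example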
1088
1089@node Interacting, Setting Up, Troubleshooting, Using Networking
1090@section Interacting with a Network Service
1091
The next program really interacts with a network service: it prints a
request into the special file and then reads the reply. It asks the
so-called @command{finger} service if a user of the machine is logged in. When
testing this program, try to change @samp{localhost} to
some other machine name in your local network:
1097
1098@c system if test ! -d eg ; then mkdir eg ; fi
1099@c system if test ! -d eg/network ; then mkdir eg/network ; fi
@example
@c file eg/network/fingerclient.awk
BEGIN @{
  NetService = "/inet/tcp/0/localhost/finger"
  print "@var{name}" |& NetService
  while ((NetService |& getline) > 0)
    print $0
  close(NetService)
@}
@c endfile
@end example
1111
1112After telling the service on the machine which user to look for,
1113the program repeatedly reads lines that come as a reply. When no more
1114lines are coming (because the service has closed the connection), the
1115program also closes the connection. Try replacing @code{"@var{name}"} with your
1116login name (or the name of someone else logged in). For a list
1117of all users currently logged in, replace @var{name} with an empty string
1118(@code{""}).
1119
1120@cindex Linux
1121@cindex GNU/Linux
1122The final @code{close} command could be safely deleted from
1123the above script, because the operating system closes any open connection
1124by default when a script reaches the end of execution. In order to avoid
1125portability problems, it is best to always close connections explicitly.
1126With the Linux kernel,
1127for example, proper closing results in flushing of buffers. Letting
1128the close happen by default may result in discarding buffers.
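
The client can also be made a little more general. The following sketch
takes the host and the user from variables set with @option{-v}
(@code{Host} and @code{User} are our own names, as is the script name in
the usage line). It also ends the query with CR-LF, as RFC 1288 specifies,
without changing @code{ORS} for ordinary output:

@example
# Usage (assumed file name):
#   gawk -v Host=localhost -v User=name -f finger2.awk
BEGIN @{
  if (Host == "") Host = "localhost"
  NetService = "/inet/tcp/0/" Host "/finger"
  printf "%s\r\n", User |& NetService    # an empty User lists everyone
  while ((NetService |& getline Line) > 0) @{
    sub(/\r$/, "", Line)                 # strip a trailing CR, if any
    print Line
  @}
  close(NetService)
@}
@end example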
1129
1130@ignore
1131@c Chuck comments that this seems out of place. He's right. I dunno
1132@c where to put it though.
1133@cindex @command{finger} utility
1134@cindex RFC 1288
1135In the early days of the Internet (up until about 1992), you could use
1136such a program to check if some user in another country was logged in on
1137a specific machine.
1138RFC 1288@footnote{@uref{http://www.cis.ohio-state.edu/htbin/rfc/rfc1288.html}}
1139provides the exact definition of the @command{finger} protocol.
1140Every contemporary Unix system also has a command named @command{finger},
1141which functions as a client for the protocol of the same name.
1142Still today, some people maintain simple information systems
1143with this ancient protocol. For example, by typing
1144@samp{finger quake@@seismo.unr.edu}
1145you get the latest @dfn{Earthquake Bulletin} for the state of Nevada.
1146
1147@cindex Earthquake Bulletin
@smallexample
$ finger quake@@seismo.unr.edu

[@dots{}]

DATE-(UTC)-TIME    LAT     LON      DEP   MAG    COMMENTS
yy/mm/dd hh:mm:ss  deg.    deg.     km

98/12/14 21:09:22  37.47N  116.30W   0.0  2.3Md  76.4 km S of WARM SPRINGS, NEVA
98/12/14 22:05:09  39.69N  120.41W  11.9  2.1Md  53.8 km WNW of RENO, NEVADA
98/12/15 14:14:19  38.04N  118.60W   2.0  2.3Md  51.0 km S of HAWTHORNE, NEVADA
98/12/17 01:49:02  36.06N  117.58W  13.9  3.0Md  74.9 km SE of LONE PINE, CALIFOR
98/12/17 05:39:26  39.95N  120.87W   6.2  2.6Md  101.6 km WNW of RENO, NEVADA
98/12/22 06:07:42  38.68N  119.82W   5.2  2.3Md  50.7 km S of CARSON CITY, NEVAD
@end smallexample
1163
1164@noindent
1165This output from @command{finger} contains the time, location, depth,
1166magnitude, and a short comment about
1167the earthquakes registered in that region during the last 10 days.
1168In many places today the use of such services is restricted
1169because most networks have firewalls and proxy servers between them
1170and the Internet. Most firewalls are programmed to not let
1171@command{finger} requests go beyond the local network.
1172
1173@cindex Coke machine
1174Another (ab)use of the @command{finger} protocol are several Coke machines
1175that are connected to the Internet. There is a short list of such
1176Coke machines.@footnote{@uref{http://ca.yahoo.com/Computers_and_Internet/Internet/Devices_Connected_to_the_Internet/Soda_Machines/}}
1177You can access them either from the command-line or with a simple
1178@command{gawk} script. They usually tell you about the different
1179flavors of Coke and beer available there. If you have an account there,
1180you can even order some drink this way.
1181@end ignore
1182
1183When looking at @file{/etc/services} you may have noticed that the
1184@samp{daytime} service is also available with @samp{udp}. In the earlier
1185example, change @samp{tcp} to @samp{udp},
1186and change @samp{finger} to @samp{daytime}.
1187After starting the modified program, you see the expected day and time message.
1188The program then hangs, because it waits for more lines coming from the
1189service. However, they never come. This behavior is a consequence of the
1190differences between TCP and UDP. When using UDP, neither party is
1191automatically informed about the other closing the connection.
1192Continuing to experiment this way reveals many other subtle
1193differences between TCP and UDP. To avoid such trouble, one should always
1194remember the advice Douglas E.@: Comer and David Stevens give in
Volume III of their series @cite{Internetworking With TCP/IP}
1196(page 14):
1197
1198@cindex TCP (Transmission Control Protocol), UDP and
1199@cindex UDP (User Datagram Protocol), TCP and
1200@cindex Internet, See networks
1201@quotation
1202When designing client-server applications, beginners are strongly
1203advised to use TCP because it provides reliable, connection-oriented
1204communication. Programs only use UDP if the application protocol handles
1205reliability, the application requires hardware broadcast or multicast,
1206or the application cannot tolerate virtual circuit overhead.
1207@end quotation
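
One way to avoid the hang described above is to read exactly one reply
instead of looping until an end of input that never arrives with UDP.
This is only a sketch. It assumes that the @samp{daytime} service answers
any datagram with a single line, and it would still block if the request
or the reply were lost:

@example
BEGIN @{
  NetService = "/inet/udp/0/localhost/daytime"
  print "" |& NetService       # any datagram prompts the service to reply
  NetService |& getline        # read one reply only; do not loop
  print $0
  close(NetService)
@}
@end example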
1208
1209@node Setting Up, Email, Interacting, Using Networking
1210@section Setting Up a Service
1211@c last comma is part of tertiary
1212@cindex networks, @command{gawk} and, service, establishing
1213@c last comma is part of tertiary
1214@cindex @command{gawk}, networking, service, establishing
1215The preceding programs behaved as clients that connect to a server somewhere
1216on the Internet and request a particular service. Now we set up such a
1217service to mimic the behavior of the @samp{daytime} service.
1218Such a server does not know in advance who is going to connect to it over
1219the network. Therefore, we cannot insert a name for the host to connect to
1220in our special @value{FN}.
1221
1222Start the following program in one window. Notice that the service does
1223not have the name @samp{daytime}, but the number @samp{8888}.
1224From looking at @file{/etc/services}, you know that names like @samp{daytime}
1225are just mnemonics for predetermined 16-bit integers.
1226Only the system administrator (@code{root}) could enter
1227our new service into @file{/etc/services} with an appropriate name.
1228Also notice that the service name has to be entered into a different field
1229of the special @value{FN} because we are setting up a server, not a client:
1230
1231@cindex @command{finger} utility
1232@cindex servers
@example
BEGIN @{
  print strftime() |& "/inet/tcp/8888/0/0"
  close("/inet/tcp/8888/0/0")
@}
@end example
1239
1240Now open another window on the same machine.
1241Copy the client program given as the first example
1242(@pxref{TCP Connecting, ,Establishing a TCP Connection})
1243to a new file and edit it, changing the name @samp{daytime} to
1244@samp{8888}. Then start the modified client. You should get a reply
1245like this:
1246
@example
Sat Sep 27 19:08:16 CEST 1997
@end example
1250
1251@noindent
1252Both programs explicitly close the connection.
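
As written, the server answers one client and then exits. A sketch of a
variation that keeps serving follows. It relies on the special file being
reopened, and hence a new client being awaited, each time it is used after
a @code{close}. Stop it with an interrupt (usually @kbd{Ctrl-C}):

@example
BEGIN @{
  Service = "/inet/tcp/8888/0/0"
  while (1) @{
    print strftime() |& Service    # blocks until a client connects
    close(Service)                 # hang up so the next client can connect
  @}
@}
@end example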
1253
1254@c first comma is part of primary
1255@cindex Microsoft Windows, networking, ports
1256@cindex networks, ports, reserved
1257@cindex Unix, network ports and
1258Now we will intentionally make a mistake to see what happens when the name
1259@samp{8888} (the so-called port) is already used by another service.
1260Start the server
1261program in both windows. The first one works, but the second one
1262complains that it could not open the connection. Each port on a single
1263machine can only be used by one server program at a time. Now terminate the
1264server program and change the name @samp{8888} to @samp{echo}. After restarting it,
1265the server program does not run any more, and you know why: there is already
1266an @samp{echo} service running on your machine. But even if this isn't true,
1267you would not get
1268your own @samp{echo} server running on a Unix machine,
1269because the ports with numbers smaller
1270than 1024 (@samp{echo} is at port 7) are reserved for @code{root}.
1271On machines running some flavor of Microsoft Windows, there is no restriction
that reserves ports 1 to 1023 for a privileged user; hence, you can start
1273an @samp{echo} server there.
1274
1275Turning this short server program into something really useful is simple.
1276Imagine a server that first reads a @value{FN} from the client through the
1277network connection, then does something with the file and
1278sends a result back to the client. The server-side processing
1279could be:
1280
@example
BEGIN @{
  NetService = "/inet/tcp/8888/0/0"
  NetService |& getline         # sets $0 and the fields
  CatPipe    = ("cat " $1)
  while ((CatPipe | getline) > 0)
    print $0 |& NetService
  close(NetService)
@}
@end example
1291
1292@noindent
1293and we would
1294have a remote copying facility. Such a server reads the name of a file
1295from any client that connects to it and transmits the contents of the
1296named file across the net. The server-side processing could also be
1297the execution of a command that is transmitted across the network. From this
1298example, you can see how simple it is to open up a security hole on your
1299machine. If you allow clients to connect to your machine and
1300execute arbitrary commands, anyone would be free to do @samp{rm -rf *}.
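
A less dangerous variant avoids the shell altogether and accepts only plain
file names in the current directory. The following is a sketch, not a
hardened server; the pattern that defines an acceptable name and the reply
@samp{refused} are our own choices:

@example
BEGIN @{
  NetService = "/inet/tcp/8888/0/0"
  NetService |& getline Name
  sub(/\r$/, "", Name)                       # tolerate CR-LF line ends
  # letters, digits, "_", "." and "-" only; no "/" and no leading dot
  if (Name ~ /^[A-Za-z0-9_.-]+$/ && Name !~ /^\./) @{
    while ((getline Line < Name) > 0)        # read the file directly
      print Line |& NetService
    close(Name)
  @} else
    print "refused" |& NetService
  close(NetService)
@}
@end example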
1301
1302@node Email, Web page, Setting Up, Using Networking
1303@section Reading Email
1304@c @cindex RFC 1939
1305@c @cindex RFC 821
1306@cindex @command{gawk}, networking, See Also email
1307@cindex networks, @command{gawk} and, See Also email
1308@cindex POP (Post Office Protocol)
1309@cindex SMTP (Simple Mail Transfer Protocol)
1310@cindex Post Office Protocol (POP)
1311@cindex Simple Mail Transfer Protocol (SMTP)
1312The distribution of email is usually done by dedicated email servers that
1313communicate with your machine using special protocols. To receive email, we
1314will use the Post Office Protocol (POP). Sending can be done with the much
1315older Simple Mail Transfer Protocol (SMTP).
1316@ignore
1317@footnote{RFC 1939 defines POP.
1318RFC 821 defines SMTP. See
1319@uref{http://rfc.fh-koeln.de/doc/rfc/html/rfc.html, RFCs in HTML}.}
1320@end ignore
1321
1322@cindex email
1323When you type in the following program, replace the @var{emailhost} by the
1324name of your local email server. Ask your administrator if the server has a
1325POP service, and then use its name or number in the program below.
1326Now the program is ready to connect to your email server, but it will not
1327succeed in retrieving your mail because it does not yet know your login
1328name or password. Replace them in the program and it
1329shows you the first email the server has in store:
1330
@example
BEGIN @{
  POPService  = "/inet/tcp/0/@var{emailhost}/pop3"
  RS = ORS = "\r\n"
  print "user @var{name}"            |& POPService
  POPService                         |& getline
  print "pass @var{password}"        |& POPService
  POPService                         |& getline
  print "retr 1"                     |& POPService
  POPService                         |& getline
  if ($1 != "+OK") exit
  print "quit"                       |& POPService
  RS = "\r\n\\.\r\n"
  POPService |& getline
  print $0
  close(POPService)
@}
@end example
1349
1350@c @cindex RFC 1939
1351@cindex record separators, POP and
1352@cindex @code{RS} variable, POP and
1353@cindex @code{ORS} variable, POP and
1354@cindex POP (Post Office Protocol)
1355The record separators @code{RS} and @code{ORS} are redefined because the
1356protocol (POP) requires CR-LF to separate lines. After identifying
1357yourself to the email service, the command @samp{retr 1} instructs the
1358service to send the first of all your email messages in line. If the service
1359replies with something other than @samp{+OK}, the program exits; maybe there
1360is no email. Otherwise, the program first announces that it intends to finish
1361reading email, and then redefines @code{RS} in order to read the entire
1362email as multiline input in one record. From the POP RFC, we know that the body
1363of the email always ends with a single line containing a single dot.
1364The program looks for this using @samp{RS = "\r\n\\.\r\n"}.
1365When it finds this sequence in the mail message, it quits.
1366You can invoke this program as often as you like; it does not delete the
1367message it reads, but instead leaves it on the server.
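
Before retrieving anything, it can be useful to ask how much mail is
waiting. The following sketch uses the @samp{stat} command, whose positive
reply carries the number of messages and their total size. Unlike the
program above, it reads the server's greeting explicitly before sending the
first command; @var{emailhost}, @var{name}, and @var{password} are
placeholders as before:

@example
BEGIN @{
  POPService = "/inet/tcp/0/@var{emailhost}/pop3"
  RS = ORS = "\r\n"
  POPService |& getline                    # greeting
  print "user @var{name}"     |& POPService
  POPService |& getline
  print "pass @var{password}" |& POPService
  POPService |& getline
  if ($1 != "+OK") exit                    # login refused
  print "stat" |& POPService
  POPService |& getline                    # "+OK count size"
  if ($1 == "+OK")
    printf "%d messages, %d bytes\n", $2, $3
  print "quit" |& POPService
  close(POPService)
@}
@end example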
1368
1369@node Web page, Primitive Service, Email, Using Networking
1370@section Reading a Web Page
1371@cindex web pages
1372@cindex HTTP (Hypertext Transfer Protocol)
1373@cindex Hypertext Transfer Protocol, See HTTP
1374@c @cindex RFC 2068
1375@c @cindex RFC 2616
1376
1377Retrieving a web page from a web server is as simple as
1378retrieving email from an email server. We only have to use a
1379similar, but not identical, protocol and a different port. The name of the
1380protocol is HyperText Transfer Protocol (HTTP) and the port number is usually
138180. As in the preceding @value{SECTION}, ask your administrator about the
1382name of your local web server or proxy web server and its port number
1383for HTTP requests.
1384
1385@ignore
1386@c Chuck says this stuff isn't necessary
1387More detailed information about HTTP can be found at
1388the home of the web protocols,@footnote{@uref{http://www.w3.org/pub/WWW/Protocols}}
1389including the specification of HTTP in RFC 2068. The protocol specification
1390in RFC 2068 is concise and you can get it for free. If you need more
1391explanation and you are willing to pay for a book, you might be
1392interested in one of these books:
1393
1394@enumerate
1395
1396@item
1397When we started writing web clients and servers with @command{gawk},
1398the only book available with details about HTTP was the one by Paul Hethmon
1399called
1400@cite{Illustrated Guide to HTTP}.@footnote{@uref{http://www.browsebooks.com/Hethmon/?882}}
1401Hethmon not only describes HTTP,
1402he also implements a simple web server in C++.
1403
1404@item
1405Since July 2000, O'Reilly offers the book by Clinton Wong called
1406@cite{HTTP Pocket Reference}.@footnote{@uref{http://www.oreilly.com/catalog/httppr}}
1407It only has 75 pages but its
1408focus definitely is HTTP. This pocket reference is not a replacement
1409for the RFC, but I wish I had had it back in 1997 when I started writing
1410scripts to handle HTTP.
1411
1412@item
1413Another small booklet about HTTP is the one by Toexcell Incorporated Staff,
1414ISBN 1-58348-270-9, called
1415@cite{Hypertext Transfer Protocol Http 1.0 Specifications}
1416
1417@end enumerate
1418@end ignore
1419
1420The following program employs a rather crude approach toward retrieving a
1421web page. It uses the prehistoric syntax of HTTP 0.9, which almost all
1422web servers still support. The most noticeable thing about it is that the
1423program directs the request to the local proxy server whose name you insert
1424in the special @value{FN} (which in turn calls @samp{www.yahoo.com}):
1425
@example
BEGIN @{
  RS = ORS = "\r\n"
  HttpService = "/inet/tcp/0/@var{proxy}/80"
  print "GET http://www.yahoo.com"     |& HttpService
  while ((HttpService |& getline) > 0)
     print $0
  close(HttpService)
@}
@end example
1436
1437@c @cindex RFC 1945
1438@cindex record separators, HTTP and
1439@cindex @code{RS} variable, HTTP and
1440@cindex @code{ORS} variable, HTTP and
1441@cindex HTTP (Hypertext Transfer Protocol), record separators and
1442@cindex HTML (Hypertext Markup Language)
1443@cindex Hypertext Markup Language (HTML)
1444Again, lines are separated by a redefined @code{RS} and @code{ORS}.
1445The @code{GET} request that we send to the server is the only kind of
1446HTTP request that existed when the web was created in the early 1990s.
1447HTTP calls this @code{GET} request a ``method,'' which tells the
1448service to transmit a web page (here the home page of the Yahoo! search
1449engine). Version 1.0 added the request methods @code{HEAD} and
1450@code{POST}. The current version of HTTP is 1.1,@footnote{Version 1.0 of
1451HTTP was defined in RFC 1945. HTTP 1.1 was initially specified in RFC
14522068. In June 1999, RFC 2068 was made obsolete by RFC 2616, an update
1453without any substantial changes.} and knows the additional request
1454methods @code{OPTIONS}, @code{PUT}, @code{DELETE}, and @code{TRACE}.
1455You can fill in any valid web address, and the program prints the
1456HTML code of that page to your screen.
1457
1458Notice the similarity between the responses of the POP and HTTP
1459services. First, you get a header that is terminated by an empty line, and
1460then you get the body of the page in HTML. The lines of the headers also
1461have the same form as in POP. There is the name of a parameter,
1462then a colon, and finally the value of that parameter.
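
The header format can be made visible with a request in the HTTP 1.0 style
sent directly to a web server. The following sketch prints the status line
and then each header as a name and a value, stopping at the empty line that
precedes the body. The host @samp{www.example.org} is a placeholder, and the
names @code{Status}, @code{Line}, @code{Name}, and @code{Value} are ours:

@example
BEGIN @{
  RS = ORS = "\r\n"
  HttpService = "/inet/tcp/0/www.example.org/80"
  print "GET / HTTP/1.0"        |& HttpService
  print "Host: www.example.org" |& HttpService
  print ""                      |& HttpService   # empty line ends the request
  ORS = "\n"                                     # plain newlines for the screen
  HttpService |& getline Status
  print "status: " Status
  while ((HttpService |& getline Line) > 0 && Line != "") @{
    Colon = index(Line, ":")
    Name  = substr(Line, 1, Colon - 1)
    Value = substr(Line, Colon + 1)
    sub(/^[ \t]+/, "", Value)                    # drop blanks after the colon
    print Name " = " Value
  @}
  close(HttpService)
@}
@end example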
1463
1464@cindex CGI (Common Gateway Interface), dynamic web pages and
1465@cindex Common Gateway Interface, See CGI
1466@cindex GIF image format
1467@cindex PNG image format
1468@cindex images, retrieving over networks
1469Images (@file{.png} or @file{.gif} files) can also be retrieved this way,
1470but then you
1471get binary data that should be redirected into a file. Another
1472application is calling a CGI (Common Gateway Interface) script on some
1473server. CGI scripts are used when the contents of a web page are not
1474constant, but generated instantly at the moment you send a request
1475for the page. For example, to get a detailed report about the current
1476quotes of Motorola stock shares, call a CGI script at Yahoo! with
1477the following:
1478
@example
get = "GET http://quote.yahoo.com/q?s=MOT&d=t"
print get |& HttpService
@end example
1483
1484You can also request weather reports this way.
1485@ignore
1486@cindex Boutell, Thomas
1487A good book to go on with is
1488the
1489@cite{HTML Source Book}.@footnote{@uref{http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/book.html}}
1490There are also some books on CGI programming
1491like @cite{CGI Programming in C & Perl},
1492by Thomas Boutell@footnote{@uref{http://cseng.aw.com/bookdetail.qry?ISBN=0-201-42219-0&ptype=0}},
1493and @cite{The CGI Book}.@footnote{@uref{http://www.cgibook.com}}
Another good source is @cite{The CGI Resource Index}.@footnote{@uref{http://www.cgi-resources.com}}
1495@end ignore
1496
1497@node Primitive Service, Interacting Service, Web page, Using Networking
1498@section A Primitive Web Service
1499@c STARTOFRANGE webser
1500@cindex web service
1501Now we know enough about HTTP to set up a primitive web service that just
1502says @code{"Hello, world"} when someone connects to it with a browser.
Compared
to the situation in the preceding @value{SECTION}, our program switches roles: it
tries to behave just like the server we have observed. Since we are setting
1506up a server here, we have to insert the port number in the @samp{localport}
1507field of the special @value{FN}. The other two fields (@var{hostname} and
1508@var{remoteport}) have to contain a @samp{0} because we do not know in
1509advance which host will connect to our service.
1510
1511In the early 1990s, all a server had to do was send an HTML document and
1512close the connection. Here, we adhere to the modern syntax of HTTP.
1513The steps are as follows:
1514
1515@enumerate 1
1516@item
1517Send a status line telling the web browser that everything
1518is okay.
1519
1520@item
1521Send a line to tell the browser how many bytes follow in the
1522body of the message. This was not necessary earlier because both
1523parties knew that the document ended when the connection closed. Nowadays
1524it is possible to stay connected after the transmission of one web page.
1525This is to avoid the network traffic necessary for repeatedly establishing
1526TCP connections for requesting several images. Thus, there is the need to tell
1527the receiving party how many bytes will be sent. The header is terminated
1528as usual with an empty line.
1529
@item
Send the @code{"Hello, world"} body
in HTML.
@end enumerate

The @code{while} loop at the end of the program merely swallows the
browser's request.
We could actually omit the loop, and on most machines the program would still
work.
First, start the following program:
1538
1539@example
1540@c file eg/network/hello-serv.awk
BEGIN @{
  RS = ORS = "\r\n"
  HttpService = "/inet/tcp/8080/0/0"
  Hello = "<HTML><HEAD>" \
          "<TITLE>A Famous Greeting</TITLE></HEAD>" \
          "<BODY><H1>Hello, world</H1></BODY></HTML>"
  Len = length(Hello) + length(ORS)
  print "HTTP/1.0 200 OK"          |& HttpService
  print "Content-Length: " Len ORS |& HttpService
  print Hello                      |& HttpService
  while ((HttpService |& getline) > 0)
     continue;
  close(HttpService)
@}
1555@c endfile
1556@end example
1557
1558Now, on the same machine, start your favorite browser and let it point to
1559@uref{http://localhost:8080} (the browser needs to know on which port
1560our server is listening for requests). If this does not work, the browser
1561probably tries to connect to a proxy server that does not know your machine.
1562If so, change the browser's configuration so that the browser does not try to
1563use a proxy to connect to your machine.
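
If no browser is at hand, a second @command{gawk} process can play the
client. The following is only a sketch. It assumes that the server shown
above is already running on the same machine and listening on port 8080,
and it relies on that server sending its body as a single record. It sends
a minimal request, prints the header lines up to the empty line, reads one
more record as the body, and then closes the connection, which lets the
server's @code{while} loop finish:

@example
BEGIN @{
  RS = "\r\n"                             # HTTP lines end this way
  Server = "/inet/tcp/0/localhost/8080"   # assumed host and port
  printf "GET / HTTP/1.0\r\n\r\n" |& Server
  # header lines, terminated by an empty line
  while ((Server |& getline line) > 0 && line != "")
    print "header: " line
  # the hello server sends its body as one record
  if ((Server |& getline line) > 0)
    print "body:   " line
  close(Server)
@}
@end example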
1564
1565@node Interacting Service, Simple Server, Primitive Service, Using Networking
1566@section A Web Service with Interaction
1567@cindex @command{gawk}, web and, See web service
1568@cindex web browsers, See web service
1569@c comma is part of primary
1570@cindex HTTP server, core logic
1571@cindex servers, HTTP
1572@ifinfo
1573This node shows how to set up a simple web server.
1574The subnode is a library file that we will use with all the examples in
1575@ref{Some Applications and Techniques}.
1576@end ifinfo
1577
1578@menu
1579* CGI Lib:: A simple CGI library.
1580@end menu
1581
1582Setting up a web service that allows user interaction is more difficult and
1583shows us the limits of network access in @command{gawk}. In this @value{SECTION},
1584we develop a main program (a @code{BEGIN} pattern and its action)
1585that will become the core of event-driven execution controlled by a
1586graphical user interface (GUI).
1587Each HTTP event that the user triggers by some action within the browser
1588is received in this central procedure. Parameters and menu choices are
1589extracted from this request, and an appropriate measure is taken according to
1590the user's choice.
1591For example:
1592
1593@cindex HTTP server, core logic
1594@example
1595BEGIN @{
1596 if (MyHost == "") @{
1597 "uname -n" | getline MyHost
1598 close("uname -n")
1599 @}
1600 if (MyPort == 0) MyPort = 8080
1601 HttpService = "/inet/tcp/" MyPort "/0/0"
1602 MyPrefix = "http://" MyHost ":" MyPort
1603 SetUpServer()
1604 while ("awk" != "complex") @{
1605 # header lines are terminated this way
1606 RS = ORS = "\r\n"
1607 Status = 200 # this means OK
1608 Reason = "OK"
1609 Header = TopHeader
1610 Document = TopDoc
1611 Footer = TopFooter
1612 if (GETARG["Method"] == "GET") @{
1613 HandleGET()
1614 @} else if (GETARG["Method"] == "HEAD") @{
1615 # not yet implemented
1616 @} else if (GETARG["Method"] != "") @{
1617 print "bad method", GETARG["Method"]
1618 @}
1619 Prompt = Header Document Footer
1620 print "HTTP/1.0", Status, Reason |& HttpService
1621 print "Connection: Close" |& HttpService
1622 print "Pragma: no-cache" |& HttpService
1623 len = length(Prompt) + length(ORS)
1624 print "Content-length:", len |& HttpService
1625 print ORS Prompt |& HttpService
1626 # ignore all the header lines
1627 while ((HttpService |& getline) > 0)
1628 ;
1629 # stop talking to this client
1630 close(HttpService)
1631 # wait for new client request
1632 HttpService |& getline
1633 # do some logging
1634 print systime(), strftime(), $0
1635 # read request parameters
1636 CGI_setup($1, $2, $3)
1637 @}
1638@}
1639@end example
1640
1641This web server presents menu choices in the form of HTML links.
1642Therefore, it has to tell the browser the name of the host it is
1643residing on. When starting the server, the user may supply the name
1644of the host from the command line with @samp{gawk -v MyHost="Rumpelstilzchen"}.
1645If the user does not do this, the server looks up the name of the host it is
1646running on for later use as a web address in HTML documents. The same
1647applies to the port number. These values are inserted later into the
1648HTML content of the web pages to refer to the home system.
1649
1650Each server that is built around this core has to initialize some
1651application-dependent variables (such as the default home page) in a procedure
1652@code{SetUpServer}, which is called immediately before entering the
1653infinite loop of the server. For now, we will write an instance that
1654initiates a trivial interaction. With this home page, the client user
1655can click on two possible choices, and receive the current date either
1656in human-readable format or in seconds since 1970:
1657
1658@example
1659function SetUpServer() @{
1660 TopHeader = "<HTML><HEAD>"
1661 TopHeader = TopHeader \
1662 "<title>My name is GAWK, GNU AWK</title></HEAD>"
1663 TopDoc = "<BODY><h2>\
1664 Do you prefer your date <A HREF=" MyPrefix \
1665 "/human>human</A> or \
1666 <A HREF=" MyPrefix "/POSIX>POSIXed</A>?</h2>" ORS ORS
1667 TopFooter = "</BODY></HTML>"
1668@}
1669@end example
1670
1671On the first run through the main loop, the default line terminators are
1672set and the default home page is copied to the actual home page. Since this
1673is the first run, @code{GETARG["Method"]} is not initialized yet, hence the
1674case selection over the method does nothing. Now that the home page is
initialized, the server can start communicating with a client browser.
1676
1677@c @cindex RFC 2068
1678It does so by printing the HTTP header into the network connection
1679(@samp{print @dots{} |& HttpService}). This command blocks execution of
the server script until a client connects. If you compare this server
script with the primitive one we wrote before, you will notice
1682two additional lines in the header. The first instructs the browser
1683to close the connection after each request. The second tells the
1684browser that it should never try to @emph{remember} earlier requests
1685that had identical web addresses (no caching). Otherwise, it could happen
1686that the browser retrieves the time of day in the previous example just once,
1687and later it takes the web page from the cache, always displaying the same
1688time of day although time advances each second.
1689
Having supplied the initial home page to the browser as a valid document
stored in the variable @code{Prompt}, the server closes the connection and waits
for the next request. When the request comes, a log line is printed that
1693allows us to see which request the server receives. The final step in the
1694loop is to call the function @code{CGI_setup}, which reads all the lines
1695of the request (coming from the browser), processes them, and stores the
1696transmitted parameters in the array @code{PARAM}. The complete
1697text of these application-independent functions can be found in
1698@ref{CGI Lib, ,A Simple CGI Library}.
1699For now, we use a simplified version of @code{CGI_setup}:
1700
1701@example
function CGI_setup(method, uri, version,   i, j) @{
  delete GETARG;         delete MENU;        delete PARAM
  GETARG["Method"]  = method
  GETARG["URI"]     = uri
  GETARG["Version"] = version
  i = index(uri, "?")
  # is there a "?" indicating a CGI request?
@group
  if (i > 0) @{
    split(substr(uri, 1, i-1), MENU, "[/:]")
    split(substr(uri, i+1), PARAM, "&")
    for (i in PARAM) @{
      j = index(PARAM[i], "=")
      GETARG[substr(PARAM[i], 1, j-1)] = \
                                  substr(PARAM[i], j+1)
    @}
  @} else @{    # there is no "?", no need for splitting PARAMs
    split(uri, MENU, "[/:]")
  @}
@end group
1722@}
1723@end example
1724
First, the function clears all variables used for
1726global storage of request parameters. The rest of the function serves
1727the purpose of filling the global parameters with the extracted new values.
1728To accomplish this, the name of the requested resource is split into
1729parts and stored for later evaluation. If the request contains a @samp{?},
1730then the request has CGI variables seamlessly appended to the web address.
1731Everything in front of the @samp{?} is split up into menu items, and
1732everything behind the @samp{?} is a list of @samp{@var{variable}=@var{value}} pairs
1733(separated by @samp{&}) that also need splitting. This way, CGI variables are
1734isolated and stored. This procedure lacks recognition of special characters
1735that are transmitted in coded form@footnote{As defined in RFC 2068.}. Here, any
1736optional request header and body parts are ignored. We do not need
1737header parameters and the request body. However, when refining our approach or
1738working with the @code{POST} and @code{PUT} methods, reading the header
1739and body
becomes unavoidable. Header parameters, as well as the body, should then be
stored in a global array.
1742
1743On each subsequent run through the main loop, one request from a browser is
1744received, evaluated, and answered according to the user's choice. This can be
1745done by letting the value of the HTTP method guide the main loop into
1746execution of the procedure @code{HandleGET}, which evaluates the user's
1747choice. In this case, we have only one hierarchical level of menus,
1748but in the general case,
1749menus are nested.
1750The menu choices at each level are
1751separated by @samp{/}, just as in @value{FN}s. Notice how simple it is to
1752construct menus of arbitrary depth:
1753
1754@example
1755function HandleGET() @{
1756 if ( MENU[2] == "human") @{
1757 Footer = strftime() TopFooter
1758 @} else if (MENU[2] == "POSIX") @{
1759 Footer = systime() TopFooter
1760 @}
1761@}
1762@end example
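
@code{HandleGET} tests @code{MENU[2]} rather than @code{MENU[1]} because a
request path begins with @samp{/}, so the first element produced by
@code{split} is empty. The following sketch uses a made-up path with two
menu levels to show where each level ends up. A handler for nested menus
would test @code{MENU[3]} inside the branch chosen by @code{MENU[2]}:

@example
BEGIN @{
  # "/level1/level2" is a hypothetical request path
  n = split("/level1/level2", MENU, "[/:]")
  for (i = 1; i <= n; i++)
    printf "MENU[%d] = \"%s\"\n", i, MENU[i]
@}
@end example

@noindent
This prints:

@example
@print{} MENU[1] = ""
@print{} MENU[2] = "level1"
@print{} MENU[3] = "level2"
@end example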
1763
1764The disadvantage of this approach is that our server is slow and can
1765handle only one request at a time. Its main advantage, however, is that
1766the server
1767consists of just one @command{gawk} program. No need for installing an
1768@command{httpd}, and no need for static separate HTML files, CGI scripts, or
1769@code{root} privileges. This is rapid prototyping.
1770This program can be started on the same host that runs your browser.
1771Then let your browser point to @uref{http://localhost:8080}.
1772
1773@cindex XBM image format
1774@cindex images, in web pages
1775@cindex web pages, images in
1776@cindex GNUPlot utility
It is also possible to include images in the HTML pages.
Most browsers support the little-known
1779@file{.xbm} format,
1780which may contain only
1781monochrome pictures but is an ASCII format. Binary images are possible but
1782not so easy to handle. Another way of including images is to generate them
1783with a tool such as GNUPlot,
1784by calling the tool with the @code{system} function or through a pipe.
1785
1786@node CGI Lib, , Interacting Service, Interacting Service
1787@subsection A Simple CGI Library
1788@quotation
1789@i{HTTP is like being married: you have to be able to handle whatever
1790you're given, while being very careful what you send back.}@*
1791Phil Smith III,@*
1792@uref{http://www.netfunny.com/rhf/jokes/99/Mar/http.html}
1793@end quotation
1794
1795@c STARTOFRANGE cgilib
1796@cindex CGI (Common Gateway Interface), library
1797In @ref{Interacting Service, ,A Web Service with Interaction},
1798we saw the function @code{CGI_setup} as part of the web server
1799``core logic'' framework. The code presented there handles almost
1800everything necessary for CGI requests.
1801One thing it doesn't do is handle encoded characters in the requests.
1802For example, an @samp{&} is encoded as a percent sign followed by
1803the hexadecimal value: @samp{%26}. These encoded values should be
1804decoded.
1805Following is a simple library to perform these tasks.
1806This code is used for all web server examples
1807used throughout the rest of this @value{DOCUMENT}.
If you want to use it for your own web server, store the source code
in a file named @file{inetlib.awk}. Then you can include
these functions in your code by placing the following statement
on the first line of your script:
1813
1814@example
1815@@include inetlib.awk
1816@end example
1817
1818@noindent
1819But beware, this mechanism is
1820only possible if you invoke your web server script with @command{igawk}
1821instead of the usual @command{awk} or @command{gawk}.
1822Here is the code:
1823
1824@example
1825@c file eg/network/coreserv.awk
1826# CGI Library and core of a web server
1827@c endfile
1828@ignore
1829@c file eg/network/coreserv.awk
1830#
1831# Juergen Kahrs, Juergen.Kahrs@@vr-web.de
1832# with Arnold Robbins, arnold@@gnu.org
1833# September 2000
1834
1835@c endfile
1836@end ignore
1837@c file eg/network/coreserv.awk
1838# Global arrays
1839# GETARG --- arguments to CGI GET command
1840# MENU --- menu items (path names)
1841# PARAM --- parameters of form x=y
1842
1843# Optional variable MyHost contains host address
1844# Optional variable MyPort contains port number
1845# Needs TopHeader, TopDoc, TopFooter
1846# Sets MyPrefix, HttpService, Status, Reason
1847
1848BEGIN @{
1849 if (MyHost == "") @{
1850 "uname -n" | getline MyHost
1851 close("uname -n")
1852 @}
1853 if (MyPort == 0) MyPort = 8080
1854 HttpService = "/inet/tcp/" MyPort "/0/0"
1855 MyPrefix = "http://" MyHost ":" MyPort
1856 SetUpServer()
1857 while ("awk" != "complex") @{
1858 # header lines are terminated this way
1859 RS = ORS = "\r\n"
1860 Status = 200 # this means OK
1861 Reason = "OK"
1862 Header = TopHeader
1863 Document = TopDoc
1864 Footer = TopFooter
1865 if (GETARG["Method"] == "GET") @{
1866 HandleGET()
1867 @} else if (GETARG["Method"] == "HEAD") @{
1868 # not yet implemented
1869 @} else if (GETARG["Method"] != "") @{
1870 print "bad method", GETARG["Method"]
1871 @}
1872 Prompt = Header Document Footer
1873 print "HTTP/1.0", Status, Reason |& HttpService
1874 print "Connection: Close" |& HttpService
1875 print "Pragma: no-cache" |& HttpService
1876 len = length(Prompt) + length(ORS)
1877 print "Content-length:", len |& HttpService
1878 print ORS Prompt |& HttpService
1879 # ignore all the header lines
1880 while ((HttpService |& getline) > 0)
1881 continue
1882 # stop talking to this client
1883 close(HttpService)
1884 # wait for new client request
1885 HttpService |& getline
1886 # do some logging
1887 print systime(), strftime(), $0
1888 CGI_setup($1, $2, $3)
1889 @}
1890@}
1891
function CGI_setup(   method, uri, version, i, j)
1893@{
1894 delete GETARG
1895 delete MENU
1896 delete PARAM
1897 GETARG["Method"] = method
1898 GETARG["URI"] = uri
1899 GETARG["Version"] = version
1900
1901 i = index(uri, "?")
1902 if (i > 0) @{ # is there a "?" indicating a CGI request?
1903 split(substr(uri, 1, i-1), MENU, "[/:]")
1904 split(substr(uri, i+1), PARAM, "&")
1905 for (i in PARAM) @{
1906 PARAM[i] = _CGI_decode(PARAM[i])
1907 j = index(PARAM[i], "=")
1908 GETARG[substr(PARAM[i], 1, j-1)] = \
1909 substr(PARAM[i], j+1)
1910 @}
1911 @} else @{ # there is no "?", no need for splitting PARAMs
1912 split(uri, MENU, "[/:]")
1913 @}
  for (i in MENU)     # decode characters in path
    if (i + 0 > 4)    # but not those in host name
      MENU[i] = _CGI_decode(MENU[i])
1917@}
1918@c endfile
1919@end example
1920
1921This isolates details in a single function, @code{CGI_setup}.
1922Decoding of encoded characters is pushed off to a helper function,
1923@code{_CGI_decode}. The use of the leading underscore (@samp{_}) in
1924the function name is intended to indicate that it is an ``internal''
1925function, although there is nothing to enforce this:
1926
1927@example
1928@c file eg/network/coreserv.awk
1929function _CGI_decode(str, hexdigs, i, pre, code1, code2,
1930 val, result)
1931@{
1932 hexdigs = "123456789abcdef"
1933
1934 i = index(str, "%")
1935 if (i == 0) # no work to do
1936 return str
1937
1938 do @{
1939 pre = substr(str, 1, i-1) # part before %xx
1940 code1 = substr(str, i+1, 1) # first hex digit
1941 code2 = substr(str, i+2, 1) # second hex digit
1942 str = substr(str, i+3) # rest of string
1943
1944 code1 = tolower(code1)
1945 code2 = tolower(code2)
1946 val = index(hexdigs, code1) * 16 \
1947 + index(hexdigs, code2)
1948
1949 result = result pre sprintf("%c", val)
1950 i = index(str, "%")
1951 @} while (i != 0)
1952 if (length(str) > 0)
1953 result = result str
1954 return result
1955@}
1956@c endfile
1957@end example
1958
1959This works by splitting the string apart around an encoded character.
1960The two digits are converted to lowercase characters and looked up in a string
1961of hex digits. Note that @code{0} is not in the string on purpose;
1962@code{index} returns zero when it's not found, automatically giving
1963the correct value! Once the hexadecimal value is converted from
1964characters in a string into a numerical value, @code{sprintf}
1965converts the value back into a real character.
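
For instance, the arithmetic for @samp{%26} can be worked out by hand.
The following small sketch is not part of the library; it only repeats
the lookup for the two digits @samp{2} and @samp{6}:

@example
BEGIN @{
  hexdigs = "123456789abcdef"
  # "2" is at position 2 and "6" at position 6 of hexdigs
  val = index(hexdigs, "2") * 16 + index(hexdigs, "6")
  printf "%d %c\n", val, val
@}
@print{} 38 &
@end example
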
1966The following is a simple test harness for the above functions:
1967
1968@example
1969@c file eg/network/testserv.awk
1970BEGIN @{
1971 CGI_setup("GET",
1972 "http://www.gnu.org/cgi-bin/foo?p1=stuff&p2=stuff%26junk" \
1973 "&percent=a %25 sign",
1974 "1.0")
1975 for (i in MENU)
1976 printf "MENU[\"%s\"] = %s\n", i, MENU[i]
1977 for (i in PARAM)
1978 printf "PARAM[\"%s\"] = %s\n", i, PARAM[i]
1979 for (i in GETARG)
1980 printf "GETARG[\"%s\"] = %s\n", i, GETARG[i]
1981@}
1982@c endfile
1983@end example
1984
1985And this is the result when we run it:
1986
1987@c artificial line wrap in last output line
1988@example
1989$ gawk -f testserv.awk
1990@print{} MENU["4"] = www.gnu.org
1991@print{} MENU["5"] = cgi-bin
1992@print{} MENU["6"] = foo
1993@print{} MENU["1"] = http
1994@print{} MENU["2"] =
1995@print{} MENU["3"] =
1996@print{} PARAM["1"] = p1=stuff
1997@print{} PARAM["2"] = p2=stuff&junk
1998@print{} PARAM["3"] = percent=a % sign
1999@print{} GETARG["p1"] = stuff
2000@print{} GETARG["percent"] = a % sign
2001@print{} GETARG["p2"] = stuff&junk
2002@print{} GETARG["Method"] = GET
2003@print{} GETARG["Version"] = 1.0
2004@print{} GETARG["URI"] = http://www.gnu.org/cgi-bin/foo?p1=stuff&
2005p2=stuff%26junk&percent=a %25 sign
2006@end example
2007
2008@node Simple Server, Caveats, Interacting Service, Using Networking
2009@section A Simple Web Server
2010@c STARTOFRANGE webserx
2011@cindex web servers
2012@c STARTOFRANGE serweb
2013@cindex servers, web
2014In the preceding @value{SECTION}, we built the core logic for event-driven GUIs.
2015In this @value{SECTION}, we finally extend the core to a real application.
2016No one would actually write a commercial web server in @command{gawk}, but
2017it is instructive to see that it is feasible in principle.
2018
2019@cindex ELIZA program
2020@cindex Weizenbaum, Joseph
2021The application is ELIZA, the famous program by Joseph Weizenbaum that
2022mimics the behavior of a professional psychotherapist when talking to you.
2023Weizenbaum would certainly object to this description, but this is part of
2024the legend around ELIZA.
2025Take the site-independent core logic and append the following code:
2026
2027@example
2028@c file eg/network/eliza.awk
2029function SetUpServer() @{
2030 SetUpEliza()
2031 TopHeader = \
2032 "<HTML><title>An HTTP-based System with GAWK</title>\
2033 <HEAD><META HTTP-EQUIV=\"Content-Type\"\
2034 CONTENT=\"text/html; charset=iso-8859-1\"></HEAD>\
2035 <BODY BGCOLOR=\"#ffffff\" TEXT=\"#000000\"\
2036 LINK=\"#0000ff\" VLINK=\"#0000ff\"\
2037 ALINK=\"#0000ff\"> <A NAME=\"top\">"
2038 TopDoc = "\
2039 <h2>Please choose one of the following actions:</h2>\
2040 <UL>\
2041 <LI>\
2042 <A HREF=" MyPrefix "/AboutServer>About this server</A>\
2043 </LI><LI>\
2044 <A HREF=" MyPrefix "/AboutELIZA>About Eliza</A></LI>\
2045 <LI>\
2046 <A HREF=" MyPrefix \
2047 "/StartELIZA>Start talking to Eliza</A></LI></UL>"
2048 TopFooter = "</BODY></HTML>"
2049@}
2050@c endfile
2051@end example
2052
2053@code{SetUpServer} is similar to the previous example,
2054except for calling another function, @code{SetUpEliza}.
2055This approach can be used to implement other kinds of servers.
2056The only changes needed to do so are hidden in the functions
2057@code{SetUpServer} and @code{HandleGET}. Perhaps it might be necessary to
2058implement other HTTP methods.
2059The @command{igawk} program that comes with @command{gawk}
2060may be useful for this process.
2061
2062When extending this example to a complete application, the first
2063thing to do is to implement the function @code{SetUpServer} to
2064initialize the HTML pages and some variables. These initializations
2065determine the way your HTML pages look (colors, titles, menu
2066items, etc.).
2067
2068The function @code{HandleGET} is a nested case selection that decides
2069which page the user wants to see next. Each nesting level refers to a menu
2070level of the GUI. Each case implements a certain action of the menu. On the
2071deepest level of case selection, the handler essentially knows what the
2072user wants and stores the answer into the variable that holds the HTML
2073page contents:
2074
2075@smallexample
2076@c file eg/network/eliza.awk
2077function HandleGET() @{
2078 # A real HTTP server would treat some parts of the URI as a file name.
2079 # We take parts of the URI as menu choices and go on accordingly.
2080 if(MENU[2] == "AboutServer") @{
2081 Document = "This is not a CGI script.\
2082 This is an httpd, an HTML file, and a CGI script all \
2083 in one GAWK script. It needs no separate www-server, \
2084 no installation, and no root privileges.\
2085 <p>To run it, do this:</p><ul>\
2086 <li> start this script with \"gawk -f httpserver.awk\",</li>\
2087 <li> and on the same host let your www browser open location\
2088 \"http://localhost:8080\"</li>\
    </ul><p>Details of HTTP come from:</p><ul>\
          <li>Hethmon:  Illustrated Guide to HTTP</li>\
2091 <li>RFC 2068</li></ul><p>JK 14.9.1997</p>"
2092 @} else if (MENU[2] == "AboutELIZA") @{
2093 Document = "This is an implementation of the famous ELIZA\
        program by Joseph Weizenbaum. It is written in GAWK."
2096 @} else if (MENU[2] == "StartELIZA") @{
2097 gsub(/\+/, " ", GETARG["YouSay"])
2098 # Here we also have to substitute coded special characters
2099 Document = "<form method=GET>" \
2100 "<h3>" ElizaSays(GETARG["YouSay"]) "</h3>\
2101 <p><input type=text name=YouSay value=\"\" size=60>\
2102 <br><input type=submit value=\"Tell her about it\"></p></form>"
2103 @}
2104@}
2105@c endfile
2106@end smallexample
2107
2108Now we are down to the heart of ELIZA, so you can see how it works.
Initially the user does not say anything; then ELIZA resets its money
counter and asks the user to say openheartedly what comes to mind.
2111The subsequent answers are converted to uppercase characters and stored for
2112later comparison. ELIZA presents the bill when being confronted with
2113a sentence that contains the phrase ``shut up.'' Otherwise, it looks for
2114keywords in the sentence, conjugates the rest of the sentence, remembers
2115the keyword for later use, and finally selects an answer from the set of
2116possible answers:
2117
2118@smallexample
2119@c file eg/network/eliza.awk
2120function ElizaSays(YouSay) @{
2121 if (YouSay == "") @{
2122 cost = 0
2123 answer = "HI, IM ELIZA, TELL ME YOUR PROBLEM"
2124 @} else @{
2125 q = toupper(YouSay)
2126 gsub("'", "", q)
2127 if(q == qold) @{
2128 answer = "PLEASE DONT REPEAT YOURSELF !"
2129 @} else @{
2130 if (index(q, "SHUT UP") > 0) @{
2131 answer = "WELL, PLEASE PAY YOUR BILL. ITS EXACTLY ... $"\
2132 int(100*rand()+30+cost/100)
2133 @} else @{
2134 qold = q
2135 w = "-" # no keyword recognized yet
2136 for (i in k) @{ # search for keywords
2137 if (index(q, i) > 0) @{
2138 w = i
2139 break
2140 @}
2141 @}
2142 if (w == "-") @{ # no keyword, take old subject
2143 w = wold
2144 subj = subjold
2145 @} else @{ # find subject
2146 subj = substr(q, index(q, w) + length(w)+1)
2147 wold = w
2148 subjold = subj # remember keyword and subject
2149 @}
2150 for (i in conj)
2151 gsub(i, conj[i], q) # conjugation
2152 # from all answers to this keyword, select one randomly
2153 answer = r[indices[int(split(k[w], indices) * rand()) + 1]]
2154 # insert subject into answer
2155 gsub("_", subj, answer)
2156 @}
2157 @}
2158 @}
2159 cost += length(answer) # for later payment : 1 cent per character
2160 return answer
2161@}
2162@c endfile
2163@end smallexample
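
The expression that selects the answer is rather dense. The following
sketch unpacks it into separate steps. It is an illustration only: the
three table entries are copied from @code{SetUpEliza}, while the variable
names @code{n} and @code{pick} and the subject @samp{SLEEP} are made up
for this example:

@example
BEGIN @{
  srand()
  r[4] = "PERHAPS YOU DONT WANT TO _ "
  r[5] = "DO YOU WANT TO BE ABLE TO _ ?"
  k["CAN I"] = "4 5"

  # split the list of answer numbers for this keyword
  n = split(k["CAN I"], indices)   # indices[1] = "4", indices[2] = "5"
  # choose a position between 1 and n at random
  pick = int(n * rand()) + 1
  answer = r[indices[pick]]
  # insert the (hypothetical) subject into the answer
  gsub("_", "SLEEP", answer)
  print answer
@}
@end example

@noindent
Each run prints one of the two answers with @samp{_} replaced by the subject.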
2164
2165In the long but simple function @code{SetUpEliza}, you can see tables
2166for conjugation, keywords, and answers.@footnote{The version shown
2167here is abbreviated. The full version comes with the @command{gawk}
2168distribution.} The associative array @code{k}
2169contains indices into the array of answers @code{r}. To choose an
2170answer, ELIZA just picks an index randomly:
2171
2172@example
2173@c file eg/network/eliza.awk
2174function SetUpEliza() @{
2175 srand()
2176 wold = "-"
2177 subjold = " "
2178
2179 # table for conjugation
2180 conj[" ARE " ] = " AM "
2181 conj["WERE " ] = "WAS "
2182 conj[" YOU " ] = " I "
2183 conj["YOUR " ] = "MY "
2184 conj[" IVE " ] =\
2185 conj[" I HAVE " ] = " YOU HAVE "
2186 conj[" YOUVE " ] =\
2187 conj[" YOU HAVE "] = " I HAVE "
2188 conj[" IM " ] =\
2189 conj[" I AM " ] = " YOU ARE "
2190 conj[" YOURE " ] =\
2191 conj[" YOU ARE " ] = " I AM "
2192
2193 # table of all answers
2194 r[1] = "DONT YOU BELIEVE THAT I CAN _"
2195 r[2] = "PERHAPS YOU WOULD LIKE TO BE ABLE TO _ ?"
2196@c endfile
2197 @dots{}
2198@end example
2199@ignore
2200@c file eg/network/eliza.awk
2201 r[3] = "YOU WANT ME TO BE ABLE TO _ ?"
2202 r[4] = "PERHAPS YOU DONT WANT TO _ "
2203 r[5] = "DO YOU WANT TO BE ABLE TO _ ?"
2204 r[6] = "WHAT MAKES YOU THINK I AM _ ?"
2205 r[7] = "DOES IT PLEASE YOU TO BELIEVE I AM _ ?"
2206 r[8] = "PERHAPS YOU WOULD LIKE TO BE _ ?"
2207 r[9] = "DO YOU SOMETIMES WISH YOU WERE _ ?"
2208 r[10] = "DONT YOU REALLY _ ?"
2209 r[11] = "WHY DONT YOU _ ?"
2210 r[12] = "DO YOU WISH TO BE ABLE TO _ ?"
2211 r[13] = "DOES THAT TROUBLE YOU ?"
2212 r[14] = "TELL ME MORE ABOUT SUCH FEELINGS"
2213 r[15] = "DO YOU OFTEN FEEL _ ?"
2214 r[16] = "DO YOU ENJOY FEELING _ ?"
2215 r[17] = "DO YOU REALLY BELIEVE I DONT _ ?"
2216 r[18] = "PERHAPS IN GOOD TIME I WILL _ "
2217 r[19] = "DO YOU WANT ME TO _ ?"
2218 r[20] = "DO YOU THINK YOU SHOULD BE ABLE TO _ ?"
2219 r[21] = "WHY CANT YOU _ ?"
2220 r[22] = "WHY ARE YOU INTERESTED IN WHETHER OR NOT I AM _ ?"
2221 r[23] = "WOULD YOU PREFER IF I WERE NOT _ ?"
2222 r[24] = "PERHAPS IN YOUR FANTASIES I AM _ "
2223 r[25] = "HOW DO YOU KNOW YOU CANT _ ?"
2224 r[26] = "HAVE YOU TRIED ?"
2225 r[27] = "PERHAPS YOU CAN NOW _ "
2226 r[28] = "DID YOU COME TO ME BECAUSE YOU ARE _ ?"
2227 r[29] = "HOW LONG HAVE YOU BEEN _ ?"
2228 r[30] = "DO YOU BELIEVE ITS NORMAL TO BE _ ?"
2229 r[31] = "DO YOU ENJOY BEING _ ?"
2230 r[32] = "WE WERE DISCUSSING YOU -- NOT ME"
2231 r[33] = "Oh, I _"
2232 r[34] = "YOU'RE NOT REALLY TALKING ABOUT ME, ARE YOU ?"
2233 r[35] = "WHAT WOULD IT MEAN TO YOU, IF YOU GOT _ ?"
2234 r[36] = "WHY DO YOU WANT _ ?"
2235 r[37] = "SUPPOSE YOU SOON GOT _"
2236 r[38] = "WHAT IF YOU NEVER GOT _ ?"
2237 r[39] = "I SOMETIMES ALSO WANT _"
2238 r[40] = "WHY DO YOU ASK ?"
2239 r[41] = "DOES THAT QUESTION INTEREST YOU ?"
2240 r[42] = "WHAT ANSWER WOULD PLEASE YOU THE MOST ?"
2241 r[43] = "WHAT DO YOU THINK ?"
2242 r[44] = "ARE SUCH QUESTIONS IN YOUR MIND OFTEN ?"
2243 r[45] = "WHAT IS IT THAT YOU REALLY WANT TO KNOW ?"
2244 r[46] = "HAVE YOU ASKED ANYONE ELSE ?"
2245 r[47] = "HAVE YOU ASKED SUCH QUESTIONS BEFORE ?"
2246 r[48] = "WHAT ELSE COMES TO MIND WHEN YOU ASK THAT ?"
2247 r[49] = "NAMES DON'T INTEREST ME"
2248 r[50] = "I DONT CARE ABOUT NAMES -- PLEASE GO ON"
2249 r[51] = "IS THAT THE REAL REASON ?"
2250 r[52] = "DONT ANY OTHER REASONS COME TO MIND ?"
2251 r[53] = "DOES THAT REASON EXPLAIN ANYTHING ELSE ?"
2252 r[54] = "WHAT OTHER REASONS MIGHT THERE BE ?"
2253 r[55] = "PLEASE DON'T APOLOGIZE !"
2254 r[56] = "APOLOGIES ARE NOT NECESSARY"
2255 r[57] = "WHAT FEELINGS DO YOU HAVE WHEN YOU APOLOGIZE ?"
2256 r[58] = "DON'T BE SO DEFENSIVE"
2257 r[59] = "WHAT DOES THAT DREAM SUGGEST TO YOU ?"
2258 r[60] = "DO YOU DREAM OFTEN ?"
2259 r[61] = "WHAT PERSONS APPEAR IN YOUR DREAMS ?"
2260 r[62] = "ARE YOU DISTURBED BY YOUR DREAMS ?"
2261 r[63] = "HOW DO YOU DO ... PLEASE STATE YOUR PROBLEM"
2262 r[64] = "YOU DON'T SEEM QUITE CERTAIN"
2263 r[65] = "WHY THE UNCERTAIN TONE ?"
2264 r[66] = "CAN'T YOU BE MORE POSITIVE ?"
2265 r[67] = "YOU AREN'T SURE ?"
2266 r[68] = "DON'T YOU KNOW ?"
2267 r[69] = "WHY NO _ ?"
2268 r[70] = "DON'T SAY NO, IT'S ALWAYS SO NEGATIVE"
2269 r[71] = "WHY NOT ?"
2270 r[72] = "ARE YOU SURE ?"
2271 r[73] = "WHY NO ?"
2272 r[74] = "WHY ARE YOU CONCERNED ABOUT MY _ ?"
2273 r[75] = "WHAT ABOUT YOUR OWN _ ?"
2274 r[76] = "CAN'T YOU THINK ABOUT A SPECIFIC EXAMPLE ?"
2275 r[77] = "WHEN ?"
2276 r[78] = "WHAT ARE YOU THINKING OF ?"
2277 r[79] = "REALLY, ALWAYS ?"
2278 r[80] = "DO YOU REALLY THINK SO ?"
2279 r[81] = "BUT YOU ARE NOT SURE YOU _ "
2280 r[82] = "DO YOU DOUBT YOU _ ?"
2281 r[83] = "IN WHAT WAY ?"
2282 r[84] = "WHAT RESEMBLANCE DO YOU SEE ?"
2283 r[85] = "WHAT DOES THE SIMILARITY SUGGEST TO YOU ?"
2284 r[86] = "WHAT OTHER CONNECTION DO YOU SEE ?"
2285 r[87] = "COULD THERE REALLY BE SOME CONNECTIONS ?"
2286 r[88] = "HOW ?"
2287 r[89] = "YOU SEEM QUITE POSITIVE"
2288 r[90] = "ARE YOU SURE ?"
2289 r[91] = "I SEE"
2290 r[92] = "I UNDERSTAND"
2291 r[93] = "WHY DO YOU BRING UP THE TOPIC OF FRIENDS ?"
2292 r[94] = "DO YOUR FRIENDS WORRY YOU ?"
2293 r[95] = "DO YOUR FRIENDS PICK ON YOU ?"
2294 r[96] = "ARE YOU SURE YOU HAVE ANY FRIENDS ?"
2295 r[97] = "DO YOU IMPOSE ON YOUR FRIENDS ?"
2296 r[98] = "PERHAPS YOUR LOVE FOR FRIENDS WORRIES YOU"
2297 r[99] = "DO COMPUTERS WORRY YOU ?"
2298 r[100] = "ARE YOU TALKING ABOUT ME IN PARTICULAR ?"
2299 r[101] = "ARE YOU FRIGHTENED BY MACHINES ?"
2300 r[102] = "WHY DO YOU MENTION COMPUTERS ?"
2301 r[103] = "WHAT DO YOU THINK MACHINES HAVE TO DO WITH YOUR PROBLEMS ?"
2302 r[104] = "DON'T YOU THINK COMPUTERS CAN HELP PEOPLE ?"
2303 r[105] = "WHAT IS IT ABOUT MACHINES THAT WORRIES YOU ?"
2304 r[106] = "SAY, DO YOU HAVE ANY PSYCHOLOGICAL PROBLEMS ?"
2305 r[107] = "WHAT DOES THAT SUGGEST TO YOU ?"
2306 r[108] = "I SEE"
2307 r[109] = "IM NOT SURE I UNDERSTAND YOU FULLY"
2308 r[110] = "COME COME ELUCIDATE YOUR THOUGHTS"
2309 r[111] = "CAN YOU ELABORATE ON THAT ?"
2310 r[112] = "THAT IS QUITE INTERESTING"
2311 r[113] = "WHY DO YOU HAVE PROBLEMS WITH MONEY ?"
2312 r[114] = "DO YOU THINK MONEY IS EVERYTHING ?"
2313 r[115] = "ARE YOU SURE THAT MONEY IS THE PROBLEM ?"
2314 r[116] = "I THINK WE WANT TO TALK ABOUT YOU, NOT ABOUT ME"
2315 r[117] = "WHAT'S ABOUT ME ?"
2316 r[118] = "WHY DO YOU ALWAYS BRING UP MY NAME ?"
2317@c endfile
2318@end ignore
2319
2320@example
2321@c file eg/network/eliza.awk
  # table for looking up the answers that
  # fit a certain keyword
  k["CAN YOU"] = "1 2 3"
  k["CAN I"] = "4 5"
  k["YOU ARE"] =\
  k["YOURE"] = "6 7 8 9"
2328@c endfile
2329 @dots{}
2330@end example
2331@ignore
2332@c file eg/network/eliza.awk
2333 k["I DONT"] = "10 11 12 13"
2334 k["I FEEL"] = "14 15 16"
2335 k["WHY DONT YOU"] = "17 18 19"
2336 k["WHY CANT I"] = "20 21"
2337 k["ARE YOU"] = "22 23 24"
2338 k["I CANT"] = "25 26 27"
2339 k["I AM"] =\
2340 k["IM "] = "28 29 30 31"
2341 k["YOU "] = "32 33 34"
2342 k["I WANT"] = "35 36 37 38 39"
2343 k["WHAT"] =\
2344 k["HOW"] =\
2345 k["WHO"] =\
2346 k["WHERE"] =\
2347 k["WHEN"] =\
2348 k["WHY"] = "40 41 42 43 44 45 46 47 48"
2349 k["NAME"] = "49 50"
2350 k["CAUSE"] = "51 52 53 54"
2351 k["SORRY"] = "55 56 57 58"
2352 k["DREAM"] = "59 60 61 62"
2353 k["HELLO"] =\
2354 k["HI "] = "63"
2355 k["MAYBE"] = "64 65 66 67 68"
2356 k[" NO "] = "69 70 71 72 73"
2357 k["YOUR"] = "74 75"
2358 k["ALWAYS"] = "76 77 78 79"
2359 k["THINK"] = "80 81 82"
2360 k["LIKE"] = "83 84 85 86 87 88 89"
2361 k["YES"] = "90 91 92"
2362 k["FRIEND"] = "93 94 95 96 97 98"
2363 k["COMPUTER"] = "99 100 101 102 103 104 105"
2364 k["-"] = "106 107 108 109 110 111 112"
2365 k["MONEY"] = "113 114 115"
2366 k["ELIZA"] = "116 117 118"
2367@c endfile
2368@end ignore
2369@example
2370@c file eg/network/eliza.awk
2371@}
2372@c endfile
2373@end example
2374
2375@cindex Humphrys, Mark
2376@cindex ELIZA program
2377Some interesting remarks and details (including the original source code
2378of ELIZA) are found on Mark Humphrys' home page. Yahoo! also has a
2379page with a collection of ELIZA-like programs. Many of them are written
2380in Java, some of them disclosing the Java source code, and a few even
2381explain how to modify the Java source code.
2382
2383@node Caveats, Challenges, Simple Server, Using Networking
2384@section Network Programming Caveats
2385
2386@cindex networks, @command{gawk} and, troubleshooting
2387@cindex @command{gawk}, networking, troubleshooting
2388@cindex troubleshooting, @command{gawk}, networks
By now it should be clear
that debugging a networked application is more
complicated than debugging a single-process, single-host application.
The behavior of a networked application sometimes looks noncausal because
it is not strictly reproducible. Whether a networked application
works sometimes depends on the following:
2395
2396@itemize @bullet
2397@item
2398How crowded the underlying network is
2399
2400@item
Whether the party at the other end is running
2402
2403@item
2404The state of the party at the other end
2405@end itemize
2406
2407@cindex troubleshooting, networks, timeouts
The most difficult problems for a beginner arise from the hidden states of the
underlying network. After closing a TCP connection, it's often necessary to wait
a short while before reopening the connection. Even more difficult is
reestablishing a connection that previously ended with a ``broken pipe.''
Such connections have to ``time out'' for a minute or so
before they can be reopened.
Check this with the command @samp{netstat -a}, which
lists the connections that are still ``active.''
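
To see whether an earlier connection is still timing out, the output of
@command{netstat} can be filtered with @command{awk}. The following is a
minimal sketch. It assumes that @samp{netstat -an} prints the connection
state (such as @samp{TIME_WAIT}) in the last column, as it does on
GNU/Linux; the function name @code{lingering} is made up for illustration:

```shell
# Print only the connections whose last field is TIME_WAIT.
# Reads netstat-style text on standard input.
lingering() {
    awk '$NF == "TIME_WAIT"'
}

# Typical use (requires netstat):
#   netstat -an | lingering
```

If the port of your server shows up in this list, wait until the entry
disappears before starting the server again.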
2416
2417@node Challenges, , Caveats, Using Networking
2418@section Where To Go From Here
2419
2420@cindex Loebner, Hugh
2421@cindex contest
You have now learned enough to build your own application. You could,
for example, take part in the
Loebner Contest
to win $100,000. The Loebner Prize is the first formal instantiation of
a Turing Test. Hugh Loebner agreed with The Cambridge Center for Behavioral
Studies to underwrite a contest designed to implement the Turing Test.
Dr.@: Loebner pledged a Grand Prize of $100,000 for the first computer whose
responses were indistinguishable from a human's. Each year, a prize
of $2000 and a bronze medal are awarded to the @emph{most} human computer.
The winner of the annual contest is the best entry relative to other entries
that year, irrespective of how good it is in an absolute sense. Here is
an example of a conversation with the winning program of 1997:
2434
2435@smallexample
2436(c)1997 Cambridge Center For Behavioral Studies all rights reserved [CONVERSE]
2437[Intelligent Research Ltd.] Start at: [1997/04/29 08:59:40]
2438PROGRAM[10:39:20]Hi, my name is Catherine, what's yours?
2439JUDGE02[10:39:36]My name is Nathan.
2440PROGRAM[10:39:37]Well, it's nice to meet you, Nathan.
2441PROGRAM[10:39:45]Did you see that story on CNN last night about the lesbian
2442PROGRAM[10:40:03]couple who came out at a White House party on Sunday?
2443JUDGE02[10:40:45]No. I'm sorry I missed that. How did the Prez take it?
2444PROGRAM[10:40:47]Ellen Degeneres was one of them - she was kissing her lover
2445PROGRAM[10:41:08]in front of Bill Clinton. Do you think he was very
2446PROGRAM[10:41:21]embarrassed?
2447JUDGE02[10:41:37]You tell me. I didn't see it.
2448PROGRAM[10:41:38]Are you still a Friend of Bill Clinton?
2449JUDGE02[10:42:22]Yes. I would like to talk about something else. Do you know anything about sculpture.
2450PROGRAM[10:42:24]Well, I suppose he's doing his best like most of us, but he
2451PROGRAM[10:42:35]does seem to attract an UNUSUAL amount of bad publicity,
2452PROGRAM[10:42:51]wouldnt you say--can it all be bad luck?
2453JUDGE02[10:43:20]I asked you if you knew anything about sculpture.
2454PROGRAM[10:43:21]OK, but what happens now--does it all drift on, with
2455PROGRAM[10:43:36]everyone waiting to see how much his friends have said about
2456PROGRAM[10:43:47]him under threat of lots and lots of jail-or will something
2457PROGRAM[10:44:01]dramatic actually happen?
2458JUDGE02[10:44:19]Do you know what sculpture is?
2459PROGRAM[10:44:20]Well, youre more a fan of his than I am I guess---though I
2460PROGRAM[10:44:34]cant SWEAR what Id have done in that locked hotel room, I
2461PROGRAM[10:44:51]must say.
2462JUDGE02[10:45:26]Have you ever been to the Salmagundi Club?
2463PROGRAM[10:45:29]Never. to go back to the original discussion Time to get
2464PROGRAM[10:45:43]off this one I think and onto something more down to earth!
2465JUDGE02[10:46:20]What is more down to earth than art?
2466PROGRAM[10:46:21]I don't know what -- to be honest. -
2467PROGRAM[10:46:30]I suppose it depends on where you're coming from, but as the
2468PROGRAM[10:46:46]song goes in My Fair Lady, "Why can't a woman be more like a
2469PROGRAM[10:47:03]man? "
2470JUDGE02[10:47:52]I liked that song but I prefer "Why can't children be more like we are"
2471JUDGE02[10:48:10]from another play the name of which I cannot remember.
2472PROGRAM[10:48:12]What, if anything, do you enjoy about being a man?
2473JUDGE02[10:49:01]I would enjoy all aspects of life no matter what my gender.
2474PROGRAM[10:49:02]Do you think of yourself as being attractive?
2475@end smallexample
2476
2477@cindex Clinton, Bill
This program keeps returning to the same story about Bill
Clinton. As you can see, even a program with a rather narrow mind can behave so
much like a human being that it can win this prize. It is quite common to
let these programs talk to each other via network connections. But during the
competition itself, the program and its computer have to be present at the
place where the competition is held. We all would love to see a @command{gawk}
program win such an event. Maybe it is up to you to accomplish this?
2485
2486Some other ideas for useful networked applications:
2487@itemize @bullet
2488@item
2489Read the file @file{doc/awkforai.txt} in the @command{gawk} distribution.
2490It was written by Ronald P.@: Loui (Associate Professor of
2491Computer Science, at Washington University in St. Louis,
2492@email{loui@@ai.wustl.edu}) and summarizes why
2493he teaches @command{gawk} to students of Artificial Intelligence. Here are
2494some passages from the text:
2495
2496@cindex AI
2497@cindex PROLOG
2498@cindex Loui, Ronald
2499@cindex agent
2500@quotation
2501The GAWK manual can
2502be consumed in a single lab session and the language can be mastered by
2503the next morning by the average student. GAWK's automatic
2504initialization, implicit coercion, I/O support and lack of pointers
2505forgive many of the mistakes that young programmers are likely to make.
2506Those who have seen C but not mastered it are happy to see that GAWK
2507retains some of the same sensibilities while adding what must be
2508regarded as spoonsful of syntactic sugar.@*
2509@dots{}@*
2510@cindex robot
2511There are further simple answers. Probably the best is the fact that
2512increasingly, undergraduate AI programming is involving the Web. Oren
2513Etzioni (University of Washington, Seattle) has for a while been arguing
2514that the ``softbot'' is replacing the mechanical engineers' robot as the
2515most glamorous AI testbed. If the artifact whose behavior needs to be
2516controlled in an intelligent way is the software agent, then a language
2517that is well-suited to controlling the software environment is the
2518appropriate language. That would imply a scripting language. If the
2519robot is KAREL, then the right language is ``turn left; turn right.'' If
2520the robot is Netscape, then the right language is something that can
2521generate @samp{netscape -remote 'openURL(http://cs.wustl.edu/~loui)'} with
2522elan.@*
2523@dots{}@*
2524AI programming requires high-level thinking. There have always been a few
2525gifted programmers who can write high-level programs in assembly language.
2526Most however need the ambient abstraction to have a higher floor.@*
2527@dots{}@*
2528Second, inference is merely the expansion of notation. No matter whether
2529the logic that underlies an AI program is fuzzy, probabilistic, deontic,
2530defeasible, or deductive, the logic merely defines how strings can be
2531transformed into other strings. A language that provides the best
2532support for string processing in the end provides the best support for
2533logic, for the exploration of various logics, and for most forms of
2534symbolic processing that AI might choose to call ``reasoning'' instead of
2535``logic.'' The implication is that PROLOG, which saves the AI programmer
2536from having to write a unifier, saves perhaps two dozen lines of GAWK
2537code at the expense of strongly biasing the logic and representational
2538expressiveness of any approach.
2539@end quotation
2540
2541Now that @command{gawk} itself can connect to the Internet, it should be obvious
2542that it is suitable for writing intelligent web agents.
2543
2544@item
@command{awk} is strong at pattern matching and string processing.
So, it is well suited to the classic problem of language translation.
A first try could be a program that knows the 100 most frequent English
words and their counterparts in German or French. The service could be
implemented by regularly reading email with the program above, replacing
each word with its translation, and sending the translation back via SMTP.
Users would send English email to their translation service and get
back a translated email message in return. As soon as this works,
more effort can be spent on a real translation program.
2554
2555@item
Another dialogue-oriented application (on the verge
of ridicule) is the email ``support service.'' Troubled customers write an
email to an automatic @command{gawk} service that reads the email. It looks
for keywords in the mail and assembles a reply email accordingly. By carefully
investigating the email header, and repeating these keywords throughout the
reply email, it is rather simple to give the customer a feeling that
someone cares. Ideally, such a service would search a database of previous
cases for solutions. If no such database exists, it could, for example, be
built from all the newsgroups, mailing lists, and FAQs on the Internet.
2565@end itemize
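
The word-by-word translation suggested above can be sketched in a few
lines of @command{awk}. This is only an illustration: the three-word
dictionary and the function name @code{translate} are made up, and a real
service would load a much larger table from a file and take its input
from the mail reader:

```shell
# Replace every known English word by its German counterpart.
# Unknown words are passed through unchanged.
translate() {
    awk '
    BEGIN {
        # tiny hypothetical English-to-German dictionary
        dict["the"] = "das"; dict["house"] = "Haus"; dict["is"] = "ist"
    }
    {
        for (i = 1; i <= NF; i++) {
            w = tolower($i)
            if (w in dict) $i = dict[w]   # replace known words only
        }
        print
    }'
}

# Typical use:
#   echo "The house is old" | translate
```

Looking up each field in an associative array keeps the program short.
Punctuation attached to a word defeats this simple lookup, which is one
reason why a real translation program needs more effort.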
2566
2567@node Some Applications and Techniques, Links, Using Networking, Top
2568@comment node-name, next, previous, up
2569
2570@chapter Some Applications and Techniques
2571In this @value{CHAPTER}, we look at a number of self-contained
2572scripts, with an emphasis on concise networking. Along the way, we
2573work towards creating building blocks that encapsulate often needed
2574functions of the networking world, show new techniques that
2575broaden the scope of problems that can be solved with @command{gawk}, and
2576explore leading edge technology that may shape the future of networking.
2577
2578We often refer to the site-independent core of the server that
2579we built in
2580@ref{Simple Server, ,A Simple Web Server}.
2581When building new and nontrivial servers, we
2582always copy this building block and append new instances of the two
2583functions @code{SetUpServer} and @code{HandleGET}.
2584
2585This makes a lot of sense, since
2586this scheme of event-driven
2587execution provides @command{gawk} with an interface to the most widely
2588accepted standard for GUIs: the web browser. Now, @command{gawk} can rival even
2589Tcl/Tk.
2590
2591@cindex Tcl/Tk, @command{gawk} and
2592Tcl and @command{gawk} have much in common. Both are simple scripting languages
2593that allow us to quickly solve problems with short programs. But Tcl has Tk
2594on top of it, and @command{gawk} had nothing comparable up to now. While Tcl
2595needs a large and ever-changing library (Tk, which was bound to the X Window
2596System until recently), @command{gawk} needs just the networking interface
2597and some kind of browser on the client's side. Besides better portability,
2598the most important advantage of this approach (embracing well-established
standards such as HTTP and HTML) is that @emph{we do not need to change the
language}. We let others do the work of fighting over protocols and standards.
2601We can use HTML, JavaScript, VRML, or whatever else comes along to do our work.
2602
2603@menu
2604* PANIC:: An Emergency Web Server.
2605* GETURL:: Retrieving Web Pages.
2606* REMCONF:: Remote Configuration Of Embedded Systems.
2607* URLCHK:: Look For Changed Web Pages.
2608* WEBGRAB:: Extract Links From A Page.
2609* STATIST:: Graphing A Statistical Distribution.
2610* MAZE:: Walking Through A Maze In Virtual Reality.
2611* MOBAGWHO:: A Simple Mobile Agent.
2612* STOXPRED:: Stock Market Prediction As A Service.
2613* PROTBASE:: Searching Through A Protein Database.
2614@end menu
2615
2616@node PANIC, GETURL, Some Applications and Techniques, Some Applications and Techniques
2617@section PANIC: An Emergency Web Server
2618@cindex PANIC program
2619@cindex networks, See Also web pages
2620@cindex web service
2621At first glance, the @code{"Hello, world"} example in
2622@ref{Primitive Service, ,A Primitive Web Service},
2623seems useless. By adding just a few lines, we can turn it into something useful.
2624
2625The PANIC program tells everyone who connects that the local
2626site is not working. When a web server breaks down, it makes a difference
2627if customers get a strange ``network unreachable'' message, or a short message
2628telling them that the server has a problem. In such an emergency,
2629the hard disk and everything on it (including the regular web service) may
2630be unavailable. Rebooting the web server off a diskette makes sense in this
2631setting.
2632
To use the PANIC program as an emergency web server, all you need are the
@command{gawk} executable and the program below on a diskette. By default,
it listens on port 8080. A different port may be supplied on the
command line in the variable @code{MyPort}:
2637
@example
@c file eg/network/panic.awk
BEGIN @{
  RS = ORS = "\r\n"                  # HTTP lines end with CR LF
  if (MyPort == 0) MyPort = 8080     # default port
  HttpService = "/inet/tcp/" MyPort "/0/0"
  Hello = "<HTML><HEAD><TITLE>Out Of Service</TITLE>" \
     "</HEAD><BODY><H1>" \
     "This site is temporarily out of service." \
     "</H1></BODY></HTML>"
  Len = length(Hello) + length(ORS)  # print appends ORS to the page
  while ("awk" != "complex") @{       # always true: serve forever
    print "HTTP/1.0 200 OK"          |& HttpService
    print "Content-Length: " Len ORS |& HttpService
    print Hello                      |& HttpService
    while ((HttpService |& getline) > 0)   # discard the request
       continue;
    close(HttpService)               # wait for the next client
  @}
@}
@c endfile
@end example
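
To see what PANIC sends to a client without opening a socket, the same
three @code{print} statements can be directed to standard output. The
following sketch assumes nothing but a POSIX @command{awk}; the function
name @code{panic_response} and the shortened page text are made up for
illustration. It shows why the program adds @code{length(ORS)} to the
content length: @code{print} appends @code{ORS} to the page, so these two
bytes are part of the body:

```shell
# Write the response of a PANIC-like server to standard output.
panic_response() {
    awk 'BEGIN {
        ORS = "\r\n"
        Hello = "<H1>Out</H1>"               # shortened stand-in page
        Len = length(Hello) + length(ORS)    # page plus trailing CR LF
        print "HTTP/1.0 200 OK"
        print "Content-Length: " Len ORS     # extra ORS: blank line ends header
        print Hello
    }'
}

# Typical use: make the CR characters visible
#   panic_response | od -c
```

Assuming the real program is stored in @file{panic.awk}, a command such as
@samp{gawk -v MyPort=8888 -f panic.awk} starts it on another port; the
port number 8888 is arbitrary.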
2660
2661@node GETURL, REMCONF, PANIC, Some Applications and Techniques
2662@section GETURL: Retrieving Web Pages
2663@cindex GETURL program
2664@cindex web pages, retrieving
2665GETURL is a versatile building block for shell scripts that need to retrieve
2666files from the Internet. It takes a web address as a command-line parameter and
2667tries to retrieve the contents of this address. The contents are printed
2668to standard output, while the header is printed to @file{/dev/stderr}.
2669A surrounding shell script
2670could analyze the contents and extract the text or the links. An ASCII
2671browser could be written around GETURL. But more interestingly, web robots are
2672straightforward to write on top of GETURL. On the Internet, you can find
2673several programs of the same name that do the same job. They are usually
2674much more complex internally and at least 10 times longer.
2675
First, GETURL checks that it was called with exactly one web address.
Then, it checks whether the user chose a particular proxy server, whose name
is passed in the variable @code{Proxy}. By default, it is assumed that the local
machine serves as proxy. GETURL uses the @code{GET} method by default
to access the web page. Passing the name of a different method
(such as @code{HEAD}) in the variable @code{Method} selects a different
behavior. With
the @code{HEAD} method, the user does not receive the body of the page
content, but does receive the header:
2684
@example
@c file eg/network/geturl.awk
BEGIN @{
  if (ARGC != 2) @{
    print "GETURL - retrieve Web page via HTTP 1.0"
    print "IN:\n    the URL as a command-line parameter"
    print "PARAM(S):\n    -v Proxy=MyProxy"
    print "OUT:\n    the page content on stdout"
    print "    the page header on stderr"
    print "JK 16.05.1997"
    print "ADR 13.08.2000"
    exit
  @}
  URL = ARGV[1]; ARGV[1] = ""
  if (Proxy     == "")  Proxy     = "127.0.0.1"
  if (ProxyPort ==  0)  ProxyPort = 80
  if (Method    == "")  Method    = "GET"
  HttpService = "/inet/tcp/0/" Proxy "/" ProxyPort
  ORS = RS = "\r\n\r\n"     # a blank line ends the request and the header
  print Method " " URL " HTTP/1.0" |& HttpService
  HttpService                      |& getline Header
  print Header > "/dev/stderr"
  while ((HttpService |& getline) > 0)
    printf "%s", $0         # body; any "\r\n\r\n" in it is lost
  close(HttpService)
@}
@c endfile
@end example
2713
2714This program can be changed as needed, but be careful with the last lines.
2715Make sure transmission of binary data is not corrupted by additional line
2716breaks. Even as it is now, the byte sequence @code{"\r\n\r\n"} would
2717disappear if it were contained in binary data. Don't get caught in a
2718trap when trying a quick fix on this one.
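
The variables @code{Proxy}, @code{ProxyPort}, and @code{Method} are all set
with @option{-v} on the command line. The following shell sketch only
prints the resulting command line instead of running it, so that it can
be inspected first or piped into @command{sh}. The function name
@code{geturl_cmd} and the proxy name @samp{proxy.example.org} are made up
for illustration:

```shell
# Print (do not run) a GETURL command line.
# Arguments: proxy host, proxy port, HTTP method, URL.
geturl_cmd() {
    printf 'gawk -v Proxy=%s -v ProxyPort=%s -v Method=%s -f geturl.awk %s\n' \
        "$1" "$2" "$3" "$4"
}

# Command that fetches only the header of a page through a hypothetical proxy:
geturl_cmd proxy.example.org 8080 HEAD http://www.gnu.org/
```

Because the header goes to standard error and the body to standard
output, the two can be stored in separate files with ordinary shell
redirection.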
2719
2720@node REMCONF, URLCHK, GETURL, Some Applications and Techniques
2721@section REMCONF: Remote Configuration of Embedded Systems
2722@cindex REMCONF program
2723@cindex Linux
2724@cindex GNU/Linux
2725@cindex Yahoo!
2726Today, you often find powerful processors in embedded systems. Dedicated
2727network routers and controllers for all kinds of machinery are examples
2728of embedded systems. Processors like the Intel 80x86 or the AMD Elan are
2729able to run multitasking operating systems, such as XINU or GNU/Linux
2730in embedded PCs. These systems are small and usually do not have
2731a keyboard or a display. Therefore it is difficult to set up their
2732configuration. There are several widespread ways to set them up:
2733
2734@itemize @bullet
2735@item
2736DIP switches
2737
2738@item
2739Read Only Memories such as EPROMs
2740
2741@item
2742Serial lines or some kind of keyboard
2743
2744@item
2745Network connections via @command{telnet} or SNMP
2746
2747@item
2748HTTP connections with HTML GUIs
2749@end itemize
2750
2751In this @value{SECTION}, we look at a solution that uses HTTP connections
2752to control variables of an embedded system that are stored in a file.
2753Since embedded systems have tight limits on resources like memory,
2754it is difficult to employ advanced techniques such as SNMP and HTTP
2755servers. @command{gawk} fits in quite nicely with its single executable
2756which needs just a short script to start working.
2757The following program stores the variables in a file, and a concurrent
2758process in the embedded system may read the file. The program uses the
2759site-independent part of the simple web server that we developed in
2760@ref{Interacting Service, ,A Web Service with Interaction}.
2761As mentioned there, all we have to do is to write two new procedures
2762@code{SetUpServer} and @code{HandleGET}:
2763
@smallexample
@c file eg/network/remconf.awk
function SetUpServer() @{
  TopHeader = "<HTML><title>Remote Configuration</title>"
  TopDoc = "<BODY>\
  <h2>Please choose one of the following actions:</h2>\
  <UL>\
    <LI><A HREF=" MyPrefix "/AboutServer>About this server</A></LI>\
    <LI><A HREF=" MyPrefix "/ReadConfig>Read Configuration</A></LI>\
    <LI><A HREF=" MyPrefix "/CheckConfig>Check Configuration</A></LI>\
    <LI><A HREF=" MyPrefix "/ChangeConfig>Change Configuration</A></LI>\
    <LI><A HREF=" MyPrefix "/SaveConfig>Save Configuration</A></LI>\
  </UL>"
  TopFooter  = "</BODY></HTML>"
  if (ConfigFile == "") ConfigFile = "config.asc"
@}
@c endfile
@end smallexample
2782
The function @code{SetUpServer} initializes the top level HTML texts
as usual. It also initializes the name of the file that contains the
configuration parameters and their values. If the user supplies
a name on the command line (in the variable @code{ConfigFile}), that name
is used. The file is expected to
contain one parameter per line, with the name of the parameter in
column one and the value in column two.
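
For illustration, here is how a file in this two-column format can be read
into an array, in the same way the @code{ReadConfig} menu choice does it.
This is a sketch under assumptions: the parameter names @samp{hostname}
and @samp{baudrate} are invented, and the function name @code{read_param}
is hypothetical:

```shell
# A configuration file in the expected format might contain:
#   hostname embedded1
#   baudrate 9600

# Print the value of one parameter; the file is read from standard input.
read_param() {
    awk -v want="$1" '
    { config[$1] = $2 }      # column one: name, column two: value
    END { if (want in config) print config[want] }'
}

# Typical use:
#   read_param baudrate < config.asc
```

Since the value is taken from the second field only, a value that
contains blanks would be cut off after its first word.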
2789
2790The function @code{HandleGET} reflects the structure of the menu
2791tree as usual. The first menu choice tells the user what this is all
2792about. The second choice reads the configuration file line by line
2793and stores the parameters and their values. Notice that the record
2794separator for this file is @code{"\n"}, in contrast to the record separator
2795for HTTP. The third menu choice builds an HTML table to show
2796the contents of the configuration file just read. The fourth choice
2797does the real work of changing parameters, and the last one just saves
2798the configuration into a file:
2799
@smallexample
@c file eg/network/remconf.awk
function HandleGET() @{
  if(MENU[2] == "AboutServer") @{
    Document  = "This is a GUI for remote configuration of an\
  embedded system. It is implemented as one GAWK script."
  @} else if (MENU[2] == "ReadConfig") @{
    RS = "\n"                            # the file uses plain newlines
    while ((getline < ConfigFile) > 0)
       config[$1] = $2;
    close(ConfigFile)
    RS = "\r\n"                          # back to the HTTP separator
    Document = "Configuration has been read."
  @} else if (MENU[2] == "CheckConfig") @{
    Document = "<TABLE BORDER=1 CELLPADDING=5>"
    for (i in config)
      Document = Document "<TR><TD>" i "</TD>" \
        "<TD>" config[i] "</TD></TR>"
    Document = Document "</TABLE>"
  @} else if (MENU[2] == "ChangeConfig") @{
    if ("Param" in GETARG) @{            # any parameter to set?
      if (GETARG["Param"] in config) @{  # is parameter valid?
        config[GETARG["Param"]] = GETARG["Value"]
        Document = (GETARG["Param"] " = " GETARG["Value"] ".")
      @} else @{
        Document = "Parameter <b>" GETARG["Param"] "</b> is invalid."
      @}
    @} else @{
      Document = "<FORM method=GET><h4>Change one parameter</h4>\
        <TABLE BORDER CELLPADDING=5>\
        <TR><TD>Parameter</TD><TD>Value</TD></TR>\
        <TR><TD><input type=text name=Param value=\"\" size=20></TD>\
            <TD><input type=text name=Value value=\"\" size=40></TD>\
        </TR></TABLE><input type=submit value=\"Set\"></FORM>"
    @}
  @} else if (MENU[2] == "SaveConfig") @{
    for (i in config)
      printf("%s %s\n", i, config[i]) > ConfigFile
    close(ConfigFile)
    Document = "Configuration has been saved."
  @}
@}
@c endfile
@end smallexample
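
The @code{ChangeConfig} branch relies on the array @code{GETARG}, which
the site-independent part of the server fills from the query string of
the requested URL. The following sketch is not the code of that server.
It only illustrates, for an assumed query string such as
@samp{Param=baudrate&Value=19200}, how the name and value pairs end up in
the array. The function name @code{parse_query} and the parameter name
@samp{baudrate} are hypothetical:

```shell
# Split a query string into a GETARG-like array and print the
# confirmation text that the ChangeConfig branch would produce.
parse_query() {
    awk -v q="$1" 'BEGIN {
        n = split(q, pair, "&")                # one element per name=value
        for (i = 1; i <= n; i++) {
            eq = index(pair[i], "=")
            if (eq > 0)
                GETARG[substr(pair[i], 1, eq - 1)] = substr(pair[i], eq + 1)
        }
        if ("Param" in GETARG)
            print GETARG["Param"] " = " GETARG["Value"] "."
    }'
}

# Typical use:
#   parse_query "Param=baudrate&Value=19200"
```

A complete implementation also has to decode characters that the browser
has escaped in the query string; this sketch leaves them untouched.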
2844
2845@cindex MiniSQL
2846We could also view the configuration file as a database. From this
2847point of view, the previous program acts like a primitive database server.
2848Real SQL database systems also make a service available by providing
2849a TCP port that clients can connect to. But the application level protocols
2850they use are usually proprietary and also change from time to time.
2851This is also true for the protocol that
2852MiniSQL uses.
2853
2854@node URLCHK, WEBGRAB, REMCONF, Some Applications and Techniques
2855@section URLCHK: Look for Changed Web Pages
2856@cindex URLCHK program
2857Most people who make heavy use of Internet resources have a large
2858bookmark file with pointers to interesting web sites. It is impossible
2859to regularly check by hand if any of these sites have changed. A program
2860is needed to automatically look at the headers of web pages and tell
2861which ones have changed. URLCHK does the comparison after using GETURL
2862with the @code{HEAD} method to retrieve the header.
2863
2864Like GETURL, this program first checks that it is called with exactly
2865one command-line parameter. URLCHK also takes the same command-line variables
2866@code{Proxy} and @code{ProxyPort} as GETURL,
2867because these variables are handed over to GETURL for each URL
2868that gets checked. The one and only parameter is the name of a file that
contains one line for each URL. In the first column, we find the URL, and
the second and third columns hold the length of the URL's body as measured
the last two times it was checked. Now, we follow this plan:
2872
2873@enumerate
2874@item
2875Read the URLs from the file and remember their most recent lengths
2876
2877@item
2878Delete the contents of the file
2879
2880@item
2881For each URL, check its new length and write it into the file
2882
2883@item
2884If the most recent and the new length differ, tell the user
2885@end enumerate
2886
It may seem a bit peculiar to read the URLs from a file together
with their two most recent lengths, but this approach has several
advantages. You can call the program again and again with the same
file. After running the program, you can list the changed URLs again
by extracting those lines that differ in their second and third columns.
Here is the program:
2892
2893@c inspired by URLCHK in iX 5/97 166.
@smallexample
@c file eg/network/urlchk.awk
BEGIN @{
  if (ARGC != 2) @{
    print "URLCHK - check if URLs have changed"
    print "IN:\n    the file with URLs as a command-line parameter"
    print "    file contains URL, old length, new length"
    print "PARAMS:\n    -v Proxy=MyProxy -v ProxyPort=8080"
    print "OUT:\n    same as file with URLs"
    print "JK 02.03.1998"
    exit
  @}
  URLfile = ARGV[1]; ARGV[1] = ""
  if (Proxy     != "") Proxy     = " -v Proxy="     Proxy
  if (ProxyPort != "") ProxyPort = " -v ProxyPort=" ProxyPort
  while ((getline < URLfile) > 0)
    Length[$1] = $3 + 0     # remember the most recent length
  close(URLfile)      # now, URLfile is read in and can be updated
  GetHeader = "gawk " Proxy ProxyPort " -v Method=\"HEAD\" -f geturl.awk "
  for (i in Length) @{
    GetThisHeader = GetHeader i " 2>&1"
    NewLength = 0     # do not reuse the length of the previous URL
    while ((GetThisHeader | getline) > 0)
      if (toupper($0) ~ /CONTENT-LENGTH/) NewLength = $2 + 0
    close(GetThisHeader)
    print i, Length[i], NewLength > URLfile
    if (Length[i] != NewLength)  # report only changed URLs
      print i, Length[i], NewLength
  @}
  close(URLfile)
@}
@c endfile
@end smallexample
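
Listing the changed URLs again after a run takes a single pattern,
because each line of the file holds the old length in its second column
and the new length in its third. A minimal sketch; the function name
@code{changed_urls} and the file name @file{urls.txt} are made up:

```shell
# Print the lines whose second and third columns differ.
# Reads the named files, or standard input if none is given.
changed_urls() {
    awk '$2 != $3' "$@"
}

# Typical use:
#   changed_urls urls.txt
```

A pattern without an action prints every matching line unchanged, so the
output has the same three-column format as the file itself.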
2926
Another thing that may look strange is the way GETURL is called.
Before calling GETURL, we have to check if the proxy variables need
to be passed on. If so, we prepare strings that will become part
of the command line later. In @code{GetHeader}, we store these strings
together with the longest part of the command line. Later, in the loop
over the URLs, the URL and a redirection operator are appended to
@code{GetHeader} to form the command that reads the URL's header over the
Internet.
GETURL always writes the headers to @file{/dev/stderr}. That is
the reason why we need the redirection operator to have the header
piped in.
2937
This program is not perfect because it assumes that a changed page
also has a changed length, which is not necessarily true. A more
advanced approach is to look at some other header line that
holds time information. But, as always when things get a bit more
complicated, this is left as an exercise to the reader.
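
As a starting point for this exercise, the following sketch extracts the
value of the @samp{Last-Modified} header, which many (but not all) web
servers send. It is not part of URLCHK; the function name
@code{last_modified} is made up, and the header text is assumed to arrive
on standard input, for example from GETURL with the @code{HEAD} method:

```shell
# Print the value of the first Last-Modified header line, if any.
last_modified() {
    awk 'toupper($1) == "LAST-MODIFIED:" {
        sub(/\r$/, "")              # drop the CR of the HTTP line ending
        sub(/^[^:]*:[ \t]*/, "")    # drop the header name
        print
        exit
    }'
}

# Typical use:
#   gawk -v Method=HEAD -f geturl.awk http://www.gnu.org/ 2>&1 | last_modified
```

Note that the date contains blanks, so it cannot simply replace the
length in the three-column URL file without changing the file format.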
2943
2944@node WEBGRAB, STATIST, URLCHK, Some Applications and Techniques
2945@section WEBGRAB: Extract Links from a Page
2946@cindex WEBGRAB program
2947@c Inspired by iX 1/98 157.
2948@cindex robot
2949Sometimes it is necessary to extract links from web pages.
2950Browsers do it, web robots do it, and sometimes even humans do it.
2951Since we have a tool like GETURL at hand, we can solve this problem with
2952some help from the Bourne shell:
2953
@example
@c file eg/network/webgrab.awk
# every URL terminates a record; gawk stores the matched text in RT
BEGIN @{ RS = "http://[#%&\\+\\-\\./0-9\\:;\\?A-Z_a-z\\~]*" @}
RT != "" @{
   command = ("gawk -v Proxy=MyProxy -f geturl.awk " RT \
               " > doc" NR ".html")
   print command
@}
@c endfile
@end example
2964
Notice that the regular expression for URLs is rather crude. A precise
regular expression is much more complex. But this one works
rather well. One problem is that it is unable to find internal links of
an HTML document. Another problem is that
@samp{ftp}, @samp{telnet}, @samp{news}, @samp{mailto}, and other kinds
of links are not covered by the regular expression.
However, it is straightforward to add them, if doing so is necessary for other tasks.
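
As an illustration of such an extension, the following sketch accepts
@samp{ftp} links in addition to @samp{http} links. Note the swapped-in
technique: so that it runs with any POSIX @command{awk}, it searches each
line with @code{match} instead of using a regular expression as record
separator. With @command{gawk}, the same regular expression could be
assigned to @code{RS} as in WEBGRAB. The function name @code{links} is
made up:

```shell
# Print every http or ftp link found on standard input, one per line.
links() {
    awk '
    BEGIN { re = "(http|ftp)://[#%&+./0-9:;?A-Z_a-z~-]*" }
    {
        line = $0
        while (match(line, re)) {                  # leftmost, longest match
            print substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)  # continue behind the match
        }
    }'
}

# Typical use:
#   gawk -f geturl.awk http://www.gnu.org/ | links
```

Unlike the record separator approach, this variant cannot find a link
that is broken across two lines.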
2972
2973This program reads an HTML file and prints all the HTTP links that it finds.
2974It relies on @command{gawk}'s ability to use regular expressions as record
2975separators. With @code{RS} set to a regular expression that matches links,
2976the second action is executed each time a non-empty link is found.
2977We can find the matching link itself in @code{RT}.
2978
The action could use the @code{system} function to let another GETURL
retrieve the page, but here we use a different approach.
This simple program prints shell commands that can be piped into @command{sh}
for execution. This way it is possible to first extract
the links, wrap shell commands around them, and redirect all the shell commands
into a file. After editing the file, executing it retrieves
exactly those files that we really need. In case we do not want to edit,
we can retrieve all the pages like this:
2987
2988@smallexample
2989gawk -f geturl.awk http://www.suse.de | gawk -f webgrab.awk | sh
2990@end smallexample
2991
2992@cindex Microsoft Windows
2993After this, you will find the contents of all referenced documents in
2994files named @file{doc*.html} even if they do not contain HTML code.
2995The most annoying thing is that we always have to pass the proxy to
2996GETURL. If you do not like to see the headers of the web pages
2997appear on the screen, you can redirect them to @file{/dev/null}.
2998Watching the headers appear can be quite instructive, because
2999it reveals
3000details such as which web server software the companies use.
3001Now, it is clear how the clever marketing people
3002use web robots to determine the
3003market shares
3004of Microsoft and Netscape in the web server market.
3005
3006Port 80 of any web server is like a small hole in a repellent firewall.
3007After attaching a browser to port 80, we usually catch a glimpse
3008of the bright side of the server (its home page). With a tool like GETURL
3009at hand, we are able to discover some of the more concealed
3010or even ``indecent'' services (i.e., lacking conformity to standards of quality).
3011It can be exciting to see the fancy CGI scripts that lie
3012there, revealing the inner workings of the server, ready to be called:
3013
3014@itemize @bullet
3015@item
3016With a command such as:
3017
3018@example
3019gawk -f geturl.awk http://any.host.on.the.net/cgi-bin/
3020@end example
3021
3022some servers give you a directory listing of the CGI files.
3023Knowing the names, you can try to call some of them and watch
3024for useful results. Sometimes there are executables in such directories
3025(such as Perl interpreters) that you may call remotely. If there are
3026subdirectories with configuration data of the web server, this can also
3027be quite interesting to read.
3028
3029@item
3030@cindex apache
3031The well-known Apache web server usually has its CGI files in the
3032directory @file{/cgi-bin}. There you can often find the scripts
3033@file{test-cgi} and @file{printenv}. Both tell you some things
3034about the current connection and the installation of the web server.
3035Just call:
3036
3037@smallexample
3038gawk -f geturl.awk http://any.host.on.the.net/cgi-bin/test-cgi
3039gawk -f geturl.awk http://any.host.on.the.net/cgi-bin/printenv
3040@end smallexample
3041
3042@item
3043Sometimes it is even possible to retrieve system files like the web
3044server's log file---possibly containing customer data---or even the file
3045@file{/etc/passwd}.
3046(We don't recommend this!)
3047@end itemize
3048
3049@strong{Caution:}
3050Although this may sound funny or simply irrelevant, we are talking about
3051severe security holes. Try to explore your own system this way and make
3052sure that none of the above reveals too much information about your system.
3053
3054@node STATIST, MAZE, WEBGRAB, Some Applications and Techniques
3055@section STATIST: Graphing a Statistical Distribution
3056@cindex STATIST program
3057
3058@cindex GNUPlot utility
3059@cindex image format
3060@cindex GIF image format
3061@cindex PNG image format
3062@cindex PS image format
3063@cindex Boutell, Thomas
3064@iftex
3065@image{statist,3in}
3066@end iftex
3067In the HTTP server examples we've shown thus far, we never present an image
3068to the browser and its user. Presenting images is one task. Generating
3069images that reflect some user input and presenting these dynamically
3070generated images is another. In this @value{SECTION}, we use GNUPlot
3071for generating @file{.png}, @file{.ps}, or @file{.gif}
3072files.@footnote{Due to licensing problems, the default
3073installation of GNUPlot disables the generation of @file{.gif} files.
3074If your installed version does not accept @samp{set term gif},
3075just download and install the most recent version of GNUPlot and the
3076@uref{http://www.boutell.com/gd/, GD library}
3077by Thomas Boutell.
3078Otherwise you still have the chance to generate some
3079ASCII-art style images with GNUPlot by using @samp{set term dumb}.
3080(We tried it and it worked.)}
3081
3082The program we develop takes the statistical parameters of two samples
3083and computes the t-test statistics. As a result, we get the probabilities
3084that the means and the variances of both samples are the same. In order to
3085let the user check plausibility, the program presents an image of the
3086distributions. The statistical computation follows
3087@cite{Numerical Recipes in C: The Art of Scientific Computing}
3088by William H.@: Press, Saul A.@: Teukolsky, William T.@: Vetterling, and Brian P.@: Flannery.
3089Since @command{gawk} does not have a built-in function
3090for the computation of the beta function, we use the @code{ibeta} function
3091of GNUPlot. As a side effect, we learn how to use GNUPlot as a
3092sophisticated calculator. The comparison of means is done as in @code{tutest},
3093paragraph 14.2, page 613, and the comparison of variances is done as in @code{ftest},
3094page 611 in @cite{Numerical Recipes}.
3095@cindex Numerical Recipes
3096
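The two quantities that the server computes itself, before it consults
GNUPlot, can be tried out in isolation. The following sketch uses the same
formulas for @code{t} and @code{df} that appear in @code{HandleGET}; the
function name @code{Welch}, the globals @code{T} and @code{DF}, and the
sample values are made up for this illustration:

@example
# t statistic and degrees of freedom for two samples with
# unequal variances; results are left in the globals T and DF
function Welch(m1, v1, n1, m2, v2, n2,    a, b) @{
  a = v1 / n1
  b = v2 / n2
  T  = (m1 - m2) / sqrt(a + b)
  DF = (a + b) * (a + b) / (a * a / (n1 - 1) + b * b / (n2 - 1))
@}
BEGIN @{
  Welch(0, 1, 10, 1, 4, 10)
  printf "t = %.4f, df = %.2f\n", T, DF   # prints: t = -1.4142, df = 13.24
@}
@end example
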
3097As usual, we take the site-independent code for servers and append
3098our own functions @code{SetUpServer} and @code{HandleGET}:
3099
3100@smallexample
3101@c file eg/network/statist.awk
3102function SetUpServer() @{
3103 TopHeader = "<HTML><title>Statistics with GAWK</title>"
3104 TopDoc = "<BODY>\
3105 <h2>Please choose one of the following actions:</h2>\
3106 <UL>\
3107 <LI><A HREF=" MyPrefix "/AboutServer>About this server</A></LI>\
3108 <LI><A HREF=" MyPrefix "/EnterParameters>Enter Parameters</A></LI>\
3109 </UL>"
3110 TopFooter = "</BODY></HTML>"
3111 GnuPlot = "gnuplot 2>&1"
3112 m1=m2=0; v1=v2=1; n1=n2=10
3113@}
3114@c endfile
3115@end smallexample
3116
3117Here, you see the menu structure that the user sees. Later, we
3118will see how the program structure of the @code{HandleGET} function
3119reflects the menu structure. What is missing here is the link for the
3120image we generate. In an event-driven environment, request,
3121generation, and delivery of images are separated.
3122
3123Notice the way we initialize the @code{GnuPlot} command string for
3124the pipe. By default,
3125GNUPlot outputs the generated image via standard output, as well as
3126the results of @code{print}(ed) calculations via standard error.
3127The redirection causes standard error to be mixed into standard
3128output, enabling us to read results of calculations with @code{getline}.
3129By initializing the statistical parameters with some meaningful
3130defaults, we make sure the user gets an image the first time
3131he uses the program.
3132
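A minimal sketch of this technique, separated from the server, uses
GNUPlot as a calculator through a two-way pipe. It assumes that a
@command{gnuplot} executable is on the search path; the arguments passed
to @code{ibeta} and the variable name @code{p} are arbitrary choices for
the illustration:

@example
BEGIN @{
  GnuPlot = "gnuplot 2>&1"      # merge standard error into standard output
  print "print ibeta(2.0, 0.5, 0.3)" |& GnuPlot
  GnuPlot |& getline p          # read the one line GNUPlot printed
  print "quit" |& GnuPlot
  close(GnuPlot)
  print "ibeta(2.0, 0.5, 0.3) = " p
@}
@end example
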
3133@cindex JavaScript
3134Following is the rather long function @code{HandleGET}, which
3135implements the contents of this service by reacting to the different
3136kinds of requests from the browser. Before you start playing with
3137this script, make sure that your browser supports JavaScript and that it also
3138has this option switched on. The script uses a short snippet of
3139JavaScript code for delayed opening of a window with an image.
3140A more detailed explanation follows:
3141
3142@smallexample
3143@c file eg/network/statist.awk
3144function HandleGET() @{
3145 if(MENU[2] == "AboutServer") @{
3146 Document = "This is a GUI for a statistical computation.\
3147 It compares means and variances of two distributions.\
3148 It is implemented as one GAWK script and uses GNUPLOT."
3149 @} else if (MENU[2] == "EnterParameters") @{
3150 Document = ""
3151 if ("m1" in GETARG) @{ # are there parameters to compare?
3152 Document = Document "<SCRIPT LANGUAGE=\"JavaScript\">\
3153 setTimeout(\"window.open(\\\"" MyPrefix "/Image" systime()\
3154 "\\\",\\\"dist\\\", \\\"status=no\\\");\", 1000); </SCRIPT>"
3155 m1 = GETARG["m1"]; v1 = GETARG["v1"]; n1 = GETARG["n1"]
3156 m2 = GETARG["m2"]; v2 = GETARG["v2"]; n2 = GETARG["n2"]
3157 t = (m1-m2)/sqrt(v1/n1+v2/n2)
3158 df = (v1/n1+v2/n2)*(v1/n1+v2/n2)/((v1/n1)*(v1/n1)/(n1-1) \
3159 + (v2/n2)*(v2/n2) /(n2-1))
3160 if (v1>v2) @{
3161 f = v1/v2
3162 df1 = n1 - 1
3163 df2 = n2 - 1
3164 @} else @{
3165 f = v2/v1
3166 df1 = n2 - 1
3167 df2 = n1 - 1
3168 @}
3169 print "pt=ibeta(" df/2 ",0.5," df/(df+t*t) ")" |& GnuPlot
3170 print "pF=2.0*ibeta(" df2/2 "," df1/2 "," \
3171 df2/(df2+df1*f) ")" |& GnuPlot
3172 print "print pt, pF" |& GnuPlot
3173 RS="\n"; GnuPlot |& getline; RS="\r\n" # $1 is pt, $2 is pF
3174 print "invsqrt2pi=1.0/sqrt(2.0*pi)" |& GnuPlot
3175 print "nd(x)=invsqrt2pi/sd*exp(-0.5*((x-mu)/sd)**2)" |& GnuPlot
3176 print "set term png small color" |& GnuPlot
3177 #print "set term postscript color" |& GnuPlot
3178 #print "set term gif medium size 320,240" |& GnuPlot
3179 print "set yrange[-0.3:]" |& GnuPlot
3180 print "set label 'p(m1=m2) =" $1 "' at 0,-0.1 left" |& GnuPlot
3181 print "set label 'p(v1=v2) =" $2 "' at 0,-0.2 left" |& GnuPlot
3182 print "plot mu=" m1 ",sd=" sqrt(v1) ", nd(x) title 'sample 1',\
3183 mu=" m2 ",sd=" sqrt(v2) ", nd(x) title 'sample 2'" |& GnuPlot
3184 print "quit" |& GnuPlot
3185 GnuPlot |& getline Image
3186 while ((GnuPlot |& getline) > 0)
3187 Image = Image RS $0
3188 close(GnuPlot)
3189 @}
3190 Document = Document "\
3191 <h3>Do these samples have the same Gaussian distribution?</h3>\
3192 <FORM METHOD=GET> <TABLE BORDER CELLPADDING=5>\
3193 <TR>\
3194 <TD>1. Mean </TD>\
3195 <TD><input type=text name=m1 value=" m1 " size=8></TD>\
3196 <TD>1. Variance</TD>\
3197 <TD><input type=text name=v1 value=" v1 " size=8></TD>\
3198 <TD>1. Count </TD>\
3199 <TD><input type=text name=n1 value=" n1 " size=8></TD>\
3200 </TR><TR>\
3201 <TD>2. Mean </TD>\
3202 <TD><input type=text name=m2 value=" m2 " size=8></TD>\
3203 <TD>2. Variance</TD>\
3204 <TD><input type=text name=v2 value=" v2 " size=8></TD>\
3205 <TD>2. Count </TD>\
3206 <TD><input type=text name=n2 value=" n2 " size=8></TD>\
3207 </TR> <input type=submit value=\"Compute\">\
3208 </TABLE></FORM><BR>"
3209 @} else if (MENU[2] ~ "Image") @{
3210 Reason = "OK" ORS "Content-type: image/png"
3211 #Reason = "OK" ORS "Content-type: application/x-postscript"
3212 #Reason = "OK" ORS "Content-type: image/gif"
3213 Header = Footer = ""
3214 Document = Image
3215 @}
3216@}
3217@c endfile
3218@end smallexample
3219
3220@cindex PostScript
3221As usual, we give a short description of the service in the first
3222menu choice. The third menu choice shows us that generation and
3223presentation of an image are two separate actions. While the latter
3224takes place quite instantly in the third menu choice, the former
3225takes place in the much longer second choice. Image data passes from the
3226generating action to the presenting action via the variable @code{Image}
3227that contains a complete @file{.png} image, which is otherwise stored
3228in a file. If you prefer @file{.ps} or @file{.gif} images over the
3229default @file{.png} images, you may select these options by uncommenting
3230the appropriate lines. But remember to do so in two places: when
3231telling GNUPlot which kind of images to generate, and when transmitting the
3232image at the end of the program.
3233
3234Looking at the end of the program,
3235the way we pass the @samp{Content-type} to the browser is a bit unusual.
3236It is appended to the @samp{OK} of the first header line
3237to make sure the type information becomes part of the header.
3238The other variables that get transmitted across the network are
3239made empty, because in this case we do not have an HTML document to
3240transmit, but rather raw image data to place in the body.
3241
3242Most of the work is done in the second menu choice. It starts with a
3243strange JavaScript code snippet. When first implementing this server,
3244we used a short @code{@w{"<IMG SRC="} MyPrefix "/Image>"} here. But then
3245browsers got smarter and tried to improve on speed by requesting the
3246image and the HTML code at the same time. When doing this, the browser
3247tries to build up a connection for the image request while the request for
3248the HTML text is not yet completed. The browser tries to connect
3249to the @command{gawk} server on port 8080 while port 8080 is still in use for
3250transmission of the HTML text. The connection for the image cannot be
3251built up, so the image appears as ``broken'' in the browser window.
3252We solved this problem by telling the browser to open a separate window
3253for the image, but only after a delay of 1000 milliseconds.
3254By this time, the server should be ready for serving the next request.
3255
3256But there is one more subtlety in the JavaScript code.
3257Each time the JavaScript code opens a window for the image, a
3258timestamp (@code{systime}) is appended to the name of the image.
3259Why this constant change of name for the image? Initially, we always named
3260the image @code{Image}, but then the Netscape browser noticed the name
3261had @emph{not} changed since the previous request and displayed the
3262previous image (caching behavior). The server core
3263is implemented so that browsers are told @emph{not} to cache anything.
3264Obviously HTTP requests do not always work as expected. One way to
3265circumvent the cache of such overly smart browsers is to change the
3266name of the image with each request. These three lines of JavaScript
3267caused us a lot of trouble.
3268
3269The rest can be broken
3270down into two phases. First, we check if there are statistical
3271parameters. When the program is first started, there usually are no
3272parameters because it enters the page coming from the top menu.
3273Then, we only have to present the user a form that he can use to change
3274statistical parameters and submit them. Subsequently, the submission of
3275the form causes the execution of the first phase because @emph{now}
3276there @emph{are} parameters to handle.
3277
3278Now that we have parameters, we know there will be an image available.
3279Therefore we insert the JavaScript code here to initiate the opening
3280of the image in a separate window. Then,
3281we prepare some variables that will be passed to GNUPlot for calculation
3282of the probabilities. Prior to reading the results, we must temporarily
3283change @code{RS} because GNUPlot separates lines with newlines.
3284After instructing GNUPlot to generate a @file{.png} (or @file{.ps} or
3285@file{.gif}) image, we initiate the insertion of some text,
3286explaining the resulting probabilities. The final @samp{plot} command
3287actually generates the image data. This raw binary has to be read in carefully
3288without adding, changing, or deleting a single byte. Hence the unusual
3289initialization of @code{Image} and completion with a @code{while} loop.
3290
3291When using this server, it soon becomes clear that it is far from being
3292perfect. It mixes source code of six scripting languages or protocols:
3293
3294@itemize @bullet
3295@item GNU @command{awk} implements a server for the protocol:
3296@item HTTP which transmits:
3297@item HTML text which contains a short piece of:
3298@item JavaScript code opening a separate window.
3299@item A Bourne shell script is used for piping commands into:
3300@item GNUPlot to generate the image to be opened.
3301@end itemize
3302
3303After all this work, the GNUPlot image opens in the JavaScript window
3304where it can be viewed by the user.
3305
3306It is probably better not to mix up so many different languages.
3307The result is not very readable. Furthermore, the
3308statistical part of the server does not take care of invalid input.
3309Among other things, negative variances cause invalid results.
3310
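One way to reject such input before any computation takes place is
sketched below. The function names @code{IsNumber} and @code{ValidSample}
and the exact rules (a positive variance, a whole-number count greater
than one) are assumptions of this sketch and are not part of STATIST:

@example
# true if s looks like a decimal number, with optional sign and exponent
function IsNumber(s) @{
  return s ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/
@}
# true if mean m, variance v, and count n describe a usable sample;
# adding 0 forces numeric comparison of values that arrive as strings
function ValidSample(m, v, n) @{
  return IsNumber(m) && IsNumber(v) && IsNumber(n) &&
         (v + 0) > 0 && (n + 0) > 1 && (n + 0) == int(n + 0)
@}
BEGIN @{
  print ValidSample("0", "1", "10")     # acceptable sample
  print ValidSample("0", "-1", "10")    # negative variance
  print ValidSample("0", "1", "ten")    # count is not a number
@}
@end example
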
3311@node MAZE, MOBAGWHO, STATIST, Some Applications and Techniques
3312@section MAZE: Walking Through a Maze In Virtual Reality
3313@cindex MAZE
3314@cindex VRML
3315@c VRML in iX 11/96 134.
3316@quotation
3317@cindex Perlis, Alan
3318@i{In the long run, every program becomes rococo, and then rubble.}@*
3319Alan Perlis
3320@end quotation
3321
3322By now, we know how to present arbitrary @samp{Content-type}s to a browser.
3323In this @value{SECTION}, our server will present a 3D world to our browser.
3324The 3D world is described in a scene description language (VRML,
3325Virtual Reality Modeling Language) that allows us to travel through a
3326perspective view of a 2D maze with our browser. Browsers with a
3327VRML plugin enable exploration of this technology. We could do
3328one of those boring @samp{Hello world} examples here, that are usually
3329presented when introducing novices to
3330VRML. If you have never written
3331any VRML code, have a look at
3332the VRML FAQ.
3333Presenting a static VRML scene is a bit trivial; in order to expose
3334@command{gawk}'s new capabilities, we will present a dynamically generated
3335VRML scene. The function @code{SetUpServer} is very simple because it
3336only sets the default HTML page and initializes the random number
3337generator. As usual, the surrounding server lets you browse the maze.
3338
3339@smallexample
3340@c file eg/network/maze.awk
3341function SetUpServer() @{
3342 TopHeader = "<HTML><title>Walk through a maze</title>"
3343 TopDoc = "\
3344 <h2>Please choose one of the following actions:</h2>\
3345 <UL>\
3346 <LI><A HREF=" MyPrefix "/AboutServer>About this server</A>\
3347 <LI><A HREF=" MyPrefix "/VRMLtest>Watch a simple VRML scene</A>\
3348 </UL>"
3349 TopFooter = "</HTML>"
3350 srand()
3351@}
3352@c endfile
3353@end smallexample
3354
3355The function @code{HandleGET} is a bit longer because it first computes
3356the maze and afterwards generates the VRML code that is sent across
3357the network. As shown in the STATIST example
3358(@pxref{STATIST}),
3359we set the type of the
3360content to VRML and then store the VRML representation of the maze as the
3361page content. We assume that the maze is stored in a 2D array. Initially,
3362the maze consists of walls only. Then, we add an entry and an exit to the
3363maze and let the rest of the work be done by the function @code{MakeMaze}.
3364Now, only the wall fields are left in the maze. By iterating over these
3365fields, we generate one line of VRML code for each wall field.
3366
3367@smallexample
3368@c file eg/network/maze.awk
3369function HandleGET() @{
3370 if (MENU[2] == "AboutServer") @{
3371 Document = "If your browser has a VRML 2 plugin,\
3372 this server shows you a simple VRML scene."
3373 @} else if (MENU[2] == "VRMLtest") @{
3374 XSIZE = YSIZE = 11 # initially, everything is wall
3375 for (y = 0; y < YSIZE; y++)
3376 for (x = 0; x < XSIZE; x++)
3377 Maze[x, y] = "#"
3378 delete Maze[0, 1] # entry is not wall
3379 delete Maze[XSIZE-1, YSIZE-2] # exit is not wall
3380 MakeMaze(1, 1)
3381 Document = "\
3382#VRML V2.0 utf8\n\
3383Group @{\n\
3384 children [\n\
3385 PointLight @{\n\
3386 ambientIntensity 0.2\n\
3387 color 0.7 0.7 0.7\n\
3388 location 0.0 8.0 10.0\n\
3389 @}\n\
3390 DEF B1 Background @{\n\
3391 skyColor [0 0 0, 1.0 1.0 1.0 ]\n\
3392 skyAngle 1.6\n\
3393 groundColor [1 1 1, 0.8 0.8 0.8, 0.2 0.2 0.2 ]\n\
3394 groundAngle [ 1.2 1.57 ]\n\
3395 @}\n\
3396 DEF Wall Shape @{\n\
3397 geometry Box @{size 1 1 1@}\n\
3398 appearance Appearance @{ material Material @{ diffuseColor 0 0 1 @} @}\n\
3399 @}\n\
3400 DEF Entry Viewpoint @{\n\
3401 position 0.5 1.0 5.0\n\
3402 orientation 0.0 0.0 -1.0 0.52\n\
3403 @}\n"
3404 for (i in Maze) @{
3405 split(i, t, SUBSEP)
3406 Document = Document " Transform @{ translation "
3407 Document = Document t[1] " 0 -" t[2] " children USE Wall @}\n"
3408 @}
3409 Document = Document " ] # end of group for world\n@}"
3410 Reason = "OK" ORS "Content-type: model/vrml"
3411 Header = Footer = ""
3412 @}
3413@}
3414@c endfile
3415@end smallexample
3416
3417Finally, we have a look at @code{MakeMaze}, the function that generates
3418the @code{Maze} array. When entered, this function assumes that the array
3419has been initialized so that each element represents a wall element and
3420the maze is initially full of wall elements. Only the entrance and the exit
3421of the maze should have been left free. The parameters of the function tell
3422us which element must be marked as not being a wall. After this, we take
3423a look at the four neighbouring elements and remember which we have already
3424treated. Of all the neighbouring elements, we take one at random and
3425walk in that direction. Therefore, the wall element in that direction has
3426to be removed, and then we call the function recursively for that element.
3427The maze is only completed if we iterate the above procedure for
3428@emph{all} neighbouring elements (in random order) and for our present
3429element by recursively calling the function for the present element. This
3430last iteration could have been done in a loop,
3431but it is much simpler to do it recursively.
3432
3433Notice that elements with coordinates that are both odd are assumed to be
3434on our way through the maze and the generating process cannot terminate
3435as long as any such element has not yet been @code{delete}d. All other
3436elements are potentially part of the wall.
3437
3438@smallexample
3439@c file eg/network/maze.awk
3440function MakeMaze(x, y) @{
3441 delete Maze[x, y] # here we are, we have no wall here
3442 p = 0 # count unvisited fields in all directions
3443 if (x-2 SUBSEP y in Maze) d[p++] = "-x"
3444 if (x SUBSEP y-2 in Maze) d[p++] = "-y"
3445 if (x+2 SUBSEP y in Maze) d[p++] = "+x"
3446 if (x SUBSEP y+2 in Maze) d[p++] = "+y"
3447 if (p>0) @{ # if there are unvisited fields, go there
3448 p = int(p*rand()) # choose one unvisited field at random
3449 if (d[p] == "-x") @{ delete Maze[x - 1, y]; MakeMaze(x - 2, y)
3450 @} else if (d[p] == "-y") @{ delete Maze[x, y - 1]; MakeMaze(x, y - 2)
3451 @} else if (d[p] == "+x") @{ delete Maze[x + 1, y]; MakeMaze(x + 2, y)
3452 @} else if (d[p] == "+y") @{ delete Maze[x, y + 1]; MakeMaze(x, y + 2)
3453 @} # we are back from recursion
3454 MakeMaze(x, y); # try again while there are unvisited fields
3455 @}
3456@}
3457@c endfile
3458@end smallexample
3459
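To watch the algorithm at work without a VRML browser, the maze can also
be printed as plain text. The following self-contained sketch is a
variation for illustration, not part of @file{maze.awk}: it repeats
@code{MakeMaze} with @code{p} and @code{d} declared as local variables
and with the parenthesized @code{(x, y) in Maze} subscript test, and it
replaces the VRML output with one line of text per row:

@example
function MakeMaze(x, y,    p, d) @{
  delete Maze[x, y]               # the current field is not a wall
  p = 0                           # count unvisited fields in all directions
  if ((x-2, y) in Maze) d[p++] = "-x"
  if ((x, y-2) in Maze) d[p++] = "-y"
  if ((x+2, y) in Maze) d[p++] = "+x"
  if ((x, y+2) in Maze) d[p++] = "+y"
  if (p > 0) @{
    p = int(p * rand())           # choose one unvisited field at random
    if      (d[p] == "-x") @{ delete Maze[x-1, y]; MakeMaze(x-2, y) @}
    else if (d[p] == "-y") @{ delete Maze[x, y-1]; MakeMaze(x, y-2) @}
    else if (d[p] == "+x") @{ delete Maze[x+1, y]; MakeMaze(x+2, y) @}
    else                   @{ delete Maze[x, y+1]; MakeMaze(x, y+2) @}
    MakeMaze(x, y)                # try again while unvisited fields remain
  @}
@}
BEGIN @{
  srand()
  XSIZE = YSIZE = 11
  for (y = 0; y < YSIZE; y++)     # initially, everything is wall
    for (x = 0; x < XSIZE; x++)
      Maze[x, y] = "#"
  delete Maze[0, 1]               # entry
  delete Maze[XSIZE-1, YSIZE-2]   # exit
  MakeMaze(1, 1)
  for (y = 0; y < YSIZE; y++) @{   # one line of text per row
    line = ""
    for (x = 0; x < XSIZE; x++)
      line = line (((x, y) in Maze) ? "#" : " ")
    print line
  @}
@}
@end example
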
3460@node MOBAGWHO, STOXPRED, MAZE, Some Applications and Techniques
3461@section MOBAGWHO: a Simple Mobile Agent
3462@cindex MOBAGWHO program
3463@cindex agent
3464@quotation
3465@cindex Hoare, C.A.R.
3466@i{There are two ways of constructing a software design: One way is to
3467make it so simple that there are obviously no deficiencies, and the
3468other way is to make it so complicated that there are no obvious
3469deficiencies.} @*
3470C. A. R. Hoare
3471@end quotation
3472
3473A @dfn{mobile agent} is a program that can be dispatched from a computer and
3474transported to a remote server for execution. This is called @dfn{migration},
3475which means that a process on another system is started that is independent
3476from its originator. Ideally, it wanders through
3477a network while working for its creator or owner. In places like
3478the UMBC Agent Web,
3479people are quite confident that (mobile) agents are a software engineering
3480paradigm that enables us to significantly increase the efficiency
3481of our work. Mobile agents could become the mediators between users and
3482the networking world. For an unbiased view of this technology,
3483see the remarkable paper @cite{Mobile Agents: Are they a good
3484idea?}.@footnote{@uref{http://www.research.ibm.com/massive/mobag.ps}}
3485
3486@ignore
3487@c Chuck says to take all of this out.
3488@cindex Tcl/Tk
3489A good instance of this paradigm is
3490@cite{Agent Tcl},@footnote{@uref{http://agent.cs.dartmouth.edu/software/agent2.0/}}
3491an extension of the Tcl language. After introducing a typical
3492development environment, the aforementioned paper shows a nice little
3493example application that we will try to rebuild in @command{gawk}. The
3494@command{who} agent takes a list of servers and wanders from one server
3495to the next one, always looking to see who is logged in.
3496Having reached the last
3497one, it sends back a message with a list of all users it found on each
3498machine.
3499
3500But before implementing something that might or might not be a mobile
3501agent, let us clarify the concept and some important terms. The agent
3502paradigm in general is such a young scientific discipline that it has
3503not yet developed a widely-accepted terminology. Some authors try to
3504give precise definitions, but their scope is often not wide enough
3505to be generally accepted. Franklin and Graesser ask
3506@cite{Is it an Agent or just a Program: A Taxonomy for Autonomous
3507Agents}@footnote{@uref{http://www.msci.memphis.edu/~franklin/AgentProg.html}}
3508and give even better answers than Caglayan and Harrison in their
3509@cite{Agent Sourcebook}.@footnote{@uref{http://www.aminda.com/mazzu/sourcebook/}}
3510
3511@itemize @minus
3512@item
3513@i{An autonomous agent is a system situated within and a part of
3514an environment that senses that environment and acts on it, over time, in
3515pursuit of its own agenda and so as to effect what it senses in the future.}
3516(Quoted from Franklin and Graesser.)
3517@item
3518A mobile agent is able to transport itself from one machine to another.
3519@item
3520The term @dfn{migration} often denotes this process of moving.
3521But neither of the two sources above even mentions this term, while others
3522use it regularly.
3523@end itemize
3524
3525Before delving into the (rather demanding) details of
3526implementation, let us give just one more quotation as a final
3527motivation. Steven Farley published an excellent paper called
3528@cite{Mobile Agent System Architecture},@footnote{This often
3529cited text originally appeared as a conference paper here:
3530@uref{http://www.sigs.com/publications/docs/java/9705/farley.html}
3531Many bibliographies on the Internet point to this dead link. Meanwhile,
3532the paper appeared as a contribution to a book called More Java Gems here:
3533@uref{http://uk.cambridge.org/computerscience/object/catalogue/0521774772/default.htm}}
3534in which he asks ``Why use an agent architecture?''
3535
3536@quotation
3537If client-server systems are the currently established norm and distributed
3538object systems such as CORBA are defining the future standards, why bother
3539with agents? Agent architectures have certain advantages over these other
3540types. Three of the most important advantages are:
3541@cindex CORBA
3542
3543@enumerate
3544@item
3545An agent performs much processing at the server where local bandwidth
3546is high, thus reducing the amount of network bandwidth consumed and increasing
3547overall performance. In contrast, a CORBA client object with the equivalent
3548functionality of a given agent must make repeated remote method calls to
3549the server object because CORBA objects cannot move across the network
3550at runtime.
3551
3552@item
3553An agent operates independently of the application from which the
3554agent was invoked. The agent operates asynchronously, meaning that the
3555client application does not need to wait for the results. This is especially
3556important for mobile users who are not always connected to the network.
3557
3558@item
3559The use of agents allows for the injection of new functionality into
3560a system at run time. An agent system essentially contains its own automatic
3561software distribution mechanism. Since CORBA has no built-in support for
3562mobile code, new functionality generally has to be installed manually.
3563
3564@end enumerate
3565
3566Of course a non-agent system can exhibit these same features with some
3567work. But the mobile code paradigm supports the transfer of executable
3568code to a remote location for asynchronous execution from the start. An
3569agent architecture should be considered for systems where the above features
3570are primary requirements.
3571@end quotation
3572@end ignore
3573
3574When trying to migrate a process from one system to another,
3575a server process is needed on the receiving side. Several ways of
3576implementing migration come to mind, depending on the kind of
3577server process:
3578
3579@itemize @bullet
3580@item
3581HTTP can be used as the protocol for delivery of the migrating
3582process. In this case, we use a common web
3583server as the receiving server process. A universal CGI script
3584mediates between migrating process and web server.
3585Each server willing to accept migrating agents makes this universal
3586service available. HTTP supplies the @code{POST} method to transfer
3587some data to a file on the web server. When a CGI script is called
3588remotely with the @code{POST} method instead of the usual @code{GET} method,
3589data is transmitted from the client process to the standard input
3590of the server's CGI script. So, to implement a mobile agent,
3591we must not only write the agent program to start on the client
3592side, but also the CGI script to receive the agent on the server side.
3593
3594@cindex CGI (Common Gateway Interface)
3595@cindex apache
3596@item
3597The @code{PUT} method can also be used for migration. HTTP does not
3598require a CGI script for migration via @code{PUT}. However, with common web
3599servers there is no advantage to this solution, because web servers such as
3600Apache
3601require explicit activation of a special @code{PUT} script.
3602
3603@item
3604@cite{Agent Tcl} pursues a different course; it relies on a dedicated server
3605process with a dedicated protocol specialized for receiving mobile agents.
3606@end itemize
3607
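As a rough sketch of the first approach, the following program sends a
tiny script to a CGI script with the @code{POST} method and prints the
reply. The host name @samp{localhost}, port 80, the path
@file{/cgi-bin/PostAgent.sh}, and the variable names are assumptions
chosen for the illustration; a web server must be listening there for
the program to produce any output:

@example
BEGIN @{
  Host   = "localhost"                        # assumed receiving host
  Agent  = "BEGIN @{ print \"hello\" @}"        # data to transmit
  Server = "/inet/tcp/0/" Host "/80"
  RS = ORS = "\r\n"                           # HTTP line terminators
  print "POST /cgi-bin/PostAgent.sh HTTP/1.0" |& Server
  # the extra ORS produces the empty line that ends the header
  print "Content-length: " length(Agent) ORS  |& Server
  printf "%s", Agent                          |& Server
  while ((Server |& getline) > 0)             # show the server's reply
    printf "%s\n", $0
  close(Server)
@}
@end example
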
3608Our agent example abuses a common web server as a migration tool. So, it needs a
3609universal CGI script on the receiving side (the web server). The receiving script is
3610activated with a @code{POST} request when placed into a location like
3611@file{/httpd/cgi-bin/PostAgent.sh}. Make sure that the server system uses a
3612version of @command{gawk} that supports network access (Version 3.1 or later;
3613verify with @samp{gawk --version}).
3614
3615@example
3616@c file eg/network/PostAgent.sh
3617#!/bin/sh
3618MobAg=/tmp/MobileAgent.$$
3619# direct script to mobile agent file
3620cat > $MobAg
3621# execute agent concurrently
3622gawk -f $MobAg $MobAg > /dev/null &
3623# HTTP header, terminator and body
3624gawk 'BEGIN @{ print "\r\nAgent started" @}'
3625rm $MobAg # delete script file of agent
3626@c endfile
3627@end example
3628
3629By making its process id (@code{$$}) part of the unique @value{FN}, the
3630script avoids conflicts between concurrent instances of the script.
3631First, all lines
3632from standard input (the mobile agent's source code) are copied into
3633this unique file. Then, the agent is started as a concurrent process
3634and a short message reporting this fact is sent to the submitting client.
3635Finally, the script file of the mobile agent is removed because it is
3636no longer needed. Although it is a short script, there are several noteworthy
3637points:
3638
3639@table @asis
3640@item Security
3641@emph{There is none}. In fact, the CGI script should never
3642be made available on a server that is part of the Internet because everyone
3643would be allowed to execute arbitrary commands with it. This behavior is
3644acceptable only when performing rapid prototyping.
3645
3646@item Self-Reference
3647Each migrating instance of an agent is started
3648in a way that enables it to read its own source code as an input file
3649and use the code for subsequent
3650migrations. This is necessary because it needs to treat the agent's code
3651as data to transmit. @command{gawk} is not the ideal language for such
3652a job. Lisp and Tcl are more suitable because they do not make a distinction
3653between program code and data.
3654
3655@item Independence
3656After migration, the agent is not linked to its
3657former home in any way. By reporting @samp{Agent started}, it waves
3658``Goodbye'' to its origin. The originator may choose to terminate or not.
3659@end table
3660
3661@cindex Lisp
3662The originating agent itself is started just like any other command-line
3663script, and reports the results on standard output. By letting the name
3664of the original host migrate with the agent, the agent that migrates
3665to a host far away from its origin can report the result back home.
3666Having arrived at the end of the journey, the agent establishes
3667a connection and reports the results. This is the reason for
3668determining the name of the host with @samp{uname -n} and storing it
3669in @code{MyOrigin} for later use. We may also set variables with the
3670@option{-v} option from the command line. This interactivity is only
3671of importance in the context of starting a mobile agent; therefore this
3672@code{BEGIN} pattern and its action do not take part in migration:
3673
@smallexample
@c file eg/network/mobag.awk
BEGIN @{
  if (ARGC != 2) @{
    print "MOBAG - a simple mobile agent"
    print "CALL:\n gawk -f mobag.awk mobag.awk"
    print "IN:\n the name of this script as a command-line parameter"
    print "PARAM:\n -v MyOrigin=myhost.com"
    print "OUT:\n the result on stdout"
    print "JK 29.03.1998 01.04.1998"
    exit
  @}
  if (MyOrigin == "") @{
    "uname -n" | getline MyOrigin
    close("uname -n")
  @}
@}
@c endfile
@end smallexample
3693
3694Since @command{gawk} cannot manipulate and transmit parts of the program
3695directly, the source code is read and stored in strings.
3696Therefore, the program scans itself for
3697the beginning and the ending of functions.
3698Each line in between is appended to the code string until the end of
3699the function has been reached. A special case is this part of the program
3700itself. It is not a function.
3701Placing a similar framework around it causes it to be treated
3702like a function. Notice that this mechanism works for all the
3703functions of the source code, but it cannot guarantee that the order
3704of the functions is preserved during migration:
3705
@smallexample
@c file eg/network/mobag.awk
#ReadMySelf
/^function /                     @{ FUNC = $2 @}
/^END/ || /^#ReadMySelf/         @{ FUNC = $1 @}
FUNC != ""                       @{ MOBFUN[FUNC] = MOBFUN[FUNC] RS $0 @}
(FUNC != "") && (/^@}/ || /^#EndOfMySelf/) \
                                 @{ FUNC = "" @}
#EndOfMySelf
@c endfile
@end smallexample
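
The self-scanning idea can be tried in isolation. The following is a
minimal sketch and not part of @file{mobag.awk}: the file name
@file{selfscan.awk} and the functions @code{hello} and @code{world} are
made up for illustration. Like the agent, the script must be given its
own source as the data file, for example with
@samp{gawk -f selfscan.awk selfscan.awk}.

@example
# selfscan.awk -- collect the text of each function in this file
function hello() @{
    return "hello"
@}
function world() @{
    return "world"
@}
/^function /         @{ FUNC = $2 @}                    # a function starts
FUNC != ""           @{ SRC[FUNC] = SRC[FUNC] RS $0 @}  # append current line
FUNC != "" && /^@}/  @{ FUNC = "" @}                    # a function ends
END @{
    # each stored text begins with RS, so split() yields one extra element
    for (f in SRC)
        printf "%s: %d lines\n", f, split(SRC[f], tmp, RS) - 1
@}
@end example

The script reports three lines for each of the two functions. The order
of the report is not fixed, because @samp{for (f in SRC)} visits the
elements in an unspecified order. This is the same reason why the agent
cannot guarantee the order of its functions after migration.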
3717
3718The web server code in
3719@ref{Interacting Service, ,A Web Service with Interaction},
3720was first developed as a site-independent core. Likewise, the
3721@command{gawk}-based mobile agent
3722starts with an agent-independent core, to which can be appended
3723application-dependent functions. What follows is the only
3724application-independent function needed for the mobile agent:
3725
@smallexample
@c file eg/network/mobag.awk
function migrate(Destination, MobCode, Label) @{
  MOBVAR["Label"] = Label
  MOBVAR["Destination"] = Destination
  RS = ORS = "\r\n"
  HttpService = "/inet/tcp/0/" Destination
  for (i in MOBFUN)
    MobCode = (MobCode "\n" MOBFUN[i])
  MobCode = MobCode "\n\nBEGIN @{"
  for (i in MOBVAR)
    MobCode = (MobCode "\n MOBVAR[\"" i "\"] = \"" MOBVAR[i] "\"")
  MobCode = MobCode "\n@}\n"
  print "POST /cgi-bin/PostAgent.sh HTTP/1.0"  |& HttpService
  print "Content-length:", length(MobCode) ORS |& HttpService
  printf "%s", MobCode                         |& HttpService
  while ((HttpService |& getline) > 0)
    print $0
  close(HttpService)
@}
@c endfile
@end smallexample
3748
3749The @code{migrate} function prepares the
3750aforementioned strings containing the program code and transmits them to a
3751server. A consequence of this modular approach is that the @code{migrate}
3752function takes some parameters that aren't needed in this application,
3753but that will be in future ones. Its mandatory parameter @code{Destination} holds the
3754name (or IP address) of the server that the agent wants as a host for its
3755code. The optional parameter @code{MobCode} may contain some @command{gawk}
3756code that is inserted during migration in front of all other code.
3757The optional parameter @code{Label} may contain
3758a string that tells the agent what to do in program execution after
3759arrival at its new home site. One of the serious obstacles in implementing
3760a framework for mobile agents is that it does not suffice to migrate the
3761code. It is also necessary to migrate the state of execution of the agent. In
3762contrast to @cite{Agent Tcl}, this program does not try to migrate the complete set
3763of variables. The following conventions are used:
3764
3765@itemize @bullet
3766@item
3767Each variable in an agent program is local to the current host and does
3768@emph{not} migrate.
3769
3770@item
3771The array @code{MOBFUN} shown above is an exception. It is handled
3772by the function @code{migrate} and does migrate with the application.
3773
3774@item
3775The other exception is the array @code{MOBVAR}. Each variable that
3776takes part in migration has to be an element of this array.
3777@code{migrate} also takes care of this.
3778@end itemize
3779
3780Now it's clear what happens to the @code{Label} parameter of the
3781function @code{migrate}. It is copied into @code{MOBVAR["Label"]} and
3782travels alongside the other data. Since travelling takes place via HTTP,
3783records must be separated with @code{"\r\n"} in @code{RS} and
3784@code{ORS} as usual. The code assembly for migration takes place in
3785three steps:
3786
3787@itemize @bullet
3788@item
3789Iterate over @code{MOBFUN} to collect all functions verbatim.
3790
3791@item
3792Prepare a @code{BEGIN} pattern and put assignments to mobile
3793variables into the action part.
3794
3795@item
3796Transmission itself resembles GETURL: the header with the request
3797and the @code{Content-length} is followed by the body. In case there is
3798any reply over the network, it is read completely and echoed to
3799standard output to avoid irritating the server.
3800@end itemize
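
The second step can be previewed without any network access. The sketch
below is an illustration rather than part of the agent. It fills
@code{MOBVAR} with two sample values (both made up) and prints the
generated @code{BEGIN} block instead of transmitting it:

@example
BEGIN @{
    MOBVAR["Label"]       = "start"
    MOBVAR["Destination"] = "localhost/80"
    code = "BEGIN @{"
    for (i in MOBVAR)       # one assignment per mobile variable
        code = code "\n  MOBVAR[\"" i "\"] = \"" MOBVAR[i] "\""
    code = code "\n@}"
    print code              # the agent would send this text instead
@}
@end example

The printed text is itself a small @command{gawk} program. Note that the
values are pasted into string literals without any escaping, so a value
that contains a double quote or a newline would produce a program that
does not parse.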
3801
3802The application-independent framework is now almost complete. What follows
3803is the @code{END} pattern that is executed when the mobile agent has
3804finished reading its own code. First, it checks whether it is already
3805running on a remote host or not. In case initialization has not yet taken
3806place, it starts @code{MyInit}. Otherwise (later, on a remote host), it
3807starts @code{MyJob}:
3808
@smallexample
@c file eg/network/mobag.awk
END @{
  if (ARGC != 2) exit    # stop when called with wrong parameters
  if (MyOrigin != "")    # is this the originating host?
    MyInit()             # if so, initialize the application
  else                   # we are on a host with migrated data
    MyJob()              # so we do our job
@}
@c endfile
@end smallexample
3820
3821All that's left to extend the framework into a complete application
3822is to write two application-specific functions: @code{MyInit} and
3823@code{MyJob}. Keep in mind that the former is executed once on the
3824originating host, while the latter is executed after each migration:
3825
@smallexample
@c file eg/network/mobag.awk
function MyInit() @{
  MOBVAR["MyOrigin"] = MyOrigin
  MOBVAR["Machines"] = "localhost/80 max/80 moritz/80 castor/80"
  split(MOBVAR["Machines"], Machines)           # which host is the first?
  migrate(Machines[1], "", "")                  # go to the first host
  while (("/inet/tcp/8080/0/0" |& getline) > 0) # wait for result
    print $0                                    # print result
  close("/inet/tcp/8080/0/0")
@}
@c endfile
@end smallexample
3839
3840As mentioned earlier, this agent takes the name of its origin
3841(@code{MyOrigin}) with it. Then, it takes the name of its first
3842destination and goes there for further work. Notice that this name has
3843the port number of the web server appended to the name of the server,
3844because the function @code{migrate} needs it this way to create
3845the @code{HttpService} variable. Finally, it waits for the result to arrive.
3846The @code{MyJob} function runs on the remote host:
3847
@smallexample
@c file eg/network/mobag.awk
function MyJob() @{
  # forget this host
  sub(MOBVAR["Destination"], "", MOBVAR["Machines"])
  MOBVAR["Result"]=MOBVAR["Result"] SUBSEP SUBSEP MOBVAR["Destination"] ":"
  while (("who" | getline) > 0)               # who is logged in?
    MOBVAR["Result"] = MOBVAR["Result"] SUBSEP $0
  close("who")
  if (index(MOBVAR["Machines"], "/") > 0) @{  # any more machines to visit?
    split(MOBVAR["Machines"], Machines)       # which host is next?
    migrate(Machines[1], "", "")              # go there
  @} else @{                                  # no more machines
    gsub(SUBSEP, "\n", MOBVAR["Result"])      # send result to origin
    print MOBVAR["Result"] |& "/inet/tcp/0/" MOBVAR["MyOrigin"] "/8080"
    close("/inet/tcp/0/" MOBVAR["MyOrigin"] "/8080")
  @}
@}
@c endfile
@end smallexample
3868
After migrating, the first thing to do in @code{MyJob} is to delete
the name of the current host from the list of hosts to visit. Now, it
is time to start the real work by appending the host's name to the
result string, and reading line by line who is logged in on this host.
A very annoying circumstance is the fact that the elements of
@code{MOBVAR} cannot hold the newline character (@code{"\n"}). If they
did, migration of this string would not work because the generated string
would no longer obey the syntax rule for a string in @command{gawk}.
@code{SUBSEP} is used as a temporary replacement.
If the list of hosts to visit holds
at least one more entry, the agent migrates to that place to go on
working there. Otherwise, we replace the @code{SUBSEP}s
with a newline character in the resulting string, and report it to
the originating host, whose name is stored in @code{MOBVAR["MyOrigin"]}.
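
The substitution trick can be shown on its own. In this sketch (the
sample text is made up), newlines are replaced by @code{SUBSEP} before
the string would be embedded in generated code, and restored afterwards.
It assumes that the data never contains the @code{SUBSEP} character
itself:

@example
BEGIN @{
    text = "first line\nsecond line"
    gsub(/\n/, SUBSEP, text)      # encode: no newline is left in text
    if (index(text, "\n") == 0)
        print "safe to embed in a string literal"
    gsub(SUBSEP, "\n", text)      # decode after arrival
    print text
@}
@end example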
3883
3884@node STOXPRED, PROTBASE, MOBAGWHO, Some Applications and Techniques
3885@section STOXPRED: Stock Market Prediction As A Service
3886@cindex STOXPRED program
3887@cindex Yahoo!
3888@quotation
3889@i{Far out in the uncharted backwaters of the unfashionable end of
3890the Western Spiral arm of the Galaxy lies a small unregarded yellow sun.}
3891
@i{Orbiting this at a distance of roughly ninety-two million miles is an
utterly insignificant little blue-green planet whose ape-descended life
forms are so amazingly primitive that they still think digital watches are
a pretty neat idea.}
3896
3897@i{This planet has --- or rather had --- a problem, which was this:
3898most of the people living on it were unhappy for pretty much of the time.
3899Many solutions were suggested for this problem, but most of these were
3900largely concerned with the movements of small green pieces of paper,
3901which is odd because it wasn't the small green pieces of paper that
3902were unhappy.} @*
3903Douglas Adams, @cite{The Hitch Hiker's Guide to the Galaxy}
3904@end quotation
3905
3906@cindex @command{cron} utility
3907Valuable services on the Internet are usually @emph{not} implemented
3908as mobile agents. There are much simpler ways of implementing services.
3909All Unix systems provide, for example, the @command{cron} service.
3910Unix system users can write a list of tasks to be done each day, each
3911week, twice a day, or just once. The list is entered into a file named
3912@file{crontab}. For example, to distribute a newsletter on a daily
3913basis this way, use @command{cron} for calling a script each day early
3914in the morning.
3915
@example
# run at 8 am on weekdays, distribute the newsletter
0 8 * * 1-5 $HOME/bin/daily.job >> $HOME/log/newsletter 2>&1
@end example
3920
3921The script first looks for interesting information on the Internet,
3922assembles it in a nice form and sends the results via email to
3923the customers.
3924
3925The following is an example of a primitive
3926newsletter on stock market prediction. It is a report which first
3927tries to predict the change of each share in the Dow Jones Industrial
3928Index for the particular day. Then it mentions some especially
3929promising shares as well as some shares which look remarkably bad
3930on that day. The report ends with the usual disclaimer which tells
3931every child @emph{not} to try this at home and hurt anybody.
3932@cindex Dow Jones Industrial Index
3933
@smallexample
Good morning Uncle Scrooge,

This is your daily stock market report for Monday, October 16, 2000.
Here are the predictions for today:

        AA      neutral
        GE      up
        JNJ     down
        MSFT    neutral
        @dots{}
        UTX     up
        DD      down
        IBM     up
        MO      down
        WMT     up
        DIS     up
        INTC    up
        MRK     down
        XOM     down
        EK      down
        IP      down

The most promising shares for today are these:

        INTC            http://biz.yahoo.com/n/i/intc.html

The stock shares to avoid today are these:

        EK              http://biz.yahoo.com/n/e/ek.html
        IP              http://biz.yahoo.com/n/i/ip.html
        DD              http://biz.yahoo.com/n/d/dd.html
        @dots{}
@end smallexample
3968
3969@ignore
3970@c Chuck suggests removing this paragraph
3971If you are not into stock market prediction but want to earn money
3972with a more humane service, you might prefer to send out horoscopes
3973to your customers. Or, once every refrigerator in every household on this side
3974of the Chinese Wall is connected to the Internet, such a service could
3975inspect the contents of your customer's refrigerators each day and
3976advise them on nutrition. Big Brother is watching them.
3977@end ignore
3978
3979The script as a whole is rather long. In order to ease the pain of
3980studying other people's source code, we have broken the script
3981up into meaningful parts which are invoked one after the other.
3982The basic structure of the script is as follows:
3983
@example
@c file eg/network/stoxpred.awk
BEGIN @{
  Init()
  ReadQuotes()
  CleanUp()
  Prediction()
  Report()
  SendMail()
@}
@c endfile
@end example
3996
3997The earlier parts store data into variables and arrays which are
3998subsequently used by later parts of the script. The @code{Init} function
3999first checks if the script is invoked correctly (without any parameters).
4000If not, it informs the user of the correct usage. What follows are preparations
4001for the retrieval of the historical quote data. The names of the 30 stock
4002shares are stored in an array @code{name} along with the current date
4003in @code{day}, @code{month}, and @code{year}.
4004
4005All users who are separated
4006from the Internet by a firewall and have to direct their Internet accesses
4007to a proxy must supply the name of the proxy to this script with the
4008@samp{-v Proxy=@var{name}} option. For most users, the default proxy and
4009port number should suffice.
4010
@example
@c file eg/network/stoxpred.awk
function Init() @{
  if (ARGC != 1) @{
    print "STOXPRED - daily stock share prediction"
    print "IN:\n no parameters, nothing on stdin"
    print "PARAM:\n -v Proxy=MyProxy -v ProxyPort=80"
    print "OUT:\n commented predictions as email"
    print "JK 09.10.2000"
    exit
  @}
  # Remember ticker symbols from Dow Jones Industrial Index
  StockCount = split("AA GE JNJ MSFT AXP GM JPM PG BA HD KO \
    SBC C HON MCD T CAT HWP MMM UTX DD IBM MO WMT DIS INTC \
    MRK XOM EK IP", name);
  # Remember the current date as the end of the time series
  day   = strftime("%d")
  month = strftime("%m")
  year  = strftime("%Y")
  if (Proxy     == "") Proxy     = "chart.yahoo.com"
  if (ProxyPort ==  0) ProxyPort = 80
  YahooData = "/inet/tcp/0/" Proxy "/" ProxyPort
@}
@c endfile
@end example
4036
4037@cindex CSV format
4038There are two really interesting parts in the script. One is the
4039function which reads the historical stock quotes from an Internet
4040server. The other is the one that does the actual prediction. In
4041the following function we see how the quotes are read from the
4042Yahoo server. The data which comes from the server is in
4043CSV format (comma-separated values):
4044
@example
@c file eg/network/stoxdata.txt
Date,Open,High,Low,Close,Volume
9-Oct-00,22.75,22.75,21.375,22.375,7888500
6-Oct-00,23.8125,24.9375,21.5625,22,10701100
5-Oct-00,24.4375,24.625,23.125,23.50,5810300
@c endfile
@end example
4053
Each line holds the values for one trading day. The columns are
separated by commas and contain the kind of data that is described
in the header (first) line. At first, @command{gawk} is instructed to
separate columns by commas (@samp{FS = ","}). In the loop that follows,
a connection to the Yahoo server is first opened, then a download takes
place, and finally the connection is closed. All this happens once for
each ticker symbol. In the body of this loop, an Internet address is
built up as a string according to the rules of the Yahoo server. The
starting and ending dates fall on the same calendar day, exactly one year
apart: the series ends today and starts on this day one year ago.
All the action is initiated within the @code{printf}
command which transmits the request for data to the Yahoo server.
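
To see what such a request looks like, the address can be assembled and
printed rather than sent. The sketch below is a dry run under stated
assumptions: the ticker symbol @samp{IBM} is chosen arbitrarily, the
parameter names are simply those used by the script, and nothing here
checks whether the server still answers such requests.

@example
BEGIN @{
    symbol = "IBM"                      # arbitrary sample symbol
    day   = strftime("%d")
    month = strftime("%m")
    year  = strftime("%Y")
    # every parameter, including g, is introduced by an ampersand
    URL = "http://chart.yahoo.com/table.csv?s=" symbol \
          "&a=" month "&b=" day "&c=" (year - 1) \
          "&d=" month "&e=" day "&f=" year \
          "&g=d&q=q&y=0&z=" symbol "&x=.csv"
    # print the request instead of writing it to a network connection
    printf "GET %s HTTP/1.0\r\n\r\n", URL
@}
@end example

Passing the address as an argument to @samp{%s} rather than as part of
the format string is a small precaution: a @samp{%} character in the
address would otherwise be interpreted by @code{printf}.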
4065
In the inner loop, the server's data is read and scanned
line by line. Only lines which have six columns and the name of a month
in the first column contain relevant data. This data is stored
in the two-dimensional array @code{quote}; one dimension
being time, the other being the ticker symbol. During retrieval of the
first stock's data, the calendar names of the time instances are stored
in the array @code{days} because we need them later.
4073
@smallexample
@c file eg/network/stoxpred.awk
function ReadQuotes() @{
  # Retrieve historical data for each ticker symbol
  FS = ","
  for (stock = 1; stock <= StockCount; stock++) @{
    URL = "http://chart.yahoo.com/table.csv?s=" name[stock] \
          "&a=" month "&b=" day "&c=" year-1 \
          "&d=" month "&e=" day "&f=" year \
          "&g=d&q=q&y=0&z=" name[stock] "&x=.csv"
    printf("GET " URL " HTTP/1.0\r\n\r\n") |& YahooData
    while ((YahooData |& getline) > 0) @{
      if (NF == 6 && $1 ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/) @{
        if (stock == 1)
          days[++daycount] = $1;
        quote[$1, stock] = $5
      @}
    @}
    close(YahooData)
  @}
  FS = " "
@}
@c endfile
@end smallexample
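
The filtering step can be tested offline. The following sketch is not
part of @file{stoxpred.awk}; it feeds the sample data shown earlier
through the same test (six fields, a month name in the first one) and
keeps only one ticker symbol, so @code{quote} has a single subscript
here:

@example
BEGIN @{
    FS = ","
    sample = "Date,Open,High,Low,Close,Volume\n" \
             "9-Oct-00,22.75,22.75,21.375,22.375,7888500\n" \
             "6-Oct-00,23.8125,24.9375,21.5625,22,10701100\n" \
             "5-Oct-00,24.4375,24.625,23.125,23.50,5810300"
    n = split(sample, line, "\n")
    for (k = 1; k <= n; k++) @{
        $0 = line[k]        # assigning to $0 splits the record using FS
        if (NF == 6 &&
            $1 ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/) @{
            days[++daycount] = $1
            quote[$1] = $5  # fifth column: closing price
        @}
    @}
    for (d = 1; d <= daycount; d++)
        print days[d], quote[days[d]]
@}
@end example

The header line has six fields too, but its first field contains no
month name, so only the three data lines are stored and printed.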
4098
Now that we @emph{have} the data, it can be checked once again to make sure
that no individual stock is missing or invalid, and that all the stock quotes are
aligned correctly. Furthermore, we renumber the time instances. The
most recent day gets day number 1 and all other days get consecutive
numbers. All quotes are rounded to the nearest whole number of US dollars.
4104
@smallexample
@c file eg/network/stoxpred.awk
function CleanUp() @{
  # clean up time series; eliminate incomplete data sets
  for (d = 1; d <= daycount; d++) @{
    for (stock = 1; stock <= StockCount; stock++)
      if (! ((days[d], stock) in quote))
        stock = StockCount + 10
    if (stock > StockCount + 1)
      continue
    datacount++
    for (stock = 1; stock <= StockCount; stock++)
      data[datacount, stock] = int(0.5 + quote[days[d], stock])
  @}
  delete quote
  delete days
@}
@c endfile
@end smallexample
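
The completeness test and the rounding can be observed on a tiny data set.
This sketch uses two made-up stocks and two days, and it replaces the
loop-counter trick of @code{CleanUp} with an explicit flag named
@code{complete}, which is an assumption of this example and not the
script's own method:

@example
BEGIN @{
    quote["9-Oct-00", 1] = 22.375
    quote["9-Oct-00", 2] = 41.5
    quote["6-Oct-00", 1] = 22       # stock 2 has no quote for this day
    ndays = split("9-Oct-00 6-Oct-00", days, " ")
    for (d = 1; d <= ndays; d++) @{
        complete = 1
        for (s = 1; s <= 2; s++)
            if (! ((days[d], s) in quote))
                complete = 0        # at least one quote is missing
        if (! complete)
            continue                # drop the whole day
        for (s = 1; s <= 2; s++)
            printf "%s stock %d -> %d\n", days[d], s,
                   int(0.5 + quote[days[d], s])
    @}
@}
@end example

Only the first day survives: 22.375 is rounded to 22 and 41.5 to 42. The
second day is dropped because one of its quotes is missing.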
4124
4125Now we have arrived at the second really interesting part of the whole affair.
4126What we present here is a very primitive prediction algorithm:
4127@emph{If a stock fell yesterday, assume it will also fall today; if
4128it rose yesterday, assume it will rise today}. (Feel free to replace this
4129algorithm with a smarter one.) If a stock changed in the same direction
4130on two consecutive days, this is an indication which should be highlighted.
4131Two-day advances are stored in @code{hot} and two-day declines in
4132@code{avoid}.
4133
4134The rest of the function is a sanity check. It counts the number of
4135correct predictions in relation to the total number of predictions
4136one could have made in the year before.
4137
@smallexample
@c file eg/network/stoxpred.awk
function Prediction() @{
  # Predict each ticker symbol by prolonging yesterday's trend
  for (stock = 1; stock <= StockCount; stock++) @{
    if (data[1, stock] > data[2, stock]) @{
      predict[stock] = "up"
    @} else if (data[1, stock] < data[2, stock]) @{
      predict[stock] = "down"
    @} else @{
      predict[stock] = "neutral"
    @}
    if ((data[1, stock] > data[2, stock]) && (data[2, stock] > data[3, stock]))
      hot[stock] = 1
    if ((data[1, stock] < data[2, stock]) && (data[2, stock] < data[3, stock]))
      avoid[stock] = 1
  @}
  # Do a plausibility check: how many predictions proved correct?
  for (s = 1; s <= StockCount; s++) @{
    for (d = 1; d <= datacount-2; d++) @{
      if (data[d+1, s] > data[d+2, s]) @{
        UpCount++
      @} else if (data[d+1, s] < data[d+2, s]) @{
        DownCount++
      @} else @{
        NeutralCount++
      @}
      if (((data[d, s] > data[d+1, s]) && (data[d+1, s] > data[d+2, s])) ||
          ((data[d, s] < data[d+1, s]) && (data[d+1, s] < data[d+2, s])) ||
          ((data[d, s] == data[d+1, s]) && (data[d+1, s] == data[d+2, s])))
        CorrectCount++
    @}
  @}
@}
@c endfile
@end smallexample
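
The rule and its plausibility check are easier to follow on a single short
series. The sketch below uses five made-up closing prices and a helper
function @code{trend}, which does not exist in @file{stoxpred.awk} and is
introduced here only to name the three possible outcomes:

@example
function trend(today, yesterday) @{
    if (today > yesterday) return "up"
    if (today < yesterday) return "down"
    return "neutral"
@}
BEGIN @{
    # data[1] is the most recent close, data[2] the day before, and so on
    n = split("24 23 22 22 25", data, " ")
    print "prediction:", trend(data[1], data[2])
    for (d = 1; d <= n - 2; d++) @{
        total++
        # the prediction made on day d+1 is yesterday's trend on that day;
        # it was correct if day d continued in the same direction
        if (trend(data[d], data[d+1]) == trend(data[d+1], data[d+2]))
            correct++
    @}
    printf "%d of %d past predictions correct\n", correct, total
@}
@end example

For this series the prediction is @samp{up}, and one of the three past
predictions would have been correct.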
4174
4175At this point the hard work has been done: the array @code{predict}
4176contains the predictions for all the ticker symbols. It is up to the
4177function @code{Report} to find some nice words to introduce the
4178desired information.
4179
@smallexample
@c file eg/network/stoxpred.awk
function Report() @{
  # Generate report
  report = "\nThis is your daily "
  report = report "stock market report for "strftime("%A, %B %d, %Y")".\n"
  report = report "Here are the predictions for today:\n\n"
  for (stock = 1; stock <= StockCount; stock++)
    report = report "\t" name[stock] "\t" predict[stock] "\n"
  for (stock in hot) @{
    if (HotCount++ == 0)
      report = report "\nThe most promising shares for today are these:\n\n"
    report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
      tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
  @}
  for (stock in avoid) @{
    if (AvoidCount++ == 0)
      report = report "\nThe stock shares to avoid today are these:\n\n"
    report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
      tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
  @}
  report = report "\nThis sums up to " HotCount+0 " winners and " AvoidCount+0
  report = report " losers. When using this kind\nof prediction scheme for"
  report = report " the 12 months which lie behind us,\nwe get " UpCount
  report = report " 'ups' and " DownCount " 'downs' and " NeutralCount
  report = report " 'neutrals'. Of all\nthese " UpCount+DownCount+NeutralCount
  report = report " predictions " CorrectCount " proved correct next day.\n"
  report = report "A success rate of "\
             int(100*CorrectCount/(UpCount+DownCount+NeutralCount)) "%.\n"
  report = report "Random choice would have produced a 33% success rate.\n"
  report = report "Disclaimer: Like every other prediction of the stock\n"
  report = report "market, this report is, of course, complete nonsense.\n"
  report = report "If you are stupid enough to believe these predictions\n"
  report = report "you should visit a doctor who can treat your ailment."
@}
@c endfile
@end smallexample
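
The success rate is a plain integer percentage. If no quotes could be
retrieved, all three counters are zero and the division in @code{Report}
fails. One possible guard is sketched below; the function name
@code{rate} and the sample numbers are made up for this example:

@example
function rate(correct, total) @{
    # integer percentage, or 0 when there were no predictions at all
    return (total > 0) ? int(100 * correct / total) : 0
@}
BEGIN @{
    print rate(41, 120) "%"
    print rate(0, 0) "%"
@}
@end example

The first call prints @samp{34%}, the second prints @samp{0%} instead of
stopping with a division-by-zero error.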
4217
4218The function @code{SendMail} goes through the list of customers and opens
4219a pipe to the @code{mail} command for each of them. Each one receives an
4220email message with a proper subject heading and is addressed with his full name.
4221
@smallexample
@c file eg/network/stoxpred.awk
function SendMail() @{
  # send report to customers
  customer["uncle.scrooge@@ducktown.gov"] = "Uncle Scrooge"
  customer["more@@utopia.org"           ] = "Sir Thomas More"
  customer["spinoza@@denhaag.nl"        ] = "Baruch de Spinoza"
  customer["marx@@highgate.uk"          ] = "Karl Marx"
  customer["keynes@@the.long.run"       ] = "John Maynard Keynes"
  customer["bierce@@devil.hell.org"     ] = "Ambrose Bierce"
  customer["laplace@@paris.fr"          ] = "Pierre Simon de Laplace"
  for (c in customer) @{
    # the blank after the closing quote separates subject and recipient
    MailPipe = "mail -s 'Daily Stock Prediction Newsletter' " c
    print "Good morning " customer[c] "," | MailPipe
    print report "\n.\n" | MailPipe
    close(MailPipe)
  @}
@}
@c endfile
@end smallexample
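
Before any mail is actually sent, it can be useful to check the command
lines that the loop builds. The following dry run is a sketch with a
single customer taken from the list above. It prints the command instead
of opening a pipe to it, and the variable @code{subject} is introduced
only for this example:

@example
BEGIN @{
    customer["uncle.scrooge@@ducktown.gov"] = "Uncle Scrooge"
    subject = "Daily Stock Prediction Newsletter"
    for (c in customer) @{
        # subject and recipient must be separate shell words
        MailPipe = "mail -s '" subject "' " c
        print MailPipe      # real use: print report | MailPipe
    @}
@}
@end example

The printed line should show the quoted subject, a blank, and then the
address. Without the blank, the shell would glue the address onto the
subject and @command{mail} would have no recipient.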
4242
4243Be patient when running the script by hand.
4244Retrieving the data for all the ticker symbols and sending the emails
4245may take several minutes to complete, depending upon network traffic
4246and the speed of the available Internet link.
4247The quality of the prediction algorithm is likely to be disappointing.
4248Try to find a better one.
4249Should you find one with a success rate of more than 50%, please tell
4250us about it! It is only for the sake of curiosity, of course. @code{:-)}
4251
4252@ignore
4253@c chuck says to remove this
4254Let us give you one final indication as to what one can expect from
4255a prediction of stock data, which is sometimes said to contain much
4256randomness. One theory says that all relevant information to be taken
4257into account when estimating the price of a stock is contained in the
4258stock quotes. Every bit of useful information has influenced the
4259fair price. Therefore (the theory says) temporary changes (i.e., fluctuations
4260within a minute) have to be purely random. But what is the cause of
4261short-term changes in stock prices?
4262
4263Stock prices are fixed when supply and demand meet each other.
4264What people are willing to pay reflects human expectations.
4265Human expectations are not necessarily random. On the Internet,
4266you can find an elucidating paper about predictability and human
4267expectations:
4268@uref{http://it.ucsd.edu/IT/Newsletter/archives/meir/05meir.html,
4269@cite{Reflections on ``Universal Prediction of Individual Sequences''}}
4270The authors (Feder, Merhav, Gutman) introduce the reader to the subject
4271by telling a thrilling anecdote.
4272@cindex Shannon, Claude
4273@quotation
4274In the early 50's, at Bell Laboratories, David Hagelbarger built a
4275simple ``mind reading'' machine, whose purpose was to play the ``penny
4276matching'' game. In this game, a player chooses head or tail, while a
4277``mind reading'' machine tries to predict and match his choice.
4278Surprisingly, as Robert Lucky tells in his book ``Silicon Dreams'',
4279Hagelbarger's simple, 8-state machine, was able to match the ``pennies''
4280of its human opponent 5,218 times over the course of 9,795 plays.
4281Random guessing would lead to such a high success rate with a probability
4282less than one out of 10 billion! Shannon, who was interested in prediction,
4283information, and thinking machines, closely followed Hagelbarger's
4284machine, and eventually built his own stripped-down version of the machine,
4285having the same states, but one that used a simpler strategy at each state.
4286As the legend goes, in a duel between the two machines, Shannon's machine
4287won by a slight margin! No one knows if this was due to a superior algorithm
4288or just a chance happening associated with the specific sequence at that game.
4289In any event, the success of both these machines against ``untrained'' human
4290opponents was explained by the fact that the human opponents cannot draw
4291completely random
4292bits.
4293@end quotation
4294@end ignore
4295
4296@node PROTBASE, , STOXPRED, Some Applications and Techniques
4297@section PROTBASE: Searching Through A Protein Database
4298@cindex PROTBASE
4299@cindex NCBI, National Center for Biotechnology Information
4300@cindex BLAST, Basic Local Alignment Search Tool
4301@cindex Hoare, C.A.R.
4302@quotation
4303@i{Hoare's Law of Large Problems: Inside every large problem is a small
4304 problem struggling to get out.}
4305@end quotation
4306
4307Yahoo's database of stock market data is just one among the many large
4308databases on the Internet. Another one is located at NCBI
4309(National Center for Biotechnology
4310Information). Established in 1988 as a national resource for molecular
4311biology information, NCBI creates public databases, conducts research
4312in computational biology, develops software tools for analyzing genome
4313data, and disseminates biomedical information. In this section, we
4314look at one of NCBI's public services, which is called BLAST
4315(Basic Local Alignment Search Tool).
4316
4317You probably know that the information necessary for reproducing living
4318cells is encoded in the genetic material of the cells. The genetic material
4319is a very long chain of four base nucleotides. It is the order of
4320appearance (the sequence) of nucleotides which contains the information
4321about the substance to be produced. Scientists in biotechnology often
4322find a specific fragment, determine the nucleotide sequence, and need
4323to know where the sequence at hand comes from. This is where the large
4324databases enter the game. At NCBI, databases store the knowledge
4325about which sequences have ever been found and where they have been found.
4326When the scientist sends his sequence to the BLAST service, the server
4327looks for regions of genetic material in its database which
4328look the most similar to the delivered nucleotide sequence. After a
4329search time of some seconds or minutes the server sends an answer to
4330the scientist. In order to make access simple, NCBI chose to offer
4331their database service through popular Internet protocols. There are
4332four basic ways to use the so-called BLAST services:
4333
4334@itemize @bullet
4335@item
4336The easiest way to use BLAST is through the web. Users may simply point
4337their browsers at the NCBI home page
4338and link to the BLAST pages.
4339NCBI provides a stable URL that may be used to perform BLAST searches
4340without interactive use of a web browser. This is what we will do later
4341in this section.
4342A demonstration client
4343and a @file{README} file demonstrate how to access this URL.
4344
4345@item
4346Currently,
4347@command{blastcl3} is the standard network BLAST client.
4348You can download @command{blastcl3} from the
4349anonymous FTP location.
4350
4351@item
4352BLAST 2.0 can be run locally as a full executable and can be used to run
4353BLAST searches against private local databases, or downloaded copies of the
4354NCBI databases. BLAST 2.0 executables may be found on the NCBI
4355anonymous FTP server.
4356
4357@item
4358The NCBI BLAST Email server is the best option for people without convenient
4359access to the web. A similarity search can be performed by sending a properly
4360formatted mail message containing the nucleotide or protein query sequence to
4361@email{blast@@ncbi.nlm.nih.gov}. The query sequence is compared against the
4362specified database using the BLAST algorithm and the results are returned in
4363an email message. For more information on formulating email BLAST searches,
4364you can send a message consisting of the word ``HELP'' to the same address,
4365@email{blast@@ncbi.nlm.nih.gov}.
4366@end itemize
4367
4368Our starting point is the demonstration client mentioned in the first option.
4369The @file{README} file that comes along with the client explains the whole
4370process in a nutshell. In the rest of this section, we first show
4371what such requests look like. Then we show how to use @command{gawk} to
4372implement a client in about 10 lines of code. Finally, we show how to
4373interpret the result returned from the service.
4374
4375Sequences are expected to be represented in the standard
4376IUB/IUPAC amino acid and nucleic acid codes,
4377with these exceptions: lower-case letters are accepted and are mapped
4378into upper-case; a single hyphen or dash can be used to represent a gap
4379of indeterminate length; and in amino acid sequences, @samp{U} and @samp{*}
4380are acceptable letters (see below). Before submitting a request, any numerical
4381digits in the query sequence should either be removed or replaced by
4382appropriate letter codes (e.g., @samp{N} for unknown nucleic acid residue
4383or @samp{X} for unknown amino acid residue).
4384The nucleic acid codes supported are:
4385
4386@example
4387A --> adenosine M --> A C (amino)
4388C --> cytidine S --> G C (strong)
4389G --> guanine W --> A T (weak)
4390T --> thymidine B --> G T C
4391U --> uridine D --> G A T
4392R --> G A (purine) H --> A C T
4393Y --> T C (pyrimidine) V --> G C A
4394K --> G T (keto) N --> A G C T (any)
4395 - gap of indeterminate length
4396@end example
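
The rules above lend themselves to a small check before a query is
submitted. The following is only a sketch: the function names
@code{clean_sequence} and @code{is_nucleotide} are made up for this
illustration and are not part of any NCBI tool, and dropping digits
(rather than replacing them with @samp{N}) is just one of the two
choices mentioned above.

@example
# Hypothetical helpers for preparing a nucleotide query sequence.

# Remove digits and map lower-case letters to upper-case.
function clean_sequence(seq) @{
    gsub(/[0-9]/, "", seq)
    return toupper(seq)
@}

# True if seq consists only of the nucleic acid codes listed above.
function is_nucleotide(seq) @{
    return seq ~ /^[ACGTURYKMSWBDHVN-]+$/
@}

BEGIN @{
    seq = clean_sequence("tgcttggc12tgagg")
    print seq, is_nucleotide(seq)
@}
@end example

Run with @samp{gawk -f}, this should print @samp{TGCTTGGCTGAGG 1}.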
4397
4398Now you know the alphabet of nucleotide sequences. The last two lines
4399of the following example query show you such a sequence, which is obviously
4400made up only of elements of the alphabet just described. Store this example
4401query into a file named @file{protbase.request}. You are now ready to send
4402it to the server with the demonstration client.
4403
4404@example
4405@c file eg/network/protbase.request
4406PROGRAM blastn
4407DATALIB month
4408EXPECT 0.75
4409BEGIN
4410>GAWK310 the gawking gene GNU AWK
4411tgcttggctgaggagccataggacgagagcttcctggtgaagtgtgtttcttgaaatcat
4412caccaccatggacagcaaa
4413@c endfile
4414@end example
4415
4416@cindex FASTA/Pearson format
4417The actual search request begins with the mandatory parameter @samp{PROGRAM}
4418in the first column followed by the value @samp{blastn} (the name of the
4419program) for searching nucleic acids. The next line contains the mandatory
4420search parameter @samp{DATALIB} with the value @samp{month} for the newest
4421nucleic acid sequences. The third line contains an optional @samp{EXPECT}
4422parameter and the value desired for it. The fourth line contains the
4423mandatory @samp{BEGIN} directive, followed by the query sequence in
4424FASTA/Pearson format.
4425Each line of information must be less than 80 characters in length.
4426
4427The ``month'' database contains all new or revised sequences released in the
4428last 30 days and is useful for searching against new sequences.
There are five different BLAST programs, @command{blastn} being the one that
4430compares a nucleotide query sequence against a nucleotide sequence database.
4431
4432The last server directive that must appear in every request is the
4433@samp{BEGIN} directive. The query sequence should immediately follow the
4434@samp{BEGIN} directive and must appear in FASTA/Pearson format.
4435A sequence in
4436FASTA/Pearson format begins with a single-line description.
4437The description line, which is required, is distinguished from the lines of
4438sequence data that follow it by having a greater-than (@samp{>}) symbol
4439in the first column. For the purposes of the BLAST server, the text of
4440the description is arbitrary.
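
Before a request is sent, it can be checked against the rules just
described. The following sketch does that in @command{gawk}. The
function name @code{check_request} is hypothetical, and the checks
cover only what this section states: mandatory @samp{PROGRAM},
@samp{DATALIB}, and @samp{BEGIN} lines, a description line right after
@samp{BEGIN}, and lines shorter than 80 characters. The server may
well apply further rules.

@example
# Return the number of problems found in a request, given as one
# string with newline-separated lines (hypothetical helper).
function check_request(text,    line, n, i, bad, prog, lib, started) @{
    n = split(text, line, "\n")
    for (i = 1; i <= n; i++) @{
        if (length(line[i]) >= 80)
            bad++                      # line too long
        if (started)
            continue                   # inside the query sequence
        if (line[i] ~ /^PROGRAM[ \t]/)
            prog = 1
        if (line[i] ~ /^DATALIB[ \t]/)
            lib = 1
        if (line[i] ~ /^BEGIN[ \t]*$/) @{
            started = 1
            if (line[i + 1] !~ /^>/)
                bad++                  # no description line after BEGIN
        @}
    @}
    return bad + !prog + !lib + !started
@}
@end example

The function can be called from an @code{END} rule after the input
lines have been collected into a single string. A return value of zero
means that none of these checks failed.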
4441
4442If you prefer to use a client written in @command{gawk}, just store the following
444310 lines of code into a file named @file{protbase.awk} and use this client
4444instead. Invoke it with @samp{gawk -f protbase.awk protbase.request}.
4445Then wait a minute and watch the result coming in. In order to replicate
4446the demonstration client's behaviour as closely as possible, this client
4447does not use a proxy server. We could also have extended the client program
4448in @ref{GETURL, ,Retrieving Web Pages}, to implement the client request from
4449@file{protbase.awk} as a special case.
4450
4451@smallexample
4452@c file eg/network/protbase.awk
4453@{ request = request "\n" $0 @}
4454
4455END @{
4456 BLASTService = "/inet/tcp/0/www.ncbi.nlm.nih.gov/80"
4457 printf "POST /cgi-bin/BLAST/nph-blast_report HTTP/1.0\n" |& BLASTService
4458 printf "Content-Length: " length(request) "\n\n" |& BLASTService
  printf "%s", request |& BLASTService
4460 while ((BLASTService |& getline) > 0)
4461 print $0
4462 close(BLASTService)
4463@}
4464@c endfile
4465@end smallexample
4466
4467The demonstration client from NCBI is 214 lines long (written in C) and
4468it is not immediately obvious what it does. Our client is so short that
4469it @emph{is} obvious what it does. First it loops over all lines of the
4470query and stores the whole query into a variable. Then the script
4471establishes an Internet connection to the NCBI server and transmits the
4472query by framing it with a proper HTTP request. Finally it receives
4473and prints the complete result coming from the server.
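
The reply printed by this client starts with the HTTP header lines,
which are separated from the actual report by an empty line. If only
the report is wanted, the header can be filtered out. The sketch below
is one way to do this; the name @code{past_header} and the file name
@file{skiphdr.awk} are invented for this illustration.

@example
# skiphdr.awk (hypothetical): print only the body of an HTTP reply

# Return 1 for every line after the first empty line, 0 before it.
function past_header(line) @{
    if (header_done)
        return 1
    if (line ~ /^\r?$/)        # header lines may end in CR LF
        header_done = 1
    return 0
@}

past_header($0) @{ print @}
@end example

It can be used as a filter, for example
@samp{gawk -f protbase.awk protbase.request | gawk -f skiphdr.awk}.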
4474
4475Now, let us look at the result. It begins with an HTTP header, which you
4476can ignore. Then there are some comments about the query having been
4477filtered to avoid spuriously high scores. After this, there is a reference
to the paper that describes the software being used for searching the
database. After a repetition of the original query's description we find the
4480list of significant alignments:
4481
4482@smallexample
4483@c file eg/network/protbase.result
4484Sequences producing significant alignments: (bits) Value
4485
4486gb|AC021182.14|AC021182 Homo sapiens chromosome 7 clone RP11-733... 38 0.20
4487gb|AC021056.12|AC021056 Homo sapiens chromosome 3 clone RP11-115... 38 0.20
4488emb|AL160278.10|AL160278 Homo sapiens chromosome 9 clone RP11-57... 38 0.20
4489emb|AL391139.11|AL391139 Homo sapiens chromosome X clone RP11-35... 38 0.20
4490emb|AL365192.6|AL365192 Homo sapiens chromosome 6 clone RP3-421H... 38 0.20
4491emb|AL138812.9|AL138812 Homo sapiens chromosome 11 clone RP1-276... 38 0.20
4492gb|AC073881.3|AC073881 Homo sapiens chromosome 15 clone CTD-2169... 38 0.20
4493@c endfile
4494@end smallexample
4495
This means that the query sequence was found in seven human chromosomes.
But the value 0.20 means that the probability of an accidental match
is rather high (20%) in all cases, which should be taken into account.
You may wonder what the first column means. It is a key to the specific
database in which this occurrence was found. The unique sequence identifiers
4501reported in the search results can be used as sequence retrieval keys
4502via the NCBI server. The syntax of sequence header lines used by the NCBI
4503BLAST server depends on the database from which each sequence was obtained.
4504The table below lists the identifiers for the databases from which the
4505sequences were derived.
4506
4507@ifinfo
4508@example
4509Database Name Identifier Syntax
4510============================ ========================
4511GenBank gb|accession|locus
4512EMBL Data Library emb|accession|locus
4513DDBJ, DNA Database of Japan dbj|accession|locus
4514NBRF PIR pir||entry
4515Protein Research Foundation prf||name
4516SWISS-PROT sp|accession|entry name
4517Brookhaven Protein Data Bank pdb|entry|chain
4518Kabat's Sequences of Immuno@dots{} gnl|kabat|identifier
4519Patents pat|country|number
4520GenInfo Backbone Id bbs|number
4521@end example
4522@end ifinfo
4523
4524@ifnotinfo
4525@multitable {Kabat's Sequences of Immuno@dots{}} {@code{@w{sp|accession|entry name}}}
4526@item GenBank @tab @code{gb|accession|locus}
4527@item EMBL Data Library @tab @code{emb|accession|locus}
4528@item DDBJ, DNA Database of Japan @tab @code{dbj|accession|locus}
4529@item NBRF PIR @tab @code{pir||entry}
4530@item Protein Research Foundation @tab @code{prf||name}
4531@item SWISS-PROT @tab @code{@w{sp|accession|entry name}}
4532@item Brookhaven Protein Data Bank @tab @code{pdb|entry|chain}
4533@item Kabat's Sequences of Immuno@dots{} @tab @code{gnl|kabat|identifier}
4534@item Patents @tab @code{pat|country|number}
4535@item GenInfo Backbone Id @tab @code{bbs|number}
4536@end multitable
4537@end ifnotinfo
4538
4539
4540For example, an identifier might be @samp{gb|AC021182.14|AC021182}, where the
4541@samp{gb} tag indicates that the identifier refers to a GenBank sequence,
4542@samp{AC021182.14} is its GenBank ACCESSION, and @samp{AC021182} is the GenBank LOCUS.
4543The identifier contains no spaces, so that a space indicates the end of the
4544identifier.
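
Because the identifier contains no spaces, it is always the first
field of a line in the list of significant alignments, and its parts
are separated by vertical bars. The following sketch uses this to pull
such a line apart. The function name @code{split_id} is made up for
this example, and the pattern only covers the @samp{gb}, @samp{emb},
and @samp{dbj} tags.

@example
# Split an identifier like gb|AC021182.14|AC021182 into part[1..n]
# and return n (hypothetical helper).
function split_id(id, part) @{
    return split(id, part, "|")
@}

# Lines of the summary list: identifier, description, score, expect value
/^(gb|emb|dbj)\|/ @{
    split_id($1, part)
    printf "%s %s score=%s expect=%s\n", part[1], part[2], $(NF-1), $NF
@}
@end example

For the first line of the list shown earlier, this prints
@samp{gb AC021182.14 score=38 expect=0.20}.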
4545
4546Let us continue in the result listing. Each of the seven alignments mentioned
4547above is subsequently described in detail. We will have a closer look at
4548the first of them.
4549
4550@smallexample
4551>gb|AC021182.14|AC021182 Homo sapiens chromosome 7 clone RP11-733N23, WORKING DRAFT SEQUENCE, 4
4552 unordered pieces
4553 Length = 176383
4554
4555 Score = 38.2 bits (19), Expect = 0.20
4556 Identities = 19/19 (100%)
4557 Strand = Plus / Plus
4558
4559Query: 35 tggtgaagtgtgtttcttg 53
4560 |||||||||||||||||||
4561Sbjct: 69786 tggtgaagtgtgtttcttg 69804
4562@end smallexample
4563
4564This alignment was located on the human chromosome 7. The fragment on which
4565part of the query was found had a total length of 176383. Only 19 of the
4566nucleotides matched and the matching sequence ran from character 35 to 53
4567in the query sequence and from 69786 to 69804 in the fragment on chromosome 7.
4568If you are still reading at this point, you are probably interested in finding
4569out more about Computational Biology and you might appreciate the following
4570hints.
4571
4572@cindex Computational Biology
4573@cindex Bioinformatics
4574@enumerate
4575@item
4576There is a book called @cite{Introduction to Computational Biology}
4577by Michael S. Waterman, which is worth reading if you are seriously
interested. You can find a good book review on the Internet.
4581
4582@item
4583While Waterman's book can explain to you the algorithms employed internally
in the database search engines, most practitioners prefer to approach
4585the subject differently. The applied side of Computational Biology is
4586called Bioinformatics, and emphasizes the tools available for day-to-day
4587work as well as how to actually @emph{use} them. One of the very few affordable
4588books on Bioinformatics is
4589@cite{Developing Bioinformatics Computer Skills}.
4590
4591@item
4592The sequences @emph{gawk} and @emph{gnuawk} are in widespread use in
4593the genetic material of virtually every earthly living being. Let us
4594take this as a clear indication that the divine creator has intended
4595@code{gawk} to prevail over other scripting languages such as @code{perl},
4596@code{tcl}, or @code{python} which are not even proper sequences. (:-)
4597@end enumerate
4598
4599@node Links, GNU Free Documentation License, Some Applications and Techniques, Top
4600@chapter Related Links
4601
This @value{CHAPTER} lists the URLs for various items discussed in this
@value{DOCUMENT}.
4603They are presented in the order in which they appear.
4604
4605@table @asis
4606
4607@item @cite{Internet Programming with Python}
4608@uref{http://www.fsbassociates.com/books/python.htm}
4609
4610@item @cite{Advanced Perl Programming}
4611@uref{http://www.oreilly.com/catalog/advperl}
4612
4613@item @cite{Web Client Programming with Perl}
4614@uref{http://www.oreilly.com/catalog/webclient}
4615
4616@item Richard Stevens's home page and book
4617@uref{http://www.kohala.com/~rstevens}
4618
4619@item The SPAK home page
4620@uref{http://www.userfriendly.net/linux/RPM/contrib/libc6/i386/spak-0.6b-1.i386.html}
4621
4622@item Volume III of @cite{Internetworking with TCP/IP}, by Comer and Stevens
4623@uref{http://www.cs.purdue.edu/homes/dec/tcpip3s.cont.html}
4624
4625@item XBM Graphics File Format
4626@uref{http://www.wotsit.org/download.asp?f=xbm}
4627
4628@item GNUPlot
4629@uref{http://www.cs.dartmouth.edu/gnuplot_info.html}
4630
4631@item Mark Humphrys' Eliza page
4632@uref{http://www.compapp.dcu.ie/~humphrys/eliza.html}
4633
4634@item Yahoo! Eliza Information
4635@uref{http://dir.yahoo.com/Recreation/Games/Computer_Games/Internet_Games/Web_Games/Artificial_Intelligence}
4636
4637@item Java versions of Eliza
4638@uref{http://www.tjhsst.edu/Psych/ch1/eliza.html}
4639
4640@item Java versions of Eliza with source code
4641@uref{http://home.adelphia.net/~lifeisgood/eliza/eliza.htm}
4642
4643@item Eliza Programs with Explanations
4644@uref{http://chayden.net/chayden/eliza/Eliza.shtml}
4645
4646@item Loebner Contest
4647@uref{http://acm.org/~loebner/loebner-prize.htmlx}
4648
@item Tcl/Tk Information
4650@uref{http://www.scriptics.com/}
4651
4652@item Intel 80x86 Processors
4653@uref{http://developer.intel.com/design/platform/embedpc/what_is.htm}
4654
4655@item AMD Elan Processors
4656@uref{http://www.amd.com/products/epd/processors/4.32bitcont/32bitcont/index.html}
4657
4658@item XINU
@uref{http://willow.canberra.edu.au/~chrisc/xinu.html}
4660
4661@item GNU/Linux
4662@uref{http://uclinux.lineo.com/}
4663
4664@item Embedded PCs
4665@uref{http://dir.yahoo.com/Business_and_Economy/Business_to_Business/Computers/Hardware/Embedded_Control/}
4666
4667@item MiniSQL
4668@uref{http://www.hughes.com.au/library/}
4669
4670@item Market Share Surveys
4671@uref{http://www.netcraft.com/survey}
4672
4673@item @cite{Numerical Recipes in C: The Art of Scientific Computing}
4674@uref{http://www.nr.com}
4675
4676@item VRML
4677@uref{http://www.vrml.org}
4678
4679@item The VRML FAQ
4680@uref{http://www.vrml.org/technicalinfo/specifications/specifications.htm#FAQ}
4681
4682@item The UMBC Agent Web
@uref{http://www.cs.umbc.edu/agents}
4684
4685@item Apache Web Server
4686@uref{http://www.apache.org}
4687
4688@item National Center for Biotechnology Information (NCBI)
4689@uref{http://www.ncbi.nlm.nih.gov}
4690
4691@item Basic Local Alignment Search Tool (BLAST)
4692@uref{http://www.ncbi.nlm.nih.gov/BLAST/blast_overview.html}
4693
4694@item NCBI Home Page
4695@uref{http://www.ncbi.nlm.nih.gov}
4696
4697@item BLAST Pages
4698@uref{http://www.ncbi.nlm.nih.gov/BLAST}
4699
4700@item BLAST Demonstration Client
4701@uref{ftp://ncbi.nlm.nih.gov/blast/blasturl/}
4702
4703@item BLAST anonymous FTP location
4704@uref{ftp://ncbi.nlm.nih.gov/blast/network/netblast/}
4705
4706@item BLAST 2.0 Executables
4707@uref{ftp://ncbi.nlm.nih.gov/blast/executables/}
4708
4709@item IUB/IUPAC Amino Acid and Nucleic Acid Codes
4710@uref{http://www.uthscsa.edu/geninfo/blastmail.html#item6}
4711
4712@item FASTA/Pearson Format
4713@uref{http://www.ncbi.nlm.nih.gov/BLAST/fasta.html}
4714
4715@item Fasta/Pearson Sequence in Java
4716@uref{http://www.kazusa.or.jp/java/codon_table_java/}
4717
4718@item Book Review of @cite{Introduction to Computational Biology}
4719@uref{http://www.acm.org/crossroads/xrds5-1/introcb.html}
4720
4721@item @cite{Developing Bioinformatics Computer Skills}
4722@uref{http://www.oreilly.com/catalog/bioskills/}
4723
4724@end table
4725
4726@node GNU Free Documentation License
4727@unnumbered GNU Free Documentation License
4728
4729@cindex FDL (Free Documentation License)
4730@cindex Free Documentation License (FDL)
4731@cindex GNU Free Documentation License
4732@center Version 1.2, November 2002
4733
4734@display
4735Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc.
473651 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA
4737
4738Everyone is permitted to copy and distribute verbatim copies
4739of this license document, but changing it is not allowed.
4740@end display
4741
4742@enumerate 0
4743@item
4744PREAMBLE
4745
4746The purpose of this License is to make a manual, textbook, or other
4747functional and useful document @dfn{free} in the sense of freedom: to
4748assure everyone the effective freedom to copy and redistribute it,
4749with or without modifying it, either commercially or noncommercially.
4750Secondarily, this License preserves for the author and publisher a way
4751to get credit for their work, while not being considered responsible
4752for modifications made by others.
4753
4754This License is a kind of ``copyleft'', which means that derivative
4755works of the document must themselves be free in the same sense. It
4756complements the GNU General Public License, which is a copyleft
4757license designed for free software.
4758
4759We have designed this License in order to use it for manuals for free
4760software, because free software needs free documentation: a free
4761program should come with manuals providing the same freedoms that the
4762software does. But this License is not limited to software manuals;
4763it can be used for any textual work, regardless of subject matter or
4764whether it is published as a printed book. We recommend this License
4765principally for works whose purpose is instruction or reference.
4766
4767@item
4768APPLICABILITY AND DEFINITIONS
4769
4770This License applies to any manual or other work, in any medium, that
4771contains a notice placed by the copyright holder saying it can be
4772distributed under the terms of this License. Such a notice grants a
4773world-wide, royalty-free license, unlimited in duration, to use that
4774work under the conditions stated herein. The ``Document'', below,
4775refers to any such manual or work. Any member of the public is a
4776licensee, and is addressed as ``you''. You accept the license if you
4777copy, modify or distribute the work in a way requiring permission
4778under copyright law.
4779
4780A ``Modified Version'' of the Document means any work containing the
4781Document or a portion of it, either copied verbatim, or with
4782modifications and/or translated into another language.
4783
4784A ``Secondary Section'' is a named appendix or a front-matter section
4785of the Document that deals exclusively with the relationship of the
4786publishers or authors of the Document to the Document's overall
4787subject (or to related matters) and contains nothing that could fall
4788directly within that overall subject. (Thus, if the Document is in
4789part a textbook of mathematics, a Secondary Section may not explain
4790any mathematics.) The relationship could be a matter of historical
4791connection with the subject or with related matters, or of legal,
4792commercial, philosophical, ethical or political position regarding
4793them.
4794
4795The ``Invariant Sections'' are certain Secondary Sections whose titles
4796are designated, as being those of Invariant Sections, in the notice
4797that says that the Document is released under this License. If a
4798section does not fit the above definition of Secondary then it is not
4799allowed to be designated as Invariant. The Document may contain zero
4800Invariant Sections. If the Document does not identify any Invariant
4801Sections then there are none.
4802
4803The ``Cover Texts'' are certain short passages of text that are listed,
4804as Front-Cover Texts or Back-Cover Texts, in the notice that says that
4805the Document is released under this License. A Front-Cover Text may
4806be at most 5 words, and a Back-Cover Text may be at most 25 words.
4807
4808A ``Transparent'' copy of the Document means a machine-readable copy,
4809represented in a format whose specification is available to the
4810general public, that is suitable for revising the document
4811straightforwardly with generic text editors or (for images composed of
4812pixels) generic paint programs or (for drawings) some widely available
4813drawing editor, and that is suitable for input to text formatters or
4814for automatic translation to a variety of formats suitable for input
4815to text formatters. A copy made in an otherwise Transparent file
4816format whose markup, or absence of markup, has been arranged to thwart
4817or discourage subsequent modification by readers is not Transparent.
4818An image format is not Transparent if used for any substantial amount
4819of text. A copy that is not ``Transparent'' is called ``Opaque''.
4820
4821Examples of suitable formats for Transparent copies include plain
4822@sc{ascii} without markup, Texinfo input format, La@TeX{} input
4823format, @acronym{SGML} or @acronym{XML} using a publicly available
4824@acronym{DTD}, and standard-conforming simple @acronym{HTML},
4825PostScript or @acronym{PDF} designed for human modification. Examples
4826of transparent image formats include @acronym{PNG}, @acronym{XCF} and
4827@acronym{JPG}. Opaque formats include proprietary formats that can be
4828read and edited only by proprietary word processors, @acronym{SGML} or
4829@acronym{XML} for which the @acronym{DTD} and/or processing tools are
4830not generally available, and the machine-generated @acronym{HTML},
4831PostScript or @acronym{PDF} produced by some word processors for
4832output purposes only.
4833
4834The ``Title Page'' means, for a printed book, the title page itself,
4835plus such following pages as are needed to hold, legibly, the material
4836this License requires to appear in the title page. For works in
4837formats which do not have any title page as such, ``Title Page'' means
4838the text near the most prominent appearance of the work's title,
4839preceding the beginning of the body of the text.
4840
4841A section ``Entitled XYZ'' means a named subunit of the Document whose
4842title either is precisely XYZ or contains XYZ in parentheses following
4843text that translates XYZ in another language. (Here XYZ stands for a
4844specific section name mentioned below, such as ``Acknowledgements'',
4845``Dedications'', ``Endorsements'', or ``History''.) To ``Preserve the Title''
4846of such a section when you modify the Document means that it remains a
4847section ``Entitled XYZ'' according to this definition.
4848
4849The Document may include Warranty Disclaimers next to the notice which
4850states that this License applies to the Document. These Warranty
4851Disclaimers are considered to be included by reference in this
4852License, but only as regards disclaiming warranties: any other
4853implication that these Warranty Disclaimers may have is void and has
4854no effect on the meaning of this License.
4855
4856@item
4857VERBATIM COPYING
4858
4859You may copy and distribute the Document in any medium, either
4860commercially or noncommercially, provided that this License, the
4861copyright notices, and the license notice saying this License applies
4862to the Document are reproduced in all copies, and that you add no other
4863conditions whatsoever to those of this License. You may not use
4864technical measures to obstruct or control the reading or further
4865copying of the copies you make or distribute. However, you may accept
4866compensation in exchange for copies. If you distribute a large enough
4867number of copies you must also follow the conditions in section 3.
4868
4869You may also lend copies, under the same conditions stated above, and
4870you may publicly display copies.
4871
4872@item
4873COPYING IN QUANTITY
4874
4875If you publish printed copies (or copies in media that commonly have
4876printed covers) of the Document, numbering more than 100, and the
4877Document's license notice requires Cover Texts, you must enclose the
4878copies in covers that carry, clearly and legibly, all these Cover
4879Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
4880the back cover. Both covers must also clearly and legibly identify
4881you as the publisher of these copies. The front cover must present
4882the full title with all words of the title equally prominent and
4883visible. You may add other material on the covers in addition.
4884Copying with changes limited to the covers, as long as they preserve
4885the title of the Document and satisfy these conditions, can be treated
4886as verbatim copying in other respects.
4887
4888If the required texts for either cover are too voluminous to fit
4889legibly, you should put the first ones listed (as many as fit
4890reasonably) on the actual cover, and continue the rest onto adjacent
4891pages.
4892
4893If you publish or distribute Opaque copies of the Document numbering
4894more than 100, you must either include a machine-readable Transparent
4895copy along with each Opaque copy, or state in or with each Opaque copy
4896a computer-network location from which the general network-using
4897public has access to download using public-standard network protocols
4898a complete Transparent copy of the Document, free of added material.
4899If you use the latter option, you must take reasonably prudent steps,
4900when you begin distribution of Opaque copies in quantity, to ensure
4901that this Transparent copy will remain thus accessible at the stated
4902location until at least one year after the last time you distribute an
4903Opaque copy (directly or through your agents or retailers) of that
4904edition to the public.
4905
4906It is requested, but not required, that you contact the authors of the
4907Document well before redistributing any large number of copies, to give
4908them a chance to provide you with an updated version of the Document.
4909
4910@item
4911MODIFICATIONS
4912
4913You may copy and distribute a Modified Version of the Document under
4914the conditions of sections 2 and 3 above, provided that you release
4915the Modified Version under precisely this License, with the Modified
4916Version filling the role of the Document, thus licensing distribution
4917and modification of the Modified Version to whoever possesses a copy
4918of it. In addition, you must do these things in the Modified Version:
4919
4920@enumerate A
4921@item
4922Use in the Title Page (and on the covers, if any) a title distinct
4923from that of the Document, and from those of previous versions
4924(which should, if there were any, be listed in the History section
4925of the Document). You may use the same title as a previous version
4926if the original publisher of that version gives permission.
4927
4928@item
4929List on the Title Page, as authors, one or more persons or entities
4930responsible for authorship of the modifications in the Modified
4931Version, together with at least five of the principal authors of the
4932Document (all of its principal authors, if it has fewer than five),
4933unless they release you from this requirement.
4934
4935@item
4936State on the Title page the name of the publisher of the
4937Modified Version, as the publisher.
4938
4939@item
4940Preserve all the copyright notices of the Document.
4941
4942@item
4943Add an appropriate copyright notice for your modifications
4944adjacent to the other copyright notices.
4945
4946@item
4947Include, immediately after the copyright notices, a license notice
4948giving the public permission to use the Modified Version under the
4949terms of this License, in the form shown in the Addendum below.
4950
4951@item
4952Preserve in that license notice the full lists of Invariant Sections
4953and required Cover Texts given in the Document's license notice.
4954
4955@item
4956Include an unaltered copy of this License.
4957
4958@item
4959Preserve the section Entitled ``History'', Preserve its Title, and add
4960to it an item stating at least the title, year, new authors, and
4961publisher of the Modified Version as given on the Title Page. If
4962there is no section Entitled ``History'' in the Document, create one
4963stating the title, year, authors, and publisher of the Document as
4964given on its Title Page, then add an item describing the Modified
4965Version as stated in the previous sentence.
4966
4967@item
4968Preserve the network location, if any, given in the Document for
4969public access to a Transparent copy of the Document, and likewise
4970the network locations given in the Document for previous versions
4971it was based on. These may be placed in the ``History'' section.
4972You may omit a network location for a work that was published at
4973least four years before the Document itself, or if the original
4974publisher of the version it refers to gives permission.
4975
4976@item
4977For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
4978the Title of the section, and preserve in the section all the
4979substance and tone of each of the contributor acknowledgements and/or
4980dedications given therein.
4981
4982@item
4983Preserve all the Invariant Sections of the Document,
4984unaltered in their text and in their titles. Section numbers
4985or the equivalent are not considered part of the section titles.
4986
4987@item
4988Delete any section Entitled ``Endorsements''. Such a section
4989may not be included in the Modified Version.
4990
4991@item
4992Do not retitle any existing section to be Entitled ``Endorsements'' or
4993to conflict in title with any Invariant Section.
4994
4995@item
4996Preserve any Warranty Disclaimers.
4997@end enumerate
4998
4999If the Modified Version includes new front-matter sections or
5000appendices that qualify as Secondary Sections and contain no material
5001copied from the Document, you may at your option designate some or all
5002of these sections as invariant. To do this, add their titles to the
5003list of Invariant Sections in the Modified Version's license notice.
5004These titles must be distinct from any other section titles.
5005
5006You may add a section Entitled ``Endorsements'', provided it contains
5007nothing but endorsements of your Modified Version by various
5008parties---for example, statements of peer review or that the text has
5009been approved by an organization as the authoritative definition of a
5010standard.
5011
5012You may add a passage of up to five words as a Front-Cover Text, and a
5013passage of up to 25 words as a Back-Cover Text, to the end of the list
5014of Cover Texts in the Modified Version. Only one passage of
5015Front-Cover Text and one of Back-Cover Text may be added by (or
5016through arrangements made by) any one entity. If the Document already
5017includes a cover text for the same cover, previously added by you or
5018by arrangement made by the same entity you are acting on behalf of,
5019you may not add another; but you may replace the old one, on explicit
5020permission from the previous publisher that added the old one.
5021
5022The author(s) and publisher(s) of the Document do not by this License
5023give permission to use their names for publicity for or to assert or
5024imply endorsement of any Modified Version.
5025
5026@item
5027COMBINING DOCUMENTS
5028
5029You may combine the Document with other documents released under this
5030License, under the terms defined in section 4 above for modified
5031versions, provided that you include in the combination all of the
5032Invariant Sections of all of the original documents, unmodified, and
5033list them all as Invariant Sections of your combined work in its
5034license notice, and that you preserve all their Warranty Disclaimers.
5035
5036The combined work need only contain one copy of this License, and
5037multiple identical Invariant Sections may be replaced with a single
5038copy. If there are multiple Invariant Sections with the same name but
5039different contents, make the title of each such section unique by
5040adding at the end of it, in parentheses, the name of the original
5041author or publisher of that section if known, or else a unique number.
5042Make the same adjustment to the section titles in the list of
5043Invariant Sections in the license notice of the combined work.
5044
5045In the combination, you must combine any sections Entitled ``History''
5046in the various original documents, forming one section Entitled
5047``History''; likewise combine any sections Entitled ``Acknowledgements'',
5048and any sections Entitled ``Dedications''. You must delete all
5049sections Entitled ``Endorsements.''
5050
5051@item
COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in
the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute
it individually under this License, provided you insert a copy of this
License into the extracted document, and follow this License in all
other respects regarding verbatim copying of that document.

@item
AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, is called an ``aggregate'' if the copyright
resulting from the compilation is not used to limit the legal rights
of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half of
the entire aggregate, the Document's Cover Texts may be placed on
covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic form.
Otherwise they must appear on printed covers that bracket the whole
aggregate.

@item
TRANSLATION

Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections. You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include
the original English version of this License and the original versions
of those notices and disclaimers. In case of a disagreement between
the translation and the original version of this License or a notice
or disclaimer, the original version will prevail.

If a section in the Document is Entitled ``Acknowledgements'',
``Dedications'', or ``History'', the requirement (section 4) to Preserve
its Title (section 1) will typically require changing the actual
title.

@item
TERMINATION

You may not copy, modify, sublicense, or distribute the Document except
as expressly provided for under this License. Any other attempt to
copy, modify, sublicense or distribute the Document is void, and will
automatically terminate your rights under this License. However,
parties who have received copies, or rights, from you under this
License will not have their licenses terminated so long as such
parties remain in full compliance.

@item
FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions
of the GNU Free Documentation License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns. See
@uref{http://www.gnu.org/copyleft/}.

Each version of the License is given a distinguishing version number.
If the Document specifies that a particular numbered version of this
License ``or any later version'' applies to it, you have the option of
following the terms and conditions either of that specified version or
of any later version that has been published (not as a draft) by the
Free Software Foundation. If the Document does not specify a version
number of this License, you may choose any version ever published (not
as a draft) by the Free Software Foundation.
@end enumerate

@c fakenode --- for prepinfo
@unnumberedsec ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:

@smallexample
@group
 Copyright (C) @var{year} @var{your name}.
 Permission is granted to copy, distribute and/or modify this document
 under the terms of the GNU Free Documentation License, Version 1.2
 or any later version published by the Free Software Foundation;
 with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
 A copy of the license is included in the section entitled ``GNU
 Free Documentation License''.
@end group
@end smallexample

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
replace the ``with...Texts.'' line with this:

@smallexample
@group
 with the Invariant Sections being @var{list their titles}, with
 the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
 being @var{list}.
@end group
@end smallexample

If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.

If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License,
to permit their use in free software.

@c Local Variables:
@c ispell-local-pdict: "ispell-dict"
@c End:


@node Index, , GNU Free Documentation License, Top
@comment node-name, next, previous, up

@unnumbered Index
@printindex cp
@bye

Conventions:
1. Functions, built-in or otherwise, do NOT have () after them.
2. Gawk built-in vars and functions are in @code. Also program vars and
 functions.
3. HTTP method names are in @code.
4. Protocols such as echo, ftp, etc are in @samp.
5. URLs are in @url.
6. All RFC's in the index. Put a space between `RFC' and the number.