1This is gawkinet.info, produced by makeinfo version 4.6 from
2gawkinet.texi.
3
4INFO-DIR-SECTION Network applications
5START-INFO-DIR-ENTRY
6* Gawkinet: (gawkinet). TCP/IP Internetworking With `gawk'.
7END-INFO-DIR-ENTRY
8
9This is Edition 1.1 of `TCP/IP Internetworking With `gawk'', for the
103.1.4 (or later) version of the GNU implementation of AWK.
11
12
13 Copyright (C) 2000, 2001, 2002, 2004 Free Software Foundation, Inc.
14
15
16 Permission is granted to copy, distribute and/or modify this document
17under the terms of the GNU Free Documentation License, Version 1.2 or
18any later version published by the Free Software Foundation; with the
19Invariant Sections being "GNU General Public License", the Front-Cover
20texts being (a) (see below), and with the Back-Cover Texts being (b)
21(see below). A copy of the license is included in the section entitled
22"GNU Free Documentation License".
23
24 a. "A GNU Manual"
25
26 b. "You have freedom to copy and modify this GNU Manual, like GNU
27 software. Copies published by the Free Software Foundation raise
28 funds for GNU development."
29
30 This file documents the networking features in GNU `awk'.
31
53
54File: gawkinet.info, Node: Top, Next: Preface, Prev: (dir), Up: (dir)
55
General Introduction
********************
58
59This file documents the networking features in GNU Awk (`gawk') version
603.1 and later.
61
83* Menu:
84
85* Preface:: About this document.
* Introduction::                   About networking.
87* Using Networking:: Some examples.
88* Some Applications and Techniques:: More extended examples.
89* Links:: Where to find the stuff mentioned in this
90 document.
91* GNU Free Documentation License:: The license for this document.
92* Index:: The index.
93
94* Stream Communications:: Sending data streams.
95* Datagram Communications:: Sending self-contained messages.
96* The TCP/IP Protocols:: How these models work in the Internet.
97* Basic Protocols:: The basic protocols.
98* Ports:: The idea behind ports.
99* Making Connections:: Making TCP/IP connections.
100* Gawk Special Files:: How to do `gawk' networking.
101* Special File Fields:: The fields in the special file name.
102* Comparing Protocols:: Differences between the protocols.
103* File /inet/tcp:: The TCP special file.
104* File /inet/udp:: The UDP special file.
105* File /inet/raw:: The RAW special file.
106* TCP Connecting:: Making a TCP connection.
107* Troubleshooting:: Troubleshooting TCP/IP connections.
108* Interacting:: Interacting with a service.
109* Setting Up:: Setting up a service.
110* Email:: Reading email.
111* Web page:: Reading a Web page.
112* Primitive Service:: A primitive Web service.
113* Interacting Service:: A Web service with interaction.
114* CGI Lib:: A simple CGI library.
115* Simple Server:: A simple Web server.
116* Caveats:: Network programming caveats.
117* Challenges:: Where to go from here.
118* PANIC:: An Emergency Web Server.
119* GETURL:: Retrieving Web Pages.
120* REMCONF:: Remote Configuration Of Embedded Systems.
121* URLCHK:: Look For Changed Web Pages.
122* WEBGRAB:: Extract Links From A Page.
123* STATIST:: Graphing A Statistical Distribution.
124* MAZE:: Walking Through A Maze In Virtual Reality.
125* MOBAGWHO:: A Simple Mobile Agent.
126* STOXPRED:: Stock Market Prediction As A Service.
127* PROTBASE:: Searching Through A Protein Database.
128
129
130File: gawkinet.info, Node: Preface, Next: Introduction, Prev: Top, Up: Top
131
Preface
*******

In May of 1997, Jürgen Kahrs felt the need for network access from
`awk', and, with a little help from me, set about adding features to do
this for `gawk'.  At that time, he wrote the bulk of this Info file.
138
139 The code and documentation were added to the `gawk' 3.1 development
140tree, and languished somewhat until I could finally get down to some
141serious work on that version of `gawk'. This finally happened in the
142middle of 2000.
143
   Meantime, Jürgen wrote an article about the Internet special files
and `|&' operator for `Linux Journal', and made a networking patch for
the production versions of `gawk' available from his home page.  In
August of 2000 (for `gawk' 3.0.6), this patch also made it to the main
GNU `ftp' distribution site.
149
   For release with `gawk', I edited Jürgen's prose for English
grammar and style, as he is not a native English speaker.  I also
rearranged the material somewhat for what I felt was a better order of
presentation, and (re)wrote some of the introductory material.
154
155 The majority of this document and the code are his work, and the
156high quality and interesting ideas speak for themselves. It is my hope
157that these features will be of significant value to the `awk' community.
158
159
160Arnold Robbins
161Nof Ayalon, ISRAEL
162March, 2001
163
164
165File: gawkinet.info, Node: Introduction, Next: Using Networking, Prev: Preface, Up: Top
166
1 Networking Concepts
*********************

This major node provides a (necessarily) brief introduction to computer
networking concepts.  For many applications of `gawk' to TCP/IP
networking, we hope that this is enough.  For more advanced tasks, you
will need deeper background, and it may be necessary to switch to
lower-level programming in C or C++.
175
176 There are two real-life models for the way computers send messages
177to each other over a network. While the analogies are not perfect,
178they are close enough to convey the major concepts. These two models
179are the phone system (reliable byte-stream communications), and the
180postal system (best-effort datagrams).
181
182* Menu:
183
184* Stream Communications:: Sending data streams.
185* Datagram Communications:: Sending self-contained messages.
186* The TCP/IP Protocols:: How these models work in the Internet.
187* Making Connections:: Making TCP/IP connections.
188
189
190File: gawkinet.info, Node: Stream Communications, Next: Datagram Communications, Prev: Introduction, Up: Introduction
191
1.1 Reliable Byte-streams (Phone Calls)
=======================================
194
195When you make a phone call, the following steps occur:
196
197 1. You dial a number.
198
199 2. The phone system connects to the called party, telling them there
200 is an incoming call. (Their phone rings.)
201
202 3. The other party answers the call, or, in the case of a computer
203 network, refuses to answer the call.
204
205 4. Assuming the other party answers, the connection between you is
206 now a "duplex" (two-way), "reliable" (no data lost), sequenced
207 (data comes out in the order sent) data stream.
208
209 5. You and your friend may now talk freely, with the phone system
210 moving the data (your voices) from one end to the other. From
211 your point of view, you have a direct end-to-end connection with
212 the person on the other end.
213
214 The same steps occur in a duplex reliable computer networking
215connection. There is considerably more overhead in setting up the
216communications, but once it's done, data moves in both directions,
217reliably, in sequence.
218
219
220File: gawkinet.info, Node: Datagram Communications, Next: The TCP/IP Protocols, Prev: Stream Communications, Up: Introduction
221
1.2 Best-effort Datagrams (Mailed Letters)
==========================================
224
225Suppose you mail three different documents to your office on the other
226side of the country on two different days. Doing so entails the
227following.
228
229 1. Each document travels in its own envelope.
230
231 2. Each envelope contains both the sender and the recipient address.
232
233 3. Each envelope may travel a different route to its destination.
234
235 4. The envelopes may arrive in a different order from the one in
236 which they were sent.
237
238 5. One or more may get lost in the mail. (Although, fortunately,
239 this does not occur very often.)
240
241 6. In a computer network, one or more "packets" may also arrive
242 multiple times. (This doesn't happen with the postal system!)
243
244
   The important characteristics of datagram communications, like those
of the postal system, are thus:
247
248 * Delivery is "best effort;" the data may never get there.
249
250 * Each message is self-contained, including the source and
251 destination addresses.
252
253 * Delivery is _not_ sequenced; packets may arrive out of order,
254 and/or multiple times.
255
256 * Unlike the phone system, overhead is considerably lower. It is
257 not necessary to set up the call first.
258
259 The price the user pays for the lower overhead of datagram
260communications is exactly the lower reliability; it is often necessary
261for user-level protocols that use datagram communications to add their
262own reliability features on top of the basic communications.
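
   As an illustration of such a user-level reliability feature, the
following sketch releases numbered messages in order and drops
duplicates.  It assumes a made-up message format, `SEQ PAYLOAD', with
sequence numbers starting at 1.  The function name `accept' and the
variables `Expected' and `Pending' are hypothetical; no network access
is involved, and the arrivals are simulated in the `BEGIN' rule.

```awk
# Sketch only: deliver numbered datagrams in order, exactly once.
# Each incoming message is assumed to look like "SEQ PAYLOAD".
# accept() returns the payloads that became deliverable, one per line.
function accept(msg,    seq, out) {
    seq = msg + 0                       # leading digits are the sequence
    if (seq < Expected || (seq in Pending))
        return ""                       # duplicate: already seen
    Pending[seq] = substr(msg, index(msg, " ") + 1)
    out = ""
    while (Expected in Pending) {       # release any in-order run
        out = out Pending[Expected] "\n"
        delete Pending[Expected]
        Expected++
    }
    return out
}

BEGIN {
    Expected = 1
    # Simulated arrival order: 2 arrives early, 1 arrives twice.
    printf "%s", accept("2 world")
    printf "%s", accept("1 hello")
    printf "%s", accept("1 hello")
    printf "%s", accept("3 again")
}
```

   The program prints `hello', `world', and `again', each once.  A
real protocol would also need acknowledgments and retransmission to
recover lost messages; this sketch covers only ordering and duplicates.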
263
264
265File: gawkinet.info, Node: The TCP/IP Protocols, Next: Making Connections, Prev: Datagram Communications, Up: Introduction
266
1.3 The Internet Protocols
==========================
269
270The Internet Protocol Suite (usually referred to as just TCP/IP)(1)
271consists of a number of different protocols at different levels or
272"layers." For our purposes, three protocols provide the fundamental
273communications mechanisms. All other defined protocols are referred to
274as user-level protocols (e.g., HTTP, used later in this Info file).
275
276* Menu:
277
278* Basic Protocols:: The basic protocols.
279* Ports:: The idea behind ports.
280
281 ---------- Footnotes ----------
282
283 (1) It should be noted that although the Internet seems to have
284conquered the world, there are other networking protocol suites in
285existence and in use.
286
287
288File: gawkinet.info, Node: Basic Protocols, Next: Ports, Prev: The TCP/IP Protocols, Up: The TCP/IP Protocols
289
1.3.1 The Basic Internet Protocols
----------------------------------
292
293IP
294 The Internet Protocol. This protocol is almost never used
295 directly by applications. It provides the basic packet delivery
296 and routing infrastructure of the Internet. Much like the phone
297 company's switching centers or the Post Office's trucks, it is not
298 of much day-to-day interest to the regular user (or programmer).
299 It happens to be a best effort datagram protocol.
300
301UDP
302 The User Datagram Protocol. This is a best effort datagram
303 protocol. It provides a small amount of extra reliability over
304 IP, and adds the notion of "ports", described in *Note TCP and UDP
305 Ports: Ports.
306
307TCP
308 The Transmission Control Protocol. This is a duplex, reliable,
309 sequenced byte-stream protocol, again layered on top of IP, and
310 also providing the notion of ports. This is the protocol that you
311 will most likely use when using `gawk' for network programming.
312
313 All other user-level protocols use either TCP or UDP to do their
314basic communications. Examples are SMTP (Simple Mail Transfer
315Protocol), FTP (File Transfer Protocol), and HTTP (HyperText Transfer
316Protocol).
317
318
319File: gawkinet.info, Node: Ports, Prev: Basic Protocols, Up: The TCP/IP Protocols
320
1.3.2 TCP and UDP Ports
-----------------------

In the postal system, the address on an envelope indicates a physical
location, such as a residence or office building.  But there may be
more than one person at a location; thus you have to further qualify
the recipient by putting a person or company name on the envelope.
328
329 In the phone system, one phone number may represent an entire
330company, in which case you need a person's extension number in order to
331reach that individual directly. Or, when you call a home, you have to
332say, "May I please speak to ..." before talking to the person directly.
333
334 IP networking provides the concept of addressing. An IP address
335represents a particular computer, but no more. In order to reach the
336mail service on a system, or the FTP or WWW service on a system, you
337must have some way to further specify which service you want. In the
338Internet Protocol suite, this is done with "port numbers", which
339represent the services, much like an extension number used with a phone
340number.
341
342 Port numbers are 16-bit integers. Unix and Unix-like systems
343reserve ports below 1024 for "well known" services, such as SMTP, FTP,
344and HTTP. Numbers 1024 and above may be used by any application,
345although there is no promise made that a particular port number is
346always available.
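
   The ranges just described can be written down as a small check.
This is a sketch only; the function name `port_class' and the labels it
returns are made up for illustration.  Port 0 is treated separately
because it is not usable as a service port.

```awk
# Sketch: classify a port number according to the ranges above.
function port_class(port) {
    if (port !~ /^[0-9]+$/ || port + 0 > 65535)
        return "invalid"                # not a 16-bit unsigned integer
    if (port + 0 == 0)
        return "unspecified"            # 0 does not name a service
    if (port + 0 < 1024)
        return "well-known"             # reserved on Unix-like systems
    return "user"                       # usable by any application
}

BEGIN {
    print 25, port_class(25)
    print 8888, port_class(8888)
    print 70000, port_class(70000)
}
```

   The three calls report `well-known', `user', and `invalid',
respectively.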
347
348
349File: gawkinet.info, Node: Making Connections, Prev: The TCP/IP Protocols, Up: Introduction
350
1.4 Making TCP/IP Connections (And Some Terminology)
====================================================
353
354Two terms come up repeatedly when discussing networking: "client" and
355"server". For now, we'll discuss these terms at the "connection
356level", when first establishing connections between two processes on
357different systems over a network. (Once the connection is established,
358the higher level, or "application level" protocols, such as HTTP or
359FTP, determine who is the client and who is the server. Often, it
360turns out that the client and server are the same in both roles.)
361
362 The "server" is the system providing the service, such as the web
363server or email server. It is the "host" (system) which is _connected
364to_ in a transaction. For this to work though, the server must be
365expecting connections. Much as there has to be someone at the office
366building to answer the phone(1), the server process (usually) has to be
367started first and be waiting for a connection.
368
369 The "client" is the system requesting the service. It is the system
370_initiating the connection_ in a transaction. (Just as when you pick
371up the phone to call an office or store.)
372
   In the TCP/IP framework, each end of a connection is represented by
an (ADDRESS, PORT) pair.  For the duration of the connection, the ports
in use at each end are unique, and cannot be used simultaneously by
other processes on the same system.  (Only after closing a connection
can a new one be built up on the same port.  This is contrary to the
usual behavior of fully developed web servers which have to avoid
situations in which they are not reachable.  We have to pay this price
in order to enjoy the benefits of a simple communication paradigm in
`gawk'.)
382
383 Furthermore, once the connection is established, communications are
384"synchronous".(2) I.e., each end waits on the other to finish
385transmitting, before replying. This is much like two people in a phone
386conversation. While both could talk simultaneously, doing so usually
387doesn't work too well.
388
389 In the case of TCP, the synchronicity is enforced by the protocol
390when sending data. Data writes "block" until the data have been
391received on the other end. For both TCP and UDP, data reads block
392until there is incoming data waiting to be read. This is summarized in
393the following table, where an "X" indicates that the given action
394blocks.
395
396TCP X X
397UDP X
398RAW X
399
400 ---------- Footnotes ----------
401
402 (1) In the days before voice mail systems!
403
404 (2) For the technically savvy, data reads block--if there's no
405incoming data, the program is made to wait until there is, instead of
406receiving a "there's no data" error return.
407
408
409File: gawkinet.info, Node: Using Networking, Next: Some Applications and Techniques, Prev: Introduction, Up: Top
410
2 Networking With `gawk'
************************
413
414The `awk' programming language was originally developed as a
415pattern-matching language for writing short programs to perform data
416manipulation tasks. `awk''s strength is the manipulation of textual
417data that is stored in files. It was never meant to be used for
418networking purposes. To exploit its features in a networking context,
419it's necessary to use an access mode for network connections that
420resembles the access of files as closely as possible.
421
422 `awk' is also meant to be a prototyping language. It is used to
423demonstrate feasibility and to play with features and user interfaces.
424This can be done with file-like handling of network connections.
425`gawk' trades the lack of many of the advanced features of the TCP/IP
426family of protocols for the convenience of simple connection handling.
427The advanced features are available when programming in C or Perl. In
428fact, the network programming in this major node is very similar to
429what is described in books such as `Internet Programming with Python',
430`Advanced Perl Programming', or `Web Client Programming with Perl'.
431
432 However, you can do the programming here without first having to
433learn object-oriented ideology; underlying languages such as Tcl/Tk,
434Perl, Python; or all of the libraries necessary to extend these
435languages before they are ready for the Internet.
436
   This major node demonstrates how to use the TCP protocol.  The other
protocols are much less important for most users (UDP) or even
intractable (RAW).
440
441* Menu:
442
443* Gawk Special Files:: How to do `gawk' networking.
444* TCP Connecting:: Making a TCP connection.
445* Troubleshooting:: Troubleshooting TCP/IP connections.
446* Interacting:: Interacting with a service.
447* Setting Up:: Setting up a service.
448* Email:: Reading email.
449* Web page:: Reading a Web page.
450* Primitive Service:: A primitive Web service.
451* Interacting Service:: A Web service with interaction.
452* Simple Server:: A simple Web server.
453* Caveats:: Network programming caveats.
454* Challenges:: Where to go from here.
455
456
457File: gawkinet.info, Node: Gawk Special Files, Next: TCP Connecting, Prev: Using Networking, Up: Using Networking
458
2.1 `gawk''s Networking Mechanisms
==================================
461
462The `|&' operator introduced in `gawk' 3.1 for use in communicating
463with a "coprocess" is described in *Note Two-way Communications With
464Another Process: (gawk)Two-way I/O. It shows how to do two-way I/O to a
465separate process, sending it data with `print' or `printf' and reading
466data with `getline'. If you haven't read it already, you should detour
467there to do so.
468
469 `gawk' transparently extends the two-way I/O mechanism to simple
470networking through the use of special file names. When a "coprocess"
471that matches the special files we are about to describe is started,
472`gawk' creates the appropriate network connection, and then two-way I/O
473proceeds as usual.
474
475 At the C, C++, and Perl level, networking is accomplished via
476"sockets", an Application Programming Interface (API) originally
477developed at the University of California at Berkeley that is now used
478almost universally for TCP/IP networking. Socket level programming,
479while fairly straightforward, requires paying attention to a number of
480details, as well as using binary data. It is not well-suited for use
481from a high-level language like `awk'. The special files provided in
482`gawk' hide the details from the programmer, making things much simpler
483and easier to use.
484
485 The special file name for network access is made up of several
486fields, all of which are mandatory:
487
488 /inet/PROTOCOL/LOCALPORT/HOSTNAME/REMOTEPORT
489
490The `/inet/' field is, of course, constant when accessing the network.
491The LOCALPORT and REMOTEPORT fields do not have a meaning when used
492with `/inet/raw' because "ports" only apply to TCP and UDP. So, when
493using `/inet/raw', the port fields always have to be `0'.
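
   Because the special file name is an ordinary string, it can be
assembled from its parts.  The following sketch shows the layout; the
helper `inet_name' is hypothetical and not part of `gawk', and the host
and port values are examples only.

```awk
# Sketch: assemble a special file name from its four variable fields.
# inet_name() is an illustrative helper, not a gawk built-in.
function inet_name(protocol, localport, hostname, remoteport) {
    return "/inet/" protocol "/" localport "/" hostname "/" remoteport
}

BEGIN {
    print inet_name("tcp", 0, "localhost", 8888)
    print inet_name("tcp", 8888, 0, 0)
    print inet_name("raw", 0, "localhost", 0)   # both port fields are 0
}
```

   This prints `/inet/tcp/0/localhost/8888', `/inet/tcp/8888/0/0', and
`/inet/raw/0/localhost/0'.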
494
495* Menu:
496
497* Special File Fields:: The fields in the special file name.
498* Comparing Protocols:: Differences between the protocols.
499
500
501File: gawkinet.info, Node: Special File Fields, Next: Comparing Protocols, Prev: Gawk Special Files, Up: Gawk Special Files
502
2.1.1 The Fields of the Special File Name
-----------------------------------------
505
506This node explains the meaning of all the other fields, as well as the
507range of values and the defaults. All of the fields are mandatory. To
508let the system pick a value, or if the field doesn't apply to the
509protocol, specify it as `0':
510
511PROTOCOL
512 Determines which member of the TCP/IP family of protocols is
513 selected to transport the data across the network. There are three
514 possible values (always written in lowercase): `tcp', `udp', and
515 `raw'. The exact meaning of each is explained later in this node.
516
517LOCALPORT
518 Determines which port on the local machine is used to communicate
519 across the network. It has no meaning with `/inet/raw' and must
520 therefore be `0'. Application-level clients usually use `0' to
521 indicate they do not care which local port is used--instead they
522 specify a remote port to connect to. It is vital for
523 application-level servers to use a number different from `0' here
524 because their service has to be available at a specific publicly
525 known port number. It is possible to use a name from
526 `/etc/services' here.
527
528HOSTNAME
529 Determines which remote host is to be at the other end of the
530 connection. Application-level servers must fill this field with a
531 `0' to indicate their being open for all other hosts to connect to
532 them and enforce connection level server behavior this way. It is
533 not possible for an application-level server to restrict its
534 availability to one remote host by entering a host name here.
535 Application-level clients must enter a name different from `0'.
536 The name can be either symbolic (e.g., `jpl-devvax.jpl.nasa.gov')
537 or numeric (e.g., `128.149.1.143').
538
REMOTEPORT
     Determines which port on the remote machine is used to communicate
     across the network.  It has no meaning with `/inet/raw' and must
     therefore be `0'.  For `/inet/tcp' and `/inet/udp',
     application-level clients _must_ use a number other than `0' to
     indicate to which port on the remote machine they want to connect.
     Application-level servers must fill this field with a `0'.
     Instead of a remote port, they specify a local port to which
     clients connect.  It is possible to use a name from
     `/etc/services' here.
548
549Experts in network programming will notice that the usual client/server
550asymmetry found at the level of the socket API is not visible here.
551This is for the sake of simplicity of the high-level concept. If this
552asymmetry is necessary for your application, use another language. For
553`gawk', it is more important to enable users to write a client program
554with a minimum of code. What happens when first accessing a network
555connection is seen in the following pseudocode:
556
     if ((name of remote host given) && (other side accepts connection)) {
       rendez-vous successful; transmit with getline or print
     } else {
       if ((other side did not accept) && (localport == 0))
         exit unsuccessful
       if (TCP) {
         set up a server accepting connections
         this means waiting for the client on the other side to connect
       } else
         ready
     }
568
569The exact behavior of this algorithm depends on the values of the
570fields of the special file name. When in doubt, *Note
571table-inet-components:: gives you the combinations of values and their
572meaning. If this table is too complicated, focus on the three lines
573printed in *bold*. All the examples in *Note Networking With `gawk':
574Using Networking, use only the patterns printed in bold letters.
575
PROTOCOL    LOCAL PORT  HOST NAME  REMOTE  RESULTING CONNECTION-LEVEL
                                   PORT    BEHAVIOR
------------------------------------------------------------------------------
*tcp*       *0*         *x*        *x*     *Dedicated client, fails if
                                           immediately connecting to a
                                           server on the
                                           other side fails*
------------------------------------------------------------------------------
udp         0           x          x       Dedicated client
------------------------------------------------------------------------------
raw         0           x          0       Dedicated client, works only
                                           as `root'
------------------------------------------------------------------------------
*tcp, udp*  *x*         *x*        *x*     *Client, switches to
                                           dedicated server if
                                           necessary*
------------------------------------------------------------------------------
*tcp, udp*  *x*         *0*        *0*     *Dedicated server*
------------------------------------------------------------------------------
raw         0           0          0       Dedicated server, works only
                                           as `root'
------------------------------------------------------------------------------
tcp, udp,   x           x          0       Invalid
raw
------------------------------------------------------------------------------
tcp, udp,   0           0          x       Invalid
raw
------------------------------------------------------------------------------
tcp, udp,   x           0          x       Invalid
raw
------------------------------------------------------------------------------
tcp, udp    0           0          0       Invalid
------------------------------------------------------------------------------
tcp, udp    0           x          0       Invalid
------------------------------------------------------------------------------
raw         x           0          0       Invalid
------------------------------------------------------------------------------
raw         0           x          x       Invalid
------------------------------------------------------------------------------
raw         x           x          x       Invalid
------------------------------------------------------------------------------
617
618Table 2.1: /inet Special File Components
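
   The valid rows of Table 2.1 can also be expressed as a lookup
function, which may help when checking a special file name by hand.
This is a sketch, not `gawk''s own logic: the function name
`inet_behavior' and its return strings are invented here, and a field
counts as `0' only if it is written literally as `0'.

```awk
# Sketch of Table 2.1.  Every field other than a literal "0" is
# treated as the table's "x".
function inet_behavior(name,    f, n, lp, host, rp, key) {
    n = split(name, f, "/")
    # "/inet/tcp/0/h/80" splits into "", "inet", "tcp", "0", "h", "80"
    if (n != 6 || f[1] != "" || f[2] != "inet")
        return "invalid"
    lp   = (f[4] == "0") ? "0" : "x"
    host = (f[5] == "0") ? "0" : "x"
    rp   = (f[6] == "0") ? "0" : "x"
    key  = lp host rp
    if (f[3] == "tcp" || f[3] == "udp") {
        if (key == "0xx") return "dedicated client"
        if (key == "xxx") return "client, switches to server if necessary"
        if (key == "x00") return "dedicated server"
    } else if (f[3] == "raw") {
        if (key == "0x0") return "dedicated client (root only)"
        if (key == "000") return "dedicated server (root only)"
    }
    return "invalid"                    # all remaining combinations
}

BEGIN {
    print inet_behavior("/inet/tcp/0/localhost/8888")
    print inet_behavior("/inet/tcp/8888/0/0")
    print inet_behavior("/inet/udp/0/0/0")
}
```

   The three names are reported as `dedicated client', `dedicated
server', and `invalid'.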
619
   In general, TCP is the preferred mechanism to use.  It is the
simplest protocol to understand and to use.  Use the others only if
circumstances demand low overhead.
623
624
625File: gawkinet.info, Node: Comparing Protocols, Prev: Special File Fields, Up: Gawk Special Files
626
2.1.2 Comparing Protocols
-------------------------
629
630This node develops a pair of programs (sender and receiver) that do
631nothing but send a timestamp from one machine to another. The sender
632and the receiver are implemented with each of the three protocols
633available and demonstrate the differences between them.
634
635* Menu:
636
637* File /inet/tcp:: The TCP special file.
638* File /inet/udp:: The UDP special file.
639* File /inet/raw:: The RAW special file.
640
641
642File: gawkinet.info, Node: File /inet/tcp, Next: File /inet/udp, Prev: Comparing Protocols, Up: Comparing Protocols
643
2.1.2.1 `/inet/tcp'
...................
646
647Once again, always use TCP. (Use UDP when low overhead is a necessity,
648and use RAW for network experimentation.) The first example is the
649sender program:
650
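     # Server: listen on local TCP port 8888, send one timestamp to the
     # first client that connects, then close the connection.
     BEGIN {
       Service = "/inet/tcp/8888/0/0"
       print strftime() |& Service
       close(Service)
     }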
656
657The receiver is very simple:
658
     # Client: connect to port 8888 on localhost, read one line, print it.
     BEGIN {
       Service = "/inet/tcp/0/localhost/8888"
       Service |& getline
       print $0
       close(Service)
     }
665
666TCP guarantees that the bytes arrive at the receiving end in exactly
667the same order that they were sent. No byte is lost (except for broken
668connections), doubled, or out of order. Some overhead is necessary to
669accomplish this, but this is the price to pay for a reliable service.
670It does matter which side starts first. The sender/server has to be
671started first, and it waits for the receiver to read a line.
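
   The receiver above prints an empty line if nothing could be read,
for example because the sender is not running.  A slightly more
defensive variant tests the return value of `getline', which is
positive only when a record was read.  This is a sketch under the
assumption that a failed connection makes `getline' return a
nonpositive value rather than ending the program; the variable names
`Service' and `Line' are illustrative.

```awk
# Sketch: TCP client that reports a failed read instead of printing
# an empty line.  Host and port match the sender shown earlier.
BEGIN {
    Service = "/inet/tcp/0/localhost/8888"
    if ((Service |& getline Line) > 0)      # 1 means one record read
        print "server time:", Line
    else                                    # 0 (end of file) or -1 (error)
        print "no data from " Service > "/dev/stderr"
    close(Service)
}
```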
672
673
674File: gawkinet.info, Node: File /inet/udp, Next: File /inet/raw, Prev: File /inet/tcp, Up: Comparing Protocols
675
2.1.2.2 `/inet/udp'
...................
678
679The server and client programs that use UDP are almost identical to
680their TCP counterparts; only the PROTOCOL has changed. As before, it
681does matter which side starts first. The receiving side blocks and
682waits for the sender. In this case, the receiver/client has to be
683started first:
684
     # Server: send one timestamp as a UDP datagram from local port 8888.
     BEGIN {
       Service = "/inet/udp/8888/0/0"
       print strftime() |& Service
       close(Service)
     }
690
691The receiver is almost identical to the TCP receiver:
692
     # Client: read one datagram from port 8888 on localhost, print it.
     BEGIN {
       Service = "/inet/udp/0/localhost/8888"
       Service |& getline
       print $0
       close(Service)
     }
699
700UDP cannot guarantee that the datagrams at the receiving end will
701arrive in exactly the same order they were sent. Some datagrams could be
702lost, some doubled, and some out of order. But no overhead is necessary
703to accomplish this. This unreliable behavior is good enough for tasks
704such as data acquisition, logging, and even stateless services like NFS.
705
706
707File: gawkinet.info, Node: File /inet/raw, Prev: File /inet/udp, Up: Comparing Protocols
708
2.1.2.3 `/inet/raw'
...................
711
712This is an IP-level protocol. Only `root' is allowed to access this
713special file. It is meant to be the basis for implementing and
714experimenting with transport-level protocols.(1) In the most general
715case, the sender has to supply the encapsulating header bytes in front
716of the packet and the receiver has to strip the additional bytes from
717the message.
718
719RAW receivers cannot receive packets sent with TCP or UDP because the
720operating system does not deliver the packets to a RAW receiver. The
721operating system knows about some of the protocols on top of IP and
722decides on its own which packet to deliver to which process. (d.c.)
723Therefore, the UDP receiver must be used for receiving UDP datagrams
724sent with the RAW sender. This is a dark corner, not only of `gawk',
725but also of TCP/IP.
726
727For extended experimentation with protocols, look into the approach
728implemented in a tool called SPAK. This tool reflects the hierarchical
729layering of protocols (encapsulation) in the way data streams are piped
730out of one program into the next one. It shows which protocol is based
731on which other (lower-level) protocol by looking at the command-line
732ordering of the program calls. Cleverly thought out, SPAK is much
733better than `gawk''s `/inet' for learning the meaning of each and every
734bit in the protocol headers.
735
736The next example uses the RAW protocol to emulate the behavior of UDP.
737The sender program is the same as above, but with some additional bytes
738that fill the places of the UDP fields:
739
     BEGIN {
       Message         = "Hello world\n"
       SourcePort      = 0
       DestinationPort = 8888
       MessageLength   = length(Message)+8   # payload plus 8 header bytes
       RawService      = "/inet/raw/0/localhost/0"
       # UDP header fields: source port, destination port, length,
       # checksum (left at 0); int() keeps the high byte integral
       printf("%c%c%c%c%c%c%c%c%s",
              int(SourcePort/256),      SourcePort%256,
              int(DestinationPort/256), DestinationPort%256,
              int(MessageLength/256),   MessageLength%256,
              0, 0, Message) |& RawService
       fflush(RawService)
       close(RawService)
     }
754
755Since this program tries to emulate the behavior of UDP, it checks if
756the RAW sender is understood by the UDP receiver but not if the RAW
757receiver can understand the UDP sender. In a real network, the RAW
758receiver is hardly of any use because it gets every IP packet that
759comes across the network. There are usually so many packets that `gawk'
760would be too slow for processing them. Only on a network with little
761traffic can the IP-level receiver program be tested. Programs for
762analyzing IP traffic on modem or ISDN channels should be possible.
763
764Port numbers do not have a meaning when using `/inet/raw'. Their fields
765have to be `0'. Only TCP and UDP use ports. Receiving data from
766`/inet/raw' is difficult, not only because of processing speed but also
767because data is usually binary and not restricted to ASCII. This
768implies that line separation with `RS' does not work as usual.
769
770---------- Footnotes ----------
771
772(1) This special file is reserved, but not otherwise currently
773implemented.
774
775
776File: gawkinet.info, Node: TCP Connecting, Next: Troubleshooting, Prev: Gawk Special Files, Up: Using Networking
777
7782.2 Establishing a TCP Connection
779=================================
780
781Let's observe a network connection at work. Type in the following
782program and watch the output. Within a second, it connects via TCP
783(`/inet/tcp') to the machine it is running on (`localhost') and asks
784the service `daytime' on the machine what time it is:
785
     BEGIN {
       "/inet/tcp/0/localhost/daytime" |& getline
       print $0
       close("/inet/tcp/0/localhost/daytime")
     }
791
792Even experienced `awk' users will find the second line strange in two
793respects:
794
795 * A special file is used as a shell command that pipes its output
796 into `getline'. One would rather expect to see the special file
797 being read like any other file (`getline <
798 "/inet/tcp/0/localhost/daytime")'.
799
800 * The operator `|&' has not been part of any `awk' implementation
801 (until now). It is actually the only extension of the `awk'
802 language needed (apart from the special files) to introduce
803 network access.
804
805The `|&' operator was introduced in `gawk' 3.1 in order to overcome the
806crucial restriction that access to files and pipes in `awk' is always
807unidirectional. It was formerly impossible to use both access modes on
808the same file or pipe. Instead of changing the whole concept of file
809access, the `|&' operator behaves exactly like the usual pipe operator
810except for two additions:
811
812 * Normal shell commands connected to their `gawk' program with a `|&'
813 pipe can be accessed bidirectionally. The `|&' turns out to be a
814 quite general, useful, and natural extension of `awk'.
815
816 * Pipes that consist of a special file name for network connections
817 are not executed as shell commands. Instead, they can be read and
818 written to, just like a full-duplex network connection.
819
820In the earlier example, the `|&' operator tells `getline' to read a
821line from the special file `/inet/tcp/0/localhost/daytime'. We could
822also have printed a line into the special file. But instead we just
823read a line with the time, printed it, and closed the connection.
824(While we could just let `gawk' close the connection by finishing the
825program, in this Info file we are pedantic and always explicitly close
826the connections.)
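The first of the two additions can be tried without any network at all.
The following is a minimal sketch, assuming a `sort' utility is found on
the search path; the function name `sort_lines' is made up for this
illustration. It writes lines into the coprocess, closes only the
sending side with the two-argument form of `close' so that `sort' sees
the end of its input, and then reads the result back through the same
`|&' pipe:

```awk
# Sort the first n elements of lines[] by passing them through sort(1)
# as a two-way coprocess.  Returns the sorted lines joined with "\n".
function sort_lines(lines, n,    cmd, i, line, out) {
    cmd = "sort"
    for (i = 1; i <= n; i++)
        print lines[i] |& cmd
    close(cmd, "to")              # end of input; sort can now write
    out = ""
    while ((cmd |& getline line) > 0)
        out = out (out == "" ? "" : "\n") line
    close(cmd)                    # close the reading side as well
    return out
}

BEGIN {
    n = split("pear apple fig", fruit, " ")
    print sort_lines(fruit, n)
}
```

Reading and writing a network special file follows the same pattern,
except that no shell command is started.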
827
828
829File: gawkinet.info, Node: Troubleshooting, Next: Interacting, Prev: TCP Connecting, Up: Using Networking
830
8312.3 Troubleshooting Connection Problems
832=======================================
833
834It may well be that for some reason the program shown in the previous
835example does not run on your machine. When looking at possible reasons
836for this, you will learn much about typical problems that arise in
837network programming. First of all, your implementation of `gawk' may
838not support network access because it is a pre-3.1 version or you do
839not have a network interface in your machine. Perhaps your machine
840uses some other protocol, such as DECnet or Novell's IPX. For the rest
841of this major node, we will assume you work on a Unix machine that
842supports TCP/IP. If the previous example program does not run on your
843machine, it may help to replace the name `localhost' with the name of
844your machine or its IP address. If it does, you could replace
845`localhost' with the name of another machine in your vicinity--this
846way, the program connects to another machine. Now you should see the
847date and time being printed by the program, otherwise your machine may
848not support the `daytime' service. Try changing the service to
849`chargen' or `ftp'. This way, the program connects to other services
850that should give you some response. If you are curious, you should have
851a look at your `/etc/services' file. It could look like this:
852
     # /etc/services:
     #
     # Network services, Internet style
     #
     # Name     Number/Protocol  Alternate name # Comments

     echo       7/tcp
     echo       7/udp
     discard    9/tcp            sink null
     discard    9/udp            sink null
     daytime    13/tcp
     daytime    13/udp
     chargen    19/tcp           ttytst source
     chargen    19/udp           ttytst source
     ftp        21/tcp
     telnet     23/tcp
     smtp       25/tcp           mail
     finger     79/tcp
     www        80/tcp           http       # WorldWideWeb HTTP
     www        80/udp                      # HyperText Transfer Protocol
     pop-2      109/tcp          postoffice # POP version 2
     pop-2      109/udp
     pop-3      110/tcp                     # POP version 3
     pop-3      110/udp
     nntp       119/tcp          readnews untp  # USENET News
     irc        194/tcp                     # Internet Relay Chat
     irc        194/udp
     ...
881
882Here, you find a list of services that traditional Unix machines usually
883support. If your GNU/Linux machine does not do so, it may be that these
884services are switched off in some startup script. Systems running some
885flavor of Microsoft Windows usually do _not_ support these services.
886Nevertheless, it _is_ possible to do networking with `gawk' on Microsoft
887Windows.(1) The first column of the file gives the name of the service,
888and the second column gives a unique number and the protocol that one
889can use to connect to this service. The rest of the line is treated as
890a comment. You see that some services (`echo') support TCP as well as
891UDP.
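The column layout just described is easy to take apart with `awk'
itself. The following is only a sketch: the function name
`parse_service' and the array keys are invented for this illustration,
and everything after the second column is simply ignored.

```awk
# Parse one line in /etc/services format into svc["name"],
# svc["port"], and svc["proto"].  Returns 1 on success and 0 for
# blank lines, comment lines, and anything else that does not fit.
function parse_service(line, svc,    f, n, pp) {
    sub(/#.*/, "", line)            # drop a trailing comment
    n = split(line, f, " ")         # " " splits on runs of blanks and tabs
    if (n < 2 || split(f[2], pp, "/") != 2)
        return 0
    svc["name"]  = f[1]
    svc["port"]  = pp[1] + 0
    svc["proto"] = pp[2]
    return 1
}

BEGIN {
    if (parse_service("daytime\t13/tcp", svc))
        print svc["name"], svc["port"], svc["proto"]
}
```

Calling the function for each line read from the file and comparing
`svc["name"]' with the wanted service gives a simple lookup of port
numbers by name.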
892
893---------- Footnotes ----------
894
(1) Microsoft preferred to ignore the TCP/IP family of protocols until
1995. Then came the rise of the Netscape browser as a landmark "killer
application." Microsoft added TCP/IP support and their own browser to
Microsoft Windows 95 at the last minute. They even back-ported their
TCP/IP implementation to Microsoft Windows for Workgroups 3.11, but it
was a rather rudimentary and half-hearted implementation. Nevertheless,
the equivalent of `/etc/services' resides under
`C:\WINNT\system32\drivers\etc\services' on Microsoft Windows 2000.
903
904
905File: gawkinet.info, Node: Interacting, Next: Setting Up, Prev: Troubleshooting, Up: Using Networking
906
9072.4 Interacting with a Network Service
908======================================
909
910The next program makes use of the possibility to really interact with a
911network service by printing something into the special file. It asks the
912so-called `finger' service if a user of the machine is logged in. When
913testing this program, try to change `localhost' to some other machine
914name in your local network:
915
     BEGIN {
       NetService = "/inet/tcp/0/localhost/finger"
       print "NAME" |& NetService
       while ((NetService |& getline) > 0)
         print $0
       close(NetService)
     }
923
924After telling the service on the machine which user to look for, the
925program repeatedly reads lines that come as a reply. When no more lines
926are coming (because the service has closed the connection), the program
927also closes the connection. Try replacing `"NAME"' with your login name
928(or the name of someone else logged in). For a list of all users
929currently logged in, replace NAME with an empty string (`""').
930
931The final `close' command could be safely deleted from the above
932script, because the operating system closes any open connection by
933default when a script reaches the end of execution. In order to avoid
934portability problems, it is best to always close connections explicitly.
935With the Linux kernel, for example, proper closing results in flushing
936of buffers. Letting the close happen by default may result in
937discarding buffers.
938
When looking at `/etc/services' you may have noticed that the `daytime'
service is also available with `udp'. In the earlier example, change
`tcp' to `udp', and change `finger' to `daytime'. After starting the
modified program, you see the expected day and time message. The
program then hangs, because it waits for more lines coming from the
service. However, they never come. This behavior is a consequence of the
differences between TCP and UDP. UDP has no connection that could be
closed, so neither party is automatically informed when the other has
finished sending. Continuing to experiment this way reveals many other
subtle differences between TCP and UDP. To avoid such trouble, one
should always remember the advice Douglas E. Comer and David Stevens
give in Volume III of their series `Internetworking With TCP/IP'
(page 14):
951
952 When designing client-server applications, beginners are strongly
953 advised to use TCP because it provides reliable,
954 connection-oriented communication. Programs only use UDP if the
955 application protocol handles reliability, the application requires
956 hardware broadcast or multicast, or the application cannot
957 tolerate virtual circuit overhead.
958
959
960File: gawkinet.info, Node: Setting Up, Next: Email, Prev: Interacting, Up: Using Networking
961
9622.5 Setting Up a Service
963========================
964
965The preceding programs behaved as clients that connect to a server
966somewhere on the Internet and request a particular service. Now we set
967up such a service to mimic the behavior of the `daytime' service. Such
968a server does not know in advance who is going to connect to it over
969the network. Therefore, we cannot insert a name for the host to connect
970to in our special file name.
971
972Start the following program in one window. Notice that the service does
973not have the name `daytime', but the number `8888'. From looking at
974`/etc/services', you know that names like `daytime' are just mnemonics
975for predetermined 16-bit integers. Only the system administrator
976(`root') could enter our new service into `/etc/services' with an
977appropriate name. Also notice that the service name has to be entered
978into a different field of the special file name because we are setting
979up a server, not a client:
980
     BEGIN {
       print strftime() |& "/inet/tcp/8888/0/0"
       close("/inet/tcp/8888/0/0")
     }
985
986Now open another window on the same machine. Copy the client program
987given as the first example (*note Establishing a TCP Connection: TCP
988Connecting.) to a new file and edit it, changing the name `daytime' to
989`8888'. Then start the modified client. You should get a reply like
990this:
991
992 Sat Sep 27 19:08:16 CEST 1997
993
994Both programs explicitly close the connection.
995
Now we will intentionally make a mistake to see what happens when the
name `8888' (the so-called port) is already used by another service.
Start the server program in both windows. The first one works, but the
second one complains that it could not open the connection. Each port
on a single machine can only be used by one server program at a time.
Now terminate the server program and change the name `8888' to `echo'.
After restarting it, the server program does not run any more, and you
know why: there is already an `echo' service running on your machine.
But even if this isn't true, you would not get your own `echo' server
running on a Unix machine, because the ports with numbers smaller than
1024 (`echo' is at port 7) are reserved for `root'. On machines
running some flavor of Microsoft Windows, there is no restriction that
reserves the ports below 1024 for a privileged user; hence, you can
start an `echo' server there.
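The difference between the client and the server form of the special
file name, and the rule about reserved ports, can be summarized in a
few helper functions. This is a sketch only; the names `client_file',
`server_file', and `is_privileged' are not part of `gawk' and were
chosen for this illustration. The check for reserved ports looks at
numeric ports only, so a service name such as `echo' would first have
to be translated into its number.

```awk
# Layout of the special file name:
#   /inet/PROTOCOL/LOCALPORT/HOSTNAME/REMOTEPORT

# A client names the remote host and port; local port 0 means "any".
function client_file(proto, host, port) {
    return "/inet/" proto "/0/" host "/" port
}

# A server names only its own port; host and remote port are 0
# because any client may connect.
function server_file(proto, port) {
    return "/inet/" proto "/" port "/0/0"
}

# True if a numeric port is in the range reserved for root on Unix.
function is_privileged(port) {
    return (port ~ /^[0-9]+$/) && (port + 0 > 0) && (port + 0 < 1024)
}

BEGIN {
    print client_file("tcp", "localhost", "daytime")
    print server_file("tcp", 8888)
    print is_privileged(7), is_privileged(8888)
}
```

A server script could test its port with `is_privileged' before
opening the special file and print a helpful message instead of simply
failing.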
1010
1011Turning this short server program into something really useful is
1012simple. Imagine a server that first reads a file name from the client
1013through the network connection, then does something with the file and
1014sends a result back to the client. The server-side processing could be:
1015
     BEGIN {
       NetService = "/inet/tcp/8888/0/0"
       NetService |& getline         # sets $0 and the fields
       CatPipe    = ("cat " $1)
       while ((CatPipe | getline) > 0)
         print $0 |& NetService
       close(CatPipe)
       close(NetService)
     }
1024
1025and we would have a remote copying facility. Such a server reads the
1026name of a file from any client that connects to it and transmits the
1027contents of the named file across the net. The server-side processing
1028could also be the execution of a command that is transmitted across the
1029network. From this example, you can see how simple it is to open up a
1030security hole on your machine. If you allow clients to connect to your
1031machine and execute arbitrary commands, anyone would be free to do `rm
1032-rf *'.
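One way to narrow this hole is to inspect whatever the client sends
before it gets anywhere near a shell. The following sketch assumes that
only plain file names in the server's current directory should be
served; the function name `safe_name' and its deliberately strict rule
are choices made for this illustration, not a complete defense.

```awk
# Return 1 if `name' is a plain file name that may be pasted into a
# shell command line, 0 otherwise.  Only letters, digits, dot,
# underscore, and hyphen are accepted, and the name may not start
# with a dot or a hyphen (no "..", no hidden files, no options).
function safe_name(name) {
    return (name ~ /^[A-Za-z0-9._-]+$/) && (name !~ /^[.-]/)
}

BEGIN {
    print safe_name("notes.txt"), safe_name("x; rm -rf *"), \
          safe_name("../etc/passwd")
}
```

A server would call `safe_name($1)' right after reading the request
and close the connection without running any command when the result
is `0'.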
1033
1034
1035File: gawkinet.info, Node: Email, Next: Web page, Prev: Setting Up, Up: Using Networking
1036
10372.6 Reading Email
1038=================
1039
1040The distribution of email is usually done by dedicated email servers
1041that communicate with your machine using special protocols. To receive
1042email, we will use the Post Office Protocol (POP). Sending can be done
1043with the much older Simple Mail Transfer Protocol (SMTP).
1044
1045When you type in the following program, replace the EMAILHOST by the
1046name of your local email server. Ask your administrator if the server
1047has a POP service, and then use its name or number in the program below.
1048Now the program is ready to connect to your email server, but it will
1049not succeed in retrieving your mail because it does not yet know your
1050login name or password. Replace them in the program and it shows you
1051the first email the server has in store:
1052
     BEGIN {
       POPService  = "/inet/tcp/0/EMAILHOST/pop3"
       RS = ORS = "\r\n"
       POPService                    |& getline   # greeting sent on connect
       print "user NAME"             |& POPService
       POPService                    |& getline
       print "pass PASSWORD"         |& POPService
       POPService                    |& getline
       print "retr 1"                |& POPService
       POPService                    |& getline   # status line for `retr'
       if ($1 != "+OK") exit
       print "quit"                  |& POPService
       RS = "\r\n\\.\r\n"
       POPService |& getline
       print $0
       close(POPService)
     }
1069
1070The record separators `RS' and `ORS' are redefined because the protocol
1071(POP) requires CR-LF to separate lines. After identifying yourself to
1072the email service, the command `retr 1' instructs the service to send
1073the first of all your email messages in line. If the service replies
1074with something other than `+OK', the program exits; maybe there is no
1075email. Otherwise, the program first announces that it intends to finish
1076reading email, and then redefines `RS' in order to read the entire
1077email as multiline input in one record. From the POP RFC, we know that
1078the body of the email always ends with a single line containing a
1079single dot. The program looks for this using `RS = "\r\n\\.\r\n"'.
1080When it finds this sequence in the mail message, it quits. You can
1081invoke this program as often as you like; it does not delete the
1082message it reads, but instead leaves it on the server.
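Two small helpers make the handling of the replies more explicit. They
are a sketch with invented names (`pop_ok' and `pop_unstuff'). The
second one relies on the rule in the POP3 specification (RFC 1939)
that the server puts an extra dot in front of every message line
starting with a dot, so that such a line cannot be mistaken for the
terminating single dot:

```awk
# True if a POP reply line signals success.
function pop_ok(reply) {
    return reply ~ /^\+OK/
}

# Remove the extra leading dot that the server adds to message lines
# beginning with a dot.  `msg' is the whole message with CR-LF line
# ends, without the terminating ".".
function pop_unstuff(msg) {
    sub(/^\.\./, ".", msg)            # first line of the message
    gsub(/\r\n\.\./, "\r\n.", msg)    # all following lines
    return msg
}

BEGIN {
    print pop_ok("+OK 2 messages"), pop_ok("-ERR no such message")
}
```

In the program above, the test `$1 != "+OK"' could be written as
`!pop_ok($0)', and the message could be passed through `pop_unstuff'
before it is printed.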
1083
1084
1085File: gawkinet.info, Node: Web page, Next: Primitive Service, Prev: Email, Up: Using Networking
1086
10872.7 Reading a Web Page
1088======================
1089
1090Retrieving a web page from a web server is as simple as retrieving
1091email from an email server. We only have to use a similar, but not
1092identical, protocol and a different port. The name of the protocol is
1093HyperText Transfer Protocol (HTTP) and the port number is usually 80.
1094As in the preceding node, ask your administrator about the name of your
1095local web server or proxy web server and its port number for HTTP
1096requests.
1097
1098The following program employs a rather crude approach toward retrieving
1099a web page. It uses the prehistoric syntax of HTTP 0.9, which almost all
1100web servers still support. The most noticeable thing about it is that
1101the program directs the request to the local proxy server whose name
1102you insert in the special file name (which in turn calls
1103`www.yahoo.com'):
1104
     BEGIN {
       RS = ORS = "\r\n"
       HttpService = "/inet/tcp/0/PROXY/80"
       print "GET http://www.yahoo.com"     |& HttpService
       while ((HttpService |& getline) > 0)
          print $0
       close(HttpService)
     }
1113
1114Again, lines are separated by a redefined `RS' and `ORS'. The `GET'
1115request that we send to the server is the only kind of HTTP request
1116that existed when the web was created in the early 1990s. HTTP calls
1117this `GET' request a "method," which tells the service to transmit a
1118web page (here the home page of the Yahoo! search engine). Version 1.0
1119added the request methods `HEAD' and `POST'. The current version of
1120HTTP is 1.1,(1) and knows the additional request methods `OPTIONS',
1121`PUT', `DELETE', and `TRACE'. You can fill in any valid web address,
1122and the program prints the HTML code of that page to your screen.
1123
1124Notice the similarity between the responses of the POP and HTTP
1125services. First, you get a header that is terminated by an empty line,
1126and then you get the body of the page in HTML. The lines of the
1127headers also have the same form as in POP. There is the name of a
1128parameter, then a colon, and finally the value of that parameter.
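Because of this regular form, one function can take apart the header
lines of both kinds of replies. The following is a minimal sketch; the
function name `parse_header' and the array keys are assumptions made
for this illustration. The name is folded to lowercase because header
names are not case-sensitive.

```awk
# Split "Name: value" into hdr["name"] and hdr["value"].
# Returns 1 for a header line and 0 for anything else, such as the
# empty line that ends the header.
function parse_header(line, hdr,    i) {
    i = index(line, ":")
    if (i < 2)
        return 0
    hdr["name"]  = tolower(substr(line, 1, i - 1))
    hdr["value"] = substr(line, i + 1)
    sub(/^[ \t]+/, "", hdr["value"])      # strip blanks after the colon
    sub(/[ \t\r]+$/, "", hdr["value"])    # and at the end of the line
    return 1
}

BEGIN {
    if (parse_header("Content-Type: text/html", hdr))
        print hdr["name"] " = " hdr["value"]
}
```

A reading loop could call `parse_header($0, hdr)' for each line and
switch to reading the body as soon as the function returns `0'.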
1129
1130Images (`.png' or `.gif' files) can also be retrieved this way, but
1131then you get binary data that should be redirected into a file. Another
1132application is calling a CGI (Common Gateway Interface) script on some
1133server. CGI scripts are used when the contents of a web page are not
1134constant, but generated instantly at the moment you send a request for
1135the page. For example, to get a detailed report about the current
1136quotes of Motorola stock shares, call a CGI script at Yahoo! with the
1137following:
1138
1139 get = "GET http://quote.yahoo.com/q?s=MOT&d=t"
1140 print get |& HttpService
1141
1142You can also request weather reports this way.
1143
1144---------- Footnotes ----------
1145
1146(1) Version 1.0 of HTTP was defined in RFC 1945. HTTP 1.1 was
1147initially specified in RFC 2068. In June 1999, RFC 2068 was made
1148obsolete by RFC 2616, an update without any substantial changes.
1149
1150
1151File: gawkinet.info, Node: Primitive Service, Next: Interacting Service, Prev: Web page, Up: Using Networking
1152
11532.8 A Primitive Web Service
1154===========================
1155
1156Now we know enough about HTTP to set up a primitive web service that
1157just says `"Hello, world"' when someone connects to it with a browser.
1158Compared to the situation in the preceding node, our program changes
1159the role. It tries to behave just like the server we have observed.
1160Since we are setting up a server here, we have to insert the port
1161number in the `localport' field of the special file name. The other two
1162fields (HOSTNAME and REMOTEPORT) have to contain a `0' because we do
1163not know in advance which host will connect to our service.
1164
1165In the early 1990s, all a server had to do was send an HTML document and
1166close the connection. Here, we adhere to the modern syntax of HTTP.
1167The steps are as follows:
1168
1169 1. Send a status line telling the web browser that everything is okay.
1170
1171 2. Send a line to tell the browser how many bytes follow in the body
1172 of the message. This was not necessary earlier because both
1173 parties knew that the document ended when the connection closed.
1174 Nowadays it is possible to stay connected after the transmission
1175 of one web page. This is to avoid the network traffic necessary
1176 for repeatedly establishing TCP connections for requesting several
1177 images. Thus, there is the need to tell the receiving party how
1178 many bytes will be sent. The header is terminated as usual with an
1179 empty line.
1180
1181 3. Send the `"Hello, world"' body in HTML. The useless `while' loop
1182 swallows the request of the browser. We could actually omit the
1183 loop, and on most machines the program would still work. First,
1184 start the following program:
1185
     BEGIN {
       RS = ORS = "\r\n"
       HttpService = "/inet/tcp/8080/0/0"
       Hello = "<HTML><HEAD>" \
               "<TITLE>A Famous Greeting</TITLE></HEAD>" \
               "<BODY><H1>Hello, world</H1></BODY></HTML>"
       Len = length(Hello) + length(ORS)
       print "HTTP/1.0 200 OK"          |& HttpService
       print "Content-Length: " Len ORS |& HttpService
       print Hello                      |& HttpService
       while ((HttpService |& getline) > 0)
          continue;
       close(HttpService)
     }
1200
1201Now, on the same machine, start your favorite browser and let it point
1202to `http://localhost:8080' (the browser needs to know on which port our
1203server is listening for requests). If this does not work, the browser
1204probably tries to connect to a proxy server that does not know your
1205machine. If so, change the browser's configuration so that the browser
1206does not try to use a proxy to connect to your machine.
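The three steps (status line, `Content-Length' header followed by an
empty line, body) can also be collected into one function that returns
the complete reply as a string. This is a sketch under two assumptions:
the function name `http_response' is invented here, and the body
consists of single-byte characters, so that `length' counts bytes.

```awk
# Assemble a complete HTTP/1.0 reply for `body'.
function http_response(status, reason, body,    crlf) {
    crlf = "\r\n"
    return "HTTP/1.0 " status " " reason crlf \
           "Content-Length: " length(body) crlf \
           crlf \
           body
}

BEGIN {
    reply = http_response(200, "OK", "<H1>Hello, world</H1>")
    print length(reply)
}
```

Such a string should be sent with `printf "%s", reply |& HttpService'
rather than with `print', because `print' would append `ORS' and make
the body longer than announced.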
1207
1208
1209File: gawkinet.info, Node: Interacting Service, Next: Simple Server, Prev: Primitive Service, Up: Using Networking
1210
12112.9 A Web Service with Interaction
1212==================================
1213
1214This node shows how to set up a simple web server. The subnode is a
1215library file that we will use with all the examples in *Note Some
1216Applications and Techniques::.
1217
1218* Menu:
1219
1220* CGI Lib:: A simple CGI library.
1221
1222Setting up a web service that allows user interaction is more difficult
1223and shows us the limits of network access in `gawk'. In this node, we
1224develop a main program (a `BEGIN' pattern and its action) that will
1225become the core of event-driven execution controlled by a graphical
1226user interface (GUI). Each HTTP event that the user triggers by some
1227action within the browser is received in this central procedure.
1228Parameters and menu choices are extracted from this request, and an
1229appropriate measure is taken according to the user's choice. For
1230example:
1231
     BEGIN {
       if (MyHost == "") {
          "uname -n" | getline MyHost
          close("uname -n")
       }
       if (MyPort ==  0) MyPort = 8080
       HttpService = "/inet/tcp/" MyPort "/0/0"
       MyPrefix    = "http://" MyHost ":" MyPort
       SetUpServer()
       while ("awk" != "complex") {
         # header lines are terminated this way
         RS = ORS = "\r\n"
         Status   = 200          # this means OK
         Reason   = "OK"
         Header   = TopHeader
         Document = TopDoc
         Footer   = TopFooter
         if        (GETARG["Method"] == "GET") {
             HandleGET()
         } else if (GETARG["Method"] == "HEAD") {
             # not yet implemented
         } else if (GETARG["Method"] != "") {
             print "bad method", GETARG["Method"]
         }
         Prompt = Header Document Footer
         print "HTTP/1.0", Status, Reason       |& HttpService
         print "Connection: Close"              |& HttpService
         print "Pragma: no-cache"               |& HttpService
         len = length(Prompt) + length(ORS)
         print "Content-length:", len           |& HttpService
         print ORS Prompt                       |& HttpService
         # ignore all the header lines
         while ((HttpService |& getline) > 0)
             ;
         # stop talking to this client
         close(HttpService)
         # wait for new client request
         HttpService |& getline
         # do some logging
         print systime(), strftime(), $0
         # read request parameters
         CGI_setup($1, $2, $3)
       }
     }
1276
1277This web server presents menu choices in the form of HTML links.
1278Therefore, it has to tell the browser the name of the host it is
1279residing on. When starting the server, the user may supply the name of
1280the host from the command line with `gawk -v MyHost="Rumpelstilzchen"'.
1281If the user does not do this, the server looks up the name of the host
1282it is running on for later use as a web address in HTML documents. The
1283same applies to the port number. These values are inserted later into
1284the HTML content of the web pages to refer to the home system.
1285
1286Each server that is built around this core has to initialize some
1287application-dependent variables (such as the default home page) in a
1288procedure `SetUpServer', which is called immediately before entering the
1289infinite loop of the server. For now, we will write an instance that
1290initiates a trivial interaction. With this home page, the client user
1291can click on two possible choices, and receive the current date either
1292in human-readable format or in seconds since 1970:
1293
     function SetUpServer() {
       TopHeader = "<HTML><HEAD>"
       TopHeader = TopHeader \
          "<title>My name is GAWK, GNU AWK</title></HEAD>"
       TopDoc    = "<BODY><h2>\
         Do you prefer your date <A HREF=" MyPrefix \
         "/human>human</A> or \
         <A HREF=" MyPrefix "/POSIX>POSIXed</A>?</h2>" ORS ORS
       TopFooter = "</BODY></HTML>"
     }
1304
1305On the first run through the main loop, the default line terminators are
1306set and the default home page is copied to the actual home page. Since
1307this is the first run, `GETARG["Method"]' is not initialized yet, hence
1308the case selection over the method does nothing. Now that the home page
1309is initialized, the server can start communicating to a client browser.
1310
1311It does so by printing the HTTP header into the network connection
1312(`print ... |& HttpService'). This command blocks execution of the
1313server script until a client connects. If this server script is
1314compared with the primitive one we wrote before, you will notice two
1315additional lines in the header. The first instructs the browser to
1316close the connection after each request. The second tells the browser
1317that it should never try to _remember_ earlier requests that had
1318identical web addresses (no caching). Otherwise, it could happen that
1319the browser retrieves the time of day in the previous example just once,
1320and later it takes the web page from the cache, always displaying the
1321same time of day although time advances each second.
1322
1323Having supplied the initial home page to the browser with a valid
1324document stored in the parameter `Prompt', it closes the connection and
1325waits for the next request. When the request comes, a log line is
1326printed that allows us to see which request the server receives. The
1327final step in the loop is to call the function `CGI_setup', which reads
1328all the lines of the request (coming from the browser), processes them,
1329and stores the transmitted parameters in the array `PARAM'. The complete
1330text of these application-independent functions can be found in *Note A
1331Simple CGI Library: CGI Lib. For now, we use a simplified version of
1332`CGI_setup':
1333
     function CGI_setup(method, uri, version,    i, j) {
       delete GETARG;         delete MENU;        delete PARAM
       GETARG["Method"]  = method
       GETARG["URI"]     = uri
       GETARG["Version"] = version
       i = index(uri, "?")
       # is there a "?" indicating a CGI request?
       if (i > 0) {
         split(substr(uri, 1, i-1), MENU, "[/:]")
         split(substr(uri, i+1), PARAM, "&")
         for (i in PARAM) {
           j = index(PARAM[i], "=")
           GETARG[substr(PARAM[i], 1, j-1)] = \
                                     substr(PARAM[i], j+1)
         }
       } else {    # there is no "?", no need for splitting PARAMs
         split(uri, MENU, "[/:]")
       }
     }
1353
1354At first, the function clears all variables used for global storage of
1355request parameters. The rest of the function serves the purpose of
1356filling the global parameters with the extracted new values. To
1357accomplish this, the name of the requested resource is split into parts
1358and stored for later evaluation. If the request contains a `?', then
1359the request has CGI variables seamlessly appended to the web address.
1360Everything in front of the `?' is split up into menu items, and
1361everything behind the `?' is a list of `VARIABLE=VALUE' pairs
1362(separated by `&') that also need splitting. This way, CGI variables are
1363isolated and stored. This procedure lacks recognition of special
1364characters that are transmitted in coded form(1). Here, any optional
1365request header and body parts are ignored. We do not need header
1366parameters and the request body. However, when refining our approach or
1367working with the `POST' and `PUT' methods, reading the header and body
1368becomes inevitable. Header parameters should then be stored in a global
1369array as well as the body.
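To give an idea of what recognizing the coded characters involves,
here is a minimal sketch of a decoder. The function names `hexval' and
`url_decode' are invented for this illustration, and only single-byte
(ASCII) values are assumed; in a multibyte locale, `%c' with a value
above 127 may produce more than one byte.

```awk
# Value of one hexadecimal digit, or -1 if `c' is not a hex digit.
function hexval(c) {
    return index("0123456789ABCDEF", toupper(c)) - 1
}

# Decode "%XX" escapes and "+" (an encoded blank) in a CGI value.
# Malformed escapes are copied unchanged.
function url_decode(s,    out, c, hi, lo) {
    out = ""
    gsub(/\+/, " ", s)
    while (s != "") {
        c = substr(s, 1, 1)
        if (c == "%" && length(s) >= 3) {
            hi = hexval(substr(s, 2, 1))
            lo = hexval(substr(s, 3, 1))
            if (hi >= 0 && lo >= 0) {
                out = out sprintf("%c", 16 * hi + lo)
                s = substr(s, 4)
                continue
            }
        }
        out = out c
        s = substr(s, 2)
    }
    return out
}

BEGIN {
    print url_decode("Tom+%26+Jerry")
}
```

In `CGI_setup', both the name and the value of each `VARIABLE=VALUE'
pair would be passed through such a function before being stored in
`GETARG'.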
1370
1371On each subsequent run through the main loop, one request from a
1372browser is received, evaluated, and answered according to the user's
1373choice. This can be done by letting the value of the HTTP method guide
1374the main loop into execution of the procedure `HandleGET', which
1375evaluates the user's choice. In this case, we have only one
1376hierarchical level of menus, but in the general case, menus are nested.
1377The menu choices at each level are separated by `/', just as in file
1378names. Notice how simple it is to construct menus of arbitrary depth:
1379
     function HandleGET() {
       if (       MENU[2] == "human") {
         Footer = strftime() TopFooter
       } else if (MENU[2] == "POSIX") {
         Footer = systime()  TopFooter
       }
     }
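The reason the choice is found in `MENU[2]' rather than `MENU[1]' is
that the requested path starts with a `/', so the first element
produced by `split' is empty. The following sketch shows this for a
hypothetical two-level path `/date/human'; the function name
`menu_split' is invented for this illustration.

```awk
# Split the path part of a request into menu[] and return the number
# of elements.  menu[1] is empty because the path starts with "/".
function menu_split(path, menu) {
    return split(path, menu, "[/:]")
}

BEGIN {
    n = menu_split("/date/human", MENU)
    for (i = 2; i <= n; i++)
        print "level " (i - 1) ": " MENU[i]
}
```

A handler for nested menus would test `MENU[2]' first and then look at
`MENU[3]' within the chosen branch, and so on for deeper levels.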
1387
1388The disadvantage of this approach is that our server is slow and can
1389handle only one request at a time. Its main advantage, however, is that
1390the server consists of just one `gawk' program. No need for installing
1391an `httpd', and no need for static separate HTML files, CGI scripts, or
1392`root' privileges. This is rapid prototyping. This program can be
1393started on the same host that runs your browser. Then let your browser
1394point to `http://localhost:8080'.
1395
1396It is also possible to include images into the HTML pages. Most
1397browsers support the not very well-known `.xbm' format, which may
1398contain only monochrome pictures but is an ASCII format. Binary images
1399are possible but not so easy to handle. Another way of including images
1400is to generate them with a tool such as GNUPlot, by calling the tool
1401with the `system' function or through a pipe.

   ---------- Footnotes ----------

   (1) As defined in RFC 2068.


File: gawkinet.info,  Node: CGI Lib,  Prev: Interacting Service,  Up: Interacting Service

2.9.1 A Simple CGI Library
--------------------------

     HTTP is like being married: you have to be able to handle whatever
     you're given, while being very careful what you send back.
     Phil Smith III,
     `http://www.netfunny.com/rhf/jokes/99/Mar/http.html'

In *Note A Web Service with Interaction: Interacting Service, we saw
the function `CGI_setup' as part of the web server "core logic"
framework. The code presented there handles almost everything necessary
for CGI requests. One thing it doesn't do is handle encoded characters
in the requests. For example, an `&' is encoded as a percent sign
followed by the hexadecimal value: `%26'. These encoded values should
be decoded. Following is a simple library to perform these tasks.
This code is used for all web server examples used throughout the rest
of this Info file. If you want to use it for your own web server,
store the source code into a file named `inetlib.awk'. Then you can
include these functions into your code by placing the following
statement into your program (on the first line of your script):

     @include inetlib.awk

But beware, this mechanism is only possible if you invoke your web
server script with `igawk' instead of the usual `awk' or `gawk'. Here
is the code:

     # CGI Library and core of a web server
     # Global arrays
     #   GETARG --- arguments to CGI GET command
     #   MENU   --- menu items (path names)
     #   PARAM  --- parameters of form x=y

     # Optional variable MyHost contains host address
     # Optional variable MyPort contains port number
     # Needs TopHeader, TopDoc, TopFooter
     # Sets MyPrefix, HttpService, Status, Reason

     BEGIN {
       if (MyHost == "") {
          "uname -n" | getline MyHost
          close("uname -n")
       }
       if (MyPort == 0) MyPort = 8080
       HttpService = "/inet/tcp/" MyPort "/0/0"
       MyPrefix    = "http://" MyHost ":" MyPort
       SetUpServer()
       while ("awk" != "complex") {
         # header lines are terminated this way
         RS = ORS = "\r\n"
         Status   = 200          # this means OK
         Reason   = "OK"
         Header   = TopHeader
         Document = TopDoc
         Footer   = TopFooter
         if        (GETARG["Method"] == "GET") {
             HandleGET()
         } else if (GETARG["Method"] == "HEAD") {
             # not yet implemented
         } else if (GETARG["Method"] != "") {
             print "bad method", GETARG["Method"]
         }
         Prompt = Header Document Footer
         print "HTTP/1.0", Status, Reason       |& HttpService
         print "Connection: Close"              |& HttpService
         print "Pragma: no-cache"               |& HttpService
         len = length(Prompt) + length(ORS)
         print "Content-length:", len           |& HttpService
         print ORS Prompt                       |& HttpService
         # ignore all the header lines
         while ((HttpService |& getline) > 0)
             continue
         # stop talking to this client
         close(HttpService)
         # wait for new client request
         HttpService |& getline
         # do some logging
         print systime(), strftime(), $0
         CGI_setup($1, $2, $3)
       }
     }

     function CGI_setup(method, uri, version,    i, j)
     {
       delete GETARG
       delete MENU
       delete PARAM
       GETARG["Method"] = method
       GETARG["URI"] = uri
       GETARG["Version"] = version

       i = index(uri, "?")
       if (i > 0) {  # is there a "?" indicating a CGI request?
         split(substr(uri, 1, i-1), MENU, "[/:]")
         split(substr(uri, i+1), PARAM, "&")
         for (i in PARAM) {
           PARAM[i] = _CGI_decode(PARAM[i])
           j = index(PARAM[i], "=")
           GETARG[substr(PARAM[i], 1, j-1)] = \
                                        substr(PARAM[i], j+1)
         }
       } else { # there is no "?", no need for splitting PARAMs
         split(uri, MENU, "[/:]")
       }
       for (i in MENU)     # decode characters in path
         if (i + 0 > 4)    # but not those in host name; the "+ 0"
                           # makes the comparison numeric
           MENU[i] = _CGI_decode(MENU[i])
     }

This isolates details in a single function, `CGI_setup'. Decoding of
encoded characters is pushed off to a helper function, `_CGI_decode'.
The use of the leading underscore (`_') in the function name is
intended to indicate that it is an "internal" function, although there
is nothing to enforce this:

     function _CGI_decode(str,   hexdigs, i, pre, code1, code2,
                                 val, result)
     {
        hexdigs = "123456789abcdef"

        i = index(str, "%")
        if (i == 0) # no work to do
           return str

        do {
           pre = substr(str, 1, i-1)   # part before %xx
           code1 = substr(str, i+1, 1) # first hex digit
           code2 = substr(str, i+2, 1) # second hex digit
           str = substr(str, i+3)      # rest of string

           code1 = tolower(code1)
           code2 = tolower(code2)
           val = index(hexdigs, code1) * 16 \
                 + index(hexdigs, code2)

           result = result pre sprintf("%c", val)
           i = index(str, "%")
        } while (i != 0)
        if (length(str) > 0)
           result = result str
        return result
     }

This works by splitting the string apart around an encoded character.
The two digits are converted to lowercase characters and looked up in a
string of hex digits. Note that `0' is not in the string on purpose;
`index' returns zero when it's not found, automatically giving the
correct value! Once the hexadecimal value is converted from characters
in a string into a numerical value, `sprintf' converts the value back
into a real character. The following is a simple test harness for the
above functions:

     BEGIN {
       CGI_setup("GET",
       "http://www.gnu.org/cgi-bin/foo?p1=stuff&p2=stuff%26junk" \
            "&percent=a %25 sign",
       "1.0")
       for (i in MENU)
           printf "MENU[\"%s\"] = %s\n", i, MENU[i]
       for (i in PARAM)
           printf "PARAM[\"%s\"] = %s\n", i, PARAM[i]
       for (i in GETARG)
           printf "GETARG[\"%s\"] = %s\n", i, GETARG[i]
     }

And this is the result when we run it:

     $ gawk -f testserv.awk
     -| MENU["4"] = www.gnu.org
     -| MENU["5"] = cgi-bin
     -| MENU["6"] = foo
     -| MENU["1"] = http
     -| MENU["2"] =
     -| MENU["3"] =
     -| PARAM["1"] = p1=stuff
     -| PARAM["2"] = p2=stuff&junk
     -| PARAM["3"] = percent=a % sign
     -| GETARG["p1"] = stuff
     -| GETARG["percent"] = a % sign
     -| GETARG["p2"] = stuff&junk
     -| GETARG["Method"] = GET
     -| GETARG["Version"] = 1.0
     -| GETARG["URI"] = http://www.gnu.org/cgi-bin/foo?p1=stuff&
     p2=stuff%26junk&percent=a %25 sign
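
The digit lookup used in `_CGI_decode' can also be tried in isolation.
The helper below, whose name `HexPair' is an assumption made for this
sketch, converts two hex digits with the same `index' trick:

```awk
# Convert a two-digit hex string to a number, as _CGI_decode does.
# "0" is deliberately absent from hexdigs: index() returns 0 for it.
function HexPair(pair,    hexdigs) {
    hexdigs = "123456789abcdef"
    pair = tolower(pair)
    return index(hexdigs, substr(pair, 1, 1)) * 16 \
         + index(hexdigs, substr(pair, 2, 1))
}

BEGIN {
    printf "%d %d %d\n", HexPair("26"), HexPair("0A"), HexPair("ff")
    printf "%c\n", HexPair("26")
}
```

This prints `38 10 255' followed by `&'. Note that a character that is
not a hex digit at all is also silently treated as zero, so the trick
relies on well-formed input.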


File: gawkinet.info,  Node: Simple Server,  Next: Caveats,  Prev: Interacting Service,  Up: Using Networking

2.10 A Simple Web Server
========================

In the preceding node, we built the core logic for event-driven GUIs.
In this node, we finally extend the core to a real application. No one
would actually write a commercial web server in `gawk', but it is
instructive to see that it is feasible in principle.

The application is ELIZA, the famous program by Joseph Weizenbaum that
mimics the behavior of a professional psychotherapist when talking to
you. Weizenbaum would certainly object to this description, but this
is part of the legend around ELIZA. Take the site-independent core
logic and append the following code:

     function SetUpServer() {
       SetUpEliza()
       TopHeader = \
         "<HTML><title>An HTTP-based System with GAWK</title>\
         <HEAD><META HTTP-EQUIV=\"Content-Type\"\
         CONTENT=\"text/html; charset=iso-8859-1\"></HEAD>\
         <BODY BGCOLOR=\"#ffffff\" TEXT=\"#000000\"\
         LINK=\"#0000ff\" VLINK=\"#0000ff\"\
         ALINK=\"#0000ff\"> <A NAME=\"top\">"
       TopDoc    = "\
        <h2>Please choose one of the following actions:</h2>\
        <UL>\
        <LI>\
        <A HREF=" MyPrefix "/AboutServer>About this server</A>\
        </LI><LI>\
        <A HREF=" MyPrefix "/AboutELIZA>About Eliza</A></LI>\
        <LI>\
        <A HREF=" MyPrefix \
           "/StartELIZA>Start talking to Eliza</A></LI></UL>"
       TopFooter = "</BODY></HTML>"
     }

`SetUpServer' is similar to the previous example, except for calling
another function, `SetUpEliza'. This approach can be used to implement
other kinds of servers. The only changes needed to do so are hidden in
the functions `SetUpServer' and `HandleGET'. Perhaps it might be
necessary to implement other HTTP methods. The `igawk' program that
comes with `gawk' may be useful for this process.

When extending this example to a complete application, the first thing
to do is to implement the function `SetUpServer' to initialize the HTML
pages and some variables. These initializations determine the way your
HTML pages look (colors, titles, menu items, etc.).

The function `HandleGET' is a nested case selection that decides which
page the user wants to see next. Each nesting level refers to a menu
level of the GUI. Each case implements a certain action of the menu. On
the deepest level of case selection, the handler essentially knows what
the user wants and stores the answer into the variable that holds the
HTML page contents:

     function HandleGET() {
       # A real HTTP server would treat some parts of the URI as a file name.
       # We take parts of the URI as menu choices and go on accordingly.
       if (MENU[2] == "AboutServer") {
         Document  = "This is not a CGI script.\
           This is an httpd, an HTML file, and a CGI script all \
           in one GAWK script. It needs no separate www-server, \
           no installation, and no root privileges.\
           <p>To run it, do this:</p><ul>\
           <li> start this script with \"gawk -f httpserver.awk\",</li>\
           <li> and on the same host let your www browser open location\
                \"http://localhost:8080\"</li>\
           </ul><p>Details of HTTP come from:</p><ul>\
                 <li>Hethmon: Illustrated Guide to HTTP</li>\
                 <li>RFC 2068</li></ul><p>JK 14.9.1997</p>"
       } else if (MENU[2] == "AboutELIZA") {
         Document  = "This is an implementation of the famous ELIZA\
             program by Joseph Weizenbaum. It is written in GAWK and\
             uses an HTML GUI."
       } else if (MENU[2] == "StartELIZA") {
         gsub(/\+/, " ", GETARG["YouSay"])
         # Here we also have to substitute coded special characters
         Document  = "<form method=GET>" \
           "<h3>" ElizaSays(GETARG["YouSay"]) "</h3>\
           <p><input type=text name=YouSay value=\"\" size=60>\
           <br><input type=submit value=\"Tell her about it\"></p></form>"
       }
     }

Now we are down to the heart of ELIZA, so you can see how it works.
Initially the user does not say anything; then ELIZA resets its money
counter and asks the user to tell what comes to mind openheartedly.
The subsequent answers are converted to uppercase characters and stored
for later comparison. ELIZA presents the bill when being confronted with
a sentence that contains the phrase "shut up." Otherwise, it looks for
keywords in the sentence, conjugates the rest of the sentence, remembers
the keyword for later use, and finally selects an answer from the set of
possible answers:

     function ElizaSays(YouSay) {
       if (YouSay == "") {
         cost = 0
         answer = "HI, IM ELIZA, TELL ME YOUR PROBLEM"
       } else {
         q = toupper(YouSay)
         gsub("'", "", q)
         if (q == qold) {
           answer = "PLEASE DONT REPEAT YOURSELF !"
         } else {
           if (index(q, "SHUT UP") > 0) {
             answer = "WELL, PLEASE PAY YOUR BILL. ITS EXACTLY ... $"\
                      int(100*rand()+30+cost/100)
           } else {
             qold = q
             w = "-"                 # no keyword recognized yet
             for (i in k) {          # search for keywords
               if (index(q, i) > 0) {
                 w = i
                 break
               }
             }
             if (w == "-") {         # no keyword, take old subject
               w    = wold
               subj = subjold
             } else {                # find subject
               subj = substr(q, index(q, w) + length(w)+1)
               wold = w
               subjold = subj        # remember keyword and subject
             }
             for (i in conj)
               gsub(i, conj[i], q)   # conjugation
             # from all answers to this keyword, select one randomly
             answer = r[indices[int(split(k[w], indices) * rand()) + 1]]
             # insert subject into answer
             gsub("_", subj, answer)
           }
         }
       }
       cost += length(answer) # for later payment : 1 cent per character
       return answer
     }

In the long but simple function `SetUpEliza', you can see tables for
conjugation, keywords, and answers.(1) The associative array `k'
contains indices into the array of answers `r'. To choose an answer,
ELIZA just picks an index randomly:

     function SetUpEliza() {
       srand()
       wold = "-"
       subjold = " "

       # table for conjugation
       conj[" ARE "     ] = " AM "
       conj["WERE "     ] = "WAS "
       conj[" YOU "     ] = " I "
       conj["YOUR "     ] = "MY "
       conj[" IVE "     ] =\
       conj[" I HAVE "  ] = " YOU HAVE "
       conj[" YOUVE "   ] =\
       conj[" YOU HAVE "] = " I HAVE "
       conj[" IM "      ] =\
       conj[" I AM "    ] = " YOU ARE "
       conj[" YOURE "   ] =\
       conj[" YOU ARE " ] = " I AM "

       # table of all answers
       r[1]   = "DONT YOU BELIEVE THAT I CAN  _"
       r[2]   = "PERHAPS YOU WOULD LIKE TO BE ABLE TO _ ?"
       ...

       # table for looking up answers that
       # fit to a certain keyword
       k["CAN YOU"]      = "1 2 3"
       k["CAN I"]        = "4 5"
       k["YOU ARE"]      =\
       k["YOURE"]        = "6 7 8 9"
       ...

     }
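
The answer selection inside `ElizaSays' packs several steps into one
expression. Spelled out, it might read as follows. The helper name
`PickAnswer' is hypothetical, and the text of `r[3]' is made up for this
sketch because the table above is abbreviated:

```awk
# Pick one answer for keyword w.
# k[w] lists indices into r, separated by blanks.
function PickAnswer(w,    indices, n, pick) {
    n = split(k[w], indices)        # "1 2 3" -> indices[1..3], n = 3
    pick = int(n * rand()) + 1      # random position in 1..n
    return r[indices[pick]]
}

BEGIN {
    srand()
    r[1] = "DONT YOU BELIEVE THAT I CAN _"
    r[2] = "PERHAPS YOU WOULD LIKE TO BE ABLE TO _ ?"
    r[3] = "YOU WANT ME TO BE ABLE TO _ ?"   # illustrative entry
    k["CAN YOU"] = "1 2 3"
    print PickAnswer("CAN YOU")
}
```

Since `rand' returns a value that is at least 0 and less than 1, the
computed position never exceeds the number of entries listed for the
keyword.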

Some interesting remarks and details (including the original source code
of ELIZA) are found on Mark Humphrys' home page. Yahoo! also has a
page with a collection of ELIZA-like programs. Many of them are written
in Java, some of them disclosing the Java source code, and a few even
explain how to modify the Java source code.

   ---------- Footnotes ----------

   (1) The version shown here is abbreviated. The full version comes with
the `gawk' distribution.


File: gawkinet.info,  Node: Caveats,  Next: Challenges,  Prev: Simple Server,  Up: Using Networking

2.11 Network Programming Caveats
================================

By now it should be clear that debugging a networked application is more
complicated than debugging a single-process single-hosted application.
The behavior of a networked application sometimes looks noncausal
because it is not reproducible in a strong sense. Whether a network
application works or not sometimes depends on the following:

   * How crowded the underlying network is

   * If the party at the other end is running or not

   * The state of the party at the other end

The most difficult problems for a beginner arise from the hidden states
of the underlying network. After closing a TCP connection, it's often
necessary to wait a short while before reopening the connection. Even
more difficult is the establishment of a connection that previously
ended with a "broken pipe." Those connections have to "time out" for a
minute or so before they can reopen. Check this with the command
`netstat -a', which provides a list of still "active" connections.


File: gawkinet.info,  Node: Challenges,  Prev: Caveats,  Up: Using Networking

2.12 Where To Go From Here
==========================

Now, you have learned enough to build your own application. You could,
for example, take part in the Loebner Contest to win $100,000. The
Loebner Prize is the first formal instantiation of a Turing Test. Hugh
Loebner agreed with The Cambridge Center for Behavioral Studies to
underwrite a contest designed to implement the Turing Test. Dr.
Loebner pledged a Grand Prize of $100,000 for the first computer whose
responses were indistinguishable from a human's. Each year an annual
prize of $2000 and a bronze medal is awarded to the _most_ human
computer. The winner of the annual contest is the best entry relative
to other entries that year, irrespective of how good it is in an
absolute sense. Here is an example of a conversation with the winning
program of 1997:

     (c)1997 Cambridge Center For Behavioral Studies all rights reserved [CONVERSE]
     [Intelligent Research Ltd.] Start at: [1997/04/29 08:59:40]
     PROGRAM[10:39:20]Hi, my name is Catherine, what's yours?
     JUDGE02[10:39:36]My name is Nathan.
     PROGRAM[10:39:37]Well, it's nice to meet you, Nathan.
     PROGRAM[10:39:45]Did you see that story on CNN last night about the lesbian
     PROGRAM[10:40:03]couple who came out at a White House party on Sunday?
     JUDGE02[10:40:45]No. I'm sorry I missed that. How did the Prez take it?
     PROGRAM[10:40:47]Ellen Degeneres was one of them - she was kissing her lover
     PROGRAM[10:41:08]in front of Bill Clinton. Do you think he was very
     PROGRAM[10:41:21]embarrassed?
     JUDGE02[10:41:37]You tell me. I didn't see it.
     PROGRAM[10:41:38]Are you still a Friend of Bill Clinton?
     JUDGE02[10:42:22]Yes. I would like to talk about something else. Do you know anything about sculpture.
     PROGRAM[10:42:24]Well, I suppose he's doing his best like most of us, but he
     PROGRAM[10:42:35]does seem to attract an UNUSUAL amount of bad publicity,
     PROGRAM[10:42:51]wouldnt you say--can it all be bad luck?
     JUDGE02[10:43:20]I asked you if you knew anything about sculpture.
     PROGRAM[10:43:21]OK, but what happens now--does it all drift on, with
     PROGRAM[10:43:36]everyone waiting to see how much his friends have said about
     PROGRAM[10:43:47]him under threat of lots and lots of jail-or will something
     PROGRAM[10:44:01]dramatic actually happen?
     JUDGE02[10:44:19]Do you know what sculpture is?
     PROGRAM[10:44:20]Well, youre more a fan of his than I am I guess---though I
     PROGRAM[10:44:34]cant SWEAR what Id have done in that locked hotel room, I
     PROGRAM[10:44:51]must say.
     JUDGE02[10:45:26]Have you ever been to the Salmagundi Club?
     PROGRAM[10:45:29]Never. to go back to the original discussion Time to get
     PROGRAM[10:45:43]off this one I think and onto something more down to earth!
     JUDGE02[10:46:20]What is more down to earth than art?
     PROGRAM[10:46:21]I don't know what -- to be honest. -
     PROGRAM[10:46:30]I suppose it depends on where you're coming from, but as the
     PROGRAM[10:46:46]song goes in My Fair Lady, "Why can't a woman be more like a
     PROGRAM[10:47:03]man? "
     JUDGE02[10:47:52]I liked that song but I prefer "Why can't children be more like we are"
     JUDGE02[10:48:10]from another play the name of which I cannot remember.
     PROGRAM[10:48:12]What, if anything, do you enjoy about being a man?
     JUDGE02[10:49:01]I would enjoy all aspects of life no matter what my gender.
     PROGRAM[10:49:02]Do you think of yourself as being attractive?

This program insists on always speaking about the same story around Bill
Clinton. You see, even a program with a rather narrow mind can behave so
much like a human being that it can win this prize. It is quite common
to let these programs talk to each other via network connections. But
during the competition itself, the program and its computer have to be
present at the place the competition is held. We all would love to see
a `gawk' program win in such an event. Maybe it is up to you to
accomplish this?

Some other ideas for useful networked applications:
   * Read the file `doc/awkforai.txt' in the `gawk' distribution. It
     was written by Ronald P. Loui (Associate Professor of Computer
     Science, at Washington University in St. Louis,
     <loui@ai.wustl.edu>) and summarizes why he teaches `gawk' to
     students of Artificial Intelligence. Here are some passages from
     the text:

          The GAWK manual can be consumed in a single lab session and
          the language can be mastered by the next morning by the
          average student. GAWK's automatic initialization, implicit
          coercion, I/O support and lack of pointers forgive many of
          the mistakes that young programmers are likely to make.
          Those who have seen C but not mastered it are happy to see
          that GAWK retains some of the same sensibilities while adding
          what must be regarded as spoonsful of syntactic sugar.
          ...
          There are further simple answers. Probably the best is the
          fact that increasingly, undergraduate AI programming is
          involving the Web. Oren Etzioni (University of Washington,
          Seattle) has for a while been arguing that the "softbot" is
          replacing the mechanical engineers' robot as the most
          glamorous AI testbed. If the artifact whose behavior needs
          to be controlled in an intelligent way is the software agent,
          then a language that is well-suited to controlling the
          software environment is the appropriate language. That would
          imply a scripting language. If the robot is KAREL, then the
          right language is "turn left; turn right." If the robot is
          Netscape, then the right language is something that can
          generate `netscape -remote
          'openURL(http://cs.wustl.edu/~loui)'' with elan.
          ...
          AI programming requires high-level thinking. There have
          always been a few gifted programmers who can write high-level
          programs in assembly language. Most however need the ambient
          abstraction to have a higher floor.
          ...
          Second, inference is merely the expansion of notation. No
          matter whether the logic that underlies an AI program is
          fuzzy, probabilistic, deontic, defeasible, or deductive, the
          logic merely defines how strings can be transformed into
          other strings. A language that provides the best support for
          string processing in the end provides the best support for
          logic, for the exploration of various logics, and for most
          forms of symbolic processing that AI might choose to call
          "reasoning" instead of "logic." The implication is that
          PROLOG, which saves the AI programmer from having to write a
          unifier, saves perhaps two dozen lines of GAWK code at the
          expense of strongly biasing the logic and representational
          expressiveness of any approach.

     Now that `gawk' itself can connect to the Internet, it should be
     obvious that it is suitable for writing intelligent web agents.

   * `awk' is strong at pattern recognition and string processing. So,
     it is well suited to the classic problem of language translation.
     A first try could be a program that knows the 100 most frequent
     English words and their counterparts in German or French. The
     service could be implemented by regularly reading email with the
     program above, replacing each word by its translation and sending
     the translation back via SMTP. Users would send English email to
     their translation service and get back a translated email message
     in return. As soon as this works, more effort can be spent on a
     real translation program.

   * Another dialogue-oriented application (on the verge of ridicule)
     is the email "support service." Troubled customers write an email
     to an automatic `gawk' service that reads the email. It looks for
     keywords in the mail and assembles a reply email accordingly. By
     carefully investigating the email header, and repeating these
     keywords through the reply email, it is rather simple to give the
     customer a feeling that someone cares. Ideally, such a service
     would search a database of previous cases for solutions. If none
     exists, the database could, for example, consist of all the
     newsgroups, mailing lists and FAQs on the Internet.
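
The word-by-word translation idea from the list above might start out
like this. The three-word dictionary and the function name `Translate'
are assumptions made for illustration; reading and sending the mail is
left out, and punctuation is not handled:

```awk
# Replace each known word by its counterpart; unknown words pass through.
function Translate(text, dict,    words, n, i, w, out) {
    n = split(text, words, " ")
    for (i = 1; i <= n; i++) {
        w = tolower(words[i])            # dictionary keys are lowercase
        if (w in dict)
            w = dict[w]
        else
            w = words[i]                 # keep the original spelling
        out = (i > 1) ? out " " w : w
    }
    return out
}

BEGIN {
    en2de["the"] = "das"; en2de["house"] = "Haus"; en2de["is"] = "ist"
    print Translate("The house is small", en2de)
}
```

This prints `das Haus ist small', which also shows how far such a
first try is from a real translation program.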


File: gawkinet.info,  Node: Some Applications and Techniques,  Next: Links,  Prev: Using Networking,  Up: Top

3 Some Applications and Techniques
**********************************

In this major node, we look at a number of self-contained scripts, with
an emphasis on concise networking. Along the way, we work towards
creating building blocks that encapsulate often needed functions of the
networking world, show new techniques that broaden the scope of
problems that can be solved with `gawk', and explore leading edge
technology that may shape the future of networking.

We often refer to the site-independent core of the server that we built
in *Note A Simple Web Server: Simple Server. When building new and
nontrivial servers, we always copy this building block and append new
instances of the two functions `SetUpServer' and `HandleGET'.

This makes a lot of sense, since this scheme of event-driven execution
provides `gawk' with an interface to the most widely accepted standard
for GUIs: the web browser. Now, `gawk' can rival even Tcl/Tk.

Tcl and `gawk' have much in common. Both are simple scripting languages
that allow us to quickly solve problems with short programs. But Tcl
has Tk on top of it, and `gawk' had nothing comparable up to now. While
Tcl needs a large and ever-changing library (Tk, which was bound to the
X Window System until recently), `gawk' needs just the networking
interface and some kind of browser on the client's side. Besides better
portability, the most important advantage of this approach (embracing
well-established standards such as HTTP and HTML) is that _we do not
need to change the language_. We let others do the work of fighting
over protocols and standards. We can use HTML, JavaScript, VRML, or
whatever else comes along to do our work.

* Menu:

* PANIC::                       An Emergency Web Server.
* GETURL::                      Retrieving Web Pages.
* REMCONF::                     Remote Configuration Of Embedded Systems.
* URLCHK::                      Look For Changed Web Pages.
* WEBGRAB::                     Extract Links From A Page.
* STATIST::                     Graphing A Statistical Distribution.
* MAZE::                        Walking Through A Maze In Virtual Reality.
* MOBAGWHO::                    A Simple Mobile Agent.
* STOXPRED::                    Stock Market Prediction As A Service.
* PROTBASE::                    Searching Through A Protein Database.


File: gawkinet.info,  Node: PANIC,  Next: GETURL,  Prev: Some Applications and Techniques,  Up: Some Applications and Techniques

3.1 PANIC: An Emergency Web Server
==================================

At first glance, the `"Hello, world"' example in *Note A Primitive Web
Service: Primitive Service, seems useless. By adding just a few lines,
we can turn it into something useful.

The PANIC program tells everyone who connects that the local site is
not working. When a web server breaks down, it makes a difference if
customers get a strange "network unreachable" message, or a short
message telling them that the server has a problem. In such an
emergency, the hard disk and everything on it (including the regular
web service) may be unavailable. Rebooting the web server off a
diskette makes sense in this setting.

To use the PANIC program as an emergency web server, all you need are
the `gawk' executable and the program below on a diskette. By default,
it listens on port 8080. A different value may be supplied on the
command line:

     BEGIN {
       RS = ORS = "\r\n"
       if (MyPort == 0) MyPort = 8080
       HttpService = "/inet/tcp/" MyPort "/0/0"
       Hello = "<HTML><HEAD><TITLE>Out Of Service</TITLE>" \
          "</HEAD><BODY><H1>" \
          "This site is temporarily out of service." \
          "</H1></BODY></HTML>"
       Len = length(Hello) + length(ORS)
       while ("awk" != "complex") {
         print "HTTP/1.0 200 OK"          |& HttpService
         print "Content-Length: " Len ORS |& HttpService
         print Hello                      |& HttpService
         while ((HttpService |& getline) > 0)
            continue;
         close(HttpService)
       }
     }
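
The length bookkeeping in PANIC is easy to get wrong when the page text
changes. One way to keep it in one place is a small helper that
assembles the complete response as a string. The name `HttpResponse' is
hypothetical, and the sketch assumes that one trailing line break is
counted as part of the body, as in the program above:

```awk
# Build a complete HTTP/1.0 response for the given body text.
# The body sent is the text plus one CRLF; Content-Length counts both.
function HttpResponse(body,    crlf) {
    crlf = "\r\n"
    return "HTTP/1.0 200 OK" crlf \
           "Content-Length: " (length(body) + length(crlf)) crlf \
           crlf \
           body crlf
}

BEGIN {
    ORS = ""    # the response already carries its own line breaks
    print HttpResponse("<HTML><BODY><H1>Out of service</H1></BODY></HTML>")
}
```

In a multibyte locale `length' counts characters rather than bytes, so
the computed value is only reliable for plain ASCII text.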


File: gawkinet.info,  Node: GETURL,  Next: REMCONF,  Prev: PANIC,  Up: Some Applications and Techniques

3.2 GETURL: Retrieving Web Pages
================================

GETURL is a versatile building block for shell scripts that need to
retrieve files from the Internet. It takes a web address as a
command-line parameter and tries to retrieve the contents of this
address. The contents are printed to standard output, while the header
is printed to `/dev/stderr'. A surrounding shell script could analyze
the contents and extract the text or the links. An ASCII browser could
be written around GETURL. But more interestingly, web robots are
straightforward to write on top of GETURL. On the Internet, you can find
several programs of the same name that do the same job. They are usually
much more complex internally and at least 10 times longer.

At first, GETURL checks if it was called with exactly one web address.
Then, it checks if the user chose to use a special proxy server whose
name is handed over in a variable. By default, it is assumed that the
local machine serves as proxy. GETURL uses the `GET' method by default
to access the web page. By handing over the name of a different method
(such as `HEAD'), it is possible to choose a different behavior. With
the `HEAD' method, the user does not receive the body of the page
content, but does receive the header:

     BEGIN {
       if (ARGC != 2) {
         print "GETURL - retrieve Web page via HTTP 1.0"
         print "IN:\n    the URL as a command-line parameter"
         print "PARAM(S):\n    -v Proxy=MyProxy"
         print "OUT:\n    the page content on stdout"
         print "    the page header on stderr"
         print "JK 16.05.1997"
         print "ADR 13.08.2000"
         exit
       }
       URL = ARGV[1]; ARGV[1] = ""
       if (Proxy     == "")  Proxy     = "127.0.0.1"
       if (ProxyPort ==  0)  ProxyPort = 80
       if (Method    == "")  Method    = "GET"
       HttpService = "/inet/tcp/0/" Proxy "/" ProxyPort
       ORS = RS = "\r\n\r\n"
       print Method " " URL " HTTP/1.0" |& HttpService
       HttpService                      |& getline Header
       print Header > "/dev/stderr"
       while ((HttpService |& getline) > 0)
         printf "%s", $0
       close(HttpService)
     }

This program can be changed as needed, but be careful with the last
lines. Make sure transmission of binary data is not corrupted by
additional line breaks. Even as it is now, the byte sequence
`"\r\n\r\n"' would disappear if it were contained in binary data. Don't
get caught in a trap when trying a quick fix on this one.
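
One way to keep such byte sequences, sketched here under the assumption
that the `gawk'-specific variable `RT' is available, is to print the
record terminator back out together with each record. The helper name
`Slurp' and the use of a local command in place of the network
connection are choices made for this illustration only:

```awk
# Read everything cmd writes, keeping the separators that RS would drop.
# RT is gawk-specific: it holds the text that ended the current record.
function Slurp(cmd,    data, saveRS) {
    saveRS = RS
    RS = "\r\n\r\n"
    while ((cmd | getline) > 0)
        data = data $0 RT        # put the separator back
    close(cmd)
    RS = saveRS
    return data
}

BEGIN {
    # A local command stands in for the network connection here.
    data = Slurp("printf 'one\\r\\n\\r\\ntwo'")
    print length(data)
}
```

Applied to GETURL, the same idea would mean printing `$0' and `RT'
together in the final loop. This is a sketch of one approach, not a
tested replacement for the program above.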
2100
2101
2102File: gawkinet.info, Node: REMCONF, Next: URLCHK, Prev: GETURL, Up: Some Applications and Techniques
2103
21043.3 REMCONF: Remote Configuration of Embedded Systems
2105=====================================================
2106
2107Today, you often find powerful processors in embedded systems.
2108Dedicated network routers and controllers for all kinds of machinery
2109are examples of embedded systems. Processors like the Intel 80x86 or
2110the AMD Elan are able to run multitasking operating systems, such as
2111XINU or GNU/Linux in embedded PCs. These systems are small and usually
2112do not have a keyboard or a display. Therefore it is difficult to set
2113up their configuration. There are several widespread ways to set them
2114up:
2115
2116 * DIP switches
2117
2118 * Read Only Memories such as EPROMs
2119
2120 * Serial lines or some kind of keyboard
2121
2122 * Network connections via `telnet' or SNMP
2123
2124 * HTTP connections with HTML GUIs
2125
2126In this node, we look at a solution that uses HTTP connections to
2127control variables of an embedded system that are stored in a file.
2128Since embedded systems have tight limits on resources like memory, it
2129is difficult to employ advanced techniques such as SNMP and HTTP
2130servers. `gawk' fits in quite nicely with its single executable which
2131needs just a short script to start working. The following program
2132stores the variables in a file, and a concurrent process in the
2133embedded system may read the file. The program uses the
2134site-independent part of the simple web server that we developed in
2135*Note A Web Service with Interaction: Interacting Service. As
2136mentioned there, all we have to do is to write two new procedures
2137`SetUpServer' and `HandleGET':
2138
     function SetUpServer() {
       TopHeader = "<HTML><title>Remote Configuration</title>"
       TopDoc = "<BODY>\
       <h2>Please choose one of the following actions:</h2>\
       <UL>\
         <LI><A HREF=" MyPrefix "/AboutServer>About this server</A></LI>\
         <LI><A HREF=" MyPrefix "/ReadConfig>Read Configuration</A></LI>\
         <LI><A HREF=" MyPrefix "/CheckConfig>Check Configuration</A></LI>\
         <LI><A HREF=" MyPrefix "/ChangeConfig>Change Configuration</A></LI>\
         <LI><A HREF=" MyPrefix "/SaveConfig>Save Configuration</A></LI>\
       </UL>"
       TopFooter = "</BODY></HTML>"
       # a name given with -v ConfigFile=... on the command line wins
       if (ConfigFile == "") ConfigFile = "config.asc"
     }
2153
2154The function `SetUpServer' initializes the top level HTML texts as
2155usual. It also initializes the name of the file that contains the
2156configuration parameters and their values. In case the user supplies a
2157name from the command line, that name is used. The file is expected to
2158contain one parameter per line, with the name of the parameter in
2159column one and the value in column two.
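
To make the expected file format concrete, here is a minimal sketch of
a parser for one configuration line. The function name
`ParseConfigLine' and the sample parameter names are assumptions made
for illustration; the server itself reads the file with `getline' as
shown below.

```awk
# Sketch only: ParseConfigLine and the parameter names below are
# hypothetical; the REMCONF server does not define them.
# Split one configuration line into name (column one) and value
# (column two) and store the pair in the array cfg.
# Returns 1 on success, 0 if the line has fewer than two columns.
function ParseConfigLine(line, cfg,    fields, n) {
    n = split(line, fields)      # default splitting on runs of blanks
    if (n < 2)
        return 0
    cfg[fields[1]] = fields[2]
    return 1
}

BEGIN {
    ParseConfigLine("baudrate 9600", demo)
    ParseConfigLine("hostname router1", demo)
    print demo["baudrate"], demo["hostname"]    # prints "9600 router1"
}
```

As with `$2' in the server, a value that contains blanks is cut off
after its first word.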
2160
2161The function `HandleGET' reflects the structure of the menu tree as
2162usual. The first menu choice tells the user what this is all about. The
2163second choice reads the configuration file line by line and stores the
2164parameters and their values. Notice that the record separator for this
2165file is `"\n"', in contrast to the record separator for HTTP. The third
2166menu choice builds an HTML table to show the contents of the
2167configuration file just read. The fourth choice does the real work of
2168changing parameters, and the last one just saves the configuration into
2169a file:
2170
     function HandleGET() {
       if (MENU[2] == "AboutServer") {
         Document = "This is a GUI for remote configuration of an\
           embedded system. It is implemented as one GAWK script."
       } else if (MENU[2] == "ReadConfig") {
         RS = "\n"                  # the file uses plain newlines
         while ((getline < ConfigFile) > 0)
           config[$1] = $2
         close(ConfigFile)
         RS = "\r\n"                # back to the HTTP record separator
         Document = "Configuration has been read."
       } else if (MENU[2] == "CheckConfig") {
         Document = "<TABLE BORDER=1 CELLPADDING=5>"
         for (i in config)
           Document = Document "<TR><TD>" i "</TD>" \
             "<TD>" config[i] "</TD></TR>"
         Document = Document "</TABLE>"
       } else if (MENU[2] == "ChangeConfig") {
         if ("Param" in GETARG) {             # any parameter to set?
           if (GETARG["Param"] in config) {   # is parameter valid?
             config[GETARG["Param"]] = GETARG["Value"]
             Document = (GETARG["Param"] " = " GETARG["Value"] ".")
           } else {
             Document = "Parameter <b>" GETARG["Param"] "</b> is invalid."
           }
         } else {
           Document = "<FORM method=GET><h4>Change one parameter</h4>\
             <TABLE BORDER CELLPADDING=5>\
             <TR><TD>Parameter</TD><TD>Value</TD></TR>\
             <TR><TD><input type=text name=Param value=\"\" size=20></TD>\
                 <TD><input type=text name=Value value=\"\" size=40></TD>\
             </TR></TABLE><input type=submit value=\"Set\"></FORM>"
         }
       } else if (MENU[2] == "SaveConfig") {
         for (i in config)
           printf("%s %s\n", i, config[i]) > ConfigFile
         close(ConfigFile)
         Document = "Configuration has been saved."
       }
     }
2211
2212We could also view the configuration file as a database. From this
2213point of view, the previous program acts like a primitive database
2214server. Real SQL database systems also make a service available by
2215providing a TCP port that clients can connect to. But the application
2216level protocols they use are usually proprietary and also change from
2217time to time. This is also true for the protocol that MiniSQL uses.
2218
2219
2220File: gawkinet.info, Node: URLCHK, Next: WEBGRAB, Prev: REMCONF, Up: Some Applications and Techniques
2221
22223.4 URLCHK: Look for Changed Web Pages
2223======================================
2224
2225Most people who make heavy use of Internet resources have a large
2226bookmark file with pointers to interesting web sites. It is impossible
2227to regularly check by hand if any of these sites have changed. A program
2228is needed to automatically look at the headers of web pages and tell
2229which ones have changed. URLCHK does the comparison after using GETURL
2230with the `HEAD' method to retrieve the header.
2231
2232Like GETURL, this program first checks that it is called with exactly
2233one command-line parameter. URLCHK also takes the same command-line
2234variables `Proxy' and `ProxyPort' as GETURL, because these variables
2235are handed over to GETURL for each URL that gets checked. The one and
2236only parameter is the name of a file that contains one line for each
URL. In the first column, we find the URL, and the second and third
columns hold the length of the URL's body at the last two checks.
Now, we follow this plan:
2240
2241 1. Read the URLs from the file and remember their most recent lengths
2242
2243 2. Delete the contents of the file
2244
2245 3. For each URL, check its new length and write it into the file
2246
2247 4. If the most recent and the new length differ, tell the user
2248
2249It may seem a bit peculiar to read the URLs from a file together with
2250their two most recent lengths, but this approach has several
2251advantages. You can call the program again and again with the same
2252file. After running the program, you can regenerate the changed URLs by
2253extracting those lines that differ in their second and third columns:
2254
     BEGIN {
       if (ARGC != 2) {
         print "URLCHK - check if URLs have changed"
         print "IN:\n the file with URLs as a command-line parameter"
         print " file contains URL, old length, new length"
         print "PARAMS:\n -v Proxy=MyProxy -v ProxyPort=8080"
         print "OUT:\n same as file with URLs"
         print "JK 02.03.1998"
         exit
       }
       URLfile = ARGV[1]; ARGV[1] = ""
       if (Proxy     != "") Proxy     = " -v Proxy="     Proxy
       if (ProxyPort != "") ProxyPort = " -v ProxyPort=" ProxyPort
       while ((getline < URLfile) > 0)
         Length[$1] = $3 + 0    # remember the most recent length
       close(URLfile)      # now, URLfile is read in and can be updated
       GetHeader = "gawk " Proxy ProxyPort " -v Method=\"HEAD\" -f geturl.awk "
       for (i in Length) {
         GetThisHeader = GetHeader i " 2>&1"
         NewLength = 0     # do not reuse the length of the previous URL
         while ((GetThisHeader | getline) > 0)
           if (toupper($0) ~ /CONTENT-LENGTH/) NewLength = $2 + 0
         close(GetThisHeader)
         print i, Length[i], NewLength > URLfile
         if (Length[i] != NewLength)  # report only changed URLs
           print i, Length[i], NewLength
       }
       close(URLfile)
     }
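
The changed URLs can be extracted from the file afterwards with a
very short script. The following is a minimal sketch; the helper
`HasChanged' and the file names in the usage note are assumptions
made for illustration, not part of URLCHK.

```awk
# Sketch only: HasChanged is a hypothetical helper, not part of URLCHK.
# Returns 1 if the two recorded lengths differ numerically.
function HasChanged(oldLength, newLength) {
    return (oldLength + 0) != (newLength + 0)
}

# Each line of the URL file holds: URL, old length, new length.
HasChanged($2, $3) { print $1 }
```

Saved as `changed.awk', it could be run as `gawk -f changed.awk
urls.txt', where `urls.txt' is the file that was handed to URLCHK.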
2283
2284Another thing that may look strange is the way GETURL is called.
2285Before calling GETURL, we have to check if the proxy variables need to
2286be passed on. If so, we prepare strings that will become part of the
2287command line later. In `GetHeader', we store these strings together
with the longest part of the command line. Later, in the loop over the
URLs, the URL and a redirection operator are appended to `GetHeader'
to form the command that reads the URL's header over the Internet.
GETURL always writes the headers to `/dev/stderr'. That is the
reason why we need the redirection operator to have the header piped in.
2293
This program is not perfect because it assumes that a changed page
always has a changed length, which is not necessarily true. A more
2296advanced approach is to look at some other header line that holds time
2297information. But, as always when things get a bit more complicated,
2298this is left as an exercise to the reader.
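
As a starting point for that exercise, the following sketch picks the
value of one named header out of a header line, so that a header such
as `Last-Modified' can be compared instead of the length. The function
name `HeaderValue' is an assumption made for illustration, and a
server is free to omit this header, so a real solution needs a
fallback.

```awk
# Sketch only: HeaderValue is a hypothetical helper, not part of URLCHK.
# If line is a header whose name equals name (ignoring case), return
# its value without surrounding blanks or a trailing carriage return;
# otherwise return the empty string.
function HeaderValue(line, name,    pos) {
    pos = index(line, ":")
    if (pos == 0 || tolower(substr(line, 1, pos - 1)) != tolower(name))
        return ""
    line = substr(line, pos + 1)
    gsub(/^[ \t]+|[ \t\r]+$/, "", line)
    return line
}

BEGIN {
    line = "Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT\r"
    print HeaderValue(line, "last-modified")
}
```

Unlike the match against `/CONTENT-LENGTH/' above, this comparison
looks only at the header name, not at the whole line.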
2299
2300
2301File: gawkinet.info, Node: WEBGRAB, Next: STATIST, Prev: URLCHK, Up: Some Applications and Techniques
2302
23033.5 WEBGRAB: Extract Links from a Page
2304======================================
2305
2306Sometimes it is necessary to extract links from web pages. Browsers do
2307it, web robots do it, and sometimes even humans do it. Since we have a
2308tool like GETURL at hand, we can solve this problem with some help from
2309the Bourne shell:
2310
     BEGIN { RS = "http://[#%&\\+\\-\\./0-9\\:;\\?A-Z_a-z\\~]*" }
     RT != "" {
        command = ("gawk -v Proxy=MyProxy -f geturl.awk " RT \
                    " > doc" NR ".html")
        print command
     }
2317
2318Notice that the regular expression for URLs is rather crude. A precise
2319regular expression is much more complex. But this one works rather
2320well. One problem is that it is unable to find internal links of an
2321HTML document. Another problem is that `ftp', `telnet', `news',
2322`mailto', and other kinds of links are missing in the regular
2323expression. However, it is straightforward to add them, if doing so is
2324necessary for other tasks.
2325
2326This program reads an HTML file and prints all the HTTP links that it
2327finds. It relies on `gawk''s ability to use regular expressions as
2328record separators. With `RS' set to a regular expression that matches
2329links, the second action is executed each time a non-empty link is
2330found. We can find the matching link itself in `RT'.
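
When the page text is already held in a string instead of being read
record by record, the same crude pattern can be applied with `match'.
This is a minimal sketch of that alternative, not the `RS'/`RT'
technique itself; the function name `ExtractLinks' and the sample URLs
are assumptions made for illustration.

```awk
# Sketch only: ExtractLinks is a hypothetical helper, not part of
# WEBGRAB. It collects every substring of text that matches the crude
# URL pattern into links[1..n] and returns n.
function ExtractLinks(text, links,    n, pat) {
    pat = "http://[#%&+./0-9:;?A-Z_a-z~-]*"
    split("", links)                 # start with an empty array
    n = 0
    while (match(text, pat)) {
        links[++n] = substr(text, RSTART, RLENGTH)
        text = substr(text, RSTART + RLENGTH)    # continue after match
    }
    return n
}

BEGIN {
    page = "<A HREF=\"http://www.gnu.org/\">GNU</A> " \
           "and http://a.example/x.html"
    n = ExtractLinks(page, found)
    for (i = 1; i <= n; i++)
        print found[i]
}
```

Each match is at least as long as `http://', so the loop always makes
progress and terminates.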
2331
2332The action could use the `system' function to let another GETURL
2333retrieve the page, but here we use a different approach. This simple
2334program prints shell commands that can be piped into `sh' for
2335execution. This way it is possible to first extract the links, wrap
2336shell commands around them, and pipe all the shell commands into a
2337file. After editing the file, execution of the file retrieves exactly
2338those files that we really need. In case we do not want to edit, we can
2339retrieve all the pages like this:
2340
     gawk -f geturl.awk http://www.suse.de | gawk -f webgrab.awk | sh
2342
2343After this, you will find the contents of all referenced documents in
2344files named `doc*.html' even if they do not contain HTML code. The
2345most annoying thing is that we always have to pass the proxy to GETURL.
2346If you do not like to see the headers of the web pages appear on the
2347screen, you can redirect them to `/dev/null'. Watching the headers
2348appear can be quite interesting, because it reveals interesting details
2349such as which web server the companies use. Now, it is clear how the
2350clever marketing people use web robots to determine the market shares
2351of Microsoft and Netscape in the web server market.
2352
2353Port 80 of any web server is like a small hole in a repellent firewall.
2354After attaching a browser to port 80, we usually catch a glimpse of the
2355bright side of the server (its home page). With a tool like GETURL at
2356hand, we are able to discover some of the more concealed or even
2357"indecent" services (i.e., lacking conformity to standards of quality).
2358It can be exciting to see the fancy CGI scripts that lie there,
2359revealing the inner workings of the server, ready to be called:
2360
2361 * With a command such as:
2362
          gawk -f geturl.awk http://any.host.on.the.net/cgi-bin/
2364
2365 some servers give you a directory listing of the CGI files.
2366 Knowing the names, you can try to call some of them and watch for
2367 useful results. Sometimes there are executables in such directories
2368 (such as Perl interpreters) that you may call remotely. If there
2369 are subdirectories with configuration data of the web server, this
2370 can also be quite interesting to read.
2371
2372 * The well-known Apache web server usually has its CGI files in the
2373 directory `/cgi-bin'. There you can often find the scripts
2374 `test-cgi' and `printenv'. Both tell you some things about the
2375 current connection and the installation of the web server. Just
2376 call:
2377
          gawk -f geturl.awk http://any.host.on.the.net/cgi-bin/test-cgi
          gawk -f geturl.awk http://any.host.on.the.net/cgi-bin/printenv
2380
2381 * Sometimes it is even possible to retrieve system files like the web
2382 server's log file--possibly containing customer data--or even the
2383 file `/etc/passwd'. (We don't recommend this!)
2384
2385*Caution:* Although this may sound funny or simply irrelevant, we are
2386talking about severe security holes. Try to explore your own system
2387this way and make sure that none of the above reveals too much
2388information about your system.
2389
2390
2391File: gawkinet.info, Node: STATIST, Next: MAZE, Prev: WEBGRAB, Up: Some Applications and Techniques
2392
23933.6 STATIST: Graphing a Statistical Distribution
2394================================================
2395
2396In the HTTP server examples we've shown thus far, we never present an
2397image to the browser and its user. Presenting images is one task.
2398Generating images that reflect some user input and presenting these
2399dynamically generated images is another. In this node, we use GNUPlot
2400for generating `.png', `.ps', or `.gif' files.(1)
2401
2402The program we develop takes the statistical parameters of two samples
2403and computes the t-test statistics. As a result, we get the
2404probabilities that the means and the variances of both samples are the
2405same. In order to let the user check plausibility, the program presents
2406an image of the distributions. The statistical computation follows
2407`Numerical Recipes in C: The Art of Scientific Computing' by William H.
2408Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery.
2409Since `gawk' does not have a built-in function for the computation of
2410the beta function, we use the `ibeta' function of GNUPlot. As a side
2411effect, we learn how to use GNUPlot as a sophisticated calculator. The
2412comparison of means is done as in `tutest', paragraph 14.2, page 613,
2413and the comparison of variances is done as in `ftest', page 611 in
2414`Numerical Recipes'.
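
The two quantities that go into the comparison of means can be
computed in `gawk' alone. The following sketch uses the same formulas
that the `HandleGET' function of this node evaluates inline; the
function names `WelchT' and `WelchDF' are assumptions made for
illustration.

```awk
# Sketch only: WelchT and WelchDF are hypothetical names; the formulas
# are the ones that HandleGET in this node evaluates inline.
# m = mean, v = variance, n = number of values of each sample.
function WelchT(m1, v1, n1, m2, v2, n2) {
    return (m1 - m2) / sqrt(v1/n1 + v2/n2)
}

# Degrees of freedom for the t-test with unequal variances.
function WelchDF(v1, n1, v2, n2,    a, b) {
    a = v1 / n1
    b = v2 / n2
    return (a + b) * (a + b) / (a*a/(n1 - 1) + b*b/(n2 - 1))
}

BEGIN {
    # two samples of 10 values each, variance 1, means 0 and 1
    printf "t = %.4f, df = %.1f\n", \
           WelchT(0, 1, 10, 1, 1, 10), WelchDF(1, 10, 1, 10)
    # prints "t = -2.2361, df = 18.0"
}
```

Turning `t' and `df' into a probability still needs the incomplete
beta function, which is why the server hands that step to GNUPlot.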
2415
2416As usual, we take the site-independent code for servers and append our
2417own functions `SetUpServer' and `HandleGET':
2418
     function SetUpServer() {
       TopHeader = "<HTML><title>Statistics with GAWK</title>"
       TopDoc = "<BODY>\
       <h2>Please choose one of the following actions:</h2>\
       <UL>\
         <LI><A HREF=" MyPrefix "/AboutServer>About this server</A></LI>\
         <LI><A HREF=" MyPrefix "/EnterParameters>Enter Parameters</A></LI>\
       </UL>"
       TopFooter = "</BODY></HTML>"
       GnuPlot   = "gnuplot 2>&1"
       m1=m2=0;   v1=v2=1;   n1=n2=10
     }
2431
2432Here, you see the menu structure that the user sees. Later, we will see
2433how the program structure of the `HandleGET' function reflects the menu
2434structure. What is missing here is the link for the image we generate.
2435In an event-driven environment, request, generation, and delivery of
2436images are separated.
2437
2438Notice the way we initialize the `GnuPlot' command string for the pipe.
2439By default, GNUPlot outputs the generated image via standard output, as
2440well as the results of `print'(ed) calculations via standard error.
2441The redirection causes standard error to be mixed into standard output,
2442enabling us to read results of calculations with `getline'. By
2443initializing the statistical parameters with some meaningful defaults,
2444we make sure the user gets an image the first time he uses the program.
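
The effect of the redirection can be tried without GNUPlot. The
following sketch reads a line that a command writes to standard error.
For brevity it uses a one-way pipe, while the server uses the two-way
`|&' operator; the function name `FirstLineOf' is an assumption made
for illustration.

```awk
# Sketch only: FirstLineOf is a hypothetical helper. It runs a shell
# command with standard error merged into standard output and returns
# the first line the command writes to either stream.
function FirstLineOf(command,    cmd, line) {
    cmd = "{ " command "; } 2>&1"    # group first, then redirect
    if ((cmd | getline line) <= 0)
        line = ""
    close(cmd)
    return line
}

BEGIN {
    # echo writes to standard error here; we can still read it
    print FirstLineOf("echo 'pt pF' >&2")
}
```

The braces matter: the redirection `2>&1' has to apply to the whole
command, so that whatever the command sends to standard error ends up
in the pipe.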
2445
2446Following is the rather long function `HandleGET', which implements the
2447contents of this service by reacting to the different kinds of requests
2448from the browser. Before you start playing with this script, make sure
2449that your browser supports JavaScript and that it also has this option
2450switched on. The script uses a short snippet of JavaScript code for
2451delayed opening of a window with an image. A more detailed explanation
2452follows:
2453
     function HandleGET() {
       if (MENU[2] == "AboutServer") {
         Document = "This is a GUI for a statistical computation.\
           It compares means and variances of two distributions.\
           It is implemented as one GAWK script and uses GNUPLOT."
       } else if (MENU[2] == "EnterParameters") {
         Document = ""
         if ("m1" in GETARG) {     # are there parameters to compare?
           Document = Document "<SCRIPT LANGUAGE=\"JavaScript\">\
             setTimeout(\"window.open(\\\"" MyPrefix "/Image" systime()\
              "\\\",\\\"dist\\\", \\\"status=no\\\");\", 1000); </SCRIPT>"
           m1 = GETARG["m1"]; v1 = GETARG["v1"]; n1 = GETARG["n1"]
           m2 = GETARG["m2"]; v2 = GETARG["v2"]; n2 = GETARG["n2"]
           # t statistic and degrees of freedom for unequal variances
           t = (m1-m2)/sqrt(v1/n1+v2/n2)
           df = (v1/n1+v2/n2)*(v1/n1+v2/n2)/((v1/n1)*(v1/n1)/(n1-1) \
                + (v2/n2)*(v2/n2) /(n2-1))
           if (v1>v2) {            # F statistic: larger variance on top
               f = v1/v2
               df1 = n1 - 1
               df2 = n2 - 1
           } else {
               f = v2/v1
               df1 = n2 - 1
               df2 = n1 - 1
           }
           print "pt=ibeta(" df/2 ",0.5," df/(df+t*t) ")"    |& GnuPlot
           print "pF=2.0*ibeta(" df2/2 "," df1/2 "," \
                 df2/(df2+df1*f) ")"                    |& GnuPlot
           print "print pt, pF"                         |& GnuPlot
           RS="\n"; GnuPlot |& getline; RS="\r\n"    # $1 is pt, $2 is pF
           print "invsqrt2pi=1.0/sqrt(2.0*pi)"          |& GnuPlot
           print "nd(x)=invsqrt2pi/sd*exp(-0.5*((x-mu)/sd)**2)" |& GnuPlot
           print "set term png small color"             |& GnuPlot
           #print "set term postscript color"           |& GnuPlot
           #print "set term gif medium size 320,240"    |& GnuPlot
           print "set yrange[-0.3:]"                    |& GnuPlot
           print "set label 'p(m1=m2) =" $1 "' at 0,-0.1 left" |& GnuPlot
           print "set label 'p(v1=v2) =" $2 "' at 0,-0.2 left" |& GnuPlot
           print "plot mu=" m1 ",sd=" sqrt(v1) ", nd(x) title 'sample 1',\
             mu=" m2 ",sd=" sqrt(v2) ", nd(x) title 'sample 2'" |& GnuPlot
           print "quit"                                 |& GnuPlot
           GnuPlot |& getline Image     # first record starts the image
           while ((GnuPlot |& getline) > 0)
               Image = Image RS $0      # put back each separator read
           close(GnuPlot)
         }
         # every line inside the string must end with a backslash
         Document = Document "\
         <h3>Do these samples have the same Gaussian distribution?</h3>\
         <FORM METHOD=GET> <TABLE BORDER CELLPADDING=5>\
         <TR>\
         <TD>1. Mean    </TD>\
         <TD><input type=text name=m1 value=" m1 " size=8></TD>\
         <TD>1. Variance</TD>\
         <TD><input type=text name=v1 value=" v1 " size=8></TD>\
         <TD>1. Count   </TD>\
         <TD><input type=text name=n1 value=" n1 " size=8></TD>\
         </TR><TR>\
         <TD>2. Mean    </TD>\
         <TD><input type=text name=m2 value=" m2 " size=8></TD>\
         <TD>2. Variance</TD>\
         <TD><input type=text name=v2 value=" v2 " size=8></TD>\
         <TD>2. Count   </TD>\
         <TD><input type=text name=n2 value=" n2 " size=8></TD>\
         </TR>                   <input type=submit value=\"Compute\">\
         </TABLE></FORM><BR>"
       } else if (MENU[2] ~ "Image") {
         Reason = "OK" ORS "Content-type: image/png"
         #Reason = "OK" ORS "Content-type: application/x-postscript"
         #Reason = "OK" ORS "Content-type: image/gif"
         Header = Footer = ""
         Document = Image
       }
     }
2527
2528As usual, we give a short description of the service in the first menu
2529choice. The third menu choice shows us that generation and presentation
2530of an image are two separate actions. While the latter takes place
2531quite instantly in the third menu choice, the former takes place in the
2532much longer second choice. Image data passes from the generating action
2533to the presenting action via the variable `Image' that contains a
2534complete `.png' image, which is otherwise stored in a file. If you
2535prefer `.ps' or `.gif' images over the default `.png' images, you may
2536select these options by uncommenting the appropriate lines. But
2537remember to do so in two places: when telling GNUPlot which kind of
2538images to generate, and when transmitting the image at the end of the
2539program.
2540
2541Looking at the end of the program, the way we pass the `Content-type'
2542to the browser is a bit unusual. It is appended to the `OK' of the
2543first header line to make sure the type information becomes part of the
2544header. The other variables that get transmitted across the network are
2545made empty, because in this case we do not have an HTML document to
transmit, but rather raw image data to send in the body.
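
The following sketch shows why appending to `Reason' works. It assumes
that the server core sends `HTTP/1.0', the status code, and `Reason'
as its first header line; this assumption and the function name
`StatusLine' are made for illustration and should be checked against
the core.

```awk
# Sketch only: StatusLine is a hypothetical helper, and the layout of
# the first header line is an assumption about the server core.
function StatusLine(status, reason) {
    return "HTTP/1.0 " status " " reason
}

BEGIN {
    ORS = "\r\n"                              # HTTP line terminator
    Reason = "OK" ORS "Content-type: image/png"
    # the embedded ORS turns one printed record into two header lines
    print StatusLine(200, Reason)
}
```

Because `Reason' carries its own line terminator, the `Content-type'
line follows the status line directly, ahead of the remaining header
lines.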
2547
2548Most of the work is done in the second menu choice. It starts with a
2549strange JavaScript code snippet. When first implementing this server,
2550we used a short `"<IMG SRC=" MyPrefix "/Image>"' here. But then
2551browsers got smarter and tried to improve on speed by requesting the
2552image and the HTML code at the same time. When doing this, the browser
2553tries to build up a connection for the image request while the request
2554for the HTML text is not yet completed. The browser tries to connect to
2555the `gawk' server on port 8080 while port 8080 is still in use for
2556transmission of the HTML text. The connection for the image cannot be
2557built up, so the image appears as "broken" in the browser window. We
2558solved this problem by telling the browser to open a separate window
2559for the image, but only after a delay of 1000 milliseconds. By this
2560time, the server should be ready for serving the next request.
2561
2562But there is one more subtlety in the JavaScript code. Each time the
JavaScript code opens a window for the image, a timestamp (`systime')
is appended to the name of the image. Why this constant change of
2565name for the image? Initially, we always named the image `Image', but
2566then the Netscape browser noticed the name had _not_ changed since the
2567previous request and displayed the previous image (caching behavior).
2568The server core is implemented so that browsers are told _not_ to cache
2569anything. Obviously HTTP requests do not always work as expected. One
2570way to circumvent the cache of such overly smart browsers is to change
2571the name of the image with each request. These three lines of JavaScript
2572caused us a lot of trouble.
2573
2574The rest can be broken down into two phases. At first, we check if
2575there are statistical parameters. When the program is first started,
2576there usually are no parameters because it enters the page coming from
2577the top menu. Then, we only have to present the user a form that he
2578can use to change statistical parameters and submit them. Subsequently,
2579the submission of the form causes the execution of the first phase
2580because _now_ there _are_ parameters to handle.
2581
2582Now that we have parameters, we know there will be an image available.
2583Therefore we insert the JavaScript code here to initiate the opening of
2584the image in a separate window. Then, we prepare some variables that
2585will be passed to GNUPlot for calculation of the probabilities. Prior
2586to reading the results, we must temporarily change `RS' because GNUPlot
2587separates lines with newlines. After instructing GNUPlot to generate a
2588`.png' (or `.ps' or `.gif') image, we initiate the insertion of some
2589text, explaining the resulting probabilities. The final `plot' command
2590actually generates the image data. This raw binary has to be read in
2591carefully without adding, changing, or deleting a single byte. Hence
2592the unusual initialization of `Image' and completion with a `while'
2593loop.
2594
2595When using this server, it soon becomes clear that it is far from being
2596perfect. It mixes source code of six scripting languages or protocols:
2597
2598 * GNU `awk' implements a server for the protocol:
2599
2600 * HTTP which transmits:
2601
2602 * HTML text which contains a short piece of:
2603
2604 * JavaScript code opening a separate window.
2605
2606 * A Bourne shell script is used for piping commands into:
2607
2608 * GNUPlot to generate the image to be opened.
2609
2610After all this work, the GNUPlot image opens in the JavaScript window
2611where it can be viewed by the user.
2612
2613It is probably better not to mix up so many different languages. The
2614result is not very readable. Furthermore, the statistical part of the
server does not take care of invalid input. Among other things,
negative variances cause invalid results.
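
A check like the following could be applied to each sample before any
computation starts. It is a minimal sketch; the function name
`ValidSample' and the exact conditions are assumptions made for
illustration.

```awk
# Sketch only: ValidSample is a hypothetical check, not part of STATIST.
# A sample is usable if its variance is positive and its count is a
# whole number of at least two, so that v/n and n-1 are nonzero.
function ValidSample(v, n) {
    return (v + 0 > 0) && (n + 0 >= 2) && (n + 0 == int(n + 0))
}

BEGIN {
    print ValidSample(1, 10), ValidSample(-1, 10), ValidSample(1, 1)
    # prints "1 0 0"
}
```

In `HandleGET', the computation could be skipped and an error text
stored in `Document' whenever `ValidSample(v1, n1)' or
`ValidSample(v2, n2)' returns 0.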
2617
2618---------- Footnotes ----------
2619
2620(1) Due to licensing problems, the default installation of GNUPlot
2621disables the generation of `.gif' files. If your installed version
2622does not accept `set term gif', just download and install the most
2623recent version of GNUPlot and the GD library
2624(http://www.boutell.com/gd/) by Thomas Boutell. Otherwise you still
2625have the chance to generate some ASCII-art style images with GNUPlot by
2626using `set term dumb'. (We tried it and it worked.)
2627
2628
2629File: gawkinet.info, Node: MAZE, Next: MOBAGWHO, Prev: STATIST, Up: Some Applications and Techniques
2630
26313.7 MAZE: Walking Through a Maze In Virtual Reality
2632===================================================
2633
2634 In the long run, every program becomes rococo, and then rubble.
2635 Alan Perlis
2636
2637By now, we know how to present arbitrary `Content-type's to a browser.
2638In this node, our server will present a 3D world to our browser. The
26393D world is described in a scene description language (VRML, Virtual
2640Reality Modeling Language) that allows us to travel through a
2641perspective view of a 2D maze with our browser. Browsers with a VRML
2642plugin enable exploration of this technology. We could do one of those
2643boring `Hello world' examples here, that are usually presented when
2644introducing novices to VRML. If you have never written any VRML code,
2645have a look at the VRML FAQ. Presenting a static VRML scene is a bit
2646trivial; in order to expose `gawk''s new capabilities, we will present
2647a dynamically generated VRML scene. The function `SetUpServer' is very
2648simple because it only sets the default HTML page and initializes the
2649random number generator. As usual, the surrounding server lets you
2650browse the maze.
2651
     function SetUpServer() {
       TopHeader = "<HTML><title>Walk through a maze</title>"
       TopDoc = "\
         <h2>Please choose one of the following actions:</h2>\
         <UL>\
           <LI><A HREF=" MyPrefix "/AboutServer>About this server</A>\
           <LI><A HREF=" MyPrefix "/VRMLtest>Watch a simple VRML scene</A>\
         </UL>"
       TopFooter  = "</HTML>"
       srand()      # seed the random number generator for MakeMaze
     }
2663
2664The function `HandleGET' is a bit longer because it first computes the
2665maze and afterwards generates the VRML code that is sent across the
2666network. As shown in the STATIST example (*note STATIST::), we set the
2667type of the content to VRML and then store the VRML representation of
2668the maze as the page content. We assume that the maze is stored in a 2D
2669array. Initially, the maze consists of walls only. Then, we add an
2670entry and an exit to the maze and let the rest of the work be done by
2671the function `MakeMaze'. Now, only the wall fields are left in the
maze. By iterating over these fields, we generate one line of VRML
2673code for each wall field.
2674
     function HandleGET() {
       if (MENU[2] == "AboutServer") {
         Document  = "If your browser has a VRML 2 plugin,\
           this server shows you a simple VRML scene."
       } else if (MENU[2] == "VRMLtest") {
         XSIZE = YSIZE = 11              # initially, everything is wall
         for (y = 0; y < YSIZE; y++)
            for (x = 0; x < XSIZE; x++)
               Maze[x, y] = "#"
         delete Maze[0, 1]              # entry is not wall
         delete Maze[XSIZE-1, YSIZE-2]  # exit  is not wall
         MakeMaze(1, 1)
         # the "#VRML" header line has to be the very first text sent
         Document = "#VRML V2.0 utf8\n\
           Group {\n\
             children [\n\
               PointLight {\n\
                 ambientIntensity 0.2\n\
                 color 0.7 0.7 0.7\n\
                 location 0.0 8.0 10.0\n\
               }\n\
               DEF B1 Background {\n\
                 skyColor [0 0 0, 1.0 1.0 1.0 ]\n\
                 skyAngle 1.6\n\
                 groundColor [1 1 1, 0.8 0.8 0.8, 0.2 0.2 0.2 ]\n\
                 groundAngle [ 1.2 1.57 ]\n\
               }\n\
               DEF Wall Shape {\n\
                 geometry Box {size 1 1 1}\n\
                 appearance Appearance { material Material { diffuseColor 0 0 1 } }\n\
               }\n\
               DEF Entry Viewpoint {\n\
                 position 0.5 1.0 5.0\n\
                 orientation 0.0 0.0 -1.0 0.52\n\
               }\n"
         for (i in Maze) {              # one Transform per wall field
           split(i, t, SUBSEP)
           Document = Document "    Transform { translation "
           Document = Document t[1] " 0 -" t[2] " children USE Wall }\n"
         }
         Document = Document "  ] # end of group for world\n}"
         Reason = "OK" ORS "Content-type: model/vrml"
         Header = Footer = ""
       }
     }
2720
2721Finally, we have a look at `MakeMaze', the function that generates the
2722`Maze' array. When entered, this function assumes that the array has
2723been initialized so that each element represents a wall element and the
2724maze is initially full of wall elements. Only the entrance and the exit
2725of the maze should have been left free. The parameters of the function
2726tell us which element must be marked as not being a wall. After this,
2727we take a look at the four neighbouring elements and remember which we
2728have already treated. Of all the neighbouring elements, we take one at
2729random and walk in that direction. Therefore, the wall element in that
2730direction has to be removed and then, we call the function recursively
2731for that element. The maze is only completed if we iterate the above
2732procedure for _all_ neighbouring elements (in random order) and for our
2733present element by recursively calling the function for the present
element. This last iteration could have been done in a loop, but it is
much simpler to do recursively.
2736
2737Notice that elements with coordinates that are both odd are assumed to
2738be on our way through the maze and the generating process cannot
2739terminate as long as there is such an element not being `delete'd. All
2740other elements are potentially part of the wall.
2741
     function MakeMaze(x, y) {
       delete Maze[x, y]     # here we are, we have no wall here
       p = 0                 # count unvisited fields in all directions
       if (x-2 SUBSEP y   in Maze) d[p++] = "-x"
       if (x   SUBSEP y-2 in Maze) d[p++] = "-y"
       if (x+2 SUBSEP y   in Maze) d[p++] = "+x"
       if (x   SUBSEP y+2 in Maze) d[p++] = "+y"
       if (p>0) {            # if there are unvisited fields, go there
         p = int(p*rand())   # choose one unvisited field at random
         if        (d[p] == "-x") { delete Maze[x - 1, y]; MakeMaze(x - 2, y)
         } else if (d[p] == "-y") { delete Maze[x, y - 1]; MakeMaze(x, y - 2)
         } else if (d[p] == "+x") { delete Maze[x + 1, y]; MakeMaze(x + 2, y)
         } else if (d[p] == "+y") { delete Maze[x, y + 1]; MakeMaze(x, y + 2)
         }                   # we are back from recursion
         MakeMaze(x, y);     # try again while there are unvisited fields
       }
     }
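
While experimenting with `MakeMaze', it helps to look at the `Maze'
array without a VRML browser. The following sketch renders one row of
the array as text; the function name `MazeRow' and the small hand-made
example are assumptions made for illustration.

```awk
# Sketch only: MazeRow is a hypothetical debugging aid, not part of
# the MAZE server. It renders row y of the global Maze array as text:
# "#" for a remaining wall element, " " for a deleted (free) one.
function MazeRow(y, xsize,    x, row) {
    row = ""
    for (x = 0; x < xsize; x++)
        row = row (((x, y) in Maze) ? "#" : " ")
    return row
}

BEGIN {
    # a tiny hand-made 3x3 example: walls everywhere except the center
    for (y = 0; y < 3; y++)
        for (x = 0; x < 3; x++)
            Maze[x, y] = "#"
    delete Maze[1, 1]
    for (y = 0; y < 3; y++)
        print MazeRow(y, 3)
}
```

In the server, a loop over `y' from 0 to `YSIZE-1' that prints
`MazeRow(y, XSIZE)' to `/dev/stderr' right after the call
`MakeMaze(1, 1)' would show the generated maze on the terminal.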
2759
2760
2761File: gawkinet.info, Node: MOBAGWHO, Next: STOXPRED, Prev: MAZE, Up: Some Applications and Techniques
2762
27633.8 MOBAGWHO: a Simple Mobile Agent
2764===================================
2765
2766 There are two ways of constructing a software design: One way is to
2767 make it so simple that there are obviously no deficiencies, and the
2768 other way is to make it so complicated that there are no obvious
2769 deficiencies.
2770 C. A. R. Hoare
2771
2772A "mobile agent" is a program that can be dispatched from a computer and
2773transported to a remote server for execution. This is called
2774"migration", which means that a process on another system is started
2775that is independent from its originator. Ideally, it wanders through a
2776network while working for its creator or owner. In places like the UMBC
2777Agent Web, people are quite confident that (mobile) agents are a
2778software engineering paradigm that enables us to significantly increase
2779the efficiency of our work. Mobile agents could become the mediators
between users and the networking world. For an unbiased view of this
2781technology, see the remarkable paper `Mobile Agents: Are they a good
2782idea?'.(1)
2783
When trying to migrate a process from one system to another, a server
process is needed on the receiving side. Several implementations come
to mind, depending on the kind of server process:
2788
2789 * HTTP can be used as the protocol for delivery of the migrating
2790 process. In this case, we use a common web server as the receiving
2791 server process. A universal CGI script mediates between migrating
2792 process and web server. Each server willing to accept migrating
2793 agents makes this universal service available. HTTP supplies the
2794 `POST' method to transfer some data to a file on the web server.
2795 When a CGI script is called remotely with the `POST' method
2796 instead of the usual `GET' method, data is transmitted from the
2797 client process to the standard input of the server's CGI script.
2798 So, to implement a mobile agent, we must not only write the agent
2799 program to start on the client side, but also the CGI script to
2800 receive the agent on the server side.
2801
2802 * The `PUT' method can also be used for migration. HTTP does not
2803 require a CGI script for migration via `PUT'. However, with common
2804 web servers there is no advantage to this solution, because web
2805 servers such as Apache require explicit activation of a special
2806 `PUT' script.
2807
2808 * `Agent Tcl' pursues a different course; it relies on a dedicated
2809 server process with a dedicated protocol specialized for receiving
2810 mobile agents.
2811
Our agent example abuses a common web server as a migration tool. So,
it needs a universal CGI script on the receiving side (the web server).
The receiving script is activated with a `POST' request when placed
into a location like `/httpd/cgi-bin/PostAgent.sh'. Make sure that the
server system uses a version of `gawk' that supports network access
(Version 3.1 or later; verify with `gawk --version').

     #!/bin/sh
     MobAg=/tmp/MobileAgent.$$
     # direct script to mobile agent file
     cat > "$MobAg"
     # execute agent concurrently
     gawk -f "$MobAg" "$MobAg" > /dev/null &
     # HTTP header, terminator and body
     gawk 'BEGIN { print "\r\nAgent started" }'
     rm "$MobAg"      # delete script file of agent

By making its process id (`$$') part of the unique file name, the
script avoids conflicts between concurrent instances of the script.
First, all lines from standard input (the mobile agent's source code)
are copied into this unique file. Then, the agent is started as a
concurrent process and a short message reporting this fact is sent to
the submitting client. Finally, the script file of the mobile agent is
removed because it is no longer needed. Although it is a short script,
there are several noteworthy points:

Security
     _There is none_. In fact, the CGI script should never be made
     available on a server that is part of the Internet because everyone
     would be allowed to execute arbitrary commands with it. This
     behavior is acceptable only when performing rapid prototyping.

Self-Reference
     Each migrating instance of an agent is started in a way that
     enables it to read its own source code as input (the script file
     is passed both as the program and as the data file) and use the
     code for subsequent migrations. This is necessary because it
     needs to treat the agent's code as data to transmit. `gawk' is not
     the ideal language for such a job. Lisp and Tcl are more suitable
     because they do not make a distinction between program code and
     data.

Independence
     After migration, the agent is not linked to its former home in any
     way. By reporting `Agent started', it waves "Goodbye" to its
     origin. The originator may choose to terminate or not.
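
The receive-and-run step can be tried locally, without a web server.
The following is only a sketch under stated assumptions, not the CGI
script itself: the function name `run_agent' is made up for this
illustration, `mktemp' is assumed to be installed, plain `awk' stands in
for `gawk', and the agent runs in the foreground so that the file can be
removed after the agent has finished. Like the script above, it offers
no security whatsoever.

```sh
#!/bin/sh
# run_agent (hypothetical name): read agent source from standard input,
# store it in a private temporary file, run it with its own source as
# input, then remove the file once the agent has finished.
run_agent() {
    agent=$(mktemp "${TMPDIR:-/tmp}/MobileAgent.XXXXXX") || return 1
    cat > "$agent"               # copy the agent source code
    awk -f "$agent" "$agent"     # program file and data file are the same
    status=$?
    rm -f "$agent"               # safe now: the agent has terminated
    return "$status"
}
```

For example, `printf 'END { print NR }\n' | run_agent' prints `1', the
number of lines in the agent's own source.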
2857
The originating agent itself is started just like any other command-line
script, and reports the results on standard output. By letting the name
of the original host migrate with the agent, the agent that migrates to
a host far away from its origin can report the result back home.
Having arrived at the end of the journey, the agent establishes a
connection and reports the results. This is the reason for determining
the name of the host with `uname -n' and storing it in `MyOrigin' for
later use. We may also set variables with the `-v' option from the
command line. This interactivity is only of importance in the context
of starting a mobile agent; therefore this `BEGIN' pattern and its
action do not take part in migration:

     BEGIN {
       if (ARGC != 2) {
         print "MOBAG - a simple mobile agent"
         print "CALL:\n gawk -f mobag.awk mobag.awk"
         print "IN:\n the name of this script as a command-line parameter"
         print "PARAM:\n -v MyOrigin=myhost.com"
         print "OUT:\n the result on stdout"
         print "JK 29.03.1998 01.04.1998"
         exit
       }
       if (MyOrigin == "") {
         "uname -n" | getline MyOrigin
         close("uname -n")
       }
     }

Since `gawk' cannot manipulate and transmit parts of the program
directly, the source code is read and stored in strings. Therefore,
the program scans itself for the beginning and the ending of functions.
Each line in between is appended to the code string until the end of
the function has been reached. A special case is this part of the
program itself. It is not a function. Placing a similar framework
around it causes it to be treated like a function. Notice that this
mechanism works for all the functions of the source code, but it cannot
guarantee that the order of the functions is preserved during migration:

     #ReadMySelf
     /^function /                     { FUNC = $2 }
     /^END/ || /^#ReadMySelf/         { FUNC = $1 }
     FUNC != ""                       { MOBFUN[FUNC] = MOBFUN[FUNC] RS $0 }
     (FUNC != "") && (/^}/ || /^#EndOfMySelf/) \
                                      { FUNC = "" }
     #EndOfMySelf
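
What the scanner collects can be checked on a small input. This is a
minimal sketch, assuming a POSIX shell and any `awk'; the wrapper name
`scan_functions' and the final `END' rule that lists the array keys are
additions for illustration and are not part of the agent.

```sh
# scan_functions (hypothetical wrapper): run the self-scanning rules over
# awk source arriving on standard input and list the names under which
# the code blocks were stored in MOBFUN.
scan_functions() {
    awk '
    /^function /                  { FUNC = $2 }
    /^END/ || /^#ReadMySelf/      { FUNC = $1 }
    FUNC != ""                    { MOBFUN[FUNC] = MOBFUN[FUNC] RS $0 }
    (FUNC != "") && (/^}/ || /^#EndOfMySelf/) { FUNC = "" }
    END { for (f in MOBFUN) print f }     # added only to show the keys
    ' | LC_ALL=C sort
}
```

Note that the key of a function is the second field of its first line,
so `function MyJob() {' is stored under `MyJob()'. The output is sorted
only because the order of `for (f in MOBFUN)' is unspecified, which is
also why the order of functions is not preserved during migration.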
2903
The web server code in *Note A Web Service with Interaction:
Interacting Service, was first developed as a site-independent core.
Likewise, the `gawk'-based mobile agent starts with an
agent-independent core, to which can be appended application-dependent
functions. What follows is the only application-independent function
needed for the mobile agent:

     function migrate(Destination, MobCode, Label) {
       MOBVAR["Label"] = Label
       MOBVAR["Destination"] = Destination
       RS = ORS = "\r\n"
       HttpService = "/inet/tcp/0/" Destination
       for (i in MOBFUN)
         MobCode = (MobCode "\n" MOBFUN[i])
       MobCode = MobCode "\n\nBEGIN {"
       for (i in MOBVAR)
         MobCode = (MobCode "\n MOBVAR[\"" i "\"] = \"" MOBVAR[i] "\"")
       MobCode = MobCode "\n}\n"
       print "POST /cgi-bin/PostAgent.sh HTTP/1.0"  |& HttpService
       print "Content-length:", length(MobCode) ORS |& HttpService
       printf "%s", MobCode                         |& HttpService
       while ((HttpService |& getline) > 0)
         print $0
       close(HttpService)
     }

The `migrate' function prepares the aforementioned strings containing
the program code and transmits them to a server. A consequence of this
modular approach is that the `migrate' function takes some parameters
that aren't needed in this application, but that will be in future
ones. Its mandatory parameter `Destination' holds the name (or IP
address) of the server that the agent wants as a host for its code. The
optional parameter `MobCode' may contain some `gawk' code that is
inserted during migration in front of all other code. The optional
parameter `Label' may contain a string that tells the agent what to do
in program execution after arrival at its new home site. One of the
serious obstacles in implementing a framework for mobile agents is that
it does not suffice to migrate the code. It is also necessary to
migrate the state of execution of the agent. In contrast to `Agent
Tcl', this program does not try to migrate the complete set of
variables. The following conventions are used:

   * Each variable in an agent program is local to the current host and
     does _not_ migrate.

   * The array `MOBFUN' shown above is an exception. It is handled by
     the function `migrate' and does migrate with the application.

   * The other exception is the array `MOBVAR'. Each variable that
     takes part in migration has to be an element of this array.
     `migrate' also takes care of this.

Now it's clear what happens to the `Label' parameter of the function
`migrate'. It is copied into `MOBVAR["Label"]' and travels alongside
the other data. Since travelling takes place via HTTP, records must be
separated with `"\r\n"' in `RS' and `ORS' as usual. The code assembly
for migration takes place in three steps:

   * Iterate over `MOBFUN' to collect all functions verbatim.

   * Prepare a `BEGIN' pattern and put assignments to mobile variables
     into the action part.

   * Transmission itself resembles GETURL: the header with the request
     and the `Content-length' is followed by the body. In case there is
     any reply over the network, it is read completely and echoed to
     standard output to avoid irritating the server.
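
The second step, turning mobile variables into source code, can be
looked at in isolation. The sketch below is an illustration only: the
wrapper name `emit_state' is hypothetical, and it handles a single
variable so that the output does not depend on the unspecified order of
`for (i in MOBVAR)'. The string concatenation is the same as in
`migrate'.

```sh
# emit_state KEY VALUE (hypothetical helper): print the BEGIN block that
# would recreate MOBVAR[KEY] = VALUE on the receiving host.
emit_state() {
    awk -v key="$1" -v val="$2" 'BEGIN {
        MOBVAR[key] = val
        MobCode = "BEGIN {"
        for (i in MOBVAR)
            MobCode = (MobCode "\n  MOBVAR[\"" i "\"] = \"" MOBVAR[i] "\"")
        MobCode = MobCode "\n}"
        print MobCode
    }'
}
```

Calling `emit_state Label next' prints a three-line `BEGIN' block whose
body is `MOBVAR["Label"] = "next"'. Because the value is pasted between
double quotes without any escaping, a value containing a double quote or
a newline would produce invalid code.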
2971
The application-independent framework is now almost complete. What
follows is the `END' pattern that is executed when the mobile agent has
finished reading its own code. First, it checks whether it is already
running on a remote host or not. In case initialization has not yet
taken place, it starts `MyInit'. Otherwise (later, on a remote host), it
starts `MyJob':

     END {
       if (ARGC != 2) exit    # stop when called with wrong parameters
       if (MyOrigin != "")    # is this the originating host?
         MyInit()             # if so, initialize the application
       else                   # we are on a host with migrated data
         MyJob()              # so we do our job
     }

All that's left to extend the framework into a complete application is
to write two application-specific functions: `MyInit' and `MyJob'. Keep
in mind that the former is executed once on the originating host, while
the latter is executed after each migration:

     function MyInit() {
       MOBVAR["MyOrigin"] = MyOrigin
       MOBVAR["Machines"] = "localhost/80 max/80 moritz/80 castor/80"
       split(MOBVAR["Machines"], Machines)           # which host is the first?
       migrate(Machines[1], "", "")                  # go to the first host
       while (("/inet/tcp/8080/0/0" |& getline) > 0) # wait for result
         print $0                                    # print result
       close("/inet/tcp/8080/0/0")
     }

As mentioned earlier, this agent takes the name of its origin
(`MyOrigin') with it. Then, it takes the name of its first destination
and goes there for further work. Notice that this name has the port
number of the web server appended to the name of the server, because
the function `migrate' needs it this way to create the `HttpService'
variable. Finally, it waits for the result to arrive. The `MyJob'
function runs on the remote host:

     function MyJob() {
       # forget this host
       sub(MOBVAR["Destination"], "", MOBVAR["Machines"])
       MOBVAR["Result"]=MOBVAR["Result"] SUBSEP SUBSEP MOBVAR["Destination"] ":"
       while (("who" | getline) > 0)               # who is logged in?
         MOBVAR["Result"] = MOBVAR["Result"] SUBSEP $0
       close("who")
       if (index(MOBVAR["Machines"], "/") > 0) {   # any more machines to visit?
         split(MOBVAR["Machines"], Machines)       # which host is next?
         migrate(Machines[1], "", "")              # go there
       } else {                                    # no more machines
         gsub(SUBSEP, "\n", MOBVAR["Result"])      # send result to origin
         print MOBVAR["Result"] |& "/inet/tcp/0/" MOBVAR["MyOrigin"] "/8080"
         close("/inet/tcp/0/" MOBVAR["MyOrigin"] "/8080")
       }
     }
3026
After migrating, the first thing to do in `MyJob' is to delete the name
of the current host from the list of hosts to visit. Now, it is time to
start the real work by appending the host's name to the result string,
and reading line by line who is logged in on this host. A very
annoying circumstance is the fact that the elements of `MOBVAR' cannot
hold the newline character (`"\n"'). If they did, migration of this
string would not work because the string would not obey the syntax rule
for a string in `gawk'. `SUBSEP' is used as a temporary replacement. If
the list of hosts to visit holds at least one more entry, the agent
migrates to that place to go on working there. Otherwise, we replace
the `SUBSEP's with a newline character in the resulting string, and
report it to the originating host, whose name is stored in
`MOBVAR["MyOrigin"]'.
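
Both bookkeeping steps of `MyJob' can be tried without any network
access. The following sketch is an illustration under assumptions: the
wrapper names `next_host' and `pack_result' are hypothetical, plain
`awk' is used, and the input of `pack_result' stands in for the output
of `who'.

```sh
# next_host LIST HERE: print the next "host/port" entry of LIST after
# removing HERE; print nothing when the itinerary is exhausted.
next_host() {
    awk -v machines="$1" -v here="$2" 'BEGIN {
        sub(here, "", machines)            # forget this host
        if (index(machines, "/") > 0) {    # any more machines to visit?
            split(machines, Machines)      # which host is next?
            print Machines[1]
        }
    }'
}

# pack_result: join the input lines with SUBSEP, report whether the
# packed string contains a newline, then restore the newlines the way
# the last host does before reporting home.
pack_result() {
    awk '
    { result = result SUBSEP $0 }
    END {
        print (index(result, "\n") > 0 ? "has newline" : "single line")
        gsub(SUBSEP, "\n", result)
        print result
    }'
}
```

With the itinerary from `MyInit', `next_host "localhost/80 max/80
moritz/80" localhost/80' prints `max/80'. Be aware that `sub' treats the
host name as a regular expression, so a dot in a name matches any
character.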
3040
   ---------- Footnotes ----------

   (1) `http://www.research.ibm.com/massive/mobag.ps'


File: gawkinet.info, Node: STOXPRED, Next: PROTBASE, Prev: MOBAGWHO, Up: Some Applications and Techniques

3.9 STOXPRED: Stock Market Prediction As A Service
==================================================
3050
     Far out in the uncharted backwaters of the unfashionable end of
     the Western Spiral arm of the Galaxy lies a small unregarded
     yellow sun.

     Orbiting this at a distance of roughly ninety-two million miles is
     an utterly insignificant little blue-green planet whose
     ape-descendent life forms are so amazingly primitive that they
     still think digital watches are a pretty neat idea.

     This planet has -- or rather had -- a problem, which was this:
     most of the people living on it were unhappy for pretty much of
     the time. Many solutions were suggested for this problem, but
     most of these were largely concerned with the movements of small
     green pieces of paper, which is odd because it wasn't the small
     green pieces of paper that were unhappy.
     Douglas Adams, `The Hitch Hiker's Guide to the Galaxy'

Valuable services on the Internet are usually _not_ implemented as
mobile agents. There are much simpler ways of implementing services.
All Unix systems provide, for example, the `cron' service. Unix system
users can write a list of tasks to be done each day, each week, twice a
day, or just once. The list is entered into a file named `crontab'.
For example, to distribute a newsletter on a daily basis this way, use
`cron' for calling a script each day early in the morning.

     # run at 8 am on weekdays, distribute the newsletter
     0 8 * * 1-5 $HOME/bin/daily.job >> $HOME/log/newsletter 2>&1

The script first looks for interesting information on the Internet,
assembles it in a nice form and sends the results via email to the
customers.
3082
The following is an example of a primitive newsletter on stock market
prediction. It is a report which first tries to predict the change of
each share in the Dow Jones Industrial Index for the particular day.
Then it mentions some especially promising shares as well as some
shares which look remarkably bad on that day. The report ends with the
usual disclaimer which tells every child _not_ to try this at home and
hurt anybody.

     Good morning Uncle Scrooge,

     This is your daily stock market report for Monday, October 16, 2000.
     Here are the predictions for today:

             AA      neutral
             GE      up
             JNJ     down
             MSFT    neutral
             ...
             UTX     up
             DD      down
             IBM     up
             MO      down
             WMT     up
             DIS     up
             INTC    up
             MRK     down
             XOM     down
             EK      down
             IP      down

     The most promising shares for today are these:

             INTC            http://biz.yahoo.com/n/i/intc.html

     The stock shares to avoid today are these:

             EK              http://biz.yahoo.com/n/e/ek.html
             IP              http://biz.yahoo.com/n/i/ip.html
             DD              http://biz.yahoo.com/n/d/dd.html
             ...
3123
The script as a whole is rather long. In order to ease the pain of
studying other people's source code, we have broken the script up into
meaningful parts which are invoked one after the other. The basic
structure of the script is as follows:

     BEGIN {
       Init()
       ReadQuotes()
       CleanUp()
       Prediction()
       Report()
       SendMail()
     }

The earlier parts store data into variables and arrays which are
subsequently used by later parts of the script. The `Init' function
first checks if the script is invoked correctly (without any
parameters). If not, it informs the user of the correct usage. What
follows are preparations for the retrieval of the historical quote
data. The names of the 30 stock shares are stored in an array `name'
along with the current date in `day', `month', and `year'.

All users who are separated from the Internet by a firewall and have to
direct their Internet accesses to a proxy must supply the name of the
proxy to this script with the `-v Proxy=NAME' option. For most users,
the default proxy and port number should suffice.

     function Init() {
       if (ARGC != 1) {
         print "STOXPRED - daily stock share prediction"
         print "IN:\n no parameters, nothing on stdin"
         print "PARAM:\n -v Proxy=MyProxy -v ProxyPort=80"
         print "OUT:\n commented predictions as email"
         print "JK 09.10.2000"
         exit
       }
       # Remember ticker symbols from Dow Jones Industrial Index
       StockCount = split("AA GE JNJ MSFT AXP GM JPM PG BA HD KO \
         SBC C HON MCD T CAT HWP MMM UTX DD IBM MO WMT DIS INTC \
         MRK XOM EK IP", name);
       # Remember the current date as the end of the time series
       day   = strftime("%d")
       month = strftime("%m")
       year  = strftime("%Y")
       if (Proxy == "")    Proxy = "chart.yahoo.com"
       if (ProxyPort == 0) ProxyPort = 80
       YahooData = "/inet/tcp/0/" Proxy "/" ProxyPort
     }
3172
There are two really interesting parts in the script. One is the
function which reads the historical stock quotes from an Internet
server. The other is the one that does the actual prediction. In the
following function we see how the quotes are read from the Yahoo
server. The data which comes from the server is in CSV format
(comma-separated values):

     Date,Open,High,Low,Close,Volume
     9-Oct-00,22.75,22.75,21.375,22.375,7888500
     6-Oct-00,23.8125,24.9375,21.5625,22,10701100
     5-Oct-00,24.4375,24.625,23.125,23.50,5810300

Lines contain values of the same time instant, whereas columns are
separated by commas and contain the kind of data that is described in
the header (first) line. At first, `gawk' is instructed to separate
columns by commas (`FS = ","'). In the loop that follows, a connection
to the Yahoo server is first opened, then a download takes place, and
finally the connection is closed. All this happens once for each ticker
symbol. In the body of this loop, an Internet address is built up as a
string according to the rules of the Yahoo server. The starting and
ending dates fall on the same calendar day, one year apart: the time
series ends today and starts one year earlier. All the action is
initiated within the `printf' command which transmits the request for
data to the Yahoo server.

In the inner loop, the server's data is first read and then scanned
line by line. Only lines which have six columns and the name of a month
in the first column contain relevant data. This data is stored in the
two-dimensional array `quote'; one dimension being time, the other
being the ticker symbol. During retrieval of the first stock's data,
the calendar names of the time instances are stored in the array `days'
because we need them later.
3204
     function ReadQuotes() {
       # Retrieve historical data for each ticker symbol
       FS = ","
       for (stock = 1; stock <= StockCount; stock++) {
         # every query parameter, including "g", is introduced by "&"
         URL = "http://chart.yahoo.com/table.csv?s=" name[stock] \
               "&a=" month "&b=" day "&c=" (year - 1) \
               "&d=" month "&e=" day "&f=" year \
               "&g=d&q=q&y=0&z=" name[stock] "&x=.csv"
         # pass the URL as data, not as the printf format string
         printf("GET %s HTTP/1.0\r\n\r\n", URL) |& YahooData
         while ((YahooData |& getline) > 0) {
           if (NF == 6 && $1 ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/) {
             if (stock == 1)
               days[++daycount] = $1;
             quote[$1, stock] = $5
           }
         }
         close(YahooData)
       }
       FS = " "
     }
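
The line filter of the inner loop can be exercised on the CSV sample
shown earlier, without contacting any server. This is a minimal sketch,
assuming plain `awk'; the wrapper name `closing_prices' is hypothetical,
and instead of filling the `quote' array it simply prints the date and
the closing price of every accepted line.

```sh
# closing_prices (hypothetical helper): read a server reply on standard
# input and keep only lines with six comma-separated columns whose first
# column contains the name of a month.
closing_prices() {
    awk -F, '
    NF == 6 && $1 ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/ {
        print $1, $5      # date and closing price
    }'
}
```

HTTP header lines and the CSV header line (`Date,Open,...') do not pass
the filter: the former do not have six columns, and the latter has no
month name in its first column.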
3225
Now that we _have_ the data, it can be checked once again to make sure
that no individual stock is missing or invalid, and that all the stock
quotes are aligned correctly. Furthermore, we renumber the time
instances. The most recent day gets day number 1 and all other days get
consecutive numbers. All quotes are rounded to the nearest whole US
dollar.

     function CleanUp() {
       # clean up time series; eliminate incomplete data sets
       for (d = 1; d <= daycount; d++) {
         for (stock = 1; stock <= StockCount; stock++)
           if (! ((days[d], stock) in quote))
             stock = StockCount + 10
         if (stock > StockCount + 1)
           continue
         datacount++
         for (stock = 1; stock <= StockCount; stock++)
           data[datacount, stock] = int(0.5 + quote[days[d], stock])
       }
       delete quote
       delete days
     }
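
The effect of the clean-up is easiest to see on a tiny data set. The
sketch below is an illustration, not the function above: the wrapper
name `complete_days' and its input format (one `date symbol close'
triple per line, most recent day first) are assumptions, and it uses an
explicit flag instead of the loop-counter trick. It applies the same
rules: drop days on which any stock is missing, renumber the remaining
days from 1, and round with `int(0.5 + x)'.

```sh
# complete_days "SYM1 SYM2 ...": read "date symbol close" lines and print
# one renumbered line per day for which every symbol has a quote.
complete_days() {
    awk -v symbols="$1" '
    BEGIN { StockCount = split(symbols, name) }
    {
        if (!($1 in known)) { known[$1] = 1; days[++daycount] = $1 }
        quote[$1, $2] = $3
    }
    END {
        for (d = 1; d <= daycount; d++) {
            complete = 1
            for (s = 1; s <= StockCount; s++)
                if (!((days[d], name[s]) in quote))
                    complete = 0
            if (!complete)
                continue                 # eliminate incomplete data set
            line = ++datacount           # renumber the remaining days
            for (s = 1; s <= StockCount; s++)
                line = line " " name[s] "=" int(0.5 + quote[days[d], name[s]])
            print line
        }
    }'
}
```

Given quotes for `AA' and `GE' on three days, with `GE' missing on the
second day, only two lines come out, numbered 1 and 2.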
3248
Now we have arrived at the second really interesting part of the whole
affair. What we present here is a very primitive prediction algorithm:
_If a stock fell yesterday, assume it will also fall today; if it rose
yesterday, assume it will rise today_. (Feel free to replace this
algorithm with a smarter one.) If a stock changed in the same direction
on two consecutive days, this is an indication which should be
highlighted. Two-day advances are stored in `hot' and two-day declines
in `avoid'.

The rest of the function is a sanity check. It counts the number of
correct predictions in relation to the total number of predictions one
could have made in the year before.

     function Prediction() {
       # Predict each ticker symbol by prolonging yesterday's trend
       for (stock = 1; stock <= StockCount; stock++) {
         if (data[1, stock] > data[2, stock]) {
           predict[stock] = "up"
         } else if (data[1, stock] < data[2, stock]) {
           predict[stock] = "down"
         } else {
           predict[stock] = "neutral"
         }
         if ((data[1, stock] > data[2, stock]) && (data[2, stock] > data[3, stock]))
           hot[stock] = 1
         if ((data[1, stock] < data[2, stock]) && (data[2, stock] < data[3, stock]))
           avoid[stock] = 1
       }
       # Do a plausibility check: how many predictions proved correct?
       for (s = 1; s <= StockCount; s++) {
         for (d = 1; d <= datacount-2; d++) {
           if (data[d+1, s] > data[d+2, s]) {
             UpCount++
           } else if (data[d+1, s] < data[d+2, s]) {
             DownCount++
           } else {
             NeutralCount++
           }
           if (((data[d, s] > data[d+1, s]) && (data[d+1, s] > data[d+2, s])) ||
               ((data[d, s] < data[d+1, s]) && (data[d+1, s] < data[d+2, s])) ||
               ((data[d, s] == data[d+1, s]) && (data[d+1, s] == data[d+2, s])))
             CorrectCount++
         }
       }
     }
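
The rule and its sanity check can be tried on a single, hand-made price
series. This is a sketch under assumptions: the wrapper names `trend'
and `backtest' are hypothetical, prices are passed as arguments with the
most recent one first (the numbering used in `data'), and only one stock
is considered.

```sh
# trend LATEST PREVIOUS BEFORE: prolong the latest change and flag
# two-day advances ("hot") and two-day declines ("avoid").
trend() {
    awk -v d1="$1" -v d2="$2" -v d3="$3" 'BEGIN {
        d1 += 0; d2 += 0; d3 += 0          # force numeric comparison
        if (d1 > d2)        p = "up"
        else if (d1 < d2)   p = "down"
        else                p = "neutral"
        if (d1 > d2 && d2 > d3) p = p " (hot)"
        if (d1 < d2 && d2 < d3) p = p " (avoid)"
        print p
    }'
}

# backtest "P1 P2 ... Pn": count how often the direction of one change
# was repeated by the following change; prints correct/total.
backtest() {
    awk -v series="$1" 'BEGIN {
        n = split(series, data)
        for (d = 1; d <= n - 2; d++) {
            total++
            if ((data[d] > data[d+1] && data[d+1] > data[d+2]) ||
                (data[d] < data[d+1] && data[d+1] < data[d+2]) ||
                (data[d] == data[d+1] && data[d+1] == data[d+2]))
                correct++
        }
        printf "%d/%d\n", correct, total
    }'
}
```

For the series `5 4 3 3 4', only the most recent change (4 to 5)
continued the direction of the change before it, so `backtest' reports
one correct prediction out of three.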
3294
At this point the hard work has been done: the array `predict' contains
the predictions for all the ticker symbols. It is up to the function
`Report' to find some nice words to introduce the desired information.

     function Report() {
       # Generate report
       report = "\nThis is your daily "
       report = report "stock market report for "strftime("%A, %B %d, %Y")".\n"
       report = report "Here are the predictions for today:\n\n"
       for (stock = 1; stock <= StockCount; stock++)
         report = report "\t" name[stock] "\t" predict[stock] "\n"
       for (stock in hot) {
         if (HotCount++ == 0)
           report = report "\nThe most promising shares for today are these:\n\n"
         report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
           tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
       }
       for (stock in avoid) {
         if (AvoidCount++ == 0)
           report = report "\nThe stock shares to avoid today are these:\n\n"
         report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
           tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
       }
       report = report "\nThis sums up to " HotCount+0 " winners and " AvoidCount+0
       report = report " losers. When using this kind\nof prediction scheme for"
       report = report " the 12 months which lie behind us,\nwe get " UpCount
       report = report " 'ups' and " DownCount " 'downs' and " NeutralCount
       report = report " 'neutrals'. Of all\nthese " UpCount+DownCount+NeutralCount
       report = report " predictions " CorrectCount " proved correct next day.\n"
       report = report "A success rate of "\
         int(100*CorrectCount/(UpCount+DownCount+NeutralCount)) "%.\n"
       report = report "Random choice would have produced a 33% success rate.\n"
       report = report "Disclaimer: Like every other prediction of the stock\n"
       report = report "market, this report is, of course, complete nonsense.\n"
       report = report "If you are stupid enough to believe these predictions\n"
       report = report "you should visit a doctor who can treat your ailment."
     }
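
Two expressions in `Report' are worth trying on their own: the link that
is built from a ticker symbol, and the success rate. The sketch below
assumes plain `awk'; the wrapper names `news_link' and `success_rate'
are hypothetical, and the guard against a total of zero is an addition
that the original function does not have.

```sh
# news_link SYMBOL: build the link the report prints next to a symbol.
news_link() {
    awk -v sym="$1" 'BEGIN {
        print "http://biz.yahoo.com/n/" tolower(substr(sym, 1, 1)) "/" tolower(sym) ".html"
    }'
}

# success_rate CORRECT TOTAL: whole-number percentage, truncated with
# int() as in the report; "n/a" when there were no predictions at all.
success_rate() {
    awk -v correct="$1" -v total="$2" 'BEGIN {
        if (total + 0 == 0) print "n/a"
        else                printf "%d%%\n", int(100 * correct / total)
    }'
}
```

`news_link INTC' yields the same link as in the sample report, and
`success_rate 1 3' prints `33%'.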
3332
The function `SendMail' goes through the list of customers and opens a
pipe to the `mail' command for each of them. Each one receives an email
message with a proper subject heading and is addressed by full name.

     function SendMail() {
       # send report to customers
       customer["uncle.scrooge@ducktown.gov"] = "Uncle Scrooge"
       customer["more@utopia.org"           ] = "Sir Thomas More"
       customer["spinoza@denhaag.nl"        ] = "Baruch de Spinoza"
       customer["marx@highgate.uk"          ] = "Karl Marx"
       customer["keynes@the.long.run"       ] = "John Maynard Keynes"
       customer["bierce@devil.hell.org"     ] = "Ambrose Bierce"
       customer["laplace@paris.fr"          ] = "Pierre Simon de Laplace"
       for (c in customer) {
         # the blank after the closing quote separates subject and address
         MailPipe = "mail -s 'Daily Stock Prediction Newsletter' " c
         print "Good morning " customer[c] "," | MailPipe
         print report "\n.\n" | MailPipe
         close(MailPipe)
       }
     }
3354
Be patient when running the script by hand. Retrieving the data for
all the ticker symbols and sending the emails may take several minutes
to complete, depending upon network traffic and the speed of the
available Internet link. The quality of the prediction algorithm is
likely to be disappointing. Try to find a better one. Should you find
one with a success rate of more than 50%, please tell us about it! It
is only for the sake of curiosity, of course. `:-)'


File: gawkinet.info, Node: PROTBASE, Prev: STOXPRED, Up: Some Applications and Techniques

3.10 PROTBASE: Searching Through A Protein Database
===================================================
3368
     Hoare's Law of Large Problems: Inside every large problem is a
     small problem struggling to get out.

Yahoo's database of stock market data is just one among the many large
databases on the Internet. Another one is located at NCBI (National
Center for Biotechnology Information). Established in 1988 as a
national resource for molecular biology information, NCBI creates
public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical
information. In this section, we look at one of NCBI's public services,
which is called BLAST (Basic Local Alignment Search Tool).

You probably know that the information necessary for reproducing living
cells is encoded in the genetic material of the cells. The genetic
material is a very long chain of four base nucleotides. It is the order
of appearance (the sequence) of nucleotides which contains the
information about the substance to be produced. Scientists in
biotechnology often find a specific fragment, determine the nucleotide
sequence, and need to know where the sequence at hand comes from. This
is where the large databases enter the game. At NCBI, databases store
the knowledge about which sequences have ever been found and where they
have been found. When the scientist sends his sequence to the BLAST
service, the server looks for regions of genetic material in its
database which look the most similar to the delivered nucleotide
sequence. After a search time of some seconds or minutes the server
sends an answer to the scientist. In order to make access simple, NCBI
chose to offer their database service through popular Internet
protocols. There are four basic ways to use the so-called BLAST
services:

   * The easiest way to use BLAST is through the web. Users may simply
     point their browsers at the NCBI home page and link to the BLAST
     pages. NCBI provides a stable URL that may be used to perform
     BLAST searches without interactive use of a web browser. This is
     what we will do later in this section. A demonstration client and
     a `README' file demonstrate how to access this URL.

   * Currently, `blastcl3' is the standard network BLAST client. You
     can download `blastcl3' from the anonymous FTP location.

   * BLAST 2.0 can be run locally as a full executable and can be used
     to run BLAST searches against private local databases, or
     downloaded copies of the NCBI databases. BLAST 2.0 executables may
     be found on the NCBI anonymous FTP server.

   * The NCBI BLAST Email server is the best option for people without
     convenient access to the web. A similarity search can be performed
     by sending a properly formatted mail message containing the
     nucleotide or protein query sequence to <blast@ncbi.nlm.nih.gov>.
     The query sequence is compared against the specified database
     using the BLAST algorithm and the results are returned in an email
     message. For more information on formulating email BLAST searches,
     you can send a message consisting of the word "HELP" to the same
     address, <blast@ncbi.nlm.nih.gov>.
3423
Our starting point is the demonstration client mentioned in the first
option. The `README' file that comes along with the client explains
the whole process in a nutshell. In the rest of this section, we first
show what such requests look like. Then we show how to use `gawk' to
implement a client in about 10 lines of code. Finally, we show how to
interpret the result returned from the service.

Sequences are expected to be represented in the standard IUB/IUPAC
amino acid and nucleic acid codes, with these exceptions: lower-case
letters are accepted and are mapped into upper-case; a single hyphen or
dash can be used to represent a gap of indeterminate length; and in
amino acid sequences, `U' and `*' are acceptable letters (see below).
Before submitting a request, any numerical digits in the query sequence
should either be removed or replaced by appropriate letter codes (e.g.,
`N' for unknown nucleic acid residue or `X' for unknown amino acid
residue). The nucleic acid codes supported are:

     A --> adenosine           M --> A C (amino)
     C --> cytidine            S --> G C (strong)
     G --> guanine             W --> A T (weak)
     T --> thymidine           B --> G T C
     U --> uridine             D --> G A T
     R --> G A (purine)        H --> A C T
     Y --> T C (pyrimidine)    V --> G C A
     K --> G T (keto)          N --> A G C T (any)
                               -  gap of indeterminate length
3450
3451Now you know the alphabet of nucleotide sequences. The last two lines
3452of the following example query show you such a sequence, which is
3453obviously made up only of elements of the alphabet just described.
3454Store this example query into a file named `protbase.request'. You are
3455now ready to send it to the server with the demonstration client.
3456
3457 PROGRAM blastn
3458 DATALIB month
3459 EXPECT 0.75
3460 BEGIN
3461 >GAWK310 the gawking gene GNU AWK
3462 tgcttggctgaggagccataggacgagagcttcctggtgaagtgtgtttcttgaaatcat
3463 caccaccatggacagcaaa
3464
3465The actual search request begins with the mandatory parameter `PROGRAM'
3466in the first column followed by the value `blastn' (the name of the
3467program) for searching nucleic acids. The next line contains the
3468mandatory search parameter `DATALIB' with the value `month' for the
3469newest nucleic acid sequences. The third line contains an optional
3470`EXPECT' parameter and the value desired for it. The fourth line
3471contains the mandatory `BEGIN' directive, followed by the query
3472sequence in FASTA/Pearson format. Each line of information must be
3473less than 80 characters in length.
3474
3475The "month" database contains all new or revised sequences released in
3476the last 30 days and is useful for searching against new sequences.
3477There are five different blast programs, `blastn' being the one that
3478compares a nucleotide query sequence against a nucleotide sequence
3479database.
3480
3481The last server directive that must appear in every request is the
3482`BEGIN' directive. The query sequence should immediately follow the
3483`BEGIN' directive and must appear in FASTA/Pearson format. A sequence
3484in FASTA/Pearson format begins with a single-line description. The
3485description line, which is required, is distinguished from the lines of
3486sequence data that follow it by having a greater-than (`>') symbol in
3487the first column. For the purposes of the BLAST server, the text of
3488the description is arbitrary.
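
A record in FASTA/Pearson format is easy to take apart in `awk'. The
sketch below assumes the whole record is held in one string with
newlines between its lines. The function name `parse_fasta' and the
array keys `description' and `sequence' are invented for this example.

```awk
# Hypothetical helper: split a FASTA/Pearson record into its parts.
# Fills rec["description"] (the text after ">") and rec["sequence"]
# (all sequence lines joined).  Returns 1 on success, 0 if the
# required description line is missing.
function parse_fasta(text, rec,    lines, n, i) {
    n = split(text, lines, "\n")
    if (n < 1 || substr(lines[1], 1, 1) != ">")
        return 0
    rec["description"] = substr(lines[1], 2)
    rec["sequence"] = ""
    for (i = 2; i <= n; i++)
        rec["sequence"] = rec["sequence"] lines[i]
    return 1
}

BEGIN {
    fasta = ">GAWK310 the gawking gene GNU AWK\n"
    fasta = fasta "caccaccatggacagcaaa"
    if (parse_fasta(fasta, rec))
        print rec["description"] ": " length(rec["sequence"]) " residues"
}
```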
3489
3490If you prefer to use a client written in `gawk', just store the
3491following 10 lines of code into a file named `protbase.awk' and use
3492this client instead. Invoke it with `gawk -f protbase.awk
3493protbase.request'. Then wait a minute and watch the result coming in.
3494In order to replicate the demonstration client's behaviour as closely
3495as possible, this client does not use a proxy server. We could also
3496have extended the client program in *Note Retrieving Web Pages: GETURL,
3497to implement the client request from `protbase.awk' as a special case.
3498
3499 { request = request "\n" $0 }
3500
3501 END {
3502 BLASTService = "/inet/tcp/0/www.ncbi.nlm.nih.gov/80"
3503 printf "POST /cgi-bin/BLAST/nph-blast_report HTTP/1.0\n" |& BLASTService
3504 printf "Content-Length: " length(request) "\n\n" |& BLASTService
3505 printf "%s", request |& BLASTService
3506 while ((BLASTService |& getline) > 0)
3507 print $0
3508 close(BLASTService)
3509 }
3510
3511The demonstration client from NCBI is 214 lines long (written in C) and
3512it is not immediately obvious what it does. Our client is so short that
3513it _is_ obvious what it does. First it loops over all lines of the
3514query and stores the whole query into a variable. Then the script
3515establishes an Internet connection to the NCBI server and transmits the
3516query by framing it with a proper HTTP request. Finally it receives and
3517prints the complete result coming from the server.
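
The framing step can be looked at in isolation, without opening a
connection. The following sketch builds the same two header lines as
the client, followed by the query, and returns them as one string. The
function name `http_post' and its `eol' parameter are assumptions made
for this example. The client itself ends header lines with a bare
newline, whereas the HTTP specification calls for a carriage return
followed by a newline.

```awk
# Hypothetical helper: frame a query as an HTTP POST request.
# eol is the line terminator: "\n" as in protbase.awk, or "\r\n"
# as the HTTP specification asks for.
function http_post(path, body, eol,    req) {
    req = "POST " path " HTTP/1.0" eol
    req = req "Content-Length: " length(body) eol
    req = req eol body              # empty line ends the header
    return req
}

BEGIN {
    # Like the request variable in protbase.awk, the body starts
    # with a newline.
    query = "\nPROGRAM blastn\nDATALIB month\nBEGIN\n>demo\nacgt"
    printf "%s\n", http_post("/cgi-bin/BLAST/nph-blast_report", query, "\n")
}
```

Printing the framed request this way shows exactly what would be sent
to the server.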
3518
3519Now, let us look at the result. It begins with an HTTP header, which you
3520can ignore. Then there are some comments about the query having been
3521filtered to avoid spuriously high scores. After this, there is a
3522reference to the paper that describes the software being used for
3523searching the database. After a repetition of the original query's
3524description we find the list of significant alignments:
3525
3526 Sequences producing significant alignments: (bits) Value
3527
3528 gb|AC021182.14|AC021182 Homo sapiens chromosome 7 clone RP11-733... 38 0.20
3529 gb|AC021056.12|AC021056 Homo sapiens chromosome 3 clone RP11-115... 38 0.20
3530 emb|AL160278.10|AL160278 Homo sapiens chromosome 9 clone RP11-57... 38 0.20
3531 emb|AL391139.11|AL391139 Homo sapiens chromosome X clone RP11-35... 38 0.20
3532 emb|AL365192.6|AL365192 Homo sapiens chromosome 6 clone RP3-421H... 38 0.20
3533 emb|AL138812.9|AL138812 Homo sapiens chromosome 11 clone RP1-276... 38 0.20
3534 gb|AC073881.3|AC073881 Homo sapiens chromosome 15 clone CTD-2169... 38 0.20
3535
3536This means that part of the query sequence was found on seven human
3537chromosomes. But the value 0.20 means that the probability of an
3538accidental match is rather high (about 20%) in all cases and should be
3539taken into account. You may wonder what the first column means. It is a
3540key to the specific database in which this occurrence was found. The unique
3541sequence identifiers reported in the search results can be used as
3542sequence retrieval keys via the NCBI server. The syntax of sequence
3543header lines used by the NCBI BLAST server depends on the database from
3544which each sequence was obtained. The table below lists the
3545identifiers for the databases from which the sequences were derived.
3546
3547 Database Name                    Identifier Syntax
3548 ============================     ========================
3549 GenBank                          gb|accession|locus
3550 EMBL Data Library                emb|accession|locus
3551 DDBJ, DNA Database of Japan      dbj|accession|locus
3552 NBRF PIR                         pir||entry
3553 Protein Research Foundation      prf||name
3554 SWISS-PROT                       sp|accession|entry name
3555 Brookhaven Protein Data Bank     pdb|entry|chain
3556 Kabat's Sequences of Immuno...   gnl|kabat|identifier
3557 Patents                          pat|country|number
3558 GenInfo Backbone Id              bbs|number
3559
3560For example, an identifier might be `gb|AC021182.14|AC021182', where the
3561`gb' tag indicates that the identifier refers to a GenBank sequence,
3562`AC021182.14' is its GenBank ACCESSION, and `AC021182' is the GenBank
3563LOCUS. The identifier contains no spaces, so that a space indicates
3564the end of the identifier.
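
Because the identifier contains no spaces and its parts are separated
by vertical bars, `split' is all that is needed to take it apart. The
following sketch maps the leading tag to a database name from the table
and pulls the score and expectation value from one line of the result
list. The function name `db_name' is hypothetical, and the assumption
that score and expectation value are the last two fields of a line is
drawn only from the sample output shown in this node.

```awk
# Hypothetical helper: map the leading tag of an identifier to the
# database name given in the table of identifier syntaxes.
function db_name(id,    part) {
    split(id, part, "[|]")
    if (part[1] == "gb")  return "GenBank"
    if (part[1] == "emb") return "EMBL Data Library"
    if (part[1] == "dbj") return "DDBJ, DNA Database of Japan"
    if (part[1] == "pir") return "NBRF PIR"
    if (part[1] == "prf") return "Protein Research Foundation"
    if (part[1] == "sp")  return "SWISS-PROT"
    if (part[1] == "pdb") return "Brookhaven Protein Data Bank"
    if (part[1] == "pat") return "Patents"
    if (part[1] == "bbs") return "GenInfo Backbone Id"
    return "unknown"
}

BEGIN {
    line = "gb|AC021182.14|AC021182 Homo sapiens chromosome 7 38 0.20"
    n = split(line, f, " ")         # the identifier ends at a space
    split(f[1], part, "[|]")
    print db_name(f[1]), "accession", part[2], "locus", part[3]
    print "score", f[n-1], "bits, expect", f[n]
}
```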
3565
3566Let us continue in the result listing. Each of the seven alignments
3567mentioned above is subsequently described in detail. We will have a
3568closer look at the first of them.
3569
3570 >gb|AC021182.14|AC021182 Homo sapiens chromosome 7 clone RP11-733N23, WORKING DRAFT SEQUENCE, 4
3571 unordered pieces
3572 Length = 176383
3573
3574 Score = 38.2 bits (19), Expect = 0.20
3575 Identities = 19/19 (100%)
3576 Strand = Plus / Plus
3577
3578 Query: 35    tggtgaagtgtgtttcttg 53
3579              |||||||||||||||||||
3580 Sbjct: 69786 tggtgaagtgtgtttcttg 69804
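
The coordinates in an alignment block like the one above can be checked
mechanically. This sketch computes how many residues a `Query:' or
`Sbjct:' line covers from its start and end positions. The function
name `span' is invented for this example, and the field layout (label,
start, residues, end) is assumed from the sample output only.

```awk
# Hypothetical helper: number of residues covered by an alignment
# line such as "Query: 35 tggtgaagtgtgtttcttg 53".
function span(line,    f) {
    split(line, f, " ")
    return f[4] - f[2] + 1          # both end points are included
}

BEGIN {
    q = "Query: 35 tggtgaagtgtgtttcttg 53"
    s = "Sbjct: 69786 tggtgaagtgtgtttcttg 69804"
    split(q, f, " ")
    print "query span:", span(q), "residues shown:", length(f[3])
    print "subject span:", span(s)
}
```

If the span and the number of residues shown differ, the alignment
contains gaps.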
3581
3582This alignment was located on the human chromosome 7. The fragment on
3583which part of the query was found had a total length of 176383. Only 19
3584of the nucleotides matched and the matching sequence ran from character
358535 to 53 in the query sequence and from 69786 to 69804 in the fragment
3586on chromosome 7. If you are still reading at this point, you are
3587probably interested in finding out more about Computational Biology and
3588you might appreciate the following hints.
3589
3590 1. There is a book called `Introduction to Computational Biology' by
3591 Michael S. Waterman, which is worth reading if you are seriously
3592 interested. You can find a good book review on the Internet.
3593
3594 2. While Waterman's book can explain to you the algorithms employed
3595 internally in the database search engines, most practitioners
3596 prefer to approach the subject differently. The applied side of
3597 Computational Biology is called Bioinformatics, and emphasizes the
3598 tools available for day-to-day work as well as how to actually
3599 _use_ them. One of the very few affordable books on Bioinformatics
3600 is `Developing Bioinformatics Computer Skills'.
3601
3602 3. The sequences _gawk_ and _gnuawk_ are in widespread use in the
3603 genetic material of virtually every earthly living being. Let us
3604 take this as a clear indication that the divine creator has
3605 intended `gawk' to prevail over other scripting languages such as
3606 `perl', `tcl', or `python' which are not even proper sequences.
3607 (:-)
3608
3609
3610File: gawkinet.info, Node: Links, Next: GNU Free Documentation License, Prev: Some Applications and Techniques, Up: Top
3611
36124 Related Links
3613***************
3614
3615This section lists the URLs for various items discussed in this major
3616node. They are presented in the order in which they appear.
3617
3618`Internet Programming with Python'
3619 `http://www.fsbassociates.com/books/python.htm'
3620
3621`Advanced Perl Programming'
3622 `http://www.oreilly.com/catalog/advperl'
3623
3624`Web Client Programming with Perl'
3625 `http://www.oreilly.com/catalog/webclient'
3626
3627Richard Stevens's home page and book
3628 `http://www.kohala.com/~rstevens'
3629
3630The SPAK home page
3631 `http://www.userfriendly.net/linux/RPM/contrib/libc6/i386/spak-0.6b-1.i386.html'
3632
3633Volume III of `Internetworking with TCP/IP', by Comer and Stevens
3634 `http://www.cs.purdue.edu/homes/dec/tcpip3s.cont.html'
3635
3636XBM Graphics File Format
3637 `http://www.wotsit.org/download.asp?f=xbm'
3638
3639GNUPlot
3640 `http://www.cs.dartmouth.edu/gnuplot_info.html'
3641
3642Mark Humphrys' Eliza page
3643 `http://www.compapp.dcu.ie/~humphrys/eliza.html'
3644
3645Yahoo! Eliza Information
3646 `http://dir.yahoo.com/Recreation/Games/Computer_Games/Internet_Games/Web_Games/Artificial_Intelligence'
3647
3648Java versions of Eliza
3649 `http://www.tjhsst.edu/Psych/ch1/eliza.html'
3650
3651Java versions of Eliza with source code
3652 `http://home.adelphia.net/~lifeisgood/eliza/eliza.htm'
3653
3654Eliza Programs with Explanations
3655 `http://chayden.net/chayden/eliza/Eliza.shtml'
3656
3657Loebner Contest
3658 `http://acm.org/~loebner/loebner-prize.htmlx'
3659
3660Tcl/Tk Information
3661 `http://www.scriptics.com/'
3662
3663Intel 80x86 Processors
3664 `http://developer.intel.com/design/platform/embedpc/what_is.htm'
3665
3666AMD Elan Processors
3667 `http://www.amd.com/products/epd/processors/4.32bitcont/32bitcont/index.html'
3668
3669XINU
3670 `http://willow.canberra.edu.au/~chrisc/xinu.html'
3671
3672GNU/Linux
3673 `http://uclinux.lineo.com/'
3674
3675Embedded PCs
3676 `http://dir.yahoo.com/Business_and_Economy/Business_to_Business/Computers/Hardware/Embedded_Control/'
3677
3678MiniSQL
3679 `http://www.hughes.com.au/library/'
3680
3681Market Share Surveys
3682 `http://www.netcraft.com/survey'
3683
3684`Numerical Recipes in C: The Art of Scientific Computing'
3685 `http://www.nr.com'
3686
3687VRML
3688 `http://www.vrml.org'
3689
3690The VRML FAQ
3691 `http://www.vrml.org/technicalinfo/specifications/specifications.htm#FAQ'
3692
3693The UMBC Agent Web
3694 `http://www.cs.umbc.edu/agents'
3695
3696Apache Web Server
3697 `http://www.apache.org'
3698
3699National Center for Biotechnology Information (NCBI)
3700 `http://www.ncbi.nlm.nih.gov'
3701
3702Basic Local Alignment Search Tool (BLAST)
3703 `http://www.ncbi.nlm.nih.gov/BLAST/blast_overview.html'
3704
3705NCBI Home Page
3706 `http://www.ncbi.nlm.nih.gov'
3707
3708BLAST Pages
3709 `http://www.ncbi.nlm.nih.gov/BLAST'
3710
3711BLAST Demonstration Client
3712 `ftp://ncbi.nlm.nih.gov/blast/blasturl/'
3713
3714BLAST anonymous FTP location
3715 `ftp://ncbi.nlm.nih.gov/blast/network/netblast/'
3716
3717BLAST 2.0 Executables
3718 `ftp://ncbi.nlm.nih.gov/blast/executables/'
3719
3720IUB/IUPAC Amino Acid and Nucleic Acid Codes
3721 `http://www.uthscsa.edu/geninfo/blastmail.html#item6'
3722
3723FASTA/Pearson Format
3724 `http://www.ncbi.nlm.nih.gov/BLAST/fasta.html'
3725
3726Fasta/Pearson Sequence in Java
3727 `http://www.kazusa.or.jp/java/codon_table_java/'
3728
3729Book Review of `Introduction to Computational Biology'
3730 `http://www.acm.org/crossroads/xrds5-1/introcb.html'
3731
3732`Developing Bioinformatics Computer Skills'
3733 `http://www.oreilly.com/catalog/bioskills/'
3734
3735
3736
3737File: gawkinet.info, Node: GNU Free Documentation License, Next: Index, Prev: Links, Up: Top
3738
3739GNU Free Documentation License
3740******************************
3741
3742 Version 1.2, November 2002
3743 Copyright (C) 2000,2001,2002 Free Software Foundation, Inc.
3744 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA
3745
3746 Everyone is permitted to copy and distribute verbatim copies
3747 of this license document, but changing it is not allowed.
3748
3749 0. PREAMBLE
3750
3751 The purpose of this License is to make a manual, textbook, or other
3752 functional and useful document "free" in the sense of freedom: to
3753 assure everyone the effective freedom to copy and redistribute it,
3754 with or without modifying it, either commercially or
3755 noncommercially. Secondarily, this License preserves for the
3756 author and publisher a way to get credit for their work, while not
3757 being considered responsible for modifications made by others.
3758
3759 This License is a kind of "copyleft", which means that derivative
3760 works of the document must themselves be free in the same sense.
3761 It complements the GNU General Public License, which is a copyleft
3762 license designed for free software.
3763
3764 We have designed this License in order to use it for manuals for
3765 free software, because free software needs free documentation: a
3766 free program should come with manuals providing the same freedoms
3767 that the software does. But this License is not limited to
3768 software manuals; it can be used for any textual work, regardless
3769 of subject matter or whether it is published as a printed book.
3770 We recommend this License principally for works whose purpose is
3771 instruction or reference.
3772
3773 1. APPLICABILITY AND DEFINITIONS
3774
3775 This License applies to any manual or other work, in any medium,
3776 that contains a notice placed by the copyright holder saying it
3777 can be distributed under the terms of this License. Such a notice
3778 grants a world-wide, royalty-free license, unlimited in duration,
3779 to use that work under the conditions stated herein. The
3780 "Document", below, refers to any such manual or work. Any member
3781 of the public is a licensee, and is addressed as "you". You
3782 accept the license if you copy, modify or distribute the work in a
3783 way requiring permission under copyright law.
3784
3785 A "Modified Version" of the Document means any work containing the
3786 Document or a portion of it, either copied verbatim, or with
3787 modifications and/or translated into another language.
3788
3789 A "Secondary Section" is a named appendix or a front-matter section
3790 of the Document that deals exclusively with the relationship of the
3791 publishers or authors of the Document to the Document's overall
3792 subject (or to related matters) and contains nothing that could
3793 fall directly within that overall subject. (Thus, if the Document
3794 is in part a textbook of mathematics, a Secondary Section may not
3795 explain any mathematics.) The relationship could be a matter of
3796 historical connection with the subject or with related matters, or
3797 of legal, commercial, philosophical, ethical or political position
3798 regarding them.
3799
3800 The "Invariant Sections" are certain Secondary Sections whose
3801 titles are designated, as being those of Invariant Sections, in
3802 the notice that says that the Document is released under this
3803 License. If a section does not fit the above definition of
3804 Secondary then it is not allowed to be designated as Invariant.
3805 The Document may contain zero Invariant Sections. If the Document
3806 does not identify any Invariant Sections then there are none.
3807
3808 The "Cover Texts" are certain short passages of text that are
3809 listed, as Front-Cover Texts or Back-Cover Texts, in the notice
3810 that says that the Document is released under this License. A
3811 Front-Cover Text may be at most 5 words, and a Back-Cover Text may
3812 be at most 25 words.
3813
3814 A "Transparent" copy of the Document means a machine-readable copy,
3815 represented in a format whose specification is available to the
3816 general public, that is suitable for revising the document
3817 straightforwardly with generic text editors or (for images
3818 composed of pixels) generic paint programs or (for drawings) some
3819 widely available drawing editor, and that is suitable for input to
3820 text formatters or for automatic translation to a variety of
3821 formats suitable for input to text formatters. A copy made in an
3822 otherwise Transparent file format whose markup, or absence of
3823 markup, has been arranged to thwart or discourage subsequent
3824 modification by readers is not Transparent. An image format is
3825 not Transparent if used for any substantial amount of text. A
3826 copy that is not "Transparent" is called "Opaque".
3827
3828 Examples of suitable formats for Transparent copies include plain
3829 ASCII without markup, Texinfo input format, LaTeX input format,
3830 SGML or XML using a publicly available DTD, and
3831 standard-conforming simple HTML, PostScript or PDF designed for
3832 human modification. Examples of transparent image formats include
3833 PNG, XCF and JPG. Opaque formats include proprietary formats that
3834 can be read and edited only by proprietary word processors, SGML or
3835 XML for which the DTD and/or processing tools are not generally
3836 available, and the machine-generated HTML, PostScript or PDF
3837 produced by some word processors for output purposes only.
3838
3839 The "Title Page" means, for a printed book, the title page itself,
3840 plus such following pages as are needed to hold, legibly, the
3841 material this License requires to appear in the title page. For
3842 works in formats which do not have any title page as such, "Title
3843 Page" means the text near the most prominent appearance of the
3844 work's title, preceding the beginning of the body of the text.
3845
3846 A section "Entitled XYZ" means a named subunit of the Document
3847 whose title either is precisely XYZ or contains XYZ in parentheses
3848 following text that translates XYZ in another language. (Here XYZ
3849 stands for a specific section name mentioned below, such as
3850 "Acknowledgements", "Dedications", "Endorsements", or "History".)
3851 To "Preserve the Title" of such a section when you modify the
3852 Document means that it remains a section "Entitled XYZ" according
3853 to this definition.
3854
3855 The Document may include Warranty Disclaimers next to the notice
3856 which states that this License applies to the Document. These
3857 Warranty Disclaimers are considered to be included by reference in
3858 this License, but only as regards disclaiming warranties: any other
3859 implication that these Warranty Disclaimers may have is void and
3860 has no effect on the meaning of this License.
3861
3862 2. VERBATIM COPYING
3863
3864 You may copy and distribute the Document in any medium, either
3865 commercially or noncommercially, provided that this License, the
3866 copyright notices, and the license notice saying this License
3867 applies to the Document are reproduced in all copies, and that you
3868 add no other conditions whatsoever to those of this License. You
3869 may not use technical measures to obstruct or control the reading
3870 or further copying of the copies you make or distribute. However,
3871 you may accept compensation in exchange for copies. If you
3872 distribute a large enough number of copies you must also follow
3873 the conditions in section 3.
3874
3875 You may also lend copies, under the same conditions stated above,
3876 and you may publicly display copies.
3877
3878 3. COPYING IN QUANTITY
3879
3880 If you publish printed copies (or copies in media that commonly
3881 have printed covers) of the Document, numbering more than 100, and
3882 the Document's license notice requires Cover Texts, you must
3883 enclose the copies in covers that carry, clearly and legibly, all
3884 these Cover Texts: Front-Cover Texts on the front cover, and
3885 Back-Cover Texts on the back cover. Both covers must also clearly
3886 and legibly identify you as the publisher of these copies. The
3887 front cover must present the full title with all words of the
3888 title equally prominent and visible. You may add other material
3889 on the covers in addition. Copying with changes limited to the
3890 covers, as long as they preserve the title of the Document and
3891 satisfy these conditions, can be treated as verbatim copying in
3892 other respects.
3893
3894 If the required texts for either cover are too voluminous to fit
3895 legibly, you should put the first ones listed (as many as fit
3896 reasonably) on the actual cover, and continue the rest onto
3897 adjacent pages.
3898
3899 If you publish or distribute Opaque copies of the Document
3900 numbering more than 100, you must either include a
3901 machine-readable Transparent copy along with each Opaque copy, or
3902 state in or with each Opaque copy a computer-network location from
3903 which the general network-using public has access to download
3904 using public-standard network protocols a complete Transparent
3905 copy of the Document, free of added material. If you use the
3906 latter option, you must take reasonably prudent steps, when you
3907 begin distribution of Opaque copies in quantity, to ensure that
3908 this Transparent copy will remain thus accessible at the stated
3909 location until at least one year after the last time you
3910 distribute an Opaque copy (directly or through your agents or
3911 retailers) of that edition to the public.
3912
3913 It is requested, but not required, that you contact the authors of
3914 the Document well before redistributing any large number of
3915 copies, to give them a chance to provide you with an updated
3916 version of the Document.
3917
3918 4. MODIFICATIONS
3919
3920 You may copy and distribute a Modified Version of the Document
3921 under the conditions of sections 2 and 3 above, provided that you
3922 release the Modified Version under precisely this License, with
3923 the Modified Version filling the role of the Document, thus
3924 licensing distribution and modification of the Modified Version to
3925 whoever possesses a copy of it. In addition, you must do these
3926 things in the Modified Version:
3927
3928 A. Use in the Title Page (and on the covers, if any) a title
3929 distinct from that of the Document, and from those of
3930 previous versions (which should, if there were any, be listed
3931 in the History section of the Document). You may use the
3932 same title as a previous version if the original publisher of
3933 that version gives permission.
3934
3935 B. List on the Title Page, as authors, one or more persons or
3936 entities responsible for authorship of the modifications in
3937 the Modified Version, together with at least five of the
3938 principal authors of the Document (all of its principal
3939 authors, if it has fewer than five), unless they release you
3940 from this requirement.
3941
3942 C. State on the Title page the name of the publisher of the
3943 Modified Version, as the publisher.
3944
3945 D. Preserve all the copyright notices of the Document.
3946
3947 E. Add an appropriate copyright notice for your modifications
3948 adjacent to the other copyright notices.
3949
3950 F. Include, immediately after the copyright notices, a license
3951 notice giving the public permission to use the Modified
3952 Version under the terms of this License, in the form shown in
3953 the Addendum below.
3954
3955 G. Preserve in that license notice the full lists of Invariant
3956 Sections and required Cover Texts given in the Document's
3957 license notice.
3958
3959 H. Include an unaltered copy of this License.
3960
3961 I. Preserve the section Entitled "History", Preserve its Title,
3962 and add to it an item stating at least the title, year, new
3963 authors, and publisher of the Modified Version as given on
3964 the Title Page. If there is no section Entitled "History" in
3965 the Document, create one stating the title, year, authors,
3966 and publisher of the Document as given on its Title Page,
3967 then add an item describing the Modified Version as stated in
3968 the previous sentence.
3969
3970 J. Preserve the network location, if any, given in the Document
3971 for public access to a Transparent copy of the Document, and
3972 likewise the network locations given in the Document for
3973 previous versions it was based on. These may be placed in
3974 the "History" section. You may omit a network location for a
3975 work that was published at least four years before the
3976 Document itself, or if the original publisher of the version
3977 it refers to gives permission.
3978
3979 K. For any section Entitled "Acknowledgements" or "Dedications",
3980 Preserve the Title of the section, and preserve in the
3981 section all the substance and tone of each of the contributor
3982 acknowledgements and/or dedications given therein.
3983
3984 L. Preserve all the Invariant Sections of the Document,
3985 unaltered in their text and in their titles. Section numbers
3986 or the equivalent are not considered part of the section
3987 titles.
3988
3989 M. Delete any section Entitled "Endorsements". Such a section
3990 may not be included in the Modified Version.
3991
3992 N. Do not retitle any existing section to be Entitled
3993 "Endorsements" or to conflict in title with any Invariant
3994 Section.
3995
3996 O. Preserve any Warranty Disclaimers.
3997
3998 If the Modified Version includes new front-matter sections or
3999 appendices that qualify as Secondary Sections and contain no
4000 material copied from the Document, you may at your option
4001 designate some or all of these sections as invariant. To do this,
4002 add their titles to the list of Invariant Sections in the Modified
4003 Version's license notice. These titles must be distinct from any
4004 other section titles.
4005
4006 You may add a section Entitled "Endorsements", provided it contains
4007 nothing but endorsements of your Modified Version by various
4008 parties--for example, statements of peer review or that the text
4009 has been approved by an organization as the authoritative
4010 definition of a standard.
4011
4012 You may add a passage of up to five words as a Front-Cover Text,
4013 and a passage of up to 25 words as a Back-Cover Text, to the end
4014 of the list of Cover Texts in the Modified Version. Only one
4015 passage of Front-Cover Text and one of Back-Cover Text may be
4016 added by (or through arrangements made by) any one entity. If the
4017 Document already includes a cover text for the same cover,
4018 previously added by you or by arrangement made by the same entity
4019 you are acting on behalf of, you may not add another; but you may
4020 replace the old one, on explicit permission from the previous
4021 publisher that added the old one.
4022
4023 The author(s) and publisher(s) of the Document do not by this
4024 License give permission to use their names for publicity for or to
4025 assert or imply endorsement of any Modified Version.
4026
4027 5. COMBINING DOCUMENTS
4028
4029 You may combine the Document with other documents released under
4030 this License, under the terms defined in section 4 above for
4031 modified versions, provided that you include in the combination
4032 all of the Invariant Sections of all of the original documents,
4033 unmodified, and list them all as Invariant Sections of your
4034 combined work in its license notice, and that you preserve all
4035 their Warranty Disclaimers.
4036
4037 The combined work need only contain one copy of this License, and
4038 multiple identical Invariant Sections may be replaced with a single
4039 copy. If there are multiple Invariant Sections with the same name
4040 but different contents, make the title of each such section unique
4041 by adding at the end of it, in parentheses, the name of the
4042 original author or publisher of that section if known, or else a
4043 unique number. Make the same adjustment to the section titles in
4044 the list of Invariant Sections in the license notice of the
4045 combined work.
4046
4047 In the combination, you must combine any sections Entitled
4048 "History" in the various original documents, forming one section
4049 Entitled "History"; likewise combine any sections Entitled
4050 "Acknowledgements", and any sections Entitled "Dedications". You
4051 must delete all sections Entitled "Endorsements."
4052
4053 6. COLLECTIONS OF DOCUMENTS
4054
4055 You may make a collection consisting of the Document and other
4056 documents released under this License, and replace the individual
4057 copies of this License in the various documents with a single copy
4058 that is included in the collection, provided that you follow the
4059 rules of this License for verbatim copying of each of the
4060 documents in all other respects.
4061
4062 You may extract a single document from such a collection, and
4063 distribute it individually under this License, provided you insert
4064 a copy of this License into the extracted document, and follow
4065 this License in all other respects regarding verbatim copying of
4066 that document.
4067
4068 7. AGGREGATION WITH INDEPENDENT WORKS
4069
4070 A compilation of the Document or its derivatives with other
4071 separate and independent documents or works, in or on a volume of
4072 a storage or distribution medium, is called an "aggregate" if the
4073 copyright resulting from the compilation is not used to limit the
4074 legal rights of the compilation's users beyond what the individual
4075 works permit. When the Document is included in an aggregate, this
4076 License does not apply to the other works in the aggregate which
4077 are not themselves derivative works of the Document.
4078
4079 If the Cover Text requirement of section 3 is applicable to these
4080 copies of the Document, then if the Document is less than one half
4081 of the entire aggregate, the Document's Cover Texts may be placed
4082 on covers that bracket the Document within the aggregate, or the
4083 electronic equivalent of covers if the Document is in electronic
4084 form. Otherwise they must appear on printed covers that bracket
4085 the whole aggregate.
4086
4087 8. TRANSLATION
4088
4089 Translation is considered a kind of modification, so you may
4090 distribute translations of the Document under the terms of section
4091 4. Replacing Invariant Sections with translations requires special
4092 permission from their copyright holders, but you may include
4093 translations of some or all Invariant Sections in addition to the
4094 original versions of these Invariant Sections. You may include a
4095 translation of this License, and all the license notices in the
4096 Document, and any Warranty Disclaimers, provided that you also
4097 include the original English version of this License and the
4098 original versions of those notices and disclaimers. In case of a
4099 disagreement between the translation and the original version of
4100 this License or a notice or disclaimer, the original version will
4101 prevail.
4102
4103 If a section in the Document is Entitled "Acknowledgements",
4104 "Dedications", or "History", the requirement (section 4) to
4105 Preserve its Title (section 1) will typically require changing the
4106 actual title.
4107
4108 9. TERMINATION
4109
4110 You may not copy, modify, sublicense, or distribute the Document
4111 except as expressly provided for under this License. Any other
4112 attempt to copy, modify, sublicense or distribute the Document is
4113 void, and will automatically terminate your rights under this
4114 License. However, parties who have received copies, or rights,
4115 from you under this License will not have their licenses
4116 terminated so long as such parties remain in full compliance.
4117
4118 10. FUTURE REVISIONS OF THIS LICENSE
4119
4120 The Free Software Foundation may publish new, revised versions of
4121 the GNU Free Documentation License from time to time. Such new
4122 versions will be similar in spirit to the present version, but may
4123 differ in detail to address new problems or concerns. See
4124 `http://www.gnu.org/copyleft/'.
4125
4126 Each version of the License is given a distinguishing version
4127 number. If the Document specifies that a particular numbered
4128 version of this License "or any later version" applies to it, you
4129 have the option of following the terms and conditions either of
4130 that specified version or of any later version that has been
4131 published (not as a draft) by the Free Software Foundation. If
4132 the Document does not specify a version number of this License,
4133 you may choose any version ever published (not as a draft) by the
4134 Free Software Foundation.
4135
4136ADDENDUM: How to use this License for your documents
4137====================================================
4138
4139To use this License in a document you have written, include a copy of
4140the License in the document and put the following copyright and license
4141notices just after the title page:
4142
4143 Copyright (C) YEAR YOUR NAME.
4144 Permission is granted to copy, distribute and/or modify this document
4145 under the terms of the GNU Free Documentation License, Version 1.2
4146 or any later version published by the Free Software Foundation;
4147 with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
4148 A copy of the license is included in the section entitled ``GNU
4149 Free Documentation License''.
4150
4151If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
4152replace the "with...Texts." line with this:
4153
4154 with the Invariant Sections being LIST THEIR TITLES, with
4155 the Front-Cover Texts being LIST, and with the Back-Cover Texts
4156 being LIST.
4157
4158If you have Invariant Sections without Cover Texts, or some other
4159combination of the three, merge those two alternatives to suit the
4160situation.
4161
4162If your document contains nontrivial examples of program code, we
4163recommend releasing these examples in parallel under your choice of
4164free software license, such as the GNU General Public License, to
4165permit their use in free software.
4166
4167
4168File: gawkinet.info, Node: Index, Prev: GNU Free Documentation License, Up: Top
4169
4170Index
4171*****
4172
4173* Menu:
4174
4175* /inet/ files (gawk): Gawk Special Files. (line 490)
4176* /inet/raw special files (gawk): File /inet/raw. (line 712)
4177* /inet/tcp special files (gawk): File /inet/tcp. (line 647)
4178* /inet/udp special files (gawk): File /inet/udp. (line 679)
4179* advanced features, network connections: Troubleshooting. (line 834)
4180* agent <1>: MOBAGWHO. (line 2766)
4181* agent: Challenges. (line 1887)
4182* AI: Challenges. (line 1887)
4183* apache <1>: MOBAGWHO. (line 2802)
4184* apache: WEBGRAB. (line 2372)
4185* Bioinformatics: PROTBASE. (line 3590)
4186* BLAST, Basic Local Alignment Search Tool: PROTBASE. (line 3369)
4187* blocking: Making Connections. (line 383)
4188* Boutell, Thomas: STATIST. (line 2396)
4189* CGI (Common Gateway Interface): MOBAGWHO. (line 2802)
4190* CGI (Common Gateway Interface), dynamic web pages and: Web page.
4191 (line 1130)
4192* CGI (Common Gateway Interface), library: CGI Lib. (line 1418)
4193* clients: Making Connections. (line 369)
4194* Clinton, Bill: Challenges. (line 1870)
4195* Common Gateway Interface, See CGI: Web page. (line 1130)
4196* Computational Biology: PROTBASE. (line 3590)
4197* contest: Challenges. (line 1817)
4198* cron utility: STOXPRED. (line 3068)
4199* CSV format: STOXPRED. (line 3173)
4200* dark corner, RAW protocol: File /inet/raw. (line 719)
4201* Dow Jones Industrial Index: STOXPRED. (line 3089)
4202* ELIZA program: Simple Server. (line 1606)
4203* email: Email. (line 1045)
4204* FASTA/Pearson format: PROTBASE. (line 3465)
4205* FDL (Free Documentation License): GNU Free Documentation License.
4206 (line 3742)
4207* filenames, for network access: Gawk Special Files. (line 485)
4208* files, /inet/ (gawk): Gawk Special Files. (line 490)
4209* files, /inet/raw (gawk): File /inet/raw. (line 712)
4210* files, /inet/tcp (gawk): File /inet/tcp. (line 647)
4211* files, /inet/udp (gawk): File /inet/udp. (line 679)
4212* finger utility: Setting Up. (line 981)
4213* Free Documentation License (FDL): GNU Free Documentation License.
4214 (line 3742)
4215* FTP (File Transfer Protocol): Basic Protocols. (line 316)
4216* gawk, networking: Using Networking. (line 414)
4217* gawk, networking, connections <1>: TCP Connecting. (line 781)
4218* gawk, networking, connections: Special File Fields.
4219 (line 549)
4220* gawk, networking, filenames: Gawk Special Files. (line 485)
4221* gawk, networking, See Also email: Email. (line 1040)
4222* gawk, networking, service, establishing: Setting Up. (line 965)
4223* gawk, networking, troubleshooting: Caveats. (line 1791)
4224* gawk, web and, See web service: Interacting Service.
4225 (line 1214)
4226* getline command: TCP Connecting. (line 786)
4227* GETURL program: GETURL. (line 2050)
4228* GIF image format <1>: STATIST. (line 2396)
4229* GIF image format: Web page. (line 1130)
4230* GNU Free Documentation License: GNU Free Documentation License.
4231 (line 3742)
4232* GNU/Linux <1>: REMCONF. (line 2107)
4233* GNU/Linux <2>: Interacting. (line 931)
4234* GNU/Linux: Troubleshooting. (line 882)
4235* GNUPlot utility <1>: STATIST. (line 2396)
4236* GNUPlot utility: Interacting Service.
4237 (line 1396)
4238* Hoare, C.A.R. <1>: PROTBASE. (line 3369)
4239* Hoare, C.A.R.: MOBAGWHO. (line 2766)
4240* hostname field: Special File Fields.
4241 (line 529)
4242* HTML (Hypertext Markup Language): Web page. (line 1114)
4243* HTTP (Hypertext Transfer Protocol) <1>: Web page. (line 1090)
4244* HTTP (Hypertext Transfer Protocol): Basic Protocols. (line 316)
4245* HTTP (Hypertext Transfer Protocol), record separators and: Web page.
4246 (line 1114)
4247* HTTP server, core logic: Interacting Service.
4248 (line 1214)
4249* Humphrys, Mark: Simple Server. (line 1774)
4250* Hypertext Markup Language (HTML): Web page. (line 1114)
4251* Hypertext Transfer Protocol, See HTTP: Web page. (line 1090)
4252* image format: STATIST. (line 2396)
4253* images, in web pages: Interacting Service.
4254 (line 1396)
4255* images, retrieving over networks: Web page. (line 1130)
4256* input/output, two-way, See Also gawk, networking: Gawk Special Files.
4257 (line 475)
4258* Internet, See networks: Interacting. (line 952)
4259* JavaScript: STATIST. (line 2446)
4260* Linux <1>: REMCONF. (line 2107)
4261* Linux <2>: Interacting. (line 931)
4262* Linux: Troubleshooting. (line 882)
4263* Lisp: MOBAGWHO. (line 2858)
4264* localport field: Gawk Special Files. (line 490)
4265* Loebner, Hugh: Challenges. (line 1817)
4266* Loui, Ronald: Challenges. (line 1887)
4267* MAZE: MAZE. (line 2634)
4268* Microsoft Windows: WEBGRAB. (line 2343)
4269* Microsoft Windows, networking: Troubleshooting. (line 882)
4270* Microsoft Windows, networking, ports: Setting Up. (line 996)
4271* MiniSQL: REMCONF. (line 2212)
4272* MOBAGWHO program: MOBAGWHO. (line 2766)
4273* NCBI, National Center for Biotechnology Information: PROTBASE.
4274 (line 3369)
4275* networks, gawk and: Using Networking. (line 414)
4276* networks, gawk and, connections <1>: TCP Connecting. (line 781)
4277* networks, gawk and, connections: Special File Fields.
4278 (line 549)
4279* networks, gawk and, filenames: Gawk Special Files. (line 485)
4280* networks, gawk and, See Also email: Email. (line 1040)
4281* networks, gawk and, service, establishing: Setting Up. (line 965)
4282* networks, gawk and, troubleshooting: Caveats. (line 1791)
4283* networks, ports, reserved: Setting Up. (line 996)
4284* networks, ports, specifying: Special File Fields.
4285 (line 518)
4286* networks, See Also web pages: PANIC. (line 2008)
4287* Numerical Recipes: STATIST. (line 2414)
4288* ORS variable, HTTP and: Web page. (line 1114)
4289* ORS variable, POP and: Email. (line 1070)
4290* PANIC program: PANIC. (line 2008)
4291* Perl: Using Networking. (line 422)
4292* Perl, gawk networking and: Using Networking. (line 432)
4293* Perlis, Alan: MAZE. (line 2634)
4294* pipes, networking and: TCP Connecting. (line 805)
4295* PNG image format <1>: STATIST. (line 2396)
4296* PNG image format: Web page. (line 1130)
4297* POP (Post Office Protocol): Email. (line 1040)
4298* Post Office Protocol (POP): Email. (line 1040)
4299* PostScript: STATIST. (line 2528)
4300* PROLOG: Challenges. (line 1887)
4301* PROTBASE: PROTBASE. (line 3369)
4302* protocol field: Special File Fields.
4303 (line 511)
4304* PS image format: STATIST. (line 2396)
4305* Python: Using Networking. (line 422)
4306* Python, gawk networking and: Using Networking. (line 432)
4307* RAW protocol: File /inet/raw. (line 712)
4308* record separators, HTTP and: Web page. (line 1114)
4309* record separators, POP and: Email. (line 1070)
4310* REMCONF program: REMCONF. (line 2107)
4311* remoteport field: Gawk Special Files. (line 490)
4312* robot <1>: WEBGRAB. (line 2306)
4313* robot: Challenges. (line 1896)
4314* RS variable, HTTP and: Web page. (line 1114)
4315* RS variable, POP and: Email. (line 1070)
4316* servers <1>: Setting Up. (line 981)
4317* servers: Making Connections. (line 362)
4318* servers, as hosts: Special File Fields.
4319 (line 529)
4320* servers, HTTP: Interacting Service.
4321 (line 1214)
4322* servers, web: Simple Server. (line 1601)
4323* Simple Mail Transfer Protocol (SMTP): Email. (line 1040)
4324* SMTP (Simple Mail Transfer Protocol) <1>: Email. (line 1040)
4325* SMTP (Simple Mail Transfer Protocol): Basic Protocols. (line 316)
4326* SPAK utility: File /inet/raw. (line 727)
4327* STATIST program: STATIST. (line 2396)
4328* STOXPRED program: STOXPRED. (line 3051)
4329* synchronous communications: Making Connections. (line 383)
4330* Tcl/Tk: Using Networking. (line 422)
4331* Tcl/Tk, gawk and <1>: Some Applications and Techniques.
4332 (line 1977)
4333* Tcl/Tk, gawk and: Using Networking. (line 432)
4334* TCP (Transmission Control Protocol) <1>: File /inet/tcp. (line 647)
4335* TCP (Transmission Control Protocol): Using Networking. (line 437)
4336* TCP (Transmission Control Protocol), connection, establishing: TCP Connecting.
4337 (line 781)
4338* TCP (Transmission Control Protocol), UDP and: Interacting. (line 952)
4339* TCP/IP, protocols, selecting: Special File Fields.
4340 (line 511)
4341* TCP/IP, sockets and: Gawk Special Files. (line 475)
4342* Transmission Control Protocol, See TCP: Using Networking. (line 437)
4343* troubleshooting, gawk, networks: Caveats. (line 1791)
4344* troubleshooting, networks, connections: Troubleshooting. (line 834)
4345* troubleshooting, networks, timeouts: Caveats. (line 1803)
4346* UDP (User Datagram Protocol): File /inet/udp. (line 679)
4347* UDP (User Datagram Protocol), TCP and: Interacting. (line 952)
4348* Unix, network ports and: Setting Up. (line 996)
4349* URLCHK program: URLCHK. (line 2225)
4350* User Datagram Protocol, See UDP: File /inet/udp. (line 679)
4351* vertical bar (|), |& operator (I/O): TCP Connecting. (line 800)
4352* VRML: MAZE. (line 2634)
4353* web browsers, See web service: Interacting Service.
4354 (line 1214)
4355* web pages: Web page. (line 1090)
4356* web pages, images in: Interacting Service.
4357 (line 1396)
4358* web pages, retrieving: GETURL. (line 2050)
4359* web servers: Simple Server. (line 1601)
4360* web service <1>: PANIC. (line 2008)
4361* web service: Primitive Service. (line 1156)
4362* WEBGRAB program: WEBGRAB. (line 2306)
4363* Weizenbaum, Joseph: Simple Server. (line 1606)
4364* XBM image format: Interacting Service.
4365 (line 1396)
4366* Yahoo! <1>: STOXPRED. (line 3051)
4367* Yahoo!: REMCONF. (line 2107)
4368* | (vertical bar), |& operator (I/O): TCP Connecting. (line 800)
4369
4370
4371
4372Tag Table:
4373Node: Top2000
4374Node: Preface5688
4375Node: Introduction7063
4376Node: Stream Communications8088
4377Node: Datagram Communications9261
4378Node: The TCP/IP Protocols10892
4379Ref: The TCP/IP Protocols-Footnote-111576
4380Node: Basic Protocols11733
4381Node: Ports13055
4382Node: Making Connections14460
4383Ref: Making Connections-Footnote-117027
4384Ref: Making Connections-Footnote-217074
4385Node: Using Networking17255
4386Node: Gawk Special Files19609
4387Node: Special File Fields21609
4388Ref: table-inet-components25353
4389Node: Comparing Protocols28235
4390Node: File /inet/tcp28824
4391Node: File /inet/udp29844
4392Node: File /inet/raw30959
4393Ref: File /inet/raw-Footnote-133974
4394Node: TCP Connecting34051
4395Node: Troubleshooting36380
4396Ref: Troubleshooting-Footnote-139424
4397Node: Interacting39964
4398Node: Setting Up42684
4399Node: Email46166
4400Node: Web page48485
4401Ref: Web page-Footnote-151272
4402Node: Primitive Service51466
4403Node: Interacting Service54194
4404Ref: Interacting Service-Footnote-163291
4405Node: CGI Lib63320
4406Node: Simple Server70269
4407Ref: Simple Server-Footnote-177975
4408Node: Caveats78073
4409Node: Challenges79213
4410Node: Some Applications and Techniques87874
4411Node: PANIC90322
4412Node: GETURL92034
4413Node: REMCONF94650
4414Node: URLCHK100114
4415Node: WEBGRAB103937
4416Node: STATIST108367
4417Ref: STATIST-Footnote-1120029
4418Node: MAZE120471
4419Node: MOBAGWHO126646
4420Ref: MOBAGWHO-Footnote-1140547
4421Node: STOXPRED140599
4422Node: PROTBASE154809
4423Node: Links167844
4424Node: GNU Free Documentation License171278
4425Node: Index193671
4426
4427End Tag Table