When I attended IPROF 98 in Sunnyvale
this year, I noticed that several people using web servers on their HP
3000 seem to have CGI programs written in COBOL, SPLash or similar languages
that are called from "intermediate" Perl or Shell scripts instead
of being invoked directly from the httpd server process.
As far as I understand, these "intermediate" scripts are mainly
used for some kind of parameter passing. For example, they put Posix environment
variables set up by the httpd parent process into the INFO string of the
COBOL or SPLash program. However, the additional Perl interpreter or Posix Shell
process also adds overhead and resource needs. Many lightweight
(native) CGI programs are written in C because handling the stdin/stdout
sockets or pipes, and accessing Posix environment variables with getenv(),
is easy in C. But (at least as of MPE/iX 5.5) the same can be done in
other programming languages as well.
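For comparison, here is a minimal C sketch of the kind of native CGI program meant here. It is not taken from the white paper or from the COBOL sample. It assumes a Unix-style server that presets REQUEST_METHOD and QUERY_STRING. The function name write_cgi_response is hypothetical, and it takes a FILE * only so that it can be tested without a live server.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: writes a minimal CGI response to OUT. It emits a
   header line, the blank line that ends the header block, and a small HTML
   body built from two environment variables preset by httpd. In a real CGI
   program OUT would simply be stdout. Returns 0 on success, -1 on error. */
int write_cgi_response(FILE *out)
{
    const char *method = getenv("REQUEST_METHOD"); /* pointer or NULL */
    const char *query = getenv("QUERY_STRING");

    if (method == NULL)
        method = "(unset)";
    if (query == NULL)
        query = "";

    if (fprintf(out, "Content-type: text/html\n\n") < 0)
        return -1;
    /* Only the length of the query is echoed. This avoids copying
       unchecked client input into the HTML page. */
    if (fprintf(out, "<html><body>Method: %s<br>Query length: %lu</body></html>\n",
                method, (unsigned long)strlen(query)) < 0)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```

A real CGI program would call write_cgi_response(stdout) from main() and return its result.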
The key tips on this issue have been available for quite a while in the
white paper on Web Servers and CGI at http://www.hp.com/csy. That white
paper does not yet seem to include a COBOL example (only C, Perl and Pascal
so far), so I thought I'd create one in COBOL.
[Interactive HTML form, not reproduced here: calling the COBOL CGI with method GET (test runs with 10, 100 or 1000 lines) and calling the COBOL CGI with method POST.]
And here are some notes on how all this works...
Please bear with me as I am neither an expert on designing fancy HTML
Forms (as you might already have noticed above ;-) nor a longtime COBOL
programmer (as you might find out when looking at the source code supplied
below). I hope that this stuff can still be useful for you.
Here is the COBOL source code for the CGI
program handling the above HTML Form.
(By the way, use View Source in your browser to explore the above
HTML form tags).
The two main "tricks" used are (a) to call the READX and PRINT
intrinsics for accessing the stdin and stdout socket or pipe and (b) to
call a few subroutines from the Posix C library to read some important
Posix environment variables preset by the httpd parent process.
Using READX and PRINT instead of the typical COBOL (or whatever
language) standard I/O verbs/routines is important because the CGI concept
uses the Posix process creation model where the httpd server uses fork()
to create a child process, does some setup to "wire" the child's
stdin and stdout to the open client connection socket or a pipe to the
parent process, presets a number of Posix environment variables and finally
calls exec() to replace the httpd program code with the configured CGI program
(in the child process context).
If you used COBOL standard I/O verbs such as DISPLAY, the program's
initialization would (re)open its $STDIN and $STDLIST and thus
break the setup prepared by the httpd parent process. Your output, for
example, would probably end up in the server job listing instead of being
sent back to the client across the network socket connection.
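The fork()/exec() sequence described above can be modelled roughly as follows. This is a simplified, hypothetical C sketch of what a Unix-style httpd does for one request. It is not code from any actual server. The function name run_cgi and its parameters are assumptions, and only two environment variables are preset. A real server also handles the request body, timeouts and many more CGI variables.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical model of how a Unix-style httpd starts one CGI program.
   IN_FD and OUT_FD stand for the client socket or a pipe to the server.
   Returns the child's exit status, or -1 on error. */
int run_cgi(const char *path, char *const argv[], int in_fd, int out_fd,
            const char *method, const char *query)
{
    pid_t pid = fork();                 /* create the child process */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: "wire" stdin and stdout to the connection or pipe. */
        if (in_fd != STDIN_FILENO && dup2(in_fd, STDIN_FILENO) < 0)
            _exit(127);
        if (out_fd != STDOUT_FILENO && dup2(out_fd, STDOUT_FILENO) < 0)
            _exit(127);
        /* Preset environment variables for the CGI program to read. */
        setenv("REQUEST_METHOD", method, 1);
        setenv("QUERY_STRING", query, 1);
        /* Replace the server code with the CGI program. */
        execv(path, argv);
        _exit(127);                     /* reached only if execv() failed */
    }

    /* Parent: wait for the CGI program to finish. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

When the CGI program starts, its descriptors 0 and 1 already point at the connection. That is why the COBOL program must not let its initialization reopen $STDIN and $STDLIST.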
Please note that you need MPE/iX 5.5 or later to use READX
and PRINT this way.
Reading the Posix Environment Variables is accomplished by calling
the getenv() subroutine from the Posix C library. As the getenv() function
returns a pointer to a C-style string (a sequence of characters terminated
by the ASCII character NUL), the program also uses functions like strlen()
and strncpy() to determine the length and copy an appropriate number of
characters into a working-storage section buffer. Some care must be taken
in the COBOL program as the Posix library routines deal with pointers (passed
by value).
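Written directly in C, the getenv()/strlen()/strncpy() steps above look roughly like this. This is only a sketch. The function name get_env_field is hypothetical. The blank padding assumes the target is a fixed-length COBOL-style PIC X(n) field rather than a NUL-terminated C string.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the value of Posix environment variable NAME
   into FIELD, a fixed-length buffer of FIELD_LEN bytes. The buffer is not
   NUL-terminated but blank-padded, like a COBOL PIC X(n) item. Longer
   values are truncated. Returns the number of characters copied, or 0 if
   the variable is unset. */
size_t get_env_field(const char *name, char *field, size_t field_len)
{
    const char *value = getenv(name);   /* C string pointer, or NULL */
    size_t n = 0;

    if (value != NULL) {
        n = strlen(value);              /* length up to the NUL terminator */
        if (n > field_len)
            n = field_len;              /* truncate instead of overrunning */
        strncpy(field, value, n);       /* copies exactly n chars, no NUL */
    }
    memset(field + n, ' ', field_len - n); /* blank-pad the remainder */
    return n;
}
```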
As the COBOL program calls external routines from the Posix C library,
the compiled object file has to be linked with the /lib/libc.a NMRL using
appropriate options. Please see the comments at the beginning of the COBOL
source code for details regarding this.
Note also that Posix environment variables are not to be
confused with the regular CI variables (handled with the HPCIGETVAR intrinsic,
for example). CI variables apply to the job/session context and are
thus shared between different processes inside a job or session process
tree. Posix environment variables belong to the process context and are
not shared between processes, except that a child process created
by fork() inherits a copy of its parent's variables as its initial set.
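A small C program like the sketch below can check the per-process behaviour of Posix environment variables. DEMO_VAR and env_is_per_process are hypothetical names. CI variables are MPE-specific and are not covered here.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: a forked child starts with a copy of the parent's
   environment variables, but the parent does not see changes the child
   makes. Returns 1 if both properties hold, 0 otherwise. */
int env_is_per_process(void)
{
    setenv("DEMO_VAR", "parent", 1);

    pid_t pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0) {
        /* Child: the inherited copy should hold the parent's value. */
        const char *v = getenv("DEMO_VAR");
        int inherited = (v != NULL && strcmp(v, "parent") == 0);
        setenv("DEMO_VAR", "child", 1); /* changes only the child's copy */
        _exit(inherited ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return 0;
    int child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;

    /* Parent: its own copy is unaffected by the child's setenv(). */
    const char *v = getenv("DEMO_VAR");
    return child_ok && v != NULL && strcmp(v, "parent") == 0;
}
```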
Miscellaneous notes...
The sample program does not (yet) perform extensive error checking.
This keeps it brief, but it might not be appropriate for
production use. The intrinsic calls, for example, do not (yet) check
the condition code, which is poor programming style. Even the input parameters
sent by the client are not checked thoroughly in this example. For the GET method
they arrive via the URL (passed by the httpd parent in the
QUERY_STRING env var), and for the POST method they arrive via stdin.
Additional checking is recommended so that bad input from buggy forms or
malicious clients cannot cause trouble through invalid characters or buffer overruns, for example.
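The sketch below shows one possible form of such checking. It rejects a query string that is too long or that contains characters outside a conservative whitelist: letters, digits, the unreserved characters * - . _ and the separators = & + %. The function name and the whitelist are assumptions. A production program would also validate each decoded field against what the form actually allows.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical check: returns 1 if S is at most MAX_LEN characters long and
   contains only characters that may appear in a URL-encoded form string,
   0 otherwise (including when S is NULL, e.g. an unset variable). */
int query_looks_valid(const char *s, size_t max_len)
{
    if (s == NULL)
        return 0;

    size_t n = strlen(s);
    if (n > max_len)
        return 0;                       /* too long for the target buffer */

    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        /* c is never NUL here, so strchr() cannot match the terminator */
        if (!isalnum(c) && strchr("*-._=&+%", c) == NULL)
            return 0;                   /* character outside the whitelist */
    }
    return 1;
}
```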
The sample program also does not (yet) decode the input parameters,
which are typically URL encoded to deal with special characters. You will
notice this when examining the output of the POST Method example closely.
If the text supplied in the input field contains blanks or "special"
characters, the client maps blanks to + characters and special characters
to %xx sequences (xx being the hexadecimal character code) before sending
them. A full-blown CGI program must contain a subroutine to
decode these back to the original strings.
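A decoding subroutine might look like the following C sketch. The name url_decode is hypothetical. It applies the two rules above (+ becomes a blank, %xx becomes the character with that hexadecimal code) and deliberately leaves malformed % sequences unchanged.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the value of hexadecimal digit C, or -1 if C is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decodes a URL-encoded string in place. '+' becomes
   a blank and %xx becomes the byte with hex value xx. Malformed %
   sequences are copied unchanged. Returns the new length. */
size_t url_decode(char *s)
{
    char *out = s;

    for (const char *in = s; *in != '\0'; in++) {
        if (*in == '+') {
            *out++ = ' ';
        } else if (*in == '%' && hex_value((unsigned char)in[1]) >= 0
                   && hex_value((unsigned char)in[2]) >= 0) {
            /* in[2] is read only after in[1] proved to be a hex digit */
            *out++ = (char)(hex_value((unsigned char)in[1]) * 16
                            + hex_value((unsigned char)in[2]));
            in += 2;                    /* skip the two hex digits */
        } else {
            *out++ = *in;
        }
    }
    *out = '\0';
    return (size_t)(out - s);
}
```

Decoding should happen after the input has been split into name=value pairs, because a decoded value may itself contain & or = characters.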
If you look at the source code, you will notice that it uses STRING
and UNSTRING verbs quite extensively. The latter are used to decode/break-up
the input fields from the HTML Form, the former are used to compose output
in the buffer before calling the PRINT intrinsic, for example. The
COBOL preprocessor macro facility will probably be quite helpful in making
such source code more compact, less error-prone and more readable.
Keep this suggestion in mind when you start creating your own CGI programs.
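The C sketch below corresponds roughly to the break-up the COBOL program does with UNSTRING (delimited by & and =). get_param is a hypothetical name. The returned value is still URL-encoded, and OUT_SIZE must be at least 1.

```c
#include <string.h>

/* Hypothetical helper: looks up parameter NAME in a query string of the
   form "a=1&b=2". It copies the (still URL-encoded) value into OUT,
   NUL-terminated and truncated to fit OUT_SIZE bytes (OUT_SIZE >= 1).
   Returns 1 if NAME was found, 0 otherwise. */
int get_param(const char *query, const char *name, char *out, size_t out_size)
{
    size_t name_len = strlen(name);
    const char *p = query;

    while (*p != '\0') {
        const char *end = strchr(p, '&');   /* end of this name=value pair */
        if (end == NULL)
            end = p + strlen(p);

        /* Match only "NAME=" at the start of the pair, not a prefix of a
           longer name such as "lines" when looking for "line". */
        if ((size_t)(end - p) > name_len && strncmp(p, name, name_len) == 0
            && p[name_len] == '=') {
            const char *value = p + name_len + 1;
            size_t n = (size_t)(end - value);
            if (n >= out_size)
                n = out_size - 1;           /* truncate, avoid overrun */
            memcpy(out, value, n);
            out[n] = '\0';
            return 1;
        }
        p = (*end == '&') ? end + 1 : end;  /* move to the next pair */
    }
    return 0;
}
```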
A final note regarding buffered output: the GET Method part of the sample
program implements the output loop by calling the PRINT intrinsic for every
line sent. The POST Method part combines multiple lines of output in the
buffer and thus calls the PRINT intrinsic less frequently. Depending on the
performance and load of your 3000, you might see quite a difference in the
1000-line example, both in the download time of the resulting web page and in
the resource usage on the 3000 side. I have written the GET and POST
parts this way to illustrate these two approaches.
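The difference between the two strategies can be illustrated in C. Here write() on a file descriptor stands in for the PRINT intrinsic, which is an analogy, not MPE code. The hypothetical write_lines_buffered collects lines in a 4096-byte buffer (a size chosen arbitrarily for the sketch) and counts how many write() calls it makes. The line format "Line N<br>" is also an assumption.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes LINES numbered lines to FD. It collects them
   in a buffer and issues one write() per full buffer instead of one per
   line. Returns the number of write() calls made, or -1 on error. */
long write_lines_buffered(int fd, int lines)
{
    char buf[4096];
    size_t used = 0;
    long calls = 0;

    for (int i = 1; i <= lines; i++) {
        char line[64];
        int len = snprintf(line, sizeof line, "Line %d<br>\n", i);
        if (len < 0 || (size_t)len >= sizeof line)
            return -1;                      /* formatting failed */

        if (used + (size_t)len > sizeof buf) {
            /* Buffer full: send it with a single call. */
            if (write(fd, buf, used) != (ssize_t)used)
                return -1;
            calls++;
            used = 0;
        }
        memcpy(buf + used, line, (size_t)len);
        used += (size_t)len;
    }

    if (used > 0) {                         /* flush the remainder */
        if (write(fd, buf, used) != (ssize_t)used)
            return -1;
        calls++;
    }
    return calls;
}
```

With this line format the 1000-line case produces 12,893 bytes. Buffered this way, it takes 4 write() calls. Writing each line separately, as the GET part of the sample does with PRINT, would take 1000.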
By the way, most of the issues discussed with this COBOL CGI example
only apply to HP 3000 web servers that were ported from Unix
and thus use the CGI model based on the fork() and exec() Posix library calls.
Things can be much easier when you are using the QWEBS web server from
http://www.qss.com, which was written
for MPE and thus supports features like calling XL subroutines from within
the HTTP server process.
Lars Appel, 26-Mar-98 / 10-Apr-98