Abstract: Pegasus is another web server for Plan 9 operating system. Pegasus enables natural management of web document: creator of web pages should be able to do everything for the pages. To enable such management in secure manner, the server is running in name space that is separated from other user's documents. This feature comes from novel ability of Plan 9: per process name space configuration. This paper presents briefly the feature of Pegasus.
Keyword: Internet security, web service, Plan 9, httpd, CGI, configurable name space
Author: Kenji Arisawa
Faculty of Business Administration, Aichi University
370 Kurozasa, Miyoshi-cho, Nishikamo-gun, Aichi-ken, 470-0296 Japan
Email: arisawa@aichi-u.ac.jp
Date: 2006/11/13 (revision)
World Wide Web has grown to be one of the most important information service today. Common Gateway Interface (CGI) plays important role in the service. Some services accept data that are written in input fields of a browser, and process the data so that the server can respond to the request of the client. Some web documents are created on the fly by the server so that the client can look most recent information constructed from accumulated data in database. These two examples are typical cases that servers use CGI mechanism in servicing web documents.
It is said that writing a secure program for CGI is difficult even for professional programmers. CGI programs working on regular input data without problems have sometimes security halls. You will find many examples of such cases on W3C Security FAQ.
The problem comes from the fact: commonly used CGI programs can access whole name space of the system as shown in Fig.1, where "real space", illustrated by outer rectangle, denotes the whole name space of the system.
We can find some efforts to overcome this problem by adding a wrapper program to an existing httpd: cgiwrapper enables user's CGI scripts to run as the user's privilege so that other user's files can be protected from these script; sbox is a sort of cgiwrapper that optionally restricts access space of user's CGI script to his/her home directory by using "chroot" system call of UNIX.
How does httpd for Plan 9? One of the novel features of Plan 9 is the concept of name space: the name space is dynamically configurable by each process. The configurability is much smarter than that of UNIX. The configured name space of Plan 9 never affects other processes except the child processes. The configured information is not in disk storage; the information is not left even if the httpd or any child process such as CGI process are accidentally terminated. Therefore we expect: we will have much smarter and much safer httpd than that of UNIX world.
The name space of official httpd from Bell-labs is illustrated in Fig.2.
Pegasus is another httpd in Plan 9 world. Pegasus is designed to be a secure and flexible httpd that is based on the novel feature of Plan 9. This paper shows how "per process name space configuration" can be applied to httpd and shows how the users get benefit from the ability.
This section illustrates briefly basic feature of Pegasus 2.2. Most of them are same as previous version. The important advance is in the support of CGI/1.1 and digest authentication.
Currently, Pegasus 2.3 is in testing. Bugs that are found in developing the new version are commented in footnote.
The location is fixed for each user. On the other hand, httpd roots of real host and virtual hosts are configured in "/sys/lib/httpd.rewrite". Let the httpd root for bob be "/usr/bob/www/", and the httpd root for carol be "/usr/carol/www/".
Basic structure of the name space under httpd root is uniform: each httpd root has a set of common subdirectories. Let $web be a path to the httpd root for a user. Then the directory
$web/doc/is document root of the user, and the directory
$web/etc/is the place for access control files, and the directories
$web/bin/$cputype/(where "$cputype" is "386" for PC compatible machines,) and
$web/bin/rc/are the places for executable binaries and scripts that is owned by the user. Following instructions in "/lib/namespace.httpd", Pegasus merges these executables with the executables that are offered by david. The merge is virtual, i.e., the result is not in disk torage. The magic comes from "bind" system call of Plan 9. The result will be seen in "/doc/" and "/bin/" in httpd space.
In servicing documents of alice, Pegasus will see only files under "/usr/alice/web/" and files that are provided by david as illustrated in Fig.4.
The figure shows that the file "/usr/alice/web/doc/index.html" will be seen as "/doc/index.html" by httpd. Thus the files outside of "/usr/alice/web/" are essentially hided from httpd except files that are made visible by david.
Tasks that are required by alice, bob, and carol are same: creating documents, removing documents, moving documents, modifying documents, setting access control, setting redirection, using CGI scripts, configuring CGI handling, etc. Web server should be designed as follows: all that is concerned with their documents should be in their own hands, and furthermore, their tasks should never interfere with each other.
The tasks of david, system administrator, should be minimum: only basic configuration that are common for all users as illustrated in Fig.5.
The concept of "httpd root" of Pegasus is different from that of "server root" of main httpd servers such as Apache. Traditional concept of server root is merely a directory in which httpd is installed. That does not set a boundary of name space for httpd. On the contrary, httpd root of Pegasus has an essential meaning: that sets a boundary of visible name space and invisible name space for httpd. Httpd root of Pegasus is multiple, uniform, and essential. Pegasus is designed to achieve the concept.
The file "$web/etc/handler" connects requested path to the action; the sample contents of the file are shown below.
# path mimetype hctl execpath arg ... /netlib/*/index.html text/html 0 /bin/ftp2html *.http - 1 $target */nph-*.cgi - 1 $target *.cgi text/html + $target *.html text/html 0 $target *.tt text/html 0 /bin/peep $target
First field named "path" shows a path pattern that is same as that used in Plan 9 shell "rc" but for two exceptions: slash '/' is not a special character and "/*/" matches "/". Thus "*.cgi" means a file with suffix "cgi". The field is compared with the requested path name on URI. More strictly speaking, requested path name from document root is compared. That is, if requested path is "/~alice/foo", then the first field is compared with "/foo". All terms "requested path" in this subsection follow to this meaning.
Second field named "mimetype" is the default mime type* for the HTTP header "Content-Type:". If '-' is specified and "hctl" is not '1', the "Content-Type:" is guessed by the server.
Third field named "hctl" takes three values '1','+', and '0' that mean control level to the http headers by the script; the meanings are as follows.
1 Full control by the script + Partial control by the script 0 No control by the script
If '1' is specified the script has responsibility to write all HTTP headers including "Content-Type" and "Content-Length"; the script is called non-parsed CGI in RFC3875. HTTP headers must be separated from HTML headers by a single blank line, i.e., a line that contains only "\n" code.
If '0', the script must not write HTTP header. The header is provided by httpd.
Finally if '+', the script may contain HTTP headers in compliance with CGI/1.1.
The fields greater than third define a command and its arguments that will be invoked when incoming requested path matches the first field. In these fields, $target is the path name prepended "/doc" to the requested path.
The last line in the sample handler is interpreted as follows: If a file, say "/foo/bar.tt", is requested, then "/bin/peep" is invoked with an argument "/doc/foo/bar.tt".
Of course files in fourth field must be executable. If the fourth field is $target, then the file must be executable. On the contrary, $target after fourth field needs neither be executable nor be existing.
Note that most browsers do not judge mime type by looking the content but judge only by looking the file suffix. Therefore it is safe to put popular file suffix adequate to the content.
Executable file with a suffix "http" and "html" first appeared in Ykhttpd by Charles Forsyth. Sample configuration shown above inherits his idea.
Plan 9 is based on UTF-8 encoding. Not only the path name is UTF-8, but also the system tools are consistently supporting UTF-8. Thus, we can execute the following commands without seeing any oddness which are still common in UNIX world.
term% mkdir こんにちは term% ls ... こんにちは ... term% cd こんにちはNowadays, UTF-8 encoding is accepted as a standard of Internet world. IRI(International Resource Identifiers) described in RFC3987 claims to stand on UTF-8 for path part of URI and main browsers are already supporting IRI. In Fig.6, You will observe that "こんにちは" is shown as it is in the URL field of the browser.
As shown in this figure, HTML documents that are placed in the directory "こんにちは" is accessible by the HTTP client.
For Plan 9, the support by httpd is very simple. The feature is already achieved by official httpd of Plan 9. In supporting IRI, Pegasus added only a little modification to official httpd for special feature of redirection of Pegasus. Plan 9 is excellent with IRI.
It is a task of browsers to interpret href value such as
<a href="http://ar.aichi-u.ac.jp/こんにちは/">こんにちは</a>and to transform it to an appropriate URI in accessing the host. Nowadays, many browsers have the ability. However UTF-8 string in query part of href value has still problems in some main browsers. Therefore an auxiliary tool named "hfmt" is provided to compensate the problem.
IDN(International Domain Name) described in RFC3490 is not supported yet, because it seems the needs is very small even in Japan.
httpd [-Mfmsuw] [-t timeout] [-p port] [-c certificate] [-n namespace] [-R rewrite] [...]where option "-s" means "server mode", option "-u" means that it runs as a user who invoked httpd, and option "-M" indicates that the httpd is invoked by mon. Other options are not explained in this paper. Readers can find the information from elsewhere.
In regular circumstance, httpd is accessed very frequently. In such case, running server mode is recommended.
If httpd is running as user "none", there exists a security problem: private data that is to be read by the httpd would not be protected from system users, because system users can become user "none". Therefore it is necessary for secure web service to run the httpd as some fictional user name, say, "web".
How to invoke httpd that runs as user "web" without password so that httpd can be automatically invoked in starting cpu server?
Mon is provided for this purpose. The ability is based on "devcap" that is introduced in Plan 9 ed.4.
Mon must be invoked by the privilege of host owner so that mon can invoke httpd as a daemon of any user. The command syntax is as follows:
mon [-d] [-u user] [-r req] cmdpath argument ...where option "-d" implies daemon mode, and option "-u user" indicates mon to invoke httpd as the given user. The option "-r req" is used for https support, where "req" is factotum format of X.509 certificate signing request*.
An example of the usage is shown below.
b=/usr/local/bin/$cputype $b/mon -du web $b/httpd -suM c=/sys/lib/tls/cert k=/sys/lib/tls/key $b/mon -du web -r $k $b/httpd -suM -p443 -c $cThis example is extracted from "/rc/bin/cpurc" of the author's cpu server that supports both http and https service. The output of "ps" command are shown below. They are live example.
ar% ps|grep 'httpd|mon' bootes 67 0:00 0:00 76K Await mon bootes 70 0:00 0:00 76K Await mon web 92245 0:03 4:15 1516K Open httpd web 92246 0:00 0:00 1516K Open httpd web 149886 0:00 0:00 1644K Pread httpd ar%We observe three httpds in this example. The first and the second httpds are servicing TCP port 80 and 443 respectively. Both were invoked by mon, and waiting for requests from clients. The third httpd is a child process of the first, and is now in service.
If httpd was not killed, PID of httpd that is invoked by mon would be the next to the PID of mon. However in the above case, the httpd was killed for update. What happens then? Mon monitors termination of the daemon so that mon can immediately invoke next new httpd. This facility is a protection against malicious CGI that tries to stop service by killing httpd.
Note that mon is running as user "bootes", not as user "web". This is because we should protect mon from CGI.
web:web:web:so that Pegasus can run as user "web".
Don't assign password for user "web".
Suppose a user, say alice, wants to have CGI environment, and wants to have files that must be readable only by httpd and by herself. How to do that? Preventing other system users to enter her httpd root will be enough:
chmod 750 $web chgrp web $webthough administrator's help is required to change the group*.
Then alice can safely make her file "foo" readable to user "web" by executing a command:
chmod 644 foo
# myname www # server name (default is your sysname) # base /usr/web # base directory for Pegasus # namespace /lib/namespace.httpd # name space configuration # rewrite /sys/lib/httpd.rewrite # system rewrite file # parameters below might be required tuning # allowbasic 0 # 1: allow basic authentication # parsetimeout 15 # timeout to parse header (in unit of sec) # waittimeout1 15 # wait timeout for a non-authenticated client # waittimeout2 900 # wait timeout for an authenticated client # alivetimeout 15 # keep alive timeout (in unit of sec) # cgitimeout 5 # timeout for CGI (in unit of sec) # posttimeout 900 # timeout to get POST data (in unit of sec) # connectlimit 300 # limit of total open tcp connections # maxpost 10 # post data size restriction (in unit of MB) # obstime 3 # observation time to detect burst access (in unit of sec) # maxaccess 20 # allowed max access in obstime # lockouttime 180 # lockout time for maxconnect and maxaccess (in unit of sec) # contmax 100 # max persistent continuation count for saftyBlank lines and lines beginning with '#' in this file are ignored. Following the '#', attribute and the default value are shown as a guide. Only first and second fields are consulted. Text that follows second '#' shows the meaning.
bind -a /usr/web/bin/$cputype /bin bind -a /usr/web/bin/rc /bin bind /sys/lib /usr/web/sys/lib bind /lib /usr/web/lib bind /bin /usr/web/bin bind /rc/lib /usr/web/rc/lib bind -c #e /usr/web/env bind #p /usr/web/proc bind #c /usr/web/dev # change the line below subject to your needs. bind /usr/bob/netlib /usr/bob/www/doc/netlibBind operation in the first line is to the "/bin" which contents are already configured in "/lib/namespace". We will see all executables of Plan 9 in the "/bin". Therefore, if david, system administrator, does want to restrict executables for CGI only to files in "/usr/web/bin/$objtype" and "/usr/web/bin/rc", then he would rather adopt
bind /usr/web/bin/$cputype /bininstead of the first line. The operation replaces the contents in "/bin" by the contents in "/usr/web/bin/$cputype".
The third line
bind /sys/lib /usr/web/sys/libexports "/sys/lib" to httpd space. Be careful in exporting "/sys/lib". There are files that must be secret; the example is "/sys/lib/tls/key". Examine the permission bit.
# car.shop.com is the virtual host for carol # the uri is http://car.shop.com http://car.shop.com */usr/carol/www # emili has moved to bell-labs /~emili http://plan9.bell-labs.com/~emili # real host must be the last / */usr/bob/wwwLines beginning with '#' are comments. Other lines consist of two fields.
The first field is a URI pattern. It begins with '/' or with scheme name followed by ':'. Request to the real host or the user is compared with the pattern that begins with '/'. The matching rule is very simple. The request that begins with the first field is regarded as matched. For example, if the request is "/~emili/foo/bar.html" then the request matches "/~emili".
The comparison is name-wise in name tree. Therefore "/~emilia/foo/bar.html" does not matches "/~emili" as it should be. The same matching rule is adopted to user level redirection in "$web/etc/rewrite" and access control in "$web/etc/allow" and "$web/etc/passwd".
The second field beginning with '*' shows the httpd root for real host or virtual host or regular user; the default httpd root for regular user is "/usr/$user/web/" where "$user" denotes the user name. Others are URI to which request is redirected. The reminder of the requested URI (i.e., "/foo/bar.html" in case the request is "/~emili/foo/bar.html") is appended in proper way to the second fields so that the request jumps to "http://plan9.bell-labs.com/~emili/foo/bar.html" and browsers can open the page there.
For regular documents, the following three directories are required.
For CGI, more directories will be required as shown below.
You can add more directories depending what to do in CGI script. The followings are the example.
Some overloaded clients might not be an attack. Then parameters in "/sys/lib/httpd.conf" should be adjusted. Service to the clients put into the blacklist is stopped during 3 minutes. Then service to the client is stopped during 3 minutes* and the IP address is put into the blacklist. However some overloaded clients might not be an attack. Then parameters in "/sys/lib/httpd.conf" should be adjusted.
We denote their httpd root by "$web" . The value is "/usr/alice/web/" for alice, and "/usr/bob/www/" for bob, etc. Then "$web/doc/" is a document root for the user.
The files "$web/etc/allow" and "$web/etc/passwd" control the access by IP address and by password respectively. Thus, access control is in user's hand as it should be.
The "$port" stands for a port number; the value is 80 for regular httpd service. Other port numbers are also acceptable.
Users can configure name space in the file "$web/etc/namespace_$port" standing on the name space offered by system administrator. The configuration is applied not only to CGI script but also to regular document. Note that any CGI script can additionally configure name space according to it's own needs in starting service.
Users can write their own rewrite rules in "$web/etc/rewrite". This feature is important for document management, because an access to some documents should be redirected to a secure scheme such as https.
Note that CGI handling is not the task of administrator. Users have a large variety of needs in constructing their web pages and the CGI scripts are connected to the web pages. Therefore it is desirable that users are allowed to have CGI scripts. In Pegasus, all about CGI are completely in user's hand. See "Execution handler" for more details.
/private https://car.shop.com/privatein "/usr/carol/www/etc/rewrite". This line implies that incoming request to "/private/foo" is redirected to "https://car.shop.com/private/foo", where "foo" stands for any path including empty path. Then, what happens if a request is already based on https?
Pegasus protects itself from redirection loop such as
/bar /baz /baz /baror more complicated redirection loops by counting loop counts if they appeared in the same TCP connection. The mechanism will work if the client is based on HTTP/1.1 and keeps connection on redirection to the same host and port*. Note that the redirection loop is safe also for clients based on HTTP/1.0 because these clients do not support automatic redirection.
The first example is the simplest one: incoming URI is same as redirected URI if the scheme is https. Then pegasus does not redirects the request. This feature makes the first example meaningful and useful.
Note that first field is a path from document root, and the second filed is a URI. Therefore alice, a regular user, must write as follows**:
/bar /~alice/bazif she wants to redirect "/bar" to her document "/baz".
/foo 192.168.*.* /foo/bar ~192.168.1?.* *A line that begins with "/" denotes path name. The name "/foo" matches "/foo/" and any deeper path such as "/foo/bar".
alice 3a58b912829a2e4b4720c3a41e58dd29 /private alice@heraThe single line supports both basic and digest authentication. The line consists of four fields:
user key path realmKey is produced by using shell commands of Plan 9 as follows:
echo -n 'user:realm:password' | md5sumFor example, assume that alice wants to set a password "black cat" to the realm "alice@hera", then alice will get the key as follows:
echo -n 'alice:alice@hera:black cat' | md5sumwhich will produce the key in the example.
Old format which supports only basic authentication is also supported for compatibility.
$web/doc/~is special.
This feature came from "/favicon.ico" problem:
certain browser tries to get "favicon.ico" of the real host every time in accessing user's(say, alice's) document. Then, old Pegasus closes the connection to direct the browser to the name space of real host. The browser comes back again to the name space of alice with new connection. The rash of redirections and reconnections will reduces the efficiency.
Pegasus 2.2 admits the needs for the user to refer some of files in real host keeping the connection and keeping the name space.
Now, Pegasus manipulates the problem as follows.
Suppose bob, a owner of real host, has a file "foo" in "/usr/bob/www/doc/~/", and suppose Pegasus is running in the name space of alice.
Then Pegasus can see the file in "/doc/~/"(in name space of alice), so that Pegasus can respond directly to the request with URI path "/foo"(a request to real host) by referencing "/~alice/~/foo"(path "/~/foo" in name space of alice) keeping the connection and keeping the name space.
Here, "foo" is explained as a file for simplicity, but any path is acceptable.
'%', ';', '?', '#', '"', '&', '<', '>', '=', '+', ' 'Current version of Pegasus does not support to use three letters ';', '?', '#' in path name.
If they were used in path name and the path name is referenced in href value in HTML document, then they must be encoded using hexadecimal %HH style**.
<a href="/foo.html">...</a>However according to HTML specification, the file "/foo.html" followed by "href=" does not mean the file in "/user/alice/web/doc/foo.html" but means the file in the document root of real host.
The correct expression is
<a href="/~alice/foo.html">...</a>if the "foo.html" is in her document root.
http://host[:port][/path][;params][?query]based on HTTP/1.1 defined in RFC2069. However the RFC was made obsolete by RFC2617 and the new syntax have been defined as
http://host[:port][/path][?query]which is different from old syntax in ";params" part. In the RFC, semicolon is kept as reserved character in path field but the semantic is unclear.
Pegasus has been utilized the "params" for CGI script: the parameters in "params" are passed as arguments to the script. That still works in regular situation. However if the path is in the area that requires authentication, Mac/Safari fails to be authenticated.
Therefore it is not recommended to use ";", though very handy in writing script.
In case that "hctl" is '+', Pegasus automatically attaches headers that are required in regular response to the request. Thus, it is enough for the script to write only special headers: if the header name is absent in regular HTTP headers, then the header is added; if the header name is already present in regular HTTP headers, then the header is replaced by new one in the script.
An example of output of this types of CGI is shown below.
Set-Cookie: cookie=something; expire=Sun, 6-Aug-2006 11:43:57 GMT; domain=ar.aichi-u.ac.jp; path=/test4; secure <html> <head> <title>Cookie sample</title> </head> <body> ... </body> </html>In case that "hctl" field is '+', HTTP header "Content-Length:" in the output of script will be ignored because server cares the header. On the contrary, if "hctl" field is '1', all header must be provided by the script including "Content-Length:".
In writing scripts, you can use ramdisk. The disk is always provided at "/tmp" by the server, and output of script is once stored in the file named "/tmp/...", and then sent to the client as if it is a regular file.
You need not worry about if the files in "/tmp" are peeped by others, because Plan 9 ramdisk is private to a process and its children. And also you need not worry about if the files in "/tmp" are left somewhere when the script is accidentally terminated, because the ramdisk vanishes as soon as the execution of script is finished or terminated. These features make the use of temporal files safe and reliable.
Subject to the specification, environment variable QUERY_STRING is set even if query is not present; then the value is null string. However Plan 9 shell rc does not recognize the $QUERY_STRING as a null string but recognizes as shell variable with no components. The substitution in rc script
QUERY_STRING=$"QUERY_STRINGwill produce the value requested by the RFC. All the null string environment variables have same problem.
Scripts written in rc do not need QUERY_STRING. Pegasus decodes the query internally and builds variables suitable for rc. Environment variable names that are created from the query begin with "QS_".
Nameless data such as "alice+bob&carol" in the query produce shell variable QS_ with the value
('alice bob' carol)which enables an assignment
$*=$QS_in rc script to produce arguments of rc.
Query data such as "name=alice&name=bob" will produce shell variable QS_name with the value
(alice bob)which is again a one-dimensional array. Thus, no query data is lost.