Rit
— Embedded Rc in Text —

Sep 2007
Kenji Arisawa

Rit is a PHP-like text processor designed for use on Plan 9. Scripts are embedded in text between “${” and “}”. Within these areas the full functionality of Rc, the Plan 9 shell, is available.

Introduction

Writing web documents has become one of the most popular styles of publishing today. Many of these documents are based on HTML and JavaScript. However, some of them also need server-side scripting.

Suppose we need to add some server-side code to an existing web document. Rewriting the document to follow the grammar of a scripting language requires extensive modification. If the server-side code can instead be embedded in the existing document, the amount of modification is much reduced.

There are several text processors that handle embedded code. PHP[4] is one of the most popular. According to its documentation, PHP is a general-purpose scripting language that is especially suited for web development and can be embedded into HTML. Rit[1] is similar to PHP in this respect, but its basic philosophy is quite different.

PHP is a large piece of software and provides a new language for the embedded code. By contrast, Rit is small and simple. Rit does almost nothing to the embedded script: it merely parses the text to determine the code areas and passes those areas to Rc[2][3], which interprets and executes them.

This approach has the following advantages:

The last advantage is important because users want to use the tools they are already familiar with.

Some people prefer to write in Perl, others in Python, and so on; accordingly, there are text processors that handle embedded Perl[6][7] and embedded Python[8][9]. These scripting languages are powerful enough to handle input by themselves. However, Python and Perl are very large, and loading a large executable is itself a disadvantage in web services.

In contrast, both Rit and Rc are small*. Rit simply collaborates with Rc, and Rc is a well-designed shell that lets users draw on a variety of tools when necessary. Thus we can keep the traditional development style of UNIX and Plan 9: rather than relying on a single large piece of software, we build on small, neat tools.

Note*: The code sizes of Rit and Rc are 45KB (73KB) and 94KB (144KB) respectively, whereas those of Python 2.4 and Perl 5.8 are 1.5MB (4.0MB) and 4.5MB (9.6MB) respectively. These values were measured on an i386 PC; values in parentheses include the symbol table. The sizes of Python and Perl depend on the modules included at compile time.

JavaScript is widely used in recent web documents, and some code that would once have been written for the server side can now be written in JavaScript. Code needed on the server side is therefore getting smaller, which means that in many cases we do not need large do-everything software on the server.

History

Rit was developed in 2004, inspired by PHP. The first version, 1.0, was released in late 2004. The name comes from “Rc In Text”. The syntax has remained unchanged since then; updates have mainly been bug fixes. However, there has been one problem: the grammar was never defined in the documentation, so the evaluation order and the semantics were left ambiguous. This problem is fixed in the current version, 1.4.

Rules

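A single symbol, “$”, plays the central role in Rit. With “$”, Rit has only five syntax rules:

  1. ${ ... } encloses a code area, which is passed to Rc.
  2. }$ closes a code area and suppresses the NL at the end of the output of the last command in the area.
  3. $NL (a “$” at the end of a line) is an NL escape: both the “$” and the NL are removed.
  4. $var is replaced by the value of the Rc variable var; it is equivalent to ${echo -n $var}.
  5. $$...$ (a sequence of two or more dollars) is output with one dollar discarded.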

A “$” that does not match any of the above rules is output as it is.

In accordance with Rc grammar, an Rc comment continues up to the end of the line. When parsing a code area, Rit skips Rc strings and comments as well as nested “{ }” Rc blocks, so a “}” inside any of these does not end the area.
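The skipping rule can be sketched as a small scanner. This is a hypothetical Python model for illustration, not Rit's C source: find_code_end is an invented name, and the sketch recognizes only single-quoted Rc strings (with '' standing for one quote), comments running to NL, and nested braces. Backslash-newline handling is not modeled.

```python
def find_code_end(src: str, start: int) -> int:
    """Hypothetical model: return the index of the "}" that closes a
    Rit code area whose body begins at src[start] (just after "${").
    Returns -1 if the area is never closed."""
    depth = 0  # nesting level of { } blocks inside the code area
    i, n = start, len(src)
    while i < n:
        c = src[i]
        if c == "'":
            # Rc string: skip to the closing quote; '' is an embedded quote.
            i += 1
            while i < n:
                if src[i] == "'":
                    if src[i + 1:i + 2] == "'":
                        i += 2
                        continue
                    break
                i += 1
        elif c == "#":
            # Rc comment: ignore everything up to, but not including, NL.
            while i < n and src[i] != "\n":
                i += 1
            continue
        elif c == "{":
            depth += 1
        elif c == "}":
            if depth == 0:
                return i  # this brace closes the code area
            depth -= 1
        i += 1
    return -1
```

For the comment example later in this document, the “}” on the first line lies inside the comment, so the scanner keeps going until the “}” that starts the line “} # not a comment”.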

An additional rank rule is needed to decide the effect of “$” when more than one rule could apply; a rule listed earlier takes precedence:

  1. }$
  2. $$...$
  3. $NL , ${ , $var
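The rank rule can be illustrated with two small functions. This is a hypothetical Python model, not Rit's source: classify_dollar looks at a “$” found in text, and after_code handles the character right after the “}” that ends a code area. The set of characters accepted in a variable name is a simplifying assumption.

```python
import re

# Assumption: a simplified set of characters allowed in an Rc variable name.
NAME = re.compile(r"[A-Za-z0-9_*]+")


def classify_dollar(src: str, i: int) -> tuple[str, int]:
    """Hypothetical model: decide which rule a "$" at src[i] in the text
    (outside code areas) triggers. Returns (rule, characters consumed).
    For "dollars", the caller emits one dollar fewer than consumed."""
    j = i
    while j < len(src) and src[j] == "$":
        j += 1
    if j - i > 1:
        return ("dollars", j - i)  # rank 2: "$$...$" beats the rules below
    nxt = src[i + 1:i + 2]
    if nxt == "\n":
        return ("nl-escape", 2)  # rank 3: "$NL"
    if nxt == "{":
        return ("code", 2)  # rank 3: "${" opens a code area
    m = NAME.match(src, i + 1)
    if m:
        return ("var", 1 + len(m.group()))  # rank 3: "$var"
    return ("literal", 1)  # no rule applies: "$" is shown as it is


def after_code(src: str, close: int) -> tuple[bool, int]:
    """Rank 1: src[close] is the "}" ending a code area. A "$" right after
    it is taken by "}$" before any other rule sees it.
    Returns (suppress_nl, index where text scanning resumes)."""
    if src[close + 1:close + 2] == "$":
        return True, close + 2
    return False, close + 1
```

For “${pwd}$$” followed by NL, after_code takes the first “$” for “}$”, and classify_dollar then reads the remaining “$” as an NL escape, which matches the Newline escape example.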

Internals

Suppose we have two or more code areas, as illustrated in Fig.1. These areas must be processed by the same Rc process so that commands in one area can affect succeeding areas.

Fig.1: Structure of Rit text

Fig.2 shows how Rit text is processed by Rit and Rc.

Fig.2: Data flow

In this figure, “Rit Text” is text with embedded Rc code, and “Stdin” and “Stdout” are standard input and standard output respectively. The data flow from “Stdin” to “Rc” comes from commands that read from “Stdin”. Rit and Rc communicate through pipes: the solid arrow is a named pipe and the dashed arrow is a regular pipe. The former is used to pass code to Rc, the latter for synchronization.
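The essential property, one long-lived interpreter receiving every code area with a synchronization step after each one, can be sketched in Python. This is a rough analogue under stated assumptions, not Rit's design: /bin/sh stands in for Rc, and instead of a named pipe plus a separate synchronization pipe, the sketch writes a marker line after each area and waits for it on the shell's standard output. The class name ToyRc and the marker text are hypothetical.

```python
import subprocess


class ToyRc:
    """Hypothetical stand-in for the Rit/Rc pair, for illustration only.
    Assumptions: sh replaces Rc, and one stdout pipe carrying an
    end-of-area marker replaces Rit's separate synchronization pipe."""

    MARK = "__rit_area_done__"

    def __init__(self) -> None:
        # One long-lived shell: every code area goes to the same process,
        # so variables set in one area are visible in later ones.
        self.proc = subprocess.Popen(
            ["sh"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
        )

    def run(self, code: str) -> str:
        """Send one code area and return its output once it has finished."""
        self.proc.stdin.write(code + "\necho " + self.MARK + "\n")
        self.proc.stdin.flush()
        out = []
        while True:
            line = self.proc.stdout.readline()
            if line == "":
                raise RuntimeError("shell exited")  # e.g. after "exit"
            if line.endswith(self.MARK + "\n"):
                # Keep any output that preceded the marker on the same line.
                out.append(line[: -len(self.MARK) - 1])
                return "".join(out)
            out.append(line)

    def close(self) -> None:
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()
```

Waiting for the marker before emitting the next piece of text is what keeps command output and surrounding text in the right order.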

Usage

Synopsis

	rit [-Dbces] [file [arg ...]] 

Description

Rit reads the file named by its first argument. The file name may be followed by “args”, which are passed to Rc so that the embedded code can get them as arguments. If no argument is given, Rit reads standard input.
The file name “.” is special: it means standard input. It is provided so that arguments can be passed to the embedded code even when the text is read from standard input.

Reading the text from standard input will not work if the embedded code also reads data from standard input.
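The operand handling just described can be summarized in a short sketch. This is a hypothetical Python model, not Rit's source; it assumes options have already been removed and returns None to stand for standard input.

```python
from typing import Optional


def rit_input(operands: list[str]) -> tuple[Optional[str], list[str]]:
    """Hypothetical model of `rit [file [arg ...]]` operand handling.
    `operands` are the words after the options. Returns the text source
    (a file name, or None for stdin) and the words Rc sees as $1, $2, ..."""
    if not operands or operands[0] == ".":
        # No file, or the special name ".": read the text from stdin.
        return None, operands[1:]
    return operands[0], operands[1:]
```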

Options

Examples

The use of Rit is almost trivial. The examples below are limited to those that help in understanding its behavior.

Command execution and newline control

You can write any Rc script in a code area. For example,

Date: ${date}

will produce

Date: Thu Dec 23 10:17:10 JST 2004

Note that the output ends with two consecutive NLs: one from the “date” command and another from the NL in the text. When we need to suppress the NL from a command, we use “}$”:

${date} continues on the next line
${date}$ stays on the same line.

the result will be:

Thu Dec 23 10:17:10 JST 2004
 continues on the next line
Thu Dec 23 10:17:10 JST 2004 stays on the same line.

Empty command

Be careful with the example:

${date;}$ is equivalent to ${date}.

will result in

Thu Dec 23 10:17:10 JST 2004
 is equivalent to Thu Dec 23 10:17:10 JST 2004
.

Why is the NL in the output of “date;” not suppressed? Because the last command in “date;” is the empty command after the semicolon, and “}$” operates only on the last command.
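One way to model this rule is to record the output of each command in the area separately and strip the trailing NL from the last one only. This is a hypothetical Python sketch for illustration, not how Rit implements “}$”; an empty command is represented as an empty output string.

```python
def area_output(outputs: list[str], suppress_nl: bool) -> str:
    """Hypothetical model of "}$": `outputs` holds what each command of a
    code area wrote, in order (an empty command contributes ""). Only the
    last command's trailing NL is removed."""
    if suppress_nl and outputs and outputs[-1].endswith("\n"):
        outputs = outputs[:-1] + [outputs[-1][:-1]]
    return "".join(outputs)
```

With “${date}$” the list is just the output of date, so its NL is removed; with “${date;}$” the last entry is the empty command's "", so nothing is removed.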

Embedded shell variables

Suppose “alice” is assigned to the variable “user”, and the Rit text is

User: $user
This is equivalent to
User: ${echo -n $user}

then the above three lines are converted to:

User: alice
This is equivalent to
User: alice
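Since “$user” is equivalent to “${echo -n $user}”, its expansion follows from Rc semantics: a variable holds a list of words, and “echo -n” prints them separated by single spaces without a final NL. A hypothetical Python sketch of that expansion, with the Rc environment modeled as a dict (an assumption made for illustration):

```python
def expand_var(env: dict[str, list[str]], name: str) -> str:
    """Hypothetical model of "$name" in text, i.e. ${echo -n $name}.
    An Rc variable holds a list of words; echo -n joins them with single
    spaces and adds no NL. An unset variable is an empty list."""
    return " ".join(env.get(name, []))
```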

Newline escape

A dollar at the end of a line is an NL escape. For example,

This line has NL escape. $
same line.

will be converted to:

This line has NL escape. same line.

Most Rc commands output an NL at the end. We can avoid a redundant NL by putting “$” after “}” and/or before the NL:

${pwd}$
this line goes on the line after the pwd output.
${pwd}$$
this line stays on the same line as the pwd output.

The result is

/usr/arisawa/src/pegasus-2.1/rit
this line goes on the line after the pwd output.
/usr/arisawa/src/pegasus-2.1/ritthis line stays on the same line as the pwd output.

Multi-lines in code areas

Code areas have the full functionality of Rc. For example, Rit allows multi-line scripts:

${
book='Alice in Wonder Land'
}$
${
echo -n 'echo test of multi-line:
line1: Carrol''s book:
line2: '$book'
line3: and we can use { and } in rc strings'}
Back slash newline escape in Rc command will work:
${echo -n one \
two}

These lines will be converted to:

echo test of multi-line:
line1: Carrol's book:
line2: Alice in Wonder Land
line3: and we can use { and } in rc strings
Back slash newline escape in Rc command will work:
one two

But be careful with the following example:

${
echo alice
}$
will produce one empty line.

results in

alice
will produce one empty line.

because there is an empty command to the left of “}$”.

Comment in Rc code

In accordance with Rc grammar, an Rc comment continues up to the NL. Therefore the example

${# This is a comment up to NL } this is also a part of comment
# this is also a comment
} # not a comment
${# comment line1 terminated by Rc NL escape\
continued comment line
} # not a comment
$${# This isn't a comment but a part of text }

will produce

 # not a comment
 # not a comment
${# This isn't a comment but a part of text }

Sequence of dollars

If a sequence of two or more dollars appears, one dollar is simply discarded. Thus “$$$$home” is converted to “$$$home”, and “$$$${not a rc script}” is converted to “$$${not a rc script}”. Likewise, “$$$$” at the end of a line is not an NL escape; it is simply converted to “$$$”.
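The dollar-sequence rule, together with the NL escape described earlier, can be modeled on a plain text segment. This is a hypothetical Python sketch, not Rit's code; it assumes the segment contains no code areas or variables, so a lone “$” that is not followed by NL is simply kept.

```python
def text_dollars(s: str) -> str:
    """Hypothetical model: apply the "$$...$" and "$NL" rules to a text
    segment that contains no code areas or variables."""
    out = []
    i, n = 0, len(s)
    while i < n:
        if s[i] != "$":
            out.append(s[i])
            i += 1
            continue
        j = i
        while j < n and s[j] == "$":
            j += 1
        if j - i > 1:
            out.append("$" * (j - i - 1))  # one dollar is discarded; no NL escape
            i = j
        elif s[j:j + 1] == "\n":
            i = j + 1  # "$NL": drop both the "$" and the NL
        else:
            out.append("$")  # lone "$": shown as it is
            i = j
    return "".join(out)
```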

Argument variables $0, $1, $2, ...

Among $0, $1, $2, ..., the variable “$0” is special: it holds the name of the file currently being processed. The remaining “$1”, “$2”, ... hold the arguments.

For example, if the file “foo” contains

$0
$1
$2

then we have

term% rit foo alice bob
foo
alice
bob
term%

Rit executable

In most cases, Rit will be used in an executable file, that is, a script whose first line names /bin/rit after “#!”. The meanings of “$0”, “$1”, ... in that case are shown by the following example:

term% cat>bar
#!/bin/rit -s
$0
$1
$2
term% chmod 755 bar
term% bar alice bob
./bar
alice
bob
term%

If we want only the base name in “$0”, we can use the “-b” option:

term% cat>bar
#!/bin/rit -bs
$0
$1
$2
term% chmod 755 bar
term% bar alice bob
bar
alice
bob
term%
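The effect of “-b” on “$0”, as shown in the two examples above, amounts to taking the base name of the path. A hypothetical Python sketch, assuming “$0” otherwise holds the path Rit receives (“./bar” in the example):

```python
import os


def dollar_zero(path: str, b_option: bool) -> str:
    """Hypothetical model of what $0 holds for the text file `path`:
    the path as received, or only its base name when -b is given."""
    return os.path.basename(path) if b_option else path
```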

Termination

The Rc block

	${echo exit 'some message'>[1=2];exit} 

will terminate Rit. For this purpose, the Rc function “quit” is predefined in Rit:

	fn quit {echo exit $1 >[1=2];exit} 

A plain “exit” does not terminate Rit by itself, but the next ${ } block will terminate Rit because of an error.
For example,

	${exit} ${} 

will terminate Rit.

Example 1:

term% rit
${quit}
term% echo $status

term%

Example 2:

term% rit
${quit abcd}
term% echo $status
rit 619: abcd
term%

where 619 is the process ID.

Known Bugs

Echo -n

Using “/bin/echo -n” can cause a problem that stems from the “zero-byte write to a pipe” problem: if “foo” is any executable, it is not guaranteed that the following two commands

	foo > bar 

and

	foo | cat > bar 

produce the same “bar”.

To avoid this problem, the Rc function “echo” is predefined internally as

fn echo {
        if(~ $1 '-n'){
                shift
                # run /bin/echo -n only when there is something to print,
                # so that no zero-byte write is made
                if(~ $"* ?*)/bin/echo -n $*
        }
        if not /bin/echo $*
}
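The control flow of this function can be restated as a small model. This is a hypothetical Python sketch for illustration; it only computes what /bin/echo would be asked to print, and returns None when the wrapper does not run /bin/echo at all, which is the case that avoids a zero-byte write.

```python
from typing import Optional


def echo_request(args: list[str]) -> Optional[str]:
    """Model of the predefined Rc echo: return the text /bin/echo is asked
    to print, or None when /bin/echo is not run at all."""
    if args[:1] == ["-n"]:
        rest = " ".join(args[1:])  # $"* : the remaining words joined by spaces
        return rest if rest else None  # ~ $"* ?* fails on "": nothing is run
    return " ".join(args) + "\n"  # plain /bin/echo appends an NL
```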

Last command with no output

Due to the implementation of Rit,

	${name=alice}$ echo $name 

does not produce “alice”. An assignment just before “}$” not only has no effect in suppressing the NL, but can also terminate Rit under certain conditions. Write instead

	${name=alice} $name 

or

	${name=alice;}$ $name 

The assignment is merely an example: except for the empty command, any command with no output has the same problem.

Discussion

The backward operation “}$” is powerful, and it will be required as long as Rit claims to be “general-purpose”. However, backward operation makes things complicated. As far as HTML is concerned, “}$” is overspecification, because HTML is insensitive to NL.

In fact, when writing web pages I have never used “${ ... }” or “${ ... }$” except in the form:

${
...
}$

where “...” is Rc code. In this form “}$” could be replaced by “}”, but I prefer “}$” for clarity.

Clean porting of Rit to UNIX might be difficult because

References

[1] Rit source code
Kenji Arisawa
http://plan9.aichi-u.ac.jp/netlib/cmd/rit/

[2] Rc — The Plan 9 Shell
Tom Duff
http://plan9.bell-labs.com/sys/doc/rc.html

[3] Rc source code
http://cm.bell-labs.com/sources/plan9/sys/src/cmd/rc/

[4] What is PHP?
http://jp.php.net/manual/en/introduction.php

[5] JavaServer Pages Technology
http://java.sun.com/products/jsp/

[6] Welcome to Mason
http://www.masonhq.com/

[7] Using The Embedded Perl Interpreter
http://nagios.sourceforge.net/docs/2_0/embeddedperl.html

[8] EmPy
http://www.alcyone.com/pyos/empy/

[9] Spyce
http://spyce.sourceforge.net/