DynaWeb: Interfacing large SGML repositories and the WWW

Gavin Thomas Nicol, Electronic Book Technologies, Japan


Abstract

Many companies are now establishing a presence on the World Wide Web, and are facing the problem of how to make their data available in an efficient, cost-effective, and presentable manner. For large documents in non-HTML formats, the traditional approach has been to convert the data to a large number of small HTML pages, which are then made available on the WWW. However, this process results in lost information fidelity and increased costs due to double-handling. DynaWeb is an HTTP 1.0 compatible server and CGI script that performs the conversion and fragmentation at runtime, and uses the very same data used for publishing to other media. The rationale for this is that it dramatically simplifies the information management process, and thereby reduces the costs of publishing on the Internet. This paper discusses the design of DynaWeb, and the concepts behind it.


Introduction

The World Wide Web has enjoyed explosive growth over the last few years. There are many reasons for this success, among which the very low cost of entry plays a role. Browsers are free, or free for non-commercial use, free servers are available, and installation is not overly difficult for anyone with reasonable computing skills. HTML, the lingua franca of the World Wide Web, is likewise simple to learn (partly due to its own simplicity). As such, almost anyone with a reasonable level of computing know-how can either publish or provide data within the World Wide Web. In addition, modern browsers lower the cost of entry for those not familiar with the traditional text-based Internet tools (ftp, telnet, etc.); users can just point and click to get what they want (if they can find it). This is a remarkable accomplishment: individual users have never had an easier way to create, distribute, and consume information. At the same time, this very simplicity is an Achilles heel: the World Wide Web is very much biased toward small-scale publishing using HTTP and HTML.


The Implicit Assumptions

While the initial vision of the World Wide Web was far grander, the current World Wide Web is largely a producer-consumer architecture. As part of the general mentality, there are a number of implicit assumptions made:

The URL will point to either a file, or a CGI script.
To date, in most cases where the URL did not point to a file, it did point to a CGI script, possibly a gateway to another program. It can be argued that CGI is a double-edged sword because, despite its convenience, it can be inefficient, among other problems.
The browser will access files, and files will be small.
For most individual publishing efforts, the volume of data will generally not be large, and HTML pages suffice. However, maintaining large amounts of data as a myriad of small files with hyperlinks between them is a nightmare. Many publishers have multi-megabyte books they would like to put online, but hesitate to do so using HTML.
The file will be in a data format the browser understands.
This is obviously false for a great deal of legacy data, which could be in any one of a huge number of formats. In addition, partly due to the simplicity of HTML, and also for ease of maintenance, data is generally left in its legacy format, and converted to HTML only if it is ever published on the WWW.

It is widely recognised that until better tools for the creation and maintenance of HTML arrive (and possibly not even then), it seldom makes sense to work in native HTML for large amounts of data. Rather, most sites use whatever editing or desktop publishing environment they have installed, and then rely on tools to convert the data to HTML for publishing on the WWW. Verifying the output of such programs can be both time-consuming and error-prone, despite the best efforts of tool writers. In such cases, where the actual information management is taking place in a format other than HTML, WWW publishing becomes an additional step in an already complex process.

As data sizes increase, the costs associated with maintenance increase, especially if the data is frequently updated. This is a hidden, and often overlooked, cost associated with Web publishing. Indeed, the combination of software and data maintenance could easily be more costly in the short term, and will almost certainly be more costly in the long term, than actually setting up the initial WWW server (including costs for hardware). It is becoming common for a company to have full-time staff working solely on the care and feeding of the company Web site (to which the situations vacant areas bear adequate testimony). The thought "There must be an easier way" is probably at the fore of many people's minds.

DynaWeb is designed with a set of assumptions and goals quite different from those found in other WWW servers:


DynaWeb Goals

EBT is widely recognised as one of the leading suppliers of SGML-based online publishing tools. The DynaText product has been used in a number of industries to publish large SGML documents electronically. Some of DynaText's desirable features are:

With the advent of the WWW, it seemed desirable to provide EBT's customers with the tools required for publishing on the WWW, in addition to disk-based publishing, and to bring these features along in the process. The target set was to allow publishers to publish using the same techniques, and to bring as much DynaText functionality to the WWW as possible. This led to some smaller individual goals:


Basic architecture

The basic architecture of the current DynaWeb server is the common fork and exec architecture, in which the server proper accepts connections, forks, and then executes an engine for processing requests. This architecture was selected primarily for its simplicity and flexibility during the development cycle. In addition, from early in the project, there was thought of having a CGI script version of DynaWeb, and this architecture maximises code sharing between the two different versions, though at some expense in raw performance. DynaWeb is largely HTTPD compatible, so it can handle arbitrary data types in the same way that HTTPD does (via MIME-type mapping), in addition to allowing access to DynaText books. Like most other HTTP servers, the exact processing performed is largely decided by the HTTP method invoked and the URL. This architecture is shown in Figure 1.
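The fork-per-connection pattern described above can be illustrated with a minimal Python sketch. This is not DynaWeb's code: the `engine` function is a hypothetical stand-in for the request-processing engine, the exec step is omitted (the child calls the engine directly), and the sketch assumes a POSIX system where `os.fork` is available.

```python
import os
import socket


def engine(request_line: str) -> bytes:
    """Hypothetical stand-in for the engine: echo the method and URL."""
    method, url, _version = request_line.split()
    body = f"{method} {url}".encode("ascii")
    return b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n" + body


def serve_one(conn: socket.socket) -> None:
    """Fork a child to handle one accepted connection, then reap it."""
    pid = os.fork()
    if pid == 0:
        # Child: read the request line, run the engine, reply, and exit.
        try:
            line = conn.makefile("rb").readline().decode("ascii").strip()
            conn.sendall(engine(line))
        finally:
            os._exit(0)  # exiting also closes the child's descriptors
    # Parent: drop its copy of the connection and wait for the child.
    conn.close()
    os.waitpid(pid, 0)
```

A real server would call `serve_one` from an accept loop and would normally not block on each child; waiting here simply keeps the sketch deterministic.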


Figure 1: The General Architecture Of DynaWeb


DynaWeb URL's

For a server like DynaWeb, a certain amount of state is required, but HTTP is a stateless protocol. So for this and other reasons, the commonly understood semantics attached to parts of a URL have been expanded.

Sub-document addressing

DynaWeb needs to address parts of a document in order to be able to break it into fragments. The WWW defines no standard way to do this, so DynaWeb uses the addresses of the elements in a document. The resulting URL's look, for the most part, like normal filenames, making it easier for people accustomed to filenames to understand, but harder for the server, because some overlap of namespaces occurs. Such addresses can only occur in the context of DynaText book accesses, so this is generally not a problem. The URL syntaxes DynaWeb understands are:

File Access
http://www.ebt.com/path
This is the same as the normal file access URL's seen elsewhere.
CGI Script Access
http://www.ebt.com/keyword/path
When the server sees keyword it executes a CGI script, as in other HTTP servers.
Sub-document addressing
http://www.ebt.com/collection/book/eid
This is used to access parts of DynaText books. The collection part of the path could be considered a library, and a book a book within it. The eid is an address for an SGML element.
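As a rough illustration of how a server might tell these three syntaxes apart, the following Python sketch classifies a URL by its leading path components. The keyword `cgi-bin`, the collection `manuals`, and the book `userguide` are assumptions made for the example; the paper does not say how DynaWeb is configured or how it resolves the namespace overlap.

```python
from urllib.parse import urlsplit

# Hypothetical configuration; these names are illustrative only.
CGI_KEYWORDS = {"cgi-bin"}
COLLECTIONS = {"manuals": {"userguide"}}


def classify(url: str) -> tuple:
    """Decide which of the three URL syntaxes a request appears to use."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if parts and parts[0] in CGI_KEYWORDS:
        return ("cgi", "/".join(parts[1:]))
    if len(parts) >= 2 and parts[1] in COLLECTIONS.get(parts[0], ()):
        # Whatever follows collection/book is treated as an element address.
        return ("book", parts[0], parts[1], "/".join(parts[2:]))
    return ("file", "/".join(parts))
```

Checking for known collections and books before falling back to file access is one way to handle the overlap of namespaces: a path is only treated as a book access when it names a book the server knows about.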

Early versions of DynaWeb also supported 2 other syntaxes taken from the TEI guidelines:

Child Number Path
http://www.ebt.com/n/n/n/n...
With this naming scheme, an element is addressed by descending from the root of the SGML document, and taking the nth child as the new parent until the path has been completely traversed. The resulting parent is the target element.
Child Type and Occurrence Path
http://www.ebt.com/gi[=x]/gi[=x]...
This is similar to the above method, except that it goes by child type, represented by gi in the above, which is possibly qualified by an occurrence indicator (i.e. specifying which child of that type). Again traversal starts at the root of the SGML document.

However, these were found to be unnecessary as the algorithms for generating navigational aids improved. They remain valuable as a standard means of accessing hierarchically structured data.
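The two TEI-style addressing schemes can be sketched in a few lines of Python. The `Element` class and both function names are hypothetical, and the sketch assumes that child numbers and occurrence indicators count from 1 and that a missing occurrence indicator means the first child of that type.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    gi: str                                  # generic identifier (element type)
    children: list = field(default_factory=list)


def by_child_number(root: Element, path: str) -> Element:
    """Follow a path such as "3/2": take the nth child at each step."""
    node = root
    for step in path.split("/"):
        node = node.children[int(step) - 1]
    return node


def by_type_and_occurrence(root: Element, path: str) -> Element:
    """Follow a path such as "CHAPTER/SECTION=2"."""
    node = root
    for step in path.split("/"):
        gi, _, occurrence = step.partition("=")
        matches = [c for c in node.children if c.gi == gi]
        node = matches[int(occurrence or 1) - 1]
    return node
```

Both functions start at the root and replace the current parent at every step, so the element left at the end of the path is the target.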

Forms Data as an Environment

The current method of sending data from forms to a server is to append the (possibly encoded) name+value pairs after the end of the URL, following a question mark. This area is also overloaded by being where keywords for searches are specified, and where data from ISMAP images is transferred. This area can also be used to manage state.

DynaWeb looks at the name+value pairs in much the same way many applications look at environment variables. User-specified options, and server-generated state, are transferred from the server to the client in the links generated by the last request. When the client activates one of the links, the environment data will be sent to the server, starting the cycle once more. An example of how this is used can be found in DynaWeb's named-stylesheet support: the stylesheet name is passed back and forth between client and server. Apart from these semantics, and the URL extensions, DynaWeb should appear to clients exactly like any other typical HTTP server.
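The round trip of state through generated links might look like the following Python sketch, which uses only the standard `urllib.parse` module. The parameter names `stylesheet` and `hits` are invented for the example; the paper does not list the names DynaWeb actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def read_environment(url: str) -> dict:
    """Treat the name+value pairs after '?' as a dictionary of settings."""
    return dict(parse_qsl(urlsplit(url).query))


def make_link(path: str, environment: dict) -> str:
    """Generate a link that carries the current environment to the client."""
    query = urlencode(sorted(environment.items()))
    return f"{path}?{query}" if query else path
```

Every link in a generated page would be produced by something like `make_link`, so that whichever link the user follows returns the same environment to the server without the server holding any per-client state.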


The DynaWeb Publishing Process

As mentioned earlier, one of the stated goals for DynaWeb was to make it as simple as possible for EBT's customers to publish to the WWW. To a very large degree this has been accomplished.

In order to produce a DynaText book, one first runs an indexer/compiler upon validated SGML source, which produces data files containing indexes and associated data. Once this is accomplished, one then uses either the WYSIWYG stylesheet editor, or a text editor, to create sets of stylesheets controlling the display of text, TOC's, the behaviour of hyperlinks, and other such things. The process for DynaWeb is exactly the same and, more importantly, the data files produced in the DynaText publishing process can also be used for DynaWeb publishing. The only thing one needs to do to put a DynaText book into DynaWeb is to create new stylesheets.

One thing worth emphasizing is that the size of the DynaText books is irrelevant: DynaWeb will fragment them at runtime. Also, hyperlinks are not coded by hand, but rather generated at runtime by DynaWeb, based on entries in stylesheets. As such, no individual link validation is required by the document maintenance people; rather, they simply make sure their stylesheets are correct, and from then on, any books conforming to the same DTD will be able to make use of the same stylesheets. For example, if a publisher uses the Docbook DTD exclusively, then they need only write the stylesheets once, and update them as needed. Once the stylesheets for CDROM and WWW publishing have been created, the publisher can then produce DynaText books, and to a large degree, not think about the distribution media at all.


The Conversion Process

SGML documents are inherently hierarchical: they consist of a tree of elements, which may or may not have attributes associated with them. Before looking at the actual conversion process, let's look at what is meant by document structure, and compare some typical structural markup defined using SGML, and HTML (also defined using SGML). Here is a small sample document using structural markup.


  <DOCUMENT>
    <TITLE>DynaWeb: Interfacing large SGML...</>
    <ABSTRACT>Many companies are now ...</>
    <CHAPTER>
      <TITLE>Introduction</>
      <PARA>The World Wide Web has enjoyed...</>
      <SECTION>
        <TITLE>The Implicit Assumptions</>
        <PARA>While the initial vision...
          <TERM.LIST>
            <TERM>The URL will point to either...</>
            <EXPLANATION>To date, in most cases where...</>
            <TERM>The file will be in a format...</>
            <EXPLANATION>This is obviously false for...</>
          </TERM.LIST>
        </PARA>
      </SECTION>
      <SECTION>
        <TITLE>DynaWeb URL's</>
        <PARA>For a server like DynaWeb...</>
        <SUBSECTION>
          <TITLE>Sub-document Addressing</TITLE>
          <PARA>DynaWeb needs to address...</>
        </SUBSECTION>
      </SECTION>
    </CHAPTER>
  </DOCUMENT>

The following figure shows the hierarchical nature of the document, by showing each element as a node in a tree. Note the special element. This represents a pseudo-element, or one which exists by implication.



Figure 2: The Tree Structure of the Sample SGML Document


In order for HTML-based browsers to display the document in a pleasing manner, the above document needs to be translated into a corresponding HTML document, such as the one below.


  <HTML>
  <H1>DynaWeb: Interfacing large SGML...</H1>
  <H2>Abstract</H2>
  <BLOCKQUOTE>Many companies are now ...</BLOCKQUOTE>
  <H2>Introduction</H2>
  <P>The World Wide Web has enjoyed...</P>
  <H3>The Implicit Assumptions</H3>
  <P>While the initial vision...</P>
  <DL>
    <DT>The URL will point to either...</DT>
    <DD>To date, in most cases where...</DD>
    <DT>The file will be in a format...</DT>
    <DD>This is obviously false for...</DD>
  </DL>
  <H3>DynaWeb URL's</H3>
  <P>For a server like DynaWeb...</P>
  <H4>Sub-document Addressing</H4>
  <P>DynaWeb needs to address...</P>
  </HTML>

The above HTML file, when treated as SGML (as it should be), would have the following tree structure.



Figure 3: Tree of the HTML representation of the sample SGML.

It is immediately obvious that the HTML representation has far less structural depth than the native SGML representation. This is one reason why many people in the SGML field dislike the HTML DTD: they are used to far more structure (others abhor it).

The job of converting SGML to HTML is primarily that of converting one tree into another. Arbitrary SGML to SGML conversion is possible, in the same way that arbitrary conversion between programming languages is possible. However, like programming language conversion, there are some cases which cannot be handled elegantly, simply due to the grammars being too different. The HTML DTD has less structural depth, and is, overall, much simpler than most other SGML DTD's. This simplifies the conversion task a great deal, just as translating C into assembler represents a far simpler task than translating C into Ada. It should be noted that typesetting SGML can also be regarded as a translation process (SGML to PostScript).

There are many ways to perform the actual translation: some systems are driven by the events generated by the SGML parser, while others manipulate trees directly. Most use some form of scripting language to associate processing with elements, or in other words, stylesheets. Hard-coded formatting is generally frowned upon in SGML applications.

DynaText books can be regarded as a static object-oriented database of sorts: in them, the structure of the SGML, as well as the text, is stored. It is trivial to traverse the tree and regenerate a valid SGML representation of the original SGML data (though some things, like entity references, will be lost in some cases). In addition, the DynaText system already uses stylesheets extensively for online formatting, for printing, for TOC creation, and for hyperlink behaviour. The stylesheets in DynaText define a set of properties to be associated with each node, which may be set by evaluating scripts written in the internal DynaText scripting language at runtime. As such, the DynaText stylesheet language is quite well suited to the SGML to HTML conversion task. While it is quite possible to simply use a tag mapping table (i.e. when this tag is seen, generate that tag), the DynaText stylesheet mechanism brings an extra level of sophistication to the job at hand.

SGML to HTML conversion is accomplished by using the #TEXT-BEFORE and #TEXT-AFTER properties in the DynaText stylesheet language. These allow the stylesheet writer to add text before and after the element they are associated with, respectively. By setting these to the HTML start and end tags desired, conversion can be accomplished. Indeed, with the WYSIWYG stylesheet editor, it is possible to actually see the tags as you define them. This is made even simpler by the support for stylesheet groups, which makes formatting an element as simple as adding it to a group. EBT provides definitions for some groups to be used in HTML conversion.
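The idea of converting by attaching text before and after each element can be sketched in Python as a tree walk. This is an illustration of the principle only: the `Node` class and the `STYLES` table are hypothetical, and the table is the simple tag-mapping case; it does not model DynaText scripts, stylesheet groups, or context-dependent properties.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    gi: str                                  # element type name
    text: str = ""                           # text directly inside the element
    children: list = field(default_factory=list)


# Hypothetical stylesheet: element type -> (text before, text after).
STYLES = {
    "TITLE": ("<H2>", "</H2>"),
    "PARA": ("<P>", "</P>"),
    "TERM.LIST": ("<DL>", "</DL>"),
    "TERM": ("<DT>", "</DT>"),
    "EXPLANATION": ("<DD>", "</DD>"),
}


def convert(node: Node) -> str:
    """Emit the before-text, the element's content, then the after-text."""
    before, after = STYLES.get(node.gi, ("", ""))
    inner = escape(node.text) + "".join(convert(c) for c in node.children)
    return before + inner + after
```

Elements with no entry in the table, such as a SECTION wrapper, contribute no tags of their own, which is how the deeper SGML tree flattens into the shallower HTML one.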

One important capability of DynaWeb is the ability to use multiple named stylesheets. As HTML and browsers are evolving very rapidly, the problem of supporting multiple versions of one's document rears its head. In most normal servers, this requires multiple versions of files to be managed (one supporting HTML 2.0 without tables, another HTML 2.0 with tables, and another for HTML 3.0). In DynaWeb, one's data remains unchanged, and instead, one uses multiple stylesheet versions, representing a much more manageable task.

Of course, the DynaText stylesheet language was not designed for this application, so there are some limitations. In particular, converting between widely disparate table models can require quite complex scripts to be written, but as HTML matures, conversion of such things should become easier (i.e. the set of common features in the grammar for HTML and other SGML DTD's will become larger).


Navigational Aids

This section discusses the navigational aids found within DynaWeb. The most important thing to remember is that these aids are generated automatically from the combination of SGML structure and stylesheets. This represents a significant advance over most current WWW publishing systems.

Auto-generated TOC's

One of the early requirements for DynaWeb was that it should, as far as possible, offer a similar level of functionality, and a similar interface to, DynaText. DynaText has automatically-generated, expandable and collapsible TOC's, which also provide feedback on search results. In DynaText, the TOC is normally displayed along with the fulltext view, which scrolls to the position associated with a TOC entry being selected. However, almost all WWW browsers are restricted to single windows, and do not allow communication between windows. As such, the TOC feature had to be implemented as a standalone WWW page. Like DynaText, the contents, and to a certain degree the look, of TOC's, is controlled by stylesheets.

The automatically-generated TOC's have plus or minus buttons to the left of the title for the TOC entry. When a user clicks on a button, a request is sent to the server, telling it to regenerate the TOC with that section expanded or collapsed. Once no more TOC expansion can occur, selecting the TOC entry will bring up a page containing actual text data.
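The expand and collapse behaviour can be modelled as a pure function of the TOC tree and the set of currently expanded entries, which fits the stateless request cycle. The following Python sketch is an assumption about how such rendering could work, not DynaWeb's implementation; `Entry`, `render_toc`, and the plain-text output are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    eid: str                                 # address of the element
    title: str
    children: list = field(default_factory=list)


def render_toc(entries: list, expanded: set, depth: int = 0) -> list:
    """Return TOC lines; '+' marks a collapsed entry, '-' an expanded one."""
    lines = []
    for entry in entries:
        if not entry.children:
            marker = " "                     # leaf: selecting it fetches text
        elif entry.eid in expanded:
            marker = "-"
        else:
            marker = "+"
        lines.append("  " * depth + f"{marker} {entry.title}")
        if entry.children and entry.eid in expanded:
            lines.extend(render_toc(entry.children, expanded, depth + 1))
    return lines
```

Clicking a button would then amount to requesting the same TOC with one element address added to, or removed from, the expanded set carried in the link.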

TOC's provide an excellent interface to the runtime chunking that DynaWeb performs, but a very difficult design decision is when they should be generated. If DynaWeb sees a URL which accesses a DynaText book, and that URL ends with a ".toc" extension, it will generate a TOC. If the URL does not end with such an extension, then the size of the data below the target element is used to decide whether to generate a TOC. One of the configuration parameters specifies a desired limit on data sent to clients. If the size of the data below the target element exceeds that limit, and a TOC can be generated, one will be; otherwise the data is sent to the client (possibly after prompting the user, or broken into pageable chunks).
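The decision rule in the paragraph above can be written down directly. In this Python sketch the function and parameter names are hypothetical, and the optional prompting or paging of oversized text is left out.

```python
def choose_response(url_path: str, data_size: int,
                    size_limit: int, toc_available: bool) -> str:
    """Return "toc" or "text" following the rule described in the text."""
    if url_path.endswith(".toc"):
        return "toc"                         # explicit request for a TOC
    if data_size > size_limit and toc_available:
        return "toc"                         # too much data: offer a TOC
    return "text"                            # small enough, or no TOC possible
```

For example, with a limit of 50,000 bytes, a request for a whole 2,000,000-byte book yields a TOC, while a request for a 3,000-byte subsection yields the text itself.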

Next and Previous Buttons

DynaWeb attaches navigational hints to text "pages" as well. At the top and bottom, buttons are attached that allow the user to enter into page flipping mode. Selecting the forward button causes the next page to be retrieved, and selecting the back arrow selects the previous page. A button in the center causes a TOC to be generated. This fragmentation occurs automatically, with boundaries being decided by SGML document structure, and TOC stylesheets. The meaning of page is equivalent to the meaning "logical block of data".
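Once the document has been divided into an ordered list of pages, the targets of the forward and back buttons follow from a page's position in that list. A minimal Python sketch, assuming pages are identified by element addresses held in a list (the function name and the example addresses are hypothetical):

```python
def page_links(pages: list, current: str) -> dict:
    """Return the addresses of the previous and next pages, or None at an end."""
    i = pages.index(current)
    return {
        "previous": pages[i - 1] if i > 0 else None,
        "next": pages[i + 1] if i + 1 < len(pages) else None,
    }
```

How the page list itself is derived from the SGML structure and the TOC stylesheets is not covered by this sketch.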

Auto-generated links to other data

In addition to these automatically-generated aids, the standard DynaText hyperlinking facilities work as well. In the stylesheets, one can specify links to graphics, links to other books, query links, and more. For example, if your SGML source has a <FIGURE> element:

   <FIGURE NAME="widget.gif" TITLE="The Widget">

Then one would use the following style definition:

   <style name="ART.RASTER">
        <script>        ebt-raster filename=@(name) title="@(title)" </>
        <icon-type>     raster  </>
   </style>

causing all <FIGURE> elements to be displayed as an icon which, when selected, would result in the image named by the NAME attribute being retrieved. However, if one wanted inline images, one would write:

   <style name="ART.RASTER">
        <inline>        raster filename=@(name) title="@(title)" </>
   </style>

causing all <FIGURE> elements to generate the code required to display graphics inline. Specifying both script and inline properties allows one to create hot images. Other kinds of behaviour are specified similarly.

The important thing to understand is that, again, after having defined such behaviour once, the stylesheets can be used for any book conforming to the same DTD, and links will be generated automatically.
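To make the effect of the script and inline properties concrete, here is a small Python sketch of the kind of HTML that could be produced for a <FIGURE> element. It is not DynaWeb's generator: the function render_figure, the simplified style dictionary, and the bracketed text used in place of an icon are all assumptions made for illustration.

```python
from html import escape


def render_figure(attrs: dict[str, str], style: dict[str, bool]) -> str:
    """Emit HTML for a <FIGURE> element under a simplified style."""
    src = escape(attrs["NAME"])
    title = escape(attrs.get("TITLE", ""))
    if style.get("inline"):
        # Inline property: display the graphic in the page itself.
        body = f'<img src="{src}" alt="{title}">'
    else:
        # Stand-in for the icon that DynaWeb would display.
        body = f"[{title}]"
    if style.get("script"):
        # Script property: selecting the icon or image retrieves the file.
        return f'<a href="{src}">{body}</a>'
    return body


figure = {"NAME": "widget.gif", "TITLE": "The Widget"}
icon_link = render_figure(figure, {"script": True})
inline_image = render_figure(figure, {"inline": True})
hot_image = render_figure(figure, {"script": True, "inline": True})
```

With both properties set, the image is wrapped in a link, which corresponds to the hot image case mentioned above.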


Searching

Another of the great benefits of leaving the data in structured SGML can be found in DynaWeb's searching capabilities. Not only does DynaWeb support proximity, boolean, and other such queries, but it also supports SGML-aware queries. For example, one can do the following:

   asimov inside <author>
   <author> containing asimov

to perform a search limited to text found within an <AUTHOR> element (that is, text within the element itself or within its children). DynaWeb also supports searches on attribute values, and other such things as well.
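The idea behind a query such as "asimov inside <author>" can be sketched with Python's standard xml.etree.ElementTree module. This is only an approximation: it uses well-formed XML as a stand-in for SGML, a simple case-insensitive substring match in place of a real full-text index, and a hypothetical function name, inside.

```python
import xml.etree.ElementTree as ET


def inside(root: ET.Element, term: str, tag: str) -> list[ET.Element]:
    """Return each <tag> element whose text, including that of its
    children, contains the search term (case-insensitive)."""
    term = term.lower()
    return [
        el for el in root.iter(tag)
        if term in "".join(el.itertext()).lower()
    ]


doc = ET.fromstring(
    "<book>"
    "<author>Isaac <surname>Asimov</surname></author>"
    "<title>Asimov on Numbers</title>"
    "<author>Arthur C. Clarke</author>"
    "</book>"
)
hits = inside(doc, "asimov", "author")
```

Only the first author element is returned here; the occurrence of the term in the title is ignored because it lies outside any author element.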

DynaText has its own format for defining search forms, and these are translated to HTML forms at runtime, again providing for smooth interoperability between CDROM and WWW publishing. Search hits are reported via the TOC's, which display the number of hits per TOC entry, and also by highlighting within the actual text. It should be noted that searching is not limited to books: queries can be made at almost any level within a DynaWeb server, allowing exploratory querying of DynaWeb sites.
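Reporting the number of hits per TOC entry can be illustrated in the same spirit. The sketch below assumes, purely for the example, that each TOC entry corresponds to a <chapter> element with a <title> child, and it counts plain substring occurrences; the function hits_per_entry and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def hits_per_entry(root: ET.Element, term: str,
                   entry_tag: str = "chapter") -> list[tuple[str, int]]:
    """Pair each TOC entry's title with the number of hits below it."""
    term = term.lower()
    counts = []
    for entry in root.iter(entry_tag):
        title = entry.findtext("title", default="(untitled)")
        # Count occurrences in all text below the entry, title included.
        text = "".join(entry.itertext()).lower()
        counts.append((title, text.count(term)))
    return counts


book = ET.fromstring(
    "<book>"
    "<chapter><title>Laws</title>"
    "<para>A robot may not injure. A robot must obey.</para></chapter>"
    "<chapter><title>Foundation</title>"
    "<para>Psychohistory.</para></chapter>"
    "</book>"
)
counts = hits_per_entry(book, "robot")
```

A TOC generated from these counts would show two hits beside the first entry and none beside the second.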


Discussion

To date, DynaWeb has been deployed at some major sites, including EBT's home page and the manuals area of Novell's WWW site. Initial feedback from customers indicates that we have met all of our initial goals. Large scale publishing with DynaWeb is a pleasure compared to the traditional methods, and the time involved in both publishing and maintenance is substantially reduced. For example, Novell published around 100,000 pages of documentation in a week, and another customer took a day to publish using DynaWeb, compared to the week previously spent on conversion to HTML. Performance of the current server is sufficient for most needs.

However, all was not smooth sailing. The fact that HTTP is a stateless protocol complicates the management of state in DynaWeb (including security) enormously. Also, the large behavioural differences in browsers presented a problem: the auto-generated HTML for things like the search sliver needed to be both legal, and understood by all tested browsers. This proved difficult to achieve. Many other such problems were encountered.

The use of TEI locators proved to be very valuable initially, but as development progressed, they became less so. However, the author believes they still have great potential as a standard way of accessing hierarchically structured databases. For example, they could be used to address parts of a VRML file, an object oriented database, or even a relational database. They are certainly worth keeping in mind.

The author believes that systems such as DynaWeb represent the future of the WWW. HTML is unsuitable for large scale publishing, as is filesystem based management of documents. Neither of these technologies scales when multiple megabytes of data are being manipulated, or when multiple media types and multiple file formats need to be supported.

The author also believes that as the WWW evolves, it will become steadily more object oriented. At some point in the future, instead of just documents and replication, we will also have objects that can be combined to create applications, tied together via both replication and remote method invocation. Object location will steadily become something a user rarely needs to think about.

For DynaWeb, many enhancements are possible, even though the current product has delivered on its promises. Most of these enhancements are in the implementation, rather than in the overall system design. For example, it seems natural that at some point in the future, the static object oriented database will be replaced by a true, large-scale SGML document repository, and that a multi-threaded architecture will be used.


Bibliography

The SGML Handbook
Oxford University Press
Written by Charles Goldfarb
ISBN 0-19-853737-9

The Text Encoding Initiative Home Page
http://etext.virginia.edu/TEI.html

The Harvest Document Management System
http://rd.cs.colorado.edu/harvest/

A Two-view Document Editor With User Definable Document Structure
Digital Systems Research Center report #33
Written by Kenneth P. Brooks
http://www.research.digital.com/SRC/home.html

MIME (Multipurpose Internet Mail Extensions) Part 1
N. Borenstein and N. Freed
http://ds.internic.net/rfc/rfc1521.ps

MIME (Multipurpose Internet Mail Extensions) Part 2
K. Moore
http://ds.internic.net/rfc/rfc1522.txt

Hypertext Transfer Protocol -- HTTP/1.0
T. Berners-Lee, R. T. Fielding, H. Frystyk Nielsen
ftp://ds.internic.net/internet-drafts/draft-fielding-http-spec-01.txt



Gavin T. Nicol
Electronic Book Technologies, Japan
1-29-9 Tsurumaki, Setagaya-ku,
Tokyo 154,
Japan
Phone: +81-3-3230-3861
Fax: +81-3-3230-3863
Email: gtn@ebt.com
URL: http://www.ebt.com/

