Announce SiSU - publishing for e-documents, books, libraries, relational databases

Discussion in 'Ruby' started by Ralph Amissah, Jan 4, 2005.

  1. 20050104 SiSU is released


    Excuse the lengthy announcement, hubris and repetition.

    A fairly big day for me, I have worked on SiSU for several years, though
    only recently with its imminent release in mind...

    The focus of SiSU is simple and sparse markup requirements (used for
    single documents or large document sets), to produce structured
    multiformat published text versions, with a common/shared citation
    system, and search possibilities that take advantage of this.

    Little time has been spent on the installation procedure. I would
    appreciate feedback from anyone who installs and tests SiSU on Linux and
    BSD (and OSX?) platforms. I anticipate there will initially be problems
    related to installation and setup; I would be grateful for feedback on
    these, and will be pleased to help with them.

    Once past the install I would very much appreciate feedback generally
    and especially from Rubyists (as the text it is designed to work with is
    not code or documentation, interest will not be developer specific,
    and may be limited), Librarians, Document Projects, and academic writers
    on aspects of interest.

    Additional syntax highlighters for SiSU markup would be extremely
    welcome, they don't need to be as complete as the vim highlighter. Emacs
    would obviously be nice, of much interest would be the ruby editors, and
    also less geeky text editors, as it is hoped that SiSU will eventually
    be used by non-coders.

    I expect some criticism for hubris, some OT opinions expressed here (and
    elsewhere), and possibly coding style which has evolved over the years,
    and which may not always have been consistently updated (also because of
    the lack of use of spaces, put that down to using an editor with
    excellent syntax highlighting and what I have come to be accustomed to,
    as a lone coder).

    This release will primarily be of interest to developers as the
    install/setup are hardly documented, (and assumes you have independently
    installed external programs that are taken advantage of such as
    Postgresql, have file permissions set and more), it is not tested across
    platforms. But if you are able to get it working it does do quite a bit.
    Paradoxically, though for documents it is not for programming
    documentation, and this will reduce its value to the same developers who
    might currently be able to use it.

    I ask much, there is no rush. (This will sadly be a fairly busy month
    for me, my response time is going to have to be slow.)

    I have enjoyed working on SiSU very much over a number of years, and am
    pleased with what it does and how it does it. I hope it is of use to

    Ready or not, here it is, as it (currently) is, enjoy,



    Well Wishes all for 2005,
    Ralph Amissah

    What is SiSU?

    (SiSU - simple, information structuring utility/universe)

    SiSU is an electronic publishing system and (hybrid) kind of document
    management system (for the documents that it generates), with its own
    unique set of features, including amongst many others, very simple
    markup; writing to the file system (for Internet, Intranet, or file
    serving, and including eg CD publication) and/or relational database; in
    multiple output formats (html, structured XML, LaTeX and pdf,
    postgresql), with a citation system that is common to all output types.

    SiSU is a (command line) text processing program that produces
    structured electronic documents from a simple marked up input file
    (using a markup syntax similar to smart ascii that I claim to be simpler
    than the most elementary html) in multiple output formats, from html,
    and structured XML, to pdf via LaTeX, and to streaming into relational
    databases (currently Postgresql), writing in a structured way to the
    file system or to a relational database, where it retains information on
    the document's structure.

    SiSU may be used either for individual documents or
    collections/libraries of published (as in finished and not subject to
    continuous change) documents. The documents it handles are primarily
    law (which can be quite diverse) and literature, and some social
    sciences (as opposed to maths, science, programming etc.). There are
    several samples available.

    Documents are marked up in "SiSU Syntax" in your favourite editor, and
    SiSU, a command line driven batch processor, is run against the marked
    up document(s) to produce the desired output(s).

    SiSU (once installed and set up) should be easy enough for anyone to
    use, (with a bit of additional documentation). The markup syntax is
    simple, and the commands are easy enough with interactive help. It would
    benefit greatly from additional syntax highlighters. (There are sample
    input documents from which various outputs can be generated).

    As a proof of concept the SiSU framework is in place, and many of the
    modules have been used professionally for several years. There are many
    more modules than the ones so far released, these have been held back
    either because they have not been properly maintained, having fallen
    into disuse, or because they are not generic enough in their current

    Information on SiSU is available at:

    Sample texts, and remember SiSU is not specifically for books:

    Possibly of greater interest, to illustrate how different the
    possibilities this provides are, is search:

    And the markup from which this is derived:

    SiSU provides

    [This is part of a fairly recent attempt to explain certain aspects of
    the project to a layman.]

    SiSU provides a way, with minimal markup effort, to have multiple output
    formats, taking advantage of some of their strengths - viz. html,
    structured XML, pdf via LaTeX, and relational SQL databases, all of
    which are tied together using a common citation system.

    * simple markup (done once, makes automatically available the rest),[1]

    * possibility of adding semantic data to documents (currently the Dublin
    Core, though it would be easy to incorporate other, or alternative

    * multiple outputs - using industry standards, and taking advantage of
    the rather different strong points of each (html, structured XML, pdf
    via LaTeX, relational SQL database - currently Postgresql, retaining
    structural information)

    * a common citation system for all document outputs, including the
    relational database, searches being able to take advantage of the
    implications of the citation system (primarily the automatic consistent
    numbering of headings and paragraphs, in such a way that they can be
    used by and to reference content in all output types).
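
    The common citation system described above can be pictured with a small
    Ruby sketch. This is only an illustration under assumptions, not SiSU's
    own code: the TextObject struct, the method names and the "o<number>"
    anchor format are all hypothetical. The point it shows is that objects
    are numbered once, and every output reuses those numbers.

```ruby
# Hypothetical sketch, not SiSU's implementation: number each text object
# (heading or paragraph) once, then reuse that number in every output.
TextObject = Struct.new(:ocn, :kind, :text)

# Assign consecutive object citation numbers in document order.
def number_objects(objects)
  objects.each_with_index.map do |(kind, text), i|
    TextObject.new(i + 1, kind, text)
  end
end

# Illustrative html rendering; the id format is an assumption.
# (No html escaping is done here, to keep the sketch short.)
def to_html(obj)
  tag = obj.kind == :heading ? "h1" : "p"
  %(<#{tag} id="o#{obj.ocn}">#{obj.text}</#{tag}>)
end

# Illustrative database row carrying the same number.
def to_sql_row(obj)
  [obj.ocn, obj.kind.to_s, obj.text]
end

doc = number_objects([
  [:heading, "Chapter One"],
  [:paragraph, "First paragraph."],
  [:paragraph, "Second paragraph."]
])

doc.each { |obj| puts to_html(obj) }
```

    Because the html anchor and the database row carry the same number, a
    citation such as "object 2" points at the same paragraph in both.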

    There is a list of SiSU features here:
    which I will tag on to the end of this document.

    The document contains sample input and output files (several places, but
    also here):

    The last thing to be done was a search front-end for the database, which
    I finally decided to buckle down to doing. The back-end has been in
    place for a number of years now, but this makes this feature a lot
    easier to demonstrate. Unfortunately I do not have that online - a link
    to images in its current form:

    which relates to what IBM for example found to be of particular interest
    early in the summer of 2004:
    [location may change as this document is updated]

    Some of those links will change with subsequent modifications to the
    text, it is best used for published works.

    There is much to browse generally, some of it is just fan material of
    other things technical that I have found useful.

    The document

    [1] e.g. marking up War and Peace (from a Project Gutenberg ascii text)
    is done in a little over an hour. Reduction in the effort required for
    the preparation of texts (XML, for example, the buzzword of the
    industry, is labour intensive and complicated, LaTeX is also a lot more
    complicated than SiSU markup syntax - they are more flexible, but do not
    provide the composite solution... single command building of documents
    and/or populating of a relational database, while retaining structural


    [I have not glanced at other OS's for the purpose of
    development since 1999.]

    Developed and tested on Debian/Gnu/Linux Sid.

    Short summary of features


    (i) minimal markup requirement

    (ii) single file marked up for multiple outputs

    (iii) markup is simpler than html

    (iv) the simple syntax is mnemonic, influenced by mail/messaging/wiki
    markup practices

    (v) human readable, and easily writable syntax

    (vi) multiple outputs include amongst others: "html"; "pdf" via
    "LaTeX"; (structured) "XML"; sql - currently "PostgreSQL" (and sqlite);
    "ascii", (also "texinfo")

    (vii) takes advantage of the strengths implicit in these very different
    output types, (e.g. LaTeX (professional document typesetting, easy
    conversion to pdf or Postscript); XML (in this case, structural
    representation); sql relational database (e.g. document search;
    representing constituent parts of documents based on their structure,
    headings, chapters, paragraphs as required; control of use) important
    enough to be given a heading of its own.)

    (viii) provides a common citation system for all outputs, (object
    citation numbering), all text objects (headings and paragraphs) are
    numbered identically, for citation purposes, in all outputs ("html",
    "pdf", sql etc.)

    (ix) use of Dublin Core and other meta-tags to permit the addition of
    some semantic information on documents, and making easy integration of
    rdf/rss feeds etc.

    (x) creates organised directory/file structure for (file-system) output

    (xi) easily mapped with its clearly defined structure, with all text
    objects numbered, you know in advance where in each document output
    type, a bit of text will be found (eg. from an sql search, you know
    where to go to find the prepared "html" output or "pdf" etc.)... there
    is more

    (xii) search of document sets, the relational database retains
    information on the document structure, and citation numbering makes it
    possible for example to present search matches as an index of documents
    and locations within the document where the match is found, (an image
    series added December 12th 2004 in the Chronology pages, somewhere
    around gives an idea of what is possible, I unfortunately do not have
    the hardware currently set up to demonstrate this dynamically on the
    www)

    (xiii) "word maps" rudimentary index, consisting of all the words in a
    document and their (text object) locations within the text

    (xiv) very easily skinnable, document appearance on a project/site
    wide, directory wide, or document instance level easily
    controlled/changed

    (xv) easy directory management and document associations, the document
    preparation (sub-)directory may be used to determine output
    (sub-)directory, the skin used, and the sql database used

    (xvi) in many cases a regular expression may be used (once in the
    document header) to define all or part of a documents structure
    obviating or reducing the need to provide structural markup within the
    document

    (xvii) is a batch processor for handling large document sets, ...
    though once generated they need not be re-generated, unless changes are
    made to the desired presentation of a particular output type

    (xviii) possible to pre-process, which permits the easy creation of
    standard form documents, and templates/term-sheets

    (xix) easy to add, modify, or have alternative syntax rules for input,
    should you need to

    (xx) (future-proofing) extremely modular, (thanks in no small part to
    Ruby) another output format required, write another module....

    (xxi) (future-proofing) easy to update output formats (eg html, xhtml,
    latex/pdf produced can be updated in program and run against whole
    document set)

    (xxii) scalability, dependent on your file-system (in my case Reiserfs)
    and on the relational database chosen (currently Postgresql), and your
    hardware

    (xxiii) a framework for adding further capability as required

    (xxiv) tied to version control system, only code and marked up file
    need be backed up, to be sure of the much larger document set

    (xxv) document management

    (xxvi) use your favourite editor, syntax highlighting files for markup,
    primarily (g)vim so far.
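
    To make the "word maps" item (xiii) a little more concrete, here is a
    minimal Ruby sketch of such an index. It is an assumption about the
    general idea only: the word_map method, the hash of numbered text
    objects and the tokenising rule are hypothetical, and SiSU's own module
    may work quite differently.

```ruby
# Hypothetical sketch of a "word map": every word in a document mapped to
# the numbers of the text objects in which it occurs.
def word_map(objects)
  index = Hash.new { |hash, word| hash[word] = [] }
  objects.each do |ocn, text|
    # Crude tokeniser (an assumption): lower-case runs of letters/apostrophes.
    text.downcase.scan(/[a-z']+/).uniq.each { |word| index[word] << ocn }
  end
  index
end

# Text objects keyed by their object number (sample data, not from SiSU).
objects = {
  1 => "Contract of Sale",
  2 => "The sale of goods",
  3 => "Goods in transit"
}

wm = word_map(objects)
wm.sort.each { |word, ocns| puts "#{word}: #{ocns.join(', ')}" }
```

    Since the locations are object numbers rather than page numbers, the
    same index would be valid for every output format.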

    SiSU was developed in relation to legal documents, and so is strong
    across a wide variety of texts (law, literature...), though weak on
    formulae/statistics, it does handle images. An assumption has been
    document sets that are to be preserved and maintained over time (also a
    result of the legal text origin). SiSU has been developed and used over
    a number of years, and the requirements to cover a wide range of
    documents have been thoroughly explored.


    Outputs are to standard protocols or open source software.

    I would like to keep SiSU markup and meta-markup a standard, although by
    the SiSU program design it is easy to modify.

    I make claim to "object citation numbering" as a very simple idea with
    which I have persisted for many years, that makes much possible, and is
    a unifying feature of SiSU output.

    Generated by SiSU
    SiSU Sabaki 0.1.0-8 2004w51/4
    Standard SiSU markup syntax,
    Standard SiSU meta-markup syntax, and the
    Standard SiSU object citation numbering and system
    © Ralph Amissah 1997, current 2005.
    All Rights Reserved.

    Separating the markup syntax (human readable, and usually human
    prepared), and meta-markup syntax (machine written) has interesting

    (i) It is possible to change the markup syntax (or have several
    alternative input syntaxes) without disturbing the downstream program
    modules/libraries, provided you write to the same standard meta-markup
    syntax. (If you used the original syntax and then changed to an
    alternative syntax, you would presumably have alternative standard
    meta-markup generators, or convert the original syntax to the
    alternative syntax.)

    (ii) It is also possible to change the meta-markup syntax, with
    consequences for all the downstream programs, but without in any way
    affecting your document set (your marked up documents).

    Both have been very useful over the years of development and use of
    SiSU.
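
    A rough Ruby sketch of point (i), under stated assumptions: neither
    input syntax below is SiSU markup, and the intermediate form (an array
    of [kind, text] pairs) is invented for illustration, not the actual
    meta-markup. It only shows two front-ends writing to one shared
    intermediate form, so that the downstream renderer never sees which
    input syntax was used.

```ruby
# Hypothetical input syntaxes: each marks a heading with a different prefix.
HEADING_PREFIX = { hash_style: "# ", equals_style: "== " }.freeze

# Front-end: turn marked up text into the shared intermediate form.
def to_meta(source, syntax)
  prefix = HEADING_PREFIX.fetch(syntax)
  source.each_line.map(&:strip).reject(&:empty?).map do |line|
    if line.start_with?(prefix)
      [:heading, line[prefix.length..-1]]
    else
      [:para, line]
    end
  end
end

# Downstream module: depends only on the intermediate form.
def render_plain(meta)
  meta.map { |kind, text| kind == :heading ? text.upcase : text }.join("\n")
end

meta_a = to_meta("# Title\nBody text\n", :hash_style)
meta_b = to_meta("== Title\nBody text\n", :equals_style)

puts render_plain(meta_a)
```

    Adding a third input syntax in this sketch would mean adding one prefix
    entry; render_plain would not change.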

    The object citation numbering system (ocn) is a simple idea which, being
    relevant to man and machine, has far reaching possibilities. All output
    uses the same object citation numbering, including database searches,
    which can present matches with an index of documents and the
    (hyperlinked ocn) locations within each document where the match was
    found.
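
    As an illustration of how search matches might be presented as an index
    of documents and ocn locations, here is a small Ruby sketch. It uses an
    in-memory array in place of the relational database, and the row layout,
    document names and link format are all assumptions made for this
    example, not SiSU's schema or URL scheme.

```ruby
# Hypothetical stand-in for database rows: [document, ocn, text].
ROWS = [
  ["sale_of_goods", 1, "Contract of Sale"],
  ["sale_of_goods", 4, "The seller must deliver the goods."],
  ["carriage", 2, "Goods in transit remain at the seller's risk."],
  ["carriage", 7, "Delivery ends the carrier's liability."]
].freeze

# Return { document => [ocn, ...] } for objects containing the term.
def search(rows, term)
  needle = term.downcase
  hits = rows.select { |_doc, _ocn, text| text.downcase.include?(needle) }
  hits.group_by { |doc, _ocn, _text| doc }
      .transform_values { |matches| matches.map { |_doc, ocn, _text| ocn } }
end

# Turn ocn locations into links; the "#o<number>" anchor is an assumption.
def links(doc, ocns)
  ocns.map { |ocn| "#{doc}.html#o#{ocn}" }
end

search(ROWS, "seller").each do |doc, ocns|
  puts "#{doc}: #{links(doc, ocns).join(' ')}"
end
```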

    However, it is of interest to keep both relatively stable, and indeed to
    have a Standard. I claim this standard (at least the original standard).


    (i) GPL 2 or later, for non-commercial use of the program and

    (ii) Everything else, that is everything not covered by (i), distributed
    under a commercial license (terms to be determined).

    expanded upon a bit -

    GPL 2 or later.

    Or under special license terms from Ralph Amissah, the details of which
    are to be determined. The idea is that it can be incorporated into
    proprietary systems, under a proprietary license, for a per seat fee.
    (SiSU was identified as being of interest as a middle-ware application
    by a large database and document management software provider...)

    From this point on there will be a GPL and proprietary branch. I expect
    if there is any take-up the GPL branch will advance faster and further
    (in my hands and generally) than the proprietary branch.

    SiSU is the result of several years of research and development in
    electronic publishing, commenced in 1993 and under active development
    since 1997. There is always more to be done. SiSU is released under GPL
    2 or later (first on January 4th 2005) and is alternatively available
    under special license terms from Ralph Amissah, the detail of which is
    to be determined.


    To start with see the README file provided with the program.

    Historical note

    SiSU is the result of a several year journey of research and development
    related to electronic publishing, in particular related to legal and
    academic writings. It started with the discovery of the Web and a
    project to publish legal documents on the Web in 1993. Programming
    started later, but ideas as to what would be useful to have and be able
    to do started to form from that initiation. I was lucky enough at the
    time to work with Geoffrey Armstrong and Tommy Johanson, (who wrote the
    first lines of Perl I ever saw).

    Programming SiSU, setting ends and attaining the ends set has been a
    solo effort, from which I have learnt masses, and come to appreciate and
    depend on the work of others, none more so than Matz of Ruby fame.
    Within the Ruby community I have learnt lots from others, in particular
    Ruby book authors both paper and electronic (I would guess Dave Thomas,
    Why (what's new in Ruby 1.8.0, and yes even bits of the Poignant Guide),
    and Hal Fulton in roughly that order, Slagell's book is decent, I would
    not have minded starting on Ruby with that), and those most vocal in the
    newsgroup and irc channel (too many to keep track of let alone mention -
    Eek and Batsman and earlier in time DBlack deserve special mention). I
    have not used the recommended route of studying the code of other
    projects (perhaps one day). The Ruby language is remarkable as has been
    the Ruby community to date.

    I have not studied other document/text processors as such either. My
    impression is that this must be much easier to use than say a DocBook,
    but will offer a different range of features. (I probably should not
    mention it at all, I don't know).

    I have always planned to share this work (under a dual license, one of
    them being the GPL). A brief encounter with IBM in 2004 (Software
    Innovations evaluation) had me scrambling to the U.S. June/July to
    arrange a provisional Patent application, (and wondering if that was the
    route I wished to pursue why I had not done so seven or more years
    earlier) as the only way to meaningfully talk to them. The employee
    left, and interest has not persisted, fortunately. As to where I stand
    on Software Patents, software patents in their current form appear to be
    primarily a tool to stifle innovation, not to promote it, (indeed this
    is why what I have done is a lot more interesting to a large company if
    I hold a Patent than otherwise) that can only be financially afforded by
    large companies in their application, and in their enforcement through
    litigation. Europe would do well not to have them.

    If I were not pleased with Debian/Gnu/Linux(Sid), its packaging system,
    (developers and range of applications) and social contract, I would
    almost certainly use one of the BSDs as my development platform -
    FreeBSD or Dragonfly.

    What SiSU is not - SiSU is not

    * blogging software. (though I sometimes misuse it in this way)

    * a wiki (well obviously, though it would be interesting to use this
    technology alongside a wiki - the wiki being used for constantly updated
    pages and navigation information, whilst SiSU is used for published
    works that are not changed frequently - eg a published academic writing,
    a book, a convention)

    * for documentation on programming, or mathematical, scientific texts.


    This is a fairly large project, much remains to be done. Of particular
    interest, without any time scale or immediate urgency:

    * Documentation. There is some, but the presentation is nowhere near as
    digestible as it should be.

    * Documentation apart, the biggest single todo is Unicode processing.
    LaTeX and Postgresql support UTF-8 so that is what it is most likely to
    be. My excuse for not having looked at it yet ... need to date, and not
    having configured my environment for it. I do however recognise this as
    a need.

    * Getting the Sqlite module working again. Similar to the Postgresql
    module, fell out of maintenance, when I found Sqlite to be a bit of a
    pain to install on Debian, (and was prioritising Postgresql), once upon
    a time the modules were in sync, and I hope to have them that way again

    * Much code cleaning ... this project has developed over several years,
    and there have been many changes in how things are done, without
    rigorous removal of dead code.

    * simplify installation, and test across other Unix and Gnu/Linux

    * object citation numbering is currently done only for substantive text
    and other objects (such as images), a secondary numbering will
    eventually be implemented for non-substantive items.

    * decide what to do with images and tables in XML and in relational

    * Marshalled/PStored Metaverse. As an alternative (not replacement) to
    the current ordinary text based SiSU meta-markup state.

    * Additional Syntax hi-lighters. The current syntax hi-lighter, and
    folds are for vim. Additional syntax highlighters for SiSU markup would
    be extremely welcome, they don't need to be as complete as the vim
    highlighter. Emacs would obviously be nice, but the ruby editors, and
    less geeky editors are of much interest. Not sure that I will do this,
    after all I do use Vim, we'll see.

    * My vim configuration files are a total mess, but are provided as is.

    Help/suggestions welcome.
    Ralph Amissah, Jan 4, 2005

  2. Re: [Ann] SiSU - document generator, "atomic" search etc.

    SiSU Sabaki, version 0.1.1-0 of 2005w01/4 (20050106):


    Modifications to installation script, configuration paths, and help
    only. This release should be easier to install and to figure out what
    needs to be done should problems be encountered with installation.

    * install script a bit smarter, - also installs the configuration files
    that come along with the sample marked up documents, which should make
    things a bit easier

    * a lot more information provided on paths, both by the install script
    and interactive help once installed, to assist in figuring out what
    needs to be done should a problem arise with installation or

    * help updated

    On Tue, 4 Jan 2005 20:33:22 +0000 (UTC), Ralph Amissah
    <> wrote:
    > 20050104 SiSU is released
    > --------------------------
    > Announce
    > --------
    > Excuse the lengthy announcement, hubris and repetition.

    > Help/suggestions welcome.

    On SiSU Generally:
    Ralph Amissah, Jan 6, 2005
