XML, DTDs and getting the newspaper on the web

Discussion in 'XML' started by Brandons of mass destruction, May 6, 2004.

  1. OK, I work for a small newspaper that's looking to translate our content
    to the web.

    XML sounds like a good way to do this, but I'm a little confused.

    Is it possible to have <b> and <i> tags in an XML document? We'd need
    them for editorial style (movie names are italic, etc.).

    When writing a DTD for the paper, it seems as though everything must
    appear in a linear fashion, i.e. headline, body, endnotes.
    Yet how do I handle sidebars, which appear within the body?
    Can I make sub elements?

    Also, our paper has different sections with different types of content.
    Should I write one DTD that covers every possible style in our paper, or
    break it down into several different DTDs? Does it matter?

    Is anyone using InDesign CS to export content to XML? If so, how
    successful is it?

    --
    Quarkxpress sucks.
    Brandons of mass destruction, May 6, 2004
    #1

  2. Brandons of mass destruction wrote:

    > OK, I work for a small newspaper that's looking to translate our content
    > to the web.
    >
    > XML sounds like a good way to do this, but I'm a little confused.
    >
    > Is it possible to have <b> and <i> tags in an XML document? We'd need
    > them for editorial style (movie names are italic, etc.).


    You can have any tag in an XML document (within the restrictions that
    the XML specification puts on element names), so of course you can
    have a <b> or <i> element. However, the semantics of such elements are
    not automatically the HTML presentational semantics of bold or italic
    text. Unless you use the namespace of a well-specified document markup
    language like XHTML, e.g.
    <b xmlns="http://www.w3.org/1999/xhtml">text</b>
    your element is just an element with the name "b", which can mean bold,
    big, brother or whatever you want it to mean, and a browser will not
    understand it as meaning bold.
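
    A minimal sketch of that distinction, using Python's standard
    xml.etree.ElementTree (the helper name describe is made up for this
    illustration). The parser reports the same local name "b" in both
    cases; only the namespace tells a consumer which vocabulary the
    element belongs to.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A <b> element with no namespace: just an element that happens to be named "b".
plain = ET.fromstring("<b>text</b>")

# The same local name, placed in the XHTML namespace.
xhtml = ET.fromstring('<b xmlns="http://www.w3.org/1999/xhtml">text</b>')


def describe(element):
    """Return (namespace, local_name) for an ElementTree element."""
    if element.tag.startswith("{"):
        namespace, local_name = element.tag[1:].split("}", 1)
        return namespace, local_name
    return None, element.tag


plain_info = describe(plain)
xhtml_info = describe(xhtml)
print(plain_info)
print(xhtml_info)
```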




    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
    Martin Honnen, May 6, 2004
    #2

  3. Martin Honnen wrote:

    > Brandons of mass destruction wrote:
    >
    > > OK, I work for a small newspaper that's looking to translate our content
    > > to the web.
    > >
    > > XML sounds like a good way to do this, but I'm a little confused.
    > >
    > > Is it possible to have <b> and <i> tags in an XML document? We'd need
    > > them for editorial style (movie names are italic, etc.).
    >
    > You can have any tag in an XML document (within the restrictions that
    > the XML specification puts on element names), so of course you can
    > have a <b> or <i> element. However, the semantics of such elements are
    > not automatically the HTML presentational semantics of bold or italic
    > text. Unless you use the namespace of a well-specified document markup
    > language like XHTML, e.g.
    > <b xmlns="http://www.w3.org/1999/xhtml">text</b>
    > your element is just an element with the name "b", which can mean bold,
    > big, brother or whatever you want it to mean, and a browser will not
    > understand it as meaning bold.


    ah.

    Well, how would I go about preserving those bold and italic tags?

    --
    Quarkxpress sucks.
    Brandons of mass destruction, May 6, 2004
    #3
  4. Brandons of mass destruction

    Victor Guest

    Brandons of mass destruction wrote:

    > Well, how would I go about preserving those bold and italic tags?


    Two options come to mind:
    - Save your files as XHTML, the XML-ified version of HTML. At least in
    the older versions of XHTML, <b> and <i> tags keep their
    traditional meaning.
    - Enter the semantic web. This is a relatively new way of designing web
    pages where content (data/semantics) is separated from presentation.
    E.g., if you know movie names should be presented as italic text, you
    have a <movie> tag with a child <name>, which is formatted using a
    separate stylesheet rule (CSS or XSLT). This way, when the layout
    changes, you only need to change the stylesheet to reflect the change in
    all documents. Also, "italic" in this case is not information _about
    the movie_, so you avoid adding data which really isn't needed.
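
    A rough sketch of the second option. In practice the rule would live
    in a CSS or XSLT stylesheet; Python's standard xml.etree.ElementTree
    is used here only to keep the illustration self-contained. The
    <review> and <p> elements and the function name are assumptions made
    for the example.

```python
import xml.etree.ElementTree as ET

# Content marks *what* the text is (a movie name), not how it looks.
ARTICLE = """<review>
  <p>We saw <movie><name>Casablanca</name></movie> last night.</p>
</review>"""


def render_movies_as_italic(xml_text):
    """Apply one presentation rule: every <movie><name> becomes <i>."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so the tree can be edited while looping.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "movie":
                italic = ET.Element("i")
                italic.text = child.findtext("name", default="")
                italic.tail = child.tail  # keep the text that follows
                parent[index] = italic
    return ET.tostring(root, encoding="unicode")


html = render_movies_as_italic(ARTICLE)
print(html)
```

    If house style later switches movie names to bold, only the rule
    changes; the stored articles stay as they are.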

    To learn how to use the semantic web, you can have a look at the
    W3Schools XML tutorial (http://www.w3schools.com/xml/) to get you
    started. Afterwards, you might want to check out the tutorials for other
    W3C technologies (http://www.w3schools.com/), an interesting article
    about the semantic web
    (http://www.creativebehavior.com/index.php?PID=87), and the W3C
    standards themselves (http://www.w3.org/).

    --
    Victor
    Victor, May 7, 2004
    #4
  5. Brandons of mass destruction

    Andy Dingley Guest

    > To learn how to use the semantic web, you can have a look at the
    > W3Schools XML tutorial (http://www.w3schools.com/xml/)


    If you want a clueless misunderstanding of XML, let alone SemWeb, you
    could try the execrable w3schools.
    Andy Dingley, May 7, 2004
    #5
  6. Brandons of mass destruction

    Andy Dingley Guest

    Brandons of mass destruction wrote:
    > OK, I work for a small newspaper that's looking to translate our content
    > to the web.


    Look first at existing standards like RSS 1.0, NITF, NewsML, XHTML,
    DocBook, Dublin Core etc.

    Secondly, look at off-the-shelf content management systems. This is a
    big project you're dealing with here – it's the sort where
    organisations can disappear into a software tar-pit (and some go bust
    as a result!). If you aren't "a software guy", then grab a copy of
    Steve McConnell's "Rapid Development", which is an essential for
    anyone managing a software project for the first time.

    You're not just writing a DTD here. You're writing a _system_. Think
    system all the way through (or else your life will become tiresome).
    How is authoring done, and how do things get into this DTD format ?
    How do you store and manage the great many articles you'll have
    (including editorial review and embargo) ? How do you finally publish
    them ? How do you let users write dummy articles and publish them,
    for training or demo purposes ? All this stuff needs to be thought
    through, because the alternative is to go live in an unclear manner
    (and we can imagine how publishing deadlines are conducive to
    re-working software on a now-live system).

    > When writing a dtd for the paper,


    Don't set out by writing a DTD. Really, truly bad idea. DTD-writing
    is for those who are the first team ever to address a particular
    problem. If you're Yet Another Newspaper, then use someone else's
    existing DTD rather than writing your own.

    As a separate point, I suggest using XML Schema instead of DTD.

    If you can't find a schema to suit all your needs (which is likely),
    then assemble a composite from several sources. In almost all cases,
    you can solve 90% of your problem by bolting together (XML
    namespacing) existing schemas.
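
    As a sketch of what "bolting together" can look like, the document
    below borrows Dublin Core elements for metadata and XHTML elements
    for text markup, each in its own namespace. The unqualified <article>
    wrapper and the sample content are assumptions for illustration; the
    reading code uses Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map used for lookups below.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "h": "http://www.w3.org/1999/xhtml",
}

# A hypothetical composite document: metadata from Dublin Core,
# body markup from XHTML, wrapped in a local <article> element.
DOC = """<article xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:h="http://www.w3.org/1999/xhtml">
  <dc:title>Council approves budget</dc:title>
  <dc:creator>A. Reporter</dc:creator>
  <h:div>
    <h:p>The council voted <h:i>unanimously</h:i>.</h:p>
  </h:div>
</article>"""

root = ET.fromstring(DOC)

# Each vocabulary is addressed through its own namespace, so the
# borrowed names cannot collide with one another.
title = root.findtext("dc:title", namespaces=NS)
creator = root.findtext("dc:creator", namespaces=NS)
italics = [element.text for element in root.iterfind(".//h:i", NS)]

print(title, creator, italics)
```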

    > it seems as though everything must
    > appear in a linear fashion


    No, everything must appear in a branching tree, which can be
    serialised in a linear manner. It's not _quite_ as restrictive,
    although it is close.

    My last few years of work have been spent using RDF for seriously
    complex content management (i.e. automated editorial and content
    _production_, not just publishing). That's an area where I needed to
    go beyond XML's Infoset tree model. For the general publishing case of
    articles authored and subbed by humans, though, XML is adequate.

    These days I'm working on magazine sites like www.t3.co.uk,
    www.laptopmagazine.co.uk and http://gamesradar.msn.co.uk. These are
    examples of several dozen magazine sites generated by the same XML /
    XSLT-based CMS. They use their own schema (bad move in hindsight).

    Content Management seems to be a different problem with every user you
    talk to. There are three broad directions to approach it from, though:
    content, page layout and site structure. Some users worry much more
    about one aspect than others. A newspaper probably has a fairly simple
    site structure, and certainly one that's long-term stable (so the
    one-off design costs are less crucial). Page layout may or may not be
    an issue, depending on the destination of your content – if "publish
    on the web" means offering RSS syndication, then you're effectively
    avoiding the question anyway. For a page on your own site, you will
    care about this though.

    > i.e. headline, body, endnotes.
    > yet how do i handle sidebars, which appear within the body?


    No big deal. There is a "document order" (which as we've just noted
    needs to fit onto XML) and there is a layout order (which is two or
    more dimensional). You need to relate one to the other, but this isn't
    a problem. The linearity of one doesn't enforce that same order onto
    the other.
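
    A small sketch of the two orders, using Python's standard
    xml.etree.ElementTree. The element names (article, headline, body,
    sidebar, endnotes) follow the question; the sample text and the
    two-column split are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The sidebar is nested inside the body, so the document is a tree,
# not a flat headline/body/endnotes sequence.
ARTICLE = """<article>
  <headline>Council approves budget</headline>
  <body>
    <p>First paragraph.</p>
    <sidebar><p>Budget at a glance.</p></sidebar>
    <p>Second paragraph.</p>
  </body>
  <endnotes><p>Reporting by staff.</p></endnotes>
</article>"""

root = ET.fromstring(ARTICLE)

# Document order: a depth-first walk, i.e. the order in the serialised file.
document_order = [element.tag for element in root.iter()]

# Layout order: the renderer is free to lift sidebars out of the text flow.
body = root.find("body")
main_column = [child.text for child in body if child.tag == "p"]
side_column = [p.text for sidebar in body.findall("sidebar") for p in sidebar]

print(document_order)
print(main_column)
print(side_column)
```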

    One question is how final layout is specified. Is the placing of a
    call-out etc. a matter for the text content author (so they embed
    positioning information in the article body), or the page designer ?
    Does it need to move around, depending on how text is filling the
    available space ? An entirely workable solution is to simply have a
    "callout" property (or set of them) and leave the positioning up to
    the final rendering engine. After all, this is how it's done by the
    art ed. in a paper-based world.

    The "content" aspect of CMS for newspapers is one of the most complex
    (in terms of Schema), but fortunately it's also one of the best
    established. You don't need to invent here, there's a lot you can
    borrow from pre-existing standards.

    Think of your "article" structure at two levels; one is "newspaper"
    structure, bylines, callouts etc. The other is lower than this,
    generalised text markup such as <b>, <i>. Steal this from XHTML
    (particularly the use of <div>, <span> and the concept of coreattrs).

    A problem with article authoring is maintaining consistency between
    authors. You'll extract some structure (e.g. "headline", "abstract")
    into a formal Schema and out of the article body. Other less-obvious
    properties, such as bylines, might not be treated so explicitly and
    find themselves styled in-line by the text editing tool. If you don't
    give your users a good way to do something, they'll only find a bad
    way to try it instead. Encourage them to style regularly-used elements
    as <div class="byline" >…, not just a mish-mash of <newline /><b><i>…
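
    One way to see why the consistent form matters, sketched with
    Python's standard xml.etree.ElementTree. The two sample articles and
    the function name find_bylines are made up for the example. A byline
    marked with a class can be found mechanically; one that was merely
    styled bold-italic looks like any other emphasised text.

```python
import xml.etree.ElementTree as ET

# Byline marked up consistently with a class attribute.
STRUCTURED = """<article>
  <div class="byline">By A. Reporter</div>
  <p>Story text.</p>
</article>"""

# The same byline styled ad hoc by the author.
AD_HOC = """<article>
  <p><b><i>By A. Reporter</i></b></p>
  <p>Story text.</p>
</article>"""


def find_bylines(xml_text):
    """Return the text of every <div class="byline"> in the article."""
    root = ET.fromstring(xml_text)
    return [div.text for div in root.findall(".//div[@class='byline']")]


structured_bylines = find_bylines(STRUCTURED)
ad_hoc_bylines = find_bylines(AD_HOC)
print(structured_bylines)
print(ad_hoc_bylines)
```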

    Can inline formatting (<em>, <b> etc.) be applied to the major
    properties (headline etc.)? You need to decide this early on (either
    way will work), then make your system consistent in where it permits
    it! A schema is a specification of the data model you support – you
    need to extend this to every part of the system, from the authoring
    tool to the database to the publishing engine and the final rendering
    as HTML/CSS or PDF. It doesn't always have to implement every feature
    at every level, but you have to _know_ how it is handled (or not) at
    _every_ step, or you will go crazy when debugging it. This is a big
    system you're dealing with here.

    For newspapers, I don't see content representation as being a big
    problem. I don't even see page layout as insurmountable. Magazine
    page-layout OTOH is much trickier. It varies far more, there's more
    branding distinction between titles, and it's generally more
    design-centred than content-centred. Not that you want to hear about
    _my_ troubles…


    > can i make sub elements?


    What's a sub element ? If you mean "paragraph, but not quite like a
    standard paragraph" then look at permitting a "class" attribute on
    _all_ of the elements in your text formatting set. This is what HTML
    does with the coreattrs set. You now know that _every_ element you
    have can be treated equally, with a class to allow sub-classing its
    behaviour like this (and an easy binding to CSS when you publish it).
    Also allow some arbitrary containers like <div> and <span>, just as a
    placeholder to carry such attributes. The rest of the set (title, id
    and lang) is nearly as useful too.
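
    A sketch of the "easy binding to CSS" mentioned above, using Python's
    standard xml.etree.ElementTree. The class names and the function
    css_selectors are assumptions for illustration. Because every element
    may carry a class, the selectors a stylesheet needs can be listed
    straight from the content.

```python
import xml.etree.ElementTree as ET

# Hypothetical article body: ordinary elements sub-classed via "class".
BODY = """<body>
  <p class="lead">Opening paragraph.</p>
  <p>Ordinary paragraph.</p>
  <div class="sidebar">
    <p class="lead">Sidebar opener.</p>
  </div>
  <p>He praised <span class="movie-title">Casablanca</span>.</p>
</body>"""


def css_selectors(xml_text):
    """List the element.class selectors a stylesheet would need."""
    root = ET.fromstring(xml_text)
    selectors = set()
    for element in root.iter():
        # A class attribute may hold several space-separated names.
        for name in element.get("class", "").split():
            selectors.add(f"{element.tag}.{name}")
    return sorted(selectors)


selectors = css_selectors(BODY)
print(selectors)
```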

    > Also, our paper has different sections with different types of content.
    > Should I write one DTD that covers every possible style in our paper, or
    > break it down into several different DTDs? Does it matter?


    Write it in modules, because it makes their management easier. But be
    able to generate a single composite schema from all of this, as a
    global overview to check you're avoiding collisions.

    Also allow the "documents" to have a choice of root element. Maybe
    everything gets published as "publication", but there will be many
    times in the content management system when it's useful to deal with
    an "article" or a "competition-question-set" and still be able to
    validate this fragment against the schema.
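
    To illustrate the idea of several permitted roots, here is a toy
    checker in Python. It is not a real DTD or XML Schema validator: the
    content model is a hand-written dictionary that only checks which
    children each element may contain, and every element name other than
    "publication", "article" and "competition-question-set" is an
    assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Toy content model: element name -> names of the children it may contain.
CONTENT_MODEL = {
    "publication": {"article", "competition-question-set"},
    "article": {"headline", "body"},
    "competition-question-set": {"question"},
    "headline": set(),
    "body": {"p"},
    "p": set(),
    "question": set(),
}

# Any of these may be the root of a document handed to the checker.
ALLOWED_ROOTS = {"publication", "article", "competition-question-set"}


def is_valid(xml_text):
    """Check a whole publication or a stand-alone fragment."""
    root = ET.fromstring(xml_text)
    if root.tag not in ALLOWED_ROOTS:
        return False
    return all(
        element.tag in CONTENT_MODEL
        and all(child.tag in CONTENT_MODEL[element.tag] for child in element)
        for element in root.iter()
    )


fragment_ok = is_valid(
    "<article><headline>H</headline><body><p>x</p></body></article>"
)
whole_ok = is_valid(
    "<publication><article><headline>H</headline></article></publication>"
)
bad_root = is_valid("<headline>H</headline>")
bad_child = is_valid("<article><question/></article>")
print(fragment_ok, whole_ok, bad_root, bad_child)
```

    The same model accepts an article on its own or wrapped in a
    publication, which is the property being asked for.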


    I really should write a white paper on "What's needed in an ideal
    magazine publishing CMS" – I have a meeting on Monday where it would
    be useful, so I might even make time for it. Maybe…
    Andy Dingley, May 7, 2004
    #6
