Can I use XML as an article database?

Discussion in 'XML' started by Alvin SIU, May 28, 2007.

  1. Alvin SIU

    Alvin SIU Guest

    Hi all,

    I am a newbie with XML.
    I hope an expert can give me a hand and point me in the right
    direction on this topic.

    I have many articles, all of them text files.
    They are stored in many directories, according to topic.

    Using this method, I can easily classify the articles by topic.
    But I cannot classify them by author or by date.
    So 'directory' is not a good method.

    If I put the articles into a database,
    I can easily add additional columns (e.g. Author, Date of Publication,
    etc.) to each article.

    Then I can easily sort them by author or by date.

    But using a database seems to be quite troublesome.

    I wonder whether I can convert each article text file into an XML file
    with, for example,
    the following tags:
    <author>xxx</author>
    <date>yyyy-mm-dd</date>
    <essay>The original article contents</essay>

    Then put all the XML files under a directory.
    Then use 'something' to search this directory.
    Then I can easily get a list sorted by author, by date, or by anything else.

    Now, my questions are:

    Q1. Is this method feasible?

    Q2. Is this a correct way of using XML?
    (What I mean is: is XML designed for this use?)

    Q3. Has anyone already done this?
    If yes, please point me to it.

    Q4. Is there anything related to this situation?
    If yes, please give me some keywords
    so that I can continue searching the net.
    I used the keywords: XML +document +index
    but could not find what I wanted.

    Thanks for your expert advice in advance.
    Alvin SIU
     
    Alvin SIU, May 28, 2007
    #1

  2. Pavel Lepin

    Pavel Lepin Guest

    Alvin SIU wrote:
    > I have many articles, all of them text files.
    > They are stored in many directories, according to
    > topic.
    >
    > If I put the articles into a database,
    > I can easily add additional columns (e.g. Author, Date of
    > Publication, etc.) to each article.
    >
    > Then I can easily sort them by author or by date.
    >
    > But using a database seems to be quite troublesome.


    Troublesome? I'm not sure what you mean. A database seems
    like the only sensible way to go, whether it's an XML
    database, a more traditional tuple-based RDBMS, or something
    else that has 'database' in its name. Because, whether you
    realize it or not, what you describe *is* a database.

    > I wonder whether I can convert each article text file into
    > an XML file with, for example,
    > the following tags:
    > <author>xxx</author>
    > <date>yyyy-mm-dd</date>
    > <essay>The original article contents</essay>
    >
    > Then put all the XML files under a directory.


    Right. Concealing the databaseness of your task behind the
    familiar concepts of a filesystem won't make The Database go
    away. For that matter, any filesystem is a specialised
    database.

    > Then use 'something' to search this directory.


    'Something' is called XQuery. You stuff your XML data into
    an XML database, then use XPath/XQuery/XSLT/whatever else
    to access it.
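
    As a rough illustration of what querying such files looks like (this is
    not XQuery and not an XML database): the sketch below uses Python's
    standard xml.etree.ElementTree module, which supports only a limited
    XPath subset. It assumes each article is one XML file with the tags from
    the question wrapped in a single root element. The <article> root, the
    sample file names, and the helper names load_articles and demo are
    hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def load_articles(directory):
    """Parse every *.xml file in `directory` into a dict of its fields."""
    articles = []
    for path in sorted(Path(directory).glob("*.xml")):
        root = ET.parse(str(path)).getroot()
        articles.append({
            "file": path.name,
            # findtext() takes a path in ElementTree's limited XPath subset.
            "author": root.findtext("author", default=""),
            "date": root.findtext("date", default=""),
            "essay": root.findtext("essay", default=""),
        })
    return articles


def demo():
    """Write three sample articles to a temp directory, then list them sorted."""
    samples = {
        "a.xml": ("Smith", "2007-05-01", "First article."),
        "b.xml": ("Jones", "2006-11-15", "Second article."),
        "c.xml": ("Adams", "2007-01-20", "Third article."),
    }
    with tempfile.TemporaryDirectory() as tmp:
        for name, (author, date, text) in samples.items():
            # Assumed layout: <article> wraps the three tags from the question.
            root = ET.Element("article")
            ET.SubElement(root, "author").text = author
            ET.SubElement(root, "date").text = date
            ET.SubElement(root, "essay").text = text
            ET.ElementTree(root).write(str(Path(tmp) / name), encoding="utf-8")
        articles = load_articles(tmp)

    # ISO dates (yyyy-mm-dd) sort chronologically as plain strings.
    by_author = [a["file"] for a in sorted(articles, key=lambda a: a["author"])]
    by_date = [a["file"] for a in sorted(articles, key=lambda a: a["date"])]
    return by_author, by_date


if __name__ == "__main__":
    print(demo())
```

    Note that every listing re-parses every file, so this only hints at the
    idea; an XML database would index the documents instead.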

    > Q1. Is this method feasible?


    Not as you described. But if you replace 'directory'
    with 'XML database' and 'something' with 'XQuery', it is.

    > Q2. Is this a correct way of using XML?
    > (What I mean is: is XML designed for this use?)


    XML is designed to represent structured data. XML
    databases are designed to store and access structured data
    represented as XML. XQuery is designed to query structured
    data represented as XML.

    > Q3. Has anyone already done this?
    > If yes, please point me to it.


    IBM's DB2 9 Express-C. Alternatively, you might want to
    google for XML databases.

    --
    Pavel Lepin
     
    Pavel Lepin, May 28, 2007
    #2

  3. Andy Dingley

    Andy Dingley Guest

    On 28 May, 07:54, Alvin SIU wrote:


    > Q1. Is this method feasible?


    As an example or as working code?

    You can certainly do it, but performance for retrieving articles will
    be terrible.


    > Q2. Is this a correct way of using XML?
    > (What I mean is: is XML designed for this use?)


    XML is a data format primarily for exchanging documents. Once they're
    retrieved, store them in some sort of database.

    For your example here, the obvious technology to use is a SQL
    database. It's not a perfect choice, but it's very accessible to you.
    Anyone can easily get hold of MySQL or Access-like database engines.
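
    To give a feel for how little work the SQL route involves, here is a
    minimal sketch. It uses SQLite through Python's built-in sqlite3 module
    as a stand-in for MySQL or Access, purely because it needs no server.
    The table layout, column names, sample rows, and the helper names
    build_db and bodies_sorted_by are assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample data: (topic, author, published, body).
SAMPLE_ROWS = [
    ("xml", "Smith", "2007-05-01", "First article."),
    ("databases", "Jones", "2006-11-15", "Second article."),
    ("xml", "Adams", "2007-01-20", "Third article."),
]

# Only these columns may be used for sorting.
SORTABLE = {"topic", "author", "published"}


def build_db(rows):
    """Create an in-memory database with one row per article."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles ("
        " id INTEGER PRIMARY KEY,"
        " topic TEXT, author TEXT, published TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO articles (topic, author, published, body)"
        " VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def bodies_sorted_by(conn, column):
    """Return article bodies ordered by one of the metadata columns."""
    # Column names cannot be bound as parameters, so check against a whitelist.
    if column not in SORTABLE:
        raise ValueError(f"cannot sort by {column!r}")
    cursor = conn.execute(f"SELECT body FROM articles ORDER BY {column}")
    return [row[0] for row in cursor]


if __name__ == "__main__":
    conn = build_db(SAMPLE_ROWS)
    print(bodies_sorted_by(conn, "author"))
    print(bodies_sorted_by(conn, "published"))
```

    Topic becomes just another column here, so the same table can be
    listed by topic, author, or date without moving any files around.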


    > Q3. Has anyone already done this?
    > If yes, please point me to it.


    About a squillion things already!

    You should probably read up on:

    Dublin Core (especially on this)
    Metadata
    OAI
    RSS 1.0 / Atom syndication formats

    You can do this in XML, although XML has restrictions that become a
    real nuisance for big systems.


    One of your problems isn't the storage and querying of your data, it's
    the issue of "vocabularies". As your system grows bigger and more
    interested in inter-working with other systems, then you start to care
    about identifying "authors" such that "Douglas Adams" is the guy who
    wrote "Health Monitoring of Structural Materials and Components", not
    the guy with the towel obsession (follow the link - even the mighty
    Amazon have got this one wrong).
    <http://www.amazon.co.uk/exec/obidos/ASIN/0470033134/codesmiths>

    This itself is a big topic! (with much work going on within it). You
    might find yourself using techniques like XML Schema or even OWL to
    list these. It also starts to hit the limits of XML, and you might
    find RDF more useful to you.
     
    Andy Dingley, May 29, 2007
    #3