Huge XML data needed

Discussion in 'XML' started by Beda Christoph Hammerschmidt, Apr 1, 2004.

  1. I want to perform some performance measurements on an XML database. For
    this reason I need some huge XML sample data. The data should not be
    too structured, and a lot of reasonable queries should make sense.
    Any idea where I can get this data?
     
    Beda Christoph Hammerschmidt, Apr 1, 2004
    #1

  2. Beda Christoph Hammerschmidt

    Andy Dingley Guest

    On 1 Apr 2004 06:45:29 -0800, (Beda Christoph
    Hammerschmidt) wrote:

    >Any idea where I can get this data?


    Make it yourself. That way you can control the size and the
    distribution of certain features. If this process is automated, then
    you can easily run tests over and over with different parameters.

    It's often useful (but rarely done) to test, not just that "it works",
    but to test for sensitivity to different sorts of load. Does
    performance change with many small items, or with few large items ?
    Does sorted/unsorted input data make a difference ?
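
    A minimal sketch of such a generator, assuming Python and its standard
    library. The function name generate_catalog, the element names and the
    parameters are all made up for illustration; adjust them to whatever
    features your database is sensitive to.

```python
import random
import xml.etree.ElementTree as ET


def generate_catalog(n_items, max_fields=5, sort_ids=True, seed=0):
    """Build an <items> tree with n_items children of irregular shape."""
    rng = random.Random(seed)  # fixed seed so a test run can be repeated
    ids = list(range(n_items))
    if not sort_ids:
        rng.shuffle(ids)  # unsorted input, to compare against sorted input
    root = ET.Element("items")
    for item_id in ids:
        item = ET.SubElement(root, "item", id=str(item_id))
        # Each item gets a random number of fields with random names,
        # so the data is only loosely structured.
        for _ in range(rng.randint(1, max_fields)):
            field = ET.SubElement(item, rng.choice(["name", "note", "tag"]))
            field.text = "x" * rng.randint(1, 20)
    return root


# Many small items versus few large items, from the same generator.
many_small = generate_catalog(1000, max_fields=2)
few_large = generate_catalog(10, max_fields=200, sort_ids=False)
xml_bytes = ET.tostring(many_small, encoding="utf-8")
```

    Because the seed is fixed, the same parameters give the same document
    every time, so a slow run can be reproduced exactly.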

    Another source of "real world" data in a large corporate is to connect
    to something like an LDAP server and use that. I've also done much of
    my own testing with lists of endangered species from the WCMC. You may
    also find the W3C site useful, particularly the RDF test cases (not
    large, but they do demonstrate many obscure conditions).

    --
    Smert' spamionam
     
    Andy Dingley, Apr 2, 2004
    #2

  3. >>>>> "Beda" == Beda Christoph Hammerschmidt <> writes:

    Beda> I want to perform some performance measurements on an XML database. For
    Beda> this reason I need some huge XML sample data. The data should not be
    Beda> too structured, and a lot of reasonable queries should make sense. Any
    Beda> idea where I can get this data?

    You might get some RSS feeds. RSS is a format used by several news servers to
    distribute news. So by definition there is not much structure, but you can
    make reasonable queries, like what happened (some terrorist act), what was the
    score (some soccer game), etc.
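
    As a rough illustration of that kind of query, here is a sketch in
    Python using the standard library. The two-item feed and the function
    name titles_in_category are invented for the example; a real feed
    would be fetched from a news server and would be much larger.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item feed, made up for illustration.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example news</title>
    <item>
      <title>Home team wins 2-1</title>
      <category>sport</category>
    </item>
    <item>
      <title>Parliament passes budget</title>
      <category>politics</category>
    </item>
  </channel>
</rss>"""


def titles_in_category(rss_text, category):
    """Return the titles of all items whose <category> matches."""
    root = ET.fromstring(rss_text)
    return [
        item.findtext("title")
        for item in root.iter("item")
        if item.findtext("category") == category
    ]


print(titles_in_category(SAMPLE_RSS, "sport"))
```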

    --
    Arto V. Viitanen
    University of Tampere, Department of Computer Sciences
    Tampere, Finland http://www.cs.uta.fi/~av/
     
    Arto Viitanen, Apr 2, 2004
    #3
  4. Beda Christoph Hammerschmidt wrote in message news:<>...
    > I want to perform some performance measurements on an XML database. For
    > this reason I need some huge XML sample data. The data should not be
    > too structured, and a lot of reasonable queries should make sense.
    > Any idea where I can get this data?


    I'm not sure what you mean by "huge", but there is a good amount of
    data that might be interesting to query at:
    http://www.ibiblio.org/xml/examples/shakespeare/

    Toivo Lainevool
    http://www.XMLPatterns.com - Develop effective DTDs and XML Schema
    documents for your XML using structural design patterns.
     
    Toivo Lainevool, Apr 2, 2004
    #4
  5. Beda Christoph Hammerschmidt

    Fabien R Guest

    Why don't you generate them?
    Use a free DB like MySQL...
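
    One way this could look, sketched in Python. Note the swap: it uses the
    sqlite3 module from the standard library in place of MySQL so that the
    example runs without a server. The table, its rows and the function
    name table_to_xml are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table):
    """Dump every row of `table` as a <row> element with one child per column."""
    # The table name is interpolated directly, so pass trusted names only.
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    root = ET.Element(table)
    for values in cur:
        row = ET.SubElement(root, "row")
        for col, value in zip(columns, values):
            if value is not None:  # leave NULL columns out, so rows vary in shape
                ET.SubElement(row, col).text = str(value)
    return root


# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO species VALUES (?, ?)",
    [("Species A", "endangered"), ("Species B", None)],
)
doc = table_to_xml(conn, "species")
print(ET.tostring(doc, encoding="unicode"))
```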
    (Beda Christoph Hammerschmidt) wrote in message news:<>...
    > I want to perform some performance measurements on an XML database. For
    > this reason I need some huge XML sample data. The data should not be
    > too structured, and a lot of reasonable queries should make sense.
    > Any idea where I can get this data?
     
    Fabien R, Apr 2, 2004
    #5
  6. Arto Viitanen wrote:
    >>>>>>"Beda" == Beda Christoph Hammerschmidt <> writes:

    >
    >
    > Beda> I want to perform some performance measurements on an XML database. For
    > Beda> this reason I need some huge XML sample data. The data should not be
    > Beda> too structured, and a lot of reasonable queries should make sense. Any
    > Beda> idea where I can get this data?
    >
    > You might get some RSS feed.


    But RSS - by definition - is not "huge XML data".
    --
    Johannes Koch
    In te domine speravi; non confundar in aeternum.
    (Te Deum, 4th cent.)
     
    Johannes Koch, Apr 2, 2004
    #6
  7. >>>>> "Johannes" == Johannes Koch <> writes:

    Beda> this reason I need some huge XML sample data. The data should not be
    Beda> too structured, and a lot of reasonable queries should make sense. Any
    Beda> idea where I can get this data?
    >> You might get some RSS feed.


    Johannes> But RSS - by definition - is not "huge XML data".

    But I got two out of three: it is not too structured and there can be
    reasonable queries!


    --
    Arto V. Viitanen
    University of Tampere, Department of Computer Sciences
    Tampere, Finland http://www.cs.uta.fi/~av/
     
    Arto Viitanen, Apr 2, 2004
    #7
  8. Arto Viitanen wrote:
    >>>>>>"Johannes" == Johannes Koch <> writes:

    >
    >
    > Beda> this reason I need some huge XML sample data. The data should not be
    > Beda> too structured, and a lot of reasonable queries should make sense. Any
    > Beda> idea where I can get this data?
    > >> You might get some RSS feed.

    >
    > Johannes> But RSS - by definition - is not "huge XML data".
    >
    > But I got two out of three: it is not too structured and there can be
    > reasonable queries!


    That's right :)
    --
    Johannes Koch
    In te domine speravi; non confundar in aeternum.
    (Te Deum, 4th cent.)
     
    Johannes Koch, Apr 2, 2004
    #8
  9. On 1 Apr 2004, Fabien R wrote:

    > Why don't you generate them ?


    Good idea. There are 5 major XML DB Benchmark efforts. Some include data
    generators. See:

    http://www.rpbourret.com/xml/XMLDBLinks.htm#Benchmarks

    Ron Bourret has a link to a benchmark page that I used to maintain, but I
    no longer have time to maintain it.

    > Use a free-db like MySQL...
    > (Beda Christoph Hammerschmidt) wrote in message news:<>...
    > > I want to perform some performance measurements on an XML database. For
    > > this reason I need some huge XML sample data. The data should not be
    > > too structured, and a lot of reasonable queries should make sense.
    > > Any idea where I can get this data?

    >
    >


    Some benchmarks and performance issues are also covered in the book I
    helped edit:

    A.B. Chaudhri, A. Rashid and R. Zicari (eds.) (2003) XML data management:
    native XML and XML-enabled database systems (Reading, Massachusetts:
    Addison-Wesley)

    http://www.awprofessional.com/titles/0201844524/

    HTH

    akmal
     
    Akmal B. Chaudhri, Apr 2, 2004
    #9
  10. Beda Christoph Hammerschmidt

    Stefan Ram Guest

    (Beda Christoph Hammerschmidt) writes:
    >I want to perform some performance measurements on an XML database. For
    >this reason I need some huge XML sample data. The data should not be
    >too structured, and a lot of reasonable queries should make sense.


    This is somewhat structured, but large:

    http://rdf.dmoz.org/rdf/content.rdf.u8.gz
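
    A file of that size is usually read as a stream instead of being loaded
    whole. Below is a sketch in Python using gzip and iterparse from the
    standard library. The function name count_elements, the tiny sample
    file and its tag names are hypothetical; nothing here assumes the
    actual layout of the dmoz dump. For namespaced documents ElementTree
    reports tags in the form "{namespace-uri}localname".

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET


def count_elements(path, tag):
    """Count <tag> elements in a gzip-compressed XML file, reading it as a stream."""
    count = 0
    with gzip.open(path, "rb") as stream:
        # iterparse yields each element once its end tag has been read.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == tag:
                count += 1
                # Free the finished element's children and text. The emptied
                # element itself stays attached to its parent.
                elem.clear()
    return count


# Tiny stand-in file, written only so the sketch can be run as-is.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.xml.gz")
    with gzip.open(sample, "wb") as out:
        out.write(b"<root><page><a/></page><page/><topic/></root>")
    print(count_elements(sample, "page"))
```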
     
    Stefan Ram, Apr 2, 2004
    #10
  11. Beda Christoph Hammerschmidt

    Andy Dingley Guest

    On Fri, 02 Apr 2004 10:25:31 +0200, Johannes Koch
    <> wrote:

    >But RSS - by definition - is not "huge XML data".


    Much RSS isn't even XML !

    Today's RSS feed bug was this
    http://www.littlefluffy.com/index.php?a=rss

    <description>
    A more aptly named game you are not likely to find. [...] a great
    game for drug users &lt;em>and&lt;/em> kids.
    </description>
     
    Andy Dingley, Apr 2, 2004
    #11
  12. In article <>,
    Andy Dingley <> wrote:

    % Today's RSS feed bug was this
    % http://www.littlefluffy.com/index.php?a=rss
    %
    % <description>
    % A more aptly named game you are not likely to find. [...] a great
    % game for drug users &lt;em>and&lt;/em> kids.
    % </description>

    So, what's wrong with it? That <em> should appear as mark-up, or that
    you think > shouldn't be there?


    --

    Patrick TJ McPhee
    East York Canada
     
    Patrick TJ McPhee, Apr 4, 2004
    #12
  13. Beda Christoph Hammerschmidt

    Andy Dingley Guest

    On Sun, 4 Apr 2004 03:00:54 +0200 (MEST), (Patrick
    TJ McPhee) wrote:

    >% game for drug users &lt;em>and&lt;/em> kids.


    >So, what's wrong with it? That <em> should appear as mark-up, or that
    >you think > shouldn't be there?


    There's no valid encoding for HTML in RSS where the opening character
    of a tag is escaped, but not the closing character.
     
    Andy Dingley, Apr 4, 2004
    #13
  14. In article <>,
    Andy Dingley <> wrote:
    % On Sun, 4 Apr 2004 03:00:54 +0200 (MEST), (Patrick
    % TJ McPhee) wrote:
    %
    % >% game for drug users &lt;em>and&lt;/em> kids.
    %
    % >So, what's wrong with it? That <em> should appear as mark-up, or that
    % >you think > shouldn't be there?
    %
    % There's no valid encoding for HTML in RSS where the opening character
    % of a tag is escaped, but not the closing character.

    OK, perhaps it's not valid RSS, but it's valid XML.
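
    Both points can be checked with a few lines of Python from the standard
    library. This is only a sketch: the fragment is the one quoted above
    with the elided "[...]" part left out, and escape() is shown as one
    way of encoding both ends of a tag consistently, not as what the feed's
    author used.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The fragment quoted in the thread, with the elided "[...]" part left out.
fragment = (
    "<description>a great game for drug users "
    "&lt;em>and&lt;/em> kids.</description>"
)

# A literal ">" is allowed in XML character data, so this parses.
text = ET.fromstring(fragment).text
print(text)

# escape() replaces "&", "<" and ">" together, so both ends of a tag are encoded.
print(escape("<em>and</em>"))
```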

    --

    Patrick TJ McPhee
    East York Canada
     
    Patrick TJ McPhee, Apr 5, 2004
    #14
  15. Beda Christoph Hammerschmidt

    Jimmy Zhang Guest

    http://www.pir.uniprot.org/database/download.shtml

    up to a gig in size


    "Beda Christoph Hammerschmidt" <> wrote in message
    news:...
    > I want to perform some performance measurements on an XML database. For
    > this reason I need some huge XML sample data. The data should not be
    > too structured, and a lot of reasonable queries should make sense.
    > Any idea where I can get this data?
     
    Jimmy Zhang, Apr 8, 2004
    #15
  16. > too structured, and a lot of reasonable queries should make sense.
    > Any idea where I can get this data?

    I also suggest you create the data on your own. I recommend ToXgene:
    http://www.cs.toronto.edu/tox/toxgene

    hth
    Torsten
     
    Torsten Bittner, Apr 21, 2004
    #16
