Re: Vote tallying...

Discussion in 'Python' started by Andrew Robinson, Jan 18, 2013.

  1. On 01/18/2013 08:47 AM, Stefan Behnel wrote:
    > Andrew Robinson, 18.01.2013 00:59:
    >> I have a problem which may fit in a mysql database

    > Everything fits in a MySQL database - not a reason to use it, though. Py2.5
    > and later ship with sqlite3 and if you go for an external database, why use
    > MySQL if you can have PostgreSQL for the same price?

    MySQL is provided by the present server host. It's pretty standard at
    web hosting sites.
    It works through "import MySQLdb" -- and it means an IP call for every
    action...
    Postgre isn't available :( otherwise, I'd use it....

    I'm mildly concerned about scaling issues.... but don't have a lot of
    time (just a few days) to come to a decision. I don't need high
    performance, just no grotesque degradation when the system is scaled up,
    and no maintenance nightmare. The votes table is going to get
    monsterous if all votes are held in one table....

    Your comment about sqlite is interesting; I've never used it before.
    At a glance, it uses individual files as databases, which is good... But it wants to lock the entire database against reads as well as writes when any access of the database happens. Which is bad...

    http://www.sqlite.org/different.html
    http://www.sqlite.org/whentouse.html

    > ... XML files are a rather static thing and meant to be
    > processed from start to end on each run. That adds up if the changes are
    > small and local while the file is ever growing. You seem to propose one
    > file per article, which might work. That's unlikely to become too huge to
    > process, and Python's cElementTree is a very fast XML processor.

    Yes, that's exactly what I was thinking.... one file/article.

    It's attractive, I think, because many Python programs are allowed to
    read the XML file concurrently, but only one periodically updates it as
    a batch/chron/or triggered process; eg: the number/frequency of update
    is actually controllable.

    eg: MySQL accumulates a list of new votes and vote changes and python
    occasionally flushes the database into the archive file. That way, MySQL
    only maintains a small database of real-time changes, and the
    speed/accuracy of the vote tally can be tailored to the user's need.
    >
    > However, your problem sounds a lot like you could map it to one of the dbm
    > databases that Python ships. They work like dicts, just on disk.

    Doing a Google search, I see some of these that you are mentioning --
    yes, they may have some potential.
    >
    > IIUC, you want to keep track of comments and their associated votes, maybe
    > also keep a top-N list of the highest voted comments. So, keep each comment
    > and its votes in a dbm record, referenced by the comment's ID (which, I
    > assume, you keep a list of in the article that it comments on).

    The comments themselves are just ancillary information; the votes only
    apply to the article itself at this time. The two pieces of feedback
    information are independent, occasionally having a user that gives both
    kinds. Statistically, there are many votes -- and few comments.

    Each archive file has the same filename as the article that is being
    commented or voted on; but with a different extension (eg: xml, or
    ..db,or...) so there's no need to store article information on each vote
    or comment; (unlike the MySQL database, which has to store all that
    information for every vote.... ugh....!)

    > You can use
    > pickle (see the shelve module) or JSON or whatever you like for storing
    > that record. Then, on each votes update, look up the comment, change its
    > votes and store it back. If you keep a top-N list for an article, update it
    > at the same time. Consider storing it either as part of the article or in
    > another record referenced by the article, depending of how you normally
    > access it. You can also store the votes independent of the comment (i.e. in
    > a separate record for each comment), in case you don't normally care about
    > the votes but read the comments frequently. It's just a matter of adding an
    > indirection for things that you use less frequently and/or that you use in
    > more than one place (not in your case, where comments and votes are unique
    > to an article).
    >
    > You see, lots of options, even just using the stdlib...
    >
    > Stefan
    >

    Yes, lots of options....
    Let's see... you've noticed just about everything important, and have
    lots of helpful thoughts; thank you.

    There are implementation details I'm not aware of regarding how the
    file-system dictionaries (dbm) work; and I wouldn't know how to compare
    it to XML access speed either.... but I do know some general information
    about how the data might be handled algorithmically; and which might
    suggest a better Python import to use?

    If I were to sort all votes by voter ID (a 32 bit number), and append
    the vote value (A 4 to 8bit number); Then a vote becomes a chunk of 40
    bits, fixed length; and I can stack one right after another in a compact
    format.

    Blocks of compacted votes are ideal for binary searching; since they
    have fixed length... and if I am only wanting to change a vote, I don't
    need to re-write the entire file, because a changed vote doesn't change
    the file length. ( So, now I have COMPACT + FASTER ).

    If I were to wish to insert new votes, in this sorted list of votes -- I
    would need to do something like a merge sort; which means changing the
    length of the file by a bulk copy of most of it to open up a "little"
    space for 1 or two votes; BUT:

    It's also possible to break the vote list up into compact ranges of
    voter ID's; And perhaps using indirection like you suggested.... and I
    was thinking it might not be hard to add *some* limited blank space in
    the file -- like this XML example:

    <vblock start="125" end="550">
    0xFFFFFABCDEFAFDADADADAFAFADADADAFADADAFAFAFADDAFAFAD................</vblock>
    <vblock start="560" end="620">
    0xFEEEEAFFFFABCDEFAFDADADADAFAFADADADAFADADAFAFAFADDAFAFAD......</vblock>

    So that inserting a vote to block "125" could be done by a simultaneous
    deleting of some of the padding characters "...." and therefore, the
    whole file rarely needs a re-write.

    I don't want to waste a large amount of time optimizing this, as a fast
    C-api COPY in one library might make merge sort tons faster than a
    python interpreter based blank space shrink-expand; But, do any of the
    possible optimizations I just gave suggest one standard Python library
    might be better suited than another?

    Thanks for the help. :)
     
    Andrew Robinson, Jan 18, 2013
    #1
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Andrew Robinson

    Vote tallying...

    Andrew Robinson, Jan 17, 2013, in forum: Python
    Replies:
    0
    Views:
    112
    Andrew Robinson
    Jan 17, 2013
  2. Lie Ryan

    Re: Vote tallying...

    Lie Ryan, Jan 18, 2013, in forum: Python
    Replies:
    0
    Views:
    121
    Lie Ryan
    Jan 18, 2013
  3. Stefan Behnel

    Re: Vote tallying...

    Stefan Behnel, Jan 18, 2013, in forum: Python
    Replies:
    0
    Views:
    148
    Stefan Behnel
    Jan 18, 2013
  4. Nick Cash

    RE: Vote tallying...

    Nick Cash, Jan 18, 2013, in forum: Python
    Replies:
    0
    Views:
    115
    Nick Cash
    Jan 18, 2013
  5. Tim Chase

    Re: Vote tallying...

    Tim Chase, Jan 18, 2013, in forum: Python
    Replies:
    0
    Views:
    115
    Tim Chase
    Jan 18, 2013
Loading...

Share This Page