SQL and CSV

Discussion in 'Python' started by Nick, May 5, 2009.

  1. Nick

    Nick Guest

    I have a requirement to read a CSV file. Normally, no problem, just
    import CSV and slurp the file up.

    However, in this case I want to filter out lines that have fields set
    to particular values.

    It would be neat to be able to do something like this.

    select * from test.csv where status <> "Canceled"

    Using adodb I can do this, so long as I don't have the where clause. :-
    (

    Is there a reasonable lightweight way of doing this in Python?

    I could write some python code that is used to filter rows, and inport
    that from config, but it's not quite as elegant as an SQL route.

    Thanks

    Nick
    Nick, May 5, 2009
    #1
    1. Advertising

  2. Nick

    Tim Golden Guest

    Nick wrote:
    > I have a requirement to read a CSV file. Normally, no problem, just
    > import CSV and slurp the file up.
    >
    > However, in this case I want to filter out lines that have fields set
    > to particular values.
    >
    > It would be neat to be able to do something like this.
    >
    > select * from test.csv where status <> "Canceled"
    >
    > Using adodb I can do this, so long as I don't have the where clause. :-
    > (
    >
    > Is there a reasonable lightweight way of doing this in Python?
    >
    > I could write some python code that is used to filter rows, and inport
    > that from config, but it's not quite as elegant as an SQL route.



    Not entirely clear what you are and aren't prepared to try here, but...
    the most obvious Python-based way to do this is treating the csv reader
    as an iterator and filtering there. Your last line suggests that's not
    what you want but just in case I've misunderstood:

    <test.csv>
    id,code,status
    1,"ONE","Active"
    2,"TWO","Cancelled"
    3,"THREE","Active"
    </test.csv>

    <code>
    import csv

    for row in csv.DictReader (open ("c:/temp/test.csv", "rb")):
    if row['status'] != 'Cancelled':
    print row

    </code>

    Doesn't seem too onerous, and could obviously be wrapped in
    some useful class/module.

    But if you really want to go the SQL route, I believe there are
    ODBC adapters for CSV which, combined with PyODBC or CeODBC,
    would probably take you where you want to go.

    TJG
    Tim Golden, May 5, 2009
    #2
    1. Advertising

  3. Nick

    Nick Guest

    On May 5, 5:19 pm, Tim Golden <> wrote:
    > Nick wrote:
    > > I have a requirement to read a CSV file. Normally, no problem, just
    > > import CSV and slurp the file up.

    >
    > > However, in this case I want to filter out lines that have fields set
    > > to particular values.

    >
    > > It would be neat to be able to do something like this.

    >
    > > select * from test.csv where status <> "Canceled"

    >
    > > Using adodb I can do this, so long as I don't have the where clause. :-
    > > (

    >
    > > Is there a reasonable lightweight way of doing this in Python?

    >
    > > I could write some python code that is used to filter rows, and inport
    > > that from config, but it's not quite as elegant as an SQL route.

    >
    > Not entirely clear what you are and aren't prepared to try here, but...
    > the most obvious Python-based way to do this is treating the csv reader
    > as an iterator and filtering there. Your last line suggests that's not
    > what you want but just in case I've misunderstood:
    >
    > <test.csv>
    > id,code,status
    > 1,"ONE","Active"
    > 2,"TWO","Cancelled"
    > 3,"THREE","Active"
    > </test.csv>
    >
    > <code>
    > import csv
    >
    > for row in csv.DictReader (open ("c:/temp/test.csv", "rb")):
    >   if row['status'] != 'Cancelled':
    >     print row
    >
    > </code>
    >
    > Doesn't seem too onerous, and could obviously be wrapped in
    > some useful class/module.
    >
    > But if you really want to go the SQL route, I believe there are
    > ODBC adapters for CSV which, combined with PyODBC or CeODBC,
    > would probably take you where you want to go.
    >
    > TJG


    Part of the problem is that the 'selection' needs to be in a config
    file. I can put the if row['status'] != 'Cancelled': return True into
    a config, read it and eval it, but its not quite as clean as an sql
    route.

    Nick
    Nick, May 5, 2009
    #3
  4. Nick

    Matimus Guest

    On May 5, 9:25 am, Nick <> wrote:
    > On May 5, 5:19 pm, Tim Golden <> wrote:
    >
    >
    >
    > > Nick wrote:
    > > > I have a requirement to read a CSV file. Normally, no problem, just
    > > > import CSV and slurp the file up.

    >
    > > > However, in this case I want to filter out lines that have fields set
    > > > to particular values.

    >
    > > > It would be neat to be able to do something like this.

    >
    > > > select * from test.csv where status <> "Canceled"

    >
    > > > Using adodb I can do this, so long as I don't have the where clause. :-
    > > > (

    >
    > > > Is there a reasonable lightweight way of doing this in Python?

    >
    > > > I could write some python code that is used to filter rows, and inport
    > > > that from config, but it's not quite as elegant as an SQL route.

    >
    > > Not entirely clear what you are and aren't prepared to try here, but...
    > > the most obvious Python-based way to do this is treating the csv reader
    > > as an iterator and filtering there. Your last line suggests that's not
    > > what you want but just in case I've misunderstood:

    >
    > > <test.csv>
    > > id,code,status
    > > 1,"ONE","Active"
    > > 2,"TWO","Cancelled"
    > > 3,"THREE","Active"
    > > </test.csv>

    >
    > > <code>
    > > import csv

    >
    > > for row in csv.DictReader (open ("c:/temp/test.csv", "rb")):
    > >   if row['status'] != 'Cancelled':
    > >     print row

    >
    > > </code>

    >
    > > Doesn't seem too onerous, and could obviously be wrapped in
    > > some useful class/module.

    >
    > > But if you really want to go the SQL route, I believe there are
    > > ODBC adapters for CSV which, combined with PyODBC or CeODBC,
    > > would probably take you where you want to go.

    >
    > > TJG

    >
    > Part of the problem is that the 'selection' needs to be in a config
    > file. I can put the if row['status'] != 'Cancelled': return True into
    > a config, read it and eval it, but its not quite as clean as an sql
    > route.
    >
    > Nick


    Well, if you are using 2.5.x you could always stuff it into a sqlite
    in-memory database, and then execute a SQL query. Heck, you don't even
    _need_ 2.5, but in 2.5 sqlite is part of the distribution.

    Matt
    Matimus, May 5, 2009
    #4
  5. Nick

    Tim Golden Guest

    Nick wrote:
    > Part of the problem is that the 'selection' needs to be in a config
    > file. I can put the if row['status'] != 'Cancelled': return True into
    > a config, read it and eval it, but its not quite as clean as an sql
    > route.



    Still not clear what the restriction is. If you were writing
    SQL you'd have to read *something* from your config file,
    unless you're suggesting that the "config file" is in fact
    a SQL file. Which is one way of doing it, but then you might
    just as well have your config file as a Python file and
    import it.

    Have I missed the point somewhere here? Can you give an
    example -- even a fictional one -- of what you couldn't
    do using, say, the example I gave earlier?

    TJG
    Tim Golden, May 5, 2009
    #5
  6. Nick

    Nick Guest

    On May 5, 8:27 pm, Tim Golden <> wrote:
    > Nick wrote:
    > > Part of the problem is that the 'selection' needs to be in a config
    > > file. I can put the if row['status'] != 'Cancelled': return True into
    > > a config, read it and eval it, but its not quite as clean as ansql
    > > route.

    >
    > Still not clear what the restriction is. If you were writingSQLyou'd have to read *something* from your config file,
    > unless you're suggesting that the "config file" is in fact
    > aSQLfile. Which is one way of doing it, but then you might
    > just as well have your config file as a Python file and
    > import it.
    >
    > Have I missed the point somewhere here? Can you give an
    > example -- even a fictional one -- of what you couldn't
    > do using, say, the example I gave earlier?
    >
    > TJG


    Solution found. In the end I used SQLite to read from a csv file, and
    now I can query the CSV file. The file is read using the csv module

    First create a function

    def fraction(p, denom):
    num, frac = p.split ('-')
    return float (num) + float (frac) / denom

    for use within queries.

    Now build the class.

    self.filename = filename
    self.dialect = dialect
    self.query = query
    reader = csv.reader (open (filename, 'r'))
    self.connection = sqlite.connect(":memory:")
    self.connection.create_function("fraction", 2, fraction) #
    Adds in function
    self.cursor = self.connection.cursor()
    first = True
    for row in reader:
    if first:
    headers = []
    for r in row:
    n = r.strip().replace (' ', '_').replace ('-','_')
    headers.append (n)
    command = 'create table csv (%s)' % ','.join (headers)
    self.cursor.execute (command)
    first = False
    else:
    command = 'insert into csv values ("%s")' % '","'.join
    (row)
    self.cursor.execute (command)

    and then I can use this

    self.cursor.execute (self.query)
    rows = self.cursor.fetchall()
    headers = []
    for r in self.cursor.description:
    headers.append (r[0])
    results = Results.Results (headers, self.name, {})
    i = 0
    for row in rows:
    results.add (row, i)
    i = i + 1
    return results

    to query the results.

    Results.Results is one of my classes that's reused in lots of places.

    The query then looks somethign like this

    select
    Client_Reference_Number as TrdNbr,
    Asset_Number as ISIN,
    Quantity as Qty,
    status
    from
    csv
    where status in ("CANCELLED")

    union

    select
    Client_Reference_Number as TrdNbr,
    Asset_Number as ISIN,
    Quantity as Qty,
    status
    from
    csv
    where status not in ("CANCELLED")


    All incredibly neat and the first time I've used SQLite.

    nick
    Nick, May 7, 2009
    #6
  7. Nick

    John Machin Guest

    On May 8, 1:45 am, Nick <> wrote:
    > On May 5, 8:27 pm, Tim Golden <> wrote:
    >
    >
    >
    > > Nick wrote:
    > > > Part of the problem is that the 'selection' needs to be in a config
    > > > file. I can put the if row['status'] != 'Cancelled': return True into
    > > > a config, read it and eval it, but its not quite as clean as ansql
    > > > route.

    >
    > > Still not clear what the restriction is. If you were writingSQLyou'd have to read *something* from your config file,
    > > unless you're suggesting that the "config file" is in fact
    > > aSQLfile. Which is one way of doing it, but then you might
    > > just as well have your config file as a Python file and
    > > import it.

    >
    > > Have I missed the point somewhere here? Can you give an
    > > example -- even a fictional one -- of what you couldn't
    > > do using, say, the example I gave earlier?

    >
    > > TJG

    >
    > Solution found. In the end I used SQLite to read from a csv file, and
    > now I can query the CSV file. The file is read using the csv module
    >
    > First create a function
    >
    > def fraction(p, denom):
    >     num, frac = p.split ('-')
    >     return float (num) + float (frac) / denom
    >
    > for use within queries.
    >
    > Now build the class.
    >
    >         self.filename  = filename
    >         self.dialect   = dialect
    >         self.query     = query
    >         reader = csv.reader (open (filename, 'r'))
    >         self.connection = sqlite.connect(":memory:")
    >         self.connection.create_function("fraction", 2, fraction) #
    > Adds in function
    >         self.cursor = self.connection.cursor()
    >         first = True
    >         for row in reader:
    >             if first:
    >                 headers = []
    >                 for r in row:
    >                     n = r.strip().replace (' ', '_').replace ('-','_')
    >                     headers.append (n)
    >                 command = 'create table csv (%s)' % ','..join (headers)
    >                 self.cursor.execute (command)
    >                 first = False
    >             else:
    >                 command = 'insert into csv values ("%s")' % '","'.join
    > (row)
    >                 self.cursor.execute (command)
    >
    > and then I can use this
    >
    >         self.cursor.execute (self.query)
    >         rows = self.cursor.fetchall()
    >         headers = []
    >         for r in self.cursor.description:
    >             headers.append (r[0])
    >         results = Results.Results (headers, self.name, {})
    >         i = 0
    >         for row in rows:
    >             results.add (row, i)
    >             i = i + 1
    >         return results
    >
    > to query the results.
    >
    > Results.Results is one of my classes that's reused in lots of places.
    >
    > The query then looks somethign like this
    >
    >                 select
    >                     Client_Reference_Number as TrdNbr,
    >                     Asset_Number as ISIN,
    >                     Quantity as Qty,
    >                     status
    >                 from
    >                     csv


    The remaining lines of your SELECT statement are incredibly redundant
    AFAICT. It seems you have pushed the contents of your csv file into a
    data base and pulled them ALL out again ... not what I'd call a
    "query". What's the point?


    >                 where status in ("CANCELLED")
    >
    >                 union
    >
    >                 select
    >                     Client_Reference_Number as TrdNbr,
    >                     Asset_Number as ISIN,
    >                     Quantity as Qty,
    >                     status
    >                 from
    >                     csv
    >                 where status not in ("CANCELLED")
    >
    > All incredibly neat and the first time I've used SQLite.
    >
    > nick
    John Machin, May 8, 2009
    #7
  8. On Thu, 7 May 2009 08:45:06 -0700 (PDT), Nick <>
    declaimed the following in gmane.comp.python.general:


    > The query then looks somethign like this
    >
    > select
    > Client_Reference_Number as TrdNbr,
    > Asset_Number as ISIN,
    > Quantity as Qty,
    > status
    > from
    > csv
    > where status in ("CANCELLED")
    >
    > union
    >
    > select
    > Client_Reference_Number as TrdNbr,
    > Asset_Number as ISIN,
    > Quantity as Qty,
    > status
    > from
    > csv
    > where status not in ("CANCELLED")
    >

    Pardon? The union of "in ('CANCELLED')" and "not in ('CANCELLED')"
    will be ALL records... -- possibly ordered with the "cancelled" first...

    I'd probably just use

    select ... from csv
    order by status;

    to group the values.
    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
    Dennis Lee Bieber, May 8, 2009
    #8
  9. Nick

    Peter Otten Guest

    Nick wrote:

    > self.cursor = self.connection.cursor()
    > first = True
    > for row in reader:
    > if first:
    > headers = []
    > for r in row:
    > n = r.strip().replace (' ', '_').replace ('-','_')
    > headers.append (n)
    > command = 'create table csv (%s)' % ','.join (headers)
    > self.cursor.execute (command)
    > first = False
    > else:
    > command = 'insert into csv values ("%s")' % '","'.join
    > (row)
    > self.cursor.execute (command)
    >


    You can simplify that a bit:

    cursor = self.cursor = self.connection.cursor()

    first_row = next(reader)
    headers = [column.strip().replace(" ", "_").replace("-", "_") for column in
    first_row]
    cursor.execute("create table csv (%s)" % ", ".join(headers))

    placeholders = ", ".join("?"*len(headers))
    command = "insert into csv values (%s)" % placeholders
    cursor.executemany(command, reader)

    While it may not matter here using placeholders instead of manually escaping
    user-provided values is a good habit to get into.

    > self.cursor.execute (self.query)
    > rows = self.cursor.fetchall()


    rows = self.cursor.execute(self.query)

    doesn't build an intermediate list.

    > i = 0
    > for row in rows:
    > results.add (row, i)
    > i = i + 1


    This is written

    for i, row in enumerate(rows):
    results.add(row, i)

    in idiomatic Python.

    Peter
    Peter Otten, May 8, 2009
    #9
  10. In message <gu0ofm$oj9$00$-online.com>, Peter Otten wrote:

    > While it may not matter here using placeholders instead of manually
    > escaping user-provided values is a good habit to get into.


    Until you hit things it can't deal with.
    Lawrence D'Oliveiro, May 8, 2009
    #10
  11. Nick

    Nick Guest

    On May 8, 1:49 pm, "andrew cooke" <> wrote:
    > Lawrence D'Oliveiro wrote:
    > > In message <gu0ofm$oj9$-online.com>, Peter Otten wrote:

    >
    > >> While it may not matter here using placeholders instead of manually
    > >> escaping user-provided values is a good habit to get into.

    >
    > > Until you hit things it can't deal with.

    >
    > The post you are replying to was talking about using the SQL library's "?"
    > syntax that automatically escapes values.  The usual reason this is
    > recommended (if I have understood correctly) is that the library code is
    > much more likely to foil injection attacks.  I have seen this mentioned
    > often and assume it is good advice.
    >
    > Can you expand on your comment?  I assume you are thinking of how the
    > library might handle some strange class.  But aren't the number of types
    > limited by SQL?  In which case a "thing that can't be handled" could
    > presumably be managed by adding an appropriate __str__ or __float__ or
    > whatever?  And you would still use the library to give safety with other
    > values.
    >
    > Maybe you could give an example of the kind of problem you're thinking of?
    >
    > Thanks,
    > Andrew


    Injection attacks aren't an issue, its a local app.

    It's part of a reconciliation system, where sometimes data is in csv
    files. If you want the whole csv file, you can use csv module without
    a problem.

    In some cases, I need to manipulate the data.

    The choices are hard code the manipulation, or load the data from a
    config file.

    So what I've got is the query in the config and I can process it.

    Nick
    Nick, May 8, 2009
    #11
  12. In message <>, andrew
    cooke wrote:

    > Lawrence D'Oliveiro wrote:
    >> In message <gu0ofm$oj9$00$-online.com>, Peter Otten wrote:
    >>
    >>> While it may not matter here using placeholders instead of manually
    >>> escaping user-provided values is a good habit to get into.

    >>
    >> Until you hit things it can't deal with.

    >
    > Can you expand on your comment? I assume you are thinking of how the
    > library might handle some strange class.


    Consider something as simple as a "like" clause.
    Lawrence D'Oliveiro, May 9, 2009
    #12
  13. In message <>, andrew
    cooke wrote:

    > even if you're not open to injection attacks, you're still less likely to
    > get escaping correct than a puprose written, widely used library.


    Escaping isn't really that hard to do. For example, the rules for MySQL
    literals are clearly spelled out here
    <http://dev.mysql.com/doc/refman/5.0/en/string-syntax.html>. You can easily
    check that against my quoting routines here
    <http://codecodex.com/wiki/index.php?title=Useful_MySQL_Routines#Quoting>.
    Lawrence D'Oliveiro, May 9, 2009
    #13
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Michal Mikolajczyk
    Replies:
    0
    Views:
    632
    Michal Mikolajczyk
    Feb 13, 2004
  2. Skip Montanaro
    Replies:
    0
    Views:
    700
    Skip Montanaro
    Feb 13, 2004
  3. Tintin92
    Replies:
    1
    Views:
    1,676
    Andrew Thompson
    Feb 14, 2007
  4. jliu66
    Replies:
    0
    Views:
    498
    jliu66
    Oct 19, 2007
  5. Sacha Rook

    csv read clean up and write out to csv

    Sacha Rook, Nov 2, 2012, in forum: Python
    Replies:
    2
    Views:
    206
    Hans Mulder
    Nov 2, 2012
Loading...

Share This Page