question about csv.DictReader

Discussion in 'Python' started by Norman Clerman, Apr 4, 2013.

  1. Hello,

    I have the following Python script (some of the lines are wrapped):

    #! /usr/bin/env python

    import csv

    def dict_test_1():
        """ csv test program """

        # Open the file Holdings_EXA.csv
        HOLDING_FILE = 'Holdings_EXA.csv'
        try:
            csv_file = open(HOLDING_FILE, 'rt')
        except IOError:
            print('Problem opening {0}\nExiting'.format(HOLDING_FILE))
            exit()

        # create a dictionary reader
        try:
            csv_reader = csv.DictReader(csv_file)
        except NameError:
            print('Cannot find file {0} to create a dictionary reader \nExiting'.format(HOLDING_FILE))
            exit()

        # Print the keys in each row
        i_row = 1
        for row in csv_reader:
            print('There are {0} keys in row {1}'.format(len(row.keys()), i_row))
            print('The keys in row {0} are \n{1}'.format(i_row, row.keys()))
            i_row += 1

    dict_test_1()

    Here are the lines in file Holdings_EXA.csv:
    Please note that the first field in the first row is "Holdings"

    "Holdings","Weighting","Type","Ticker","Style","First Bought","Shares Owned","Shares Change","Sector","Price","Day Change","Day high/low","Volume","52-Wk high/low","Country","3-Month Return","1-Year Return","3-Year Return","5-Year Return","Market Cap Mil","Currency","Morningstar Rating","YTD Return","P/E","Maturity Date","Coupon %","Yield to Maturity"
    "Nestle SA","1.91","EQUITY","NESN","Large Core","1999-12-31","3732276","197810","Consumer Defensive","67.65","-","67.75-67.35","1211531","67.75-53.8","Switzerland","10.42","21.25","10.5","8.84","213475.59","CHF","2","12.92","21.69","-","-","-"
    "HSBC Holdings PLC","1.75","EQUITY","HSBA","Large Value","1999-12-31","21120203","1711934","Financial Services","733.3","-1.4|-0","738.8-731","7839724","739.9-501.2","United Kingdom","14.51","37.17","3.88","2.77","132694.66","GBP","3","13.93","15.55","-","-","-"
    "Novartis AG","1.33","EQUITY","NOVN","Large Core","2003-06-30","2669523","206851","Healthcare","65.95","0.5|0.01","66-65.4","1121549","66-48.29","Switzerland","15.1","36.5","6.16","8.53","158671.66","CHF","4","16.7","17.76","-","-","-"
    "Roche Holding AG","1.31","EQUITY","ROG","Large Growth","2003-05-31","817830","59352","Healthcare","214.8","1.4|0.01","215.2-213.1","684173","220.4-148.4","Switzerland","17.45","37.95","7.78","4.09","34000","CHF","3","18.09","19.05","-","-","-"

    Finally, here are the results of running the script:


    norm@lima:~/python/overlap$ python dict_test_1.py
    There are 27 keys in row 1
    The keys in row 1 are
    ['Style', 'Day Change', 'Coupon %', 'Yield to Maturity', 'P/E', 'Type', 'Weighting', 'Price', '3-Month Return', 'Volume', '\xef\xbb\xbf"Holdings"', 'Ticker', 'Shares Change', 'Shares Owned', 'YTD Return', '5-Year Return', 'Market Cap Mil', 'Country', '3-Year Return', 'Day high/low', 'Maturity Date','1-Year Return', 'Sector', 'Morningstar Rating', 'Currency', '52-Wk high/low', 'First Bought']
    There are 27 keys in row 2
    The keys in row 2 are
    ['Style', 'Day Change', 'Coupon %', 'Yield to Maturity', 'P/E', 'Type', 'Weighting', 'Price', '3-Month Return', 'Volume', '\xef\xbb\xbf"Holdings"', 'Ticker', 'Shares Change', 'Shares Owned', 'YTD Return', '5-Year Return', 'Market Cap Mil', 'Country', '3-Year Return', 'Day high/low', 'Maturity Date','1-Year Return', 'Sector', 'Morningstar Rating', 'Currency', '52-Wk high/low', 'First Bought']
    There are 27 keys in row 3
    The keys in row 3 are
    ['Style', 'Day Change', 'Coupon %', 'Yield to Maturity', 'P/E', 'Type', 'Weighting', 'Price', '3-Month Return', 'Volume', '\xef\xbb\xbf"Holdings"', 'Ticker', 'Shares Change', 'Shares Owned', 'YTD Return', '5-Year Return', 'Market Cap Mil', 'Country', '3-Year Return', 'Day high/low', 'Maturity Date','1-Year Return', 'Sector', 'Morningstar Rating', 'Currency', '52-Wk high/low', 'First Bought']
    There are 27 keys in row 4
    The keys in row 4 are
    ['Style', 'Day Change', 'Coupon %', 'Yield to Maturity', 'P/E', 'Type', 'Weighting', 'Price', '3-Month Return', 'Volume', '\xef\xbb\xbf"Holdings"', 'Ticker', 'Shares Change', 'Shares Owned', 'YTD Return', '5-Year Return', 'Market Cap Mil', 'Country', '3-Year Return', 'Day high/low', 'Maturity Date','1-Year Return', 'Sector', 'Morningstar Rating', 'Currency', '52-Wk high/low', 'First Bought']
    norm@lima:~/python/overlap$


    Can anyone explain the presence of the characters "\xref\xbb\xbf" before the first field contents "Holdings" ?

    Thanks,
    Norm
     
    Norman Clerman, Apr 4, 2013
    #1

  2. MRAB Guest

    On 04/04/2013 02:26, Norman Clerman wrote:
    > Can anyone explain the presence of the characters "\xref\xbb\xbf" before the first field contents "Holdings" ?
    >

    Microsoft Windows indicates that a text file contains text encoded as
    UTF-8 by including a signature at its start. (Does the file also have
    "\r\n" line endings? Presumably it was created on a Windows system.)

    Try opening the file with the "utf-8-sig" encoding instead; this will
    drop the signature if present.
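
    A minimal sketch of that suggestion, assuming Python 3, where the
    built-in open() accepts an encoding argument. The two-column sample
    file and the helper name read_header are made up for illustration;
    they stand in for Holdings_EXA.csv and are not part of the original
    script.

```python
import csv
import os
import tempfile


def read_header(path, encoding):
    """Return the CSV header names read from path with the given encoding."""
    # newline='' is what the csv module documentation recommends for CSV files.
    with open(path, 'r', encoding=encoding, newline='') as csv_file:
        return csv.DictReader(csv_file).fieldnames


# Build a small stand-in file that starts with the UTF-8 signature bytes.
sample = '"Holdings","Weighting"\r\n"Nestle SA","1.91"\r\n'
fd, sample_path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as handle:
    handle.write(b'\xef\xbb\xbf' + sample.encode('utf-8'))

# 'utf-8-sig' drops the signature; plain 'utf-8' keeps it as U+FEFF.
with_sig = read_header(sample_path, 'utf-8-sig')
without_sig = read_header(sample_path, 'utf-8')
os.remove(sample_path)

print(with_sig)
print(without_sig)
```

    With "utf-8-sig" the first key comes back as plain 'Holdings'; with
    "utf-8" the first key still starts with the signature character.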
     
    MRAB, Apr 4, 2013
    #2

  3. Tim Chase Guest

    On 2013-04-03 18:26, Norman Clerman wrote:
    > Can anyone explain the presence of the characters "\xref\xbb\xbf"
    > before the first field contents "Holdings" ?


    (you mean "\xef", not "\xref")

    This is a byte order mark (BOM), which you can read about at [1]. In
    this case, it denotes the file as UTF-8 encoded. Certain programs
    insert these, though it's more important with UTF-16 or UTF-32
    encodings, where the byte order (endianness) actually matters. I
    believe Notepad and Visual Studio on Win32 were both offenders when
    it came to inserting unbidden BOMs.
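
    To make the byte-order point concrete, here is a small sketch
    (Python 3 assumed) that encodes the BOM character U+FEFF three ways
    and compares the results with the constants in the standard codecs
    module. The helper strip_utf8_bom is a hypothetical name used only
    for this illustration.

```python
import codecs

# The single character U+FEFF, serialized in three encodings.
bom_char = '\ufeff'
utf8_bytes = bom_char.encode('utf-8')
utf16_le_bytes = bom_char.encode('utf-16-le')
utf16_be_bytes = bom_char.encode('utf-16-be')

# UTF-8 has only one possible byte sequence, so the mark is just a signature.
print(utf8_bytes == codecs.BOM_UTF8)
# In UTF-16 the same character comes out in two different byte orders.
print(utf16_le_bytes == codecs.BOM_UTF16_LE)
print(utf16_be_bytes == codecs.BOM_UTF16_BE)


def strip_utf8_bom(raw):
    """Return raw bytes without a leading UTF-8 signature, if one is present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


header = strip_utf8_bom(b'\xef\xbb\xbf"Holdings","Weighting"')
print(header)
```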

    -tkc

    [1]
    http://en.wikipedia.org/wiki/Byte_order_mark
     
    Tim Chase, Apr 4, 2013
    #3
  4. Thanks for your replies. Greatly appreciated.

    Norm
     
    Norman Clerman, Apr 4, 2013
    #4
