Parse specific text in email body to CSV file

Discussion in 'Python' started by dpw.asdf@gmail.com, Mar 8, 2008.

  1. Guest

    I have been searching all over for a solution to this. I am new to
    Python, so I'm a little lost. Any pointers would be a great help. I
    have a couple hundred emails that contain data I would like to
    incorporate into a database or CSV file. I want to search the email
    for specific text.

    The emails basically look like this:



    random text _important text:_15648 random text random text random text
    random text
    random text random text random text _important text:_15493 random text
    random text
    random text random text _important text:_11674 random text random text
    random text
    ===============Date: Wednesday March 5, 2008================
    name1: 15 name5: 14

    name2: 18 name6: 105

    name3: 64 name7: 2

    name4: 24 name8: 13



    I want information like "name1: 15" to be placed into the CSV with the
    name "name1" and the value "15". The same goes for the date and
    "_important text:_15493".

    I would like to use this CSV or database to plot a graph with the
    data.

    Thanks!
     
    , Mar 8, 2008
    #1

  2. Paul McGuire Guest

    On Mar 8, 4:20 pm, wrote:
    > I have been searching all over for a solution to this. I am new to
    > Python, so I'm a little lost. Any pointers would be a great help. I
    > have a couple hundred emails that contain data I would like to
    > incorporate into a database or CSV file. I want to search the email
    > for specific text.
    >
    > The emails basically look like this:
    >
    > random text _important text:_15648 random text random text random text
    > random text
    > random text random text random text _important text:_15493 random text
    > random text
    > random text random text _important text:_11674 random text random text
    > random text
    > ===============Date: Wednesday March 5, 2008================
    > name1: 15                name5: 14
    >
    > name2: 18                name6: 105
    >
    > name3: 64                name7: 2
    >
    > name4: 24                name8: 13
    >
    > I want information like "name1: 15" to be placed into the CSV with the
    > name "name1" and the value "15". The same goes for the date and
    > "_important text:_15493".
    >
    > I would like to use this CSV or database to plot a graph with the
    > data.
    >
    > Thanks!


    This kind of work can be done using pyparsing. Here is a starting
    point for you:

    from pyparsing import Word, oneOf, nums, Combine
    import calendar

    text = """
    random text _important text:_15648 random text random text random
    text
    random text
    random text random text random text _important text:_15493 random
    text
    random text
    random text random text _important text:_11674 random text random
    text
    random text
    ===============Date: Wednesday March 5, 2008================
    name1: 15 name5: 14

    name2: 18 name6: 105

    name3: 64 name7: 2

    name4: 24 name8: 13
    """

    integer = Word(nums)

    IMPORTANT_TEXT = "_important text:_" + integer("value")
    # calendar.month_name[0] is an empty string, so skip it
    monthName = oneOf(list(calendar.month_name)[1:])
    dayName = oneOf(list(calendar.day_name))
    date = (dayName("dayOfWeek") + monthName("month") + integer("day")
            + "," + integer("year"))
    DATE = Word("=").suppress() + "Date:" + date("date") + Word("=").suppress()
    NAMEDATA = Combine("name" + integer)("name") + ':' + integer("value")

    # scan the text for any of the three patterns and dump each match
    for match in (IMPORTANT_TEXT | DATE | NAMEDATA).searchString(text):
        print(match.dump())

    Prints:

    ['_important text:_', '15648']
    - value: 15648
    ['_important text:_', '15493']
    - value: 15493
    ['_important text:_', '11674']
    - value: 11674
    ['Date:', 'Wednesday', 'March', '5', ',', '2008']
    - date: ['Wednesday', 'March', '5', ',', '2008']
      - day: 5
      - dayOfWeek: Wednesday
      - month: March
      - year: 2008
    - day: 5
    - dayOfWeek: Wednesday
    - month: March
    - year: 2008
    ['name1', ':', '15']
    - name: name1
    - value: 15
    ['name5', ':', '14']
    - name: name5
    - value: 14
    ['name2', ':', '18']
    - name: name2
    - value: 18
    ['name6', ':', '105']
    - name: name6
    - value: 105
    ['name3', ':', '64']
    - name: name3
    - value: 64
    ['name7', ':', '2']
    - name: name7
    - value: 2
    ['name4', ':', '24']
    - name: name4
    - value: 24
    ['name8', ':', '13']
    - name: name8
    - value: 13
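
Since the goal is to plot the data, the month, day, and year fields shown in the dump above will probably need to become a sortable date. A minimal sketch using only the standard library follows; the helper name to_iso_date is hypothetical, and it assumes English month names, which is the default when no locale has been set.

```python
from datetime import datetime


def to_iso_date(month, day, year):
    """Convert e.g. ('March', '5', '2008') to the string '2008-03-05'."""
    # %B matches a full month name in the current locale (English assumed)
    parsed = datetime.strptime("%s %s %s" % (month, day, year), "%B %d %Y")
    return parsed.date().isoformat()


# the field values are the strings shown in the dump above
print(to_iso_date("March", "5", "2008"))
```

ISO-formatted dates sort correctly as plain strings, which keeps a CSV column easy to order before plotting.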

    Find out more about pyparsing at http://pyparsing.wikispaces.com.

    -- Paul
     
    Paul McGuire, Mar 9, 2008
    #2

  3. Miki Guest

    Hello,

    > I have been searching all over for a solution to this. I am new to
    > Python, so I'm a little lost. Any pointers would be a great help. I
    > have a couple hundred emails that contain data I would like to
    > incorporate into a database or CSV file. I want to search the email
    > for specific text.
    >
    > The emails basically look like this:
    >
    > random text _important text:_15648 random text random text random text
    > random text
    > random text random text random text _important text:_15493 random text
    > random text
    > random text random text _important text:_11674 random text random text
    > random text
    > ===============Date: Wednesday March 5, 2008================
    > name1: 15                name5: 14
    >
    > name2: 18                name6: 105
    >
    > name3: 64                name7: 2
    >
    > name4: 24                name8: 13
    >
    > I want information like "name1: 15" to be placed into the CSV with the
    > name "name1" and the value "15". The same goes for the date and
    > "_important text:_15493".
    >
    > I would like to use this CSV or database to plot a graph with the
    > data.

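    import re

    # 'text' is the email body, e.g. the sample string from the previous post

    # "_important text:_15648" -> label and number
    for match in re.finditer(r"_([\w ]+):_(\d+)", text):
        print(match.group(1), match.group(2))

    # "====Date: Wednesday March 5, 2008====" -> the date string
    for match in re.finditer(r"Date: ([^=]+)=", text):
        print(match.group(1))

    # "name1: 15" -> name and value
    for match in re.finditer(r"(\w+): (\d+)", text):
        print(match.group(1), match.group(2))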


    Now you have two problems :)
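
To get from those matches to the CSV the original question asks for, the results can be collected into rows and handed to the standard csv module. The sketch below is one possible layout, not the only one: the column names (date, name, value), the helper names extract_rows and rows_to_csv, and the shortened SAMPLE text are all assumptions made for illustration.

```python
import csv
import io
import re

# shortened stand-in for one email body (assumed sample, not real data)
SAMPLE = """\
random text _important text:_15648 random text
===============Date: Wednesday March 5, 2008================
name1: 15 name5: 14

name2: 18 name6: 105
"""


def extract_rows(text):
    """Return (date, name, value) tuples found in one email body."""
    date_match = re.search(r"Date: ([^=]+)=", text)
    date = date_match.group(1).strip() if date_match else ""
    rows = []
    # "_important text:_15648" style entries
    for m in re.finditer(r"_([\w ]+):_(\d+)", text):
        rows.append((date, m.group(1), m.group(2)))
    # "name1: 15" style entries
    for m in re.finditer(r"(\w+): (\d+)", text):
        rows.append((date, m.group(1), m.group(2)))
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["date", "name", "value"])
    writer.writerows(rows)
    return buf.getvalue()


csv_text = rows_to_csv(extract_rows(SAMPLE))
print(csv_text)
```

The date contains a comma, so csv.writer quotes that field automatically. For a couple hundred emails, the same two functions could be called once per message body and the rows appended to a single file.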

    HTH,
    --
    Miki <>
    http://pythonwise.blogspot.com
     
    Miki, Mar 9, 2008
    #3