need help extracting data from a text file

Discussion in 'Python' started by nephish@xit.net, Nov 7, 2005.

  1. Guest

    Hey there,
    I have a text file with a bunch of values scattered throughout it.
    I need to pull out a value that is in parentheses right after a
    certain word,
    like the first time the word 'foo' is found, retrieve the value in the
    next set of parentheses (bar) and it would return 'bar'

    I think I can use re to do this, but is there some easier way?
    thanks
     
    , Nov 7, 2005
    #1

  2. Iain King Guest

    wrote:
    > Hey there,
    > i have a text file with a bunch of values scattered throughout it.
    > i am needing to pull out a value that is in parenthesis right after a
    > certain word,
    > like the first time the word 'foo' is found, retrieve the values in the
    > next set of parenthesis (bar) and it would return 'bar'
    >
    > i think i can use re to do this, but is there some easier way?
    > thanks


    well, you can use string.find with offsets, but an re is probably a
    cleaner way to go. I'm not sure which way is faster - it'll depend on
    how many times you're searching compared to the overhead of setting up
    an re.

    start = textfile.find("foo(") + 4 # 4 being how long 'foo(' is
    end = textfile.find(")", start)
    value = textfile[start:end]
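
    One caveat with the snippet above: find returns -1 when "foo(" is
    absent, so start quietly becomes 3 and you get a bogus slice. Here is
    a minimal sketch of the same extraction using str.partition instead
    (a different technique from the one posted), which makes the
    not-found case explicit. The name value_in_call is made up for
    illustration.

```python
def value_in_call(text, prefix="foo("):
    """Return the text between `prefix` and the next ')', or None."""
    # partition returns (before, separator, after); the separator is
    # empty when the prefix does not occur in the text.
    _, found, rest = text.partition(prefix)
    if not found:
        return None
    value, closing, _ = rest.partition(")")
    # No closing parenthesis means there is nothing reliable to return.
    return value if closing else None
```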

    Iain
     
    Iain King, Nov 7, 2005
    #2

  3. Guest

    This is cool. It is only going to run about 10 times a day.

    The text is not written out like foo(bar); it's more like
    foo blah blah blah (bar)

    The thing is, every few days the structure of the text file may change,
    which is one of the reasons I wanted to avoid the re.

    thanks for the tip,
     
    , Nov 7, 2005
    #3
  4. Iain King Guest

    wrote:
    > this is cool, it is only going to run about 10 times a day,
    >
    > the text is not written out like foo(bar) its more like
    > foo blah blah blah (bar)
    >


    then I guess you worked this out, but just for completeness:

    keywordPos = textfile.find("foo")
    start = textfile.find("(", keywordPos) + 1 # +1 to skip past the "("
    end = textfile.find(")", start)
    value = textfile[start:end]
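
    The same find-based idea can be wrapped in a function that also
    copes with a missing keyword or parenthesis (find returns -1 in
    those cases). This is only a sketch; value_after_keyword is a
    hypothetical name, not something from the standard library.

```python
def value_after_keyword(text, keyword):
    """Return the contents of the first (...) after `keyword`, or None."""
    keyword_pos = text.find(keyword)
    if keyword_pos == -1:
        return None
    # Look for the opening parenthesis only after the keyword itself.
    open_pos = text.find("(", keyword_pos + len(keyword))
    if open_pos == -1:
        return None
    close_pos = text.find(")", open_pos + 1)
    if close_pos == -1:
        return None
    # Slice between the parentheses, excluding both of them.
    return text[open_pos + 1:close_pos]
```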


    Iain
     
    Iain King, Nov 7, 2005
    #4
  5. Guest

    um, wait. what you are doing here is easier than what i was doing after
    your first post.
    thanks a lot. this is going to work out ok.

    thanks again.
    sk
     
    , Nov 7, 2005
    #5
  6. Paul McGuire Guest

    <> wrote in message
    news:...
    > Hey there,
    > i have a text file with a bunch of values scattered throughout it.
    > i am needing to pull out a value that is in parenthesis right after a
    > certain word,
    > like the first time the word 'foo' is found, retrieve the values in the
    > next set of parenthesis (bar) and it would return 'bar'
    >
    > i think i can use re to do this, but is there some easier way?
    > thanks
    >

    Using string methods to locate the 'foo' instances is by far the fastest way
    to go.

    If your requirements get more complicated, look into using pyparsing
    (http://pyparsing.sourceforge.net). Here is a pyparsing rendition of this
    problem. This does three scans through some sample data - the first lists
    all matches, the second ignores matches if they are found inside a quoted
    string, and the third reports only the third match. This kind of
    context-sensitive matching gets trickier with basic string and re tools.

    -- Paul

    data = """
    i have a text file with a bunch of foo(bar1) values scattered throughout it.
    i am needing to pull out a value that foo(bar2) is in parenthesis right
    after a
    certain word,
    like the foo(bar3) first time the word 'foo' is found, retrieve the values
    in the
    next set of parenthesis foo(bar4) and it would return 'bar'
    do we want to skip things in quotes, such as 'foo(barInQuotes)'?
    """

    from pyparsing import Literal, SkipTo, quotedString, ParseException

    pattern = Literal("foo") + "(" + SkipTo(")").setResultsName("payload") + ")"

    # report all occurrences of xxx found in "foo(xxx)"
    for tokens, start, end in pattern.scanString(data):
        print(tokens.payload, "at location", start)
    print()

    # ignore quoted strings
    pattern.ignore(quotedString)
    for tokens, start, end in pattern.scanString(data):
        print(tokens.payload, "at location", start)
    print()

    # only report 3rd occurrence
    tokenMatch = {'foo': 0}
    def thirdTimeOnly(strg, loc, tokens):
        word = tokens[0]
        if word in tokenMatch:
            tokenMatch[word] += 1
            # reject every match except the third one
            if tokenMatch[word] != 3:
                raise ParseException(strg, loc, "wrong occurrence of token")

    pattern.setParseAction(thirdTimeOnly)
    for tokens, start, end in pattern.scanString(data):
        print(tokens.payload, "at location", start)
    print()

    Prints:
    bar1 at location 36
    bar2 at location 116
    bar3 at location 181
    bar4 at location 278
    barInQuotes at location 360

    bar1 at location 36
    bar2 at location 116
    bar3 at location 181
    bar4 at location 278

    bar3 at location 181
     
    Paul McGuire, Nov 7, 2005
    #6
  7. Kent Johnson Guest

    wrote:
    > Hey there,
    > i have a text file with a bunch of values scattered throughout it.
    > i am needing to pull out a value that is in parenthesis right after a
    > certain word,
    > like the first time the word 'foo' is found, retrieve the values in the
    > next set of parenthesis (bar) and it would return 'bar'
    >
    > i think i can use re to do this, but is there some easier way?


    It's pretty easy with an re:

    >>> import re
    >>> fooRe = re.compile(r'foo.*?\((.*?)\)')
    >>> fooRe.search('foo(bar)').group(1)
    'bar'
    >>> fooRe.search('This is a foo bar baz blah blah (bar)').group(1)
    'bar'
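
    Note that search returns None when nothing matches, so calling
    .group(1) directly raises AttributeError on text without a hit. A
    small sketch of a wrapper around the same pattern, assuming the
    keyword should be matched literally (hence re.escape);
    first_value_after is an illustrative name only.

```python
import re

def first_value_after(text, word):
    """Return the first parenthesised value after `word`, or None."""
    # Same shape as the pattern above, with the keyword escaped so that
    # characters such as '.' or '+' in it are treated literally.
    pattern = re.compile(re.escape(word) + r".*?\((.*?)\)")
    match = pattern.search(text)
    return match.group(1) if match else None
```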

    Kent
     
    Kent Johnson, Nov 7, 2005
    #7
  8. Tom Anderson Guest

    On Mon, 7 Nov 2005, Kent Johnson wrote:

    > wrote:
    >
    >> i have a text file with a bunch of values scattered throughout it. i am
    >> needing to pull out a value that is in parenthesis right after a
    >> certain word, like the first time the word 'foo' is found, retrieve the
    >> values in the next set of parenthesis (bar) and it would return 'bar'

    >
    > It's pretty easy with an re:
    >
    > >>> import re
    > >>> fooRe = re.compile(r'foo.*?\((.*?)\)')


    Just out of interest, i've never really got into using non-greedy
    quantifiers (i use them from time to time, but hardly ever feel the need
    for them), so my instinct would have been to write this as:

    >>> fooRe = re.compile(r"foo[^(]*\(([^)]*)\)")


    Is there any reason to use one over the other?
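
    One concrete difference between the two patterns: without re.DOTALL,
    "." does not match a newline, so the non-greedy version cannot reach
    a parenthesis on a later line, while a negated class like [^(]
    matches newlines too. The sketch below compares both on made-up
    sample strings; first_group is just a helper for the comparison.

```python
import re

lazy = re.compile(r'foo.*?\((.*?)\)')         # non-greedy version
negated = re.compile(r"foo[^(]*\(([^)]*)\)")  # negated-class version

one_line = 'This is a foo bar baz blah blah (bar)'
two_lines = 'foo blah blah\nblah (bar)'

def first_group(pattern, text):
    """Return group(1) of the first match, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None

# On a single line both patterns find the same value.
same_line = (first_group(lazy, one_line), first_group(negated, one_line))

# When the parenthesis is on a later line only the negated class matches.
across_lines = (first_group(lazy, two_lines), first_group(negated, two_lines))

# With re.DOTALL the non-greedy pattern crosses the newline as well.
lazy_dotall = first_group(re.compile(lazy.pattern, re.DOTALL), two_lines)
```

    So for text held as one big string read from a file, the two are not
    interchangeable unless the flags are chosen to match.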

    > >>> fooRe.search('foo(bar)').group(1)
    > 'bar'
    > >>> fooRe.search('This is a foo bar baz blah blah (bar)').group(1)
    > 'bar'


    Ditto.

    tom

    --
    [of Mulholland Drive] Cancer is pretty ingenious too, but it's best to
    avoid. -- Tex
     
    Tom Anderson, Nov 9, 2005
    #8