re beginner

Discussion in 'Python' started by SuperHik, Jun 4, 2006.

  1. SuperHik

    SuperHik Guest

    hi all,

    I'm trying to understand regex for the first time, and it would be very
    helpful to get an example. I have an old(er) script with the following
    task - takes a string I copy-pasted and which always has the same format:

    >>> print stuff

    Yellow hat 2 Blue shirt 1
    White socks 4 Green pants 1
    Blue bag 4 Nice perfume 3
    Wrist watch 7 Mobile phone 4
    Wireless cord! 2 Building tools 3
    One for the money 7 Two for the show 4

    >>> stuff

    'Yellow hat\t2\tBlue shirt\t1\nWhite socks\t4\tGreen pants\t1\nBlue
    bag\t4\tNice perfume\t3\nWrist watch\t7\tMobile phone\t4\nWireless
    cord!\t2\tBuilding tools\t3\nOne for the money\t7\tTwo for the show\t4'

    I want to put items from stuff into a dict like this:
    >>> print mydict

    {'Wireless cord!': 2, 'Green pants': 1, 'Blue shirt': 1, 'White socks':
    4, 'Mobile phone': 4, 'Two for the show': 4, 'One for the money': 7,
    'Blue bag': 4, 'Wrist watch': 7, 'Nice perfume': 3, 'Yellow hat': 2,
    'Building tools': 3}

    Here's how I did it:
    >>> def putindict(items):
    ...     items = items.replace('\n', '\t')
    ...     items = items.split('\t')
    ...     d = {}
    ...     for x in xrange( len(items) ):
    ...         if not items[x].isdigit(): d[items[x]] = int(items[x+1])
    ...     return d
    ...
    >>> mydict = putindict(stuff)



    I was wondering, is there a better way to do it using the re module?
    perhaps even avoiding this for loop?

    thanks!
     
    SuperHik, Jun 4, 2006
    #1

  2. SuperHik

    Guest

    SuperHik wrote:
    > I was wondering is there a better way to do it using re module?
    > perhaps even avoiding this for loop?


    This is a way to do the same thing without REs:

    data = ('Yellow hat\t2\tBlue shirt\t1\nWhite socks\t4\tGreen pants\t1\n'
            'Blue bag\t4\tNice perfume\t3\nWrist watch\t7\tMobile phone\t4\n'
            'Wireless cord!\t2\tBuilding tools\t3\n'
            'One for the money\t7\tTwo for the show\t4')

    data2 = data.replace("\n","\t").split("\t")
    result1 = dict( zip(data2[::2], map(int, data2[1::2])) )
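
For readers following along, here is a small sketch of what the two slices hold before zip pairs them up. It is written for Python 3 (the thread itself uses Python 2), and the shortened sample string and the names `fields`, `names` and `counts` are assumptions made only to keep the example readable.

```python
# Shortened sample in the same tab/newline format as the original post.
data = 'Yellow hat\t2\tBlue shirt\t1\nWhite socks\t4\tGreen pants\t1'

# Turn every newline into a tab, then split into one flat list of fields.
fields = data.replace('\n', '\t').split('\t')

names = fields[::2]     # even indexes: the item names
counts = fields[1::2]   # odd indexes: the quantities, still strings

# Pair each name with its quantity converted to int.
result = dict(zip(names, map(int, counts)))
print(names)
print(counts)
print(result)
```

The approach relies on the fields strictly alternating name, number, name, number; a single missing field would shift every later pair.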

    Or if you want to be light:

    from itertools import imap, izip, islice
    data2 = data.replace("\n","\t").split("\t")
    strings = islice(data2, 0, len(data), 2)
    numbers = islice(data2, 1, len(data), 2)
    result2 = dict( izip(strings, imap(int, numbers)) )

    Bye,
    bearophile
     
    , Jun 4, 2006
    #2

  3. SuperHik

    faulkner Guest

    you could write a function which takes a match object and modifies d,
    pass the function to re.sub, and ignore what re.sub returns.

    # untested code
    import re
    d = {}
    def record(match):
        s = match.group()  # the matched 'key\tnumber' text
        i = s.index('\t')
        print s, i # debugging
        d[s[:i]] = int(s[i+1:])
        return ''
    # key = anything except tab/newline (keys contain spaces and '!')
    re.sub(r'[^\t\n]+\t\d+', record, stuff)
    # end code

    it may be a bit faster, but it's very roundabout and difficult to
    debug.
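
A sketch of the callback idea in Python 3 syntax, next to a more direct loop over `re.finditer`. The pattern `([^\t\n]+)\t(\d+)`, the name `ENTRY` and the shortened sample string are assumptions chosen for this illustration, not something the post above tested.

```python
import re

stuff = 'Yellow hat\t2\tBlue shirt\t1\nWireless cord!\t2\tBuilding tools\t3'

# One entry: a run of non-tab, non-newline characters, a tab, then digits.
ENTRY = re.compile(r'([^\t\n]+)\t(\d+)')

# Variant 1: call sub() only for its side effect (the roundabout way).
d = {}
def record(match):
    d[match.group(1)] = int(match.group(2))
    return ''   # the substituted text is thrown away

ENTRY.sub(record, stuff)

# Variant 2: iterate over the match objects directly.
d2 = {m.group(1): int(m.group(2)) for m in ENTRY.finditer(stuff)}

print(d == d2)
```

Both variants build the same dict; the second avoids constructing a substituted string that nobody uses.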

     
    faulkner, Jun 4, 2006
    #3
  4. SuperHik

    Guest

    > strings = islice(data2, 0, len(data), 2)
    > numbers = islice(data2, 1, len(data), 2)


    This probably has to be:

    strings = islice(data2, 0, len(data2), 2)
    numbers = islice(data2, 1, len(data2), 2)

    Sorry,
    bearophile
     
    , Jun 4, 2006
    #4
  5. SuperHik

    John Machin Guest

    On 5/06/2006 10:38 AM, Bruno Desthuilliers wrote:
    > SuperHik wrote:
    >> I was wondering is there a better way to do it using re module?
    >> perhaps even avoiding this for loop?

    >
    > There are better ways. One of them avoids the for loop, and even the re
    > module:
    >
    > def to_dict(items):
    >     items = items.replace('\t', '\n').split('\n')


    In case there are leading/trailing spaces on the keys:

        items = [x.strip() for x in items.replace('\t', '\n').split('\n')]

    >     return dict(zip(items[::2], map(int, items[1::2])))
    >
    > HTH


    Fantastic -- at least for the OP's carefully copied-and-pasted input.
    Meanwhile back in the real world, there might be problems with multiple
    tabs used for 'prettiness' instead of 1 tab, non-integer values, etc etc.
    In that case a loop approach that validated as it went and was able to
    report the position and contents of any invalid input might be better.
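
One possible shape for such a validating loop, as a minimal sketch in Python 3. The function name `parse_items`, the error tuple layout and the choice to tolerate runs of tabs are all assumptions for illustration; nobody in the thread posted this code.

```python
def parse_items(text):
    """Parse name<TAB>count fields; collect problems instead of crashing."""
    result, errors = {}, []
    for lineno, line in enumerate(text.split('\n'), 1):
        # Ignore empty fields so runs of tabs used for alignment are tolerated.
        fields = [f.strip() for f in line.split('\t') if f.strip()]
        if len(fields) % 2:
            errors.append((lineno, 'odd number of fields', line))
            continue
        for name, count in zip(fields[::2], fields[1::2]):
            try:
                result[name] = int(count)
            except ValueError:
                errors.append((lineno, 'not an integer: %r' % count, line))
    return result, errors

good, problems = parse_items('Yellow hat\t\t2\tBlue shirt\t1\nWhite socks\tfour')
print(good)
print(problems)
```

Each error records the line number, a short reason and the offending line, so bad input can be reported rather than silently skipped.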
     
    John Machin, Jun 5, 2006
    #5
  6. SuperHik

    Paul McGuire Guest

    "John Machin" <> wrote in message
    news:...
    > Fantastic -- at least for the OP's carefully copied-and-pasted input.
    > Meanwhile back in the real world, there might be problems with multiple
    > tabs used for 'prettiness' instead of 1 tab, non-integer values, etc etc.
    > In that case a loop approach that validated as it went and was able to
    > report the position and contents of any invalid input might be better.


    Yeah, for that you'd need more like a real parser... hey, wait a minute!
    What about pyparsing?!

    Here's a pyparsing version. The definition of the parsing patterns takes
    little more than the re definition does - the bulk of the rest of the code
    is parsing/scanning the input and reporting the results.

    The pyparsing home page is at http://pyparsing.wikispaces.com.

    -- Paul


    stuff = ('Yellow hat\t2\tBlue shirt\t1\nWhite socks\t4\tGreen pants\t1\n'
             'Blue bag\t4\tNice perfume\t3\nWrist watch\t7\tMobile phone\t4\n'
             'Wireless cord!\t2\tBuilding tools\t3\n'
             'One for the money\t7\tTwo for the show\t4')
    print "Original input string:"
    print stuff
    print

    from pyparsing import *

    # define low-level elements for parsing
    itemWord = Word(alphas, alphanums+".!?")
    itemDesc = OneOrMore(itemWord)
    integer = Word(nums)

    # add parse action to itemDesc to merge separate words into single string
    itemDesc.setParseAction( lambda s,l,t: " ".join(t) )

    # define macro element for an entry
    entry = itemDesc.setResultsName("item") + integer.setResultsName("qty")

    # scan through input string for entry's, print out their named fields
    print "Results when scanning for entries:"
    for t,s,e in entry.scanString(stuff):
        print t.item,t.qty
    print

    # parse entire string, building ParseResults with dict-like access
    results = dictOf( itemDesc, integer ).parseString(stuff)
    print "Results when parsing entries as a dict:"
    print "Keys:", results.keys()
    for item in results.items():
        print item
    for k in results.keys():
        print k,"=", results[k]


    prints:

    Original input string:
    Yellow hat 2 Blue shirt 1
    White socks 4 Green pants 1
    Blue bag 4 Nice perfume 3
    Wrist watch 7 Mobile phone 4
    Wireless cord! 2 Building tools 3
    One for the money 7 Two for the show 4

    Results when scanning for entries:
    Yellow hat 2
    Blue shirt 1
    White socks 4
    Green pants 1
    Blue bag 4
    Nice perfume 3
    Wrist watch 7
    Mobile phone 4
    Wireless cord! 2
    Building tools 3
    One for the money 7
    Two for the show 4

    Results when parsing entries as a dict:
    Keys: ['Wireless cord!', 'Green pants', 'Blue shirt', 'White socks', 'Mobile
    phone', 'Two for the show', 'One for the money', 'Blue bag', 'Wrist watch',
    'Nice perfume', 'Yellow hat', 'Building tools']
    ('Wireless cord!', '2')
    ('Green pants', '1')
    ('Blue shirt', '1')
    ('White socks', '4')
    ('Mobile phone', '4')
    ('Two for the show', '4')
    ('One for the money', '7')
    ('Blue bag', '4')
    ('Wrist watch', '7')
    ('Nice perfume', '3')
    ('Yellow hat', '2')
    ('Building tools', '3')
    Wireless cord! = 2
    Green pants = 1
    Blue shirt = 1
    White socks = 4
    Mobile phone = 4
    Two for the show = 4
    One for the money = 7
    Blue bag = 4
    Wrist watch = 7
    Nice perfume = 3
    Yellow hat = 2
    Building tools = 3
     
    Paul McGuire, Jun 5, 2006
    #6
  7. SuperHik wrote:
    > I was wondering is there a better way to do it using re module?
    > perhaps even avoiding this for loop?


    There are better ways. One of them avoids the for loop, and even the re
    module:

    def to_dict(items):
        items = items.replace('\t', '\n').split('\n')
        return dict(zip(items[::2], map(int, items[1::2])))

    HTH
     
    Bruno Desthuilliers, Jun 5, 2006
    #7
  8. wrote:
    >>strings = islice(data2, 0, len(data), 2)
    >>numbers = islice(data2, 1, len(data), 2)

    >
    >
    > This probably has to be:
    >
    > strings = islice(data2, 0, len(data2), 2)
    > numbers = islice(data2, 1, len(data2), 2)


    try with islice(data2, 0, None, 2)
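
A short sketch of that suggestion in Python 3, where the built-in `zip` and `map` are already lazy, so `izip` and `imap` are not needed. The shortened sample string is an assumption to keep the example small.

```python
from itertools import islice

data = 'Yellow hat\t2\tBlue shirt\t1\nWhite socks\t4\tGreen pants\t1'
data2 = data.replace('\n', '\t').split('\t')

# stop=None means "run to the end", so no length has to be passed at all.
strings = islice(data2, 0, None, 2)
numbers = islice(data2, 1, None, 2)

result = dict(zip(strings, map(int, numbers)))
print(result)
```

Passing `None` as the stop value sidesteps the `len(data)` versus `len(data2)` mix-up entirely.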
     
    Bruno Desthuilliers, Jun 5, 2006
    #8
  9. SuperHik

    John Machin Guest

    On 5/06/2006 10:07 AM, Paul McGuire wrote:
    > "John Machin" <> wrote in message
    > news:...
    >> Fantastic -- at least for the OP's carefully copied-and-pasted input.
    >> Meanwhile back in the real world, there might be problems with multiple
    >> tabs used for 'prettiness' instead of 1 tab, non-integer values, etc etc.
    >> In that case a loop approach that validated as it went and was able to
    >> report the position and contents of any invalid input might be better.

    >
    > Yeah, for that you'd need more like a real parser... hey, wait a minute!
    > What about pyparsing?!
    >
    > Here's a pyparsing version. The definition of the parsing patterns takes
    > little more than the re definition does - the bulk of the rest of the code
    > is parsing/scanning the input and reporting the results.
    >


    [big snip]

    I didn't see any evidence of error handling in there anywhere.
     
    John Machin, Jun 5, 2006
    #9
  10. SuperHik

    Paul McGuire Guest

    "John Machin" <> wrote in message
    news:...
    > On 5/06/2006 10:07 AM, Paul McGuire wrote:
    > > "John Machin" <> wrote in message
    > > news:...
    > >> Fantastic -- at least for the OP's carefully copied-and-pasted input.
    > >> Meanwhile back in the real world, there might be problems with multiple
    > >> tabs used for 'prettiness' instead of 1 tab, non-integer values, etc etc.
    > >> In that case a loop approach that validated as it went and was able to
    > >> report the position and contents of any invalid input might be better.

    > >
    > > Yeah, for that you'd need more like a real parser... hey, wait a minute!
    > > What about pyparsing?!
    > >
    > > Here's a pyparsing version. The definition of the parsing patterns takes
    > > little more than the re definition does - the bulk of the rest of the code
    > > is parsing/scanning the input and reporting the results.
    > >

    >
    > [big snip]
    >
    > I didn't see any evidence of error handling in there anywhere.
    >
    >

    Pyparsing has a certain amount of error reporting built in, raising a
    ParseException when a mismatch occurs.

    This particular "grammar" is actually pretty error-tolerant. To force an
    error, I replaced "One for the money" with "1 for the money", and here is
    the exception reported by pyparsing, along with a diagnostic method,
    markInputline:


    stuff = ('Yellow hat\t2\tBlue shirt\t1\nWhite socks\t4\tGreen pants\t1\n'
             'Blue bag\t4\tNice perfume\t3\nWrist watch\t7\tMobile phone\t4\n'
             'Wireless cord!\t2\tBuilding tools\t3\n'
             'One for the money\t7\tTwo for the show\t4')
    badstuff = ('Yellow hat\t2\tBlue shirt\t1\nWhite socks\t4\tGreen pants\t1\n'
                'Blue bag\t4\tNice perfume\t3\nWrist watch\t7\tMobile phone\t4\n'
                'Wireless cord!\t2\tBuilding tools\t3\n'
                '1 for the money\t7\tTwo for the show\t4')
    pattern = dictOf( itemDesc, integer ) + stringEnd
    print pattern.parseString(stuff)
    print
    try:
        print pattern.parseString(badstuff)
    except ParseException, pe:
        print pe
        print pe.markInputline()

    Gives:
    [['Yellow hat', '2'], ['Blue shirt', '1'], ['White socks', '4'], ['Green
    pants', '1'], ['Blue bag', '4'], ['Nice perfume', '3'], ['Wrist watch',
    '7'], ['Mobile phone', '4'], ['Wireless cord!', '2'], ['Building tools',
    '3'], ['One for the money', '7'], ['Two for the show', '4']]

    Expected stringEnd (at char 210), (line:6, col:1)
    >!<1 for the money 7 Two for the show 4


    -- Paul
     
    Paul McGuire, Jun 5, 2006
    #10
  11. John Machin wrote:

    > Fantastic -- at least for the OP's carefully copied-and-pasted input.
    > Meanwhile back in the real world, there might be problems with multiple
    > tabs used for 'prettiness' instead of 1 tab, non-integer values, etc etc.


    yeah, that's probably why the OP stated "which always has the same format".

    and the "trying to understand regex for the first time, and it would be
    very helpful to get an example" part was obviously mostly irrelevant to
    the "smarter than thou" crowd; only one thread contributor was silly
    enough to actually provide an RE-based example.

    </F>
     
    Fredrik Lundh, Jun 5, 2006
    #11
  12. SuperHik wrote:

    > I'm trying to understand regex for the first time, and it would be very
    > helpful to get an example. I have an old(er) script with the following
    > task - takes a string I copy-pasted and which always has the same format:
    >
    > >>> print stuff

    > Yellow hat 2 Blue shirt 1
    > White socks 4 Green pants 1
    > Blue bag 4 Nice perfume 3
    > Wrist watch 7 Mobile phone 4
    > Wireless cord! 2 Building tools 3
    > One for the money 7 Two for the show 4
    >
    > >>> stuff

    > 'Yellow hat\t2\tBlue shirt\t1\nWhite socks\t4\tGreen pants\t1\nBlue
    > bag\t4\tNice perfume\t3\nWrist watch\t7\tMobile phone\t4\nWireless
    > cord!\t2\tBuilding tools\t3\nOne for the money\t7\tTwo for the show\t4'


    the first thing you need to do is to figure out exactly what the syntax
    is. given your example, the format of the items you are looking for
    seems to be "some text" followed by a tab character followed by an integer.

    an initial attempt would be "\w+\t\d+" (one or more word characters,
    followed by a tab, followed by one or more digits). to try this out,
    you can do:

    >>> re.findall('\w+\t\d+', stuff)

    ['hat\t2', 'shirt\t1', 'socks\t4', ...]

    as you can see, using \w+ isn't good enough here; the "keys" in this
    case may contain whitespace as well, and findall simply skips stuff that
    doesn't match the pattern. if we assume that a key consists of words
    and spaces, we can replace the single \w with [\w ] (either word
    character or space), and get

    >>> re.findall('[\w ]+\t\d+', stuff)

    ['Yellow hat\t2', 'Blue shirt\t1', 'White socks\t4', ...]

    which looks a bit better. however, if you check the output carefully,
    you'll notice that the "Wireless cord!" entry is missing: the "!" isn't
    a letter or a digit. the easiest way to fix this is to look for
    "non-tab characters" instead, using "[^\t]" (this matches anything
    except a tab):

    >>> len(re.findall('[\w ]+\t\d+', stuff))

    11
    >>> len(re.findall('[^\t]+\t\d+', stuff))

    12

    now, to turn this into a dictionary, you could split the returned
    strings on a tab character (\t), but RE provides a better mechanism:
    capturing groups. by adding () to the pattern string, you can mark the
    sections you want returned:

    >>> re.findall('([^\t]+)\t(\d+)', stuff)

    [('Yellow hat', '2'), ('Blue shirt', '1'), ('White socks', ...]

    turning this into a dictionary is trivial:

    >>> dict(re.findall('([^\t]+)\t(\d+)', stuff))

    {'Green pants': '1', 'Blue shirt': '1', 'White socks': ...}
    >>> len(dict(re.findall('([^\t]+)\t(\d+)', stuff)))

    12

    or, in function terms:

    def putindict(items):
        return dict(re.findall('([^\t]+)\t(\d+)', items))
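
Two details are worth checking when trying this on the sample data: "[^\t]" also matches a newline, so a key that starts a new line comes back with a leading '\n', and the captured counts are strings rather than integers. The sketch below, in Python 3, shows one way to handle both; the name `putindict_re`, the `[^\t\n]` character class and the shortened sample are assumptions for illustration.

```python
import re

stuff = ('Yellow hat\t2\tBlue shirt\t1\nWhite socks\t4\tGreen pants\t1\n'
         'Wireless cord!\t2\tBuilding tools\t3')

def putindict_re(items):
    # Key: one or more characters that are neither tab nor newline.
    # Value: one or more digits, converted to int.
    pairs = re.findall(r'([^\t\n]+)\t(\d+)', items)
    return {name: int(count) for name, count in pairs}

# With '[^\t]' alone, keys that follow a newline keep the '\n'.
loose = dict(re.findall(r'([^\t]+)\t(\d+)', stuff))
print('\nWhite socks' in loose)
print(putindict_re(stuff))
```

Excluding the newline from the key class keeps each match on one logical entry, which matches the dict the original poster asked for.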

    hope this helps!

    </F>
     
    Fredrik Lundh, Jun 5, 2006
    #12
  13. SuperHik

    John Machin Guest

    On 5/06/2006 7:47 PM, Fredrik Lundh wrote:
    > John Machin wrote:
    >
    >> Fantastic -- at least for the OP's carefully copied-and-pasted input.
    >> Meanwhile back in the real world, there might be problems with
    >> multiple tabs used for 'prettiness' instead of 1 tab, non-integer
    >> values, etc etc.

    >
    > yeah, that's probably why the OP stated "which always has the same format".
    >


    Such statements by users are in the same category as "The cheque is
    in the mail" and "Of course I'll still love you in the morning".
     
    John Machin, Jun 5, 2006
    #13
  14. John Machin wrote:
    > On 5/06/2006 10:38 AM, Bruno Desthuilliers wrote:
    >
    >> SuperHik wrote:
    >>
    >>> hi all,
    >>>

    (snip)

    >>> I have an old(er) script with the
    >>> following task - takes a string I copy-pasted and which always has the
    >>> same format:
    >>>

    (snip)
    >>>

    >> def to_dict(items):
    >> items = items.replace('\t', '\n').split('\n')

    >
    >
    > In case there are leading/trailing spaces on the keys:


    There aren't. Test passes.

    (snip)

    > Fantastic -- at least for the OP's carefully copied-and-pasted input.


    That was the spec, and my code passes the test.

    > Meanwhile back in the real world,


    The "real world" is mostly defined by customer's test set (is that the
    correct translation for "jeu d'essai" ?). Code passes the test. period.

    > there might be problems with multiple
    > tabs used for 'prettiness' instead of 1 tab, non-integer values, etc etc.


    Which means that the spec and the customer's test set are wrong. Not my
    responsibility. Anyway, I refuse to change anything in the parsing
    algorithm before having another test set.

    > In that case a loop approach that validated as it went and was able to
    > report the position and contents of any invalid input might be better.


    One doesn't know what *will* be better without actual facts. You can be
    right (and, from my experience, you probably are !-), *but* you can be
    wrong as well. Until you have a correct spec and test data set on which
    the code fails, writing any other code is a waste of time. Better to
    work on other parts of the system, and come back to this if and when the
    need arises.

    <ot>
    Kind of reminds me of a former employer that paid me 2 full months to
    work on a very hairy data migration script (the original data set was so
    f... up and incoherent even a human parser could barely make any sense of
    it), before discovering that none of the users of the old system were
    interested in migrating that part of the data. Talk about a waste of
    time and money...
    </ot>

    Now FWIW, there's actually something else bugging me with this code : it
    loads the whole data set in memory. It's ok for a few lines, but
    obviously wrong if one is to parse huge files. *That* would be the first
    thing I would change - it takes a couple of minutes to do, so no real
    waste of time, but it obviously implies rethinking the API, which is
    better done now than after client code has been written.
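
As a rough sketch of what such an API change might look like in Python 3: a generator that takes any iterable of lines (for example an open file) and yields pairs one at a time, so the whole data set never has to sit in memory. The name `iter_items` and the use of `io.StringIO` as a stand-in for a file are assumptions for illustration.

```python
import io

def iter_items(lines):
    """Yield (name, count) pairs from an iterable of tab-separated lines."""
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        # Fields alternate name, count, name, count, ...
        for name, count in zip(fields[::2], fields[1::2]):
            yield name, int(count)

# io.StringIO stands in for an open file; a file object iterates the same way.
source = io.StringIO('Yellow hat\t2\tBlue shirt\t1\nWhite socks\t4\tGreen pants\t1\n')
for name, count in iter_items(source):
    print(name, count)
```

A caller who still wants a dict can write `dict(iter_items(f))`, while one processing a huge file can consume the pairs one by one.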

    My 2 cents....
     
    Bruno Desthuilliers, Jun 5, 2006
    #14
  15. SuperHik

    John Machin Guest

    On 5/06/2006 10:30 PM, Bruno Desthuilliers wrote:
    > John Machin wrote:
    >> On 5/06/2006 10:38 AM, Bruno Desthuilliers wrote:
    >>
    >>> SuperHik wrote:
    >>>
    >>>> hi all,
    >>>>

    > (snip)
    >
    >>>> I have an old(er) script with the following task - takes a string I
    >>>> copy-pasted and which always has the same format:
    >>>>

    > (snip)
    > >>>
    >>> def to_dict(items):
    >>> items = items.replace('\t', '\n').split('\n')

    >>
    >>
    >> In case there are leading/trailing spaces on the keys:

    >
    > There aren't. Test passes.
    >
    > (snip)
    >
    >> Fantastic -- at least for the OP's carefully copied-and-pasted input.

    >
    > That was the spec, and my code passes the test.
    >
    >> Meanwhile back in the real world,

    >
    > The "real world" is mostly defined by customer's test set (is that the
    > correct translation for "jeu d'essai" ?). Code passes the test. period.


    "Jeu d'essai" could be construed as "toss a coin" -- yup, that fits some
    user test sets I've seen.

    In the real world, you are lucky to get a test set that covers all the
    user-expected "good" cases. They have to be driven with whips to think
    about the "bad" cases. Never come across a problem caused by "FOO " !=
    "FOO"? You *have* led a charmed life, so far.

    >
    >> there might be problems with multiple tabs used for 'prettiness'
    >> instead of 1 tab, non-integer values, etc etc.

    >
    > Which means that the spec and the customer's test set are wrong. Not my
    > responsibility.


    That's what you think. The users, the pointy-haired boss, and the evil
    HR director may have other ideas :)

    > Any way, I refuse to change anything in the parsing
    > algorithm before having another test set.
    >
    >> In that case a loop approach that validated as it went and was able to
    >> report the position and contents of any invalid input might be better.

    >
    > One doesn't know what *will* be better without actual facts. You can be
    > right (and, from my experience, you probably are !-), *but* you can be
    > wrong as well. Until you have a correct spec and test data set on which
    > the code fails, writing any other code is a waste of time. Better to
    > work on other parts of the system, and come back to this if and when the
    > need arises.


    Unfortunately one is likely to be told in a Sunday 03:00 phone call that
    the "test data set on which the code fails" is somewhere in the
    production database :-(

    Cheers,
    John
     
    John Machin, Jun 5, 2006
    #15
  16. Fredrik Lundh wrote:
    > John Machin wrote:
    >
    >> Fantastic -- at least for the OP's carefully copied-and-pasted input.
    >> Meanwhile back in the real world, there might be problems with
    >> multiple tabs used for 'prettiness' instead of 1 tab, non-integer
    >> values, etc etc.

    >
    >
    > yeah, that's probably why the OP stated "which always has the same format".


    Lol.

    > and the "trying to understand regex for the first time, and it would be
    > very helpful to get an example" part


    Yeps, I missed that part when answering yesterday. My bad.
     
    Bruno Desthuilliers, Jun 5, 2006
    #16
  17. SuperHik

    SuperHik Guest

    WOW!
    Thanks for all the answers, even those not related to regular
    expressions taught me some stuff I wasn't aware of.
    I appreciate it very much.

     
    SuperHik, Jun 5, 2006
    #17
