Re: create lowercase strings in lists - was: (No subject)

Discussion in 'Python' started by Mark Devine, Dec 16, 2004.

  1. Mark Devine

    Mark Devine Guest

    Actually what I want is element 'class-map match-all cmap1' from list 1 to match 'class-map cmap1 (match-all)' or 'class-map cmap1 mark match-all done' in list 2 but not to match 'class-map cmap1'.
    Each element in both lists contains multiple words. If all the words of any element of the first list appear, in any order, within any element of the second list, I want a match; if any of the words are missing, there is no match. There are far more elements in list 2 than in list 1.




    Steve Holden <> wrote:

    >
    > Mark Devine wrote:
    >
    > > Sorry for not putting a subject in the last e-mail. The function lower suited my case exactly. Here however is my main problem:
    > > Given that my new list is :
    > > ['class-map match-all cmap1', 'match ip any', 'class-map match-any cmap2', 'match any', 'policy-map policy1', 'class cmap1', 'policy-map policy2', 'service-policy policy1', 'class cmap2']
    > >
    > > Each element in my new list could appear in any order together within another larger list (list1) and I want to count how many matches occur. For example the larger list could have an element 'class-map cmap2 (match any)' and I want to match that but if only 'class-map match-any' or 'class-map cmap2' appears I don't want it to match.
    > >
    > > Can anybody help?
    > > Is my problem clearly stated?
    > >

    >
    > Well, let's see: you'd like to know which strings occur in both lists,
    > right?
    >
    > You might like to look at the "Efficient grep using Python?" thread for
    > suggestions. My favorite would be:
    >
    > .>>> lst1 = ["ab", "ac", "ba", "bb", "bc"]
    > .>>> lst2 = ["ac", "ab", "bd", "cb", "bb"]
    > .>>> dct1 = dict.fromkeys(lst1)
    > .>>> [x for x in lst2 if x not in dct1]
    > ['bd', 'cb']
    > .>>> [x for x in lst2 if x in dct1]
    > ['ac', 'ab', 'bb']
    >
    > regards
    > Steve
    > --
    > Steve Holden http://www.holdenweb.com/
    > Python Web Programming http://pydish.holdenweb.com/
    > Holden Web LLC +1 703 861 4237 +1 800 494 3119
    > --
    > http://mail.python.org/mailman/listinfo/python-list
    >




    Mark Devine, Dec 16, 2004
    #1

  2. Mark Devine

    Steve Holden Guest

    Mark Devine wrote:

    > Actually what I want is element 'class-map match-all cmap1' from list 1 to match 'class-map cmap1 (match-all)' or 'class-map cmap1 mark match-all done' in list 2 but not to match 'class-map cmap1'.
    > Each element in both lists have multiple words in them. If all the words of any element of the first list appear in any order within any element of the second list I want a match but if any of the words are missing then there is no match. There are far more elements in list 2 than in list 1.
    >

    Well since that's the case it would seem you'd be best processing each
    item from the large list against the small list, though in truth it may
    not make any difference.

    It looks like the best way to proceed might be to reduce each line to a
    canonical form -- strip the parens and other irrelevant characters out,
    and sort the words in order. After that it'd be relatively simple to
    determine whether two lines match - they'd be the same!

    The only slight wrinkle would be keeping the original lines for
    reference, but that's not difficult.
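
A minimal sketch of the canonical-form idea, written for Python 3 (the thread itself uses Python 2.4). The helper name `canonical` and the set of characters it strips are assumptions made for illustration. Comparing canonical forms for equality only matches lines made of exactly the same words, which the last line of the example shows.

```python
def canonical(line, unwanted="()"):
    """Lower-case the line, drop unwanted characters, and sort the words."""
    cleaned = "".join(ch for ch in line.lower() if ch not in unwanted)
    return tuple(sorted(cleaned.split()))

# Keep the original lines for reference, keyed by their canonical form.
small = ["class-map match-all cmap1"]
by_form = {canonical(line): line for line in small}

same_words = canonical("class-map cmap1 (match-all)")
extra_words = canonical("class-map cmap1 mark match-all done")

print(same_words in by_form)   # True: same words, different order
print(extra_words in by_form)  # False: extra words change the canonical form
```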

    Does this give you enough of an idea, or do you need code samples?

    regards
    Steve



    --
    Steve Holden http://www.holdenweb.com/
    Python Web Programming http://pydish.holdenweb.com/
    Holden Web LLC +1 703 861 4237 +1 800 494 3119
    Steve Holden, Dec 16, 2004
    #2

  3. Mark Devine

    Mike Meyer Guest

    Steve Holden <> writes:

    > Mark Devine wrote:
    >
    >> Actually what I want is element 'class-map match-all cmap1' from list 1 to match 'class-map cmap1 (match-all)' or 'class-map cmap1 mark match-all done' in list 2 but not to match 'class-map cmap1'.
    >> Each element in both lists have multiple words in them. If all the words of any element of the first list appear in any order within any element of the second list I want a match but if any of the words are missing then there is no match. There are far more elements in list 2 than in list 1.
    >>

    > Well since that's the case it would seem you'd be best processing each
    > item from the large list against the small list, though in truth it
    > may not make any difference.
    >
    > It looks like the best way to proceed might be to reduce each line to
    > a canonical form -- strip the parens and other irrelevant characters
    > out, and sort the words in order. After that it'd be relatively simple
    > to determine whether two lines match - they'd be the same!


    No, that doesn't work. What happens if an element of the second list
    has *more* words than the element in the first list? In that case, the
    two canonical forms would be different, but it should still be a
    match.

    How about this (If I had sample data, I'd try it out directly...):

    Create a dictionary of sets. For each word in an element in the small
    list, insert into the set indexed by that word in the dictionary a
    tuple version of the list (you'll want to create the tuples in
    advance, and associate them with each list somehow).

    Then go through the long list, and for each element collect all the
    sets that are indexed by the words in that element, and take the
    intersection of them all. If there are any tuples in the intersection,
    then you have a match.
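
A rough Python 3 sketch of a word-indexed lookup along these lines. It is not a literal transcription: it stores frozensets rather than tuples, and instead of intersecting the sets for every word of the long element (which appears to test the opposite containment when the long element has extra words), it counts hits per candidate. The names `words_of`, `build_index` and `count_matches` are hypothetical.

```python
from collections import defaultdict

def words_of(line, unwanted="()"):
    """Return the set of lower-cased words in a line, ignoring unwanted characters."""
    cleaned = "".join(ch for ch in line.lower() if ch not in unwanted)
    return frozenset(cleaned.split())

def build_index(small_list):
    """Map each word to the small-list entries (as word sets) that contain it."""
    index = defaultdict(set)
    for line in small_list:
        entry = words_of(line)
        for word in entry:
            index[word].add(entry)
    return index

def count_matches(small_list, long_list):
    """Count long-list elements that contain every word of some small-list element."""
    index = build_index(small_list)
    matches = 0
    for line in long_list:
        hits = defaultdict(int)
        for word in words_of(line):
            for entry in index.get(word, ()):
                hits[entry] += 1
        # An entry matches when every one of its words was seen in this line.
        if any(count == len(entry) for entry, count in hits.items()):
            matches += 1
    return matches

small = ["class-map match-all cmap1"]
long_list = ["class-map cmap1 (match-all)",
             "class-map cmap1 mark match-all done",
             "class-map cmap1"]
print(count_matches(small, long_list))  # 2
```

Only small-list entries that share at least one word with the current line are ever examined, which is the point of indexing by word.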

    <mike
    --
    Mike Meyer <> http://www.mired.org/home/mwm/
    Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
    Mike Meyer, Dec 17, 2004
    #3
  4. Steve Holden wrote:
    > Mark Devine wrote:
    >
    >> Actually what I want is element 'class-map match-all cmap1' from list
    >> 1 to match 'class-map cmap1 (match-all)' or 'class-map cmap1 mark
    >> match-all done' in list 2 but not to match 'class-map cmap1'.
    >> Each element in both lists have multiple words in them. If all the
    >> words of any element of the first list appear in any order within any
    >> element of the second list I want a match but if any of the words are
    >> missing then there is no match. There are far more elements in list 2
    >> than in list 1.
    >>

    >

    sounds like a case for sets...

    >>> # NB Python 2.4
    >>> # Test if the words of list2 elements appear in any order in list1 elements
    >>> # disregarding case and parens
    ...
    >>> # Reference list
    >>> list1 = ["a b C (D)",
    ...          "D A B",
    ...          "A B E"]
    >>> # Test list
    >>> list2 = ["A B C D",   # True
    ...          "A B D",     # True
    ...          "A E F",     # False
    ...          "A (E) B",   # True
    ...          "A B",       # True
    ...          "E A B"]     # True
    ...
    >>> def normalize(text, unwanted = "()"):
    ...     conv = "".join(char.lower() for char in text if char not in unwanted)
    ...     return set(conv.split())
    ...
    >>> reflist = [normalize(element) for element in list1]
    >>> print reflist
    [set(['a', 'c', 'b', 'd']), set(['a', 'b', 'd']), set(['a', 'b', 'e'])]

    This is the list of sets to test against

    >>> def testmember(element):
    ...     """is element a member of the reflist, according to the above rules?"""
    ...     testelement = normalize(element)
    ...     # brute force comparison until match - depends on small reflist
    ...     for el in reflist:
    ...         if el.issuperset(testelement):
    ...             return True
    ...     return False
    ...
    >>> for element in list2:
    ...     print element, testmember(element)
    ...
    A B C D True
    A B D True
    A E F False
    A (E) B True
    A B True
    E A B True
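
For the rule as the original poster stated it (every word of the small-list element must occur in the big-list element), the containment runs the other way: the small element's word set has to be a subset of the big element's. A minimal Python 3 sketch, assuming the same normalization as above; the name `matches` is hypothetical.

```python
def normalize(text, unwanted="()"):
    """Lower-case text, drop unwanted characters, and return its set of words."""
    conv = "".join(ch.lower() for ch in text if ch not in unwanted)
    return set(conv.split())

def matches(small_element, big_element):
    """True if every word of small_element occurs somewhere in big_element."""
    return normalize(small_element) <= normalize(big_element)

pattern = "class-map match-all cmap1"
for candidate in ["class-map cmap1 (match-all)",          # True
                  "class-map cmap1 mark match-all done",  # True
                  "class-map cmap1"]:                     # False
    print(candidate, matches(pattern, candidate))
```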


    Michael
    Michael Spencer, Dec 17, 2004
    #4
  5. Michael Spencer wrote:
    > ... conv = "".join(char.lower() for char in text if char not in
    > unwanted)


    Probably a good place to use str.replace, e.g.

    conv = text.lower()
    for char in unwanted:
        conv = conv.replace(char, '')

    Some timings to support my assertion: =)

    C:\Documents and Settings\Steve>python -m timeit -s "s = ''.join(map(str, range(100)))" "s = ''.join(c for c in s if c not in '01')"
    10000 loops, best of 3: 74.6 usec per loop

    C:\Documents and Settings\Steve>python -m timeit -s "s = ''.join(map(str, range(100)))" "for c in '01': s = s.replace(c, '')"
    100000 loops, best of 3: 2.82 usec per loop
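
The same comparison can be run from a script with the `timeit` module. This is only a sketch for Python 3; the function names `strip_join` and `strip_replace` are made up for illustration, and the timings will vary by machine and interpreter, so none are shown here.

```python
import timeit

def strip_join(s, unwanted):
    """Rebuild the string character by character, skipping unwanted ones."""
    return "".join(c for c in s if c not in unwanted)

def strip_replace(s, unwanted):
    """Call str.replace once per unwanted character."""
    for c in unwanted:
        s = s.replace(c, "")
    return s

sample = "".join(map(str, range(100)))

# Both approaches should produce the same string.
assert strip_join(sample, "01") == strip_replace(sample, "01")

for func in (strip_join, strip_replace):
    seconds = timeit.timeit(lambda: func(sample, "01"), number=10000)
    print(func.__name__, seconds)
```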

    Steve
    Steven Bethard, Dec 17, 2004
    #5
  6. On Fri, 17 Dec 2004 02:06:01 GMT, Steven Bethard <> wrote:

    >Michael Spencer wrote:
    >> ... conv = "".join(char.lower() for char in text if char not in
    >> unwanted)

    >
    >Probably a good place to use str.replace, e.g.
    >
    >conv = text.lower()
    >for char in unwanted:
    > conv = conv.replace(char, '')
    >
    >Some timings to support my assertion: =)
    >
    >C:\Documents and Settings\Steve>python -m timeit -s "s =
    >''.join(map(str, range(100)))" "s = ''.join(c for c in s if c not in '01')"
    >10000 loops, best of 3: 74.6 usec per loop
    >
    >C:\Documents and Settings\Steve>python -m timeit -s "s =
    >''.join(map(str, range(100)))" "for c in '01': s = s.replace(c, '')"
    >100000 loops, best of 3: 2.82 usec per loop
    >

    If unwanted has more than one character in it, I would expect unwanted as
    deletechars in

    >>> help(str.translate)
    Help on method_descriptor:

    translate(...)
        S.translate(table [,deletechars]) -> string

        Return a copy of the string S, where all characters occurring
        in the optional argument deletechars are removed, and the
        remaining characters have been mapped through the given
        translation table, which must be a string of length 256.

    to compete well, if table setup were for free
    (otherwise, unless I am mistaken, table should be ''.join([chr(i) for i in xrange(256)])
    for identity translation, and that might pay for a couple of .replace loops,
    depending).
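
The help text above describes the Python 2 `str.translate`. In Python 3 the signature is different: `translate` takes a single table, and deletion is expressed through the third argument of `str.maketrans`, so no 256-character identity table is needed. A small sketch, with the sample string chosen only for illustration:

```python
unwanted = "()"

# The three-argument form of str.maketrans builds a table in which the
# characters of the third argument are mapped to None, i.e. deleted.
delete_table = str.maketrans("", "", unwanted)

text = "class-map cmap1 (match-all)"
print(text.translate(delete_table))  # class-map cmap1 match-all
```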

    Regards,
    Bengt Richter
    Bengt Richter, Dec 17, 2004
    #6
  7. Bengt Richter wrote:
    > On Fri, 17 Dec 2004 02:06:01 GMT, Steven Bethard <> wrote:
    >
    >
    >>Michael Spencer wrote:
    >>
    >>> ... conv = "".join(char.lower() for char in text if char not in
    >>>unwanted)

    >>
    >>Probably a good place to use str.replace, e.g.
    >>
    >>conv = text.lower()
    >>for char in unwanted:
    >> conv = conv.replace(char, '')
    >>
    >>Some timings to support my assertion: =)
    >>
    >>C:\Documents and Settings\Steve>python -m timeit -s "s =
    >>''.join(map(str, range(100)))" "s = ''.join(c for c in s if c not in '01')"
    >>10000 loops, best of 3: 74.6 usec per loop
    >>
    >>C:\Documents and Settings\Steve>python -m timeit -s "s =
    >>''.join(map(str, range(100)))" "for c in '01': s = s.replace(c, '')"
    >>100000 loops, best of 3: 2.82 usec per loop
    >>

    Well, sure, if it's just speed, conciseness and backwards-compatibility that you
    want ;-)

    >
    > If unwanted has more than one character in it, I would expect unwanted as
    > deletechars in
    >
    > >>> help(str.translate)

    > Help on method_descriptor:
    >
    > translate(...)
    > S.translate(table [,deletechars]) -> string
    >
    > Return a copy of the string S, where all characters occurring
    > in the optional argument deletechars are removed, and the
    > remaining characters have been mapped through the given
    > translation table, which must be a string of length 256.
    >
    > to compete well, if table setup were for free
    > (otherwise, UIAM, table should be ''.join([chr(i) for i in xrange(256)])
    > for identity translation, and that might pay for a couple of .replace loops,
    > depending).
    >
    > Regards,
    > Bengt Richter

    Good point - and there is string.maketrans to set up the table too. So
    normalize can be rewritten as:


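    from string import maketrans  # Python 2

    def normalize1(text, unwanted = "()", table = maketrans("","")):
        text = text.lower()
        # translate returns a new string, so the result must be reassigned
        text = text.translate(table, unwanted)
        return set(text.split())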

    which gives:
    >>> t = timeit.Timer("normalize1('(UPPER CASE) lower case')", "from listmembers import normalize1")
    >>> t.repeat(3,10000)
    [0.29812783468287307, 0.29807782832722296, 0.3021370034462052]


    But, while we're at it, we can use str.translate to do the case conversion too:

    So:

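    from string import maketrans, ascii_uppercase, ascii_lowercase  # Python 2

    def normalize2(text, unwanted = "()",
                   table = maketrans(ascii_uppercase, ascii_lowercase)):
        # translate returns a new string, so the result must be reassigned
        text = text.translate(table, unwanted)
        return set(text.split())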

    >>> t = timeit.Timer("normalize2('(UPPER CASE) lower case')", "from listmembers import normalize2")
    >>> t.repeat(3,10000)
    [0.24295154831133914, 0.24174497038029585, 0.25234855267899547]


    ....which is a little faster still
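
For readers on Python 3, where `string.maketrans` no longer exists, a roughly equivalent version can be sketched with `str.maketrans`. The name `normalize3` is hypothetical, and like the table above it only lower-cases ASCII letters.

```python
from string import ascii_lowercase, ascii_uppercase

# One table both lower-cases ASCII letters and deletes the parentheses.
TABLE = str.maketrans(ascii_uppercase, ascii_lowercase, "()")

def normalize3(text, table=TABLE):
    """Return the set of lower-cased words in text, with parens removed."""
    return set(text.translate(table).split())

print(sorted(normalize3("(UPPER CASE) lower case")))  # ['case', 'lower', 'upper']
```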

    Thanks for the comments: they were interesting for me - hope some of this is
    useful to OP

    Regards

    Michael
    Michael Spencer, Dec 17, 2004
    #7
