Pattern Matching Given # of Characters and no String Input; use RegularExpressions?

Discussion in 'Python' started by Synonymous, Apr 17, 2005.

  1. Synonymous

    Synonymous Guest


    Can regular expressions compare file names to one another? It seems RE
    can only compare with input I give it, while I want it to compare
    the names amongst themselves and give me matches if the first x
    characters are the same.

    For example:


    Would result in the 'ddd' and the 'ccc' being grouped together if I
    specified it to look for a match of the first 3 characters.

    What I am trying to do is build a script that will automatically
    create directories based on duplicates like this, starting with say 10
    characters and going down to 1. This way "Vacation1.jpg,
    Vacation2.jpg" would be sent to its own directory (if I specify the
    first 8 characters being similar) and "Cat1.jpg, Cat2.jpg" would
    (with 3) as well.

    Thanks for your help and interest!

    S M
    Synonymous, Apr 17, 2005

  2. Synonymous

    tiissa Guest

    Do you have to use regular expressions?

    If you know the number of characters to match can't you just compare slices?

    In [1]: f1,f2='cccat','cccap'

    In [2]: f1[:3]
    Out[2]: 'ccc'

    In [3]: f1[:3]==f2[:3]
    Out[3]: True

    It seems to me you just have to compare each file to the next one (after
    having sorted your list).
    tiissa, Apr 17, 2005
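
The "sort, then compare each file to the next one" idea above could be sketched roughly like this. It is only an illustration written for Python 3; the function name `group_adjacent` and the sample names are assumptions made up for the example, not something from the thread.

```python
def group_adjacent(names, n):
    """Group names whose first n characters match, walking the sorted list once."""
    groups = []
    for name in sorted(names):
        # After sorting, names with the same prefix sit next to each other,
        # so comparing against the current group's first member is enough.
        if groups and groups[-1][0][:n] == name[:n]:
            groups[-1].append(name)
        else:
            groups.append([name])
    return groups


print(group_adjacent(['cccat', 'dddfa', 'cccap', 'dddfg'], 3))
# [['cccap', 'cccat'], ['dddfa', 'dddfg']]
```

No regular expressions are involved; the slice `name[:n]` does all the matching.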

  3. Synonymous

    tiissa Guest

    If you don't, you can still do it by hand:

    In [7]: def cmp(s1,s2):
    ....:     diff_map=[chr(s1[i]!=s2[i]) for i in range(min(len(s1), len(s2)))]
    ....:     diff_index=''.join(diff_map).find(chr(True))
    ....:     if -1==diff_index:
    ....:         return min(len(s1), len(s2))
    ....:     else:
    ....:         return diff_index

    In [8]: cmp('cccat','cccap')
    Out[8]: 4

    In [9]: cmp('ccc','cccap')
    Out[9]: 3

    In [10]: cmp('cccat','dddfa')
    Out[10]: 0
    tiissa, Apr 17, 2005
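
For comparison, the length of the shared prefix can probably also be obtained from the standard library. The sketch below assumes Python 3 and uses `os.path.commonprefix`, which compares strings character by character; the name `common_prefix_len` is hypothetical and chosen so it does not shadow Python 2's built-in `cmp`.

```python
import os.path


def common_prefix_len(s1, s2):
    """Return how many leading characters s1 and s2 have in common."""
    # commonprefix works on plain strings, character by character,
    # despite living in os.path.
    return len(os.path.commonprefix([s1, s2]))


print(common_prefix_len('cccat', 'cccap'))  # 4
print(common_prefix_len('ccc', 'cccap'))    # 3
print(common_prefix_len('cccat', 'dddfa'))  # 0
```
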
  4. Synonymous

    Kent Johnson Guest

    itertools.groupby() can do the comparing and grouping:
    ...     lst.sort()
    ...     def key(item):
    ...         return item[:n]
    ...     return [ list(items) for k, items in itertools.groupby(lst, key=key) ]
    [['cccat', 'cccap', 'cccan', 'cccbt', 'ccddd'], ['dddfa', 'dddfg', 'dddfz']]

    Kent Johnson, Apr 17, 2005
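
A self-contained version of the `itertools.groupby()` approach might look like the following. This is a sketch for Python 3, not the code as originally posted: the function name `group_by_prefix` is an assumption, and the sample list simply reuses the names visible in the output above.

```python
import itertools


def group_by_prefix(names, n):
    """Return lists of names that share the same first n characters."""
    # groupby only merges adjacent items, so the input has to be sorted first.
    ordered = sorted(names)
    return [list(items)
            for _, items in itertools.groupby(ordered, key=lambda s: s[:n])]


names = ['cccat', 'cccap', 'cccan', 'cccbt', 'ccddd', 'dddfa', 'dddfg', 'dddfz']
print(group_by_prefix(names, 3))
# [['cccan', 'cccap', 'cccat', 'cccbt'], ['ccddd'], ['dddfa', 'dddfg', 'dddfz']]
```

Using `sorted()` instead of `lst.sort()` leaves the caller's list untouched, which is a design choice of this sketch rather than a requirement.
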
  5. Synonymous

    Synonymous Guest

    I will look at that, although if I have 300 images I don't want to type
    all the comparisons (In [9]: cmp('ccc','cccap')) by hand; it would
    just be easier to sort them then :).

    I got it somewhat close to working in Visual Basic:

    If Left$(Cells(iRow, 1).Value, Count) = Left$(Cells(iRow - 1, 1).Value, Count) Then

    What it says is: when walking down a list, take the leftmost 'Count'
    characters of the cell and compare them to the leftmost 'Count'
    characters of the cell in the row above, then do the task (i.e.
    make a directory, move the files) if they are equal.

    I will look for a Left$(str) function for Python that looks at the
    first X characters :)).

    Thank you for your help!

    Synonymous, Apr 18, 2005
  6. Synonymous

    John Machin Guest

    Wild goose chase alert! AFAIK there isn't one. Python uses slice
    notation instead of left/mid/right/substr/whatever functions. I do
    suggest that instead of looking for such a beastie, you read this
    section of the Python Tutorial: 3.1.2 Strings.

    Then, if you think that that was a good use of your time, you might
    like to read the *whole* tutorial :))


    John Machin, Apr 18, 2005
  7. Synonymous

    Dennis Lee Bieber Guest

    BASIC's
    Left$(str, x)

    is essentially Python's
    str[:x]

    and a comparison of two would be
    somestring[:X] == anotherstring[:X]

    Dennis Lee Bieber, Apr 18, 2005
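
If a named function reads more comfortably than a bare slice, a thin wrapper is enough. The helper name `left` below is purely illustrative (Python has no such built-in), and the file names are the ones from the original question.

```python
def left(s, n):
    """Rough counterpart of BASIC's Left$(s, n): the first n characters of s."""
    return s[:n]


print(left('Vacation1.jpg', 8))                    # Vacation
print(left('Cat', 8))                              # Cat (slicing past the end is not an error)
print(left('Cat1.jpg', 3) == left('Cat2.jpg', 3))  # True
```
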
  8. Synonymous

    tiissa Guest

    I didn't mean you had to type it by hand. I thought about writing a
    small script (as opposed to using something from the standard tools).
    It might look like:

    In [22]: def make_group(L):
    ....:     root,res='',[]
    ....:     for i in range(1,len(L)):
    ....:         if ''==root:
    ....:             root=L[i][:cmp(L[i-1],L[i])]
    ....:             if ''==root:
    ....:                 res.append((L[i-1],[L[i-1]]))
    ....:             else:
    ....:                 res.append((root,[L[i-1],L[i]]))
    ....:         elif len(root)==cmp(root,L[i]):
    ....:             res[-1][1].append(L[i])
    ....:         else:
    ....:             root=''
    ....:     if ''==root:
    ....:         res.append((L[-1],[L[-1]]))
    ....:     return res

    In [23]: L=['cccat','cccap','cccan','dddfa','dddfg','dddfz']

    In [24]: L.sort()

    In [25]: make_group(L)
    Out[25]: [('ccca', ['cccan', 'cccap', 'cccat']), ('dddf', ['dddfa', 'dddfg', 'dddfz'])]

    However I guarantee no optimality in the number of classes (but, hey,
    that's when you don't specify the size of the prefix).
    (Actually, I guarantee nothing at all ;p)
    But in particular, you can have some file singled out:

    In [26]: make_group(['cccan','cccap','cccat','cccb'])
    Out[26]: [('ccca', ['cccan', 'cccap', 'cccat']), ('cccb', ['cccb'])]

    It is a matter of choice: either you want to specify by hand the size of
    the prefix, and then you'd rather look at itertools as pointed out by
    Kent, or you don't, and a variation on the above code might do the job.
    tiissa, Apr 18, 2005
  9. Synonymous

    Synonymous Guest

    Haha, it always comes down to RTFM I guess, which is always the best
    advice :o).

    Thank you for your help. Now that I think about it, I guess strings are
    exactly what I am looking for, because even though I am using file
    names I am treating them like strings when comparing them.

    Byebye :o)

    S M
    Synonymous, Apr 21, 2005
  10. Synonymous

    Synonymous Guest

    Thank you, that is very kool. I finally found out how to copy files
    with shutil too, so I'm getting close to doing something. Going to be
    working on an old computer; playing with files = dangerous lol.

    Thanks for your help and taking the time to post!

    Bye :o)

    S M
    Synonymous, Apr 21, 2005
  11. Synonymous

    Synonymous Guest


    I was trying to create a program to search for the largest common
    substring among filenames in a directory, then move the files into a
    directory named after that substring. I have succeeded, with help, in
    doing so and here is the code.

    Thanks for your help!

    --- Code ---

    # This program was created with feedback from: smeghead and sirup plus
    # aum of I2P; and also tiissa and John Machin of comp.lang.python
    # Thank you very much.
    # I still get the odd error in this, but it was 1 out of 2500 files
    # successfully sorted. Make sure you have a directory under c:/test/
    # called 'aa' and have your files in c:/test/
    # I release this code into the public domain :o), send feedback to

    import pickle
    import os
    import shutil
    os.chdir ( '/test')
    while y <> 2:
        print y
        List = []
        for fileName in os.listdir ( '/test/' ):
            Directory = fileName
        ListLength = len(List) - 1
        x = 0
        while x < ListLength:
            ListLength = len(List) - 1
            b = List[x]
            c = List[x + 1]
            backward1 = List[x - 1]
            d = b[:y]
            e = c[:y]
            backward2 = backward1[:y]
            f = str(d)
            g = str(e)
            backward3 = str(backward2)
            if f==g:
                if os.path.isdir (aa+"/"+f) == True:
            if f==backward3:
                if os.path.isdir (aa+"/"+f) == True:
            x = x + 1
        y = y - 1

    --- End Code ---
    Synonymous, Apr 22, 2005
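
Several lines of the script above did not survive in this copy of the thread, so it cannot be run as shown. For readers who want something to experiment with, here is a rough Python 3 sketch of the same idea (longest prefix first, then shorter ones) built on the slicing and `itertools.groupby()` suggestions from earlier in the thread. It is not the poster's program: the function name `sort_into_prefix_dirs` and the parameters `src_dir`, `dest_dir`, `longest` and `shortest` are assumptions made for this example.

```python
import itertools
import os
import shutil


def sort_into_prefix_dirs(src_dir, dest_dir, longest=10, shortest=3):
    """Move files from src_dir into dest_dir/<prefix>/, trying long prefixes first."""
    for n in range(longest, shortest - 1, -1):
        # Re-list on every pass: files moved at a longer prefix are gone by now.
        # Only regular files are considered, so dest_dir may live inside src_dir.
        names = sorted(
            name for name in os.listdir(src_dir)
            if os.path.isfile(os.path.join(src_dir, name))
        )
        for prefix, items in itertools.groupby(names, key=lambda s: s[:n]):
            group = list(items)
            if len(group) < 2:
                continue  # nothing shares this prefix; a shorter one may still match
            target = os.path.join(dest_dir, prefix)
            os.makedirs(target, exist_ok=True)
            for name in group:
                shutil.move(os.path.join(src_dir, name),
                            os.path.join(target, name))
```

With the layout described in the post, a call might look like `sort_into_prefix_dirs('/test', '/test/aa')`. Since `shutil.move` really moves the files, it seems wise to try it on a copy of the directory first.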
