Using dictionary to hold regex patterns?

Discussion in 'Python' started by Gilles Ganault, Nov 23, 2008.

  1. Hello

    After downloading a web page, I need to search for several patterns,
    and if found, extract information and put them into a database.

    To avoid a bunch of "if m", I figured maybe I could use a dictionary
    to hold the patterns, and loop through it:

    ======
    pattern = {}
    pattern["pattern1"] = ">.+?</td>.+?>(.+?)</td>"
    for key,value in pattern.items():
        response = ">whatever</td>.+?>Blababla</td>"

        #AttributeError: 'str' object has no attribute 'search'
        m = key.search(response)
        if m:
            print key + "#" + value
    ======

    Is there a way to use a dictionary this way, or am I stuck with
    copy/pasting blocks of "if m:"?

    Thank you.
     
    Gilles Ganault, Nov 23, 2008
    #1

  2. Gilles Ganault <> writes:

    > Hello
    >
    > After downloading a web page, I need to search for several patterns,
    > and if found, extract information and put them into a database.
    >
    > To avoid a bunch of "if m", I figured maybe I could use a dictionary
    > to hold the patterns, and loop through it:
    >
    > ======
    > pattern = {}
    > pattern["pattern1"] = ">.+?</td>.+?>(.+?)</td>"


    pattern["pattern1"] = re.compile(">.+?</td>.+?>(.+?)</td>")

    > for key,value in pattern.items():
    > response = ">whatever</td>.+?>Blababla</td>"
    >
    > #AttributeError: 'str' object has no attribute 'search'
    > m = key.search(response)


    m = value.search(response)

    > if m:
    > print key + "#" + value
    > ======
    >
    > Is there a way to use a dictionary this way, or am I stuck with
    > copy/pasting blocks of "if m:"?


    But there is no reason why you should use a dictionary; just use a list
    of key-value pairs:

    patterns = [
        ("pattern1", re.compile(">.+?</td>.+?>(.+?)</td>")),
        ("pattern2", re.compile("something else")),
        ...
    ]

    for name, pattern in patterns:
        ...

    --
    Arnaud
     
    Arnaud Delobelle, Nov 23, 2008
    #2
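
A minimal sketch of the list-of-pairs approach suggested above, written for Python 3 (so print is a function, unlike the 2008-era snippets in this thread). Only "pattern1" comes from the thread; the second pattern and the sample response string are made up for illustration.

```python
import re

# Ordered list of (name, compiled regex) pairs. "pattern1" is the
# pattern from the thread; "pattern2" is a hypothetical second entry.
patterns = [
    ("pattern1", re.compile(r">.+?</td>.+?>(.+?)</td>")),
    ("pattern2", re.compile(r"<title>(.+?)</title>")),
]

# Hypothetical sample input standing in for the downloaded page.
response = "<td>whatever</td><td>Blababla</td>"

# Collect (name, captured text) for every pattern that matches.
found = []
for name, regex in patterns:
    m = regex.search(response)
    if m:
        found.append((name, m.group(1)))

print(found)
```

Each pattern is compiled once, and the loop replaces the repeated "if m:" blocks from the original question.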

  3. Terry Reedy

    Gilles Ganault wrote:
    > Hello
    >
    > After downloading a web page, I need to search for several patterns,
    > and if found, extract information and put them into a database.
    >
    > To avoid a bunch of "if m", I figured maybe I could use a dictionary
    > to hold the patterns, and loop through it:


    Good idea.

    import re

    > pattern = {}
    > pattern["pattern1"] = ">.+?</td>.+?>(.+?)</td>"


    ... = re.compile("...")

    > for key,value in pattern.items():


    for name, regex in ...

    > response = ">whatever</td>.+?>Blababla</td>"
    >
    > #AttributeError: 'str' object has no attribute 'search'


    Correct: only compiled re patterns have a search method. Better naming
    would make the error obvious.

    > m = key.search(response)


    m = regex.search(response)

    > if m:
    > print key + "#" + value


    print name + '#' + regex
     
    Terry Reedy, Nov 23, 2008
    #3
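
To illustrate the naming point, here is a small Python 3 sketch of the dictionary version with the two fixes applied. The sample response string is an assumption, not taken from the thread.

```python
import re

raw = ">.+?</td>.+?>(.+?)</td>"

# A plain str has no .search method, hence the AttributeError in the question.
assert not hasattr(raw, "search")

# Store compiled regexes as the dict values and loop with descriptive names.
pattern = {"pattern1": re.compile(raw)}

# Hypothetical sample input.
response = "<td>whatever</td><td>Blababla</td>"

matches = {}
for name, regex in pattern.items():
    m = regex.search(response)  # the value is searched, not the key
    if m:
        matches[name] = m.group(1)

print(matches)
```

With the loop variables called name and regex, calling name.search(...) would look wrong at a glance.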
  4. On Sun, 23 Nov 2008 17:55:48 +0000, Arnaud Delobelle
    <> wrote:
    >But there is no reason why you should use a dictionary; just use a list
    >of key-value pairs:
    >
    >patterns = [
    > ("pattern1", re.compile(">.+?</td>.+?>(.+?)</td>")),


    Thanks for the tip, but... I thought that lists could only use integer
    indexes, while text indexes had to use dictionaries. In which case do
    we need dictionaries, then?
     
    Gilles Ganault, Nov 23, 2008
    #4
  5. John Machin

    On Nov 24, 6:55 am, Gilles Ganault <> wrote:
    > On Sun, 23 Nov 2008 17:55:48 +0000, Arnaud Delobelle
    >
    > <> wrote:
    > >But there is no reason why you should use a dictionary; just use a list
    > >of key-value pairs:

    >
    > >patterns = [
    > >    ("pattern1", re.compile(">.+?</td>.+?>(.+?)</td>")),

    >
    > Thanks for the tip, but... I thought that lists could only use integer
    > indexes, while text indexes had to use dictionaries. In which case do
    > we need dictionaries, then?


    You don't have a requirement for indexing -- neither a text index nor
    an integer index. Your requirement is met by a sequence of (name,
    regex) pairs. Yes, a list is a sequence, and a list has integer
    indexes, but this is irrelevant.

    General tip: Don't use a data structure that is more complicated than
    what you need.
     
    John Machin, Nov 23, 2008
    #5
  6. John Machin

    On Nov 24, 5:36 am, Terry Reedy <> wrote:
    > Gilles Ganault wrote:
    > > Hello

    >
    > > After downloading a web page, I need to search for several patterns,
    > > and if found, extract information and put them into a database.

    >
    > > To avoid a bunch of "if m", I figured maybe I could use a dictionary
    > > to hold the patterns, and loop through it:

    >
    > Good idea.
    >
    > import re
    >
    > > pattern = {}
    > > pattern["pattern1"] = ">.+?</td>.+?>(.+?)</td>"

    >
    > ... = re.compile("...")
    >
    > > for key,value in pattern.items():

    >
    > for name, regex in ...
    >
    > >    response = ">whatever</td>.+?>Blababla</td>"

    >
    > >    #AttributeError: 'str' object has no attribute 'search'

    >
    > Correct, only compiled re patterns have search, better naming would make
    > error obvious.
    >
    > >    m = key.search(response)

    >
    > m = regex.search(response)
    >
    > >    if m:
    > >            print key + "#" + value

    >
    > print name + '#' + regex


    Perhaps you meant:
    print key + "#" + regex.pattern
     
    John Machin, Nov 23, 2008
    #6
  7. John Machin wrote:

    > General tip: Don't use a data structure that is more complicated than
    > what you need.


    Is "[ ( name, regex ), ... ]" really "simpler" than "{ name: regex, ...
    }"? Intuitively, I would consider the dictionary to be the simpler
    structure.

    Greetings,
    Thomas

    --
    Just because many people are wrong doesn't make them right!
    (Coluche)
     
    Thomas Mlynarczyk, Nov 23, 2008
    #7
  8. André

    On Nov 23, 1:40 pm, Gilles Ganault <> wrote:
    > Hello
    >
    > After downloading a web page, I need to search for several patterns,
    > and if found, extract information and put them into a database.
    >
    > To avoid a bunch of "if m", I figured maybe I could use a dictionary
    > to hold the patterns, and loop through it:
    >
    > ======
    > pattern = {}
    > pattern["pattern1"] = ">.+?</td>.+?>(.+?)</td>"
    > for key,value in pattern.items():
    >         response = ">whatever</td>.+?>Blababla</td>"
    >
    >         #AttributeError: 'str' object has no attribute 'search'
    >         m = key.search(response)
    >         if m:
    >                 print key + "#" + value
    > ======
    >
    > Is there a way to use a dictionary this way, or am I stuck with
    > copy/pasting blocks of "if m:"?
    >
    > Thank you.


    Yes it is possible and you don't need to use pattern.items()...

    Here is something I use (straight cut-and-paste):

    def parse_single_line(self, line):
        '''Parses a given line to see if it match a known pattern'''
        for name in self.patterns:
            result = self.patterns[name].match(line)
            if result is not None:
                return name, result.groups()
        return None, line


    where self.patterns is something like
    self.patterns={
        'pattern1': re.compile(...),
        'pattern2': re.compile(...)
    }

    The one potential problem with the method as I wrote it is that
    sometimes a more generic pattern gets matched first whereas a more
    specific pattern may be desired.

    André
     
    André, Nov 23, 2008
    #8
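
For readers who want to try the method above, here is a self-contained Python 3 sketch. The class name LineParser and both example patterns are hypothetical; the post only shows the method and the shape of self.patterns.

```python
import re


class LineParser:
    """Hypothetical wrapper class around the method quoted above."""

    def __init__(self):
        # Example patterns, made up for illustration.
        self.patterns = {
            "assignment": re.compile(r"(\w+)\s*=\s*(\w+)"),
            "comment": re.compile(r"#\s*(.*)"),
        }

    def parse_single_line(self, line):
        """Return (name, groups) for the first matching pattern, else (None, line)."""
        for name in self.patterns:
            result = self.patterns[name].match(line)
            if result is not None:
                return name, result.groups()
        return None, line


parser = LineParser()
print(parser.parse_single_line("x = 42"))
print(parser.parse_single_line("# a note"))
print(parser.parse_single_line("???"))
```

Note that .match only matches at the start of the string, whereas the .search used elsewhere in this thread scans the whole string.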
  9. John Machin

    On Nov 24, 7:48 am, John Machin <> wrote:
    > On Nov 24, 5:36 am, Terry Reedy <> wrote:
    >
    > > print name + '#' + regex

    >
    > Perhaps you meant:
    >    print key + "#" + regex.pattern


    I definitely meant:
    print name + '#' + regex.pattern
     
    John Machin, Nov 23, 2008
    #9
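
A short Python 3 illustration of the correction above: a compiled pattern object cannot be concatenated with a string, but its .pattern attribute holds the original source string.

```python
import re

name = "pattern1"
regex = re.compile(r">.+?</td>.+?>(.+?)</td>")

# name + "#" + regex would raise TypeError, because regex is a compiled
# pattern object; regex.pattern is the string it was compiled from.
line = name + "#" + regex.pattern
print(line)
```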
  10. John Machin

    On Nov 24, 7:49 am, Thomas Mlynarczyk <>
    wrote:
    > John Machin wrote:
    >
    > > General tip: Don't use a data structure that is more complicated than
    > > what you need.

    >
    > Is "[ ( name, regex ), ... ]" really "simpler" than "{ name: regex, ...
    > }"? Intuitively, I would consider the dictionary to be the simpler
    > structure.


    Hi Thomas,

    Rephrasing for clarity: Don't use a data structure that is more
    complicated than that indicated by your requirements.

    Judging which of two structures is "simpler" should not be independent
    of those requirements. I don't see a role for intuition in this
    process.

    Please see my belated response in your "My first Python program -- a
    lexer" thread.

    Cheers,
    John
     
    John Machin, Nov 23, 2008
    #10
  11. On Sun, 23 Nov 2008 17:55:48 +0000, Arnaud Delobelle
    <> wrote:
    >But there is no reason why you should use a dictionary; just use a list
    >of key-value pairs:


    Thanks for the tip. I didn't know it was possible to use arrays to
    hold more than one value. Actually, it's a better solution, as
    key/value tuples in a dictionary aren't used in the order in which
    they're put in the dictionary, while arrays are.

    For those interested:

    ========
    response = ">dummy</td>bla>good stuff</td>"
    for name, pattern in patterns:
        m = pattern.search(response)
        if m:
            print m.group(1)
            break
        else:
            print "here"
    ========

    Thanks guys.
     
    Gilles Ganault, Nov 23, 2008
    #11
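
One way to make the "no pattern matched" case explicit is Python's for/else, whose else branch runs only when the loop finishes without hitting break. This is a Python 3 sketch under assumptions: the function name first_match and the second pattern are made up, and it is not necessarily how the snippet above was originally indented.

```python
import re

# Ordered (name, compiled regex) pairs; the second entry is hypothetical.
patterns = [
    ("pattern1", re.compile(r">.+?</td>.+?>(.+?)</td>")),
    ("pattern2", re.compile(r"<b>(.+?)</b>")),
]


def first_match(response):
    """Return (name, captured text) for the first pattern that matches, else None."""
    for name, regex in patterns:
        m = regex.search(response)
        if m:
            result = (name, m.group(1))
            break
    else:
        # Runs only if the loop ended without hitting break.
        result = None
    return result


print(first_match(">dummy</td>bla>good stuff</td>"))
print(first_match("no markup here"))
```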
  12. MRAB

    Gilles Ganault wrote:
    > On Sun, 23 Nov 2008 17:55:48 +0000, Arnaud Delobelle
    > <> wrote:
    >> But there is no reason why you should use a dictionary; just use a list
    >> of key-value pairs:

    >
    > Thanks for the tip. I didn't know it was possible to use arrays to
    > hold more than one value. Actually, it's a better solution, as
    > key/value tuples in a dictionary aren't used in the order in which
    > they're put in the dictionary, while arrays are.
    >

    [snip]
    A list is an ordered collection of items. Each item can be anything: a
    string, an integer, a dictionary, a tuple, a list...
     
    MRAB, Nov 23, 2008
    #12
  13. On Sun, 23 Nov 2008 23:18:06 +0000, MRAB <>
    wrote:
    >A list is an ordered collection of items. Each item can be anything: a
    >string, an integer, a dictionary, a tuple, a list...


    Yup, learned something new today. Naively, I thought a list was
    index=value, where value=a single piece of data. Works like a charm.
    Thanks.
     
    Gilles Ganault, Nov 23, 2008
    #13
  14. On Mon, 24 Nov 2008 00:46:42 +0100, Gilles Ganault wrote:

    > On Sun, 23 Nov 2008 23:18:06 +0000, MRAB <>
    > wrote:
    >>A list is an ordered collection of items. Each item can be anything: a
    >>string, an integer, a dictionary, a tuple, a list...

    >
    > Yup, learned something new today. Naively, I thought a list was
    > index=value, where value=a single piece of data.


    Your thought was correct, each value is a single piece of data: *one*
    tuple.

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Nov 24, 2008
    #14
  15. Dennis Lee Bieber wrote:

    >> Is "[ ( name, regex ), ... ]" really "simpler" than "{ name: regex, ...
    >> }"? Intuitively, I would consider the dictionary to be the simpler
    >> structure.


    > Why, when you aren't /using/ the name to retrieve the expression...


    So as soon as I start retrieving a regex by its name, the dict will be
    the most suitable structure?

    Greetings,
    Thomas

    --
    Just because many people are wrong doesn't make them right!
    (Coluche)
     
    Thomas Mlynarczyk, Nov 24, 2008
    #15
  16. John Machin wrote:

    > Rephrasing for clarity: Don't use a data structure that is more
    > complicated than that indicated by your requirements.


    Could you please define "complicated" in this context? In terms of
    characters to type and reading, the dict is surely simpler. But I
    suppose that under the hood, it is "less work" for Python to deal with a
    list of tuples than a dict?

    > Judging which of two structures is "simpler" should not be independent
    > of those requirements. I don't see a role for intuition in this
    > process.


    Maybe I should have said "upon first sight" / "judging from the outer
    appearance" instead of "intuition".

    > Please see my belated response in your "My first Python program -- a
    > lexer" thread.


    (See my answer there.) I think I should definitely read up a bit on the
    implementation details of those data structures in Python. (As it was
    suggested earlier in my lexer thread.)

    Greetings,
    Thomas

    --
    Just because many people are wrong doesn't make them right!
    (Coluche)
     
    Thomas Mlynarczyk, Nov 24, 2008
    #16
  17. John Machin

    On Nov 25, 4:38 am, Thomas Mlynarczyk <>
    wrote:
    > John Machin wrote:
    >
    > > Rephrasing for clarity: Don't use a data structure that is more
    > > complicated than that indicated by your requirements.

    >
    > Could you please define "complicated" in this context? In terms of
    > characters to type and reading, the dict is surely simpler. But I
    > suppose that under the hood, it is "less work" for Python to deal with a
    > list of tuples than a dict?


    The two extra parentheses per item are a trivial cosmetic factor, and
    only when the data is hard-coded: they don't exist if the data is read
    from a file, so they have nothing to do with "complicated". The amount
    of work done by Python under the hood is relevant only to a
    speed/memory requirement. No, "complicated" is more related to unused
    features. In
    the case of using an aeroplane to transport 3 passengers 10 km along
    the autobahn, you aren't using the radar, wheel-retractability, wings,
    pressurised cabin, etc. In your original notion of using a dict in
    your lexer, you weren't using the mapping functionality of a dict at
    all. In both cases you have perplexed bystanders asking "Why use a
    plane/dict when a car/list will do the job?".

    >
    > > Judging which of two structures is "simpler" should not be independent
    > > of those requirements. I don't see a role for intuition in this
    > > process.

    >
    > Maybe I should have said "upon first sight" / "judging from the outer
    > appearance" instead of "intuition".


    I don't see a role for "upon first sight" or "judging from the outer
    appearance" either.
     
    John Machin, Nov 24, 2008
    #17
  18. Steve Holden

    John Machin wrote:
    > On Nov 25, 4:38 am, Thomas Mlynarczyk <>

    [...]
    >>> Judging which of two structures is "simpler" should not be independent
    >>> of those requirements. I don't see a role for intuition in this
    >>> process.

    >> Maybe I should have said "upon first sight" / "judging from the outer
    >> appearance" instead of "intuition".

    >
    > I don't see a role for "upon first sight" or "judging from the outer
    > appearance" either.
    >

    They are all potentially (inadequate) substitutes for the knowledge and
    experience you bring to the problem.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
     
    Steve Holden, Nov 24, 2008
    #18
  19. John Machin wrote:

    > No, "complicated" is more related to unused features. In
    > the case of using an aeroplane to transport 3 passengers 10 km along
    > the autobahn, you aren't using the radar, wheel-retractability, wings,
    > pressurised cabin, etc. In your original notion of using a dict in
    > your lexer, you weren't using the mapping functionality of a dict at
    > all. In both cases you have perplexed bystanders asking "Why use a
    > plane/dict when a car/list will do the job?".


    Now the matter is getting clearer in my head.

    Thanks and greetings,
    Thomas

    --
    Just because many people are wrong doesn't make them right!
    (Coluche)
     
    Thomas Mlynarczyk, Nov 25, 2008
    #19
  20. André wrote:
    (snip)
    > you don't need to use pattern.items()...
    >
    > Here is something I use (straight cut-and-paste):
    >
    > def parse_single_line(self, line):
    > '''Parses a given line to see if it match a known pattern'''
    > for name in self.patterns:
    > result = self.patterns[name].match(line)


    FWIW, this is more expensive than iterating over (key, value) tuples
    using dict.items(), since you have one extra call to dict.__getitem__
    per entry.

    > if result is not None:
    > return name, result.groups()
    > return None, line
    >
    >
    > where self.patterns is something like
    > self.patterns={
    > 'pattern1': re.compile(...),
    > 'pattern2': re.compile(...)
    > }
    >
    > The one potential problem with the method as I wrote it is that
    > sometimes a more generic pattern gets matched first whereas a more
    > specific pattern may be desired.


    As usual when order matters, the solution is to use a list of (name,
    whatever) tuples instead of a dict. You can still build a dict from this
    list when needed (the dict initializer accepts a list of (name, object)
    as argument).
     
    Bruno Desthuilliers, Nov 26, 2008
    #20
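
A brief Python 3 sketch of the last point: keep an ordered list of (name, regex) pairs so the more specific pattern is tried first, and build a dict from the same list only when lookup by name is needed. Both patterns and the function name classify are made up for illustration.

```python
import re

# Order matters here: the more specific pattern is listed first.
# Both patterns are hypothetical examples.
pairs = [
    ("float", re.compile(r"\d+\.\d+")),
    ("integer", re.compile(r"\d+")),
]


def classify(text):
    """Try the patterns in list order and return the first matching name."""
    for name, regex in pairs:
        if regex.match(text):
            return name
    return None


# When lookup by name is needed, build a dict from the same list.
by_name = dict(pairs)

print(classify("3.14"), classify("42"))
print(by_name["integer"].pattern)
```

If "integer" were listed first, "3.14" would be classified as an integer, which is the ordering problem described above. Since Python 3.7, dicts also preserve insertion order, but that was not the case for the Python versions current when this thread was written.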