re.findall(a pattern,'function(dsf sdf sdf)')

Discussion in 'Python' started by gcmartijn, Jul 26, 2008.

  1. gcmartijn

    gcmartijn Guest

    Hi!

    First I have some random string below.

    bla = """ <script type="text/javascript">
    // <![CDATA[

    var bla = new Blaobject("argh 1a", "argh 2a", "24", 24, 345)

    function la( tec )
    {
    etc etc
    }

    function other thing( ){

    var two = new BlaObject("argh 1b", "argh 2b", ""+(sv), ""+(2f),
    "4");

    bla die bla
    }

    // ]]>
    </script> """


    Now I'm trying to get each BlaObject call together with its first
    (variable) function argument.

    This is what I tried, and it isn't working:
    for a in re.findall(r'([BlaObject ])(.*)([)] *)',bla):
        print(a)

    The output must be something like:
    # ('BlaObject','argh 1a')
    # ('BlaObject','argh 1b')
    or
    # Blaobject("argh 1a", "argh 2a", "24", 24, 345)
    # BlaObject("argh 1b", "argh 2b", ""+(sv), ""+(2f), "4");


    My simple idea was:
    a. the start position is BlaObject
    b. the stop position is the character ) (not ); because it's a
    JavaScript function call)
    c. the output is [a (everything between) b]

    Who knows the answer?

    Thanks very much,
    GCMartijn
     
    gcmartijn, Jul 26, 2008
    #1

  2. Lie

    Lie Guest

    First of all, since you're dealing with JavaScript, which is
    case-sensitive, Blaobject and BlaObject mean different things; if the
    dummy code were real, it would have raised a name-not-found error.

    Of course that doesn't work: you've put BlaObject in square brackets
    (character class notation), which means the re module searches for
    _a single character_ that exists inside the square brackets. Then you
    use '.*', a greedy match-all, which is something you generally don't
    want. Next is '[)] *': a character class containing only a single
    character is the same as the character itself, and the zero-or-more
    repetition (*) applies to the space after the character class, not to
    the character class itself.
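
To make the points above concrete, here is a minimal sketch in Python 3. The shortened `js` sample, the variable names, and the final pattern are illustrative assumptions, not something posted in the thread; the use of `re.IGNORECASE` assumes that both spellings (Blaobject and BlaObject) should be caught.

```python
import re

# Shortened sample modelled on the question (an assumption for illustration).
js = '''var bla = new Blaobject("argh 1a", "argh 2a", "24", 24, 345)
var two = new BlaObject("argh 1b", "argh 2b", ""+(sv), ""+(2f),
"4");'''

# A character class matches ONE character from the set, not the whole word.
single_chars = re.findall(r'[BlaObject ]', 'BlaObject')

# A one-character class such as [)] behaves like the escaped character \).
same = re.findall(r'[)] *', 'f()  x') == re.findall(r'\) *', 'f()  x')

# Greedy .* runs to the LAST closing parenthesis; lazy .*? stops at the first.
greedy = re.findall(r'\(.*\)', 'f(a) g(b)')
lazy = re.findall(r'\(.*?\)', 'f(a) g(b)')

# One possible pattern: the literal name, an opening parenthesis, then the
# first double-quoted argument. Two groups make findall return tuples.
pattern = re.compile(r'(BlaObject)\(\s*"([^"]*)"', re.IGNORECASE)
matches = pattern.findall(js)

for name, first_arg in matches:
    print(name, first_arg)
```

This only handles a first argument written as a plain double-quoted string; an argument that is an expression or contains an escaped quote would need a different approach.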

    In short, the regular expression you used doesn't seem to be an effort
    to solve the problem. In other words, you haven't read the regular
    expression docs: http://docs.python.org/lib/module-re.html . In other
    words, it's useless to talk with you until then.

    (snip)
     
    Lie, Jul 26, 2008
    #2

  3. gcmartijn

    gcmartijn Guest

    > In short, the regular expression you used doesn't seem to be an effort

    It's a combination:
    - I don't understand English very well (yet).
    - For me it's hard to learn re. I will search Google again for
    examples and do some copy-and-paste.
     
    gcmartijn, Jul 26, 2008
    #3
  4. Fredrik Lundh

    This might be useful when figuring out how RE:s work:

    http://kodos.sourceforge.net/

    also, don't forget the following guideline:

    "Some people, when confronted with a problem, think 'I know,
    I'll use regular expressions.' Now they have two problems."

    some advice:

    - Keep the RE:s simple. You can often simplify things a lot by doing
    multiple searches, or even by applying a second RE on the results from
    the first. In this case, you could use one RE to search for BlaObject,
    and then use another one to extract the first argument.

    - Ordinary string methods (e.g. find, partition, split) are often a very
    capable alternative (in combination with simple RE:s). In your case,
    for JavaScript code that's as regular as the one in your example, you
    can split the string on "BlaObject(" and then use partition to strip off
    the first argument.

    - Only use RE:s to read specialized file formats if you know exactly
    what you're doing; there's often a ready-made library that does it much
    better.

    - The regular expression machinery is not a parser. You cannot handle
    all possible syntaxes with it, so don't even try.
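
The first two pieces of advice above could be sketched roughly as follows (Python 3). The function names and the `js` sample are hypothetical, and the sample assumes the name is spelled BlaObject consistently, since both approaches are case-sensitive as written.

```python
import re

# Sample modelled on the question, with consistent spelling (an assumption).
js = '''
var bla = new BlaObject("argh 1a", "argh 2a", "24", 24, 345)
var two = new BlaObject("argh 1b", "argh 2b", ""+(sv), ""+(2f),
"4");
'''

CALL = re.compile(r'BlaObject\(')           # first RE: locate each call
FIRST_ARG = re.compile(r'\s*"([^"]*)"')     # second RE: first quoted argument


def first_args_two_res(source):
    """Use one simple RE to find the calls and another to read the argument."""
    results = []
    for call in CALL.finditer(source):
        # Apply the second RE starting right after the opening parenthesis.
        arg = FIRST_ARG.match(source, call.end())
        if arg:
            results.append(arg.group(1))
    return results


def first_args_split(source):
    """Use plain string methods: split on the call, partition on the comma."""
    results = []
    # The chunk before the first "BlaObject(" is not a call, so skip it.
    for chunk in source.split('BlaObject(')[1:]:
        first, _, _ = chunk.partition(',')
        results.append(first.strip().strip('"'))
    return results


print(first_args_two_res(js))
print(first_args_split(js))
```

The string-method version relies on the code being as regular as the sample: a comma inside the first argument would cut it short, which is the kind of case where a real parser is the safer choice.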

    </F>
     
    Fredrik Lundh, Jul 26, 2008
    #4
  5. gcmartijn

    gcmartijn Guest

    Thanks for the info. I will download that program later so I can build
    a re (I hope).

    Because I can't wait for that re, I have made a non-re solution that
    works for now:

    for a in bla.split():
        if 'BlaObject' in a:
            # skip 'BlaObject("' (11 characters), then strip quotes and commas
            print(a[11:].replace("'", "").replace('"', "").replace(",", ""))

    (I know this is not the best way, but it helps me for now.)
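
A small check of what that loop yields, as a hedged sketch in Python 3. The one-line `js` sample and both function names are assumptions for illustration. Because the string is split on whitespace, an argument that contains a space, such as "argh 1b", is cut at the space; the second function is one possible variant that keeps it whole.

```python
# One-line sample modelled on the question (an assumption for illustration).
js = 'var two = new BlaObject("argh 1b", "argh 2b", "4");'


def whitespace_workaround(source):
    """The loop from the post, collecting results instead of printing them."""
    found = []
    for token in source.split():
        if 'BlaObject' in token:
            cleaned = token[11:].replace("'", "").replace('"', "").replace(",", "")
            found.append(cleaned)
    return found


def whole_first_arg(source):
    """Variant: split on the call plus opening quote, stop at the closing quote."""
    return [chunk.partition('"')[0] for chunk in source.split('BlaObject("')[1:]]


print(whitespace_workaround(js))
print(whole_first_arg(js))
```

The variant still assumes the first argument is always a double-quoted string that directly follows the opening parenthesis.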
     
    gcmartijn, Jul 26, 2008
    #5
