Regex Black Magic... how to stop matching if char?

Discussion in 'Ruby' started by Jon, Mar 30, 2007.

  1. Jon

    Jon Guest

    I'm trying to translate a strange derivative of xml into valid xml. Here
    is an example line:

    <SUBEVENTSTATUS
    1:2><OPERATIONNAME></OPERATIONNAME>gofast<OPERATIONSTATUS>stopped</OPERATIONSTATUS><TARGETOBJECTNAME>name</TARGETOBJECTNAME><TARGETOBJECTVALUE>val</TARGETOBJECTVALUE></SUBEVENTSTATUS
    1:1><SUBEVENTSTATUS 2:2><......and on

    REXML pukes on the <SUBEVENTSTATUS 1:2> tag... which it should. There
    should be some kind of attribute declaration instead. I want to
    translate it to something like this: <SUBEVENTSTATUS no="1" of="2">

    I'm trying to make a regex to detect the funny tags. Here is what I have
    so far:

    xml_fix=/<(\S+)\s+(\d+):(\d+)>/

    This is great, but it will match this:

    <Request><code_set_list 1:2>

    instead of just this:

    <code_set_list 1:2>

    ...because there is no guaranteed whitespace between tags. Basically, I
    need to stop matching if a ">" is found. I've never had to deal with
    anything quite like this in my regex experience. Any help or thoughts of
    a better way to do things is much appreciated!
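
    The over-match described above can be reproduced with a short Ruby sketch. The variable names `line`, `whole_match` and `tag_name` are only illustrative; the pattern is the one from the post. Since `\S` matches any non-whitespace character, including ">" and "<", the first group runs straight through the preceding tag.

```ruby
# Pattern from the post: \S+ also accepts ">" and "<",
# so the tag-name group can swallow a preceding tag.
xml_fix = /<(\S+)\s+(\d+):(\d+)>/

line = '<Request><code_set_list 1:2>'
m = xml_fix.match(line)

whole_match = m[0]  # full text matched by the pattern
tag_name    = m[1]  # what the first group captured

puts whole_match
puts tag_name
```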

    --
    Posted via http://www.ruby-forum.com/.
     
    Jon, Mar 30, 2007
    #1

  2. On 30.03.2007 17:34, Jon wrote:
    > I'm trying to translate a strange derivative of xml into valid xml. Here
    > is an example line:
    >
    > <SUBEVENTSTATUS
    > 1:2><OPERATIONNAME></OPERATIONNAME>gofast<OPERATIONSTATUS>stopped</OPERATIONSTATUS><TARGETOBJECTNAME>name</TARGETOBJECTNAME><TARGETOBJECTVALUE>val</TARGETOBJECTVALUE></SUBEVENTSTATUS
    > 1:1><SUBEVENTSTATUS 2:2><......and on
    >
    > REXML pukes on the <SUBEVENTSTATUS 1:2> tag... which it should. There
    > should be some kind of attribute declaration instead. I want to
    > translate it to something like this: <SUBEVENTSTATUS no="1" of="2">
    >
    > I'm trying to make a regex to detect the funny tags. Here is what I have
    > so far:
    >
    > xml_fix=/<(\S+)\s+(\d+):(\d+)>/
    >
    > This is great, but it will match this:
    >
    > <Request><code_set_list 1:2>
    >
    > instead of just this:
    >
    > <code_set_list 1:2>
    >
    > ..because there is no gauranteed whitespace between tags. Basically, I
    > need to stop matching if a ">" is found. I've never had to deal with
    > anything quite like this in my regex experience. Any help or thoughts of
    > a better way to do things is much appreciated!


    I can think of several solutions:

    /<([^>\s]+)\s+(\d+):(\d+)>/

    Or even a two phased approach

    /<[^>]+>/

    and then with the match
    /(\d+):(\d+)>\z/
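
    A minimal sketch of both suggestions, assuming the goal is the `no="…" of="…"` form requested in the first post. The sample `line` and the variable names are made up for illustration; only the three patterns come from this reply.

```ruby
line = '<Request><code_set_list 1:2><SUBEVENTSTATUS 2:2>'

# Approach 1: a single pattern. [^>\s]+ cannot cross ">" or whitespace,
# so the match can no longer start inside an earlier tag.
single = /<([^>\s]+)\s+(\d+):(\d+)>/
one_pass = line.gsub(single, '<\1 no="\2" of="\3">')

# Approach 2: find every tag first, then inspect each tag on its own.
two_pass = line.gsub(/<[^>]+>/) do |tag|
  m = tag.match(/(\d+):(\d+)>\z/)
  # pre_match is the part of the tag before the "1:2>" suffix
  m ? "#{m.pre_match}no=\"#{m[1]}\" of=\"#{m[2]}\">" : tag
end

puts one_pass
puts two_pass
```

    Both passes should give the same result on this input; the two-phase version is longer but keeps each pattern simple.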

    HTH

    robert
     
    Robert Klemme, Mar 30, 2007
    #2

  3. Jon

    F. Senault Guest

    On 30 March at 17:34, Jon wrote:

    > ..because there is no gauranteed whitespace between tags. Basically, I
    > need to stop matching if a ">" is found. I've never had to deal with
    > anything quite like this in my regex experience. Any help or thoughts of
    > a better way to do things is much appreciated!


    I'd simply use /<[^>]+\s+(\d+):(\d+)>/ (untested, but you get my
    drift)...
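
    A quick check of that pattern, as a sketch. Note that it has no group around the tag name, so the `named` variant below (an addition for illustration, not part of this reply) wraps the name in a lazy group to make it available to the replacement string.

```ruby
line = '<Request><code_set_list 1:2>'

# Pattern as posted: [^>]+ cannot cross ">", so the match
# has to start at the "<" of the tag that carries the counter.
fred = /<[^>]+\s+(\d+):(\d+)>/
m = fred.match(line)
matched  = m[0]
counters = [m[1], m[2]]

# Illustrative variant with the tag name captured as group 1.
named = /<([^>]+?)\s+(\d+):(\d+)>/
fixed = line.sub(named, '<\1 no="\2" of="\3">')

puts matched
puts fixed
```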

    Fred
    --
    > Microsoft sucks, sucks, sucks.

    Which wouldn't be such a bad thing, if it were cuter, didn't use its
    teeth at inopportune moments, didn't hog the bed, cooked well, and had
    good taste in films. Sadly, that's not the case. (Dan Birchall, SDM)
     
    F. Senault, Mar 30, 2007
    #3
  4. Jon

    Jon Fi Guest

    Robert Klemme wrote:
    > On 30.03.2007 17:34, Jon wrote:
    >>
    >>
    >> <code_set_list 1:2>
    >>
    >> ..because there is no gauranteed whitespace between tags. Basically, I
    >> need to stop matching if a ">" is found. I've never had to deal with
    >> anything quite like this in my regex experience. Any help or thoughts of
    >> a better way to do things is much appreciated!

    >
    > I can think of several solutions:
    >
    > /<([^>\s]+)\s+(\d+):(\d+)>/
    >
    > Or even a two phased approach
    >
    > /<[^>]+>/
    >
    > and then with the match
    > /(\d+):(\d+)>\z/
    >
    > HTH
    >
    > robert



    awesome, and thank you! but for my benefit, could you explain why that
    works? I thought ^ was line start?

    --
    Posted via http://www.ruby-forum.com/.
     
    Jon Fi, Mar 30, 2007
    #4
  5. On Mar 30, 2007, at 11:43 AM, Jon Fi wrote:

    > Robert Klemme wrote:
    >> On 30.03.2007 17:34, Jon wrote:
    >>>
    >>>
    >>> <code_set_list 1:2>
    >>>
    >>> ..because there is no gauranteed whitespace between tags.
    >>> Basically, I
    >>> need to stop matching if a ">" is found. I've never had to deal with
    >>> anything quite like this in my regex experience. Any help or
    >>> thoughts of
    >>> a better way to do things is much appreciated!

    >>
    >> I can think of several solutions:
    >>
    >> /<([^>\s]+)\s+(\d+):(\d+)>/
    >>
    >> Or even a two phased approach
    >>
    >> /<[^>]+>/
    >>
    >> and then with the match
    >> /(\d+):(\d+)>\z/
    >>
    >> HTH
    >>
    >> robert

    >
    >
    > awesome, and thank you! but for my benefit, could you explain why that
    > works? I thought ^ was line start?


    Within a character set it inverts the selection so [^>] matches any
    character that's NOT a '>'
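
    The two meanings of `^` can be seen side by side in a small sketch (the sample `text` is made up for illustration):

```ruby
text = "a>b\n>c"

# Outside brackets, ^ anchors the match at the start of a line.
line_starts = text.scan(/^./)     # first character of each line

# Inside brackets, a leading ^ negates the set.
non_gt_runs = text.scan(/[^>]+/)  # runs of anything except ">"

p line_starts
p non_gt_runs
```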

    My solution is: .gsub(/<([^>]*?\b\s+)(\d+):(\d+)>/, '<\1no="\2" of="\3">')
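
    Applied to a shortened version of the sample line from the first post, this might look like the sketch below. One caveat: the sample also has a counter on a closing tag (`</SUBEVENTSTATUS 1:1>`), and the substitution above would turn that into a closing tag with attributes, which is not valid XML. The first `gsub` here simply drops such counters; that is an assumption about what is acceptable, not something stated in the thread.

```ruby
# Shortened version of the sample line from the first post.
raw = '<SUBEVENTSTATUS 1:2><OPERATIONSTATUS>stopped</OPERATIONSTATUS>' \
      '</SUBEVENTSTATUS 1:1>'

# Assumption: a counter on a closing tag can simply be dropped.
no_close_counters = raw.gsub(%r{</([^>\s]+)\s+\d+:\d+>}, '</\1>')

# The substitution from this reply, applied to the opening tags.
fixed = no_close_counters.gsub(/<([^>]*?\b\s+)(\d+):(\d+)>/, '<\1no="\2" of="\3">')

puts fixed
```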

    -Rob

    Rob Biedenharn http://agileconsultingllc.com
     
    Rob Biedenharn, Mar 30, 2007
    #5
  6. On Sat, Mar 31, 2007 at 12:34:25AM +0900, Jon wrote:
    > <SUBEVENTSTATUS
    > 1:2><OPERATIONNAME></OPERATIONNAME>gofast<OPERATIONSTATUS>stopped</OPERATIONSTATUS><TARGETOBJECTNAME>name</TARGETOBJECTNAME><TARGETOBJECTVALUE>val</TARGETOBJECTVALUE></SUBEVENTSTATUS
    > 1:1><SUBEVENTSTATUS 2:2><......and on
    >
    > REXML pukes on the <SUBEVENTSTATUS 1:2> tag... which it should. There
    > should be some kind of attribute declaration instead. I want to
    > translate it to something like this: <SUBEVENTSTATUS no="1" of="2">
    >
    > I'm trying to make a regex to detect the funny tags. Here is what I have
    > so far:
    >
    > xml_fix=/<(\S+)\s+(\d+):(\d+)>/
    >
    > This is great, but it will match this:
    >
    > <Request><code_set_list 1:2>
    >
    > instead of just this:
    >
    > <code_set_list 1:2>


    Try (\w+) instead of (\S+)
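
    A sketch of that change, with made-up sample strings. `\w` only matches letters, digits and underscore, so the group cannot run across ">" or "<". The same restriction means a tag name containing a hyphen, dot or colon would not match, which may or may not matter for this data.

```ruby
brian = /<(\w+)\s+(\d+):(\d+)>/

line  = '<Request><code_set_list 1:2>'
fixed = line.gsub(brian, '<\1 no="\2" of="\3">')

# A hyphenated name is not matched, because "-" is not a \w character.
hyphenated = brian.match('<code-set 1:2>')

puts fixed
p hyphenated
```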
     
    Brian Candler, Mar 31, 2007
    #6