Regular expression woes

Discussion in 'Javascript' started by Mark (News), Feb 4, 2005.

  1. Mark (News)

    Mark (News) Guest

    I'm not really sure where to post this question as it covers so many
    platforms, but as the platform isn't relevant, here goes...

    I'm trying to (pulling my hair out more like) construct a regular
    expression string that says the following: "match if the input string
    does not start with the characters http".

    e.g.
    "this string" - match
    "this http string" - match
    "http-and-a-bit-more-text" - no match
    "ht" - match
    "" - match

    I've tried something like ^[^(^http)] but this gives no match on the
    last 2. Any ideas? - I'd really appreciate it!
    Cheers
    Mark
     
    Mark (News), Feb 4, 2005
    #1
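A note on why the attempted pattern fails: inside square brackets the parentheses and the second caret are ordinary characters, so `[^(^http)]` is a negated character class that demands exactly one character other than `(`, `^`, `h`, `t`, `p` or `)`. The sketch below is only an illustration, written in JavaScript because the thread was filed there; the name `attempt` is made up for it.

```javascript
// Mark's attempted pattern: one leading character that is NOT any of ( ^ h t p )
const attempt = /^[^(^http)]/;

// The class must consume a character, so the empty string cannot match.
console.log(attempt.test(""));            // false
// The first character is "h", which the class excludes.
console.log(attempt.test("ht"));          // false
// Any string starting with "t" or "p" is rejected as well.
console.log(attempt.test("this string")); // false
// A first character outside the excluded set is accepted.
console.log(attempt.test("a string"));    // true
```

A character class only ever describes a single character, which is why it cannot express "not the word http".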

  2. Mark (News)

    Paul Lalli Guest

    "Mark (News)" <4less.com> wrote in message
    news:...
    > I'm not really sure where to post this question as it covers so many
    > platforms, but as the platform isn't relevant, here goes...


    Incorrect. The platform is exceedingly relevant. Regular expressions
    are not a constant across languages. Perl regular expressions are not
    the same as Javascript regular expressions are not the same as PHP
    regular expressions.

    Choose one or the other, tell us what you're *trying* to do, and in what
    environment you're doing it, and then someone can help you.

    Paul Lalli
     
    Paul Lalli, Feb 4, 2005
    #2

  3. On Fri, 04 Feb 2005 07:19:44 -0800, Mark (News) wrote:
    > I'm trying to (pulling my hair out more like) construct a regular
    > expression string that says the following: "match if the input string
    > does not start with the characters http". E.g.
    >
    > e.g.
    > "this string" - match
    > "this http string" - match
    > "http-and-a-bit-more-text" - no match
    > "ht" - match
    > "" - match


    So don't match if the string starts with "http":

    $str !~ m/^http/


    -leendert bottelberghs
     
    Leendert Bottelberghs, Feb 4, 2005
    #3
  4. Mark (News)

    Guest

    Mark (News) wrote:
    > I'm not really sure where to post this question as it covers so many
    > platforms, but as the platform isn't relevant, here goes...
    >
    > I'm trying to (pulling my hair out more like) construct a regular
    > expression string that says the following: "match if the input string
    > does not start with the characters http". E.g.


    wouldn't it be:

    $match !~ m/^http/;

    Is there an equivalent negation metacharacter for a word and not just a
    character class? I was just wondering about that.

    wana
     
    , Feb 4, 2005
    #4
  5. Mark (News) wrote:

    > I'm not really sure where to post this question as it covers so many
    > platforms, but as the platform isn't relevant, here goes...
    >
    > I'm trying to (pulling my hair out more like) construct a regular
    > expression string that says the following: "match if the input string
    > does not start with the characters http". E.g.
    >
    > e.g.
    > "this string" - match
    > "this http string" - match
    > "http-and-a-bit-more-text" - no match
    > "ht" - match
    > "" - match
    >
    > I've tried something like ^[^(^http)] but this gives no match on the
    > last 2. Any ideas? - I'd really appreciate it!
    > Cheers
    > Mark


    Use the "does not match" operator, !~.

    if ($my_string !~ /^http/) {
        do_something();
    }

    If you're not using perl, well I guess your platform *is* relevant...
    --
    Christopher Mattern

    "Which one you figure tracked us?"
    "The ugly one, sir."
    "...Could you be more specific?"
     
    Chris Mattern, Feb 4, 2005
    #5
  6. Paul Lalli wrote:

    > Incorrect. The platform is exceedingly relevant. Regular expressions
    > are not a constant across languages. Perl regular expression are not
    > the same as Javascript regular expressions are not the same as PHP
    > regular expressions.


    Also, what you're trying to do - negate a match condition - is often easier
    to do in the host language than in the regex itself. For example, in Perl
    you could do what you asked with this:

    if ($some_string !~ /^http/) { ... }
    # or
    unless (/^http/) { ... }

    But that just reinforces Paul's point - the platform is very relevant.

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
     
    Sherm Pendley, Feb 4, 2005
    #6
  7. Mark (News)

    Mark (News) Guest

    I appreciate all the effort in providing a solution to the wider
    problem, but perhaps I should have been more explicit - my fault.

    I'm specifically trying to avoid using the host shell to do the
    negation even though I can use this approach in just about any
    language. What I'm really after is to contain the logic entirely within
    the regular expression.

    Why? Intellectual exercise. :) (Kind of like why people climb
    mountains, but without having to take my butt off the chair.)

    Cheers
    Mark
     
    Mark (News), Feb 4, 2005
    #7
  8. Mark (News)

    Evertjan. Guest

    Mark (News) wrote on 04 feb 2005 in comp.lang.javascript:

    > I'm not really sure where to post this question as it covers so many
    > platforms, but as the platform isn't relevant, here goes...
    >
    > I'm trying to (pulling my hair out more like) construct a regular
    > expression string that says the following: "match if the input string
    > does not start with the characters http". E.g.
    >
    > e.g.
    > "this string" - match
    > "this http string" - match
    > "http-and-a-bit-more-text" - no match
    > "ht" - match
    > "" - match


    In JavaScript the method to use is test, not match:

    var s = "this http string";

    if (!/^http/.test(s))
        alert("Match!");
    else
        alert("No match!");

    --
    Evertjan.
    The Netherlands.
    (Replace all crosses with dots in my emailaddress)
     
    Evertjan., Feb 4, 2005
    #8
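The test-and-negate approach can be wrapped in a helper and run against the five strings from the original question. This is a minimal sketch; `doesNotStartWithHttp` and `cases` are names invented here, and `console.log` stands in for `alert` so it runs outside a browser.

```javascript
// Negate in the host language: test for the unwanted prefix, then invert.
function doesNotStartWithHttp(s) {
  return !/^http/.test(s);
}

// The five examples from the original question, paired with the wanted verdict.
const cases = [
  ["this string", true],
  ["this http string", true],
  ["http-and-a-bit-more-text", false],
  ["ht", true],
  ["", true],
];

// Print each input and whether the helper agrees with the wanted verdict.
for (const [input, expected] of cases) {
  console.log(JSON.stringify(input), doesNotStartWithHttp(input) === expected);
}
```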
  9. Mark (News) wrote:
    > I appreciate all the effort in providing a solution to the wider
    > problem, but perhaps I should have been more explicit - my fault.
    >
    > I'm specifically trying to avoid using the host shell to do the
    > negation even though I can use this approach in just about any
    > language. What I'm really after is to contain the logic entirely within
    > the regular expression.


    You can do it with a zero-width negative look-ahead assertion in perl.

    $string=~/^(?!http)/

    --

    Rasto Levrinc
    http://sourceforge.net/projects/rlocate/
     
    Rasto Levrinc, Feb 4, 2005
    #9
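The same assertion can be written in JavaScript, provided the engine supports look-ahead. A sketch, with `noHttpPrefix` being a name made up for the illustration:

```javascript
// ^ anchors at the start; (?!http) succeeds only if "http" does NOT follow.
// The assertion consumes nothing, so a successful match is the empty string.
const noHttpPrefix = /^(?!http)/;

console.log(noHttpPrefix.test("this string"));              // true
console.log(noHttpPrefix.test("this http string"));         // true
console.log(noHttpPrefix.test("http-and-a-bit-more-text")); // false
console.log(noHttpPrefix.test("ht"));                       // true
console.log(noHttpPrefix.test(""));                         // true
```

Because the look-ahead is zero-width, the short and empty strings need no special case.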
  10. Mark (News)

    Mark (News) Guest

    Wow - quite brilliant!

    Clearly this was far too easy for you. :)

    Cheers
    Mark
     
    Mark (News), Feb 4, 2005
    #10
  11. Rasto Levrinc wrote:

    >> What I'm really after is to contain the logic entirely within
    >> the regular expression.


    > You can do it with a zero-width negative look-ahead assertion in perl.
    >
    > $string=~/^(?!http)/


    Some JavaScript implementations implement regular expressions but
    don't implement look-ahead assertions. Here you would need

    /^([^h]ttp.*|h[^t]tp.*|ht[^t]p|htt[^p].*|.{0,3})$/.test(string)

    ciao, dhgm
     
    Dietmar Meier, Feb 4, 2005
    #11
  12. Mark (News)

    Evertjan. Guest

    Dietmar Meier wrote on 04 feb 2005 in comp.lang.javascript:

    >>> What I'm really after is to contain the logic entirely within
    >>> the regular expression.

    >
    >> You can do it with a zero-width negative look-ahead assertion in perl.
    >>
    >> $string=~/^(?!http)/

    >
    > Some JavaScript implementations implement regular expressions but
    > don't implement look-ahead assertions. Here you would need
    >
    > /^([^h]ttp.*|h[^t]tp.*|ht[^t]p|htt[^p].*|.{0,3})$/.test(string)


    [The $ cannot be right, I think.]

    r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p]))/.test(s)



    --
    Evertjan.
    The Netherlands.
    (Replace all crosses with dots in my emailaddress)
     
    Evertjan., Feb 4, 2005
    #12
  13. Evertjan. wrote:

    >> /^([^h]ttp.*|h[^t]tp.*|ht[^t]p|htt[^p].*|.{0,3})$/.test(string)


    > [The $ cannot be right, I think.]


    For what value of string do you think the "$" would lead to the
    wrong result?

    > r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p]))/.test(s)


    This would not match strings with 3 or fewer characters.

    ciao, dhgm
     
    Dietmar Meier, Feb 4, 2005
    #13
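The short-string objection is easy to check: every alternative in that pattern consumes exactly four characters, so anything shorter cannot match. A small sketch, where `fourCharsOnly` is a name chosen here:

```javascript
// Every alternative consumes exactly four characters.
const fourCharsOnly = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p]))/;

console.log(fourCharsOnly.test("xttp://")); // true
console.log(fourCharsOnly.test("http://")); // false
console.log(fourCharsOnly.test("ht"));      // false, although the question wants a match
console.log(fourCharsOnly.test(""));        // false, although the question wants a match
```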
  14. Mark (News)

    Evertjan. Guest

    Dietmar Meier wrote on 04 feb 2005 in comp.lang.javascript:

    > Evertjan. wrote:
    >
    >>> /^([^h]ttp.*|h[^t]tp.*|ht[^t]p|htt[^p].*|.{0,3})$/.test(string)

    >
    >> [The $ cannot be right, I think.]

    >
    > For what value of string do you think, the "$" would lead to the
    > wrong result?


    "xttp://" should return true
    "http://" should return false

    Yes, you are right here.

    >> r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p]))/.test(s)

    >
    > This would not match strings with 3 or less characters.


    Yes, you are right again.

    Let me try:

    r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p])|(.{0,3}$))/.test(s)

    [I could lose some () but I like them for clarity.]

    --
    Evertjan.
    The Netherlands.
    (Replace all crosses with dots in my emailaddress)
     
    Evertjan., Feb 4, 2005
    #14
  15. Mark (News)

    Grant Wagner Guest

    "Dietmar Meier" <> wrote in
    message news:...
    > Rasto Levrinc wrote:
    >
    >>> What I'm really after is to contain the logic entirely within
    >>> the regular expression.

    >
    >> You can do it with a zero-width negative look-ahead assertion in
    >> perl.
    >>
    >> $string=~/^(?!http)/

    >
    > Some JavaScript implementations implement regular expressions but
    > don't implement look-ahead assertions. Here you would need
    >
    > /^([^h]ttp.*|h[^t]tp.*|ht[^t]p|htt[^p].*|.{0,3})$/.test(string)


    Why do people insist on doing things the hardest way possible. Test for
    the condition you don't want, then negate it.

    if (!/^http/i.test(some_string)) { ... }

    By the way, this is pretty much the same solution already provided for
    Perl:

    if ($some_string !~ /^http/) { ... }

    (although I chose to make it case-insensitive, since the protocol in a
    URI isn't case-sensitive, it could be upper, lower or mixed case)

    --
    Grant Wagner <>
    comp.lang.javascript FAQ - http://jibbering.com/faq
     
    Grant Wagner, Feb 4, 2005
    #15
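The effect of the `i` flag can be seen on an upper-case scheme. A brief sketch; the variable names and the example.com address are placeholders invented for the illustration.

```javascript
const caseSensitive = /^http/;
const caseInsensitive = /^http/i;

// Negated test: true means "does not start with http".
console.log(!caseSensitive.test("HTTP://example.com"));   // true: the upper-case scheme slips through
console.log(!caseInsensitive.test("HTTP://example.com")); // false: caught
console.log(!caseInsensitive.test("Https and more"));     // false: mixed case is caught too
console.log(!caseInsensitive.test("this string"));        // true
```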
  16. Mark (News)

    Mark (News) Guest

    "Why do people insist on doing things the hardest way possible."? Well,
    as I said in an earlier post, I wanted to do the whole thing within a
    regex rather than resorting to the shell. Mainly because, crazy as it
    sounds, it's a fun intellectual exercise. :) And anyway, if I always
    take the path of least resistance, I'll never learn, right? (But I
    guess that's OT.)
     
    Mark (News), Feb 4, 2005
    #16
  17. "Evertjan." <> wrote in message
    news:Xns95F3C5F131A46eejj99@194.109.133.29...
    > Dietmar Meier wrote on 04 feb 2005 in comp.lang.javascript:
    >
    > > Evertjan. wrote:
    > >
    > >>> /^([^h]ttp.*|h[^t]tp.*|ht[^t]p|htt[^p].*|.{0,3})$/.test(string)

    > >
    > >> [The $ cannot be right, I think.]

    > >
    > > For what value of string do you think, the "$" would lead to the
    > > wrong result?

    >
    > "xttp://" should return true
    > "http://" should return false
    >
    > Yes, you are right here.
    >
    > >> r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p]))/.test(s)

    > >
    > > This would not match strings with 3 or less characters.

    >
    > Yes, you are right again.
    >
    > Let me try:
    >
    > r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p])|(.{0,3}$))/.test(s)
    >
    > [I could loose some () but I like them for clarity
    >


    None of those regular expressions will work. For example, your regexp will
    not match against "this string", since it differs in 4 places in the first 4
    characters.

    You cannot negate a string by negating each character. If you really wanted
    to do it in that way, you would have to negate all possible combinations of
    letters in "http". So, just for fun, it would look something like this
    (newlines added for clarity):

    /^(
    ([^h][^t][^t][^p])|

    (h[^t][^t][^p])|
    ([^h]t[^t][^p])|
    ([^h][^t]t[^p])|
    ([^h][^t][^t]p)|

    (ht[^t][^p])|
    (h[^t]t[^p])|
    (h[^t][^t]p)|
    ([^h]tt[^p])|
    ([^h]t[^t]p)|
    ([^h][^t]tp)|

    (htt[^p])|
    (ht[^t]p)|
    (h[^t]tp)|
    ([^h]ttp)

    |(.{0,3}$))/

    The moral of this story: "negating" a string in regular expressions is very,
    very ugly (without negative look ahead). Your best bet, as many others have
    mentioned, is to do something akin to perl's !~, i.e. match against ^http,
    and consider matches to be, well, not matches.
     
    Richards Noah \(IFR LIT MET\), Feb 4, 2005
    #17
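An enumeration like this can be generated rather than typed. The sketch below builds one alternative per way of choosing which of the four positions must differ (15 non-empty choices), then puts everything, including the short-string case, inside a single group so that `^` applies to every alternative. The function name `buildNegatedPrefix` is invented for this illustration, and it assumes the prefix contains no regex metacharacters.

```javascript
function buildNegatedPrefix(prefix) {
  const chars = [...prefix];
  const n = chars.length;
  const alternatives = [];
  // Each bitmask from 1 to 2^n - 1 marks which positions must differ.
  for (let mask = 1; mask < (1 << n); mask++) {
    const alt = chars
      .map((c, i) => ((mask >> i) & 1 ? `[^${c}]` : c))
      .join("");
    alternatives.push(alt);
  }
  // Strings shorter than the prefix can never start with it.
  alternatives.push(`.{0,${n - 1}}$`);
  // One non-capturing group keeps every alternative anchored at the start.
  return new RegExp(`^(?:${alternatives.join("|")})`);
}

const re = buildNegatedPrefix("http");
console.log(re.test("this string"));              // true
console.log(re.test("http-and-a-bit-more-text")); // false
console.log(re.test("ht"));                       // true
console.log(re.test(""));                         // true
```

The grouping matters: if the short-string alternative sits outside the anchored group, it can match at the end of any string and the whole pattern accepts everything.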
  18. Mark (News)

    Evertjan. Guest

    Richards Noah (IFR LIT MET) wrote on 04 feb 2005 in
    comp.lang.javascript:

    >> r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p])|(.{0,3}$))/.test(s)
    >>
    >> [I could loose some () but I like them for clarity
    >>

    >
    > None of those regular expressions will work. For example, you regexp
    > will not match against "this string", since it differs in 4 places in
    > the first 4 characters.
    >


    s = "this string"
    r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p])|(.{0,3}$))/.test(s)
    alert(r)

    shows: true as per OQ.

    So what is the problem?

    Please show a string that does not work.

    --
    Evertjan.
    The Netherlands.
    (Replace all crosses with dots in my emailaddress)
     
    Evertjan., Feb 4, 2005
    #18
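One way to settle the question is to compare the pattern against a plain prefix check over a handful of strings. The harness below is a sketch: `samples` is an arbitrary list chosen here, and `String.prototype.startsWith` serves as the reference answer. Strings with line breaks among the first four characters are not exercised, since `.` does not match line terminators.

```javascript
const evertjan = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p])|(.{0,3}$))/;

const samples = [
  "this string", "this http string", "http-and-a-bit-more-text",
  "ht", "", "htt", "http", "httq", "xttp://", "htxp://",
];

// Collect every sample where the regex disagrees with the reference check.
const mismatches = samples.filter(
  (s) => evertjan.test(s) !== !s.startsWith("http")
);

console.log(mismatches.length === 0 ? "all samples agree" : mismatches);
```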
  19. [A complimentary Cc of this posting was sent to
    Evertjan.
    <>], who wrote in article <Xns95F3E7D7DF08Ceejj99@194.109.133.29>:
    > r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p])|(.{0,3}$))/.test(s)


    Too much work.

    [^h]
    | h[^t]
    | ht[^t]
    | htt[^p]
    | .{0,3}$

    Hope this helps,
    Ilya
     
    Ilya Zakharevich, Feb 5, 2005
    #19
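In JavaScript the shorter alternation might be written as follows. This is a sketch under two assumptions: the fragment is meant to sit inside a group anchored with `^`, and the whitespace in the posting is layout only (as with Perl's /x modifier). The name `shortForm` is made up here.

```javascript
// Each alternative stops at the first character that deviates from "http";
// the last one covers strings of fewer than four characters.
const shortForm = /^(?:[^h]|h[^t]|ht[^t]|htt[^p]|.{0,3}$)/;

console.log(shortForm.test("this string"));              // true
console.log(shortForm.test("this http string"));         // true
console.log(shortForm.test("http-and-a-bit-more-text")); // false
console.log(shortForm.test("ht"));                       // true
console.log(shortForm.test(""));                         // true
```

Nothing needs to be said about the characters after the first mismatch, which is why the alternatives can be cut short.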
  20. Mark (News)

    Mark (News) Guest

    Is it true that if zero-width negative look-ahead is not available,
    there is always an alternative regex to do the job?
     
    Mark (News), Feb 5, 2005
    #20
