Username regex

Discussion in 'ASP .Net' started by Mick Walker, Aug 1, 2007.

  1. Mick Walker

    Hi All,

    I would like to know how I can limit users to only registering usernames
    which have alphanumeric characters and one underscore.
    From what I understand, I can use a regex to do this, but I have no
    idea how to do it. I have only used a custom field validator before, and the
    reason I can't do it now is because of the design of the page itself.

    So basically I need to know how, upon a button click, I can compare the
    text entered in a textbox to a regex: if it's invalid, throw an error; if
    it's valid, proceed.

    Thanks.
    Mick Walker, Aug 1, 2007
    #1

  2. Hello Mick,

    You can indeed use a regex for this. It's actually a simple expression, but
    because a regex describes the input from left to right, you need to account
    for all possible locations of the single '_' you're allowing.

    It would be even simpler to just allow underscores as a rule.

    The expression comes down to this:

    ^(_[a-zA-Z0-9]+|[a-zA-Z0-9]+_?|[a-zA-Z0-9]+_[a-zA-Z0-9]+)$

    ^                          Make sure we're matching from the beginning of the input.
    (                          Start of the alternatives.
    _[a-zA-Z0-9]+              An underscore followed by one or more alphanumeric characters,
    |                          OR
    [a-zA-Z0-9]+_?             one or more alphanumeric characters followed by an optional underscore,
    |                          OR
    [a-zA-Z0-9]+_[a-zA-Z0-9]+  one or more alphanumeric characters, an underscore, then one or more
                               alphanumeric characters.
    )                          End of the alternatives.
    $                          End of the string.

    Now add a RegularExpressionValidator to your page. Set its ValidationExpression
    to the expression above and its ControlToValidate to your textbox and you're
    all done.
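
    For the "upon a button click" part of the question, the same check can also
    be done by hand. The following is only a rough sketch in JavaScript (the
    patterns in this thread use syntax that JavaScript and .NET share); the
    helper name isValidUsername is made up for the illustration and is not part
    of any validator control.

```javascript
// Hypothetical helper (not from the thread): tests a candidate username
// against the expression given in this post.
var USERNAME_RE = /^(_[a-zA-Z0-9]+|[a-zA-Z0-9]+_?|[a-zA-Z0-9]+_[a-zA-Z0-9]+)$/;

function isValidUsername(text) {
  return USERNAME_RE.test(text);
}

// In a click handler you would pass in the textbox value, for example:
//   if (!isValidUsername(textbox.value)) { /* show an error */ }
console.log(isValidUsername('mick_walker'));  // true
console.log(isValidUsername('mick__walker')); // false: two underscores
```

    Whatever runs in the browser is only a convenience; the same test should be
    repeated on the server before the username is stored.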

    Jesse
    Jesse Houwing, Aug 1, 2007
    #2

  3. On Aug 1, 8:04 pm, Jesse Houwing wrote:
    > The expression comes down to this:
    >
    > ^(_[a-zA-Z0-9]+|[a-zA-Z0-9]+_?|[a-zA-Z0-9]+_[a-zA-Z0-9]+)$

    I can just add that if the underscore is allowed only in the middle of a
    word (which I think is usual for usernames), then the expression can be

    ^([a-zA-Z0-9]+_[a-zA-Z0-9]+)$
    Alexey Smirnov, Aug 1, 2007
    #3
  4. Hello Alexey,

    > ^([a-zA-Z0-9]+_[a-zA-Z0-9]+)$

    This regex *forces* the username to contain a '_'.

    And don't use this:
    ^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$

    It will cause excessive backtracking and use lots of CPU under the wrong
    circumstances.
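
    To make the first point concrete, here is a small JavaScript check (an
    illustration only; the variable names are made up for this sketch) of what
    each of the two expressions accepts.

```javascript
// "Underscore in the middle only": the underscore is mandatory here.
var forced = /^([a-zA-Z0-9]+_[a-zA-Z0-9]+)$/;
// The variant with an optional underscore (the one warned about above).
var optional = /^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$/;

console.log(forced.test('mick_walker'));  // true
console.log(forced.test('mickwalker'));   // false: no underscore at all
console.log(optional.test('mickwalker')); // true
```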


    Jesse
    Jesse Houwing, Aug 1, 2007
    #4
  5. On Aug 1, 9:32 pm, Jesse Houwing wrote:
    > And don't use this:
    > ^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$

    sorry, it's just a test post: ^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$
    Alexey Smirnov, Aug 1, 2007
    #5
  6. On Aug 1, 11:08 pm, Alexey Smirnov wrote:
    > sorry, it's just a test post: ^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$

    Hmm, I think I deleted the (?) from the expression by mistake. It's a
    silly typo, sorry about this.

    But I don't get what's wrong with the following expression:

    ^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$

    Basically, it's a copy of your expression; I just deleted the first two
    parts from it.
    Alexey Smirnov, Aug 1, 2007
    #6
  7. Hello Alexey,

    > But I don't get what's wrong with the following expression
    >
    > ^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$

    The problem is that it allows excessive backtracking. Think of a string,
    preferably a very long one, that contains only alphanumeric characters but
    ends in a # sign. This regex will try every combination of the first and
    the second part until all options are exhausted. This can take quite a while.

    I'll try to explain it more graphically so that it's easier to understand.

    Usually what would happen is this:

    input: aaaaa#
    regex: ^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$

    The first part tries to match and captures
    aaaaa

    Then the engine reaches the # sign. It isn't a _, so the optional _? matches
    nothing, and it isn't another [a-zA-Z0-9] either, so the engine backtracks
    one position.
    aaaa

    Now the second [a-zA-Z0-9]+ can match the last a
    aaaa|a (the | shows which part of the regex matched what).

    But now it still cannot match the # at the end. So the engine backtracks
    one more position and the second part matches the last two a's.
    aaa|aa

    Still no match...
    aa|aaa

    Still no match
    a|aaaa

    It cannot backtrack further. The first [a-zA-Z0-9]+ isn't optional. The engine
    finally concludes that no match can be found.

    Remember that the longer the input gets, the more backtracking is needed.
    This is especially bad if there's more regex to come after that...
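
    The walk-through above can be mimicked with a few lines of JavaScript. This
    is only a model of the splits described in this post, not how a regex engine
    is actually implemented, and the function name splitsTried is made up for
    the sketch. It lists every way the alphanumeric run can be divided between
    the first and the second [a-zA-Z0-9]+. A real backtracking engine may do
    even more work than this, because at each split the second part can also
    give characters back before the engine gives up on that split.

```javascript
// Hypothetical model: list every split of an alphanumeric run between the
// first and second [a-zA-Z0-9]+ in ^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$.
function splitsTried(run) {
  var splits = [];
  // The first part shrinks one character at a time; it must keep at least one.
  for (var firstLen = run.length - 1; firstLen >= 1; firstLen--) {
    splits.push(run.slice(0, firstLen) + '|' + run.slice(firstLen));
  }
  return splits;
}

console.log(splitsTried('aaaaa').join('  '));
// aaaa|a  aaa|aa  aa|aaa  a|aaaa
console.log(splitsTried('a'.repeat(100)).length); // 99
```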

    Now back to my regular expression.

    I've split the problem into three alternatives. The engine will try each
    alternative and, if it fails, try the next.

    So again:

    input: aaaaa#
    regex: ^([a-zA-Z0-9]+_?|_[a-zA-Z0-9]+|[a-zA-Z0-9]+_[a-zA-Z0-9]+)$

    The engine first tries the first alternative, [a-zA-Z0-9]+_?

    It captures
    aaaaa

    but then fails. There's still another character in the input and it's not
    a _. There's no useful backtracking to do: whatever the first part gives
    back is still followed by an a, not by a _ or the end of the string. So the
    first alternative is abandoned and the engine tries the second.

    It captures nothing, because the string doesn't start with a _.

    It tries the last alternative. It again captures aaaaa, but then fails
    because there's no _.

    The end result is the same: no match.

    But my regex needed only 3 attempts, one per alternative, to find that out,
    and regardless of the length of the input it will keep needing 3.

    Yours took 5 tries and will take an extra try for every valid character
    at the start of the string. So if you have an input of 100 characters it
    would do at least 100 passes.

    I hope this helps you in understanding why your regex has some issues ;)

    Jesse
    Jesse Houwing, Aug 1, 2007
    #7
  8. On Aug 1, 11:42 pm, Jesse Houwing wrote:
    > The problem is in the fact that it allows for excessive backtracking. Think
    > of a string, preferably very long that contains only alphanumeric characters,
    > but end in a # sign. This regex will try every combination of the first and
    > the second part until all options are exhausted. This can take quite a while.
    >


    Jesse,

    I understand now why I forgot about the (?) in my first post. I
    misread your code. I thought you used the (?) in the third part of your
    expression to find only one (_). I'm sorry, I just can't believe I was
    so inattentive.

    Regarding backtracking: I understand now the difference between the
    two expressions, and I believe what you are saying, but I did a small
    quick test of how our expressions work for short and very long strings
    in reality, and it looks like there is no big difference.

    Here's the result of the test

    ----------------------------------------------------------------------------
    Regular expression benchmark
    ----------------------------
    Regular expressions : 2
    Test strings        : 3
    Iterations          : 10000
    Total regex calls   : (10000 * 3 * 2) = 60000

    RE: ^([a-zA-Z0-9]+_?|_[a-zA-Z0-9]+|[a-zA-Z0-9]+_[a-zA-Z0-9]+)$
    MS   MAX     AVG     MIN     DEV     INPUT
    61   16,759  0,0061  0,0031  0,1768  'aaaaa#'
    101  16,778  0,0101  0,0067  0,1771  'some_realusername'
    178  16,786  0,0178  0,0137  0,178   'verylongusernamestringwithnumbersandlettersandwithoutunderscoresymbolbutveryveryverylong1234verylong'

    RE: ^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$
    MS   MAX     AVG     MIN     DEV     INPUT
    51   3,251   0,0051  0,0039  0,038   'aaaaa#'
    82   3,255   0,0082  0,007   0,038   'some_realusername'
    161  3,263   0,0161  0,0142  0,047   'verylongusernamestringwithnumbersandlettersandwithoutunderscoresymbolbutveryveryverylong1234verylong'
    ----------------------------------------------------------------------------

    As you can see sometimes my expression is quicker :)

    The code of the benchmark tool is here
    http://www.codinghorror.com/files/code/regexbenchmark.zip

    I compiled it in .NET 2.0 and ran it without a debugger. If I run the
    benchmark 4-5 times, very often I see that my expression has the best
    reported score, even for long strings. So, I would say that

    ^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$

    is better, because it's shorter
    Alexey Smirnov, Aug 2, 2007
    #8
  9. On Aug 2, 1:08 am, Alexey Smirnov wrote:

    A JavaScript-based test gives results similar to the one I did in VB.NET:

    <html>
    <head>
    <script type="text/javascript">
    // Times 10000 matches of the textbox value against the chosen expression.
    function go() {
      // Swap the comment between the two lines below to test the other expression.
      //var re = new RegExp('^([a-zA-Z0-9]+_?|_[a-zA-Z0-9]+|[a-zA-Z0-9]+_[a-zA-Z0-9]+)$');
      var re = new RegExp('^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$');
      // Read the textbox once, so that only the regex is timed.
      var value = window.document.getElementById('txtUser').value;
      var start = new Date();
      for (var i = 0; i < 10000; i++) {
        var x = value.match(re);
      }
      var end = new Date();
      var forLoopTime = end - start;
      alert(forLoopTime + 'ms');
      return false; // cancel the form submit
    }
    </script>

    </head>
    <body>
    <form id="form1" runat="server">
    <input type="text" id="txtUser">
    <input type="submit" onclick="return go();" value="test">
    </form>
    </body>
    </html>
    Alexey Smirnov, Aug 2, 2007
    #9
  10. Hello Alexey,

    > So, I would say that
    >
    > ^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$
    >
    > is better, because it's shorter

    Yours is probably faster for correct strings, I believe that: fewer
    alternatives to try. :) But have you tried a very long incorrect input?
    Something like this (the longer the better):

    aaaaaaaaaaaavvvvvbbbbbbbbbbbbbbbbbbbbbbbeeeeeeeeeeeeeeeeeeeAAAAAAAAAAAAAAA666666666666666666666666666666666666666666666&

    If the textbox in question is limited to, say, 16 characters you'd probably
    never trigger the problem, but a hacker could just bypass the textbox's
    length limit and supply a very, very long string, and if he sent such strings
    in quick succession he'd essentially cause a denial of service.
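
    One way to guard against that, sketched here in JavaScript under the
    assumption that usernames are capped at 16 characters (the figure used as an
    example above), is to check the length in code before the regex ever runs.
    The names MAX_USERNAME_LENGTH and checkUsername are hypothetical. The same
    check would have to run on the server, since anything enforced only in the
    browser can be bypassed.

```javascript
var MAX_USERNAME_LENGTH = 16; // assumed limit, taken from the example above
var USERNAME_RE = /^[a-zA-Z0-9]+_?[a-zA-Z0-9]+$/;

function checkUsername(text) {
  // Reject oversized input first, so the regex never sees a long string.
  if (typeof text !== 'string' || text.length > MAX_USERNAME_LENGTH) {
    return false;
  }
  return USERNAME_RE.test(text);
}

console.log(checkUsername('some_user'));            // true
console.log(checkUsername('a'.repeat(5000) + '&')); // false, the regex is never run
```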

    There are some optimizations that are possible for this regex if you'd want
    to try them. The engine will try the first alternative first; if that fails
    it will try the second. Keeping that in mind, the order of the alternatives
    matters. Say that you usually have no '_' in usernames, but when there is
    one it mostly sits somewhere in between. Then you'd change the order like
    this to improve performance:
    ^([a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)?|[a-zA-Z0-9]+_|_[a-zA-Z0-9]+)$

    Also, my expression allows an underscore at the start, at the end and in
    between. Yours only covers the last option. I would probably rewrite my
    expression to this to make it do the same as yours (apart from also accepting
    a one-character name, which yours rejects). It is almost as short, but does
    not allow excessive backtracking.
    ^[a-zA-Z0-9]+(_[a-zA-Z0-9]+)?$
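
    A quick JavaScript comparison (a sketch; the variable names and sample
    usernames are made up) suggests the two expressions agree on everything
    except one-character input, which the rewritten expression accepts and the
    _? version rejects because each of its two [a-zA-Z0-9]+ parts needs at least
    one character.

```javascript
var withOptionalUnderscore = /^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$/;
var rewritten = /^[a-zA-Z0-9]+(_[a-zA-Z0-9]+)?$/;

var samples = ['ab', 'a_b', 'mick_walker', '_ab', 'ab_', 'a__b', 'a_b_c', 'a'];
samples.forEach(function (name) {
  // Prints the sample followed by the verdict of each expression.
  console.log(name, withOptionalUnderscore.test(name), rewritten.test(name));
});
// Only the last sample, 'a', gets different verdicts from the two expressions.
```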

    And then there is the option to handle upper and lower case characters in
    the character classes, or with a regex option. I'm actually not sure which
    is faster, but it's worth a try... Wrapping the whole expression with
    (?i: ... ) should do the trick. All [a-zA-Z] can then be replaced with just
    [a-z]:
    ^(?i:....)$

    And finally, we can improve performance by suppressing capturing: replace
    the normal ( ... ) with the non-capturing (?: ... ).

    To optimize your expression further you could remove the ( ... ) completely;
    they serve no purpose.
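
    To illustrate the difference between the two kinds of group (a JavaScript
    sketch; the variable names are made up), a capturing group adds an entry to
    the match result while (?: ... ) does not, and that extra bookkeeping is
    what gets saved.

```javascript
var capturing = /^[a-zA-Z0-9]+(_[a-zA-Z0-9]+)?$/;
var nonCapturing = /^[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)?$/;

var a = capturing.exec('mick_walker');
var b = nonCapturing.exec('mick_walker');

console.log(a.length); // 2: the whole match plus one captured group
console.log(a[1]);     // _walker
console.log(b.length); // 1: the whole match only
```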

    So a fair showdown of expressions would be:
    Yours: ^[a-zA-Z0-9]+_?[a-zA-Z0-9]+$
    Mine: ^[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)?$

    Or with case insensitivity:
    Yours: ^(?i:[a-z0-9]+_?[a-z0-9]+)$
    Mine: ^(?i:[a-z0-9]+(?:_[a-z0-9]+)?)$

    I ran it using the same tool you used, on my Opteron 185 (2x2.6GHz) under
    Windows Vista x64. I made a small change to the code of the benchmark though:
    the original benchmark includes the regex compile hit in the results, so I
    did one call with each expression outside of the loop so that the results
    can be compared more easily.

    Inputs used:
    0 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",

    1 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", _
    2 "aaaa", _
    3 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa%",
    _
    4 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa%", _
    5 "aaaa%", _
    6 "aaaaaaaaaaaaaaaaaaaa_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
    _
    7 "aaaaaaaaaaaaaaaaaaaa_aaaaaaaaaaaaaaaaaaaaa", _
    8 "aa_aa", _
    9 "aaaaaaaaaaaaaaaaaaaa_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa%",
    _
    10 "aaaaaaaaaaaaaaaaaaaa_aaaaaaaaaaaaaaaaaaaaa%", _
    11 "aa_aa%" _

    Results:
    Regular expression benchmark
    ----------------------------
    Regular expressions : 4
    Test strings : 12
    Iterations : 10000
    Total regex calls : (10000 * 12 * 4) = 480000

    Pass 1, measure every regex as it runs (slower due to timing code)...
    ------------------------------------------
    Regular expression library: System.Text.RegularExpressions

    Total time taken: 58905
    ------------------------------------------
    Pass 2, measure total time only..
    ------------------------------------------
    Regular expression library: System.Text.RegularExpressions

    RE: ^[a-zA-Z0-9]+_?[a-zA-Z0-9]+$
    MS MAX AVG MIN DEV INPUT
    110 0,365 0,011 0,0101 0,0073 '0'
    145 0,387 0,0145 0,0134 0,0088 '1'
    167 0,391 0,0167 0,0154 0,0095 '2'
    17804 10,683 1,7804 1,6751 0,3004 '3'
    18291 10,756 1,8291 1,7206 0,3042 '4'
    18319 10,78 1,8319 1,7231 0,3046 '5'
    18429 10,791 1,8429 1,7335 0,3055 '6'
    18464 10,794 1,8464 1,7365 0,3058 '7'
    18487 10,797 1,8487 1,7388 0,3059 '8'
    18741 10,842 1,8741 1,7625 0,3079 '9'
    18891 10,856 1,8891 1,7762 0,3129 '10'
    18918 10,88 1,8918 1,7784 0,3135 '11'
    RE: ^[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)?$
    MS MAX AVG MIN DEV INPUT
    107 0,229 0,0107 0,0101 0,0049 '0'
    141 0,484 0,0141 0,0134 0,0076 '1'
    163 0,562 0,0163 0,0154 0,0085 '2'
    512 1,015 0,0512 0,0489 0,0166 '3'
    585 1,022 0,0585 0,0559 0,018 '4'
    611 1,025 0,0611 0,0584 0,0185 '5'
    715 1,035 0,0715 0,0682 0,0204 '6'
    749 1,038 0,0749 0,0715 0,0207 '7'
    773 1,604 0,0773 0,0735 0,0258 '8'
    1010 1,634 0,101 0,0961 0,0299 '9'
    1072 1,64 0,1072 0,102 0,0304 '10'
    1097 1,643 0,1097 0,1045 0,0305 '11'
    RE: ^(?i:[a-z0-9]+_?[a-z0-9]+)$
    MS MAX AVG MIN DEV INPUT
    241 2,098 0,0241 0,0204 0,0231 '0'
    299 2,108 0,0299 0,0254 0,0241 '1'
    325 2,11 0,0325 0,0279 0,0242 '2'
    34692 17,108 3,4692 3,0303 0,4154 '3'
    35653 17,191 3,5653 3,113 0,4274 '4'
    35694 17,195 3,5694 3,1163 0,4284 '5'
    35932 17,215 3,5932 3,137 0,4298 '6'
    35990 17,22 3,599 3,1417 0,4304 '7'
    36018 17,223 3,6018 3,1443 0,4305 '8'
    36505 17,29 3,6505 3,1867 0,4337 '9'
    36782 17,493 3,6782 3,2107 0,438 '10'
    36815 17,496 3,6815 3,2138 0,4387 '11'
    RE: ^(?i:[a-z0-9]+(?:_[a-z0-9]+)?)$
    MS MAX AVG MIN DEV INPUT
    236 1,798 0,0236 0,0201 0,0195 '0'
    291 1,811 0,0291 0,0251 0,0201 '1'
    320 4,578 0,032 0,0274 0,0497 '2'
    946 4,635 0,0946 0,0827 0,0546 '3'
    1067 4,645 0,1067 0,093 0,0611 '4'
    1097 4,648 0,1097 0,0958 0,0613 '5'
    1333 4,669 0,1333 0,1162 0,0631 '6'
    1388 4,674 0,1388 0,1212 0,0633 '7'
    1415 4,677 0,1415 0,1235 0,0634 '8'
    1791 4,711 0,1791 0,1576 0,0654 '9'
    1890 4,989 0,189 0,1659 0,0814 '10'
    1920 4,999 0,192 0,1687 0,0816 '11'
    Total time taken: 58088
    ------------------------------------------
    Press ENTER to continue...

    I wouldn't have thought that case insensitivity would make such a big difference.
    Learned something again today :).

    My expression, without the case insensitivity, is clearly the fastest:
    RE: ^[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)?$

    So I'd say this one is better, as it's only 3 characters longer than yours,
    but much faster and without a possible DoS problem.

    But the original post didn't say that the underscore could only appear between
    alphanumeric characters. It said that one '_' was allowed in the string
    (read: anywhere in the string), so my first expression best covers the problem:
    ^(?:[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)?|[a-zA-Z0-9]+_|_[a-zA-Z0-9]+)$
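
    The backtracking concern can be reproduced outside .NET as well. Below is a rough sketch in JavaScript (the language a client-side version of this check would need), assuming a backtracking regex engine such as the one in Node.js or a browser. The helper name timeMatch, the input length and the iteration count are arbitrary choices for illustration. Absolute timings vary by machine and engine, so no expected numbers are given.

```javascript
// The two patterns compared in the benchmark above.
const risky = /^[a-zA-Z0-9]+_?[a-zA-Z0-9]+$/;
const safer = /^[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)?$/;

// Hypothetical helper: run regex.test() repeatedly and report the elapsed time.
function timeMatch(regex, input, iterations) {
  const start = Date.now();
  let matched = false;
  for (let i = 0; i < iterations; i++) {
    matched = regex.test(input);
  }
  return { matched, ms: Date.now() - start };
}

// A long run of valid characters ending in one invalid character.
const failingInput = "a".repeat(1000) + "%";

for (const [name, regex] of [["risky", risky], ["safer", safer]]) {
  const result = timeMatch(regex, failingInput, 50);
  console.log(name, "matched:", result.matched, "time:", result.ms + " ms");
}
```

    On a backtracking engine the first pattern is expected to slow down roughly quadratically as the failing input grows, because every split point between the two character groups gets retried. The second pattern should grow roughly linearly.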

    Jesse
    Jesse Houwing, Aug 2, 2007
    #10
  11. Hello Jesse,

    I re-ran my original expression with the optimizations described above applied,
    and also tried another variant that seems to be even better. These are the results:

    Regular expression benchmark
    ----------------------------
    Regular expressions : 2
    Test strings : 12
    Iterations : 10000
    Total regex calls : (10000 * 12 * 2) = 240000

    Pass 1, measure every regex as it runs (slower due to timing code)...
    ------------------------------------------
    Regular expression library: System.Text.RegularExpressions

    Total time taken: 2601
    ------------------------------------------
    Pass 2, measure total time only..
    ------------------------------------------
    Regular expression library: System.Text.RegularExpressions

    RE: ^(?:[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)?|[a-zA-Z0-9]+_|_[a-zA-Z0-9]+)$
    MS MAX AVG MIN DEV INPUT
    107 0,407 0,0107 0,0101 0,0072 '0'
    142 0,412 0,0142 0,0134 0,0083 '1'
    163 0,523 0,0163 0,0154 0,0093 '2'
    676 4,552 0,0676 0,0634 0,0553 '3'
    777 4,567 0,0777 0,0726 0,0634 '4'
    805 4,569 0,0805 0,0754 0,0635 '5'
    925 15,511 0,0925 0,0852 0,167 '6'
    960 15,527 0,096 0,0886 0,1673 '7'
    982 15,529 0,0982 0,0905 0,1673 '8'
    1245 15,554 0,1245 0,1154 0,1682 '9'
    1320 15,562 0,132 0,1224 0,1684 '10'
    1347 15,565 0,1347 0,1249 0,1684 '11'
    RE: ^(?:[a-zA-Z0-9]+(?:_[a-zA-Z0-9]*)?|_[a-zA-Z0-9]+)$
    MS MAX AVG MIN DEV INPUT
    112 3,757 0,0112 0,0101 0,0402 '0'
    148 3,765 0,0148 0,0134 0,0412 '1'
    169 3,767 0,0169 0,0154 0,0412 '2'
    525 3,801 0,0525 0,0492 0,0476 '3'
    599 3,809 0,0599 0,0562 0,0478 '4'
    626 3,811 0,0626 0,0584 0,0494 '5'
    737 4,931 0,0737 0,0687 0,0707 '6'
    773 4,938 0,0773 0,0718 0,0714 '7'
    796 4,941 0,0796 0,074 0,0715 '8'
    1041 4,965 0,1041 0,0972 0,0735 '9'
    1105 4,971 0,1105 0,1031 0,0741 '10'
    1131 4,974 0,1131 0,1056 0,0744 '11'
    Total time taken: 2037
    ------------------------------------------
    Press ENTER to continue...

    It cannot rival the simpler expression for speed, but it's coming close.
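
    Before swapping one variant for the other, it is worth checking that they accept the same names. The following is a minimal sketch in JavaScript, which works here because both patterns use only syntax that .NET and JavaScript share. The names threeWay, twoWay and compareOnSamples and the sample inputs are made up for illustration, and agreement on a handful of samples is a sanity check, not a proof of equivalence.

```javascript
// The two variants benchmarked above.
const threeWay = /^(?:[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)?|[a-zA-Z0-9]+_|_[a-zA-Z0-9]+)$/;
const twoWay = /^(?:[a-zA-Z0-9]+(?:_[a-zA-Z0-9]*)?|_[a-zA-Z0-9]+)$/;

// Illustrative inputs: underscore at the start, middle and end, plus invalid names.
const samples = ["abc", "a_b", "abc_", "_abc", "a", "_", "", "a_b_c", "__a", "ab%", "a b"];

// Run both patterns over every input and collect the verdicts side by side.
function compareOnSamples(inputs) {
  return inputs.map((input) => ({
    input,
    threeWay: threeWay.test(input),
    twoWay: twoWay.test(input),
  }));
}

for (const row of compareOnSamples(samples)) {
  const verdict = row.threeWay === row.twoWay ? "agree" : "DIFFER";
  console.log(JSON.stringify(row.input), row.threeWay, row.twoWay, verdict);
}
```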

    Apart from all this, my guess is that a simple string function in C# would be
    even faster (and not even that hard to read). But if you want to have
    this check client-side as well, you'll need to write a similar function in JavaScript
    and maintain that too.

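    public bool IsCorrectName(string input)
    {
        // Reject null or empty input; the regular expressions above require
        // at least one alphanumeric character.
        if (string.IsNullOrEmpty(input)) { return false; }

        bool underscoreFound = false;
        bool alphanumericFound = false;
        foreach (char c in input)
        {
            // Plain ASCII letters and digits are always allowed.
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
            {
                alphanumericFound = true;
                continue;
            }
            // Allow a single underscore anywhere in the name.
            if (c == '_' && !underscoreFound)
            {
                underscoreFound = true;
                continue;
            }
            // Any other character, or a second underscore, is invalid.
            return false;
        }
        // A lone underscore is not a valid name.
        return alphanumericFound;
    }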

    I haven't timed it. But I think there will be no contest here.
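
    The client-side counterpart mentioned above could look roughly like this. It is a minimal sketch, not tested in any particular browser: the name isCorrectName simply mirrors the C# method, and the function rejects names with no alphanumeric character at all, which the regular expressions in this thread also reject.

```javascript
// Returns true when the name consists of ASCII letters and digits
// plus at most one underscore, and has at least one letter or digit.
function isCorrectName(input) {
  let underscoreFound = false;
  let alphanumericCount = 0;

  for (const c of input) {
    const isUpper = c >= "A" && c <= "Z";
    const isLower = c >= "a" && c <= "z";
    const isDigit = c >= "0" && c <= "9";

    if (isUpper || isLower || isDigit) {
      alphanumericCount++;
      continue;
    }
    // Allow a single underscore anywhere in the name.
    if (c === "_" && !underscoreFound) {
      underscoreFound = true;
      continue;
    }
    // Any other character, or a second underscore, is invalid.
    return false;
  }
  return alphanumericCount > 0;
}

console.log(isCorrectName("some_realusername"));
console.log(isCorrectName("two__underscores"));
```

    Whatever the client reports, the server-side check still has to run, since a client-side check can be bypassed.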

    Jesse
    Jesse Houwing, Aug 2, 2007
    #11