Best way to search for a string which has N% in a character class?

Discussion in 'Perl Misc' started by Peng Yu, Mar 2, 2012.

  1. Peng Yu

    Peng Yu Guest

    Hi,

    Suppose that I want to search for a substring in which, say, 50% of
    the letters are in a letter class, say [A-D]. Note that there is some
    ambiguity at the two ends of the substring. But other than that, this
    problem is well defined.

    It seems that this problem cannot (or cannot easily; please let me
    know if there is a way) be formulated as a regex. Since Perl is strong
    at string processing, I think there might be a good way to search
    for such strings in Perl. Does anybody have a good way to search for
    this type of substring?

    Regards,
    Peng
    Peng Yu, Mar 2, 2012
    #1

  2. Peng Yu

    J. Gleixner Guest

    On 03/02/12 10:29, Peng Yu wrote:
    > Hi,
    >
    > Suppose that I want to search for a substring which has say 50%
    > letters are in a letter class say [A-D]. Note that there is some
    > ambiguity at the two ends of the substring. But other than that, this
    > problem is well defined.
    >
    > It seems that this problem can not (or can not easily, please let me
    > know if there is a way) be formulated in regex. Since perl is strong
    > in processing string, I think that there might be a good way to search
    > for such strings in perl. Does anybody have some good way in search
    > this type of substring?


    What have you tried?????????????????

    Using 'tr' and 'length' would probably help you.

    From perldoc perlop:

    y/SEARCHLIST/REPLACEMENTLIST/cds
    [...]Transliterates all occurrences of the characters found in the
    search list with the corresponding character in the replacement list.
    It returns the number of characters replaced or deleted.

    Using that you can get the number of characters in the class.
    e.g. $cnt = tr/[A-D]/[A-D]/;

    Using 'length' you can find how many characters are in the string.

    perldoc -f length

    Divide one by the other, multiply by 100 and you have the percent.
    J. Gleixner, Mar 2, 2012
    #2

  3. Peng Yu

    Tim McDaniel Guest

    In article <4f510c5c$0$75670$>,
    J. Gleixner <> wrote:
    >On 03/02/12 10:29, Peng Yu wrote:
    >> Suppose that I want to search for a substring which has say 50%
    >> letters are in a letter class say [A-D]. Note that there is some
    >> ambiguity at the two ends of the substring. But other than that,
    >> this problem is well defined.
    >>
    >> It seems that this problem can not (or can not easily, please let
    >> me know if there is a way) be formulated in regex. Since perl is
    >> strong in processing string, I think that there might be a good way
    >> to search for such strings in perl. Does anybody have some good way
    >> in search this type of substring?

    >
    >What have you tried?????????????????
    >
    >Using 'tr' and 'length' would probably help you.
    >
    > From perldoc perlop:
    >
    > y/SEARCHLIST/REPLACEMENTLIST/cds
    > [...]Transliterates all occurrences of the characters found in the
    >search list with the corresponding character in the replacement list.
    >It returns the number of characters replaced or deleted.
    >
    >Using that you can get the number of characters in the class.
    >e.g. $cnt = tr/[A-D]/[A-D]/;


    "man perlop" continues

    Note that "tr" does not do regular expression character classes
    such as "\d" or "[:lower:]". The tr operator is not equivalent
    to the tr(1) utility. If you want to map strings between
    lower/upper cases, see "lc" in perlfunc and "uc" in perlfunc, and
    in general consider using the "s" operator if you need regular
    expressions.

    The expression
    tr/[A-D]/[A-D]/;
    will translate [ to [ and ] to ], so the brackets will be included in
    the count. A-D works because tr treats a hyphen between two characters
    as a range. Also,

    If the "/d" modifier is used, the REPLACEMENTLIST is always
    interpreted exactly as specified. Otherwise, if the
    REPLACEMENTLIST is shorter than the SEARCHLIST, the final
    character is replicated till it is long enough. If the
    REPLACEMENTLIST is empty, the SEARCHLIST is replicated. This
    latter is useful for counting characters in a class or for
    squashing character sequences in a class.

    So if you really want a range of characters like A thru D,
    tr/A-D//
    works. If you want all digits, or all alphabetics, or some other
    character class, you need to use s/// instead.
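
    A minimal sketch of the counting described above, assuming a made-up
    sample string. It shows tr/A-D// counting a plain range, the stray
    bracket count from tr/[A-D]//, and one way to count a real regex class
    such as \d by matching in list context (an alternative to s///, not
    something the quoted documentation prescribes).

```perl
use strict;
use warnings;

# Hypothetical sample string, chosen only for illustration.
my $str = "ABxD[12]cA";

# tr with an empty replacement list counts characters without changing $str.
my $in_range = ($str =~ tr/A-D//);

# With brackets, '[' and ']' are ordinary characters and get counted too.
my $with_brackets = ($str =~ tr/[A-D]//);

# For a regex character class such as \d, count the matches in list context.
my $digits = () = $str =~ /\d/g;

# Percentage of characters in A-D (guarding against an empty string).
my $pct = length($str) ? 100 * $in_range / length($str) : 0;

printf "A-D: %d, with brackets: %d, digits: %d, percent A-D: %.1f\n",
    $in_range, $with_brackets, $digits, $pct;
# prints: A-D: 4, with brackets: 6, digits: 2, percent A-D: 40.0
```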

    --
    Tim McDaniel,
    Tim McDaniel, Mar 2, 2012
    #3
  4. Peng Yu

    J. Gleixner Guest

    On 03/02/12 13:06, Tim McDaniel wrote:
    > In article<4f510c5c$0$75670$>,
    > J. Gleixner<> wrote:
    >> On 03/02/12 10:29, Peng Yu wrote:
    >>> Suppose that I want to search for a substring which has say 50%
    >>> letters are in a letter class say [A-D]. Note that there is some
    >>> ambiguity at the two ends of the substring. But other than that,
    >>> this problem is well defined.
    >>>
    >>> It seems that this problem can not (or can not easily, please let
    >>> me know if there is a way) be formulated in regex. Since perl is
    >>> strong in processing string, I think that there might be a good way
    >>> to search for such strings in perl. Does anybody have some good way
    >>> in search this type of substring?

    >>
    >> What have you tried?????????????????
    >>
    >> Using 'tr' and 'length' would probably help you.

    [...]
    > So if you really want a range of characters like A thru D,
    > tr/A-D//
    > works. If you want all digits, or all alphabetics, or some other
    > character class, you need to use s/// instead.
    >


    Thanks for the correction.
    J. Gleixner, Mar 2, 2012
    #4
  5. Peng Yu

    Peng Yu Guest

    On Mar 2, 12:07 pm, "J. Gleixner" <>
    wrote:
    > On 03/02/12 10:29, Peng Yu wrote:
    >
    > > Hi,

    >
    > > Suppose that I want to search for a substring which has say 50%
    > > letters are in a letter class say [A-D]. Note that there is some
    > > ambiguity at the two ends of the substring. But other than that, this
    > > problem is well defined.

    >
    > > It seems that this problem can not (or can not easily, please let me
    > > know if there is a way) be formulated in regex. Since perl is strong
    > > in processing string, I think that there might be a good way to search
    > > for such strings in perl. Does anybody have some good way in search
    > > this type of substring?

    >
    > What have you tried?????????????????
    >
    > Using 'tr' and 'length' would probably help you.
    >
    >  From perldoc perlop:
    >
    >   y/SEARCHLIST/REPLACEMENTLIST/cds
    >      [...]Transliterates all occurrences of the characters found in the
    > search list with the corresponding character in the replacement list.
    > It returns the number of characters replaced or deleted.
    >
    > Using that you can get the number of characters in the class.
    > e.g. $cnt = tr/[A-D]/[A-D]/;
    >
    > Using 'length' you can find how many characters are in the string.
    >
    > perldoc -f length
    >
    > Divide one by the other, multiply by 100 and you have the percent.


    I don't think that you understand my question.

    Suppose that I have a string $str which is the concatenation of $str1,
    $str2 and $str3, where both $str1 and $str3 have less than 50% of
    [A-D] and $str2 has more than 50% of [A-D].

    I need to discover from $str where $str2 starts and ends. I don't
    see how tr and length alone can address this question.
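
    One possible way to make the question concrete, offered as a sketch
    and not as the definition intended above: score every character +1 if
    it is in A-D and -1 otherwise, then look for the contiguous run with
    the largest total score (a maximum-sum scan). Any run with a positive
    score is more than 50% A-D, and the best run always begins and ends on
    an A-D character, which is one arbitrary way to settle the ambiguity
    at the two ends. The function name densest_run and the sample string
    are hypothetical.

```perl
use strict;
use warnings;

# Returns (start, length) of the highest-scoring run, or an empty list
# if the string contains no character in A-D.
sub densest_run {
    my ($str) = @_;
    my ($best, $best_start, $best_len) = (0, 0, 0);
    my ($sum, $start) = (0, 0);
    for my $i (0 .. length($str) - 1) {
        my $c = substr($str, $i, 1);
        $sum += ($c =~ tr/A-D//) ? 1 : -1;     # +1 in class, -1 otherwise
        if ($sum <= 0) {
            # A non-positive prefix cannot improve a later run: restart after it.
            ($sum, $start) = (0, $i + 1);
        }
        elsif ($sum > $best) {
            ($best, $best_start, $best_len) = ($sum, $start, $i - $start + 1);
        }
    }
    return $best_len ? ($best_start, $best_len) : ();
}

my $str = "xyzxABCADxBAxyzzy";    # hypothetical input
if (my ($pos, $len) = densest_run($str)) {
    printf "run at offset %d, length %d: %s\n", $pos, $len, substr($str, $pos, $len);
}
# prints: run at offset 4, length 8: ABCADxBA
```

    For a threshold other than 50%, the same scan should work with
    different weights (for N%, score 100-N for a class character and -N
    otherwise), but whether the highest-scoring run is the region you
    actually want depends on how the ends are meant to be defined.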
    Peng Yu, Mar 2, 2012
    #5
  6. Peng Yu

    Guest

    On Fri, 2 Mar 2012 12:53:18 -0800 (PST), Peng Yu <> wrote:

    >On Mar 2, 12:07 pm, "J. Gleixner" <>
    >wrote:
    >> On 03/02/12 10:29, Peng Yu wrote:

    [snip]
    >> Using 'tr' and 'length' would probably help you.
    >>

    [snip]
    >>
    >> Divide one by the other, multiply by 100 and you have the percent.

    >
    >I don't think that you understand my question.
    >
    >Suppose that I have a string $str which is the concatenation of $str1,
    >$str2 and $str3, where both $str1 and $str3 have less than 50% of
    >[A-D] and $str2 has more than 50% of [A-D].
    >
    >I need to discover from $str where $str2 starts and ends. I don't
    >see how tr and length alone can address this question.


    50% of what? Without boundary conditions, the type of regex solution
    you're thinking of is impossible.

    The way you state your problem, [A-D] can occur anywhere, either
    in sequence or between [^A-D] characters.

    The only thing you state as known is the total length of the
    random-length strings after concatenation, and the over/under 50%
    content of each before it.

    You can slide a regex frame over the final string, but there is not enough
    information about boundary conditions to get real information.
    There are just more unknowns than there are equations.

    For instance,
    - if the length of each substring were the same, it could be
    solved, but that approach would not need a regex.
    - if the [A-D] characters were adjacent, the start/end still could not be
    determined; you would only know that this match of > 50% lies inside
    the substring that needs to be found, with no begin/end information
    about it.
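
    The equal-length case in the first bullet can be sketched without any
    regex, assuming the common piece length is known in advance. The
    sample string and the length of 5 are made up for illustration.

```perl
use strict;
use warnings;

my $str = "xyAzqABCDAzzBxy";   # hypothetical: three pieces of 5 characters
my $n   = 5;                   # assumed known, equal piece length

# Cut the string into consecutive $n-character pieces.
my @pieces = unpack "(a$n)*", $str;

my @pct;
for my $i (0 .. $#pieces) {
    my $count = ($pieces[$i] =~ tr/A-D//);           # characters in A-D
    $pct[$i] = 100 * $count / length($pieces[$i]);
    printf "piece %d at offset %d (%s): %.0f%% in A-D\n",
        $i, $i * $n, $pieces[$i], $pct[$i];
}
# The middle piece is 100% A-D; the outer two are 20% each.
```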

    I think it was a nice try though, futile, but nice.

    -sln
    , Mar 3, 2012
    #6
