Best way to search for a string which has N% in a character class?

Peng Yu · Mar 2, 2012

Hi,

Suppose that I want to search for a substring in which, say, 50% of the
letters are in a character class, say [A-D]. Note that there is some
ambiguity at the two ends of the substring, but other than that the
problem is well defined.

It seems that this problem cannot (or cannot easily; please let me
know if there is a way) be formulated as a regex. Since Perl is strong
at string processing, I think there might be a good way to search
for such strings in Perl. Does anybody know a good way to search for
this type of substring?

Regards,
Peng

J. Gleixner · Mar 2, 2012

What have you tried?

Using 'tr' and 'length' would probably help you.

From perldoc perlop:

y/SEARCHLIST/REPLACEMENTLIST/cds
[...]Transliterates all occurrences of the characters found in the
search list with the corresponding character in the replacement list.
It returns the number of characters replaced or deleted.

Using that you can get the number of characters in the class.
e.g. $cnt = tr/[A-D]/[A-D]/;

Using 'length' you can find how many characters are in the string.

perldoc -f length

Divide one by the other, multiply by 100 and you have the percent.

Tim McDaniel · Mar 2, 2012

"man perlop" continues

Note that "tr" does not do regular expression character classes
such as "\d" or "[:lower:]". The "tr" operator is not equivalent
to the tr(1) utility. If you want to map strings between
lower/upper cases, see "lc" in perlfunc and "uc" in perlfunc, and
in general consider using the "s" operator if you need regular
expressions.

The expression
tr/[A-D]/[A-D]/;
will translate [ to [ and ] to ], so the brackets will be included in
the count. A-D works because tr treats a hyphen between two characters
as a range. Also,

If the "/d" modifier is used, the REPLACEMENTLIST is always
interpreted exactly as specified. Otherwise, if the
REPLACEMENTLIST is shorter than the SEARCHLIST, the final
character is replicated till it is long enough. If the
REPLACEMENTLIST is empty, the SEARCHLIST is replicated. This
latter is useful for counting characters in a class or for
squashing character sequences in a class.

So if you really want a range of characters like A through D,
tr/A-D//
works. If you want a regex character class such as \d or [[:alpha:]],
you need a regex operator such as s/// instead.
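To illustrate the difference between the two tr forms, here is a minimal sketch. The sample string and the helper name percent_in_class are made up for this example. The last line counts a regex class with a list-context match rather than s///, which is an alternative to what the post suggests and does not modify the string.

```perl
use strict;
use warnings;

my $str = 'AB[xy]CD';

# In tr, '[' and ']' are ordinary characters, so they are counted as well.
my $with_brackets = ($str =~ tr/[A-D]/[A-D]/);   # counts A, B, [, ], C, D
# An empty replacement list leaves $str unchanged and just counts A..D.
my $letters_only  = ($str =~ tr/A-D//);

# percent_in_class is a hypothetical helper name, not from the thread.
sub percent_in_class {
    my ($s) = @_;
    return 0 unless length($s);          # avoid dividing by zero
    return 100 * ($s =~ tr/A-D//) / length($s);
}

# For a regex class such as \d, count the matches in list context.
my $digits = () = 'a1b22c' =~ /\d/g;

printf "%d %d %.0f%% %d\n",
    $with_brackets, $letters_only, percent_in_class($str), $digits;
```

With this sample string the bracketed form reports 6 and the plain range reports 4, so the percentage comes out as 50 only when the brackets are left out of the search list.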

J. Gleixner · Mar 2, 2012

Thanks for the correction.

Peng Yu · Mar 2, 2012

I don't think that you understand my question.

Suppose that I have a string $str which is the concatenation of $str1,
$str2 and $str3, where both $str1 and $str3 have less than 50% of their
characters in [A-D] and $str2 has more than 50% in [A-D].

I need to discover from $str where $str2 starts and ends. I don't
see how tr and length alone can address this question.
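One way to make the question concrete, offered only as a sketch: score each character +1 if it is in [A-D] and -1 otherwise, then scan for the substring with the largest total score (a maximum-sum scan, preferring the longer substring on ties). This is an assumed reading of "where $str2 starts and ends", not something the thread settles on. The helper name densest_region and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (offset, length) of the substring that
# maximizes (chars in A-D) minus (other chars), or () if nothing is in A-D.
sub densest_region {
    my ($str) = @_;
    my ($best, $best_start, $best_len) = (0, 0, 0);
    my ($sum, $start) = (0, 0);
    for my $i (0 .. length($str) - 1) {
        my $score = substr($str, $i, 1) =~ /[A-D]/ ? 1 : -1;
        if ($sum <= 0) {
            # A non-positive running total cannot help; restart here.
            ($sum, $start) = (0, $i);
        }
        $sum += $score;
        if ($sum > 0 && $sum >= $best) {
            ($best, $best_start, $best_len) = ($sum, $start, $i - $start + 1);
        }
    }
    return $best_len ? ($best_start, $best_len) : ();
}

# Assumed sample data: three 8-character pieces, the middle one dense in A-D.
my $str = 'xyzxAyzz' . 'ABxCDAyB' . 'zzAxyzzy';
my ($pos, $len) = densest_region($str);
printf "offset %d, length %d: %s\n", $pos, $len, substr($str, $pos, $len);
```

For this sample the scan returns offset 8 and length 8, which is the middle piece. On other inputs the region found can be shorter or longer than the intended $str2, which reflects the ambiguity at the two ends mentioned in the original question.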

sln · Mar 3, 2012

On 03/02/12 10:29, Peng Yu wrote: [snip]

50% of what? Without boundary conditions, the type of regex solution
you're thinking of is impossible.

The way you state your problem, [A-D] characters can occur anywhere,
either in runs or scattered between [^A-D] characters.

The only things you state as known are the total length of the
random-length strings after concatenation and the over/under 50%
content of each before it.

You can slide a regex frame over the final string, but there is not
enough information about boundary conditions to get real information.
There are simply more unknowns than there are equations.

For instance,
- if each substring had the same length, it could be solved, but
that approach would not need a regex.
- if the [A-D] characters were adjacent, the start/end still could not
be determined. You would only know that this > 50% match lies inside
the substring that needs to be found, with no begin/end information
about it.
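The equal-length case in the first bullet can be sketched without a regex frame, assuming the window width is known in advance. The helper name dense_windows, the width of 8 and the 50% threshold are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: offsets of every $width-character window in which
# more than $min_pct percent of the characters are in A-D.
sub dense_windows {
    my ($str, $width, $min_pct) = @_;
    my @hits;
    return @hits if $width < 1 || $width > length($str);

    # Count the first window with tr, then update the count incrementally.
    my $first = substr($str, 0, $width);
    my $count = ($first =~ tr/A-D//);

    for my $pos (0 .. length($str) - $width) {
        push @hits, $pos if 100 * $count / $width > $min_pct;
        last if $pos + $width >= length($str);
        # Slide right: drop the leftmost character, add the next one.
        $count-- if substr($str, $pos, 1) =~ /[A-D]/;
        $count++ if substr($str, $pos + $width, 1) =~ /[A-D]/;
    }
    return @hits;
}

# Assumed sample data: three 8-character pieces, the middle one dense in A-D.
my @offsets = dense_windows('xyzxAyzzABxCDAyBzzAxyzzy', 8, 50);
print "dense windows start at: @offsets\n";
```

Several overlapping windows pass the threshold for this sample, not only the one that starts at offset 8. That is the point made above: a window test shows roughly where the dense region is, but not exactly where it begins and ends.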

I think it was a nice try though, futile, but nice.

-sln
