Finding non-printable characters using Regular Expressions

Michael W. Ryder

As part of a method I am playing with while learning Ruby, I need to be
able to determine which characters in a string are non-printable. What
is the "best" way to determine whether a character is printable, such
as an "A", or unprintable, such as a tab?
I could create a list of printable characters using ranges, but is that
the best way to do this?
 
Alex Young

The POSIX character classes are for exactly this:

irb(main):001:0> "A \n B \t C".gsub(/[[:graph:]]/, '')
=> " \n  \t "
irb(main):002:0> "A \n B \t C".gsub(/[[:print:]]/, '')
=> "\n\t"
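
As a small sketch of how these classes answer the original question (which characters in a string are non-printable), the snippet below lists the offending characters and their positions. The helper name `non_printable_positions` and the variable `sample` are made up for illustration and are not from the thread.

```ruby
# Hypothetical helper (not from the thread): returns [index, char] pairs
# for every character that [[:print:]] does not match.
def non_printable_positions(str)
  pairs = str.each_char.with_index.select { |ch, _i| ch =~ /[^[:print:]]/ }
  pairs.map { |ch, i| [i, ch] }
end

sample = "A \n B \t C"

# scan with the negated class collects the non-printable characters.
p sample.scan(/[^[:print:]]/)      # => ["\n", "\t"]

# Each non-printable character paired with its zero-based position.
p non_printable_positions(sample)  # => [[2, "\n"], [6, "\t"]]
```

Using `scan` rather than `gsub` leaves the original string untouched, which may be preferable when the goal is only to inspect it.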
 
Michael W. Ryder

This is very close to what I am looking for. If I use
"A \n B \t C".gsub(/[^[:graph:]]/, '')
it returns "ABC", but I need to keep the spaces, and I have not been able
to figure out how to include them in the output so that it shows "A B C".
Thank you for your assistance. It has given me a starting point, and I
will have to spend some time experimenting and researching to reach the
final step.
 
Alex Young

You're nearly there. Look a little closer at my suggestion,
particularly the second regex.
 
Suraj Kurapati

Michael said:
"A \n B \t C".gsub(/[^[:graph:]]/, '')

I need to keep the spaces and have not been able to figure
out how to include them in the output so that it shows "A B C".

Hint: examine the second parameter of String#gsub
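
To make the hint concrete: the second argument to `String#gsub` is the replacement, so it does not have to be the empty string, and it can also be dropped in favour of a block that computes a replacement for each match. A minimal sketch, with `sample` as an illustrative variable name:

```ruby
sample = "A \n B \t C"

# Empty replacement: every non-graphic character (spaces included) is deleted.
p sample.gsub(/[^[:graph:]]/, '')     # => "ABC"

# Each run of non-graphic characters replaced by a single space.
p sample.gsub(/[^[:graph:]]+/, ' ')   # => "A B C"

# Block form: show each non-printable character in escaped form instead.
p sample.gsub(/[^[:print:]]/) { |ch| ch.inspect[1..-2] }   # => "A \\n B \\t C"
```

The `+` in the second call matters: without it, each space, newline, and tab would be replaced separately, leaving three spaces between the letters.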
 
Michael W. Ryder

Thank you very much for your assistance. Using
"A \n B \t C".gsub(/[^[:print:]]/, '')
gives me "A B C", which is what I was looking for.
Can you recommend a good reference on regular expressions so I can learn
more?
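
For anyone following along, the working expression can be wrapped up roughly as below. This is only a sketch: `strip_non_printable` is a hypothetical name, and the `squeeze` step is an optional extra that does not appear in the posts above. Deleting only the newline and the tab leaves the spaces on both sides of each, so the exact result keeps doubled spaces unless they are squeezed.

```ruby
# Hypothetical helper: delete every character that [[:print:]] does not match.
def strip_non_printable(str)
  str.gsub(/[^[:print:]]/, '')
end

sample  = "A \n B \t C"
cleaned = strip_non_printable(sample)

p cleaned               # => "A  B  C" (spaces around each removed character remain)
p cleaned.squeeze(' ')  # => "A B C"
```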
 
John Joyce

THE book on RegEx is "Mastering Regular Expressions" from O'Reilly.
It is a bit Perl-focused in the examples, but the book itself is all
about regular expressions in use.

 
Michael W. Ryder

I will get a copy of the book, as trying to find this information on the
web is very time-consuming and hit-or-miss. Thank you for the suggestion.

 
