Perl join on a non-printable character held in a variable?

Jack

Hi, I am trying to join on decimal 28, or \034, the non-printable field
delimiter. I can't get this to work. Any help would be great.

assume $delimiter is passed in as \034

works:
@temparray = split(/$delimiter/, $_);
@temparray = split(/\034/, $_);
print OUTFILE1 join("\034",$rowcounter,$_)."\n";

this doesnt work :
print OUTFILE1 join("$delimiter",$rowcounter,$_)."\n";
print OUTFILE1 join("\$delimiter",$rowcounter,$_)."\n";
print OUTFILE1 join($delimiter,$rowcounter,$_)."\n";
print OUTFILE1 join($$delimiter,$rowcounter,$_)."\n";
or any variation of above ...
 
Mintcake

Hi I am trying to join on decimal 28, or \034 the non printable field
delimiter. I cant get this to work.. any help would be great ..

assume $delimiter is passed in as \034

works:
@temparray = split(/$delimiter/, $_);
@temparray = split(/\034/, $_);
print OUTFILE1 join("\034",$rowcounter,$_)."\n";

this doesnt work :
print OUTFILE1 join("$delimiter",$rowcounter,$_)."\n";
print OUTFILE1 join("\$delimiter",$rowcounter,$_)."\n";
print OUTFILE1 join($delimiter,$rowcounter,$_)."\n";
print OUTFILE1 join($$delimiter,$rowcounter,$_)."\n";
or any variation of above ...

So, in the bit that allegedly works, the value of $delimiter (passed
in but we don't see how) is totally ignored. I'd like to see the sub
call and sub declaration.
 
Jack

So, in the bit that allegedly works, the value of $delimiter (passed
in but we don't see how) is totally ignored. I'd like to see the sub
call and sub declaration.

Hey, does anyone know how to do the join on the non-printable \034?
Thanks in advance
 
John W. Krahn

Jack said:
Hi I am trying to join on decimal 28, or \034 the non printable field
delimiter. I cant get this to work.. any help would be great ..

assume $delimiter is passed in as \034

Do you really mean \034? \034 is a reference to a literal number.

$ perl -MData::Dumper -le'$x = \034; print Dumper $x'
$VAR1 = \28;

"\034" is a single character with an ord() value of 28 (034 octal.)

$ perl -MData::Dumper -le'$x = "\034"; print Dumper $x'
$VAR1 = '';

works:
@temparray = split(/$delimiter/, $_);
@temparray = split(/\034/, $_);
print OUTFILE1 join("\034",$rowcounter,$_)."\n";

this doesnt work :
print OUTFILE1 join("$delimiter",$rowcounter,$_)."\n";
print OUTFILE1 join("\$delimiter",$rowcounter,$_)."\n";
print OUTFILE1 join($delimiter,$rowcounter,$_)."\n";
print OUTFILE1 join($$delimiter,$rowcounter,$_)."\n";
or any variation of above ...

Does this work?

print OUTFILE1 "$rowcounter$delimiter$_\n";



John
 
Greg Bacon

: Hi I am trying to join on decimal 28, or \034 the non printable field
: delimiter. I cant get this to work.. any help would be great ..
:
: assume $delimiter is passed in as \034
:
: works:
: @temparray = split(/$delimiter/, $_);
: @temparray = split(/\034/, $_);
: print OUTFILE1 join("\034",$rowcounter,$_)."\n";
:
: this doesnt work :
: print OUTFILE1 join("$delimiter",$rowcounter,$_)."\n";
: print OUTFILE1 join("\$delimiter",$rowcounter,$_)."\n";
: print OUTFILE1 join($delimiter,$rowcounter,$_)."\n";
: print OUTFILE1 join($$delimiter,$rowcounter,$_)."\n";
: or any variation of above ...

Be more specific than "doesn't work." Tell us what you expected
and how the result failed your expectation.

The output below looks sensible:

$ cat try
#! /usr/bin/perl

use warnings;
use strict;

my $rowcounter = '$rowcounter';
$_ = '$_';

open my $od, "|-", "od -c" or die "$0: fork: $!";

my $separator = "\034";
print $od join($separator,$rowcounter,$_)."\n";

$ ./try
0000000 $ r o w c o u n t e r 034 $ _ \n
0000017

$ perl -v

This is perl, v5.8.8 [...]

Greg
 
Brian McCauley

Do you really mean \034? \034 is a reference to a literal number.
"\034" is a single character with an ord() value of 28 (034 octal.)

Using PSI::ESP I suspect the OP really meant he's doing
$delimiter='\034' which is a literal backslash character followed by a
literal zero character and so on.

This thread is a perfect example of why the posting guidelines suggest
a _minimal_ but _complete_ script to illustrate what you are talking
about.
 
Brian McCauley

Hi I am trying to join on decimal 28, or \034 the non printable field
delimiter. I cant get this to work.. any help would be great ..

assume $delimiter is passed in as \034

See responses elsewhere in this thread. Never ask us to assume. Show
us actual code.
works:
@temparray = split(/$delimiter/, $_);
this doesnt work :
print OUTFILE1 join($delimiter,$rowcounter,$_)."\n";

I suggest you change split(/$delimiter/, $_) to split(/\Q$delimiter/, $_).

Then neither will work! But they would work if you'd put the right
value in $delimiter in the first place.
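
To see the effect of \Q for yourself, here is a small sketch. The sample
line and the variable names ($literal, $real) are made up for illustration;
only the split calls mirror the ones quoted above.

```perl
use strict;
use warnings;

# Hypothetical data: two fields separated by the real FS character (octal 034).
my $line = "ID\034TERM";

my $literal = '\034';   # four characters: backslash, 0, 3, 4
my $real    = "\034";   # one character with ord() 28

# Unquoted: the regex engine reads the four characters as the escape \034.
my @loose  = split /$literal/, $line;
# Quoted with \Q: the four characters are matched literally, so nothing splits.
my @strict = split /\Q$literal/, $line;
# The real character splits the line even when quoted.
my @fixed  = split /\Q$real/, $line;

printf "unquoted: %d, quoted: %d, real character: %d\n",
    scalar @loose, scalar @strict, scalar @fixed;
```

The first count only comes out right by accident of regex escape handling,
which is why join() (which does no such processing) behaves differently.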

Actually, come to think of it, neither would have worked in your
original code if, as you claim, delimiter really contained \034 but
I'm guessing it really contained '\034'. What it _should_ contain is
"\034".

Just to re-iterate (because I think I may have been unclear
elsewhere), the following are all valid Perl but are very different
statements.

$delimiter=\034; # What you appear to claim you did
$delimiter='\034'; # What you probably actually did
$delimiter="\034"; # What you should have done

By saying in English "$delimiter is passed in as \034" you are not
telling us what you've actually done.
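
A quick way to check which of the three you actually have is to look at
ref(), length() and ord(). This is only an illustrative sketch; the variable
names are invented here.

```perl
use strict;
use warnings;

my $as_ref    = \034;     # reference to the number 28 (034 is an octal literal)
my $as_single = '\034';   # four characters, no escape processing
my $as_double = "\034";   # one character, the escape is processed

print "ref:    ", ref($as_ref), " -> ", $$as_ref, "\n";
print "single: length ", length($as_single), "\n";
print "double: length ", length($as_double), ", ord ", ord($as_double), "\n";
```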

You could have saved us all a lot of trouble by consulting the posting
guidelines before you post.
 
benwbrewster

On a related note - I'm not convinced that \034 is the correct character
to use. The OP mentioned using it as a field separator, and my Unicode
reference lists 034-037 as file, group, record, and unit separators,
respectively.

sherm--

ok sorry for not being specific - tried suggestions above and none
worked here is code. my fault for not starting with this - command:
perl e:\add_uniquekeyfield.pl e:\tmp\file1.txt e:\tmp\file2.txt \034
if (@ARGV[0] eq undef) {
$filename="no source filename" ;
} else {
$filename=@ARGV[0];
}

if (@ARGV[1] eq undef) {
$outfilename="no dest filename" ;
} else {
$outfilename=@ARGV[1];
}
if (@ARGV[2] eq undef) {
$delimiter="no dest filename" ;
} else {
$delimiter=@ARGV[2];
}
open(OUTFILE1,">$outfilename")|| die 'ERROR : external table not
found :'.$outfilename."\n";
open(SOURCE,$filename) || die 'ERROR : external table not found :'.
$filename."\n";
while (<SOURCE>) {
chomp; #remove the newline from the line
$rowcounter++;
print OUTFILE1 join("$delimiter",$rowcounter,$_)."\n";
@temparray = split(/$delimiter/, $_); # works for life of me cant figure out why join doesnt work, shoot me now:)
# print OUTFILE1 "$rowcounter$delimiter$_\n";
} #end while
close SOURCE;
close OUTFILE1;
 
Jack

On a related note - I'm not convinced that \034 is the correct character
to use. The OP mentioned using it as a field separator, and my Unicode
reference lists 034-037 as file, group, record, and unit separators,
respectively.

sherm--

sorry all, tried everything no luck here is the code. output should
be : 1IDTERM or put otherwise-
rowcounter[nonprintabledelimiter]field1[nonprintabledelimiter]field2
which is visible in the text file perl e:\add_uniquekeyfield.pl e:\tmp
\file1.txt e:\tmp\file2.txt \034

if (@ARGV[0] eq undef) {
$filename="no source filename" ;
} else {
$filename=@ARGV[0];
}

if (@ARGV[1] eq undef) {
$outfilename="no dest filename" ;
} else {
$outfilename=@ARGV[1];
}
if (@ARGV[2] eq undef) {
$delimiter="no dest filename" ;
} else {
$delimiter=@ARGV[2];
}

open(OUTFILE1,">$outfilename")|| die 'ERROR : external table not
found :'.$outfilename."\n";
open(SOURCE,$filename) || die 'ERROR : external table not found :'.
$filename."\n";

while (<SOURCE>) {
chomp; #remove the newline from the line
if (length($_) == 1 or length($_) == 0) { next; }; # skip the row
# @temparray = split(/$delimiter/, $_); # works !!
$rowcounter++;
print OUTFILE1 join("$delimiter",$rowcounter,$_)."\n"; # doesnt work shoot me now !
print OUTFILE1 "$rowcounter$delimiter$_\n"; # no luck !
} #end while
# print $#temparray;

close SOURCE;
close OUTFILE1;
close OUTFILE2;
 
John W. Krahn

Jack said:
On a related note - I'm not convinced that \034 is the correct character
to use. The OP mentioned using it as a field separator, and my Unicode
reference lists 034-037 as file, group, record, and unit separators,
respectively.

sorry all, tried everything no luck here is the code. output should
be : 1IDTERM or put otherwise-
rowcounter[nonprintabledelimiter]field1[nonprintabledelimiter]field2
which is visible in the text file perl e:\add_uniquekeyfield.pl e:\tmp
\file1.txt e:\tmp\file2.txt \034

You should put these two lines at the beginning of your Perl program to let
perl help you find mistakes:

use warnings;
use strict;

if (@ARGV[0] eq undef) {

With warnings enabled, perl would have told you that @ARGV[0] is better
written as $ARGV[0]. Also, you can't compare against undef like that; you want
to use the defined function instead:

if ( defined $ARGV[0] ) {

Or:

if ( not defined $ARGV[0] ) {

$filename="no source filename" ;
} else {
$filename=@ARGV[0];
}

if (@ARGV[1] eq undef) {
$outfilename="no dest filename" ;
} else {
$outfilename=@ARGV[1];
}
if (@ARGV[2] eq undef) {
$delimiter="no dest filename" ;
} else {
$delimiter=@ARGV[2];
}

You are passing the string '\034' to your program from the command line so
your $delimiter variable will contain the string consisting of the four
characters '\', '0', '3' and '4'.

open(OUTFILE1,">$outfilename")|| die 'ERROR : external table not
found :'.$outfilename."\n";
open(SOURCE,$filename) || die 'ERROR : external table not found :'.
$filename."\n";

You should include the $! variable in the error message so you know *why* it
failed.
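
Something along these lines would do it. This is a sketch, not the OP's
code: the helper name open_source and the nonexistent path are made up, and
it uses a lexical filehandle with three-argument open instead of the
bareword SOURCE.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading and report the operating
# system's reason ($!) if that fails.
sub open_source {
    my ($filename) = @_;
    open(my $fh, '<', $filename)
        or die "ERROR: cannot open '$filename' for reading: $!\n";
    return $fh;
}

# Demonstrate the message with a path that should not exist.
my $fh = eval { open_source('no-such-dir/no-such-file.txt') };
print "open failed as expected: $@" unless $fh;
```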

while (<SOURCE>) {
chomp; #remove the newline from the line
if (length($_) == 1 or length($_) == 0) { next; }; # skip the row
# @temparray = split(/$delimiter/, $_); # works !!

When used in a regular expression, the string '\034' is processed by the
string interpolator *and* then by the regular expression engine, which reads
it as an octal escape, so it "works".

$ perl -le' $x = q[\034]; $test = qq[ one \034 two \\034 three ]; print for
split /$x/, $test'
one
two \034 three

$rowcounter++;
print OUTFILE1 join("$delimiter",$rowcounter,$_)."\n"; # doesnt work shoot me now !

When interpolated only by the string interpolator it is just a string.

$ perl -le' $x = q[\034]; print qq[ zero $x one \034 two \\034 three ];'
zero \034 one two \034 three

print OUTFILE1 "$rowcounter$delimiter$_\n"; # no luck !
} #end while
# print $#temparray;

close SOURCE;
close OUTFILE1;
close OUTFILE2;



John
 
Mintcake

On a related note - I'm not convinced that \034 is the correct character
to use. The OP mentioned using it as a field separator, and my Unicode
reference lists 034-037 as file, group, record, and unit separators,
respectively.

sorry all, tried everything no luck here is the code. output should
be : 1 ID TERM or put otherwise-
rowcounter[nonprintabledelimiter]field1[nonprintabledelimiter]field2
which is visible in the text file perl e:\add_uniquekeyfield.pl e:\tmp
\file1.txt e:\tmp\file2.txt \034

if (@ARGV[0] eq undef) {
$filename="no source filename" ;} else {

$filename=@ARGV[0];

}

if (@ARGV[1] eq undef) {
$outfilename="no dest filename" ;} else {

$outfilename=@ARGV[1];}

if (@ARGV[2] eq undef) {
$delimiter="no dest filename" ;} else {

$delimiter=@ARGV[2];

}

open(OUTFILE1,">$outfilename")|| die 'ERROR : external table not
found :'.$outfilename."\n";
open(SOURCE,$filename) || die 'ERROR : external table not found :'.
$filename."\n";

while (<SOURCE>) {
chomp; #remove the newline from the line
if (length($_) == 1 or length($_) == 0) { next; }; # skip the row
# @temparray = split(/$delimiter/, $_); # works !!
$rowcounter++;
print OUTFILE1 join("$delimiter",$rowcounter,$_)."\n"; # doesnt work shoot me now !
print OUTFILE1 "$rowcounter$delimiter$_\n"; # no luck !
} #end while

# print $#temparray;

close SOURCE;
close OUTFILE1;
close OUTFILE2;

At last you've given enough information to indicate the problem, although
several of the replies posted had already more or less identified it.

The three bits of useful information are:
1. 'passed in' means passed in from the command line
2. You're running on Windows (on *nix it would be different)
3. The value on the command line is a plain \034

The result is that $delimiter will be set to a 4-byte character string
comprising the characters '\', '0', '3', '4' rather than a single-byte
string with the octal value 34.

You can check this yourself. On the command line type:

E:\perl -le "print $ARGV[0]" \034
\034

I'm not sure what you'd have to enter on the Windows command line to
make it work. It may not even be possible. I'm not a great lover of
the Windows CLI.

If you really want the user to be able to specify the octal value of
the delimiter on the command line do something like...

$delimiter = chr oct $ARGV[2];

You can check this out on the command line, e.g.

E:\perl -le "print ord chr oct $ARGV[0]" 34
28

One other observation - did you really want the default value of the
delimiter to be "no dest filename"?
 
Ron Bergin

sorry all, tried everything no luck here is the code. output should
be : 1 ID TERM or put otherwise-
rowcounter[nonprintabledelimiter]field1[nonprintabledelimiter]field2
which is visible in the text file perl e:\add_uniquekeyfield.pl e:\tmp
\file1.txt e:\tmp\file2.txt \034

if (@ARGV[0] eq undef) {
$filename="no source filename" ;} else {

$filename=@ARGV[0];

}

if (@ARGV[1] eq undef) {
$outfilename="no dest filename" ;} else {

$outfilename=@ARGV[1];}

if (@ARGV[2] eq undef) {
$delimiter="no dest filename" ;} else {

$delimiter=@ARGV[2];

}

open(OUTFILE1,">$outfilename")|| die 'ERROR : external table not
found :'.$outfilename."\n";
open(SOURCE,$filename) || die 'ERROR : external table not found :'.
$filename."\n";

while (<SOURCE>) {
chomp; #remove the newline from the line
if (length($_) == 1 or length($_) == 0) { next; }; # skip the row
# @temparray = split(/$delimiter/, $_); # works !!
$rowcounter++;
print OUTFILE1 join("$delimiter",$rowcounter,$_)."\n"; # doesnt work shoot me now !
print OUTFILE1 "$rowcounter$delimiter$_\n"; # no luck !
} #end while

# print $#temparray;

close SOURCE;
close OUTFILE1;
close OUTFILE2;

Several others have pointed out the main reasons why your script
fails, but I'll point out 1 or 2 other issues.

Your script requires 3 parameters, which you test for and assign a
default value, but "no dest filename" is a poor default to assign to
each param.

Even though 1 or more of the required params are missing, you still
proceed with the script. You really should exit with a usage
statement if any one of the required params is missing. You could do
something as simple as this:

if ( @ARGV != 3 || $ARGV[2] =~ /\D/ ) {
die "USAGE $0 [source file] [dest file] [delimiter]\n";
}

A better option would be to use the Getopt::Long or Getopt::Simple
module, in part, so that you don't need to worry about the order in
which the params are passed.
http://search.cpan.org/~rsavage/Getopt-Simple-1.48/lib/Getopt/Simple.pm
http://search.cpan.org/author/JV/Getopt-Long-2.37/lib/Getopt/Long.pm
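
A rough sketch of the Getopt::Long route, in case it helps. The option names
--in, --out and --delim and the helper parse_args are assumptions made for
this example, and the delimiter is assumed to be given as octal digits
(e.g. 34), as suggested elsewhere in the thread.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: returns a hash reference of options, or nothing if
# the arguments are unusable.
sub parse_args {
    my @args = @_;
    my %opt;
    GetOptionsFromArray(\@args, \%opt, 'in=s', 'out=s', 'delim=s')
        or return;
    # All three options are required.
    for my $name (qw(in out delim)) {
        return unless defined $opt{$name};
    }
    # The delimiter must consist of octal digits only.
    return unless $opt{delim} =~ /\A[0-7]+\z/;
    return \%opt;
}

my $opt = parse_args(@ARGV)
    or print "USAGE: $0 --in FILE --out FILE --delim OCTAL\n";
```

With named options the order on the command line no longer matters, and a
missing argument produces the usage line instead of a bogus default.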
 
Peter J. Holzer

Jack said:
Hi I am trying to join on decimal 28, or \034 the non printable field
delimiter. I cant get this to work.. any help would be great .. [...]
this doesnt work :
print OUTFILE1 join("$delimiter",$rowcounter,$_)."\n";
print OUTFILE1 join("\$delimiter",$rowcounter,$_)."\n";
print OUTFILE1 join($delimiter,$rowcounter,$_)."\n";
print OUTFILE1 join($$delimiter,$rowcounter,$_)."\n";
or any variation of above ...

If you try it on Windows then you must use
binmode OUTFILE1;
before you write some to it.
If you use Linux or Mac then ignore me ;-)

Ignore him anyway. binmode is for *binary* files, or, more specifically,
files which don't have a line structure. The file being written looks
like it does consist of lines, and on Windows one probably wants Windows
line endings, not bare LFs, so one should not use binmode here.

hp
 
Brian McCauley

You are passing the string '\034' to your program from the command line so
your $delimiter variable will contain the string consisting of the four
characters '\', '0', '3' and '4'.

So, the OP is probably really trying to get around to asking...

FAQ: How do I unescape a string?
 
Brian McCauley

So, the OP is probably really trying to get around to asking...

FAQ: How do I unescape a string?

Look you babbling moron, just because the OP is _asking_ a FAQ, it
doesn't mean the answer there is really going to help. The answer in
the FAQ only deals with normal character escapes and shies away from
the general case because it's just too complicated.

Next time, you[1] may like to look at what you're pointing people to
before crying RTFFAQ.

The answer given in the FAQ "How can I expand variables in text
strings?" is actually closer.

Be aware, however, that the FAQ answer to "How can I expand variables
in text strings?" is full of half-truths and misdirection. See
numerous previous threads on the subject "How can I expand variables
in text strings?" for details.


[1] Er, "I"
 
Brian McCauley

FAQ: How do I unescape a string?

Next time, you[1] may like to look at what you're pointing people to
before crying RTFFAQ.

OK, OK! How's about...

s/(?<!\\)(?:\\)*(["\@\$])/\\$1/g; # Prevent interpolation
$_ = eval qq{"$_"};
die $@ if $@;

Now, that would really be the right answer to have in the FAQ.

Unless, that is, anyone can see a way to circumvent the check for
characters that may give access to more than just escapes.

Are there any \X{....} constructs that could be dangerous?
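
For comparison, an eval-free alternative is to translate only a fixed set of
escapes with a substitution. This is a different technique from the one
above, sketched under the assumption that octal (\034), two-digit hex (\x1c)
and a few named escapes are all that is needed; the function name unescape
and the %named table are invented for the example.

```perl
use strict;
use warnings;

# Named escapes this sketch chooses to support; the list is an assumption,
# not the full set Perl itself understands.
my %named = (n => "\n", t => "\t", r => "\r", '\\' => '\\');

sub unescape {
    my ($text) = @_;
    # After a backslash, try in order: up to three octal digits, then "x"
    # plus one or two hex digits, then any single character.
    $text =~ s{
        \\ (?: ([0-7]{1,3}) | x ([0-9A-Fa-f]{1,2}) | (.) )
    }{
        defined $1 ? chr(oct($1))
      : defined $2 ? chr(hex($2))
      : exists $named{$3} ? $named{$3} : $3
    }gexs;
    return $text;
}

print ord(unescape('\034')), "\n";
```

Because nothing is ever executed, there is no need to worry about which
characters might give access to more than just escapes; the price is that
only the listed forms are understood.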
 
brian d foy

Be aware, however, that the FAQ answer to "How can I expand variables
in text strings?" is full of half-truths and misdirection. See
numerous previous threads on the subject "How can I expand variables
in text strings?" for details.

I thought we'd fixed the problems you had. Now you're calling me a
liar? What's the problem now?
 
Jack

I thought we'd fixed the problems you had. Now you're calling me a
liar? What's the problem now?

Thanks but still dont have an answer to the question - thanks for all
the other pointers, but how do I change my code to join on the
argument I passed in - specifically on the join call if possible.

Thanks in advance, Jack
 
Mumia W.

Thanks but still dont have an answer to the question - thanks for all
the other pointers, but how do I change my code to join on the
argument I passed in - specifically on the join call if possible.

Thanks in advance, Jack

The delimiter needs to be converted before use. If you don't mind
cheating, you could do this:

$delimiter = eval("\"$delimiter\"");

(Mumia ducks)

Okay, sorry about that. "Eval" is evil I know. You could also do this:

$delimiter =~ s/^\\//;
$delimiter = chr oct $delimiter;

However, that can't deal with delimiters specified in hex on the command
line.
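
Putting the non-eval variant into the OP's loop might look roughly like
this. It is a sketch under assumptions: delimiter_from_arg is a made-up
helper, the literal '\034' stands in for $ARGV[2], and the rows come from an
in-memory filehandle so the example runs without any files.

```perl
use strict;
use warnings;

# Hypothetical helper: accept "\034", "034" or "34" and return the one
# character it names. Anything else is rejected instead of guessed at.
sub delimiter_from_arg {
    my ($arg) = @_;
    (my $digits = $arg) =~ s/^\\//;
    die "delimiter must be octal digits, got '$arg'\n"
        unless $digits =~ /\A[0-7]{1,3}\z/;
    return chr(oct($digits));
}

my $delimiter = delimiter_from_arg('\034');   # stands in for $ARGV[2]

# Sample rows read from an in-memory filehandle instead of a real file.
my $input = "ID\nTERM\n";
open(my $source, '<', \$input) or die "cannot open in-memory input: $!\n";

my $rowcounter = 0;
my @output;
while (my $row = <$source>) {
    chomp $row;
    next if length($row) <= 1;    # same skip rule as the original loop
    $rowcounter++;
    push @output, join($delimiter, $rowcounter, $row);
}
close $source;

# Replace the invisible character so the result is readable on a terminal.
(my $visible = join("\n", @output)) =~ s/\034/<FS>/g;
print "$visible\n";
```

The join call itself is unchanged from the original; the only real
difference is that $delimiter holds one character by the time it is used.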
 
Brian McCauley

I thought we'd fixed the problems you had.

No, there were two parts. There was the big issue, on which, I thought
we'd agreed to differ and there were a number of smaller issues about
minor details which you have fixed in the version that'll ship with
5.10.
What's the problem now?

OK this is also partly an issue of tense when we speak of "now". When
I speak of "The Perl FAQ" in the present tense (without further
qualification) I refer to the FAQ as released with the current
x.even.x release of Perl.

http://search.cpan.org/~rgarcia/perl/pod/perlfaq4.pod#How_can_I_expand_variables_in_text_strings?

This is much, much better than it used to be in that it fixes the
mistakes in what it was trying to say.

But it is still a "half truth" in so far as it doesn't show people the
simple, obvious (and very dangerous) solution using eval(STRING) and a
here-doc.

In my experience, when most people come to the FAQ with a question
for which the closest approximation in the FAQ is "How can I expand
variables in text strings?" the question that's actually in their mind
is "How do I evaluate the contents of a scalar variable as if it were
a double-quotish string in Perl source?". The question the FAQ chooses
to answer is "How do I implement string templates in Perl?". In some
sense this is the right question to answer because the reader could
quite possibly be in an X-Y situation.

However, since the question that's actually Frequently Asked is "How do
I evaluate the contents of a scalar variable as if it were a
double-quotish string in Perl source?" and since there exists a simple answer
to that question, IMNSHO it's somewhat intellectually dishonest not
to mention that answer.

[ slightly out of the original context, brian asked ... ]
Now you're calling me a liar?

I consider it unhelpful to characterise my opinion that the answer in
5.9.5 remains somewhat intellectually dishonest with emotive terms
like "calling you a liar". But if you choose to see it that way then I
suppose I am.

Some people (maybe even you) have argued that the FAQ should not
include the simple answer because executing data is fraught with
dangers. The problem with this is, as I've said so many times before,
that when people come up with (or otherwise come across) the simple
solution they'll typically:

1) Not figure out the here-doc bit so get an inferior eval-based
solution.
2) Lose trust in the FAQ and/or feel insulted.
3) Not realise just how dangerous the simple eval-based solution can
be.

I stand by the position I've held consistently for many years. The FAQ
should mention, at least by reference, the eval-based templating
solution so that people can see it, be made fully aware of the danger
and make an informed choice whether or not to use it.
 
