writing get_script()

Frank Seitz

Franken Sense said:
Somehow I'd like to use perl's pattern matching for input and then use C to
insert the data into a binary tree.

Thanks for your comment.

What a strange approach. Why do you think that you need Perl to
process this simple-structured input data?

Frank
 
Jürgen Exner

Franken Sense said:
How do I write a get_script() for data that look like these:

44:004:037 Having land, sold it, and brought the money, and laid it at
the apostles' feet.

44:005:001 But a certain man named Ananias, with Sapphira his wife, sold
a possession,

44:005:002 And kept back part of the price, his wife also being privy to
it, and brought a certain part, and laid it at the apostles'
feet.

Here's what I'm looking to do:

1) Have the numbers populate $book, $chp, $verse.

split() the line at the first space character, then split() the first
part again into ($book, $chp, $verse).

2) Remove the newline when not at the end of a verse.

May I suggest a different structure for your data?
Change the input_record_separator $/ into two consecutive newlines. Then
the whole verse will be read as a single unit, allowing you to
a) extract the numbers without worrying about if you are looking at a
first line or follow-up line within the same verse
b) blindly remove all newlines from within the verse and add whatever
number you like at its end

3) Have get_script() be an external function that is called by main in C.

Can't comment on that.
My first attempts are usually pretty feeble and/or methodologically flawed.
Somehow I'd like to use perl's pattern matching for input and then use C to
insert the data into a binary tree.

Why? You can build binary trees in Perl, no problem.

jue
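
A minimal sketch of the paragraph-mode suggestion above, for readers following along. The helper name parse_verse() and the in-memory sample are assumptions made for illustration; they are not from anyone's post. It uses $/ = "" rather than "\n\n" so that runs of several blank lines still count as one separator.

```perl
use strict;
use warnings;

# Hypothetical helper: split one paragraph-mode record into its parts.
sub parse_verse {
    my ($record) = @_;
    # Reference first, then the text; /s lets . match the embedded newlines.
    my ($ref, $text) = $record =~ /^\s*(\d+:\d+:\d+)\s+(.*\S)/s
        or return;
    my ($book, $chp, $verse) = split /:/, $ref;
    $text =~ s/\s*\n\s*/ /g;    # fold continuation lines into one line
    return ($book, $chp, $verse, $text);
}

# Sample data copied from the original question.
my $sample = <<'END';
44:005:001 But a certain man named Ananias, with Sapphira his wife, sold
a possession,

44:005:002 And kept back part of the price, his wife also being privy to
it, and brought a certain part, and laid it at the apostles'
feet.
END

open my $in, '<', \$sample or die "cannot open in-memory file: $!";
{
    local $/ = "";    # paragraph mode: records end at one or more blank lines
    while (my $record = <$in>) {
        my ($book, $chp, $verse, $text) = parse_verse($record) or next;
        print "$book|$chp|$verse|$text\n";
    }
}
close $in;
```

The numbers come back as strings, so leading zeros such as 005 survive; add 0 to them if plain integers are wanted.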
 
Franken Sense

How do I write a get_script() for data that look like these:

44:004:037 Having land, sold it, and brought the money, and laid it at
the apostles' feet.

44:005:001 But a certain man named Ananias, with Sapphira his wife, sold
a possession,

44:005:002 And kept back part of the price, his wife also being privy to
it, and brought a certain part, and laid it at the apostles'
feet.

Here's what I'm looking to do:

1) Have the numbers populate $book, $chp, $verse.

2) Remove the newline when not at the end of a verse.

3) Have get_script() be an external function that is called by main in C.

My first attempts are usually pretty feeble and/or methodologically flawed.
Somehow I'd like to use perl's pattern matching for input and then use C to
insert the data into a binary tree.

Thanks for your comment.
--
Frank

...................... o _______________ _,
` Good Morning! , /\_ _| | .-'_|
`................, _\__`[_______________| _| (_|
] [ \, ][ ][ (_|
 
Franken Sense

In Dread Ink, the Grave Hand of Franken Sense Did Inscribe:
My first attempts are usually pretty feeble and/or methodologically flawed.
Somehow I'd like to use perl's pattern matching for input and then use C to
insert the data into a binary tree.

For better or worse, I start where I left off before.

use strict;
use warnings;

# open input file
my $filename = 'text43.txt';
open(my $fh, '<', $filename) or
die "cannot open $filename for reading: $!";

# open output file
my $filename2 = 'outfile10.txt';
open(my $gh, '>', $filename2) or
die "cannot open $filename2 for writing: $!";

# process all lines in input file
while( my $line = <$fh> ) {
chomp($line);
my @s = split /\s+/, $line;

# modify fields
$s[1] =~ s/h//;
$s[2] =~ s/m//;
$s[3] =~ s/s//;
$s[5] =~ s/'//;

# print modified fields
for my $i (0..$#s) {
print "s[$i] = $s[$i]\n";
}

# write modified fields to output file
my $outline = join(' ', @s);
print $gh "$outline\n";
}

# close input and output files
close($gh) or die("Error closing $filename2: $!");
close($fh) or die("Error closing $filename: $!");

__END__

sample output:

44:028:021 And they aid unto him, We neither received letters out of
Judaea concerning thee, neither any of the brethren that came
sewed or pake any harm of thee.

44:028:022 But we deire to hear of thee what thou thinkest: for as
concerning this ect, we know that every where it is spoken
against.

The good news here is that I'm opening and closing files correctly. The
bad news is that these scripts tend to be a few lines long, so I can't
process it line by line and split it up into fields that way.

How do I make a single input record be whatever lies between the
##:###:### markers?
 
Franken Sense

In Dread Ink, the Grave Hand of Frank Seitz Did Inscribe:
What a strange approach. Why do you think that you need Perl to
process this simple-structured input data?

Frank

I frequently find myself wishing I had perl for input. This seems simple
enough for me to solve.

I see you live in Wedel. I used to live in Pinneberg. I miss the
bakeries.
--
Frank

I once asked the most fabulous couple I know, Madonna and Guy Ritchie, how
they kept things fresh despite having been married for almost seven months.
'It's a job, Al,' Guy told me. 'We work at it every day.'
~~ Al Franken,
 
Charlton Wilbur

FS> 1) Have the numbers populate $book, $chp, $verse.

FS> 2) Remove the newline when not at the end of a verse.

FS> 3) Have get_script() be an external function that is called by
FS> main in C.

Because of requirement 3, I'd probably do the whole thing in C. The
pain of the parsing is likely to be considerably less than the pain of
integrating C and Perl.

Charlton
 
ccc31807

Because of requirement 3, I'd probably do the whole thing in C.  The
pain of the parsing is likely to be considerably less than the pain of
integrating C and Perl.

We don't know what will happen to the data. It's obviously going to be
used for some purpose, and the particular purpose will drive the
method of extracting the data. Maybe the OP would be better off using
Perl for the entire app rather than integrating C and Perl.

As to the newline issue, it strikes me that a 'real' newline is one
that appears by itself, "^\n$", and a 'fake' newline is one that
doesn't, "^.+\n$". I'd append each line to the previous line unless
the first character in the line is \n and then process the next set of
lines.
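
The line-appending idea in the paragraph above might look roughly like this. It is only a sketch: join_verses() is a made-up name, and it assumes a blank line is the only "real" boundary between verses.

```perl
use strict;
use warnings;

# Hypothetical helper: join the lines of each verse, treating blank lines
# as the only record boundaries.
sub join_verses {
    my (@lines) = @_;
    my @verses;
    my $current = '';
    for my $line (@lines) {
        chomp(my $copy = $line);
        if ($copy =~ /^\s*$/) {    # blank line: the verse is complete
            push @verses, $current if length $current;
            $current = '';
        }
        else {                     # continuation: append with one space
            $current .= ($current eq '' ? '' : ' ') . $copy;
        }
    }
    push @verses, $current if length $current;    # last verse may lack a blank line
    return @verses;
}

print "$_\n" for join_verses(
    "44:005:018 And laid their hands on the apostles, and put them in the\n",
    "common prison.\n",
    "\n",
    "44:005:020 Go, stand and speak in the temple to the people all the words\n",
    "of this life.\n",
);
```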

I'd be interested in the motivation of inserting scripture into a
binary tree. After all, scripture says 'Cursed is anyone that hangs on
a tree.' (quoted in Galatians, I think.)

CC
 
Charlton Wilbur

cc> We don't know what will happen to the data.

The OP has informed us that it's going to be processed by C code, so we
do, in fact, know what will happen to the data.

cc> Maybe the OP would be better off using Perl for the entire app
cc> rather than integrating C and Perl.

Maybe he would; but that's a separate question.

Charlton
 
Jürgen Exner

Charlton Wilbur said:
cc> We don't know what will happen to the data.

The OP has informed us that it's going to be processed by C code, so we
do, in fact, know what will happen to the data.

And even more: "[...]and then use C to
insert the data into a binary tree."

cc> Maybe the OP would be better off using Perl for the entire app
cc> rather than integrating C and Perl.

Maybe he would; but that's a separate question.

True. However there is no reason why not to build a binary tree in Perl.

jue
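
For what it's worth, here is one way a binary search tree could be built in plain Perl with hash references. The sub names tree_insert() and tree_keys() are invented for this sketch; nothing here comes from a module or from the posters' own code.

```perl
use strict;
use warnings;

# Insert $key below the slot that $slot_ref points at.
# Returns 1 if a node was added, 0 if the key was already present.
sub tree_insert {
    my ($slot_ref, $key) = @_;
    while (defined(my $node = $$slot_ref)) {
        my $cmp = $key cmp $node->{key};
        return 0 if $cmp == 0;    # duplicate: keep the first copy
        $slot_ref = $cmp < 0 ? \$node->{left} : \$node->{right};
    }
    $$slot_ref = { key => $key, left => undef, right => undef };
    return 1;
}

# In-order traversal returns the keys in sorted order.
sub tree_keys {
    my ($node) = @_;
    return () unless defined $node;
    return (tree_keys($node->{left}), $node->{key}, tree_keys($node->{right}));
}

my $root;
tree_insert(\$root, $_) for qw(44:005:002 44:004:037 44:005:001 44:004:037);
print join(' ', tree_keys($root)), "\n";    # 44:004:037 44:005:001 44:005:002
```

String comparison with cmp is enough here because the references are zero-padded to a fixed width.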
 
Franken Sense

In Dread Ink, the Grave Hand of Jürgen Exner Did Inscribe:
Charlton Wilbur said:
cc> We don't know what will happen to the data.

The OP has informed us that it's going to be processed by C code, so we
do, in fact, know what will happen to the data.

And even more: "[...]and then use C to
insert the data into a binary tree."

The task is motivated by §11-12 in _C Unleashed_, wherein Heathfield uses a
binary tree to remove duplicate lines in a text. That turns out to be
pretty straightforward to implement, and I was able to do that following the
development in K&R § 6 and the solns in the clc wiki.

True. However there is no reason why not to build a binary tree in Perl.

jue

Is there a module for it?
--
Frank

No Child Left Behind is the most ironically named act, piece of legislation
since the 1942 Japanese Family Leave Act.
~~ Al Franken, in response to the 2004 SOTU address
 
Uri Guttman

FS> In Dread Ink, the Grave Hand of Ben Morrow Did Inscribe:
FS> Thanks, Ben, so far so good:

FS> use strict;
FS> use warnings;

FS> # process all lines in input file
FS> $/ = "";
FS> while( my $line = <$fh> ) {
FS> chomp($line);

chomp removes what is in $/ and you set it to '' (paragraph mode) so it
becomes a no-op here.

FS> my @s = split /\s+/, $line;

don't use single letter variable names. choosing good names is very
important for maintainable code, even in throwaway scripts like this.

FS> # modify fields
FS> $s[1] =~ s/1/9/;
FS> $s[9] =~ s/m/xxx/;


FS> # print modified fields
FS> for my $i (0..$#s) {

why loop over the indices when you only print each element?

FS> print "s[$i] = $s[$i]\n";

print "$_\n" for @s;

FS> # write modified fields to output file
FS> my $outline = join(' ', @s);
FS> print $gh "$outline\n";

arrays interpolated into strings are separated by space. so you can do
this:

print $gh "@s\n";

uri
 
Jürgen Exner

Franken Sense said:
In Dread Ink, the Grave Hand of Jürgen Exner Did Inscribe:
And even more: "[...]and then use C to
insert the data into a binary tree."

The task is motivated by §11-12 in _C Unleashed_, wherein Heathfield uses a
binary tree to remove duplicate lines in a text.

Oh, well, aehmmmm, unless you want to do that as a learning exercise
there is a _MUCH_ better method in Perl, see 'perldoc -q duplicate':
"How can I remove duplicate elements from a list or array?"
in particular the very last sentence.

That turns out to be
pretty straightforward to implement, and I was able to do that following the
development in K&R § 6 and the solns in the clc wiki.

In Perl you just put each line into a hash (as keys) and the semantics
of a hash will automatically eliminate all duplicates.

Is there a module for it?

A quick search for "Binary Tree" on CPAN returns several hundred
results, the very first one being "Tree::Binary", with many more
interesting modules on the same and the next page.

jue
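
The hash approach mentioned above can be sketched in a few lines. The sub name unique_lines() is an assumption for illustration; the %seen idiom itself is the usual one for this job.

```perl
use strict;
use warnings;

# Hypothetical helper: drop later duplicates, keeping first-seen order.
sub unique_lines {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

print "$_\n" for unique_lines(qw(alpha beta alpha gamma beta));
```

Each line becomes a hash key, and the post-increment is false only the first time a key is seen, so grep passes just the first copy through.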
 
Franken Sense

In Dread Ink, the Grave Hand of Jürgen Exner Did Inscribe:
May I suggest a different structure for your data?
Change the input_record_separator $/ into two consecutive newlines. Then
the whole verse will be read as a single unit, allowing you to
a) extract the numbers without worrying about if you are looking at a
first line or follow-up line within the same verse
b) blindly remove all newlines from within the verse and add whatever
number you like at its end

$/ The input record separator, newline by default. This influences
Perl's idea of what a "line" is. Works like awk's RS variable,
including treating empty lines as a terminator if set to the
null string. (An empty line cannot contain any spaces or tabs.)
You may set it to a multi-character string to match a
multi-character terminator, or to "undef" to read through the
end of file. Setting it to "\n\n" means something slightly
different than setting to "", if the file contains consecutive
empty lines. Setting to "" will treat two or more consecutive
empty lines as a single empty line. Setting to "\n\n" will
blindly assume that the next input character belongs to the
next paragraph, even if it's a newline. (Mnemonic: / delimits
line boundaries when quoting poetry.)

local $/; # enable "slurp" mode
local $_ = <FH>; # whole file now here
s/\n[ \t]+/ /g;

Remember: the value of $/ is a string, not a regex. awk has to
be better for something. :)

Setting $/ to a reference to an integer, scalar containing an
integer, or scalar that's convertible to an integer will
attempt to read records instead of lines, with the maximum
record size being the referenced integer. So this:

local $/ = \32768; # or \"32768", or \$var_containing_32768
open my $fh, $myfile or die $!;
local $_ = <$fh>;

will read a record of no more than 32768 bytes from FILE. If
you're not reading from a record-oriented file (or your OS
doesn't have record-oriented files), then you'll likely get a
full chunk of data with every read. If a record is larger than
the record size you've set, you'll get the record back in
pieces.

On VMS, record reads are done with the equivalent of "sysread",
so it's best not to mix record and non-record reads on the same
file. (This is unlikely to be a problem, because any file you'd
want to read in record mode is probably unusable in line mode.)
Non-VMS systems do normal I/O, so it's safe to mix record and
non-record reads of a file.

See also "Newlines" in perlport. Also see $..

This seems to be exactly what I need.

Why? You can build binary trees in Perl, no problem.

jue

I'd like to give it a try.

--
Frank

Drug war, well, as Rush Limbaugh said, anyone who uses drugs illegally
should be prosecuted and put away. I don't agree with him; I think they
should be treated, but that's what Rush believes and so, you know, we're
praying for Rush because he's in recovery and you take responsibilities for
your actions so I'm sure any day now Rush will demand to be put away for
the maximum sentence and ask for the most dangerous prison and we'll be
praying for maybe an African American cellmate who saw the Donovan McNabb
comments on ESPN. So we're prayin'.
~~ Al Franken, Book TV, on Rush Limbaugh's illegal drug arrest and racist
remarks
 
Franken Sense

In Dread Ink, the Grave Hand of Ben Morrow Did Inscribe:

Thanks, Ben, so far so good:

use strict;
use warnings;

# open input file
my $filename = 'text43.txt';
open(my $fh, '<', $filename) or
die "cannot open $filename for reading: $!";

# open output file
my $filename2 = 'outfile10.txt';
open(my $gh, '>', $filename2) or
die "cannot open $filename2 for writing: $!";

# process all lines in input file
$/ = "";
while( my $line = <$fh> ) {
chomp($line);
my @s = split /\s+/, $line;

# modify fields
$s[1] =~ s/1/9/;
$s[9] =~ s/m/xxx/;


# print modified fields
for my $i (0..$#s) {
print "s[$i] = $s[$i]\n";
}

# write modified fields to output file
my $outline = join(' ', @s);
print $gh "$outline\n";
}

# close input and output files
close($gh) or die("Error closing $filename2: $!");
close($fh) or die("Error closing $filename: $!");

__END__

# perl m2.pl

abridged output:

s[0] = 44:028:022
s[1] = But
s[2] = we
s[3] = desire
s[4] = to
s[5] = hear
s[6] = of
s[7] = thee
s[8] = what
s[9] = thou
s[10] = thinkest:
s[11] = for
s[12] = as
s[13] = concerning
s[14] = this
s[15] = sect,
s[16] = we
s[17] = know
s[18] = that
s[19] = every
s[20] = where
s[21] = it
s[22] = is
s[23] = spoken
s[24] = against.
s[0] = 44:028:023
s[1] = And
s[2] = when
s[3] = they
s[4] = had
s[5] = appointed
s[6] = him
s[7] = a
s[8] = day,
s[9] = there
s[10] = came
s[11] = many
s[12] = to
s[13] = him
s[14] = into
s[15] = his
s[16] = lodging;
s[17] = to
s[18] = whom
s[19] = he
s[20] = expounded
s[21] = and
s[22] = testified
s[23] = the
s[24] = kingdom
s[25] = of
s[26] = God,
s[27] = persuading
s[28] = them
s[29] = concerning
s[30] = Jesus,
s[31] = both
s[32] = out
s[33] = of
s[34] = the
s[35] = law
s[36] = of
s[37] = Moses,
s[38] = and
s[39] = out
s[40] = of
s[41] = the
s[42] = prophets,
s[43] = from
s[44] = morning
s[45] = till
s[46] = evening.

C:\MinGW\source>

question: Can you tell me exactly why this
$/ = "";
gets rid of the newlines and s[0] is the chapter and verse?
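
A short sketch that may help with this question, using one record copied from the sample data. It suggests that $/ = "" only changes where a record ends; the newlines inside the record are still there, and it is split /\s+/ that discards them, because \s matches newlines as well as spaces. The first whitespace-separated field is then the 44:028:022 reference.

```perl
use strict;
use warnings;

# One record as paragraph mode would return it, trailing blank line included.
my $record = "44:028:022 But we desire to hear of thee what thou thinkest: for as\n"
           . "concerning this sect, we know that every where it is spoken\n"
           . "against.\n\n";

# The record still contains its embedded newlines.
my $newlines = () = $record =~ /\n/g;
print "newlines in record: $newlines\n";

# split /\s+/ treats spaces and newlines alike, so the newlines vanish here.
my @fields = split /\s+/, $record;
print "first field: $fields[0]\n";
print "rejoined: ", join(' ', @fields), "\n";
```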
 
John W. Krahn

Uri said:
FS> In Dread Ink, the Grave Hand of Ben Morrow Did Inscribe:

FS> Thanks, Ben, so far so good:

FS> use strict;
FS> use warnings;

FS> # process all lines in input file
FS> $/ = "";
FS> while( my $line = <$fh> ) {
FS> chomp($line);

chomp removes what is in $/ and you set it to '' (paragraph mode) so it
becomes a no-op here.

Not a no-op:

$ perl -e'
my $x = "one\n\ntwo\n\nthree\n\nfour\n";
open my $FH, "<", \$x or die "\$x: $!";
$/ = "";
while ( <$FH> ) {
print "Length = ", length;
my $y = chomp;
print " and now Length = ", length, " and chomp returned: $y\n";
}
'
Length = 5 and now Length = 3 and chomp returned: 2
Length = 5 and now Length = 3 and chomp returned: 2
Length = 7 and now Length = 5 and chomp returned: 2
Length = 5 and now Length = 4 and chomp returned: 1





John
 
Tad J McClellan

FS> $/ = "";


You should get in the habit of localizing changes to global variables.

FS> while( my $line = <$fh> ) {
FS> chomp($line);

chomp removes what is in $/ and you set it to '' (paragraph mode) so it
becomes a no-op here.


No, it removes the record separator.

When the record separator is "one or more blank lines" (ie. para mode),
then chomp() removes "one or more blank lines".


--------------------------
#!/usr/bin/perl
use warnings;
use strict;

local $/ = '';
while ( <DATA> ) {
print "(($_))\n";
chomp;
print "[[$_]]\n";
}

__DATA__
paragraphs can
span many
lines



or they can be all on one line
 
Uri Guttman

JWK> Not a no-op:

you are correct and it is even documented:

When in paragraph mode ("$/ = """), it removes all trailing
newlines from the string. When in slurp mode ("$/ = undef") or
fixed-length record mode ($/ is a reference to an integer or the
like, see perlvar) chomp() won't remove anything.

of course it should be mentioned in the docs for $/ but it isn't. this
is action at a distance where a special global var affects a
function. so the docs for the global var should mention that. chomp
isn't mentioned anywhere in perlvar. i would call this a doc bug.

also the above docs show another reason to use single quotes for fixed
strings. the part after paragraph mode is painful to look at.

uri
 
Uri Guttman

FS> In Dread Ink, the Grave Hand of Uri Guttman Did Inscribe:
FS> for my $i (0..$#s) {

FS> I think I can answer your question if you can tell me why this is
FS> giving me numbers instead of words:

FS> my $outline = join(' ', (1..$#s));

i don't see any word data in that code. what do you think 1 .. $#s will
do? do you know what $#s does? (these are for you to answer). how would
you think that data has anything to do with the word data you have in
@s?

uri
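
For readers who want the contrast spelled out, a small sketch: 1 .. $#s is just a list of integers up to the last index, while @s[1 .. $#s] is an array slice that fetches the elements at those indices. The sample string is taken from the data in this thread.

```perl
use strict;
use warnings;

my @s = split /\s+/, "44:005:018 And laid their hands on the apostles,";

my $indices = join(' ', 1 .. $#s);        # the numbers 1 through the last index
my $words   = join(' ', @s[1 .. $#s]);    # the elements at those indices (a slice)

print "$indices\n";    # 1 2 3 4 5 6 7
print "$words\n";      # And laid their hands on the apostles,
```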
 
Uri Guttman

FS> It would seem that both
FS> local $/ = "";
FS> and
FS> local $/ = '';
FS> induce paragraph mode.

and why would you think otherwise? what is the difference in data
between '' and ""?

FS> P. 492 of the camel book speaks to this somewhat with the -O switch, which
FS> specifies the record separator as an octal number. I would have thought

that is -0, not -O. please be careful of typos.

uri
 
Franken Sense

In Dread Ink, the Grave Hand of Uri Guttman Did Inscribe:
FS> # print modified fields
FS> for my $i (0..$#s) {

why loop over the indices when you only print each element?

One thing I always have trouble with when I start up again with perl is
control structures.

I think I can answer your question if you can tell me why this is giving me
numbers instead of words:

#!/usr/bin/perl
# perl m5.pl
use warnings;
use strict;

local $/="";
while ( <DATA> ) {
my @s = split /\s+/, $_;

# print fields
print $s[0];

my $outline = join(' ', (1..$#s));
print "$outline\n";
}

__DATA__
44:005:017 Then the high priest rose up, and all they that were with him,
(which is the sect of the Sadducees,) and were filled with
indignation,

44:005:018 And laid their hands on the apostles, and put them in the
common prison.

44:005:019 But the angel of the Lord by night opened the prison doors,
and brought them forth, and said,

44:005:020 Go, stand and speak in the temple to the people all the words
of this life.


C:\MinGW\source>perl m5.pl
44:005:0171 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
44:005:0181 2 3 4 5 6 7 8 9 10 11 12 13 14
44:005:0191 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
44:005:0201 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

C:\MinGW\source>
--
Frank

[Newt Gingrich] is the most unpopular politician in America. His favorable
rating is only four points higher than the Unabomber.
~~ Al Franken, 1996
 
