writing get_script()

Phil Carmody

James Kuyper said:
That 0 is treated as false is a convention established by the
standard, in its definition of the behavior of the if() statement and
the ?: operator, and in its description of the value of the
relational operators, and in its description of the "true" and
"false" macros defined by <stdbool.h>. That strikes me as a pretty
well-established convention.


I'm sorry, but I can't quite see where in Malcolm's post
he says that the convention isn't a well-established one,
perhaps you could point to the exact words where he says
that?

Phil
 
James Kuyper

Phil said:
I'm sorry, but I can't quite see where in Malcolm's post
he says that the convention isn't a well-established one,
perhaps you could point to the exact words where he says
that?

I chose the wording badly in a way that put the emphasis in the wrong
place. My real point wasn't actually that it was well-established, but
that it was established by the same document that specifies the meaning
and value of 0==1, and in words no less authoritative. In fact, the
behavior of if(0==1) is defined only in terms of the fact that 0==1 has
a value of 0 if the relation is false, and that the behavior of
if(condition) depends solely upon whether the condition compares equal to 0.
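
The conventions described above can be checked directly. This is only an illustrative sketch; the names branch_taken and check_truth_conventions are made up for the example and do not come from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns 1 if if() takes its first branch for this controlling
   expression, 0 otherwise. Hypothetical helper for illustration. */
int branch_taken(int condition)
{
    if (condition)
        return 1;
    return 0;
}

/* Checks the conventions discussed above; aborts if one fails. */
void check_truth_conventions(void)
{
    assert((0 == 1) == 0);            /* a false relation has the value 0 */
    assert((0 == 0) == 1);            /* a true relation has the value 1 */
    assert(false == 0 && true == 1);  /* the <stdbool.h> macros */
    assert(branch_taken(0 == 1) == 0);
    assert(branch_taken(-7) == 1);    /* any nonzero value selects the branch */
    assert((0 ? 1 : 2) == 2);         /* ?: follows the same rule */
}
```

Calling check_truth_conventions() should return silently on a conforming implementation, since every assertion restates something the standard specifies.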
 
Franken Sense

In Dread Ink, the Grave Hand of (e-mail address removed) Did
Inscribe:

[OT]
1. write it in perl

#!/usr/bin/perl
# perl m12.pl
use warnings;
use strict;

# open input file
my $filename = 'text43.txt';
open(my $fh, '<', $filename) or
die "cannot open $filename for reading: $!";

# open output file
my $filename2 = 'outfile15.txt';
open(my $gh, '>', $filename2) or
die "cannot open $filename2 for writing: $!";

local $/="";

while ( <$fh> ) {
my @s = split /\s+/, $_;
my $verse = $s[0];
my $script = join(' ', @s[1..$#s]);
print $gh "$verse $script\n";
}

# close input and output files
close($gh) or die("Error closing $filename2: $!");
close($fh) or die("Error closing $filename: $!");

#abridged output

44:003:004 And Peter, fastening his eyes upon him with John, said, Look on
us.
44:003:005 And he gave heed unto them, expecting to receive something of
them.
44:003:006 Then Peter said, Silver and gold have I none; but such as I have
give I thee: In the name of Jesus Christ of Nazareth rise up and walk.
44:003:007 And he took him by the right hand, and lifted him up: and
immediately his feet and ankle bones received strength.
44:003:008 And he leaping up stood, and walked, and entered with them into
the temple, walking, and leaping, and praising God.


The critical thing here is the
local $/="";
, which puts perl in paragraph mode.

Splitting with \s+ clears away the newlines with multiplicity one.
Screenshot here: http://lomas-assault.net/usenet/z30.jpg

As far as the content goes, it sounds like Bartlett's Beneficent Balm. I'm
rehabbing my foot and ankle, and it seems to be taking much longer.

How does C know that the newline on my machine is 0D 0A ?
 
jameskuyper

Franken said:
How does C know that the newline on my machine is 0D 0A ?

"C" doesn't. The particular C standard library that you're using
doesn't know either; it assumes it. What makes it possible for it to
make such an assumption is the fact that it was built for your machine,
and would have to be built differently to work correctly on a machine
with a different convention for indicating the end of a line. The
assumption is built deep inside the standard I/O library routines.

From the point of view of the standard, 0D 0A represents a newline for
your implementation of C because the implementation defined it that
way (and the implementation is perfectly free to define it
differently). In practice, a new C implementation for an existing
platform will define its answer to this issue to match whatever
convention is already in common use on that platform for indicating
newlines. An implementation for a brand new platform will often be the
one setting the convention, which can be pretty much whatever the
designers want it to be.
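
One way to observe the convention a given implementation uses is to write a single '\n' through a text-mode stream and then count the bytes that reach the file by reading it back in binary mode. The following is a minimal sketch under that assumption; the function name bytes_per_newline and the scratch file name are invented for the example. The result is implementation-defined: typically 1 on Unix-like systems and 2 where the convention is 0D 0A.

```c
#include <assert.h>
#include <stdio.h>

/* Writes one '\n' in text mode to `path`, reads the file back in
   binary mode, and returns how many bytes were stored, or -1 on
   error. The scratch file is removed afterwards. */
long bytes_per_newline(const char *path)
{
    FILE *fp;
    long count = 0;

    assert(path != NULL);

    fp = fopen(path, "w");      /* text mode: '\n' may be translated */
    if (fp == NULL)
        return -1;
    if (putc('\n', fp) == EOF) {
        fclose(fp);
        remove(path);
        return -1;
    }
    if (fclose(fp) == EOF) {
        remove(path);
        return -1;
    }

    fp = fopen(path, "rb");     /* binary mode: no translation */
    if (fp == NULL) {
        remove(path);
        return -1;
    }
    while (getc(fp) != EOF)
        count++;
    fclose(fp);
    remove(path);
    return count;
}
```

A call such as bytes_per_newline("newline_probe.tmp") reports what the library did with the newline, which is exactly the knowledge that is built into the I/O routines rather than into the language.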
 
Franken Sense

In Dread Ink, the Grave Hand of Han from China Did Inscribe:
I agree with Keighley here. It's not that you can't accomplish
what you want to do in C; it's that Perl is a lot better for
the task at hand. Since Franken Sense (whom I believe to be
George, Larry Gates, etc.) has been studying Perl, the
advice is not as useless as it may first appear.

Yours,
Han from China

Independent of pseudonym, I tend to work up the same material in two
different syntaxes. I find the comparison and contrast useful and
effective. It also insulates the autodidact from a certain type of insult,
when you can perform the same task in the sexier, modern dialects like
perl.

This compiles. Why no output?

#include <stdio.h>
#include <stdbool.h>
#include <ctype.h>
#include <stdlib.h>


#define my_file "text42.txt"
#define NUMBER 8192


bool get_script(FILE *, char *, int );

int main(void)
{
FILE *fp;
char text[NUMBER];
bool len;

if ((fp = fopen(my_file, "r")) == NULL )
{
fprintf(stderr, "can't open file\n");
exit(EXIT_FAILURE);
}
while((len = get_script(fp, text, NUMBER)) > 0)
{
printf("%s\n", text);
}
fclose(fp);
return 0;
}

// gcc s3.c -Wall -o out

bool get_script(FILE* in, char* result, int size)
{
int ch;
int i = 0;
while (i < size - 1 && (ch = getc(in)) != EOF)
{
result[i++] = ch;
int j = 0;
for (; i + j < size - 1 &&
(isdigit(ch) || ch == ':'); ++j)
{
if (j + 1 == 10)
{
result[i + j] = '\0';
return true;
}
if ((ch = getc(in)) == EOF)
{
result[i + j] = '\0';
return false;
}
result[i + j] = ch;
}
i += j;
}
result = '\0';
return false;
}
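
For comparison, here is one possible restructuring, offered as a sketch rather than a diagnosis of the code above. It assumes, as the perl version does, that verses are separated by blank lines, so it reads one paragraph per call and collapses each run of white space to a single space instead of counting digits and colons. It keeps the get_script() signature used above. Note also that in the version above the statement result = '\0'; sets the local pointer to null rather than storing a terminator; result[i] = '\0'; is what stores one.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Reads one blank-line-separated paragraph from `in` into `result`,
   collapsing each run of white space to a single space. Characters
   that do not fit in `size` are dropped. Returns true if at least one
   character was stored, false at end of input. */
bool get_script(FILE *in, char *result, int size)
{
    int ch;
    int i = 0;
    int newlines = 0;       /* '\n' characters seen in the current gap */
    bool pending = false;   /* a space is owed before the next word */

    assert(in != NULL && result != NULL && size > 0);

    while ((ch = getc(in)) != EOF) {
        if (isspace(ch)) {
            if (ch == '\n' && ++newlines >= 2 && i > 0)
                break;              /* blank line ends the paragraph */
            pending = (i > 0);
            continue;
        }
        newlines = 0;
        if (pending && i < size - 1)
            result[i++] = ' ';
        pending = false;
        if (i < size - 1)
            result[i++] = (char)ch;
    }
    result[i] = '\0';
    return i > 0;
}
```

With this version the calling loop can simply be while (get_script(fp, text, NUMBER)), since the function returns a bool rather than a length.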
 
Peter 'Shaggy' Haywood

Sorry for the late reply! Anyhow...

Groovy hepcat Franken Sense was jivin' in comp.lang.c on Thu, 7 May 2009
4:39 pm. It's a cool scene! Dig it.
How do I write a get_script() for data that look like these:

44:004:037 Having land, sold it, and brought the money, and laid it at
the apostles' feet.

44:005:001 But a certain man named Ananias, with Sapphira his wife,
sold
a possession,

44:005:002 And kept back part of the price, his wife also being privy
to
it, and brought a certain part, and laid it at the
apostles' feet.

I don't just want a line, so I can't read until '\n'. If I took it in
a char at a time, how do I write the control so that it stops when I
get to a string of ten digits or colons?

So, each part consists of a group of three numbers delimited by
colons, followed by one or more lines of text, right? So use that
format to write a simple parser. It isn't really all that hard. You
just need to learn the basics.

First, try a Google search for BNF (Backus-Naur form). It's a
format for specifying language grammars for parsers. (You may already
have come across BNF.) The following EBNF (extended BNF; search for
s026153_ISO_IEC_14977_1996(E).pdf for the ISO spec) could be used for
your parser.

script = verse, {verse};
verse = versenum, text;
versenum = decnum, colon-symbol, decnum, colon-symbol, decnum;
text = textline, {textline};

Simple, eh? This says that a "script" is one or more instances of a
"verse"; a "verse" is a "versenum" followed by "text"; "versenum" is
a "decnum" followed by a "colon-symbol", another "decnum", a second
"colon-symbol" and a third "decnum"; and "text" is one or more
instances of a "textline". The symbols decnum, colon-symbol and
textline are terminal symbols (items that can't be broken down into
smaller parts). This small parser will be very easy to create.

But first you need a lexical scanner that can return tokens (terminal
symbols) in the form of decnum (a decimal number), colon-symbol (a
literal ':' character) and textline (a line of text). Blank lines can
simply be ignored by your scanner or treated as normal textlines.

Your scanner needs to know how to recognise each token type. That's
very simple in this case. A decnum begins with a decimal digit
(including 0), and ends just before the first character that is not a
decimal digit. A colon-symbol is simply a single ':'. And a textline is
any sequence of text beginning with any character that is not a decimal
digit or ':', and ending with a '\n'.

Next, your parser needs to call your scanner to get tokens, one by
one. It must verify that each token is of the type expected. The parsing
technique known as "recursive descent parsing" is pretty easy to
understand.

You write a function for each non-terminal symbol, and each gets
tokens from the lexical scanner routine, checks them for validity and
takes some action, including calling other non-terminal handlers,
performing semantic actions, checking for errors and displaying
diagnostic output. Your parser would, perhaps, store the verses in some
easily searchable form. (I'm assuming that's what you want, to search
for verses by number.)

Anyhow, there are texts and tutorials on parsing. Search for Compiler
Construction by Niklaus Wirth. The author has made it available for
free download in PDF format. (Sorry, can't remember the URL.) This will
give you a good tutorial on the subject. You could easily write a
simple parser with the knowledge you'll gain from it. The following
pseudocode may give you some idea:

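#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed value; the pseudocode leaves errorcode unspecified. */
#define errorcode 1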

#define MAX_TOKEN_LEN 100

enum toktype {decnum, colon_symbol, textline, end};
struct token
{
enum toktype type;
union
{
char *text;
int num;
} value;
int line;
};

static struct token lookahead;
static FILE *fp;

int next_token(void)
{
static char buf[MAX_TOKEN_LEN];
int c, n;
static int line = 1;

lookahead.line = line;

/* Skip leading white space. */
while(isspace(c = getc(fp)))
{
if('\n' == c)
line++;
}

switch(c)
{
case EOF:
lookahead.type = end;
break;
case ':':
lookahead.type = colon_symbol;
break;
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
lookahead.type = decnum;
n = c - '0';
while(isdigit(c = getc(fp)))
n = n * 10 + c - '0';
lookahead.value.num = n;
/* Last character read was not a digit, so ungetc it. */
ungetc(c, fp);
break;
case '\n':
line++;
break; /* Ignore blank lines. */
default:
lookahead.type = textline;
buf[0] = c;
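/* Copy the rest of the line, leaving room for the terminator. */
for(n = 1; n < MAX_TOKEN_LEN - 1 && (c = getc(fp)) != '\n' && EOF != c; n++)
buf[n] = c;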
buf[n] = '\0';
lookahead.value.text = buf;
line++;
break;
}

return 0;
}

void print_error(char *txt)
{
fprintf(stderr, "Line %lu: %s\n", (unsigned long)lookahead.line, txt);
}

int parse_text(char **txt)
{
size_t len;

/* A text must begin with a textline. */
if(lookahead.type != textline)
{
print_error("Expected a line of text.");
return errorcode;
}

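do
{
/* Append the line at the end of whatever has been collected so far. */
size_t old = *txt ? strlen(*txt) : 0;
char *tmp;

len = old + strlen(lookahead.value.text) + 1;
tmp = realloc(*txt, len);
if(NULL == tmp)
{
print_error("Out of memory.");
return errorcode;
}
*txt = tmp;
strcpy(*txt + old, lookahead.value.text);
next_token();
}while(lookahead.type == textline);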

return 0;
}

int parse_versenum(int *a, int *b, int *c)
{
/* A versenum must begin with a decnum. */
if(lookahead.type != decnum)
{
print_error("Expected a decimal number.");
return errorcode;
}
*a = lookahead.value.num;
next_token();

/* Next token must be a colon-symbol. */
if(lookahead.type != colon_symbol)
{
print_error("Expected a ':'.");
return errorcode;
}
next_token();

/* Next token must be a decnum. */
if(lookahead.type != decnum)
{
print_error("Expected a decimal number.");
return errorcode;
}
*b = lookahead.value.num;
next_token();

/* Next token must be a colon-symbol. */
if(lookahead.type != colon_symbol)
{
print_error("Expected a ':'.");
return errorcode;
}
next_token();

/* Next token must be a decnum. */
if(lookahead.type != decnum)
{
print_error("Expected a decimal number.");
return errorcode;
}
*c = lookahead.value.num;
next_token();

return 0;
}

int parse_verse(void)
{
int a, b, c;
int status;
char *txt = NULL;

/* A verse is a versenum followed by text. */
status = parse_versenum(&a, &b, &c);
if(0 != status)
return errorcode;

status = parse_text(&txt);
if(0 != status)
{
free(txt);
return errorcode;
}

store_verse_in_easily_searchable_format(a, b, c, txt);

return 0;
}

int parse_script(void)
{
if(lookahead.type == end)
{
print_error("Expected a decimal number.");
return errorcode;
}

int status = parse_verse();
if(0 != status && EOF != status)
return errorcode;

while(lookahead.type != end)
{
int status = parse_verse();
if(0 != status && EOF != status)
return errorcode;
}

return 0;
}

int main(void)
{
int rtn = 0;

fp = fopen("your_file.txt", "r");
if(NULL == fp)
{
fprintf(stderr, "Can't open your_file.txt\n");
return EXIT_FAILURE;
}

/* Initialise the lookahead. */
next_token();

if(errorcode == parse_script())
rtn = EXIT_FAILURE;

fclose(fp);
return rtn;
}
 
Peter 'Shaggy' Haywood

Groovy hepcat Peter 'Shaggy' Haywood was jivin' in comp.lang.c on Mon,
18 May 2009 3:44 pm. It's a cool scene! Dig it.

[Snip.]
/* Skip leading white space. */
while(isspace(c = getc(fp)))
{
if('\n' == c)
line++;
}

switch(c)
{

[Snip.]
case '\n':
line++;
break; /* Ignore blank lines. */

Whoops! That's not quite right. This doesn't do everything required of
it. It fails to read a token and update lookahead. And blank lines are
covered by the white space skipping code above anyhow, so this case is
not necessary.
Sorry about that!
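
The point of the correction can be checked in isolation. In the sketch below, skip_space is a hypothetical stand-alone version of the skipping loop (the name and the line-counter parameter are assumptions for the example). Because isspace('\n') is true, the function can only hand back a non-space character or EOF, so a later case '\n': has nothing to match.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Skips white space on `fp`, adding 1 to *line for every '\n'
   consumed. Returns the first non-space character, or EOF. */
int skip_space(FILE *fp, int *line)
{
    int c;

    assert(fp != NULL && line != NULL);

    while ((c = getc(fp)) != EOF && isspace(c)) {
        if (c == '\n')
            ++*line;
    }
    return c;
}
```

The value returned would then go straight into the switch, with blank lines already accounted for in the line count.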
 
Ben Bacarisse

Barry Schwarz said:
Groovy hepcat Peter 'Shaggy' Haywood was jivin' in comp.lang.c on Mon,
18 May 2009 3:44 pm. It's a cool scene! Dig it.

[Snip.]
/* Skip leading white space. */
while(isspace(c = getc(fp)))
{
if('\n' == c)

Since isspace considers '\n' to be white space, this if can never be
true.

Did you see a ! that is not there? Looks fine to me.
line++;
}

switch(c)
{
[Snip.]

case '\n':

Nor can this case.

Yup. To get to the switch, c must not be '\n'.

<snip>
 
Barry Schwarz

Barry Schwarz said:
Groovy hepcat Peter 'Shaggy' Haywood was jivin' in comp.lang.c on Mon,
18 May 2009 3:44 pm. It's a cool scene! Dig it.

[Snip.]

/* Skip leading white space. */
while(isspace(c = getc(fp)))
{
if('\n' == c)

Since isspace considers '\n' to be white space, this if can never be
true.

Did you see a ! that is not there? Looks fine to me.

Yep, I got it backwards again.
 