Read a binary file until "\name\" is encountered...

spike

I'm trying to write a program that should
read through a binary file searching for the character sequence "\name\"

Then it should read the characters following the "\name\" sequence
until a null character is encountered.

But when my program runs it gets a SIGSEGV (segmentation violation) signal.

What's wrong?
And is there a better way than mine to solve this task (most likely)?

Code:
----------------------------------------------------------------
int main()
{
FILE *fp;
fp = fopen("demo.dem","rb");

char cTkn;
char sName[50];
int i=0,j=0;
while(!(feof(fp)))
{
if(fread(&cTkn, sizeof(cTkn), 1, fp))
{
if(cTkn == ((char)92)) // if the character '\' is found
{
fread(&cTkn, sizeof(cTkn), 1, fp);
if(cTkn == ((char)110)) // if the character 'n' is found
{
fread(&cTkn, sizeof(cTkn), 1, fp);
if(cTkn == ((char)97)) // if the character 'a' is found
{
fread(&cTkn, sizeof(cTkn), 1, fp);
if(cTkn == ((char)109)) // if the character 'm' is found
{
fread(&cTkn, sizeof(cTkn), 1, fp);
if(cTkn == ((char)101)) // if the character 'e' is found
{
fread(&cTkn, sizeof(cTkn), 1, fp);
if(cTkn == ((char)92)) // if the character '\' is found
{
// here the complete "\name\" string has been found
while(cTkn != ((char)0))
{
fread(&sName[j], sizeof(cTkn), 1, fp);
j++;
}
}
}
}
}
}
}
}
i++;
}
printf("Read %d characters!\n", i);
printf("Found: %s",sName);
fclose(fp);

return 0;
}
----------------------------------------------------------------
 
Leor Zolman

I'm trying to write a program that should
read through a binary file searching for the character sequence "\name\"

Then it should read the characters following the "\name\" sequence
until a null character is encountered.

But when my program runs it gets a SIGSEGV (segmentation violation) signal.

What's wrong?
And is there a better way than mine to solve this task (most likely)?

I think so. Here's a version I just threw together:


#include <stdio.h>

#define MAX 50

int main()
{
FILE *fp;
char cTkn, c;
char sName[MAX];
int inTxt = 0, pos;
size_t charCount = 0;

const char *txt = "\\name\\";

fp = fopen("demo.dem","rb");

while ((cTkn = getc(fp)) != EOF)
{
++charCount;
if (inTxt == 0)
{
if (cTkn == txt[0])
{
++inTxt;
}
}
else
{
if (cTkn == txt[inTxt])
{
if (txt[++inTxt] == '\0')
break;
}
else
inTxt = 0;
}
}

pos = 0;
while ((c = getc(fp)) != EOF && pos < MAX && c != '\0')
{
sName[pos++] = c;
}
sName[pos] = '\0';

printf("Read %lu characters!\n", (unsigned long)charCount);
printf("Found: %s",sName);
fclose(fp);

return 0;
}


Your version has a bunch of problems...
Code:
----------------------------------------------------------------
int main()
{
FILE *fp;
fp = fopen("demo.dem","rb");

Just out of curiosity, are you really using a C99 compiler, or a C++
compiler? If you want to maintain C89 compatibility, put all your
declarations /before/ your other statements within a block.
char cTkn;
char sName[50];
int i=0,j=0;

If you're intending i to count total characters, it isn't...
while(!(feof(fp)))
{
if(fread(&cTkn, sizeof(cTkn), 1, fp))

Note that you're not checking for EOF anywhere except at the top of the
loop and (sort of) up in your first fread.

Also, getc is easier to use to read a single character than what you're
doing (although I guess what you're doing is not technically wrong, except
that not checking for EOF is bad news.)
{
if(cTkn == ((char)92)) // if the character '\' is found
{
fread(&cTkn, sizeof(cTkn), 1, fp);
if(cTkn == ((char)110)) // if the character 'n' is found
{
fread(&cTkn, sizeof(cTkn), 1, fp);
if(cTkn == ((char)97)) // if the character 'a' is found
{
fread(&cTkn, sizeof(cTkn), 1, fp);
if(cTkn == ((char)109)) // if the character 'm' is found
{
fread(&cTkn, sizeof(cTkn), 1, fp);
if(cTkn == ((char)101)) // if the character 'e' is found
{
fread(&cTkn, sizeof(cTkn), 1, fp);
if(cTkn == ((char)92)) // if the character '\' is found
{
// here the complete "\name\" string has been found
while(cTkn != ((char)0))

Here you're going into an infinite loop, because cTkn is never changed
based on the fread below. It isn't the fread that's causing the seg fault,
though; it is your incrementing j past the end of the sName buffer and then
having fread try to write into that location...(well, I guess even just
taking the address has undefined behavior, for the purists)

{
fread(&sName[j], sizeof(cTkn), 1, fp);
j++;
}
}
}
}
}
}
}
}
i++;
}
printf("Read %d characters!\n", i);
printf("Found: %s",sName);
fclose(fp);

return 0;
}
----------------------------------------------------------------

Hope that gets you going.
-leor


Leor Zolman
BD Software
(e-mail address removed)
www.bdsoft.com -- On-Site Training in C/C++, Java, Perl & Unix
C++ users: Download BD Software's free STL Error Message
Decryptor at www.bdsoft.com/tools/stlfilt.html
 
James Hu

I'm trying to write a program that should
read through a binary file searching for the character sequence "\name\"
Then it should read the characters following the "\name\" sequence
until a null character is encountered.

It might be more useful to write a more general function that can search
a binary file for any string. Your program can then call this function,
passing "\\name\\" in as input. Untested code:

/*
* Accept binary input stream and string as input.
* Return true or false depending on whether the string
* is found in the input stream, or not, respectively.
*
* Caller can use ftell() to later find the number
* of bytes scanned.
*/
int bf_textsearch(FILE *fp, char *s)
{
int index;
int len;
int c;

index = 0;
len = strlen(s) - 1;
while ((c = fgetc(fp)) != EOF) {
if (c != s[index]) {
index = 0;
ungetc(c, fp);
continue;
}
if (index == len) {
return 1;
}
index++;
}
return 0;
}

But when my program runs it gets a SIGSEGV (segmentation violation) signal.

What's wrong?
And is there a better way than mine to solve this task (most likely)?

Your code checks for success or failure of the first call to fread(), but
not of the subsequent calls. If one of those calls fails, cTkn keeps its
previous value and you end up comparing stale data. If the failure was a
read error rather than end-of-file, the file position indicator is also
left in an indeterminate state.

In your innermost if-clause, you loop on reading the file until you
find a '\0' character. However, you are storing all the characters you
have read into sName, which is an array of 50 char. If the number of
characters in your stream following the found string is larger than 50,
then you are accessing an array outside of its defined bounds, and this
results in undefined behavior. Don't bother storing the result into
the array. Just discard your scanned characters (for instance, by
continuing to re-use the cTkn variable to scan into).
Code:
----------------------------------------------------------------
int main()
{
FILE *fp;
fp = fopen("demo.dem","rb");

char cTkn;
char sName[50];
int i=0,j=0;
while(!(feof(fp)))
{
if(fread(&cTkn, sizeof(cTkn), 1, fp))
{
if(cTkn == ((char)92)) // if the character '\' is found
{
fread(&cTkn, sizeof(cTkn), 1, fp);
[snip]

while(cTkn != ((char)0))
{
fread(&sName[j], sizeof(cTkn), 1, fp);
j++;
}
[snip]

----------------------------------------------------------------

-- James
 
nrk

James said:
I'm trying to write a program that should
read through a binary file searching for the character sequence "\name\"
Then it should read the characters following the "\name\" sequence
until a null character is encountered.

It might be more useful to write a more general function that can search
a binary file for any string. Your program can then call this function,
passing "\\name\\" in as input. Untested code:

/*
* Accept binary input stream and string as input.
* Return true or false depending on whether the string
* is found in the input stream, or not, respectively.
*
* Caller can use ftell() to later find the number
* of bytes scanned.
*/
int bf_textsearch(FILE *fp, char *s)
{
int index;
int len;
int c;

index = 0;
len = strlen(s) - 1;
while ((c = fgetc(fp)) != EOF) {
if (c != s[index]) {
index = 0;
ungetc(c, fp);
Change above 3 lines to:

if ( c != s[index] ) {
index = (c == s[0]);

Small improvement, but probably worthwhile.

-nrk.

<snip>
 
CBFalconer

Leor said:
....snip ...

I think so. Here's a version I just threw together:

If the value of "name" is a variable, and the read is from a
stream with no backtracking allowed, this is quite an elegant
problem. This indicates the Knuth-Morris-Pratt algorithm. I was
originally thinking of a state machine to follow through the marker, but
eventually realized that involved backtracking in the general
case.

The discussion is OT here, but quite well suited to
comp.programming. Here is an outline I threw together :)

#include <stdio.h>
#include <string.h>

#define MAXLEN 254

/* The difference between a binary and a text file, on read,
is the conversion of end-of-line delimiters. What those
delimiters are does not affect the action above.
*/

/* --------------------- */

/* Dummy, to test main */
int binfsrch(const char *marker, char *id)
{
if (strcmp(marker, "\\name\\")) return 0;
else {
strcpy(id, "found it");
return 1;
}
} /* binfsrch */

/* --------------------- */

int main(int argc, char **argv)
{
char idstring[MAXLEN + 1];

if (2 != argc) {
puts("Usage: binfsrch name < file_to_search");
}
else if (binfsrch(argv[1], idstring)) {
printf("\"%s\" : \"%s\"\n", argv[1], idstring);
}
else {
printf("\"%s\" : not found\n", argv[1]);
}
return 0;
} /* main binsrch */


Cross-posted and fups set.
 
nrk

Leor said:
I'm trying to write a program that should
read through a binary file searching for the character sequence "\name\"

Then it should read the characters following the "\name\" sequence
until a null character is encountered.

But when my program runs it gets a SIGSEGV (segmentation violation)
signal.

What's wrong?
And is there a better way than mine to solve this task (most likely)?

I think so. Here's a version I just threw together:


#include <stdio.h>

#define MAX 50

int main()
{
FILE *fp;
char cTkn, c;
char sName[MAX];
int inTxt = 0, pos;
size_t charCount = 0;

const char *txt = "\\name\\";

fp = fopen("demo.dem","rb");

while ((cTkn = getc(fp)) != EOF)
{
++charCount;
if (inTxt == 0)
{
if (cTkn == txt[0])
{
++inTxt;
}
}
else
{
if (cTkn == txt[inTxt])
{
if (txt[++inTxt] == '\0')
break;
}
else
inTxt = 0;
}
}

pos = 0;
while ((c = getc(fp)) != EOF && pos < MAX && c != '\0')
{
sName[pos++] = c;
}
sName[pos] = '\0';

printf("Read %lu characters!\n", (unsigned long)charCount);
printf("Found: %s",sName);
fclose(fp);

return 0;
}

Input contains: "\\name\ nrk<null>"

Output is not as expected.

Reason for failure: Oh, the darn machine did what I said, instead of what I
wanted ;-)
Your version has a bunch of problems...

Yours has only one (barring the lack of error checks): it doesn't
backtrack one character when it has to.

-nrk.

<snip>
 
Leor Zolman

Input contains: "\\name\ nrk<null>"

Output is not as expected.

Reason for failure: Oh, the darn machine did what I said, instead of what I
wanted ;-)


Yours has only one (barring the lack of error checks): it doesn't
backtrack one character when it has to.

Right. Here's a modified version with a bit more error checking, and the
other fixes, rolled-up:

#include <stdio.h>
#include <stdlib.h>

#define MAX 50

int main()
{
FILE *fp;
char cTkn;
int c;
char sName[MAX];
int inTxt = 0, pos;
size_t charCount = 0;

const char *txt = "\\name\\";
const char delim = '\\';

if ((fp = fopen("demo.dem","rb")) == NULL)
{
printf("Error opening file.\n");
exit(EXIT_FAILURE);
}

while ((cTkn = getc(fp)) != EOF)
{
++charCount;

if (cTkn == txt[inTxt])
{
++inTxt;
if (txt[inTxt] == '\0')
break;
}
else if (cTkn == delim)
{
inTxt = 1;
continue;
}
else
inTxt = 0;
}

pos = 0;
while ((c = getc(fp)) != EOF && pos < MAX - 1 && c != '\0')
{
sName[pos++] = c;
}
sName[pos] = '\0';

printf("Read %lu characters!\n", (unsigned long)charCount);
printf("Found: %s",sName);
fclose(fp);

return 0;
}

It avoids backtracking--not that there's any penalty for ungetting a char,
but just because I ended up not needing to do that. While I think this
approach will work for any delimiter used in the manner of this example,
none of these simplistic (though evidently not simplistic enough for me to
have gotten it right the first time) solutions would handle the undelimited
general case when arbitrary backtracking would be necessary. I think that
was the point Chuck was making.
-leor


 
nrk

Leor said:
Right. Here's a modified version with a bit more error checking, and the
other fixes, rolled-up:

Still some problems.
#include <stdio.h>
#include <stdlib.h>

#define MAX 50

int main()
{
FILE *fp;
char cTkn;

That should be an int, not a char.
int c;
char sName[MAX];
int inTxt = 0, pos;
size_t charCount = 0;

const char *txt = "\\name\\";
const char delim = '\\';

if ((fp = fopen("demo.dem","rb")) == NULL)
{
printf("Error opening file.\n");
exit(EXIT_FAILURE);
}

while ((cTkn = getc(fp)) != EOF)
{
++charCount;

if (cTkn == txt[inTxt])
{
++inTxt;
if (txt[inTxt] == '\0')
break;
}
else if (cTkn == delim)
{
inTxt = 1;
continue;
}
else
inTxt = 0;
}

pos = 0;
while ((c = getc(fp)) != EOF && pos < MAX - 1 && c != '\0')

You're calling getc without checking the state of the stream, the first time
around. You might want to see if both feof and ferror return 0, before
entering this loop. Alternately, you can simply check that txt[inTxt] is
0. So, something along the lines of:
if ( txt[inTxt] == 0 ) {
while ((c = getc(fp)) != EOF && pos < MAX - 1 && c != '\0')
{
sName[pos++] = c;
}
sName[pos] = '\0';

printf("Read %lu characters!\n", (unsigned long)charCount);
printf("Found: %s",sName);
fclose(fp);

return 0;
}

It avoids backtracking--not that there's any penalty for ungetting a char,
but just because I ended up not needing to do that. While I think this
approach will work for any delimiter used in the manner of this example,
none of these simplistic (though evidently not simplistic enough for me to
have gotten it right the first time) solutions would handle the
undelimited
general case when arbitrary backtracking would be necessary. I think that
was the point Chuck was making.

No argument on that count. The OP's particular case is simple enough that
you don't have to resort to any sophisticated string matching except for a
simple one-step backtrack (and when it is only one step, turns out, you
don't actually need to backtrack :).

-nrk.
 
Leor Zolman

You're calling getc without checking the state of the stream, the first time
around. You might want to see if both feof and ferror return 0, before
entering this loop. Alternately, you can simply check that txt[inTxt] is
0. So, something along the lines of:
if ( txt[inTxt] == 0 ) {
while ((c = getc(fp)) != EOF && pos < MAX - 1 && c != '\0')

[among other things]

Thanks, yeah... having programmed in C for 25 years doesn't really count
for beans when not so much of that coding has had to be of "industrial
strength" quality. I'm finding out the hard way, time and again, how good a
place this group is to help come to grips with what constitutes
bullet-proof C coding practice (or at least, in light of platform
variations, how to get as close as possible to that ideal.)
-leor




 
CBFalconer

/*
Leor said:
I think so. Here's a version I just threw together:

I finally got around to implementing my KMP algorithm version of
this, which allows straightforward file input with no backtracking
and arbitrary key values. Pick at it.

*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>

/* The difference between a binary and a text file, on read,
is the conversion of end-of-line delimiters. What those
delimiters are does not affect the action.
This is a version of Knuth-Morris-Pratt algorithm. The
point of using this is to avoid any backtracking in file
reading, and thus avoiding any use of buffer arrays.
*/

/* --------------------- */

/* Almost straight out of Sedgewick */
/* The next array indicates what index in id should next be
compared to the current char. Once the (lgh - 1)th char
has been successfully compared, the id has been found.
The array is formed by comparing id to itself. */
void initnext(int *next, const char *id, int lgh)
{
int i, j;

assert(lgh > 0);
next[0] = -1; i = 0; j = -1;
while (i < lgh) {
while ((j >= 0) && (id[i] != id[j])) j = next[j];
i++; j++;
next[i] = j;
}
#if (0)
for (i = 0; i < lgh; i++)
printf("id[%d] = '%c' next[%d] = %d\n",
i, id[i], i, next[i]);
#endif
} /* initnext */

/* --------------------- */

/* reads f without rewinding until either EOF or *marker
has been found. Returns EOF if not found. At exit the
last matching char has been read, and no further. */
int kmpffind(const char *marker, int lgh, int *next, FILE *f)
{
int j; /* char position in marker to check */
int ch; /* current char */

assert(lgh > 0);
j = 0;
while ((j < lgh) && (EOF != (ch = getc(f)))) {
while ((j >= 0) && (ch != marker[j])) j = next[j];
j++;
}
return ch;
}

/* --------------------- */

int binfsrch(const char *marker)
{
int *next;
int lgh;
int ch;
int items; /* count of markers found */

/* initnext stores next[0] through next[lgh], so allow lgh + 1 entries */
if (!(next = malloc((strlen(marker) + 1) * sizeof *next))) {
puts("No memory");
exit(EXIT_FAILURE);
}
else {
lgh = strlen(marker);
initnext(next, marker, lgh);
items = 0;
while (EOF != kmpffind(marker, lgh, next, stdin)) {
items++;
printf("%d %s : \"", items, marker);
while (isprint(ch = getchar())) putchar(ch);
puts("\"");
if (EOF == ch) break;
}
}
return items;
} /* binfsrch */

/* --------------------- */

int main(int argc, char **argv)
{
if (2 != argc) puts("Usage: kmpsrch name < file_to_search");
else if (binfsrch(argv[1])) {
printf("\"%s\" : found\n", argv[1]);
}
else printf("\"%s\" : not found\n", argv[1]);
return 0;
} /* main kmpsrch */
 
Bill

(e-mail address removed) (spike) wrote in message
// here the complete "\name\" string has been found
while(cTkn != ((char)0))
{
fread(&sName[j], sizeof(cTkn), 1, fp);
j++;
}

You never alter cTkn within the while-loop, making it an endless loop.
Fix this and your original code should work as expected.
 
