sscanf() parameters

Hans

Continuation of Create a Copy of Files thread:
http://groups.google.com/group/comp.lang.c/browse_frm/thread/44b471e6733a407a

This is the sscanf() line I use to extract the file extension/type.
(Please ignore the values I used. I copied them from somebody's post,
but do suggest proper values if it's critical.)

char Title[512];
char FileType[16];

Matched = sscanf( thisFile->d_name , "%511[^.].%15s" , Title ,
FileType );

What I want is to get the file extension so I can skip files with
wrong extensions. If I have the following files in the target folder,
what sscanf() parameters do I need to get the following values for
FileType?

ebook1.pdf - FileType = "pdf"
ebook1.pdf.txt - FileType = "txt"
ebook2.djvu - FileType = "djvu"
ebook2.djvu.txt - FileType = "txt"
ebook3.html.pdf - FileType = "pdf"
 
Beej Jorgensen

Hans said:
Matched = sscanf( thisFile->d_name , "%511[^.].%15s" , Title , FileType );

ebook1.pdf - FileType = "pdf"
ebook1.pdf.txt - FileType = "txt"
ebook2.djvu - FileType = "djvu"
ebook2.djvu.txt - FileType = "txt"
ebook3.html.pdf - FileType = "pdf"

Now it becomes trickier, because the sscanf() format string assumes
there would be no intermediate '.'s before the one that delimits the
extension. My scanf-fu isn't strong enough to make a format string that
would handle all your above examples, but I would love to see one that
does.
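
To see the problem concretely, here is a minimal sketch that applies the format string from the question unchanged. The helper name split_once is made up for illustration and is not from the original post.

```c
#include <stdio.h>

/* Hypothetical helper: applies the format from the question and
   returns the number of fields sscanf() assigned.
   title must hold at least 512 bytes, type at least 16. */
int split_once(const char *name, char *title, char *type)
{
    return sscanf(name, "%511[^.].%15s", title, type);
}
```

Called with "ebook1.pdf.txt", this stores "ebook1" in title and "pdf.txt" in type: %15s keeps reading until whitespace or 15 characters, not until the next '.'.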

You could use sscanf() multiple times to scan for both the one- and
two-period forms, but that's getting a little bit zany. (I'm imagining
an array of scanf format strings that are tried in the proper order
until one works.)
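
A rough sketch of that "array of format strings" idea might look like the following. The names ext_formats and scan_extension are invented here, and the sketch only covers names with one or two periods.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats tried in order: the two-period form first, then the
   one-period form. %*[^.] matches a run of non-'.' characters
   and discards it. */
static const char *const ext_formats[] = {
    "%*[^.].%*[^.].%15s",
    "%*[^.].%15s",
};

/* Hypothetical helper: returns 1 and fills ext (at least 16 bytes)
   if one of the formats matched, 0 otherwise. */
int scan_extension(const char *name, char *ext)
{
    size_t i;

    for (i = 0; i < sizeof ext_formats / sizeof ext_formats[0]; i++) {
        if (sscanf(name, ext_formats[i], ext) == 1)
            return 1;
    }
    return 0;
}
```

A name with three or more periods would still come out wrong (the last field would contain a '.'), which is part of why this approach gets zany quickly.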

Someone else posted a solution that did a bit more character-wise stuff;
something more along those lines might be useful in this case:

    char filename[] = "test.pdf.txt";
    char *lastdot;

    lastdot = strrchr(filename, '.'); // find the last '.' in the string

    if (lastdot != NULL) {
        printf("extension: %s\n", lastdot+1);
    } else {
        printf("[filename has no extension??)\n"); // ??>;-)
    }
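
Since the stated goal is to skip files with the wrong extension, the same strrchr() idea can be wrapped in a small test function. This is only a sketch; has_extension is a made-up name, and the case-insensitive comparison is an assumption about what is wanted.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the text after the last '.' in
   name equals ext (ignoring case), 0 otherwise or if there is no '.'. */
int has_extension(const char *name, const char *ext)
{
    const char *p = strrchr(name, '.');

    if (p == NULL)
        return 0;
    p++; /* step past the '.' */

    while (*p != '\0' && *ext != '\0') {
        if (tolower((unsigned char)*p) != tolower((unsigned char)*ext))
            return 0;
        p++;
        ext++;
    }
    return *p == '\0' && *ext == '\0';
}
```

A directory loop could then do something like `if (!has_extension(thisFile->d_name, "pdf")) continue;`.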

-Beej
 
Hans

I wrote a function to recursively call itself with shorter strings
until I get to the last string after the dot. I just wrapped your code
snippet on sscanf() in a function.

I've been testing it for the last hour and it seems to work well. Of
course, since I have very limited knowledge of C, I can't tell if
there's a better way. (Frankly, I don't know what I'm doing, but I'm
going with my common sense.)

Anyway, do comment on what I've done. I'll also look into how to write
a solution based on the strrchr() function.

//---------------------------------------------------------
char* FileType( char DotString[512] ) {
//------------------------------------------------------
    char Left[512];
    char Right[16];

    // keep calling self with shorter string
    if ( sscanf( DotString , "%511[^.].%15s" , Left , Right ) == 2 )
        return ( FileType( Right ) );

    // convert to lowercase
    int i;
    for( i = 0 ; ( DotString[i] = tolower( DotString[i] ) ) ; i++ );

    return ( DotString ); // last string is the file type

} // end
 
Ben Bacarisse

Beej Jorgensen said:
Hans said:
Matched = sscanf( thisFile->d_name , "%511[^.].%15s" , Title , FileType );

ebook1.pdf - FileType = "pdf"
ebook1.pdf.txt - FileType = "txt"
ebook2.djvu - FileType = "djvu"
ebook2.djvu.txt - FileType = "txt"
ebook3.html.pdf - FileType = "pdf"

Now it becomes trickier, because the sscanf() format string assumes
there would be no intermediate '.'s before the one that delimits the
extension. My scanf-fu isn't strong enough to make a format string that
would handle all your above examples, but I would love to see one that
does.

It is surprisingly simple but tricksy enough that I doubt it would
pass any code review:

Matched = sscanf(thisFile->d_name,
"%511[^.].%15[^.].%15s" , Title, FileType, FileType);

and the test for success becomes Matched >= 2.
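
For anyone who wants to try it, here is a minimal sketch that wraps that call in a function. The wrapper name scan_last_of_two is invented for this example.

```c
#include <stdio.h>

/* Hypothetical wrapper around the three-field format above.
   title needs at least 512 bytes, type at least 16.
   If a second '.' is present, the third conversion overwrites
   what the second one stored in type. Returns 1 on success. */
int scan_last_of_two(const char *name, char *title, char *type)
{
    int matched = sscanf(name, "%511[^.].%15[^.].%15s", title, type, type);

    return matched >= 2;
}
```

It handles the five example names, but a name with three or more periods still leaves a '.' inside type.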
 
Ben Bacarisse

Hans said:
Anyway, do comment on what I've done. I'll also look into how to write
a solution based on the strrchr() function.

//---------------------------------------------------------
char* FileType( char DotString[512] ) {
//------------------------------------------------------
    char Left[512];
    char Right[16];

    // keep calling self with shorter string
    if ( sscanf( DotString , "%511[^.].%15s" , Left , Right ) == 2 )
        return ( FileType( Right ) );

    // convert to lowercase
    int i;
    for( i = 0 ; ( DotString[i] = tolower( DotString[i] ) ) ; i++ );

    return ( DotString ); // last string is the file type

} // end


This won't work. You return a pointer to a local array. By the time
you use it, that array has gone (its lifetime has ended) and the
result is undefined (in some systems it might appear to work).
 
Hans

Ben Bacarisse said:
This won't work.  You return a pointer to a local array.  By the time
you use it, that array has gone (its lifetime has ended) and the
result is undefined (in some systems it might appear to work).


Can you be more specific as to the results? I just ran it against 500+
ebooks I have in a folder. I was able to create 1,500+ files in under
20 seconds. I checked the files and they are exactly what I wanted.
 
Ben Bacarisse

It is best not to quote sig blocks.

Hans said:
Can you be more specific as to the results?

Ah, no I can't. Once a program has what C calls "undefined
behaviour", all bets are off. I can't say anything very specific
about the results. I can take your function and use it in a program
that sometimes gives me correct answers and sometimes wrong ones,
but this is not very helpful.

I'd use the strrchr method or my sscanf "hack" from another post.

Hans said:
I just ran it against 500+
ebooks I have in a folder. I was able to create 1,500+ files in under
20 seconds. I checked the files and they are exactly what I wanted.

Chalk it up to being unlucky! Lucky people get to see errors quickly
but C is not good in this respect. There are lots of invalid
constructs that can appear to work for a while.

I know it is too late, but C is not the right tool for this task unless
you are using a system with almost no helpful tools.
 
Hans

Ben Bacarisse said:
Hans said:
Can you be more specific as to the results?

Ah, no I can't.  Once a program has what C calls "undefined
behaviour", all bets are off.  I can't say anything very specific
about the results.  I can take your function and use it in a program
that sometimes gives me correct answers and sometimes wrong ones,
but this is not very helpful.

I'd use the strrchr method or my sscanf "hack" from another post.

Your diagnosis of the problem and your proposed solution do not
exactly match each other. If you are saying that "it is unsafe to pass
a pointer to a local array back to the calling program" as the
problem, then maybe the better suggestion is to propose sending the
variable to the called routine. Wouldn't this be better?

By the way, my knowledge of C/C++ is limited and I don't know pointers
well.

Ben Bacarisse said:
Chalk it up to being unlucky!  Lucky people get to see errors quickly
but C is not good in this respect.  There are lots of invalid
constructs that can appear to work for a while.

Is there a way to force/trigger an error? Something like an assertion
to test code. It would be better to propose a line of code I can
insert in my program to break it rather than just relying on chance to
prove a point.

Ben Bacarisse said:
I know it is too late, but C is not the right tool for this task unless
you are using a system with almost no helpful tools.

I'm interested in solving this problem using C.
 
Ben Bacarisse

Hans said:
Your diagnosis of the problem and your proposed solution do not
exactly match each other.

Well, they do, I just missed out a link. The recursive solution has a
problem (returning a pointer to automatic storage) and I don't think
it is worth fixing it so I'd go with some other method.

I realised that I was being negative without offering a way forward so I
added the proposed solution. I should maybe have said "I think it is
easier and clearer to use another approach".

Hans said:
If you are saying that "it is unsafe to pass
a pointer to a local array back to the calling program" as the
problem, then maybe the better suggestion is to propose sending the
variable to the called routine. Wouldn't this be better?

If you like that solution, go for it. I did not think that would give
a clear and idiomatic solution so I did not suggest a fix to the
recursive method. I think finding the last '.' is the winner in terms
of clarity and simplicity.
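
For completeness, here is one way the "send the variable to the called routine" idea could look. This is a sketch only: the name FileTypeInto, the outsize parameter, and the enlarged Right buffer are all assumptions, not something from the posts above.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical rework of FileType(): the caller owns out, so no
   pointer to this function's local arrays ever escapes. */
void FileTypeInto(const char *DotString, char *out, size_t outsize)
{
    char Left[512];
    char Right[512];
    size_t i;

    if (outsize == 0)
        return;

    /* keep calling self with the part after the first '.' */
    if (sscanf(DotString, "%511[^.].%511s", Left, Right) == 2) {
        FileTypeInto(Right, out, outsize);
        return;
    }

    /* no further '.': copy what is left, lowercased, into out */
    for (i = 0; i + 1 < outsize && DotString[i] != '\0'; i++)
        out[i] = (char)tolower((unsigned char)DotString[i]);
    out[i] = '\0';
}
```

Note that %s stops at whitespace, so a name with a space after its first '.' would still be cut short; the strrchr() approach does not have that problem.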

Hans said:
By the way, my knowledge of C/C++ is limited and I don't know pointers
well.

You seem to be doing fine.

Hans said:
Is there a way to force/trigger an error? Something like an assertion
to test code. It would be better to propose a line of code I can
insert in my program to break it rather than just relying on chance to
prove a point.

Not that I know of, although there are memory testing "harnesses" like
valgrind that can help a lot.

<snip>
 
CBFalconer

Ben said:
This won't work. You return a pointer to a local array. By the
time you use it, that array has gone (its lifetime has ended) and
the result is undefined (in some systems it might appear to work).


Just looking at what you posted I think you are in error.
DotString is a parameter to FileType, and is passed as a pointer to
an actual array. It would be better marked as "char *DotString",
since the 512 is meaningless.
 
Ben Bacarisse

CBFalconer said:
Just looking at what you posted I think you are in error.

Gosh.

CBFalconer said:
DotString is a parameter to FileType, and is passed as a pointer to
an actual array. It would be better marked as "char *DotString",
since the 512 is meaningless.


When the original "top-level" argument is returned there is no error,
but this is neither the usual execution path nor the one I was talking
about. Work through what happens when there *is* an extension to be
parsed out of the string.
 
Hans

CBFalconer said:
Just looking at what you posted I think you are in error.
DotString is a parameter to FileType, and is passed as a pointer to
an actual array.  It would be better marked as "char *DotString",
since the 512 is meaningless.

Ya, I have my suspicions. The code didn't look right and it seemed
unnecessary coding work to be putting [512] after the parameter name.
 
Hans

Ben Bacarisse said:
Gosh.
When the original "top-level" argument is returned there is no error,
but this is neither the usual execution path nor the one I was talking
about.  Work through what happens when there *is* an extension to be
parsed out of the string.

I don't follow, but it sounds bad.
 
Ben Bacarisse

Hans said:
I don't follow, but it sounds bad.

Here is code with parts that don't come into the discussion removed:

char *FileType(char *DotString)
{
    char Left[512];
    char Right[16];
    if (sscanf(DotString, "%511[^.].%15s", Left, Right) == 2)
        return FileType(Right);
    return DotString;
}

I've changed the [512] to a * in the parameter -- the compiler does this
itself -- arrays are passed as pointers to their first element.
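
That adjustment can be checked directly. The following is a small sketch that assumes a C11 compiler (for _Generic); the function name param_is_pointer is made up for this example.

```c
/* Hypothetical check: inside the function, the parameter declared
   as char s[512] really has type char *, so &s has type char **.
   Returns 1 if that is the case, 0 otherwise. */
int param_is_pointer(char s[512])
{
    return _Generic(&s, char **: 1, default: 0);
}
```

The 512 plays no part in the parameter's type, which is why writing char *DotString says the same thing more honestly.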

What happens in the call FileType("abc.def")? Let's step through it:

1. Create DotString and assign to it a pointer to the 'a' in
   "abc.def".
2. Create Left and Right.
3. Call sscanf, passing it DotString (a pointer to "abc.def"), a format
   string, and pointers to Left[0] and Right[0].
4. sscanf matches two items. Left gets "abc" put into it (in
   locations 0 through 3 -- 3 is the terminating null character) and
   Right gets "def" put into it (again correctly terminated).
5. Since sscanf returns 2, we call FileType(Right) in order to return
   the result we get.
   <deep breath...>
   5a. Create DotString and assign to it a pointer to the first element
       of Right (an array containing "def").
   5b. Create Left and Right.
   5c. Call sscanf. It won't match. It will not return 2.
   5d. Return the DotString (the pointer to our caller's Right array).
   5e. When a function returns, its local data is destroyed: DotString,
       Left and Right (the new ones with nothing of interest in them)
       are thrown away.
6. Return the pointer that FileType(Right) returned to us. This is a
   pointer to the start of the array we created in step 2 and filled
   in step 4. It contains, correctly, "def".
7. Destroy local data. This includes Left and Right despite the fact
   that a pointer to this data is being returned to whoever called us.
Does that help?
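
For contrast with steps 6 and 7, here is a sketch of a recursive variant that never copies into local arrays. It returns a pointer into the string the caller passed in, which is still alive after the function returns. The name FileTypePtr is invented for this example, and the lowercase conversion is left out.

```c
#include <string.h>

/* Hypothetical variant: the result points into DotString itself,
   so it stays valid for as long as the caller's string does. */
const char *FileTypePtr(const char *DotString)
{
    const char *dot = strchr(DotString, '.'); /* first '.' if any */

    if (dot != NULL)
        return FileTypePtr(dot + 1); /* recurse on the text after it */
    return DotString;                /* no '.' left: this is the type */
}
```

For "abc.def" the returned pointer is simply the address of the 'd' inside the caller's own string.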
 
Hans

Ben Bacarisse said:
Gosh.

Oops. You didn't have to do that. I just needed to know what made
you say "Gosh" and what discovery you made: whether I was right or
wrong, or you were, or CBFalconer was.

Anyway, I'll just keep your explanation for future reference.

I decided to pursue the strrchr() solution instead of sscanf(), but
now you will know why I didn't in the first place. I am not very good
at pointer arithmetic. (This snippet was originally proposed by Beej.)

////-------------------------------------------------------
void FileType2( char *DotString , char *DotType ) {
////----------------------------------------------------
    char *LastDot;

    LastDot = strrchr( DotString, '.' ); // find the last '.' in the string

    if ( LastDot != NULL )
        printf( "%s\n", LastDot+1 );
    else
        printf( "%s\n", DotString );

    //DotType is the return string

} // end

So, now I ask, how do you get the string value from LastDot+1 to my
target return variable DotType? And how do I call FileType2()? Just
FileType2( eBook1, BookType ) or FileType2( eBook1, &BookType )?
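
One possible answer, offered as a sketch rather than the definitive version: copy the characters into the caller's buffer and pass the buffer size along. The third parameter, DotTypeSize, is an addition that is not in the function above.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: copies the text after the last '.' (or the whole name if
   there is no '.') into DotType, which holds DotTypeSize bytes.
   DotTypeSize is an assumed extra parameter. */
void FileType2(const char *DotString, char *DotType, size_t DotTypeSize)
{
    const char *LastDot = strrchr(DotString, '.');
    const char *src = (LastDot != NULL) ? LastDot + 1 : DotString;
    size_t len = strlen(src);

    if (DotTypeSize == 0)
        return;
    if (len >= DotTypeSize)
        len = DotTypeSize - 1; /* truncate rather than overflow */

    memcpy(DotType, src, len);
    DotType[len] = '\0';
}
```

Assuming BookType is declared as an array such as `char BookType[16];`, the call would be `FileType2( eBook1, BookType, sizeof BookType );` with no `&`, because the array name already converts to a pointer to its first element.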
 
Beej Jorgensen

Ben Bacarisse said:
Matched = sscanf(thisFile->d_name,
"%511[^.].%15[^.].%15s" , Title, FileType, FileType);

and the test for success becomes Matched >= 2.

Oh yeah--heh!

-Beej
 
