How to extract a string starting with 'abc' & ending with 'xyz' ?

Umesh

I want to extract a string abc*xyz from a text file.
* indicates arbitrary no. of characters.

I'm only able to do it when the string has a definite number of characters,
i.e. when the string length is constant, for example five characters between
the markers: abc?????xyz
How can I generalize it for any length of the string?
 
Ian Collins

Umesh said:
I want to extract a string abc*xyz from a text file.
* indicates arbitrary no. of characters.

I'm only able to do it when the string has definite no. of characters
or the string length is constant: i.e. five or the string is abc?????
xyz
How can i generalize it for any length of the string?
Have you looked up regular expressions yet?
 
Joachim Schmitz

Umesh said:
I want to extract a string abc*xyz from a text file.
* indicates arbitrary no. of characters.

I'm only able to do it when the string has definite no. of characters
or the string length is constant: i.e. five or the string is abc?????
xyz
How can i generalize it for any length of the string?
You may want to look for regex, a library for regular expressions.
http://directory.fsf.org/regex.html

Bye, Jojo
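
As a rough illustration of the regular-expression route, here is a sketch using the POSIX <regex.h> interface (regcomp/regexec). That interface is not part of standard C and is only assumed to be available on your system. The helper name find_abc_xyz and the pattern are illustrative assumptions; the pattern treats the wanted string as a run of non-whitespace characters.

```c
#include <assert.h>
#include <regex.h>   /* POSIX, not ISO C: assumed to be available */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the first "abc...xyz" match found in line
   into out (at most outsize - 1 characters, always terminated).
   Returns 1 if a match was found, 0 otherwise. */
int find_abc_xyz(const char *line, char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[1];
    int found = 0;

    assert(line != NULL && out != NULL && outsize > 0);

    /* "abc", then any run of non-space characters, then "xyz" */
    if (regcomp(&re, "abc[^[:space:]]*xyz", REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, line, 1, m, 0) == 0) {
        size_t len = (size_t)(m[0].rm_eo - m[0].rm_so);
        if (len >= outsize)
            len = outsize - 1;      /* truncate rather than overflow */
        memcpy(out, line + m[0].rm_so, len);
        out[len] = '\0';
        found = 1;
    }
    regfree(&re);
    return found;
}
```

Changing what counts as part of the string then only means editing the pattern, not the surrounding code.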
 
Chris Dollin

Umesh said:
I want to extract a string abc*xyz from a text file.
* indicates arbitrary no. of characters.

I'm only able to do it when the string has definite no. of characters
or the string length is constant: i.e. five or the string is abc?????
xyz
How can i generalize it for any length of the string?

I don't know how you can generalise it, because I don't
know what you're specifically doing.

But, assuming you don't just want to fall back to using
regular expressions, I don't see what your problem is.

Find `abc` (easy-peasy strstr-squeezy). Then find `xyz`
starting from after the `c`. Voilà.

Perhaps you haven't described your full problem?
 
Malcolm McLean

Umesh said:
I want to extract a string abc*xyz from a text file.
* indicates arbitrary no. of characters.

I'm only able to do it when the string has definite no. of characters
or the string length is constant: i.e. five or the string is abc?????
xyz
How can i generalize it for any length of the string?
As others have pointed out, if you want to handle the general problem of
"how can I match a string with such-and-such characteristics", you want
regular expressions.
However, for a one-off simple pattern, regular expressions are overkill.

Use strncmp() to compare the first three characters to "abc". Call strlen()
to find the end of the string, move back three places (make sure you don't
move into memory you don't own, such as index minus one), then call strcmp()
with "xyz".

Wrap it all up in a little function of its own.
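
A minimal sketch of such a function, assuming the whole string (not a substring of it) has to start with "abc" and end with "xyz". The name starts_abc_ends_xyz is made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if s starts with "abc" and ends with
   "xyz", 0 otherwise. */
int starts_abc_ends_xyz(const char *s)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);

    /* A string shorter than 6 characters cannot hold both markers, and
       checking first keeps s + len - 3 inside the string. */
    if (len < 6)
        return 0;

    return strncmp(s, "abc", 3) == 0 && strcmp(s + len - 3, "xyz") == 0;
}
```

The length check is what prevents stepping back past the start of a short string.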
 
Umesh

/* This is what I can do. If the length of the string starting with
abc & ending with xyz is known (11 in this case), I can program it as
follows. But if the length of the string varies between 10 and 50, what
should I do? Thanks. */

//abc?????xyz
#include<stdio.h>
#include<stdlib.h>
int main()
{

FILE *f,*fp;
f=fopen("c:/1.txt","r");
if(f==NULL)
{
puts("Error opening file");
exit(0);
}
fp=fopen("c:/2.txt","w");

char c[12];
while((c[0]=getc(f))!=EOF)
if(c[0]=='a'
&& (c[1]=getc(f))!=EOF && c[1]=='b'
&& (c[2]=getc(f))!=EOF && c[2]=='c'
&& (c[3]=getc(f))!=EOF && c[3]!=' '
&& (c[4]=getc(f))!=EOF && c[4]!=' '
&& (c[5]=getc(f))!=EOF && c[5]!=' '
&& (c[6]=getc(f))!=EOF && c[6]!=' '
&& (c[7]=getc(f))!=EOF && c[7]!=' '
&& (c[8]=getc(f))!=EOF && c[8]=='x'
&& (c[9]=getc(f))!=EOF && c[9]=='y'
&& (c[10]=getc(f))!=EOF && c[10]=='z')
{
c[11]='\0';
fprintf(fp,"%s\n",c);

}
fclose(f);
fclose(fp);
return 0;

}
 
Ian Collins

Umesh said:
/* This is what I can do. If the length of the string starting with
abc & ending with xyz is known (11 in this case), I can program it as
follows. But if the length of the string varies between 10 and 50, what
should I do? Thanks. */
1) learn how to post.
2) find a regular expression library.
 
Army1987

Umesh said:
/* This is what I can do. If the length of the string starting with
abc & ending with xyz is known (11 in this case), I can program it as
follows. But if the length of the string varies between 10 and 50, what
should I do? Thanks. */
No, it doesn't. "abc45678xyz" doesn't work.
You are only comparing the string with "abc xyz".
//abc?????xyz
#include<stdio.h>
#include<stdlib.h>
int main()
{

FILE *f,*fp;
f=fopen("c:/1.txt","r");
if(f==NULL)
{
puts("Error opening file");
Write that to stderr, not to stdout...
Use EXIT_FAILURE, not 0 which means 'success'...
}
fp=fopen("c:/2.txt","w"); Check it for NULL, too.

char c[12];
while((c[0]=getc(f))!=EOF)
EOF doesn't fit in an unsigned char and might compare equal to a
valid signed char.
see www.c-faq.com, question 12.1.
if(c[0]=='a'
&& (c[1]=getc(f))!=EOF && c[1]=='b'
&& (c[2]=getc(f))!=EOF && c[2]=='c'
&& (c[3]=getc(f))!=EOF && c[3]!=' '
&& (c[4]=getc(f))!=EOF && c[4]!=' '
&& (c[5]=getc(f))!=EOF && c[5]!=' '
&& (c[6]=getc(f))!=EOF && c[6]!=' '
&& (c[7]=getc(f))!=EOF && c[7]!=' '
&& (c[8]=getc(f))!=EOF && c[8]=='x'
&& (c[9]=getc(f))!=EOF && c[9]=='y'
&& (c[10]=getc(f))!=EOF && c[10]=='z')
What a mess...
What's wrong with
scanf("%11c", c);
if (!strcmp(c, "abc xyz"))...
{
c[11]='\0';
fprintf(fp,"%s\n",c);

}
fclose(f);
fclose(fp); Check whether these succeeded.
return 0;

}

Try:
/*not compiled, not tested*/
#define MAXLINE 16383
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
int status;
FILE *infp, *outfp;
char buf[MAXLINE+1];
char *str, *endstr = NULL;
infp = fopen("c:/1.txt", "r");
if (infp == NULL) {
perror("Unable to open input file");
exit(EXIT_FAILURE);
}
outfp = fopen("c:/2.txt", "w"); /* "w": this file is written, not read */
if (outfp == NULL) {
perror("Unable to open or create output file");
fclose(infp);
exit(EXIT_FAILURE);
}
while (fgets(buf, MAXLINE, infp)) {
endstr = NULL;
(str = strstr(buf, "abc")) && (endstr = strstr(str,"xyz"));
if (endstr != NULL)
fprintf(outfp, "%*s", endstr-str+3, str);
}
status = ferror(infp) || ferror(infp);
if (fclose(infp)) {
perror("Unable to close input file");
status++;
}
if (fclose(outfp)) {
perror("Unable to close output file");
status++;
}
return status ? EXIT_FAILURE : 0;
}
 
Army1987

Army1987 said:
while (fgets(buf, MAXLINE, infp)) {
endstr = NULL;
(str = strstr(buf, "abc")) && (endstr = strstr(str,"xyz"));
if (endstr != NULL)
fprintf(outfp, "%*s", endstr-str+3, str);
That should use a precision rather than a field width, end each match with
a newline, and pass the length as an int:
fprintf(outfp, "%.*s\n", (int)(endstr-str+3), str);
}
status = ferror(infp) || ferror(infp);
And the second call should check the output stream:
status = ferror(infp) || ferror(outfp);
 
Eric Sosman

Umesh said:
I want to extract a string abc*xyz from a text file.
* indicates arbitrary no. of characters.

I'm only able to do it when the string has definite no. of characters
or the string length is constant: i.e. five or the string is abc?????
xyz
How can i generalize it for any length of the string?

Using what the Standard library provides, you can do

const char *the_string = ...;
const char *abc, *xyz;

abc = strstr(the_string, "abc");
if (abc != NULL) {
    xyz = strstr(abc + 3, "xyz");
    if (xyz != NULL) {
        printf ("Found \"%.*s\"\n",
                (int)(xyz + 3 - abc), abc);
    }
}
 
osmium

Umesh said:
/* This is what I can do. If the length of the string starting with
abc & ending with xyz is known (11 in this case), I can program it as
follows. But if the length of the string varies between 10 and 50, what
should I do? Thanks. */

The first thing you should do is choose a language to learn and then focus
on *that* language. You have posted questions to the C++ group and it is a
more powerful language than C. (Note that "more powerful" does not mean the
same thing as "better".) So I suggest this:

o decide to learn C++
o post to the C++ group only.

OR (Big Google type OR)

o decide to learn C
o post to the C group only

Learning a language is difficult enough. Trying to learn two at the same
time simply compounds the problems.
 
Chris Dollin

Umesh said:
/* This is what I can do. If the length of the string starting with
abc & ending with xyz is known (11 in this case), I can program it as
follows. But if the length of the string varies between 10 and 50, what
should I do? Thanks. */

//abc?????xyz
#include<stdio.h>
#include<stdlib.h>
int main()
{

FILE *f,*fp;
f=fopen("c:/1.txt","r");
if(f==NULL)
{
puts("Error opening file");
exit(0);
}
fp=fopen("c:/2.txt","w");

char c[12];
while((c[0]=getc(f))!=EOF)
if(c[0]=='a'
&& (c[1]=getc(f))!=EOF && c[1]=='b'
&& (c[2]=getc(f))!=EOF && c[2]=='c'
&& (c[3]=getc(f))!=EOF && c[3]!=' '
&& (c[4]=getc(f))!=EOF && c[4]!=' '
&& (c[5]=getc(f))!=EOF && c[5]!=' '
&& (c[6]=getc(f))!=EOF && c[6]!=' '
&& (c[7]=getc(f))!=EOF && c[7]!=' '
&& (c[8]=getc(f))!=EOF && c[8]=='x'
&& (c[9]=getc(f))!=EOF && c[9]=='y'
&& (c[10]=getc(f))!=EOF && c[10]=='z')
{
c[11]='\0';
fprintf(fp,"%s\n",c);

Hell's Freelling Teeth, no /wonder/ you're having trouble if
you're trying to write it like that.

Ignoring the deeply relevant fact that you can't portably expect
EOF to be storable in a `char`, read an /entire line/ into a
string and look in /that/. `fgets` is your friend.

Yes, you pay in store. Yes, it's /possible/ to do the job
without having to store entire lines. But if you really
really want to go there, you don't have a C problem -- you
have an algorithms problem.
 
Umesh

I actually want to find words starting with a and ending with b in a
text file and put the output in a file. So there will be no spaces
between the words. .
 
osmium

Umesh said:
/* This is what I can do. If the length of the string starting with
abc & ending with xyz is known (11 in this case), I can program it as
follows. But if the length of the string varies between 10 and 50, what
should I do? Thanks. */

//abc?????xyz
#include<stdio.h>
#include<stdlib.h>
int main()
{

FILE *f,*fp;
f=fopen("c:/1.txt","r");
if(f==NULL)
{
puts("Error opening file");
exit(0);
}
fp=fopen("c:/2.txt","w");

char c[12];
while((c[0]=getc(f))!=EOF)
if(c[0]=='a'
&& (c[1]=getc(f))!=EOF && c[1]=='b'
&& (c[2]=getc(f))!=EOF && c[2]=='c'
&& (c[3]=getc(f))!=EOF && c[3]!=' '
&& (c[4]=getc(f))!=EOF && c[4]!=' '
&& (c[5]=getc(f))!=EOF && c[5]!=' '
&& (c[6]=getc(f))!=EOF && c[6]!=' '
&& (c[7]=getc(f))!=EOF && c[7]!=' '
&& (c[8]=getc(f))!=EOF && c[8]=='x'
&& (c[9]=getc(f))!=EOF && c[9]=='y'
&& (c[10]=getc(f))!=EOF && c[10]=='z')
{
c[11]='\0';
fprintf(fp,"%s\n",c);

}
fclose(f);
fclose(fp);
return 0;

}

You would be well served by trying something simpler first. Then you can
build on the knowledge you get.

Do this: Write a C program that uses getc() to read a file, prints what it
reads, and then prints "EOF received" after the last char in the file.

Note that, despite the message linking, I have read the post you made at
9:34 AM.
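
One possible shape for the exercise above, as a sketch rather than a model answer. The function name echo_until_eof is invented here. The point to notice is that the value returned by getc() is kept in an int, so that EOF stays distinguishable from every real character.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies everything from in to out, then reports
   that end of file was reached. Returns the number of characters copied. */
long echo_until_eof(FILE *in, FILE *out)
{
    int ch;          /* int, not char: must hold every character plus EOF */
    long count = 0;

    assert(in != NULL && out != NULL);

    while ((ch = getc(in)) != EOF) {
        putc(ch, out);
        count++;
    }
    fputs("\nEOF received\n", out);
    return count;
}
```

A main() would open the input with fopen(), check the result for NULL, and call echo_until_eof(f, stdout).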
 
Malcolm McLean

Umesh said:
I actually want to find words starting with a and ending with b in a
text file and put the output in a file. So there will be no spaces
between the words. .
In that case a regular expression type library might make more sense.

You might want to shift your definition of "word" to include or exclude
hyphens, forms with apostrophes, acronyms in upper case, and the like.
That's a lot easier if you have a regular expression rather than going the
hardcoding route.

Having extracted a word you should be able to examine the first character

str[0]

and the last

str[strlen(str)-1]

without any problem. Or having acquired a regular expression library you can
use that and do the job properly.
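
A sketch of the hardcoded route, assuming a "word" is simply anything separated by whitespace (exactly the definition you may later want to change). Both function names are invented for this example, and strtok() modifies the line it is given.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: 1 if word begins with first and ends with last. */
int word_has_ends(const char *word, char first, char last)
{
    size_t len;

    assert(word != NULL);
    len = strlen(word);
    if (len == 0)            /* avoid indexing word[len - 1] on "" */
        return 0;
    return word[0] == first && word[len - 1] == last;
}

/* Splits line (modified in place by strtok) on whitespace and writes each
   matching word to out, one per line. Returns the number of matches. */
int print_matching_words(char *line, char first, char last, FILE *out)
{
    int matches = 0;
    char *word;

    assert(line != NULL && out != NULL);
    for (word = strtok(line, " \t\r\n"); word != NULL;
         word = strtok(NULL, " \t\r\n")) {
        if (word_has_ends(word, first, last)) {
            fprintf(out, "%s\n", word);
            matches++;
        }
    }
    return matches;
}
```

Called on each line read with fgets(), this writes the matching words to whatever stream is passed in.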
 
Walter Roberson

I actually want to find words starting with a and ending with b in a
text file and put the output in a file. So there will be no spaces
between the words. .

All of these text searches you have asked about can be solved
by a simple technique called a "state machine".

set machinestate to 0
while (inchar = getc()) != EOF {
if machinestate is 0 { /* beginning of a line */
if inchar is 'a' {
store inchar for later output
set machinestate to 1;
} else if inchar is '\n'
set machinestate to 0;
else
set machinestate to 99;
} else if machinestate is 1 { /* 'a' detected at beginning of line */
if inchar is '\n' {
discard saved input characters
set machinestate to 0;
} else if inchar is 'b' {
store inchar for later output
set machinestate to 2;
} else {
discard saved input characters
set machinestate to 99;
}
} else if machinestate is 2 { /* 'ab' detected at beginning of line */
if inchar is '\n' {
discard saved input characters
set machinestate to 0;
} else if inchar is 'c' {
store inchar for later output
set machinestate to 3;
} else {
discard saved input characters
set machinestate to 99;
}
} else if machinestate is 3 { /* 'abc' detected at beginning of line */
if inchar is '\n' {
discard saved input characters
set machinestate to 0;
} else if inchar is 'x' {
store inchar for later output
set machinestate to 4;
} else if inchar is a word character {
store inchar for later output
set machinestate to 3;
} else {
discard saved input characters
set machinestate to 99;
}
} else if machinestate is 4 { /* 'abc'*'x' detected */
if inchar is '\n' {
discard saved input characters
set machinestate to 0;
} else if inchar is 'y' {
store inchar for later output
set machinestate to 5;
} else if inchar is 'x' {
store inchar for later output
set machinestate to 4;
} else if inchar is a word character {
store inchar for later output
set machinestate to 3;
} else {
discard saved input characters
set machinestate to 99;
}
} else if machinestate is 5 { /* 'abc'*'xy' detected */
if inchar is '\n' {
discard saved input characters
set machinestate to 0;
} else if inchar is 'z' {
save inchar for later output
set machinestate to 6;
} else if inchar is 'x' {
save inchar for later output
set machinestate to 4;
} else if inchar is a word character {
store inchar for later output
set machinestate to 3;
} else {
discard saved input characters
set machinestate to 99;
}
} else if machinestate is 6 { /* 'abc'*'xyz' detected */
if inchar is '\n' {
output saved input characters
discard saved input characters
set machinestate to 0;
} else if inchar is 'x' {
save inchar for later output
set machinestate to 4;
} else if inchar is a word character {
store inchar for later output
set machinestate to 3;
} else {
discard saved input characters
set machinestate to 99;
}
} else if inchar is '\n' /* machinestate must be 99 here. */
set machinestate to 0;
else
set machinestate to 99;
}

/* EOF reached */

if machinestate is 6 {
output saved input characters
discard saved input characters
} else if machinestate is not 0 {
discard saved input characters
}
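
For comparison, here is a compact C sketch of the same state-machine idea. It is a variation, not a transcription of the pseudocode above: it assumes any whitespace ends a word (rather than matching only at the start of a line), and it silently skips words longer than MAXWORD. The state names, MAXWORD, and extract_abc_xyz are all invented for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define MAXWORD 255   /* assumption: longer words are silently skipped */

enum state { AT_START, SEEN_A, SEEN_AB, IN_BODY,
             SEEN_X, SEEN_XY, SEEN_XYZ, SKIP };

/* Next state after reading the non-space character c. */
static enum state step(enum state s, int c)
{
    switch (s) {
    case AT_START: return c == 'a' ? SEEN_A : SKIP;
    case SEEN_A:   return c == 'b' ? SEEN_AB : SKIP;
    case SEEN_AB:  return c == 'c' ? IN_BODY : SKIP;
    case SEEN_X:   if (c == 'y') return SEEN_XY; break;
    case SEEN_XY:  if (c == 'z') return SEEN_XYZ; break;
    case SKIP:     return SKIP;
    default:       break;               /* IN_BODY, SEEN_XYZ */
    }
    /* Inside the word: an 'x' may begin the "xyz" ending. */
    return c == 'x' ? SEEN_X : IN_BODY;
}

/* Writes every whitespace-delimited word of the form abc...xyz from in
   to out, one per line. Returns the number of words written. */
long extract_abc_xyz(FILE *in, FILE *out)
{
    char word[MAXWORD + 1];
    size_t len = 0;
    enum state s = AT_START;
    long found = 0;
    int c;                              /* int so EOF is distinguishable */

    assert(in != NULL && out != NULL);
    for (;;) {
        c = getc(in);
        if (c == EOF || isspace(c)) {   /* end of the current word */
            if (s == SEEN_XYZ) {
                word[len] = '\0';
                fprintf(out, "%s\n", word);
                found++;
            }
            if (c == EOF)
                break;
            s = AT_START;
            len = 0;
        } else if (s != SKIP) {
            s = (len < MAXWORD) ? step(s, c) : SKIP;
            if (s != SKIP)
                word[len++] = (char)c;  /* keep the character for output */
        }
    }
    return found;
}
```

Only one character is examined at a time, so no whole line has to be held in memory; the cost is the explicit bookkeeping of states.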
 
osmium

Walter Roberson said:
All of these text searches you have asked about can be solved
by a simple technique called a "state machine".

Yes. But the OP still doesn't understand something as fundamental as
detecting EOF.

This despite the fact that he has been posting since December 2006.
 
Jack Klein

No, it doesn't. "abc45678xyz" doesn't work.
You are only comparing the string with "abc xyz".

Write that to stderr, not to stdout...

No, no, no, NO, NO!!!

The automatic assumption that all programs should write error messages
to stderr is one of the causes of a lot of grief by inconsiderate
programmers.

Programs that generate their primary output to stdout might be
justified in outputting error messages to stderr.

Programs that generate their output to a file, as this one attempts to
do, should indeed generate their error messages to stdout. And that
specifically includes compilers.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html
 
Keith Thompson

Jack Klein said:
No, no, no, NO, NO!!!

Yes, yes, yes, YES, YES!!!
The automatic assumption that all programs should write error messages
to stderr is one of the causes of a lot of grief by inconsiderate
programmers.

What grief?
Programs that generate their primary output to stdout might be
justified in outputting error messages to stderr.

*Might* be?
Programs that generate their output to a file, as this one attempts to
do, should indeed generate their error messages to stdout. And that
specifically includes compilers.

I disagree (as you've probably guessed by now).

Displaying error messages is exactly what stderr is for. I fail to
see the point of writing them to stdout just because a program's
primary output happens to go to a file.

I've just tried 4 different Unix-based C compilers, and they all write
their error messages to stderr. Why exactly is this a problem? (Note
that if they were changed to write error messages to stdout, I know of
some tools that would break.)

I'm assuming here that your environment lets you manage stdout and
stderr reasonably, for example directing them both to the same
destination. If not, that's a problem with your environment, not with
the compiler.

I suppose this discussion of compiler behavior is strictly off-topic,
since a C compiler needn't be written in C, and may not even have the
concepts of stdout and stderr. But the discussion started with an
ordinary C program writing an error message to stderr, and the same
considerations apply.
 
Malcolm McLean

Jack Klein said:
No, no, no, NO, NO!!!

The automatic assumption that all programs should write error messages
to stderr is one of the causes of a lot of grief by inconsiderate
programmers.

Programs that generate their primary output to stdout might be
justified in outputting error messages to stderr.

Programs that generate their output to a file, as this one attempts to
do, should indeed generate their error messages to stdout. And that
specifically includes compilers.
I am writing a lot of short utilities at the moment.
I am not sure whether they are easier to use if they send their output to
stdout or to a named file passed as a parameter. So of course all the
functions are written to take a file pointer, and I can toggle between the
two with a couple of lines in main.
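
A sketch of that arrangement, with invented names (write_report, run): the worker only ever sees a FILE pointer, and the choice between stdout and a named file is confined to a few lines in the driver.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical worker: writes its output to whatever stream it is given.
   Returns 0 on success, -1 on a write error. */
int write_report(FILE *out)
{
    assert(out != NULL);
    return fprintf(out, "report line\n") < 0 ? -1 : 0;
}

/* Hypothetical driver: path == NULL means "use stdout". */
int run(const char *path)
{
    FILE *out = stdout;
    int status;

    if (path != NULL) {
        out = fopen(path, "w");
        if (out == NULL) {
            perror(path);
            return EXIT_FAILURE;
        }
    }
    status = write_report(out);
    if (out != stdout && fclose(out) != 0)
        status = -1;                    /* a failed close can lose data */
    return status == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A main() could then be as short as return run(argc > 1 ? argv[1] : NULL);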

One good question is "what is an error?". For instance if the user just
types the name of the program, traditionally that prints out a usage
message. I always exit with EXIT_FAILURE. The problem is that it is not
always easy to catch that, say from Perl scripts, and you don't necessarily
want the usage message being treated as valid output.
 
