Newbie problem with strings

K

Keith

Hello everyone,

This is my first question to the newsgroup and I'm very new to
programming with C. My background is with Cisco routers, switches, and
firewalls. Over the years I've been collecting rather large syslog
files from firewalls and used several variations of "grep" and
Microsoft's "findstr" to pull out specific information from the
syslogs (layer 4 ports and IP addresses) and ignore all the other
information.

With this in mind I took it upon myself to learn C so that I could
create a simple little program to do all that work for me. I've
created a program listed below that is trying to get me to where I
need to go, but I know I'm not fully understanding the way C uses
pointers to variables when the variables are used over and over again
in a loop. Though most of my syslog files have over 100,000 records
for any one weekday, the problem my program has can be shown with just
a file that has two records. What you will see from the program is
that I open a syslog file called "pix1.txt" with just the contents
listed below:

<166>%PIX-6-302001: Built outbound TCP connection 35700999 for faddr
209.195.178.164/21 gaddr 67.65.127.249/59371 laddr 10.231.3.76/2875
<166>%PIX-6-302005: Built UDP connection for faddr 200.23.1.1/13156
gaddr 67.65.127.26/53 laddr 10.231.200.250/53

I then scan through the records locating various parts so that I can
record if the layer 4 traffic is TCP or UDP, the layer 4 port that is
used (the one I'm interested in comes right before the word "gaddr" in
the record), and the source IP address (which is located right after
the word "laddr" in the record). I'm using the strcpy and strncpy
functions and I'm pretty sure my problem is with the way I'm using the
variables and the way those functions use pointers. If it were working
correctly my newly created file called "ports.txt" would contain the
following data (I delimit with '&' so that I can import into Excel):

TCP&21&10.231.3.76
UDP&13156&10.231.200.250

However, what I'm getting is the following:

TCP&21&10.231.3.76
UDP&13156UDP&10.231.200.250

The first line looks fine but the second line has already starting
including some things that I didn't mean to include. It gets even
worse when the source file has more than two syslog records, and I'm
sure that's due to a snowballing effect of some sort.

My question is that since my goal is pretty simple (I want to take
data from one file and throw out stuff I don't want and write the
stuff I want to another file), there has to be something very
elementary that I'm missing with the way C handles strings and any
help to point me in the right direction would be greatly appreciated.
Below is copy of the source file I'm working on at the moment. I
downloaded lcc-win32 and have used the C tutorial to learn what I've
done so far, and I'm able to use the debugging option of the IDE to
see the errors happening with the pointers during the string
functions.

Thanks for the help,
Ron


/*------------------------------------------------------------------------
Module: c:\lcc\sysread\sysread.c
Author: Ron
Project: Sysread
State: Work in Progress
Creation Date: August 2003
Description: This program reads a file containing syslog
output from a PIX and writes the ports and
source IP's to a new file.
------------------------------------------------------------------------*/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAXLINELEN 200

int main()
{
FILE *infile,*outfile;
char buf1[MAXLINELEN], protocol[3], port[5], ipaddress[15];
int index1;

infile = fopen("pix1.txt","r");
outfile = fopen("ports.txt","w");
while(fgets(buf1,MAXLINELEN,infile)) {
/*------------------------------------------------------------------
Determine if TCP or UDP.
------------------------------------------------------------------*/
if (strstr(buf1,"TCP") == NULL) {
strcpy(protocol,"UDP");
}
else {
strcpy(protocol,"TCP");
}
/*------------------------------------------------------------------
Determine the layer four port.
------------------------------------------------------------------*/
index1 = strcspn(buf1,"/");
strcpy(buf1,&buf1[index1 + 1]);
index1 = strcspn(buf1,"g");
strncpy(port,buf1,index1 - 1);
/*------------------------------------------------------------------
Determine the source IP address.
------------------------------------------------------------------*/
index1 = strcspn(buf1,"l");
strcpy(buf1,&buf1[index1 + 6]);
index1 = strcspn(buf1,"/");
strncpy(ipaddress,buf1,index1);
fprintf(outfile,"%s&%s&%s\n",protocol,port,ipaddress);
}
fclose(infile);
fclose(outfile);
return 0;
}
 
J

Jirka Klaue

Keith wrote:
[...]
TCP&21&10.231.3.76
UDP&13156&10.231.200.250

However, what I'm getting is the following:

TCP&21&10.231.3.76
UDP&13156UDP&10.231.200.250 [...]
char buf1[MAXLINELEN], protocol[3], port[5], ipaddress[15];

"UDP" contains 4 characters. :)
-> 'U', 'D', 'P' and a trailing '\0'.

They do not fit in protocol[3]. You need protocol[4].
You made the same of-by-one error with port and ipaddress.

I didn't look through your code further, but this could very well
solve your problem.

Jirka
 
B

Ben Pfaff

FILE *infile,*outfile;
char buf1[MAXLINELEN], protocol[3], port[5], ipaddress[15];
int index1;

infile = fopen("pix1.txt","r");
outfile = fopen("ports.txt","w");

You should check the return value of fopen() to make sure that it
succeeded.
while(fgets(buf1,MAXLINELEN,infile)) {
/*------------------------------------------------------------------
Determine if TCP or UDP.
------------------------------------------------------------------*/
if (strstr(buf1,"TCP") == NULL) {
strcpy(protocol,"UDP");
}
else {
strcpy(protocol,"TCP");
}

protocol[] only has room for 3 characters. It needs room for 4:
you're forgetting the null terminator.
/*------------------------------------------------------------------
Determine the layer four port.
------------------------------------------------------------------*/
index1 = strcspn(buf1,"/");
strcpy(buf1,&buf1[index1 + 1]);

You're copying overlapping strings here: one part of buf1 into
another part of buf1. That's undefined behavior. Use a second
buffer.
index1 = strcspn(buf1,"g");
strncpy(port,buf1,index1 - 1);

There is occasionally a good reason to use strncpy(). However:

* Using strncpy() into a large buffer can be very inefficient.
strncpy() always writes to every byte in the destination
buffer, which can waste a lot of time if the destination
buffer is much longer than the source string.

* If the source string is longer than the size of the
destination buffer, then strncpy() doesn't write a
terminating null. So a call to strncpy() must be followed
by explicitly writing a null terminator at the end of the
destination buffer in most cases.

In particular, it seems to me that you might better here use
memcpy() followed by setting the null terminator.

Similar comments on later code.
 
J

Jos De Laender

Keith said:
Hello everyone,

This is my first question to the newsgroup and I'm very new to
programming with C. My background is with Cisco routers, switches, and
firewalls. Over the years I've been collecting rather large syslog
files from firewalls and used several variations of "grep" and
Microsoft's "findstr" to pull out specific information from the
syslogs (layer 4 ports and IP addresses) and ignore all the other
information.

<large snip>

When it is the intention to solve this particular problem, you should
consider the right tooling.

I guess the right tooling is called Perl in this case.

It's very flexible in all kind of pattern matching , substituting ,
reporting etc.

When it's for learning C , take the advice already given here by others.

By the way , in my opinion , pure C is poor for this kind of task.

You should consider common extensions such as lexical analysis (
flex/lex) or grammar handlers (yacc/bison).
Those will generate code that is and faster and easier to maintain for
this kind of problems.

Regards,

Jos
 
K

Keith

I've got to hand it to the people of this newsgroup. You do excellent
work! Thanks for the advice in both replies. The problem was a
combination of what both replies stated. I misunderstood that setting
port[3] did not leave room for the terminating byte. I was under the
impression the [3] meant bytes 0,1,2,3 which would equal four. Now I
understand it means only 0,1,2. Since I made the same error with all
three variables, I've set them all to the correct size.

The other issue that was brought up in the replies has also been
fixed. I should have paid closer attention to the fact that strncpy
(or memcpy) does not bring over a terminating byte at the end of the
string I'm copying. I switched to memcpy to avoid the wasting of
resources that was mentioned about strncpy.

Thanks for the help in getting over this obstacle in my understanding
of working with strings in C. I have included the corrected code below
to compare to the program in my original post. I also added the code
to check to see if the fopen worked correctly (from the tutorial
example that came with the editor).

The only thing I haven't changed yet is the second buffer that was
mentioned. When I perform the debug it shows the line
"strcpy(buf1,&buf1[index1 + 1]);" as working the way I intended it to.
What I'm doing is just chopping off the text from the beginning of
buf1 up to the point where the index is pointing. I'm actually running
this command in three different sections of the code (with varying
index1 settings) so that the buffer ends up walking down the syslog
record while I pull out the needed information. I'm thinking that if I
use a second buffer (buf2) then I would probably need to toggle back
and forth between the two. It's just the debugs show the buffer
getting resized correctly so I think I'm ok there.

Thanks for the help,
Ron

/*------------------------------------------------------------------------
Module: c:\lcc\sysread\sysread.c
Author: Ron
Project: Sysread
State: Work in Progress
Creation Date: August 2003
Description: This program reads a file containing syslog
output from a PIX and writes the ports and
source IP's to a new file.
------------------------------------------------------------------------*/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAXLINELEN 200

int main()
{
FILE *infile,*outfile;
char buf1[MAXLINELEN], protocol[4], port[6], ipaddress[16];
int index1;

infile = fopen("pix1.txt","r");
if (infile == NULL) {
printf("File pix1.txt doesn't exist\n");
exit(1);
}
outfile = fopen("ports.txt","w");
while(fgets(buf1,MAXLINELEN,infile)) {
/*------------------------------------------------------------------
Determine if TCP or UDP.
------------------------------------------------------------------*/
if (strstr(buf1,"TCP") == NULL) {
strcpy(protocol,"UDP");
}
else {
strcpy(protocol,"TCP");
}
/*------------------------------------------------------------------
Determine the layer four port.
------------------------------------------------------------------*/
index1 = strcspn(buf1,"/");
strcpy(buf1,&buf1[index1 + 1]);
index1 = strcspn(buf1,"g");
memcpy(port,buf1,index1 - 1);
port[index1 - 1] = '\0';
/*------------------------------------------------------------------
Determine the source IP address.
------------------------------------------------------------------*/
index1 = strcspn(buf1,"l");
strcpy(buf1,&buf1[index1 + 6]);
index1 = strcspn(buf1,"/");
memcpy(ipaddress,buf1,index1);
ipaddress[index1] = '\0';
fprintf(outfile,"%s&%s&%s\n",protocol,port,ipaddress);
}
fclose(infile);
fclose(outfile);
return 0;
}
 
B

Ben Pfaff

The only thing I haven't changed yet is the second buffer that was
mentioned. When I perform the debug it shows the line
"strcpy(buf1,&buf1[index1 + 1]);" as working the way I intended it to.

It might work on your system, but it's non-portable. Copying
overlapping buffer with strcpy() invokes undefined behavior, and
you shouldn't do it. Use memmove() instead, or copy into a
different buffer.
 
R

Robert B. Clark

a file that has two records. What you will see from the program is
that I open a syslog file called "pix1.txt" with just the contents
listed below:

<166>%PIX-6-302001: Built outbound TCP connection 35700999 for faddr
209.195.178.164/21 gaddr 67.65.127.249/59371 laddr 10.231.3.76/2875
<166>%PIX-6-302005: Built UDP connection for faddr 200.23.1.1/13156
gaddr 67.65.127.26/53 laddr 10.231.200.250/53
variables and the way those functions use pointers. If it were working
correctly my newly created file called "ports.txt" would contain the

TCP&21&10.231.3.76
UDP&13156&10.231.200.250

However, what I'm getting is the following:

TCP&21&10.231.3.76
UDP&13156UDP&10.231.200.250

This just screams buffer overrun to me....
The first line looks fine but the second line has already starting
including some things that I didn't mean to include. It gets even
worse when the source file has more than two syslog records, and I'm
sure that's due to a snowballing effect of some sort.

Buffer overrun still looks good...
stuff I want to another file), there has to be something very
elementary that I'm missing with the way C handles strings and any
help to point me in the right direction would be greatly appreciated.

See comments below:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAXLINELEN 200

int main()

int main(void)
{
FILE *infile,*outfile;
char buf1[MAXLINELEN], protocol[3], port[5], ipaddress[15];
int index1;

This should be a size_t, not an int. strcspn returns a size_t.
infile = fopen("pix1.txt","r");
outfile = fopen("ports.txt","w");
while(fgets(buf1,MAXLINELEN,infile)) {
/*------------------------------------------------------------------
Determine if TCP or UDP.
------------------------------------------------------------------*/
if (strstr(buf1,"TCP") == NULL) {
strcpy(protocol,"UDP");

This is probably your smoking gun.

You allocate 'protocol' three chars, but here you're writing four
('U','D','P',\0).

Don't forget that strings in C are terminated with a NUL character.

Your definition for 'protocol' should be

char protocol[4];

Better yet, #define this and the other constants as you've done with buf1:

#define PROT_LEN 4 /* length of protocol name (TCP or UDF) */

char protocol[PROT_LEN];

Self-documentation is a Good Thing.

You similarly allocate insufficient storage for port and ipaddress.
/*------------------------------------------------------------------
Determine the layer four port.
------------------------------------------------------------------*/
index1 = strcspn(buf1,"/");

What if the '/' character is not present? Can there be more than one '/'
in the record?
strcpy(buf1,&buf1[index1 + 1]);
index1 = strcspn(buf1,"g");
strncpy(port,buf1,index1 - 1);

<snip>

To my mind, strcspn is not the best tool for this job.

If we assume that the faddr, gaddr and laddr fields are always in the order
shown in your sample data file, we can use '/' as the field
delimiter/search key. Thus, each field has the form

[xaddr] [ip_address]/[port]

Let buf1 hold the following record (one line terminated with a newline):

<166>%PIX-6-302001: Built outbound TCP connection 35700999 for faddr
209.195.178.164/21 gaddr 67.65.127.249/59371 laddr 10.231.3.76/2875

Then,

char *p, *q;

/* Determine if record is TCP or UDF data */

if (strstr(buf1, "TCP") != NULL)
strcpy(protocol, "TCP");
else
strcpy(protocol, "UDF");


/* Get the layer four port number (faddr, field #1) */

if ((p = q = strchr(buf1, '/')) != NULL)
{
p++; /* ex. p => '2' */
while (!isspace(*q)) /* skip til space */
q++;

strncpy(port, p, q-p);
port[q-p] = '\0';
}

/* Get the source IP address (laddr, field #3) */
/* Skip to the third ip/port pair */

if ((p = strchr(p, '/')) != NULL) /* alt: p=strrchr(buf1,'/') */
p = strchr(p+1, '/');

if (p)
{
q = p-1;
while (!isspace(*p))
p--;

strncpy(ipaddress, p+1, q-p);
ipaddress[q-p] = '\0';
}

Hope this helps.
 
K

Keith

Ben Pfaff said:
The only thing I haven't changed yet is the second buffer that was
mentioned. When I perform the debug it shows the line
"strcpy(buf1,&buf1[index1 + 1]);" as working the way I intended it to.

It might work on your system, but it's non-portable. Copying
overlapping buffer with strcpy() invokes undefined behavior, and
you shouldn't do it. Use memmove() instead, or copy into a
different buffer.

I didn't think about the portable issue. Thanks. I'll look into the
memmove() function to fix the overlapping buffer issue.
 
K

Keith

char *p, *q;
/* Determine if record is TCP or UDF data */

if (strstr(buf1, "TCP") != NULL)
strcpy(protocol, "TCP");
else
strcpy(protocol, "UDF");


/* Get the layer four port number (faddr, field #1) */

if ((p = q = strchr(buf1, '/')) != NULL)
{
p++; /* ex. p => '2' */
while (!isspace(*q)) /* skip til space */
q++;

strncpy(port, p, q-p);
port[q-p] = '\0';
}

/* Get the source IP address (laddr, field #3) */
/* Skip to the third ip/port pair */

if ((p = strchr(p, '/')) != NULL) /* alt: p=strrchr(buf1,'/') */
p = strchr(p+1, '/');

if (p)
{
q = p-1;
while (!isspace(*p))
p--;

strncpy(ipaddress, p+1, q-p);
ipaddress[q-p] = '\0';
}

Hope this helps.

Thanks, Robert, for the sample code. That is definitely a better
approach than what I was using with the buffer overflow problem. On
your question about the records always containing "/" I do understand
what you are getting at. My current code is assuming the characters
I'm searching for will always be there. Normally a syslog server will
have a variety of records pertaining to several network devices (not
just the firewalls I'm interested in) and I have prepared the source
file with Microsoft's "findstr" program to only include the outgoing
firewall records before I run the C program on it. In the long run I'm
planning on avoiding the "findstr" preparation stage and just have the
C program search through all available syslog records distinguishing
which ones are firewall records. That's some error detection code that
I hope to get on in a few days.

Also, I know some of the other replies to my original post mentioned I
should look into scripting tools instead of C since what I'm doing is
overkill. I understand that but there are other things I'm wanting to
do with the data once I strip it out of the syslog records. My next
step will be to avoid collecting duplicate records (syslog databases
have tons of the same records since network applications tend to go to
the same place over and over again throughout the week) and from the
reading I've done so far it looks like I'll need to create a structure
and an array so that before a new record is collected I can run back
through the ones collected so far and check for a duplicate. After
that my third step will be to sort the records to get all the TCP FTP,
HTTP, etc. traffic together and all the UDP DNS, SNMP, etc. traffic
together.

I didn't mention the goal of deleting duplicates and sorting in my
original post so as not to muddy the water with the original string
issue I was having. So, I'm thinking that C would be the best way to
handle extracting records, looking for duplicates, and sorting the new
records before writing to a file. I don't think the scripting tools
would provide all that functionality, but I may be wrong.

Thanks,
Ron
 
J

Jos De Laender

Keith wrote:

I didn't mention the goal of deleting duplicates and sorting in my
original post so as not to muddy the water with the original string
issue I was having. So, I'm thinking that C would be the best way to
handle extracting records, looking for duplicates, and sorting the new
records before writing to a file. I don't think the scripting tools
would provide all that functionality, but I may be wrong.

I guess you are wrong indeed. Looks like a task where you really should
use something like perl.

On the other hand , it is a good exercise in teaching yourself C.
I'm only not sure where you will finally gain by doing it in C.
 
B

Bruno Desthuilliers

Keith said:
Ben Pfaff said:
(e-mail address removed) (Keith) writes:

The only thing I haven't changed yet is the second buffer that was
mentioned. When I perform the debug it shows the line
"strcpy(buf1,&buf1[index1 + 1]);" as working the way I intended it to.

It might work on your system, but it's non-portable. Copying
overlapping buffer with strcpy() invokes undefined behavior, and
you shouldn't do it. Use memmove() instead, or copy into a
different buffer.


I didn't think about the portable issue. Thanks. I'll look into the
memmove() function to fix the overlapping buffer issue.

Well, beside of portability issue, there's one word (well, two in fact)
in this post that should *always* ring an alarm bell when it comes to C
programming : *undefined behaviour*.

This means that your program could do absolutely anything - and even
appear to work, which is somewhat the worst case since you may think it
really work, until... oops :(

Bruno
 
R

Robert B. Clark

I didn't mention the goal of deleting duplicates and sorting in my
original post so as not to muddy the water with the original string
issue I was having. So, I'm thinking that C would be the best way to
handle extracting records, looking for duplicates, and sorting the new
records before writing to a file. I don't think the scripting tools
would provide all that functionality, but I may be wrong.

There may be some scripting tools that could handle the job--Perl is one
that comes to mind.

However, your problem is certainly well within the capabilities of C, and
is an excellent learning project. You will learn about data structures,
the string library, pointers, arrays and stream I/O.

So go ahead. Your problem is a nail*. You have a hammer. Have fun. :)

*reference:
If all you have is a hammer, every problem you encounter tends to look like
a nail.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,772
Messages
2,569,593
Members
45,108
Latest member
AlbertEste
Top