Reduce duplication

Tcc

Hi All,



Assume there is some data in "a.txt":

e.g.

ABC

DEF<--------duplicate

GHI

DEF<--------duplicate

JKL

MNO

PQR

STU<--------duplicate

STU<--------duplicate

STU<--------duplicate

VWX

YZA

XYZ

XYZ

CAD

YZA

KLS

GAE

PQR

GAE

ABC

SAC

MTF



I would like to remove the duplicates so that the result is:

e.g.

ABC

DEF

GHI

JKL

MNO

PQR

STU

VWX

YZA

XYZ

CAD

KLS

GAE

SAC

MTF



Here is my code, but it doesn't seem to work:



#include <stdio.h>
#include <stdlib.h>
#include <string.h>



struct H {

char name[5];

int l;

};



H r[50];



int main() {

char a[5];

int count =0;

FILE *fin = fopen("a.txt", "r");



for(i=0; !feof(fin); i++) {


//Set as Null
strcpy(r.name, "N");
fscanf(f, "%s", a);
for(i=0; i<50 ; i++) {


// Check for null
if(strcmp(r.name, "N") == 0) {
strcpy(r.name, a);
i=50;
break;
} else {
for(i=0; i<50 ; i++) {
if(strcmp(r.name, a) == 0) {
count++;
i=50+i;
} else {
for(i=0; i<50 ; i++) {
if(strcmp(r[i+1].name, a) == 0) {
count++;
i=50+i;
} else {
i++;
strcpy(r[i+count].name, a);
}
}
}
}
}
}
}





How should I improve it so that it produces the data with no duplicates?

Please help.



THANKS ALL!
 
S.Tobias

Tcc said:
Assume there is some data in "a.txt":
e.g.
[snip]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct H {
char name[5];
int l;
};
struct H r[50];

int main() {
char a[5];
int count =0;
FILE *fin = fopen("a.txt", "r");
for(i=0; !feof(fin); i++) {
You didn't declare `i'. Did this compile?
feof() only reports end-of-file after a read attempt has failed, not before (see below).
//Set as Null
strcpy(r.name, "N");
fscanf(f, "%s", a);

Here you might check:
if (feof(fin))
break;
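A common alternative to calling feof() at all is to test the return value of fscanf() itself, since it reports how many items were actually converted. The following is only a minimal sketch: the helper name read_words is made up for illustration, and it assumes words of at most four characters so that they fit in char a[5].

```c
#include <stdio.h>

/* Hypothetical helper: reads whitespace-separated words from `in`
   and returns how many were read. The loop stops as soon as fscanf()
   fails to convert a word, so no stale data is processed at EOF. */
int read_words(FILE *in)
{
    char a[5];          /* room for 4 characters plus '\0' */
    int n = 0;

    /* %4s limits the input to 4 characters so `a` cannot overflow */
    while (fscanf(in, "%4s", a) == 1)
        n++;            /* a real program would process `a` here */

    return n;
}
```

With this loop shape there is no separate end-of-file test to get wrong: the read and the check are the same expression.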
for(i=0; i<50 ; i++) {
Here (and below) you use `i' again for indexing. This will overwrite
the `i' from the outer loop. I don't think you want this, you have
to use another variable.
// Check for null
if(strcmp(r.name, "N") == 0) {

Here you check if you've reached current element in the output table.
Will work until a.txt actually contains the word "N".
strcpy(r.name, a);
i=50;
break;

If you break from the current loop, you don't have to set the index
to the end value.

The rest is just plain nonsense.
} else {
for(i=0; i<50 ; i++) {
if(strcmp(r.name, a) == 0) {
count++;
i=50+i;
} else {
for(i=0; i<50 ; i++) {
if(strcmp(r[i+1].name, a) == 0) {
count++;
i=50+i;
} else {
i++;
strcpy(r[i+count].name, a);
}
}
}
}
}
}
}
How should I improve it so that it produces the data with no duplicates?

Rethink, redesign, start all over. Best on paper first.
What you need is (pseudo-code):

#define MAX 50
char words[MAX][5]
count = 0
while read(word) && count < MAX
    /* check if we have it */
    found = false
    for i = 0; i < count; ++i
        if word == words[i]
            found = true
            break   // not required, but makes the program faster
    /* if word is new, add it */
    if !found
        words[count++] = word
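The pseudo-code above could be turned into real C roughly as follows. This is a sketch under a few assumptions, not a definitive version: the function name read_unique and the WORD_LEN macro are invented here, words are assumed to be at most four characters long (longer ones would be split by the %4s limit), and the capacity check is done before reading so that no word is consumed once the table is full.

```c
#include <stdio.h>
#include <string.h>

#define MAX 50
#define WORD_LEN 5      /* 4 characters plus '\0' */

/* Reads words from `in` and stores each distinct word once, in order
   of first appearance. Returns the number of distinct words stored. */
int read_unique(FILE *in, char words[MAX][WORD_LEN])
{
    char word[WORD_LEN];
    int count = 0;

    while (count < MAX && fscanf(in, "%4s", word) == 1) {
        int found = 0;
        int i;

        /* check if we have it */
        for (i = 0; i < count; ++i) {
            if (strcmp(word, words[i]) == 0) {
                found = 1;
                break;
            }
        }
        /* if word is new, add it */
        if (!found)
            strcpy(words[count++], word);
    }
    return count;
}
```

In main() one would open "a.txt", call read_unique(fin, words), and then print words[0] through words[n-1], where n is the returned count. Strings are compared with strcmp() and copied with strcpy(), because == and = on char arrays do not do what the pseudo-code suggests.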
 
Ben Pfaff

Tcc said:
I would like to remove the duplicates so that the result is:

If you're trying to learn how to write C code, be my guest.
However, you should be aware that there are lots of existing
tools to do what you want already. For example, under Unix the
command
sort < x | uniq > y
is probably what you want, although it has the extra side effect
of sorting the output.
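For comparison, the same sort-then-drop-adjacent-duplicates idea can be sketched in C with qsort(). The names sort_uniq, cmp_words and WORD_LEN are invented for this illustration, and the words are assumed to be already loaded into a fixed-size table. Like the pipeline, it does not keep the original order.

```c
#include <stdlib.h>
#include <string.h>

#define WORD_LEN 5      /* 4 characters plus '\0' */

/* qsort() comparison callback for an array of char[WORD_LEN] */
static int cmp_words(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

/* Sorts words[0..n-1], then removes adjacent duplicates in place.
   Returns the number of distinct words left at the front. */
size_t sort_uniq(char words[][WORD_LEN], size_t n)
{
    size_t i, kept = 0;

    qsort(words, n, sizeof words[0], cmp_words);
    for (i = 0; i < n; i++) {
        /* keep a word only if it differs from the last one kept */
        if (kept == 0 || strcmp(words[kept - 1], words[i]) != 0) {
            if (kept != i)
                strcpy(words[kept], words[i]);
            kept++;
        }
    }
    return kept;
}
```

Once the table is sorted, equal words sit next to each other, so a single pass that compares each word with the last one kept is enough.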
 
g r

Tcc said:
Hi All,

I would like to remove the duplicates so that the result is:
It is better to do this with Perl, AWK, a substitution from within the vi editor,
or the UNIX/Linux command line (sort first), etc.

In C, one way (a little bit clumsy, but it works) is this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


typedef struct lNode
{
    char val[5];
    struct lNode *next;
} lNode;

/* Append val to the list unless it is already present.
   Returns the new node, or NULL for a duplicate or a failed malloc. */
lNode *append (lNode *p, const char *val)
{
    lNode *last = NULL;
    lNode *node;

    while (p)
    {
        last = p;
        if (!strcmp(p->val, val))
            return NULL;
        p = p->next;
    }
    node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    strcpy(node->val, val);
    node->next = NULL;
    if (last)
        last->next = node;
    return node;
}

/* Print one word per line */
void list_print(lNode *list)
{
    lNode *node;

    for (node = list; node != NULL; node = node->next)
        printf("%s\n", node->val);
}


int main (void)
{
    lNode *head = NULL;
    lNode *node, *p;

    FILE *fin = fopen("a.txt", "r");
    FILE *fout;
    char line[5];
    int i;

    if (fin == NULL)
    {
        perror("a.txt");
        return 1;
    }
    // Make a linked list with words (at most 50 lines are read)
    for (i = 0; i < 50 && fgets(line, sizeof line, fin) != NULL; i++)
    {
        line[strcspn(line, "\n")] = '\0';   // drop the trailing newline
        if (line[0] == '\0')
            continue;                       // skip blank lines
        node = append (head, line);
        if (head == NULL)
            head = node;
    }
    fclose(fin);

    // Output the result
    printf("The list:\n");
    list_print(head);

    // Put it back to txt file
    fout = fopen("a.txt", "w");
    if (fout == NULL)
    {
        perror("a.txt");
        return 1;
    }
    for (p = head; p != NULL; p = p->next)
    {
        fputs(p->val, fout);
        fputc('\n', fout);
    }
    fclose(fout);

    // Free the list
    while (head != NULL)
    {
        p = head->next;
        free(head);
        head = p;
    }
    return 0;
}
 
