delete based on duplicate field

Shiraz

(i have a file with the format below:
AA,BB,CC
AA,BB,FF
AA,BB,RF
AA,QQ,VV


and after processing i need to have in it


AA,QQ,WW


throw out all the duplicates and leave none in the system


below is my code: please help!!!!!!!!!!


open (FH, "<rates/temp2.csv") or die ("No input file\n");
($memLine1, $memLine2, $memLine3, $lineCount)=("","","",0);
$lineCount = 2;
while ($record11 = <FH>)
{
$lineCount=$lineCount+1;
if ($lineCount == "1") {$memLine1 = $record11; print
("xxxxxx1xxxxxx");}
if ($lineCount == "2") {$memLine2 = $record11; print
("xxxxxx2xxxxxx");}
if ($lineCount == "3") {$memLine3 = $record11; print
("xxxxxx3xxxxxx");}
if ($lineCount == 3)
{
if ( ($memLine2 != $memLine1) || ($memLine2 != $memLine3))
{
print ("Debug1 $memLine1\n Debug2 $memLin2\n Debug3 $memLine3
\n");
# print ("$memLine2");
$memLine1 = $memLine2;
$memLine2 = $memLine3;
}
$lineCount = 2;
}


}
 
Mark Clements

(i have a file with the format below:
AA,BB,CC
AA,BB,FF
AA,BB,RF
AA,QQ,VV


and after processing i need to have in it


AA,QQ,WW


throw out all the duplicates and leave none in the system
<snip code>
Your requirements are very badly defined. What constitutes a duplicate? As
far as I can tell, none of the above lines are duplicated.

You probably want to look at putting the data into a hash before writing it
out again, though this may not be practical for very large files.

perldoc perldata
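
A minimal sketch of the hash approach, assuming that a "duplicate" means a line whose second field also appears on another line, and that every such line is dropped (including its first occurrence). That reading is only a guess from the sample data, and the names `@lines`, `%count` and `@unique` are illustrative, not taken from the original post.

```perl
use strict;
use warnings;

# Sample records from the original post (assumed input).
my @lines = ('AA,BB,CC', 'AA,BB,FF', 'AA,BB,RF', 'AA,QQ,VV');

# First pass: count how often each second field occurs.
my %count;
for my $line (@lines) {
    my (undef, $key) = split /,/, $line;
    $count{$key}++;
}

# Second pass: keep only lines whose second field occurred exactly once.
my @unique = grep {
    my (undef, $key) = split /,/, $_;
    $count{$key} == 1;
} @lines;

print "$_\n" for @unique;
```

For a very large file, the same two passes could read the file twice instead of holding every line in memory, so that only the counts are stored.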

Mark
 
Paul Lalli

Shiraz said:
(i have a file with the format below:
AA,BB,CC
AA,BB,FF
AA,BB,RF
AA,QQ,VV

and after processing i need to have in it
AA,QQ,WW

Where did 'WW' come from? That line isn't in your sample data.
throw out all the duplicates and leave none in the system

None of the lines in your sample data appear to be 'duplicates' to me.
What definition of 'duplicate' are you using? Do you mean any lines
whose second field is duplicated by other lines?
below is my code: please help!!!!!!!!!!

Talking like you're using IRC or AIM when posting to Usenet rarely
encourages people to help you.
open (FH, "<rates/temp2.csv") or die ("No input file\n");
($memLine1, $memLine2, $memLine3, $lineCount)=("","","",0);

You do not appear to be using 'strict'. Please read the posting
guidelines for this newsgroup (posted twice per week). Moreover,
explicitly initializing variables to empty values is rarely necessary in Perl.
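
As an illustration of what 'strict' buys you, here is a small sketch showing it rejecting the misspelled `$memLin2` that appears in the debug print of the posted code. The snippet is compiled through a string `eval` only so that the failure can be displayed; `$snippet`, `$ok` and `$error` are names made up for this example.

```perl
use strict;
use warnings;

# The snippet is compiled via string eval so the failure can be shown
# without aborting this script. $memLin2 is the misspelling from the post.
my $snippet = <<'END';
use strict;
my $memLine2 = "AA,BB,FF";
print "Debug2 $memLin2\n";
1;
END

my $ok    = eval $snippet;   # undef, because compilation fails
my $error = $@;              # holds the compile-time message
print "strict refused to compile it: $error" unless $ok;
```

Without 'strict', the same print would silently interpolate an empty value for the undeclared variable.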
$lineCount = 2;

What is this variable supposed to represent? Are you saying you've
already got two lines before processing any data?
while ($record11 = <FH>)
{
$lineCount=$lineCount+1;
if ($lineCount == "1")

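You are comparing against a string using the numeric comparison operator.
It happens to work here because "1" converts cleanly to a number (so 'use
warnings' stays quiet on this particular line), but it obscures your intent.
Again, please read the posting guidelines for this group. Change this to either: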
if ($lineCount == 1)
or
if ($lineCount eq "1")
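
A small sketch of the difference between the two operators. With a counter that holds 1, both forms agree; they only diverge once the text differs while the numeric value does not. The variable names other than `$lineCount` are invented for the example.

```perl
use strict;
use warnings;

my $lineCount = 1;

# Numeric comparison: both sides are treated as numbers.
my $numeric_equal = ($lineCount == 1) ? 1 : 0;

# String comparison: both sides are treated as strings.
my $string_equal = ($lineCount eq "1") ? 1 : 0;

# The two operators disagree once the text differs but the value does not.
my $num_says = ("1.0" == 1)   ? 1 : 0;   # same number
my $str_says = ("1.0" eq "1") ? 1 : 0;   # different strings

print "== : $numeric_equal $num_says\n";
print "eq : $string_equal $str_says\n";
```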
{$memLine1 = $record11; print
("xxxxxx1xxxxxx");}
if ($lineCount == "2") {$memLine2 = $record11; print
("xxxxxx2xxxxxx");}
if ($lineCount == "3") {$memLine3 = $record11; print
("xxxxxx3xxxxxx");}

I'm guessing all these xxx lines are debugging statements?
if ($lineCount == 3)

So this will happen the first time through your loop, since you started
$lineCount off at 2, and immediately incremented it. Is that what you
wanted?
{
if ( ($memLine2 != $memLine1) || ($memLine2 != $memLine3))

You are again comparing strings in numeric context. ne and != are very
different operators. Please read `perldoc perlop`
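
To make the point concrete, a sketch using two of the sample lines. Under `!=` both strings are converted to the number 0, so the test never reports a difference; `ne` compares the text. The warning handler and the `$numeric_differs` / `$string_differs` names are assumptions added only to show the effect.

```perl
use strict;
use warnings;

my ($memLine1, $memLine2) = ("AA,BB,CC\n", "AA,BB,FF\n");

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Numeric test: both strings become 0, so they look "equal".
my $numeric_differs = ($memLine2 != $memLine1) ? 1 : 0;

# String test: the lines are compared character by character.
my $string_differs = ($memLine2 ne $memLine1) ? 1 : 0;

print "!= says different: $numeric_differs\n";
print "ne says different: $string_differs\n";
print "warnings raised:   ", scalar(@warnings), "\n";
```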
{
print ("Debug1 $memLine1\n Debug2 $memLin2\n Debug3 $memLine3
\n");
# print ("$memLine2");
$memLine1 = $memLine2;
$memLine2 = $memLine3;
}
$lineCount = 2;

Really not understanding this logic...

You need to do three things to help us help you:
1) Better define your goal, with more accurate sample data
2) Explain your algorithm and/or document your sample code, because
it's rather unintelligible as is.
3) Read the posting guidelines for this group, and follow all the
advice contained therein (Pay special attention to the bits about
posting a *short but complete* script which includes your sample data
in the __DATA__ section)
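
For reference, a sketch of what a short but complete script with embedded sample data might look like. It only parses the records and reports on them; the actual filtering rule is left out because the goal has not been defined yet. The `@rows` name and the "three comma-separated fields" check are assumptions for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the sample records embedded after __DATA__ instead of an external file.
my @rows;
while (my $line = <DATA>) {
    chomp $line;
    next unless $line =~ /^\w+,\w+,\w+$/;   # skip blank or malformed lines
    push @rows, [ split /,/, $line ];       # one array ref of fields per record
}

printf "%d records, second field of the last one: %s\n",
    scalar(@rows), $rows[-1][1];

__DATA__
AA,BB,CC
AA,BB,FF
AA,BB,RF
AA,QQ,VV
```

Anyone can paste such a script into a file and run it without needing a copy of rates/temp2.csv.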

Paul Lalli
 
Jürgen Exner

Shiraz wrote:

Your example
(i have a file with the format below:
AA,BB,CC
AA,BB,FF
AA,BB,RF
AA,QQ,VV
and after processing i need to have in it
AA,QQ,WW

doesn't match your description
throw out all the duplicates and leave none in the system

AA seems to appear 4 times; why isn't it considered a duplicate?
CC, FF, RF, and VV appear only once; why are they thrown out as
duplicates?
Where does WW come from? It doesn't seem to be part of your original data.

If you are talking about duplicated lines rather than duplicated items, then
your example still doesn't make sense, as none of the lines is duplicated.
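
Both readings can be checked against the sample data with a quick sketch like the following, which counts whole lines and individual items separately. The hash and array names are made up for the illustration.

```perl
use strict;
use warnings;

my @lines = ('AA,BB,CC', 'AA,BB,FF', 'AA,BB,RF', 'AA,QQ,VV');

my (%line_count, %item_count);
for my $line (@lines) {
    $line_count{$line}++;                     # whole-line occurrences
    $item_count{$_}++ for split /,/, $line;   # per-item occurrences
}

# Anything seen more than once counts as duplicated under that reading.
my @dup_lines = grep { $line_count{$_} > 1 } sort keys %line_count;
my @dup_items = grep { $item_count{$_} > 1 } sort keys %item_count;

print "duplicated lines: ", (@dup_lines ? "@dup_lines" : "none"), "\n";
print "duplicated items: ",
    join(", ", map { "$_ x$item_count{$_}" } @dup_items), "\n";
```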

Bottom line: what is your definition of "duplicated"?

Anyway, "I don't know what your original problem was but I suggest to use a
hash" (quote stolen from someone).

jue
 
