Help: Process data

Amy Lee

Hello,

I used Perl to parse some output that I now need to process further. Here is the data.

1306 0
7539 0
7595 0
7626 0
7671 0
7705 0
7835 0
7903 0
7906 1
7941 0
7952 0
7956 0
7977 0
7988 0
8030 1
8091 0
8122 0
8180 0
8259 1
8413 1
8546 0
8653 0
8689 0
8709 0
8761 1
8766 1
8793 0
8825 0
8845 0
9046 0
9080 0
9104 0
9148 0
9183 0
9220 0
9337 0
9361 0
9363 0
9500 0
9518 0
9805 0
10644 1
10775 1
11121 0
11241 1
11268 0
11600 1
11609 0
11628 0
11642 0
11655 1
11666 0
11825 0
12088 0
12154 0
12165 0
12244 0
12261 0
12281 0
12308 1
12327 0
12404 0
12477 0
12744 1
12768 0
12878 0
12936 0
12954 0
12995 0
13061 0
13096 0
13192 0
13257 0
13274 0
13349 1
13388 0
13437 0
13453 0
13587 1
13628 0
13655 0
13927 0
13938 0
13968 0
13998 0
14008 0
14078 0
14114 0
14117 0
14142 0
14156 0
14330 1
14342 0
14474 0
14562 0
14601 0
14652 0
14661 0
14704 0
14838 0
14888 1
15043 0
15046 0
15138 1
15155 0
15488 0
15628 0
15652 0
15684 1
15837 0
15843 0
16006 0
16239 0
16257 0
16338 0
16466 0
16504 0
16647 0
16838 1
16885 0
16908 1
16939 1
17112 0
17134 0
17159 0
17171 1
17273 0
17380 0
17390 0
17401 0
17591 1
17886 1
17943 0
18014 0
18128 0
18156 0
18208 0
18360 1
18505 0
18705 1
18742 0
18765 0
19127 0
19379 0
19412 0
19652 0
19843 1
20042 0
20152 0
20169 0
20185 1
20350 0
20390 0
20396 0
20462 1
27431 1
27620 1
57044 0
229054 1
272925 1
292029 1
331301 1
331383 0
350184 1
351519 1
351737 1
352558 1
354488 1
356501 1
357387 1
359564 1
360429 1
360731 1
361239 1
363227 1
364226 1
364438 1

What I want to do is count how many consecutive 0s and 1s appear in the
second column. I have set a minimum run length, say 5. Any run of fewer
than 5 consecutive 0s or 1s should be omitted, and any run of 5 or more
should be saved as the result.

Could you show me some ideas to do this?

Thank you very much.

Best Regards,

Amy Lee
 
Tad J McClellan

Amy Lee said:
What I want to do is count how many consecutive 0s and 1s appear in the
second column. I have set a minimum run length, say 5. Any run of fewer
than 5 consecutive 0s or 1s should be omitted, and any run of 5 or more
should be saved as the result.

Could you show me some ideas to do this?


my @buffer;
my $value = '666';
while ( <DATA> ) {
    my(undef, $this) = split;
    if ( $this == $value ) {
        push @buffer, $_;
    }
    else {
        print @buffer, "\n" if @buffer >= 5;
        $value = $this;
        @buffer = ();
    }
}
 
A. Sinan Unur

What I want to do is count how many consecutive 0s and 1s appear in the
second column. I have set a minimum run length, say 5. Any run of fewer
than 5 consecutive 0s or 1s should be omitted, and any run of 5 or more
should be saved as the result.

Could you show me some ideas to do this?

Use proper statistics software to analyze runs. Other than that, doing
this in Perl is not hard but you should actually try to do it instead of
asking others to do it for you. You have been here long enough to know how
to ask a question.

Sinan
--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/
 
Amy Lee

Tad J McClellan said:
my @buffer;
my $value = '666';
while ( <DATA> ) {
    my(undef, $this) = split;
    if ( $this == $value ) {
        push @buffer, $_;
    }
    else {
        print @buffer, "\n" if @buffer >= 5;
        $value = $this;
        @buffer = ();
    }
}

Dear sir,

Thank you very much. However, I have found that my data also contains lines like these:

18163447 0
18163544 1
18166485 0
18166503 0
18166850 0
18166940 0
18167197 0
18167775 +
18168287 0
18168474 0
18168621 1
18168869 1
18170273 1
18170629 1
18171082 1
18171510 +
18171767 0
18173494 1
18173873 1
... ...
This means that when a "+" is read, counting should stop and then start
again from the next line. For example:

18166485 0 #1
18166503 0 #2
18166850 0 #3
18166940 0 #4
18167197 0 #5, it's okay
18167775 + #no counting
18168287 0 #1
18168474 0 #2

So could you tell me how to do this?

Thank you very much.

Amy
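
One possible way to handle the "+" lines is sketched below. It assumes that a "+" in the second column ends the current run, is never counted itself, and that counting restarts on the following line. The sub name runs_with_breaks and the use of eq for the comparison are choices made for this sketch only; they do not come from any reply in this thread.

```perl
use strict;
use warnings;

# Assumption: a "+" flag ends the current run and is itself never counted;
# counting restarts on the line that follows it.
sub runs_with_breaks {
    my ($min, @lines) = @_;
    my (@runs, @buffer);
    my $value;                      # flag of the run being collected

    # Keep the collected run if it is long enough, then start afresh.
    my $flush = sub {
        push @runs, [@buffer] if @buffer >= $min;
        @buffer = ();
    };

    for my $line (@lines) {
        my (undef, $this) = split ' ', $line;
        next unless defined $this;  # skip blank or malformed lines

        if ($this eq '+') {         # separator: close the run, count nothing
            $flush->();
            undef $value;
            next;
        }
        if (defined $value and $this eq $value) {
            push @buffer, $line;    # same flag: extend the run
        }
        else {
            $flush->();             # flag changed: close the previous run
            $value  = $this;
            @buffer = ($line);      # the current line starts the new run
        }
    }
    $flush->();                     # do not lose a run that ends the input
    return @runs;
}

# Sample taken from the lines quoted in this post.
my @sample = map { "$_\n" } split /\n/, <<'END';
18163447 0
18163544 1
18166485 0
18166503 0
18166850 0
18166940 0
18167197 0
18167775 +
18168287 0
18168474 0
18168621 1
18168869 1
18170273 1
18170629 1
18171082 1
18171510 +
18171767 0
18173494 1
18173873 1
END

# Print each qualifying run, followed by an empty line.
print @$_, "\n" for runs_with_breaks(5, @sample);
```

With a minimum of 5, the sample should yield the run of five 0s that ends just before the first "+" and the run of five 1s that ends just before the second "+".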
 
Amy Lee

I have a small question: could you tell me why you use this block?

my(undef, $this) = split;
if ( $this == $value )
{ push @buffer, $_ }

I cannot understand it. Thank you.

Amy
 
Tim Greer

Amy said:
I have a small question: could you tell me why you use this block?

my(undef, $this) = split;

The above splits $_ (on whitespace, by default) and assigns the second
field to $this, which is either 0 or 1 in your example. Since the
conditional doesn't care what the first field is, the undef placeholder
simply discards it instead of assigning it to a variable.

If the value of $this is numerically equal to (==) $value (whatever you
set $value to), then it will push (add) $_ (the whole current line) onto
the @buffer array.
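
To see the list assignment in isolation, here is a minimal sketch. The sub name second_field is hypothetical, and split is given an explicit ' ' pattern and string, which behaves like the bare split on $_ used above.

```perl
use strict;
use warnings;

# Return the second whitespace-separated field of a line.
sub second_field {
    my ($line) = @_;
    # undef in the list is a placeholder: the first field is thrown away
    # and only the second one is stored in $flag.
    my (undef, $flag) = split ' ', $line;
    return $flag;
}

# Two lines in the same "position flag" layout as the posted data.
for my $line ("7906 1\n", "7941 0\n") {
    print "flag=", second_field($line), "\n";
}
```

Because split ' ' ignores leading whitespace and drops the trailing newline, the flag comes back without any extra characters attached.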
 
Tad J McClellan

Tad J McClellan said:
my @buffer;
my $value = '666';
while ( <DATA> ) {
    my(undef, $this) = split;
    if ( $this == $value ) {
        push @buffer, $_;
    }
    else {
        print @buffer, "\n" if @buffer >= 5;
        $value = $this;
        @buffer = ();


Oops! My code has an off-by-one error. The line above should instead be:

@buffer = ($_);
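
With that correction applied, a complete version might look like the sketch below. Three things in it are assumptions that are not in the code above: the logic is wrapped in a sub with the made-up name long_runs, the comparison uses eq instead of == so that a non-numeric flag would not trigger a warning, and there is an extra check after the loop so that a qualifying run at the very end of the input is not lost.

```perl
use strict;
use warnings;

# Return every run of at least $min consecutive lines that share the same
# second-column value. Each run is an array ref of the original lines.
sub long_runs {
    my ($min, @lines) = @_;
    my (@runs, @buffer);
    my $value;                          # flag of the run being collected

    for my $line (@lines) {
        my (undef, $this) = split ' ', $line;
        next unless defined $this;      # skip blank or malformed lines

        if (defined $value and $this eq $value) {
            push @buffer, $line;        # same flag: extend the run
        }
        else {
            push @runs, [@buffer] if @buffer >= $min;
            $value  = $this;
            @buffer = ($line);          # the current line starts the new run
        }
    }
    push @runs, [@buffer] if @buffer >= $min;   # flush the last run
    return @runs;
}

# A few lines taken from the data in the original post.
my @sample = map { "$_\n" } split /\n/, <<'END';
17591 1
17886 1
17943 0
18014 0
18128 0
18156 0
18208 0
18360 1
18505 0
END

# Print each qualifying run, followed by an empty line.
print @$_, "\n" for long_runs(5, @sample);
```

For this sample and a minimum of 5, only the five lines from 17943 to 18208 qualify.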
 
Todd

Amy said:
Hello,

What I want to do is count how many consecutive 0s and 1s appear in the
second column. I have set a minimum run length, say 5. Any run of fewer
than 5 consecutive 0s or 1s should be omitted, and any run of 5 or more
should be saved as the result.

Could you show me some ideas to do this?

I'm interested in this question. Below is a simple solution with the
threshold set to 3:
cat b.txt
18705 1
18742 0
18765 0
19127 0
19379 0
19412 0
19843 1
20042 0
229054 1
272925 1
292029 1
331301 1
331383 0
350184 1

# step 1: split it into paragraphs
cat b.txt | perl -lp0e 's/(?<=0\n)(?=.*1$)|(?<=1\n)(?=.*0$)/\n/mg'
18705 1

18742 0
18765 0
19127 0
19379 0
19412 0

19843 1

20042 0

229054 1
272925 1
292029 1
331301 1

331383 0

350184 1

# step 2: grep the paragraphs with > 3 lines
cat b.txt | perl -lp0e 's/(?<=0\n)(?=.*1$)|(?<=1\n)(?=.*0$)/\n/mg' | perl -ln00e 'tr{\n}{\n}+1>3 and print'
18742 0
18765 0
19127 0
19379 0
19412 0
229054 1
272925 1
292029 1
331301 1

-Todd
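
The same two steps can also be sketched as a single script, which may be easier to step through than the pipeline. The sub name long_paragraphs and its $min_lines parameter are made up for this sketch; a $min_lines of 4 corresponds to the "> 3 lines" test above. The substitution is the one from step 1, so it likewise assumes the flag is the last character on each line and is only ever 0 or 1.

```perl
use strict;
use warnings;

# Return the blocks of consecutive equal-flag lines that have at least
# $min_lines lines. Each block is one string with embedded newlines.
sub long_paragraphs {
    my ($min_lines, $text) = @_;

    # Step 1: insert an empty line wherever the flag flips (0->1 or 1->0).
    $text =~ s/(?<=0\n)(?=.*1$)|(?<=1\n)(?=.*0$)/\n/mg;

    # Step 2: split on the empty lines and keep the long paragraphs.
    return grep { my @rows = split /\n/; @rows >= $min_lines }
           split /\n{2,}/, $text;
}

# The contents of b.txt shown above.
my $b_txt = <<'END';
18705 1
18742 0
18765 0
19127 0
19379 0
19412 0
19843 1
20042 0
229054 1
272925 1
292029 1
331301 1
331383 0
350184 1
END

print "$_\n\n" for long_paragraphs(4, $b_txt);
```

Unlike the pipeline, this prints an empty line between the surviving blocks so that the individual runs stay visible.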
 
