Amy Lee
Hello,
I wrote a script to check the location. Here's my code.
#!/usr/bin/perl -w
use warnings;
use strict;
my $location = $ARGV[0];
open my $LOCATION, '<', "$location";
my @stack;
while (<$LOCATION>)
{
    chomp;
    my $raw_seq_id = (split /\s+/)[0];
    my $mature_start = (split /\s+/)[1];
    my $mature_end = (split /\s+/)[2];
    my $mature_id = (split /_/, $raw_seq_id)[3];
    my $i;
    my $num = 0;
    open my $CT, '<', "$raw_seq_id"."\.ct";
    while (<$CT>)
    {
        if ((?dG? ... /dG/) && (!/dG/))
        {
            push @stack, (split /\s+/)[4];
        }
    }
    for ($i = $mature_start - 1;$i < $mature_end;$i++)
    {
        if ($stack[$i] == 0)
        {
            $num = $num + 1;
        }
        else
        {
            next;
        }
    }
    if ($num <= 2)
    {
        print "$raw_seq_id $num\n";
    }
    else
    {
        next;
    }
}
The script reads a location file in which the first column is the
sequence tag, the second column is the start position, and the third
column is the end position.
Here's the sample data.
Gm17_37759439_37759542_ath-MIR156c 12 31
.... ... ... ...
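To make the input format concrete, here is a minimal sketch of how one line of the location file could be split into its three fields. The helper name parse_location_line is hypothetical and not part of my script. It uses split ' ' rather than split /\s+/, because the single-space form skips leading whitespace, whereas /\s+/ yields an empty first field when a line starts with blanks.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): split one line of the
# location file into sequence tag, start position, and end position.
sub parse_location_line {
    my ($line) = @_;
    chomp $line;
    # split ' ' ignores leading whitespace and splits on runs of whitespace.
    my ($seq_id, $start, $end) = split ' ', $line;
    return ($seq_id, $start, $end);
}

my ($id, $start, $end) =
    parse_location_line("Gm17_37759439_37759542_ath-MIR156c 12 31\n");

# The CT file name is the sequence tag plus a ".ct" extension.
print "$id.ct $start $end\n";
# prints: Gm17_37759439_37759542_ath-MIR156c.ct 12 31
```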
Then, for the first column of each line, there is a file with the same
name plus a .ct extension, for example
Gm17_37759439_37759542_ath-MIR156c.ct.
Here's the CT format data.
1 U 0 2 0 1 0 0
2 G 1 3 0 2 0 0
3 G 2 4 0 3 0 4
4 A 3 5 102 4 3 5
5 C 4 6 101 5 4 6
6 A 5 7 100 6 5 0
7 G 6 8 97 7 0 8
8 A 7 9 96 8 7 9
9 A 8 10 95 9 8 10
10 A 9 11 94 10 9 0
11 U 10 12 0 11 0 0
12 U 11 13 0 12 0 0
13 G 12 14 91 13 0 14
14 A 13 15 90 14 13 15
15 C 14 16 89 15 14 16
16 A 15 17 88 16 15 17
17 G 16 18 0 17 16 18
18 A 17 19 85 18 17 19
19 A 18 20 0 19 18 20
20 G 19 21 83 20 19 21
21 A 20 22 82 21 20 22
22 G 21 23 0 22 21 23
23 A 22 24 79 23 22 24
24 G 23 25 78 24 23 25
25 U 24 26 0 25 24 26
26 G 25 27 76 26 25 27
27 A 26 28 0 27 26 28
28 G 27 29 74 28 27 29
29 C 28 30 73 29 28 30
30 A 29 31 72 30 29 31
31 C 30 32 0 31 30 32
32 A 31 33 0 32 31 0
33 A 32 34 0 33 0 0
34 A 33 35 68 34 0 35
35 G 34 36 67 35 34 36
36 A 35 37 66 36 35 37
37 G 36 38 65 37 36 38
38 G 37 39 64 38 37 39
39 C 38 40 63 39 38 40
40 A 39 41 62 40 39 0
41 C 40 42 0 41 0 0
42 U 41 43 0 42 0 0
43 U 42 44 60 43 0 44
44 G 43 45 59 44 43 45
45 A 44 46 58 45 44 46
46 U 45 47 57 46 45 47
47 A 46 48 56 47 46 48
48 U 47 49 55 48 47 49
49 A 48 50 54 49 48 0
50 A 49 51 0 50 0 0
51 A 50 52 0 51 0 0
52 U 51 53 0 52 0 0
53 C 52 54 0 53 0 0
54 U 53 55 49 54 0 55
55 A 54 56 48 55 54 56
56 U 55 57 47 56 55 57
57 A 56 58 46 57 56 58
58 U 57 59 45 58 57 59
59 C 58 60 44 59 58 60
60 A 59 61 43 60 59 0
61 C 60 62 0 61 0 0
62 U 61 63 40 62 0 63
63 G 62 64 39 63 62 64
In the CT file, the fifth column is the one I want to process. For each
line of the sample data, the script should take the start and end
positions, look at the same positions in the fifth column, and count the
zeros. If that range (start to end) contains fewer than 2 zeros, it
should print the sequence tag and the number of zeros.
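As a sketch of the check described above (not the script I posted), the zero count could be written as a small function. The name count_unpaired is hypothetical. The sketch assumes that every data line starts with the 1-based position followed by a single-letter base, that the fifth field is numeric, and that any header line can be told apart because its second field is not a single letter. The comparison against the threshold is left to the caller.

```perl
use strict;
use warnings;

# Hypothetical helper: count the positions in [$start, $end] (1-based,
# inclusive) whose fifth CT column is 0.
sub count_unpaired {
    my ($ct_lines, $start, $end) = @_;
    my $zeros = 0;
    for my $line (@$ct_lines) {
        my @f = split ' ', $line;
        # Assumption: data lines look like "<position> <base> ..." with at
        # least five fields; anything else (e.g. a header) is skipped.
        next unless @f >= 5 && $f[0] =~ /^\d+$/ && $f[1] =~ /^[A-Za-z]$/;
        next if $f[0] < $start || $f[0] > $end;
        $zeros++ if $f[4] == 0;
    }
    return $zeros;
}

# A few lines taken from the CT data above (positions 11 to 17).
my @ct = (
    "11 U 10 12 0 11 0 0",
    "12 U 11 13 0 12 0 0",
    "13 G 12 14 91 13 0 14",
    "14 A 13 15 90 14 13 15",
    "15 C 14 16 89 15 14 16",
    "16 A 15 17 88 16 15 17",
    "17 G 16 18 0 17 16 18",
);

print count_unpaired(\@ct, 12, 17), "\n";   # prints 2 (positions 12 and 17)
```

Because the position is read from the first column of each line, the count does not depend on an array shared between files.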
However, when I run my script, it prints only the sequence tag and the
number 2. My code should print nothing, because this data does not
satisfy the condition I wrote.
Could anyone give me some hints? Thank you very much.
Best Regards,
Amy Lee