thomasvangurp
Dear Fellow programmers,
I'm using Python scripts to organize some rather large datasets
describing DNA variation. Information is read, processed and written
to a file in sequential order, like this:
1+
1-
2+
2-
etc. The files that I created contain positional information
(nucleotide position) and some other info, like this:
file 1+:
--------------------------------------------
1 73 0 1 0 0
1 76 1 0 0 0
1 77 0 1 0 0
--------------------------------------------
file 1-
--------------------------------------------
1 74 0 0 6 0
1 78 0 0 4 0
1 89 0 0 0 2
--------------------------------------------
Now the trick is that I want this:
File 1+ AND File 1-
--------------------------------------------
1 73 0 1 0 0
1 74 0 0 6 0
1 76 1 0 0 0
1 77 0 1 0 0
1 78 0 0 4 0
1 89 0 0 0 2
-------------------------------------------
So the information should be sorted by position. Right now I've
written some very complicated scripts that read a number of lines from
files 1- and 1+ and then combine the output. The problem, of course, is
that the running position in file 1- can be lower than in file 1+,
resulting in an incorrect order. Since both files are too large to load
into a dictionary at once (both are 100 MB+), I need some sort of
alternative that can quickly sort everything without crashing my PC.
Your thoughts are appreciated.
Kind regards,
Thomas
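
For illustration, one possible way to combine the two files without loading either into memory is a streaming two-way merge. This is only a sketch and not taken from the scripts described above. It assumes that each input file is already sorted by position on its own (as the samples appear to be), that the columns are whitespace-separated, and that the first two columns are integers, with the second being the nucleotide position. The names merge_sorted_files, sort_key and non_blank_lines are made up for this example.

```python
import heapq


def sort_key(line):
    # Assumption: column 1 is a sequence/file identifier and column 2 is the
    # nucleotide position; both are compared as integers.
    fields = line.split()
    return int(fields[0]), int(fields[1])


def non_blank_lines(handle):
    # Yield lines one at a time, skipping blank ones and normalising the
    # line ending so a missing final newline cannot glue two records together.
    for line in handle:
        if line.strip():
            yield line.rstrip("\n") + "\n"


def merge_sorted_files(path_plus, path_minus, path_out):
    # heapq.merge is lazy: it holds only the current line of each input,
    # so memory use stays small regardless of file size.
    # Both inputs must already be sorted by sort_key for the result to be sorted.
    with open(path_plus) as plus, open(path_minus) as minus, \
            open(path_out, "w") as out:
        merged = heapq.merge(non_blank_lines(plus),
                             non_blank_lines(minus),
                             key=sort_key)
        out.writelines(merged)


# Hypothetical usage (file names are placeholders):
# merge_sorted_files("1plus.txt", "1minus.txt", "1_combined.txt")
```

The key argument of heapq.merge needs Python 3.5 or later. If the individual files are not sorted to begin with, this sketch will not produce a sorted result, and each file would have to be sorted first.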