Re-organize original text data, then write it to files


Junhui Liao

Dear all,

I recently had to do the following job: re-organize some original text
data and write it out to files.
The original data looks like this (TSV format):

First line: time_1.1 signal_1.1 time_2.1 signal_2.1 ...
time_4096.1 signal_4096.1 (4096 pairs in total).
Second line: time_1.2 signal_1.2 time_2.2 signal_2.2 ...
time_4096.2 signal_4096.2 (4096 pairs in total).
.......
Last line (2048 lines in total): time_1.2048 signal_1.2048 time_2.2048
signal_2.2048 ... time_4096.2048 signal_4096.2048 (4096 pairs in total).

What I need to do is:

Step 0: subtract time_n.1 from every time_n.*. That is to say,
time_1.1 is subtracted from time_1.1, time_1.2, ... time_1.2048;
time_2.1 is subtracted from time_2.1, time_2.2, ... time_2.2048;
....
time_4096.1 is subtracted from time_4096.1, time_4096.2, ...
time_4096.2048.
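Step 0 might be sketched in Ruby roughly as follows. This is only an illustration under stated assumptions: the helper name `subtract_first_row_times` is hypothetical, the data is assumed to be already parsed into arrays of Floats laid out as [time_1, signal_1, time_2, signal_2, ...], and the tiny three-line sample stands in for the real 2048 lines.

```ruby
# Hypothetical helper (not from the original post).
# rows: array of lines, each a flat array
#       [time_1, signal_1, time_2, signal_2, ...] of Floats.
def subtract_first_row_times(rows)
  # Times sit at even indices (0, 2, 4, ...); the first row holds time_n.1.
  base = rows.first
  rows.map do |row|
    row.each_with_index.map do |value, i|
      # Shift times by the first row's time; leave signals untouched.
      i.even? ? value - base[i] : value
    end
  end
end

# Made-up sample data: 3 lines, 2 (time, signal) pairs per line.
rows = [
  [10.0, 0.5, 20.0, 0.7],
  [10.5, 0.6, 20.25, 0.8],
  [11.0, 0.4, 21.0, 0.9]
]
shifted = subtract_first_row_times(rows)
```

The helper returns a new array rather than modifying `rows` in place, which keeps the original values available for checking.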

Step 1: collect the time_k.* and signal_k.* values from every line and
save them in one file per k, say file_k.tsv.
Namely, time_1.1, signal_1.1, time_1.2, signal_1.2 ......
time_1.2048, signal_1.2048 should all be saved in file_1.tsv, where the
first line is time_1.1 signal_1.1, the second line is time_1.2
signal_1.2 ...... and the last line is time_1.2048 signal_1.2048.

Likewise, time_2.1, signal_2.1, time_2.2, signal_2.2 ......
time_2.2048, signal_2.2048 should all be saved in file_2.tsv, where the
first line is time_2.1 signal_2.1, the second line is time_2.2
signal_2.2 ...... and the last line is time_2.2048 signal_2.2048.
......
Finally, time_4096.1, signal_4096.1, time_4096.2, signal_4096.2
...... time_4096.2048, signal_4096.2048 should all be saved in
file_4096.tsv, where the first line is time_4096.1 signal_4096.1, the
second line is time_4096.2 signal_4096.2 ...... and the last line is
time_4096.2048 signal_4096.2048.
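Step 1 could look something like the sketch below in Ruby. It is a minimal illustration, not a tuned solution: the helper name `write_pair_files` is hypothetical, the rows are assumed to be already parsed into Floats (with any time shift already applied), and a temporary directory is used so the example does not touch real data.

```ruby
require "tmpdir"

# Hypothetical helper (not from the original post).
# rows: array of lines, each [time_1, signal_1, time_2, signal_2, ...].
# Writes file_1.tsv, file_2.tsv, ... into out_dir, one line per input line.
def write_pair_files(rows, out_dir)
  pair_count = rows.first.size / 2
  (0...pair_count).each do |k|
    path = File.join(out_dir, "file_#{k + 1}.tsv")
    File.open(path, "w") do |f|
      rows.each do |row|
        # Pair k occupies positions 2k (time) and 2k + 1 (signal).
        f.puts "#{row[2 * k]}\t#{row[2 * k + 1]}"
      end
    end
  end
  pair_count
end

# Made-up sample data: 2 lines, 2 (time, signal) pairs per line.
rows = [
  [0.0, 0.5, 0.0, 0.7],
  [0.5, 0.6, 0.25, 0.8]
]
out_dir = Dir.mktmpdir("pairs")
written = write_pair_files(rows, out_dir)
file_2 = File.read(File.join(out_dir, "file_2.tsv"))
```

Only one output file is open at a time here, at the price of holding all rows in memory.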


I have already written a program in C++, but it takes around 3 hours to
do this job.
I am completely new to Ruby and Perl, and I know a little Python.
So my questions are:

1. How long would this job take in Ruby?
If it takes less than an hour and a half, then Ruby is worth studying
for me. I am already attracted by the beautiful Ruby : ) .
2. Are there any similar examples?

Best regards !
Junhui
 
Ad

Advertisements

C

Colin Bartlett

[Note: parts of this message were removed to make it a legal post.]

...
I have already written a program in C++, but it takes around 3 hours to
do this job.
I am completely new to Ruby and Perl, and I know a little Python.
So my questions are:

1. How long would this job take in Ruby?
If it takes less than an hour and a half, then Ruby is worth studying
for me. I am already attracted by the beautiful Ruby : ) .


I'm neither a Ruby expert nor an expert programmer, but I have been using
Ruby (for my own purposes) for over 8 years, and as a thought exercise I
tried this (not actually running anything), and it took me about 20 to 30
minutes, *provided* the computer memory is big enough to hold all the data.
(I couldn't think of an easy way to do what I think you want to do without
reading in all the data first, modifying it, then writing it out. That, or
open 4096 files at the same time: neither way seems elegant.)

And if you can do that in C++ then I'm sure you can probably do it in Ruby,
Perl, Python, etc, etc. If you can program in C++ then I see no reason why
you wouldn't be able to program in Ruby, Perl, Python, etc. (It might look
like C++ rewritten in R, P, P, etc, but so what if you're trying things
out.)

Personally, if I didn't have much time, and I wanted to try something out in
another computer language, I'd go with a language that I knew a little
about, so in my case that would be Ruby, Pascal, Qbasic (!!!), and - in your
case - maybe try something quick in Python. (But I'd also encourage you to
look at Ruby sometime and try it.)

Maybe it partly depends on what standard methods/functions are available:
for example, in Ruby you can read a line from a file into a String, and then
use a builtin method on the String to split it into an array of values using
a specified delimiter, so in your case a space character? But I'd be very
surprised if there weren't similar builtins in Perl and Python.
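As a small illustration of the split idea above, here is a hedged Ruby sketch. The sample line is made up, and a tab delimiter is assumed because the original post describes the data as TSV; calling `split` with no argument would split on any whitespace instead.

```ruby
# A made-up input line: two (time, signal) pairs separated by tabs.
line = "0.001\t1.5\t0.002\t-2.25\n"

# chomp drops the trailing newline; split cuts the String at each tab.
fields = line.chomp.split("\t")

# Float() raises on malformed input, which helps catch bad data early.
values = fields.map { |field| Float(field) }

# Group the flat list into [time, signal] pairs.
pairs = values.each_slice(2).to_a
```

In a real script the line would come from reading the file, for example with `File.foreach`, rather than from a literal String.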
 
J

Junhui Liao

(I couldn't think of an easy way to do what I think you want to do without
reading in all the data first, modifying it, then writing it out. That, or
open 4096 files at the same time: neither way seems elegant.)



Actually, I wrote two versions of the C++ program. One opens 4096 files
at the same time; it takes 3 hours. The other saves all of the data in a
big vector, then scans the vector to pick out the right items to write
to the files; it takes 2 hours and 45 minutes. :).
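For comparison, the first variant (all output files open at once, input read line by line) might be sketched in Ruby as below. This is an assumption-laden illustration, not a translation of the C++ program: the helper name `stream_to_pair_files` and the sample data are hypothetical, and with 4096 outputs the operating system's per-process open-file limit may need to be raised.

```ruby
require "tmpdir"

# Hypothetical helper (not from the original post).
# Reads the input one line at a time and appends pair k to file_k.tsv,
# subtracting the first line's time for that pair (Step 0) on the way.
def stream_to_pair_files(input_path, out_dir)
  outputs = nil
  base = nil
  File.foreach(input_path) do |line|
    values = line.split.map { |field| Float(field) }
    next if values.empty?
    base ||= values # first data line holds the reference times
    outputs ||= Array.new(values.size / 2) do |k|
      File.open(File.join(out_dir, "file_#{k + 1}.tsv"), "w")
    end
    outputs.each_with_index do |out, k|
      out.puts "#{values[2 * k] - base[2 * k]}\t#{values[2 * k + 1]}"
    end
  end
ensure
  # Close (and flush) every output file, even if parsing failed midway.
  outputs&.each(&:close)
end

# Made-up sample input: 2 lines, 2 (time, signal) pairs per line.
dir = Dir.mktmpdir("stream")
input = File.join(dir, "input.tsv")
File.write(input, "10.0\t0.5\t20.0\t0.7\n10.5\t0.6\t20.25\t0.8\n")
stream_to_pair_files(input, dir)
file_1 = File.read(File.join(dir, "file_1.tsv"))
file_2 = File.read(File.join(dir, "file_2.tsv"))
```

Only one input line is held in memory at a time, so memory use stays small regardless of the number of lines.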

Personally, if I didn't have much time, and I wanted to try something out in
another computer language, I'd go with a language that I knew a little
about, so in my case that would be Ruby, Pascal, Qbasic (!!!), and - in your
case - maybe try something quick in Python. (But I'd also encourage you to
look at Ruby sometime and try it.)


Thanks a lot for your encouragement. I have already started reading about
Ruby, since the language is very simple and beautiful, whether or not it
works for my case (though I hope it does).


Maybe it partly depends on what standard methods/functions are available:
for example, in Ruby you can read a line from a file into a String, and then
use a builtin method on the String to split it into an array of values using
a specified delimiter, so in your case a space character?




This is exactly the kind of comment I need: what knowledge is necessary
and sufficient to do my job, and whether there are any special, powerful
methods or techniques for this kind of task.
Even better would be an example very close to my case. I can get the
details by reading books or googling.

Anyway, thanks a lot for your reply!
Best !
Junhui
 
Ad

Advertisements

C

Colin Bartlett

I'm putting this at the top of my post because I think the basic problem
here may be intensive numeric calculations, and - even more so - disk (input
and) output of about 16 MiB x N bytes of data, where N is 8 bytes (? for
Floating point numbers), so about 128 MiB in total, and other people will
have a better knowledge of some possibly useful links.
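The size estimate above can be checked with a few lines of Ruby. The counts come from the original post (4096 pairs per line, 2048 lines); the 8 bytes per value is the assumption stated above, and the variable names are only illustrative.

```ruby
pairs_per_line = 4096 # (time, signal) pairs on each line
line_count = 2048     # lines in the input file
bytes_per_value = 8   # assumption: one double-precision float per value

# Each pair holds two values, so the whole data set is:
value_count = pairs_per_line * 2 * line_count

# Convert the in-memory size from bytes to MiB.
total_mib = value_count * bytes_per_value / (1024.0 * 1024.0)

puts "#{value_count} values, about #{total_mib} MiB as binary floats"
```

The text files themselves will differ in size from this figure, since each number is stored as a string of digits rather than as 8 bytes.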

Actually, I wrote two versions of the C++ program. One opens 4096 files
at the same time; it takes 3 hours. The other saves all of the data in a
big vector, then scans the vector to pick out the right items to write
to the files; it takes 2 hours and 45 minutes. :).

Sorry - in my post I misunderstood what you meant by "cost". I think it is
(very?) unlikely that any Ruby (or Perl or Python, etc?) program will run
faster than your C++ programs. Where Ruby (or Python - I'm not so sure about
Perl, I haven't used it) does have an advantage is that I think development
may be quicker. So there are trade-offs. (Incidentally, I'm not an expert,
but those timings suggest to me that the major processing cost may be in
writing the results out to disk, so changing the language for all or part of
the processing is unlikely to make a large difference?)

But I'm open to correction: there are people who have used Ruby for fairly
intensive processing of large data sets, but my understanding is that they
use Ruby as "glue", with any intensive calculations done in C, etc. For
example, from some limited experience I have, the speed of Ruby reading
strings of bytes in from files is similar to the speed of Java or compiled
Pascal, but for calculating CRCs of files the speed of pure Ruby calculating
the CRCs once the bytes had been read in was much slower than Java or
compiled Pascal: so I used Ruby (or rather JRuby) to read in the strings of
bytes from the files, and then called Java code from Ruby to calculate the
CRC from the bytes. Overall the speed of this was similar to a pure Java or
pure compiled Pascal program.

Piet Hut and Jun Makino have been using Ruby to model dense star clusters.
(Note that this is something I know nothing about! I'm just intrigued by the
underlying principle of using Ruby for intensive numerical calculations by
developing in Ruby without worrying about speed by using smaller unrealistic
models, and then using more realistic models by translating part (or all!)
of the Ruby code to a faster language.)

http://www.kira.org/index.php?option=com_content&task=view&id=124&Itemid=154
...MODEST is the new name for the Stellar Dynamics workshop. It stands for:
MOdeling DEnse STellar systems
...
The basic idea is to start a kind of N-body wikipedia, as a group's process=