read write integer in binary into a file

Vianney Lecroart · Oct 25, 2007

Hello,

I have some big files with a lot of "unsigned int" (4 bytes) numbers and I
want to read and write these files.

Currently, I found this to write:

myfile << [mynum].pack("i")

and to read:

mynum = myfile.read(4).unpack("i").first

I wonder if there's not something faster/simpler to do that without the
need to convert the number into an array into a string to finally
serialize it.

Thank you.

Park Heesob · Oct 25, 2007

Hi,

How about Marshal?

 myfile << Marshal.dump(mynum)

and

 mynum = Marshal.load(myfile.read)

Regards,

Park Heesob

Vianney Lecroart · Oct 25, 2007

Park Heesob said:
How about Marshal?

The files are filled by an external C application that does something like:
fwrite(&myint, 4, 1, fp);

So I have to use the same file format.
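
Since the C program writes raw 4-byte values, the Ruby side has to read and write the same layout. The following is only a sketch, assuming the file holds 32-bit unsigned integers in the machine's native byte order (pack directive "L") and that Ruby runs on the same architecture as the C program. The helper names read_uint32_at and write_uint32_at are hypothetical, not part of any library.

```ruby
require "tempfile"

UINT32_SIZE = 4  # bytes per value, matching the C program's 4-byte writes

# Read the integer stored at position `index` (0-based) of a binary IO.
def read_uint32_at(io, index)
  io.seek(index * UINT32_SIZE)
  bytes = io.read(UINT32_SIZE)
  if bytes.nil? || bytes.bytesize < UINT32_SIZE
    raise EOFError, "no integer at index #{index}"
  end
  bytes.unpack("L").first   # "L" = 32-bit unsigned, native byte order
end

# Overwrite the integer stored at position `index` (0-based).
def write_uint32_at(io, index, value)
  io.seek(index * UINT32_SIZE)
  io.write([value].pack("L"))
end

Tempfile.create("ints") do |tmp|
  tmp.binmode                          # no newline translation
  tmp.write([10, 20, 30].pack("L*"))
  write_uint32_at(tmp, 1, 99)
  p read_uint32_at(tmp, 1)
  p read_uint32_at(tmp, 2)
end
```

If the file comes from a machine with a different byte order, "V" (little-endian) or "N" (big-endian) could replace "L".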

Michael Linfield · Oct 25, 2007

Vianney said:
Files are filled by an external C application that does something like:
fwrite(&myint, 4, 1, fp);

So I have to use the same file format.

What file format? I don't see any problem with using Marshal. It doesn't
need a file format specified; it's simply a Marshal dump.

Vianney Lecroart · Oct 25, 2007

It seems that marshaling a number doesn't produce 4 bytes:

irb(main):036:0> mynum
=> 56515
irb(main):037:0> [mynum].pack("i")
=> "\303\334\000\000"
irb(main):038:0> Marshal.dump(mynum)
=> "\004\bi\002\303\334"

yermej · Oct 25, 2007

Do you have to deal with each number individually? Maybe you could
build up an array of numbers and then pack them all at once:

arr = []
while work_to_do do
  mynum = generate_next_number
  arr << mynum
end
myfile.write arr.pack('i*')

That way you aren't creating a new array for each number.

Similarly, for reading the file:
data = file.read
num_array = data.unpack('i*')

The '*' in (un)pack means to process the rest of the data in the same
way.
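
The bulk approach can be sketched end to end as a round trip through a file. This assumes unsigned 32-bit values in native byte order ("L*" rather than 'i*', since the original question mentions unsigned int) and files small enough to read in one go. The helper names write_uint32s and read_uint32s are hypothetical.

```ruby
require "tempfile"

# Pack the whole array once and write it as raw bytes.
def write_uint32s(path, nums)
  File.open(path, "wb") { |f| f.write(nums.pack("L*")) }
end

# Read the whole file in binary mode and unpack every 4-byte value.
def read_uint32s(path)
  File.binread(path).unpack("L*")
end

Tempfile.create("uint32s") do |tmp|
  tmp.close
  write_uint32s(tmp.path, [0, 56515, 720850, 4_294_967_295])
  p read_uint32s(tmp.path)
end
```

For files too large to hold in memory, the same unpack call could be applied to fixed-size chunks read in a loop, as long as the chunk size is a multiple of 4.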

Adam Preble · Oct 25, 2007

I wrote a function to do this which seems slightly faster, but could
perhaps stand some optimization:

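def pack_int32(n)
  # Four-character buffer, one slot per byte (least significant byte first).
  # Relies on Ruby 1.8, where str[i] = Integer stores the low 8 bits.
  str = '    '
  str[3] = n >> 24
  str[2] = n >> 16
  str[1] = n >> 8
  str[0] = n
  str
end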
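
On Ruby 1.9 and later, assigning an Integer to a string index raises a TypeError, so the function above would need String#setbyte instead. A possible adaptation, not the author's code and not benchmarked here, under the hypothetical name pack_int32_bytes:

```ruby
# Build the 4-byte little-endian encoding of n without Array#pack.
def pack_int32_bytes(n)
  str = "\0\0\0\0".b                  # mutable 4-byte binary buffer
  str.setbyte(0, n & 0xff)            # least significant byte first
  str.setbyte(1, (n >> 8) & 0xff)
  str.setbyte(2, (n >> 16) & 0xff)
  str.setbyte(3, (n >> 24) & 0xff)
  str
end

# "V" is the little-endian 32-bit unsigned pack directive.
p pack_int32_bytes(56515) == [56515].pack("V")
```

Note that this always produces little-endian output, whereas pack('i') uses the machine's native byte order; the two only agree on little-endian hardware.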

Here are the benchmark results vs the other methods mentioned:

                   user     system      total        real
[].pack(i):    6.234000   0.235000   6.469000 (  6.500000)
pack_int32:    5.719000   0.015000   5.734000 (  5.734000)
Marshal.dump:  6.594000   0.219000   6.813000 (  6.813000)

I included Marshal.dump for completeness, but agree that it doesn't
appear to be meant for this sort of thing. Here's the source to run
the benchmark:

require 'benchmark'
number = 2_000_000
n = 1_000_000
Benchmark.bm(12) do |x|
  x.report('[].pack(i):') { n.times do; [number].pack('i'); end }
  x.report('pack_int32:') { n.times do; pack_int32(number); end }
  x.report('Marshal.dump:') { n.times do; Marshal.dump(number); end }
end

Adam

Phrogz · Oct 25, 2007

Using only the number 2_000_000 seems to skew the results. I see your
results with your test, but if I change it slightly to use a variety
of integers, I get more balanced results:

require 'benchmark'
MAX = 2**30
n = 1_000_000
nums = (0..n).map{ (rand*MAX).to_i }

Benchmark.bmbm do |x|
  x.report('pack(i):') { nums.each{ |num| [num].pack('i') } }
  x.report('pack32:') { nums.each{ |num| pack_int32(num) } }
  x.report('Dump:') { nums.each{ |num| Marshal.dump(num) } }
end

Rehearsal --------------------------------------------
pack(i):   5.813000   0.109000   5.922000 (  5.984000)
pack32:    5.234000   0.000000   5.234000 (  5.281000)
Dump:      5.906000   0.125000   6.031000 (  6.063000)
---------------------------------- total: 17.187000sec

               user     system      total        real
pack(i):   5.687000   0.125000   5.812000 (  5.875000)
pack32:    5.141000   0.016000   5.157000 (  5.188000)
Dump:      6.000000   0.078000   6.078000 (  6.141000)

Wu Junchen · Dec 13, 2007

irb(main):001:0> f=open('test','w')
=> #<File:test>
irb(main):002:0> f<<[65535].pack('i')
=> #<File:test>
irb(main):003:0> f.tell
=> 4
irb(main):004:0> f<<[720850].pack('i')
=> #<File:test>
irb(main):005:0> f.tell
=> 9
The integer 720850 takes 5 bytes in my file, but it should take only
4 bytes! How can I fix this? Thanks!

Tim Hunter · Dec 13, 2007

irb

irb(main):001:0> x = [720850].pack('i')
=> "\322\377\n\000"
irb(main):002:0> x.length
=> 4

So clearly the integer 720850 is packed into 4 bytes as requested. Why
does it occupy 5 bytes in the file? But see the "\n" in position 2? That
means that the 3rd byte is a newline character, and on Windows, in text
files, Ruby turns newlines into CRLF. 2 bytes! Since you've got binary
data in your file you don't want to write a text file, so you must open
the file with the "b" flag in addition to "w":

f = open("test", "wb")
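
As a quick check of the binary-mode fix, the two values from the session can be written with "wb" and the resulting file size inspected. This is a sketch using a temporary file; the helper name write_packed is made up for illustration.

```ruby
require "tempfile"

# Write each integer as 4 raw bytes; "wb" disables newline translation.
def write_packed(path, nums)
  File.open(path, "wb") do |f|
    nums.each { |n| f << [n].pack("i") }
  end
end

Tempfile.create("binmode") do |tmp|
  tmp.close
  write_packed(tmp.path, [65535, 720850])
  puts File.size(tmp.path)   # two values, 4 bytes each
end
```

The same applies when reading: opening with "rb" keeps Windows from translating CRLF sequences or stopping at a Ctrl-Z byte inside the data.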
