Apparent 4 GB memory limit for brk() in Solaris 8 on some hardware?

clsmyth

Folks,

Hi, I have never posted to a language group before so please excuse me
if this is inappropriate. I have posted this to comp.unix.solaris
(well, I am one of the folks on the thread at least)...the subject is
"4 GB hard constraint on a Solaris 8 server". I figured I'd post over
here because we aren't getting anywhere too fast over there, and I
think it is a code issue.

We have a piece of software that we purchased. It is a single-threaded
process that does a lot of calculations in memory and then spits out
some results. The server it is running on is dedicated to it and it
alone. It is a 4-way (yes, a waste) 16 GB RAM SunFire V440 running
Solaris 8, patched to the latest recommended/security patch cluster (as
in, yesterday). The process grows to about 4 GB in RAM and then spits
out an error like this:

malloc(33554440) failed
app_name is aborting because it can't get more than 3838 megabytes of memory
Add additional swap space or terminate other large programs before continuing.

When I truss the process while it is in this state, I see:

# truss -p 24792
brk(0x1F3C10000) = 0
brk(0x1F3E10000) Err#12 ENOMEM
brk(0x1F3D10000) Err#12 ENOMEM
brk(0x1F3C90000) Err#12 ENOMEM
brk(0x1F3C50000) Err#12 ENOMEM
brk(0x1F3C30000) Err#12 ENOMEM
brk(0x1F3C20000) Err#12 ENOMEM
brk(0x1F3C10000) = 0
brk(0x1F3E10000) Err#12 ENOMEM
brk(0x1F3D10000) Err#12 ENOMEM
brk(0x1F3C90000) Err#12 ENOMEM
brk(0x1F3C50000) Err#12 ENOMEM
brk(0x1F3C30000) Err#12 ENOMEM
brk(0x1F3C20000) Err#12 ENOMEM
brk(0x1F3C10000) = 0
brk(0x1F3E10000) Err#12 ENOMEM
brk(0x1F3D10000) Err#12 ENOMEM
brk(0x1F3C90000) Err#12 ENOMEM
brk(0x1F3C50000) Err#12 ENOMEM
brk(0x1F3C30000) Err#12 ENOMEM
brk(0x1F3C20000) Err#12 ENOMEM
..
..
..

But there is plenty of memory and swap - nothing else is running on
this box:

$ df -lk
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c1t0d0s0 498039 81783 366453 19% /
/dev/dsk/c1t0d0s6 4133838 805010 3287490 20% /usr
/proc 0 0 0 0% /proc
fd 0 0 0 0% /dev/fd
mnttab 0 0 0 0% /etc/mnttab
/dev/dsk/c1t0d0s3 2058319 192851 1803719 10% /var
swap 26490784 16 26490768 1% /var/run
swap 1048576 24 1048552 1% /tmp
/dev/dsk/c1t0d0s7 45455873 289046 44712269 1% /usr1
/dev/dsk/c1t0d0s5 2012959 727434 1225137 38% /opt

$ swap -s
total: 3988120k bytes allocated + 46792k reserved = 4034912k used, 26492744k available

$ swap -l
swapfile dev swaplo blocks free
/dev/dsk/c1t0d0s1 32,9 16 33163568 33163568

When I do a "file" on the executable I get this:

# file /path/to/exe
/path/to/exe: ELF 64-bit MSB executable SPARCV9 Version 1, dynamically linked, stripped

The box is definitely running 64-bit Solaris 8:

# eeprom | grep boot-f
boot-file: data not available.

# eeprom | grep diag-
diag-passes=1
diag-file: data not available.
diag-device=net
diag-trigger=error-reset power-on-reset
diag-script=normal
diag-level=min
diag-switch?=false

# isainfo -v
64-bit sparcv9 applications
32-bit sparc applications

Here's some pmap data from today's failure, too:

# pmap 24792
24792: /path/to/exe
0000000100000000 54504K read/exec /path/to/exe
0000000103638000 5896K read/write/exec /path/to/exe
0000000103BFA000 3932248K read/write/exec [ heap ]
FFFFFFFF7CC00000 8K read/write [ anon ]
FFFFFFFF7CE08000 8K read/write [ anon ]
FFFFFFFF7CF00000 8K read/write/exec/shared [ anon ]
FFFFFFFF7D002000 8K read/write [ anon ]
FFFFFFFF7D204000 8K read/write [ anon ]
FFFFFFFF7D406000 8K read/write [ anon ]
FFFFFFFF7D608000 8K read/write [ anon ]
FFFFFFFF7D80A000 8K read/write [ anon ]
FFFFFFFF7DA08000 8K read/write [ anon ]
FFFFFFFF7DA0C000 8K read/write [ anon ]
FFFFFFFF7DC0A000 8K read/write [ anon ]
FFFFFFFF7DC0E000 8K read/write [ anon ]
FFFFFFFF7DE0C000 8K read/write [ anon ]
FFFFFFFF7DE10000 8K read/write [ anon ]
FFFFFFFF7E00E000 8K read/write [ anon ]
FFFFFFFF7E100000 8K read/write/exec [ anon ]
FFFFFFFF7E200000 16K read/exec /usr/lib/sparcv9/libmp.so.2
FFFFFFFF7E304000 8K read/write/exec /usr/lib/sparcv9/libmp.so.2
FFFFFFFF7E400000 728K read/exec /usr/lib/sparcv9/libc.so.1
FFFFFFFF7E5B6000 56K read/write/exec /usr/lib/sparcv9/libc.so.1
FFFFFFFF7E5C4000 8K read/write/exec /usr/lib/sparcv9/libc.so.1
FFFFFFFF7E600000 8K read/write/exec [ anon ]
FFFFFFFF7E700000 128K read/exec /usr/lib/sparcv9/libthread.so.1
FFFFFFFF7E820000 16K read/write/exec /usr/lib/sparcv9/libthread.so.1
FFFFFFFF7E824000 64K read/write/exec /usr/lib/sparcv9/libthread.so.1
FFFFFFFF7E900000 8K read/exec /usr/platform/sun4u-us3/lib/sparcv9/libc_psr.so.1
FFFFFFFF7EA00000 56K read/exec /usr/lib/sparcv9/libCrun.so.1
FFFFFFFF7EB0C000 16K read/write/exec /usr/lib/sparcv9/libCrun.so.1
FFFFFFFF7EB10000 16K read/write/exec /usr/lib/sparcv9/libCrun.so.1
FFFFFFFF7EC00000 216K read/exec /usr/lib/sparcv9/libm.so.1
FFFFFFFF7ED34000 16K read/write/exec /usr/lib/sparcv9/libm.so.1
FFFFFFFF7EE00000 8K read/write/exec /usr/lib/sparcv9/libdl.so.1
FFFFFFFF7EF00000 8K read/write/exec [ anon ]
FFFFFFFF7F000000 672K read/exec /usr/lib/sparcv9/libnsl.so.1
FFFFFFFF7F1A8000 64K read/write/exec /usr/lib/sparcv9/libnsl.so.1
FFFFFFFF7F1B8000 32K read/write/exec /usr/lib/sparcv9/libnsl.so.1
FFFFFFFF7F200000 56K read/exec /usr/lib/sparcv9/libsocket.so.1
FFFFFFFF7F30E000 16K read/write/exec /usr/lib/sparcv9/libsocket.so.1
FFFFFFFF7F400000 8K read/exec /usr/lib/sparcv9/libw.so.1
FFFFFFFF7F500000 8K read/write/exec [ anon ]
FFFFFFFF7F600000 176K read/exec /usr/lib/sparcv9/ld.so.1
FFFFFFFF7F72C000 16K read/write/exec /usr/lib/sparcv9/ld.so.1
FFFFFFFF7FFF2000 56K read/write [ stack ]
total 3995256K

# pmap -r 24792
24792: /path/to/exe
0000000100000000 54504K read/exec /path/to/exe
0000000103638000 5896K read/write/exec /path/to/exe
0000000103BFA000 3932248K read/write/exec [ heap ]
FFFFFFFF7B800000 8K - [ anon ]
FFFFFFFF7B802000 20480K read/write [ anon ]
FFFFFFFF7CE00000 8K - [ anon ]
FFFFFFFF7CE02000 32K read/write [ anon ]
FFFFFFFF7CF00000 8K read/write/exec/shared [ anon ]
FFFFFFFF7D000000 8K - [ anon ]
FFFFFFFF7D002000 2048K read/write [ anon ]
FFFFFFFF7D202000 8K - [ anon ]
FFFFFFFF7D204000 2048K read/write [ anon ]
FFFFFFFF7D404000 8K - [ anon ]
FFFFFFFF7D406000 2048K read/write [ anon ]
FFFFFFFF7D606000 8K - [ anon ]
FFFFFFFF7D608000 2048K read/write [ anon ]
FFFFFFFF7D808000 8K - [ anon ]
FFFFFFFF7D80A000 2048K read/write [ anon ]
FFFFFFFF7DA0A000 8K - [ anon ]
FFFFFFFF7DA0C000 2048K read/write [ anon ]
FFFFFFFF7DC0C000 8K - [ anon ]
FFFFFFFF7DC0E000 2048K read/write [ anon ]
FFFFFFFF7DE0E000 8K - [ anon ]
FFFFFFFF7DE10000 2048K read/write [ anon ]
FFFFFFFF7E100000 8K read/write/exec [ anon ]
FFFFFFFF7E200000 16K read/exec /usr/lib/sparcv9/libmp.so.2
FFFFFFFF7E304000 8K read/write/exec /usr/lib/sparcv9/libmp.so.2
FFFFFFFF7E400000 728K read/exec /usr/lib/sparcv9/libc.so.1
FFFFFFFF7E5B6000 56K read/write/exec /usr/lib/sparcv9/libc.so.1
FFFFFFFF7E5C4000 8K read/write/exec /usr/lib/sparcv9/libc.so.1
FFFFFFFF7E600000 8K read/write/exec [ anon ]
FFFFFFFF7E700000 128K read/exec /usr/lib/sparcv9/libthread.so.1
FFFFFFFF7E820000 16K read/write/exec /usr/lib/sparcv9/libthread.so.1
FFFFFFFF7E824000 64K read/write/exec /usr/lib/sparcv9/libthread.so.1
FFFFFFFF7E900000 8K read/exec /usr/platform/sun4u-us3/lib/sparcv9/libc_psr.so.1
FFFFFFFF7EA00000 56K read/exec /usr/lib/sparcv9/libCrun.so.1
FFFFFFFF7EB0C000 16K read/write/exec /usr/lib/sparcv9/libCrun.so.1
FFFFFFFF7EB10000 16K read/write/exec /usr/lib/sparcv9/libCrun.so.1
FFFFFFFF7EC00000 216K read/exec /usr/lib/sparcv9/libm.so.1
FFFFFFFF7ED34000 16K read/write/exec /usr/lib/sparcv9/libm.so.1
FFFFFFFF7EE00000 8K read/write/exec /usr/lib/sparcv9/libdl.so.1
FFFFFFFF7EF00000 8K read/write/exec [ anon ]
FFFFFFFF7F000000 672K read/exec /usr/lib/sparcv9/libnsl.so.1
FFFFFFFF7F1A8000 64K read/write/exec /usr/lib/sparcv9/libnsl.so.1
FFFFFFFF7F1B8000 32K read/write/exec /usr/lib/sparcv9/libnsl.so.1
FFFFFFFF7F200000 56K read/exec /usr/lib/sparcv9/libsocket.so.1
FFFFFFFF7F30E000 16K read/write/exec /usr/lib/sparcv9/libsocket.so.1
FFFFFFFF7F400000 8K read/exec /usr/lib/sparcv9/libw.so.1
FFFFFFFF7F500000 8K read/write/exec [ anon ]
FFFFFFFF7F600000 176K read/exec /usr/lib/sparcv9/ld.so.1
FFFFFFFF7F72C000 16K read/write/exec /usr/lib/sparcv9/ld.so.1
FFFFFFFF7F800000 8192K read/write [ stack ]
total 4040256K

# pmap -x 24792
24792: /path/to/exe
Address            Kbytes Resident  Shared  Private Permissions            Mapped File
0000000100000000    54504    43528   43528        - read/exec              exe
0000000103638000     5896     5896     304     5592 read/write/exec        exe
0000000103BFA000  3932248  3897776       -  3897776 read/write/exec        [ heap ]
FFFFFFFF7CC00000        8        8       -        8 read/write             [ anon ]
FFFFFFFF7CE08000        8        8       -        8 read/write             [ anon ]
FFFFFFFF7CF00000        8        8       8        - read/write/exec/shared [ anon ]
FFFFFFFF7D002000        8        8       -        8 read/write             [ anon ]
FFFFFFFF7D204000        8        8       -        8 read/write             [ anon ]
FFFFFFFF7D406000        8        8       -        8 read/write             [ anon ]
FFFFFFFF7D608000        8        8       -        8 read/write             [ anon ]
FFFFFFFF7D80A000        8        8       -        8 read/write             [ anon ]
FFFFFFFF7DA08000        8        8       -        8 read/write             [ anon ]
FFFFFFFF7DA0C000        8        8       -        8 read/write             [ anon ]
FFFFFFFF7DC0A000        8        8       -        8 read/write             [ anon ]
FFFFFFFF7DC0E000        8        8       -        8 read/write             [ anon ]
FFFFFFFF7DE0C000        8        8       -        8 read/write             [ anon ]
FFFFFFFF7DE10000        8        8       -        8 read/write             [ anon ]
FFFFFFFF7E00E000        8        8       -        8 read/write             [ anon ]
FFFFFFFF7E100000        8        8       -        8 read/write/exec        [ anon ]
FFFFFFFF7E200000       16       16      16        - read/exec              libmp.so.2
FFFFFFFF7E304000        8        8       -        8 read/write/exec        libmp.so.2
FFFFFFFF7E400000      728      728     728        - read/exec              libc.so.1
FFFFFFFF7E5B6000       56       56       -       56 read/write/exec        libc.so.1
FFFFFFFF7E5C4000        8        8       -        8 read/write/exec        libc.so.1
FFFFFFFF7E600000        8        8       -        8 read/write/exec        [ anon ]
FFFFFFFF7E700000      128      128     128        - read/exec              libthread.so.1
FFFFFFFF7E820000       16       16       -       16 read/write/exec        libthread.so.1
FFFFFFFF7E824000       64       56       -       56 read/write/exec        libthread.so.1
FFFFFFFF7E900000        8        8       8        - read/exec              libc_psr.so.1
FFFFFFFF7EA00000       56       56      56        - read/exec              libCrun.so.1
FFFFFFFF7EB0C000       16       16       -       16 read/write/exec        libCrun.so.1
FFFFFFFF7EB10000       16        8       -        8 read/write/exec        libCrun.so.1
FFFFFFFF7EC00000      216      216     216        - read/exec              libm.so.1
FFFFFFFF7ED34000       16       16       -       16 read/write/exec        libm.so.1
FFFFFFFF7EE00000        8        8       -        8 read/write/exec        libdl.so.1
FFFFFFFF7EF00000        8        8       -        8 read/write/exec        [ anon ]
FFFFFFFF7F000000      672      672     672        - read/exec              libnsl.so.1
FFFFFFFF7F1A8000       64       64       -       64 read/write/exec        libnsl.so.1
FFFFFFFF7F1B8000       32       32       -       32 read/write/exec        libnsl.so.1
FFFFFFFF7F200000       56       56      56        - read/exec              libsocket.so.1
FFFFFFFF7F30E000       16       16       -       16 read/write/exec        libsocket.so.1
FFFFFFFF7F400000        8        8       8        - read/exec              libw.so.1
FFFFFFFF7F500000        8        8       -        8 read/write/exec        [ anon ]
FFFFFFFF7F600000      176      176     176        - read/exec              ld.so.1
FFFFFFFF7F72C000       16       16       -       16 read/write/exec        ld.so.1
FFFFFFFF7FFF2000       56       56       -       56 read/write             [ stack ]
---------------- -------- -------- ------- --------
total Kb          3995256  3949792   45904  3903888

Now, what really throws a wrench into the whole thing is that we have
another box with less memory on which the process seems to run fine,
or at least does not throw an error. Both boxes are patched up. Both
exes are the same. We are not setting limits anywhere on either system.

I think it is a code error, but I guess I'm sorta inclined that way, as
I am the sysadmin :) But I think I have a leg to stand on, because the
particular failure shown by truss is an ENOMEM on brk(), and when I
look in the Solaris 8 man page for brk() I see this, among other things:

[snip]
The behavior of brk() and sbrk() is unspecified if an application
also uses any other memory functions (such as malloc(3C), mmap(2),
free(3C)). The brk() and sbrk() functions have been used in
specialized cases where no other memory allocation function provided
the same capability. The use of mmap(2) is now preferred because it
can be used portably with all other memory allocation functions and
with any function that uses other allocation functions.

It is unspecified whether the pointer returned by sbrk() is
aligned suitably for any purpose.
[/snip]

One last thing...as I mentioned, we purchased this app. We do not have
the source, and we have no support from the vendor.

SO......

Any of you folks have any ideas?
 
CBFalconer

clsmyth said:
Folks,

Hi, I have never posted to a language group before so please excuse me
if this is inappropriate. I have posted this to comp.unix.solaris
(well, I am one of the folks on the thread at least)...the subject is
"4 GB hard constraint on a Solaris 8 server". I figured I'd post over
here because we aren't getting anywhere too fast over there, and I
think it is a code issue.

We have a piece of software that we purchased. It is a single-thread
process that does a lot of calculations in memory and then spits out
some results. The server it is running on is dedicated to it and it
alone. It is a 4-way (yes, a waste) 16 GB RAM SunFire V440 running
Solaris 8, patched to the latest recommended/security patch cluster (as
in, yesterday). The process grows to about 4 GB in RAM and then spits
out an error like this:
.... snip ...

It is inappropriate here. We discuss the _portable_ C language, as
defined in the ISO standards. The moment you have to mention a
particular compiler or system it doesn't belong here. I think you
had the right milieu in the first place, but are too verbose. Cut
things down to something compilable that exhibits the problem and
is under 100 lines.
 
clsmyth

Thanks, CBF.

FWIW, I wouldn't have dumped so much crap into my first post, but I
thought I'd summarize the 18 messages in the other thread right away,
and avoid the iterative "Well, have you tried this? Well, what about
this, then?" process over here :)

Is there a C-on-solaris newsgroup, or any other newsgroup where C-savvy
Solaris app developers hang out? I'm not trying to insult the folks on
comp.unix.solaris, I'm just wondering if that's the only place there is
for this question.

Thanks,
CLS
 
Eric Sosman

clsmyth said:
Folks,

Hi, I have never posted to a language group before so please excuse me
if this is inappropriate.

It's not -- but it's closer to appropriate than some of
the nonsense that goes on around here, ergo ego te absolvo
if you'll say ten "Hail, Dennises" and give up undefined
behavior for a week.
[...] The process grows to about 4 GB in RAM and then spits
out an error like this:

malloc(33554440) failed
app_name is aborting because it can't get more than 3838 megabytes of
memory

<off-topic>

I betcha I betcha I just betcha the program was compiled
and linked as a 32-bit application. A 32-bit pointer can have
only 4G distinct values, and so can address only 4GB of memory
no matter how much the system may have available. A 32-bit
process is therefore limited to an address space of no more than
4GB -- and it turns out that your O/S (Solaris) claims a little
bit of that space for its own purposes.

The cure -- well, there are two possible cures. First, you
may be looking at a memory leak in the program that causes it
to grow without limit. If so, you need to find and fix that
bug. Second, the program may be all right but your problem size
may require more memory than 4GB-minus-a-sliver. If so, you need
to rebuild the program as a 64-bit application; this will get
you a "modest increase" in the amount of memory you can handle ...

</off-topic>

Good luck!
 
Keith Thompson

clsmyth said:
FWIW, I wouldn't have dumped so much crap into my first post, but I
thought I'd summarize the 18 messages in the other thread right away,
and avoid the iterative "Well, have you tried this? Well, what about
this, then?" process over here :)

Is there a C-on-solaris newsgroup, or any other newsgroup where C-savvy
Solaris app developers hang out? I'm not trying to insult the folks on
comp.unix.solaris, I'm just wondering if that's the only place there is
for this question.

comp.unix.programmer?
 
Rob Thorpe

clsmyth said:
Folks,

Hi, I have never posted to a language group before so please excuse me
if this is inappropriate. I have posted this to comp.unix.solaris
(well, I am one of the folks on the thread at least)...the subject is
"4 GB hard constraint on a Solaris 8 server". I figured I'd post over
here because we aren't getting anywhere too fast over there, and I
think it is a code issue.

We have a piece of software that we purchased. It is a single-thread
process that does a lot of calculations in memory and then spits out
some results. The server it is running on is dedicated to it and it
alone. It is a 4-way (yes, a waste) 16 GB RAM SunFire V440 running
Solaris 8, patched to the latest recommended/security patch cluster (as
in, yesterday). The process grows to about 4 GB in RAM and then spits
out an error like this:

malloc(33554440) failed
app_name is aborting because it can't get more than 3838 megabytes of
memory

See the documentation for getrlimit and setrlimit. As Eric said, many
Solaris programs are 32-bit programs with 32-bit pointers (because it's
faster).

If that doesn't help, post to comp.unix.programmer.
 
Grumble

Eric said:
[...] The process grows to about 4 GB in RAM and then spits
out an error like this:

malloc(33554440) failed
app_name is aborting because it can't get more than 3838 megabytes of
memory

<off-topic>

I betcha I betcha I just betcha the program was compiled
and linked as a 32-bit application.

Eric: did you miss the part where he wrote...

# file /path/to/exe
/path/to/exe: ELF 64-bit MSB executable SPARCV9 Version 1,
dynamically linked, stripped

clsmyth: I suggest you try comp.unix.programmer
 
Eric Sosman

Grumble said:
Eric: did you miss the part where he wrote...

# file /path/to/exe
/path/to/exe: ELF 64-bit MSB executable SPARCV9 Version 1,
dynamically linked, stripped

Yes, I missed that. Back to the drawing board -- or
at any rate, off to some other newsgroup ...
 
clsmyth

Thanks very much, folks...I found comp.unix.programmer and posted over
there. I really appreciate the help.

-cls
 
