Bitwise OR on large memory regions

O

oswald.kluge

Dear Reader,

what efficient ways are there to OR two large memory regions?

I'm especially interested in applying this operation to two large
character arrays. I know memset and memcopy lets you initialise or copy
this arrays very fast, but are there similar functions for bitwise OR?

Thanks for your help!

Best,
Oswald


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
V

Victor Bazarov

what efficient ways are there to OR two large memory regions?

I'm especially interested in applying this operation to two large
character arrays. I know memset and memcopy lets you initialise or
copy this arrays very fast, but are there similar functions for
bitwise OR?

I would use

template <class T> struct binOR : std::binary_function<T,T,T> {
T operator()(const T& x, const T& y) const {
return x | y;
}
};
...
std::transform(begin1, end1, begin2, output, binOR);

And hope that the compiler is capable of optimizing that stuff.

Don't forget to include <functional> and <algorithm>.

V
 
F

Frederick Gotham

Oswald posted:
what efficient ways are there to OR two large memory regions?


Here's one idea:


If an "unsigned int" has no trap representations, then OR the memory as if
you're OR-ing an array of unsigned integers.

#include <cstddef>
#include <cassert>

unsigned *ArrayOR(unsigned const *p1,unsigned const *p2,std::size_t len)
{
assert(p1);
assert(p2);
assert(len);

unsigned *const retval = new unsigned[len];

unsigned *p = retval;

do *p++ = *p1++ | *p2++; while(--len);

return retval;
}


Then do something like:

void *MemOR(void const *const p1arg,
void const *const p2arg,
std::size_t const bytes)
{
unsigned char const *p1 = p1arg;
unsigned char const *p2 = p2arg;

/* Align the pointers correctly, then
clean up any extra bytes later on. */


ArrayOR(p1,p2,a,amount_int); /* Casts necessary */


/* Now deal with the extra bytes at the start
and at the end. */
}
 
H

Howard

Frederick Gotham said:
Oswald posted:
what efficient ways are there to OR two large memory regions?


Here's one idea:


If an "unsigned int" has no trap representations, then OR the memory as if
you're OR-ing an array of unsigned integers.

#include <cstddef>
#include <cassert>

unsigned *ArrayOR(unsigned const *p1,unsigned const *p2,std::size_t len)
{
assert(p1);
assert(p2);
assert(len);

unsigned *const retval = new unsigned[len];

unsigned *p = retval;

do *p++ = *p1++ | *p2++; while(--len);

Yuk. This is the kind of thing I hate to see in C++. Everything on one
line, full of ++ and --. Why not make a for loop, or a loop that's more
readable at least? The optimizer will likely resolve it to the same code in
the end.
return retval;
}


Then do something like:

void *MemOR(void const *const p1arg,
void const *const p2arg,
std::size_t const bytes)
{
unsigned char const *p1 = p1arg;
unsigned char const *p2 = p2arg;

/* Align the pointers correctly, then
clean up any extra bytes later on. */


ArrayOR(p1,p2,a,amount_int); /* Casts necessary */


/* Now deal with the extra bytes at the start
and at the end. */
}

You've new'd memory there in ArrayOR, but not assigned it to anything.
Besides not using the results, it's a memory leak, eh?

If the poster wanted an in-place OR, a simple loop would suffice, instead of
a function:

for (int i = 0; i < len; ++i)
a |= b;

Quick, easy, and obvious. My kind of C++. :)

-Howard
 
F

Frederick Gotham

Howard posted:
Yuk. This is the kind of thing I hate to see in C++. Everything on one
line, full of ++ and --. Why not make a for loop, or a loop that's more
readable at least? The optimizer will likely resolve it to the same
code in the end.


Different strokes for different strokes.

I quite like the code.

(Also, a "for" loop performs an evaluation before its first iteration.)

You've new'd memory there in ArrayOR, but not assigned it to anything.
Besides not using the results, it's a memory leak, eh?


Incomplete sample code.

If the poster wanted an in-place OR, a simple loop would suffice,
instead of a function:

for (int i = 0; i < len; ++i)
a |= b;

Quick, easy, and obvious. My kind of C++. :)



Quick, easy, obvious, and inefficient.

Pointers are superior to array subscripting in this context.

int const *const p_over = p1 + len;

do *p1++ |= *p2++; while(p1 != p_over);
 
H

Howard

Frederick Gotham said:
Howard posted:



Different strokes for different strokes.

I quite like the code.

(Also, a "for" loop performs an evaluation before its first iteration.)

So? Is one check that much slower? (And it allows for length of zero, as
well.)
You've new'd memory there in ArrayOR, but not assigned it to anything.
Besides not using the results, it's a memory leak, eh?


Incomplete sample code.

If the poster wanted an in-place OR, a simple loop would suffice,
instead of a function:

for (int i = 0; i < len; ++i)
a |= b;

Quick, easy, and obvious. My kind of C++. :)



Quick, easy, obvious, and inefficient.


Says who?
Pointers are superior to array subscripting in this context.

Really? Why? How do you know that the compiler's optimizer would produce
code that was slower?
int const *const p_over = p1 + len;

do *p1++ |= *p2++; while(p1 != p_over);

Ok, now suppose there's a problem with the code. Where do you set the
breakpoint if it's all on one line? Putting code on one line doesn't make
it any clearer, or any faster, and only makes it harder to read and harder
to debug.

Like you said, "different strokes...", but I prefer to make life easier for
myself and for anyone who follows me. (Of course, there are those who
figure the more difficult the code is to understand, the less likely they'll
get replaced... :))

-Howard
 
H

Howard

Dear Reader,

what efficient ways are there to OR two large memory regions?

I'm especially interested in applying this operation to two large
character arrays. I know memset and memcopy lets you initialise or copy
this arrays very fast, but are there similar functions for bitwise OR?

Thanks for your help!

Best,
Oswald

There's nothing particularly slow about a loop. If you're just ORing one
array against the other, a simple loop should be fine. (If the hardware
being compiled for has a built-in function for ORing data in a loop, it's
likely that the compiler will optimize for that anyway.)

-Howard




[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
T

Thomas Richter

In said:
what efficient ways are there to OR two large memory regions?

Portable or non-portable?
I'm especially interested in applying this operation to two large
character arrays. I know memset and memcopy lets you initialise or copy
this arrays very fast, but are there similar functions for bitwise OR?

No, not in the standard libraries. The fastest you can get and still be
portable is to write the loop. A smart compiler might actually be able to
vectorize it. That said, some compilers offer special built-in functions
that make use of specialized assembly instructions for vectors.
IIRC, the GNU g++ does provide such functions. Note that this is a non-
portable solution. Another (non-portable) catch on some processors would
be to cast the char * to unsigned long * and binary or' several char's
at once (again, non-portable).

Have you actually checked that your bottleneck is within the char * array
so it requires improvement?

So long,
Thomas

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
F

faceman28208

Frederick said:
Howard posted:
If the poster wanted an in-place OR, a simple loop would suffice,
instead of a function:

for (int i = 0; i < len; ++i)
a |= b;

Quick, easy, and obvious. My kind of C++. :)



Quick, easy, obvious, and inefficient.

Pointers are superior to array subscripting in this context.


I was asked to review a "Software Engineering" book where the author
devoted two chapters to generating "more efficient" code through
various code obfuscation methods, including pointers instead of arrays.
Using both assembly listings and actual timings on various systems, I
showed that not one of his techniques produced faster code.

I find it amazing how many people think they can out optimize the
opimizer through coding quirks.

However, think about how the two different loops translate directly
into a generic assembly language: The number of instructions differ
only in the number of initialization steps. If the underlying system
does not have an autoincrement addressing mode (ie the pentium) you
have to add two more instructions to the pointer version. If the system
does not support a displacement addressing mmode, you have to add
instructions to the array version.

Of course none of this takes into account optimization.

In any event the blanket statement that one produces better code over
the other is nonsense.

;; Index

MOV #1, R0
@1:
MOV A [R0], R1
OR B[R0], R1
MOV R1, A[R0]
INC R0
CMP R0, len
BLSS @1

;; Pointer
MOVA A, R0
MOVA B, R1
MOV R0, R2
ADD LEN, R2
@1:
MOV (R0), R3
OR (R1)+, R3
MOV R3, (R0)+
CMP R0, R2
BLSS @1
 
T

Thomas Tutone

Frederick said:
Howard posted:
do *p++ = *p1++ | *p2++; while(--len);

If the poster wanted an in-place OR, a simple loop would suffice,
instead of a function:

for (int i = 0; i < len; ++i)
a |= b;

Quick, easy, and obvious. My kind of C++. :)



Quick, easy, obvious, and inefficient.

Pointers are superior to array subscripting in this context.


Oh, for goodness sake.

Can you provide an example of even one implementation in common use
today in which using pointers instead of array subscripts in THIS
context results in measurably faster code?

That's a rhetorical question.

Any decent optimizer would hoist the repeated addition operation
required for the array subscript out of the loop, so that the resulting
machine code is identical for the two.

If you really want to get into the weeds with respect to these
micro-optimizations - and I don't recommend it - at least spend some
time reviewing the assembly generated by your compiler with
optimization turned on, and see whether these "inefficiencies" are real
or imagined.

Best regards,

Tom
 
T

Thomas Tutone

Thomas said:
Frederick said:
Howard posted:
do *p++ = *p1++ | *p2++; while(--len);
If the poster wanted an in-place OR, a simple loop would suffice,
instead of a function:

for (int i = 0; i < len; ++i)
a |= b;

Quick, easy, and obvious. My kind of C++. :)



Quick, easy, obvious, and inefficient.

Pointers are superior to array subscripting in this context.


Oh, for goodness sake.

Can you provide an example of even one implementation in common use
today in which using pointers instead of array subscripts in THIS
context results in measurably faster code?

That's a rhetorical question.

Any decent optimizer would hoist the repeated addition operation
required for the array subscript out of the loop, so that the resulting
machine code is identical for the two.


Oops - that's an exaggeration - I mean to say that the resulting code
is "substantially the same for the two."

My point stands, though.
If you really want to get into the weeds with respect to these
micro-optimizations - and I don't recommend it - at least spend some
time reviewing the assembly generated by your compiler with
optimization turned on, and see whether these "inefficiencies" are real
or imagined.

Best regards,

Tom
 
M

Maxim Yegorushkin

Dear Reader,

what efficient ways are there to OR two large memory regions?

I'm especially interested in applying this operation to two large
character arrays. I know memset and memcopy lets you initialise or copy
this arrays very fast, but are there similar functions for bitwise OR?

If speed is paramount you could consider using sse-like instructions
for your target platform.

Otherwise, you could use standard c++ algorithms. Something like this:

#include <iostream>
#include <algorithm>
#include <vector>
#include <numeric>
#include <boost/lambda/lambda.hpp>

int main(int ac, char**)
{
std::vector<unsigned long> v1(ac * 1000000);
std::vector<unsigned long> v2(ac * 1000000);
generate(v1.begin(), v1.end(), rand);
generate(v2.begin(), v2.end(), rand);
std::vector<unsigned long> r(v1.size());
transform(v1.begin(), v1.end(), v2.begin(), r.begin(),
boost::lambda::_1 | boost::lambda::_2);
unsigned long x = accumulate(r.begin(), r.end(), 0ul);
std::cout << x << '\n';
}

The machine code generated from the code above may be not that bad.
With gcc 4.1.1 the transform loop looks like this (-O3
-fomit-frame-pointer -march=pentium-m)

..L38:
mov %eax, DWORD PTR [%ebx]
add %ebx, 4
or %eax, DWORD PTR [%ecx]
add %ecx, 4
mov DWORD PTR [%edx], %eax
add %edx, 4
cmp %edi, %ecx
jne .L38

Some unix platforms (linux among them) have type long of the same size
as void*. For some popular x86-64 processors it may imply that the
processor is able to deal with 64-bit integers efficiently. On opteron
(-fomit-frame-pointer -march=opteron) the same loop is:

..L38:
mov %rax, QWORD PTR [%rsi]
or %rax, QWORD PTR [%rcx]
add %rcx, 8
add %rsi, 8
mov QWORD PTR [%rdx], %rax
add %rdx, 8
cmp %r9, %rcx
jne .L38


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
J

Jeffrey Schwab

what efficient ways are there to OR two large memory regions?

I'm especially interested in applying this operation to two large
character arrays. I know memset and memcopy lets you initialise or copy
this arrays very fast, but are there similar functions for bitwise OR?

At the least, you could bitwise-or multiple characters at a time (on
most popular architectures) by casting to a larger built-in type. E.g.,
given two arrays of N characters each, casting the arrays to T* can
reduce the number of bitwise-or operations from N to ((N / sizeof(T)) +
(N % sizeof(T))).

Here's an absurdly quick & dirty (and non-portable) example. The
casting technique seems to give a speedup near the 8x I would expect on
my platform. (I chose to bitwise-or a bunch of 'A' with the bit that
converts it to lowercase in my local encoding.)

#include <algorithm>
#include <ctime>
#include <iostream>

struct Bitwise_or {
template<typename T>
T operator()(T t1, T t2) {
return t1 | t2;
}
};

int main() {
int const n = 0x10000000; // Work with big arrays.

char* a = new char[n];
char* b = new char[n];
char* c = new char[n];

std::fill(a, a + n, 'A');
std::fill(b, b + n, 'a' & ~'A');

// c = a | b
std::time_t t = std::time(0);
std::transform(a, a + n, b, c, Bitwise_or());
std::cout << (std::time(0) - t) << '\n';

typedef long long T;

T* p = reinterpret_cast<T*>(a - sizeof(T));
T* q = reinterpret_cast<T*>(b - sizeof(T));
T* r = reinterpret_cast<T*>(c - sizeof(T));

t = std::time(0);
std::transform(p, p + n / sizeof(T), q, r, Bitwise_or());
std::cout << (std::time(0) - t) << '\n';

delete[] a;
delete[] b;
delete[] c;
}

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
G

glenlow

what efficient ways are there to OR two large memory regions?

I'm especially interested in applying this operation to two large
character arrays. I know memset and memcopy lets you initialise or copy
this arrays very fast, but are there similar functions for bitwise OR?

You can look at my cross-platform SIMD library macstl. It will run the
bitwise OR using SSE on Intel platforms and Altivec on PowerPC
platforms as follows:

stdext::refarray <char> ref1 (region1, lengthOfRegion); // regions need
to be 16-byte aligned
stdext::refarray <char> ref2 (region2, lengthOfRegion);
stdext::refarray <char> ref3 (region3, lengthOfRegion);
ref1 = ref2 | ref3;

Simple!

http://www.pixelglow.com/macstl/

Cheers,
Glen Low, Pixelglow Software
www.pixelglow.com


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
K

kanze

what efficient ways are there to OR two large memory regions?
I'm especially interested in applying this operation to two
large character arrays. I know memset and memcopy lets you
initialise or copy this arrays very fast, but are there
similar functions for bitwise OR?

The obvious solution is std::transform. For some strange
reason, there is no or operator in the objects in <functional>,
but it should be no problem to come up with one.

If for some reason this isn't fast enough, then you'll have to
look for special tricks. If the arrays are always word aligned,
and a multiple of bytes in word in length, reinterpret_cast'ing
the char* to word* (where word is unsigned or unsigned long),
and using transform on the words, could help. And if you really
need speed, a lot of machines have graphic cards with
specialized processors just for this sort of thing; *IF* you can
access the card (and that's a big if), offloading the work to it
could result in a big gain. (There's some sort of special DLL
for this under Windows, I think. I don't know much about it,
but I know that every time my son installs a game on his Windows
machine, it asks if he wants to update this DLL as well.)

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
A

Allan W

Jeffrey said:
#include <algorithm>
#include <ctime>
#include <iostream>

struct Bitwise_or {
template<typename T>
T operator()(T t1, T t2) {
return t1 | t2;
}
};

int main() {
int const n = 0x10000000; // Work with big arrays.

char* a = new char[n];
char* b = new char[n];
char* c = new char[n];
std::fill(a, a + n, 'A');
std::fill(b, b + n, 'a' & ~'A'); [snip]
typedef long long T;
T* p = reinterpret_cast<T*>(a - sizeof(T));
T* q = reinterpret_cast<T*>(b - sizeof(T));
T* r = reinterpret_cast<T*>(c - sizeof(T));

Why do you subtract sizeof(T) on these three lines?
Isn't this guaranteeing a memory overwrite?
t = std::time(0);
std::transform(p, p + n / sizeof(T), q, r, Bitwise_or());
std::cout << (std::time(0) - t) << '\n';

delete[] a;
delete[] b;
delete[] c;
}


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
J

Jeffrey Schwab

Allan said:
Why do you subtract sizeof(T) on these three lines?
Isn't this guaranteeing a memory overwrite?

Yep. Good catch. Version 0.2:

#include <algorithm>
#include <ctime>
#include <iostream>
#include <iterator>

struct Bitwise_or {
template<typename T>
T operator()(T t1, T t2) {
return t1 | t2;
}
};

int main() {
int const n = 0x10000000; // Work with big arrays.

char* a = new char[n];
char* b = new char[n];
char* c = new char[n];

std::fill(a, a + n, 'A');
std::fill(b, b + n, 'a' & ~'A');

// c = a | b
std::time_t t = std::time(0);
std::transform(a, a + n, b, c, Bitwise_or());
std::cout << (std::time(0) - t) << '\n';

typedef long long T;

T* p = reinterpret_cast<T*>(a);
T* q = reinterpret_cast<T*>(b);
T* r = reinterpret_cast<T*>(c);

t = std::time(0);
std::transform(p, p + n / sizeof(T), q, r, Bitwise_or());
std::cout << (std::time(0) - t) << '\n';

delete[] c;
delete[] b;
delete[] a;
}

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
J

Jerry Coffin

Dear Reader,

what efficient ways are there to OR two large memory regions?

I'm especially interested in applying this operation to two large
character arrays. I know memset and memcopy lets you initialise or copy
this arrays very fast, but are there similar functions for bitwise OR?

Depending on your compiler, standard library, etc., it's _possible_
that using valarray's and their |= operator would be useful. There's
no guarantee of that though. In fact, most implementations of
valarray see very little real use, so they may not work particularly
well at all. OTOH, the interface was designed to make it easy to
implement with vector operations, so it's conceivable that your
library vendor has done that...

--
Later,
Jerry.

The universe is a figment of its own imagination.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
O

oswald.kluge

Thank you for all your replies! However I encountered another practical
problem:
Here's an absurdly quick & dirty (and non-portable) example. The
casting technique seems to give a speedup near the 8x I would expect on
my platform. (I chose to bitwise-or a bunch of 'A' with the bit that
converts it to lowercase in my local encoding.)

.......


typedef long long T;

T* p = reinterpret_cast<T*>(a - sizeof(T));
T* q = reinterpret_cast<T*>(b - sizeof(T));
T* r = reinterpret_cast<T*>(c - sizeof(T));

Now unfortunately in my case 'a' is a two-dimensional character-array,
say

char a[1000000][10];

When we look up an entry by a[j], let's call the entries indexed by
i 'rows' and the ones indexed by j 'columns'. Now the task is described
easily:

How can one do a bitwise-or on two rows of 'a' efficiently? Or more
general: how can one do a bitwise-or on n>1 rows of 'a' efficiently?

Thanks for any help!

Oswald


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top