Fast way to add null after each char

T

tni

My bad. I intentionally posted a wrong suggestion without clearly
marking it as such - I thought I was going to be castigated immediately,
but since the punishment didn't come at once, I kept it on to see what
was going to happen... now I realize that it wasn't all that fun for the
others, so I present my apologies to the group for the wasted time.

The sad thing is, I've seen quadratic stuff like that far too often in
production code. There certainly are more than enough (clueless) people
who wouldn't consider it a joke.
 
F

Francesco S. Carta

The sad thing is, I've seen quadratic stuff like that far too often in
production code. There certainly are more than enough (clueless) people
who wouldn't consider it a joke.

Indeed. And since it seems that most of those programmers has no idea of
what the O() notation is, simply saying that an algorithm has linear or
quadratic complexity is not enough to make the point clear.

Unfortunately it needs more words (and practical examples as the one you
posted in the other branch) and a big fat "quadratic is BAD" sign three
meters on each side, but it's worth the effort.
 
B

Brad

You're right, of course, and finally somebody posted the correct,
explicit objection to the first response of mine, which was
over-zealously half-snipped by SG:

"Faster, I don't know (measure it), in place, yes: use the
std::string::insert() method."

My purpose was to push the OP to make all the tests and the reasonings.

But the OP disappeared and the group took circa ten posts to come down
to this, I won't post any bait like this anymore, just to save my time :)

Busy weekend. Thanks for the replies (all of you).

Why do such a thing?

Over simplified, but here is why: A Microsoft NT hash is a string
(encoded as I described (utf-16le)) with MD4 then applied. In short,
that produces an NT hash. So if I wanted to create an NT hash for the
word "easy" and I had a std::string the process would look something
like this:

std::string NTHash = MD4(utf-16le_encode(std::string));

My initial post confused the issue by using the term "unicode" in a
variable name (although this is a form of unicode (utf-16le)
encoding). And no, by default wstring doesn't magically do this.

--------------------------------------

insert works fine for in place. Faster than my for loop? Does not seem
so. I'm not sure it needs to be any faster. I was just wondering.
Maybe my approach is slow. Obviously, the encoding is an additional
step to perform (plain MD4 on std::string is faster than encode
std::string then MD4... but not a whole lot). BTW, these strings would
never be 10 megs ;)

That's about it. Thanks again.

Brad
 
F

Francesco S. Carta

Busy weekend. Thanks for the replies (all of you).

Why do such a thing?

Over simplified, but here is why: A Microsoft NT hash is a string
(encoded as I described (utf-16le)) with MD4 then applied. In short,
that produces an NT hash. So if I wanted to create an NT hash for the
word "easy" and I had a std::string the process would look something
like this:

std::string NTHash = MD4(utf-16le_encode(std::string));

My initial post confused the issue by using the term "unicode" in a
variable name (although this is a form of unicode (utf-16le)
encoding). And no, by default wstring doesn't magically do this.

--------------------------------------

insert works fine for in place. Faster than my for loop? Does not seem
so. I'm not sure it needs to be any faster. I was just wondering.
Maybe my approach is slow. Obviously, the encoding is an additional
step to perform (plain MD4 on std::string is faster than encode
std::string then MD4... but not a whole lot). BTW, these strings would
never be 10 megs ;)

That's about it. Thanks again.

You're welcome.

Your initial implementation is not slow, it's pretty fine, my only
(serious) suggestion about it is to use something like the second
implementation I presented, which takes advantage of std::string::swap()
to avoid an unneeded additional copy from the working string to the
original one and optimizes the allocation by creating an
appropriately-sized string beforehand.

Completely forget the insert() method for containers such as std::vector
and std::string - because it leads to very bad performances - but
remember it for containers like std::list - where it is good.
 
A

Alf P. Steinbach /Usenet

* Brad, on 07.09.2010 14:45:
Busy weekend. Thanks for the replies (all of you).

Why do such a thing?

Over simplified, but here is why: A Microsoft NT hash is a string
(encoded as I described (utf-16le)) with MD4 then applied. In short,
that produces an NT hash. So if I wanted to create an NT hash for the
word "easy" and I had a std::string the process would look something
like this:

std::string NTHash = MD4(utf-16le_encode(std::string));

My initial post confused the issue by using the term "unicode" in a
variable name (although this is a form of unicode (utf-16le)
encoding). And no, by default wstring doesn't magically do this.

OK, with more information more can be said.

First, in Windows, conversion to wstring like

string s = "blah blah";
wstring u( s.begin(), s.end() )

produces exactly the byte sequence that you're laboriously creating.

You're right that no magic is involved. However, no magic is necessary.

That said, second, Windows ANSI Western is a superset, not a subset, of Latin 1,
and so this zero-insertion procedure does not in general produce UTF-16.

E.g., if s contains a '€' Euro sign you won't get proper UTF-16. If you need
proper UTF-16 you should use one of the conversion functions available in the
standard library, or alternatively in the Windows API.

Third, I've already given this advice and it gets tiresome repeating it.


Cheers & hth.,

- Alf
 
B

Brad

OK, with more information more can be said.

First, in Windows, conversion to wstring like

   string  s = "blah blah";
   wstring u( s.begin(), s.end() )

produces exactly the byte sequence that you're laboriously creating.


I'm not programming on Windows or using with Windows at all. Just
because I have to deal with NT hashes, doesn't mean I'm using Windows
to deal with them :)

Thanks,

Brad
 
S

Stuart Golodetz

Francesco said:
Indeed. And since it seems that most of those programmers has no idea of
what the O() notation is, simply saying that an algorithm has linear or
quadratic complexity is not enough to make the point clear.

Unfortunately it needs more words (and practical examples as the one you
posted in the other branch) and a big fat "quadratic is BAD" sign three
meters on each side, but it's worth the effort.

You're oversimplifying :)

It's not that quadratic algorithms per se are necessarily bad -- for
some problems (matrix multiplication, for example), a quadratic solution
may in fact be optimal. The point is that you want to avoid implementing
solutions that are inadequate for the problem at hand (whatever their
actual complexity).

Regards,
Stu
 
F

Francesco S. Carta

You're oversimplifying :)

It's not that quadratic algorithms per se are necessarily bad -- for
some problems (matrix multiplication, for example), a quadratic solution
may in fact be optimal. The point is that you want to avoid implementing
solutions that are inadequate for the problem at hand (whatever their
actual complexity).

Indeed, I was unconsciously implying the problem (and the context) at
hand, where anything more than linear complexity would be sub-optimal,
thank you for the refinement.

Of course, an algorithm solving the TSP with quadratic complexity would
be very very good - no idea if something like that is even conceivable,
I'm not a mathematician, and for that very reason, I must take your word
about quadratic being optimal in matrix multiplication !-)
 
J

Joshua Maurice

You're oversimplifying :)

It's not that quadratic algorithms per se are necessarily bad -- for
some problems (matrix multiplication, for example), a quadratic solution
may in fact be optimal. The point is that you want to avoid implementing
solutions that are inadequate for the problem at hand (whatever their
actual complexity).

I assume you meant to say either that

1- O(n^2) is an optimal bound on the number of cell additions, for
doing a matrix add of two two-dimensional matrices, both of size (n,
n). In this case, you are correct.

2- O(n^3) is an optimal bound on the number of cell multiplications,
for doing a matrix multiply of two two-dimensional matrices, both of
size (n, n). Even assuming this interpretation, you are incorrect. You
can do better than O(n^3), though it's not the easiest thing to
program:
http://en.wikipedia.org/wiki/Matrix_multiplication#Algorithms_for_efficient_matrix_multiplication
 
S

Stuart Golodetz

Joshua said:
I assume you meant to say either that

1- O(n^2) is an optimal bound on the number of cell additions, for
doing a matrix add of two two-dimensional matrices, both of size (n,
n). In this case, you are correct.

2- O(n^3) is an optimal bound on the number of cell multiplications,
for doing a matrix multiply of two two-dimensional matrices, both of
size (n, n). Even assuming this interpretation, you are incorrect. You
can do better than O(n^3), though it's not the easiest thing to
program:
http://en.wikipedia.org/wiki/Matrix_multiplication#Algorithms_for_efficient_matrix_multiplication

What I meant by that was that matrix multiplication of two n x n
matrices is Omega(n^2) -- i.e. you can't do better than a solution of
quadratic complexity (since the output matrix depends on all the
elements of the two input matrices, you have to at least read all of
them, which instantly makes it quadratic). Thus a quadratic solution, if
one were possible, would be optimal. Actually, I'm not sure that there
isn't a tighter lower bound than that for matrix multiplication, in
which case a quadratic solution wouldn't actually be possible -- but I
was using the example to explain what I meant (I think I failed in that,
incidentally :))

What I wasn't saying was anything about the Big-O upper bound for either
matrix addition or multiplication. I'm aware that you can beat O(n^3)
for matrix multiplication -- e.g. Strassen's algorithm is O(n^2.807).
I've never actually sat down and implemented it, though, because I've
never personally needed it.

Cheers!
Stu
 
J

James Kanze

* Brad, on 07.09.2010 14:45:

[...]
First, in Windows, conversion to wstring like
string s = "blah blah";
wstring u( s.begin(), s.end() )
produces exactly the byte sequence that you're laboriously creating.

Only if you compile with the /J option. By default, char is
signed, and accented characters will result in incorrect
Unicode. (You could use boost::transform_iterator to convert
the char to unsigned char.)

This isn't a Windows issue: the standard requires the above to
work, *and* defines the semantics. (Semantics which are, in
general, useless.)
 
S

Stuart Golodetz

Francesco said:
Indeed, I was unconsciously implying the problem (and the context) at
hand, where anything more than linear complexity would be sub-optimal,
thank you for the refinement.

Indeed -- for what it's worth, I only meant to nitpick you in a friendly
way, hence the :)

Incidentally, I think it's worth observing that the algorithm with the
optimal Big-O complexity may not necessarily always be the optimal
solution to your problem. For that matter, the algorithm with the
optimal *actual* complexity (i.e. including the constant factors) may
not necessarily always be the optimal solution to your problem -- if
it's so damn hard to implement that you could have solved the problem
with the worse algorithm by the time you implemented the better one,
then you should implement the worse one and feel no guilt about it
afterwards. YMMV.
Of course, an algorithm solving the TSP with quadratic complexity would
be very very good - no idea if something like that is even conceivable,
I'm not a mathematician, and for that very reason, I must take your word
about quadratic being optimal in matrix multiplication !-)

If you came up with a quadratic complexity algorithm for TSP (which is
NP-hard), wouldn't you have just come up with a constructive proof that
P = NP? In which case that would indeed be "very very good" :) I think
you'd probably make yourself quite a bit of money... I guess it's
conceivably possible, but unlikely in practice. (As an article of
personal faith, I strongly suspect P != NP, but maybe one day we'll know
for sure.)

In terms of matrix multiplication being at least quadratic, consider
that you have to at least read all the elements of both input matrices.
I'm not a mathematician either, btw :)

Cheers,
Stu
 
S

Stuart Golodetz

Joshua said:
Sorry. I was trying to ferret out exactly what you meant in a nice
way. :)

No offence taken :) I should have been more clear.
Indeed. My favorite way of looking at this is choice trees. I'm
fudging the formalism a little bit now, but we know that the output
matrix depends on every cell in the input matrices, so we know that
the code has to at least read each input cell. Given the instruction
set can only read a fixed finite number of cells in an instruction
(naively only 1 or 2), then we know that O(n^2) instructions are
required to do the matrix multiplication. The bound of O(n^2) is thus
a lower bound, though we don't know if any algorithm exists which can
do it in that bound from this proof of choice trees.

I'm not totally sure what the choice tree stuff is in this context
actually -- could you elaborate?

For what it's worth, btw, I think you mean Omega(n^2) and not O(n^2) --
since as you (correctly) state it's a lower bound:

http://en.wikipedia.org/wiki/Big_O_notation#Family_of_Bachmann.E2.80.93Landau_notations

But that's nitpicking on my part (and I'm far from being an expert on
complexity analysis, so I should perhaps refrain from it!).
Also, as mentioned in the wiki link, the best we've been able to do
(for the given instruction set) is about n ^ ((log base 2) of (7)).

Apparently there are asymptotically better algorithms that have been
discovered, e.g.

http://en.wikipedia.org/wiki/Coppersmith–Winograd_algorithm

However, this one (at least) isn't used in practice, because you have to
have really large matrices to make it worthwhile, and current hardware
can't handle matrices of the size required.

Cheers,
Stu
 
J

Joshua Maurice

No offence taken :) I should have been more clear.



I'm not totally sure what the choice tree stuff is in this context
actually -- could you elaborate?

The best example I can think of is for proving lower bounds on sorting
algorithms. Let's take an array of N elements. For arbitrary elements,
we know that any permutation of the input is possibly the correct
output. If we model our program as being able to "choose" a subset of
the remaining possibles at every step, then we end up with our program
being modeled by a binary tree, with time going from the root to the
leaves, and each leaf representing a single possible output
permutation. The middle nodes represent subsets of size bigger than 1,
but less than the whole. Each step of the algorithm removes some of
the possible answers from the answer space until only the correct
answer remains.

It's been a while, so I'm slightly fuzzy on the exact formalism, so I
know I'm explaining it badly. I /think/ that you can say, for unknown
input, you can calculate the size of the answer space, (size X), and
for "standard instruction sets", each "step" can only take one branch
in the binary decision tree, so the minimum number of steps depends on
the height of a balanced binary tree with X leafs.

I'm not sure it exactly applied to the reasoning I attempted to use. I
think I used something analogous. I know that the algorithm must at
least read every single input cell, each read takes O(1) cycle, and
there are O(N^2) cycles, so the algorithm running time in cycles must
be at least O(N^2).
For what it's worth, btw, I think you mean Omega(n^2) and not O(n^2) --
since as you (correctly) state it's a lower bound:

http://en.wikipedia.org/wiki/Big_O_notation#Family_of_Bachmann.E2.80....

But that's nitpicking on my part (and I'm far from being an expert on
complexity analysis, so I should perhaps refrain from it!).

I was being somewhat informal, and I'm not really quite sure myself. I
know what it means for a function to be Big O and Big Theta, but I'm
not sure of the exact standard commonly used. Let me take a stab at
something a little more formal:

For an instruction set in which you can read at most a fixed finite
number of matrix cells per "cycle", any correct matrix multiplication
implementation for two N by N matrices must have its running time
describable as a function f : input size -> number of cycles, where f
is little-o (N^2).
Apparently there are asymptotically better algorithms that have been
discovered, e.g.

http://en.wikipedia.org/wiki/Coppersmith–Winograd_algorithm

However, this one (at least) isn't used in practice, because you have to
have really large matrices to make it worthwhile, and current hardware
can't handle matrices of the size required.

Mmm. I was unaware. I was just going off what I read off the matrix
multiply wiki page, and we all know how reliable that is.
 
S

Stuart Golodetz

Joshua said:
The best example I can think of is for proving lower bounds on sorting
algorithms. Let's take an array of N elements. For arbitrary elements,
we know that any permutation of the input is possibly the correct
output. If we model our program as being able to "choose" a subset of
the remaining possibles at every step, then we end up with our program
being modeled by a binary tree, with time going from the root to the
leaves, and each leaf representing a single possible output
permutation. The middle nodes represent subsets of size bigger than 1,
but less than the whole. Each step of the algorithm removes some of
the possible answers from the answer space until only the correct
answer remains.

It's been a while, so I'm slightly fuzzy on the exact formalism, so I
know I'm explaining it badly. I /think/ that you can say, for unknown
input, you can calculate the size of the answer space, (size X), and
for "standard instruction sets", each "step" can only take one branch
in the binary decision tree, so the minimum number of steps depends on
the height of a balanced binary tree with X leafs.

Thanks, I think I sort of see what you're getting at -- would need to
look at it in more detail to properly understand it though I think
(perhaps when it's not 1am :)) I should add that the explanation itself
isn't the problem so much as the ability of my tired brain to understand it!
I'm not sure it exactly applied to the reasoning I attempted to use. I
think I used something analogous. I know that the algorithm must at
least read every single input cell, each read takes O(1) cycle, and
there are O(N^2) cycles, so the algorithm running time in cycles must
be at least O(N^2).


I was being somewhat informal, and I'm not really quite sure myself. I
know what it means for a function to be Big O and Big Theta, but I'm
not sure of the exact standard commonly used. Let me take a stab at
something a little more formal:

For an instruction set in which you can read at most a fixed finite
number of matrix cells per "cycle", any correct matrix multiplication
implementation for two N by N matrices must have its running time
describable as a function f : input size -> number of cycles, where f
is little-o (N^2).

To the best of my knowledge, little-o (as I've seen it used) is still an
upper bound -- specifically, it's a strict upper bound, whereas Big-O is
a weak one. For instance, f is O(n^2) means that f is no worse than
quadratic, whereas f is o(n^2) means that f is better than quadratic.
For lower bounds, Omega(n^2) means that f is no better than quadratic,
whereas omega(n^2) means that f is worse than quadratic. In this case,
it's clear that matrix multiplication is Omega(n^2), i.e. no better than
quadratic, but it's not clear at first glance that it's omega(n^2) --
although it may well be.
Mmm. I was unaware. I was just going off what I read off the matrix
multiply wiki page, and we all know how reliable that is.

Likewise :) Just a different wiki page to the one you were reading -- I
guess there should have been a link between the two which someone forgot
to add. Just thought I'd give you the link in case you were interested(!)

Best,
Stu
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,048
Latest member
verona

Latest Threads

Top