Conversion of a number from string to vector<int>


Anonymous

Hello,

Does anyone want to write an efficient function for converting a
non-negative arbitrary-precision number in base 10 from a string to a
std::vector<int>? The vector must represent the number in base B, where
B is an arbitrary int, and each element in the vector holds one digit of
the number in base B. The most significant digit must be at the top (the
end) of the vector. The code must be portable and must not rely on types
larger than int. Only the standard library is allowed.

For example:

std::vector<int> v = f("253", 127);

would give (since 253 = 1 * 127 + 126)

v[0] = 126
v[1] = 1


thanks.
 

Paul

There is a function called atoi that may help you.
http://www.cplusplus.com/reference/clibrary/cstdlib/atoi/

HTH
 

Ian Collins


Homework?

Have you looked at strtol and friends?
 

Juha Nieminen

Paul said:
There is a function called atoi that may help you.
http://www.cplusplus.com/reference/clibrary/cstdlib/atoi/

Your incompetence and comprehension capabilities never cease to amuse.

Care to actually give us actual code on how atoi() can be used for this
task? (Hint: It can't.)
 

Juha Nieminen

Maybe it's not the most efficient solution that could be, but it should
be efficient enough, as well as easy: Interpret the last character in the
string and convert it to its equivalent value between 0 and 9 (IIRC the
standard even guarantees that the characters '0' through '9' will always
be contiguous, so you can do a simple "character - '0'") and assign it to
a variable. Then take the second-to-last character and add it likewise to
the variable, but multiplied by 10, then the third-to-last, multiplied by
100 and so on. After each such addition check if the value in the variable
exceeds B, and if so, add the variable value modulo B to the vector, divide
the variable by B, and then start over (adding the next character, then
the next one multiplied by 10 and so on).

(Disclaimer: I haven't tested the algorithm in any way.)
 

Paul

Juha Nieminen said:
Your incompetence and comprehension capabilities never cease to amuse.

Care to actually give us actual code on how atoi() can be used for this
task? (Hint: It can't.)
He seems to be trying to convert a string to an int; that is what atoi does.
What is incompetent about trying to provide a helpful suggestion?
 

Kai-Uwe Bux

Juha said:
Maybe it's not the most efficient solution that could be, but it should
be efficient enough, as well as easy: Interpret the last character in the
string and convert it to its equivalent value between 0 and 9 (IIRC the
standard even guarantees that the characters '0' through '9' will always
be contiguous, so you can do a simple "character - '0'") and assign it to
a variable. Then take the second-to-last character and add it likewise to
the variable, but multiplied by 10, then the third-to-last, multiplied by
100 and so on. After each such addition check if the value in the variable
exceeds B, and if so, add the variable value modulo B to the vector,
divide the variable by B, and then start over (adding the next character,
then the next one multiplied by 10 and so on).

(Disclaimer: I haven't tested the algorithm in any way.)

Consider going from base 10 to base 3:

1 -> 1
10 -> 101
100 -> 10201
...

As you can see, powers of 10 always end in 1 (since 10 mod 3 = 1). That implies:

1 -> ..1
11 -> ..2
111 -> ..0
1111 -> ..1
11111 -> ..2
...

I.e.: the last digit after conversion depends on _all_ digits of the input.
So, the step ".. add the variable value modulo B to the vector .." cannot
just mean to append that value mod B and move on to the next entry in the
vector.
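The observation above can be checked directly: the last base-B digit is u mod B, and computing it with Horner's rule reduced modulo B consumes every decimal digit of u. A minimal sketch, assuming the string holds only '0'..'9' and 2 <= B <= (INT_MAX - 9) / 10 so the intermediate value fits in an int; `last_digit` is a hypothetical name, not code from the thread:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the least significant base-B digit of the decimal
// string s, i.e. u mod B, via Horner's rule reduced mod B at every step.
// Assumes s contains only '0'..'9' and 2 <= B <= (INT_MAX - 9) / 10.
int last_digit(const std::string& s, int B) {
    assert(B >= 2);
    int r = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i)
        r = (r * 10 + (s[i] - '0')) % B;  // every input digit affects r
    return r;
}
```

For example, last_digit("1", 3), last_digit("11", 3) and last_digit("111", 3) return 1, 2 and 0, matching the table above.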

The Art of Computer Programming Vol 2, Chapter 4.4 by D.E. Knuth deals with
radix conversion; and your proposed method is very close to Method 1a. For
the problem at hand, it can be specialized as follows:

Given u = (...cba) in base 10, you compute U = (...xyz) in base B by

z = u mod B
y = floor(u/B) mod B
x = floor( floor(u/B) / B ) mod B

The computations on the RHS need to be carried out in multi-precision
arithmetic. This can be done in base 10 arithmetic as u is given in base 10.
This requires writing B in base 10, which is much simpler as B fits in an
int.


Best,

Kai-Uwe Bux
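The specialization of Method 1a described above (z = u mod B, y = floor(u/B) mod B, ...) can be sketched as repeated long division of the decimal digit string by B, collecting the remainders. This is a minimal sketch, not code from the thread: the name `f` follows the example in the question, and instead of writing B in base 10 it divides by the int B directly (short division), which works under the assumption 2 <= B <= (INT_MAX - 9) / 10, so every intermediate value fits in an int.

```cpp
#include <cassert>
#include <climits>
#include <stdexcept>
#include <string>
#include <vector>

// Sketch: convert a non-negative decimal string to base B by repeated long
// division (u mod B, floor(u/B) mod B, ...), using base-10 arithmetic on
// the digit string itself. Result: least significant digit first, so the
// most significant digit ends up at the top (back) of the vector.
// Assumption: 2 <= B <= (INT_MAX - 9) / 10, so rem * 10 + digit fits in int.
std::vector<int> f(const std::string& s, int B) {
    if (B < 2 || B > (INT_MAX - 9) / 10)
        throw std::invalid_argument("base outside the supported range");
    std::vector<int> dec;                     // decimal digits, most significant first
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] < '0' || s[i] > '9')
            throw std::invalid_argument("not a decimal digit");
        dec.push_back(s[i] - '0');
    }
    std::vector<int> out;
    while (!dec.empty()) {
        std::vector<int> quot;                // dec / B, without leading zeros
        int rem = 0;
        for (std::vector<int>::size_type i = 0; i < dec.size(); ++i) {
            rem = rem * 10 + dec[i];          // rem < B before this, so rem < 10 * B
            const int q = rem / B;            // hence q is a single decimal digit
            assert(q <= 9);
            if (!quot.empty() || q != 0)
                quot.push_back(q);
            rem %= B;
        }
        out.push_back(rem);                   // next base-B digit: (current u) mod B
        dec.swap(quot);                       // continue with floor(u / B)
    }
    return out;
}
```

With the question's example, f("253", 127) returns {126, 1}: the first division gives quotient 1 and remainder 126, the second gives quotient 0 and remainder 1. Each pass is a full long division over the remaining digits, so the cost grows roughly quadratically with the length of the input.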
 

Anonymous

Paul wrote:
He seems to be trying to convert a string to an int; this is what atoi does.
What is incompetent about trying to provide a helpful suggestion?

Basically, I am improving a constructor for big integers passed as
strings by the user. The class provides basic math operations in
arbitrary precision, and everything else is done. When I implemented the
constructor initially, every char of the string was a digit in a
vector<char>. This was not really efficient: Factorial("1000") required
about 70s on my machine. So I decided to "group" more chars into ints,
(almost) as many chars as fit, that is, by building a vector<int>
from the given number. This is actually 7 times faster than before, but
is still not perfect, since not all the possible bits of the integers
are used. The reason is that each digit in the vector<int> is in base B,
where B is a power of 10, not of two:

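#include <cstddef>
#include <cstring>
#include <limits>
#include <vector>

class BIGINT {
public:
    // Bitset must be signed (for diff. operation).
    typedef signed int Bitset;

    // vector is 25 times faster than list and twice as fast as deque.
    typedef std::vector<Bitset> Sequence;

    // -1 to avoid overflows in sum.
    static const int DGTS = std::numeric_limits<Bitset>::digits10 - 1;

    // Assumed member: groups of DGTS decimal digits, least significant first.
    Sequence module;

    BIGINT(const char* p = 0) {
        // ...
        std::size_t l = p ? std::strlen(p) : 0;  // strlen(0) is undefined
        for (std::size_t i = 0; i < l;) {
            Bitset x = 0, f = 1;
            // Read up to DGTS chars right to left; f is the place value.
            for (int j = 0; j < DGTS && i < l; j++, i++, f *= 10)
                x += (p[l - i - 1] - '0') * f;
            module.push_back(x);
        }
        // ...
    }
};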
 

Anonymous

Juha Nieminen wrote:
Maybe it's not the most efficient solution that could be, but it should
be efficient enough, as well as easy: Interpret the last character in the
string and convert it to its equivalent value between 0 and 9 (IIRC the
standard even guarantees that the characters '0' through '9' will always
be contiguous, so you can do a simple "character - '0'") and assign it to
a variable. Then take the second-to-last character and add it likewise to
the variable, but multiplied by 10, then the third-to-last, multiplied by
100 and so on. After each such addition check if the value in the variable
exceeds B, and if so, add the variable value modulo B to the vector, divide
the variable by B, and then start over (adding the next character, then
the next one multiplied by 10 and so on).

(Disclaimer: I haven't tested the algorithm in any way.)

It's basically what I had done initially (see my previous thread). But I
would prefer a base B that is a power of two, not of 10, to take advantage
of all the available bits of the integer.

thanks
 

Paul

TBH I am still not 100% sure about your problem. I don't think this will
solve your problem, but is it the sort of thing you mean, only using bigger
integers?

#include <iostream>
#include <string>
#include <vector>
#include <math.h>

std::vector<unsigned> numbers(std::string str, double r){
    std::vector<unsigned> v;
    std::string::iterator it;
    unsigned int temp=0;
    int power=str.length()-1;
    double dec=10;

    // Accumulate the decimal value (overflows for long strings).
    for (it=str.begin(); it<str.end(); it++, --power){
        temp += (*it&15)*pow(dec,power);  // ASCII '0'..'9' & 15 gives 0..9
    }
    // Peel off base-r digits, least significant first.
    while(temp){
        v.push_back(temp%(int)r);
        temp = temp/r;
    }
    return v;
}

int main(){
    std::string str = "253";
    double radix = 127;
    std::vector<unsigned> v = numbers(str, radix);

    for(std::vector<unsigned>::size_type i=0; i< v.size(); i++){std::cout<< v[i]<<std::endl;}
}


But instead of taking a number like "253" you want to handle massive
integers which my temp variable wouldn't have the capacity for?
 

Anonymous

Paul wrote:
But instead of taking a number like "253" you want to handle massive
integers which my temp variable wouldn't have the capacity for?

Yes, in your example temp might overflow with big enough integers.
 
P

Paul

Pete Becker said:
Use an unsigned int, and represent the values in base UINT_MAX + 1. That
uses all the bits.

To convert a text string, just use the obvious <g> approach:

set the current value to 0
set the current position in the string to the leftmost character
while the character at the current position is in '0'..'9'
    multiply the current value by 10
    add the value represented by the digit to the current value
    move the current position one place to the right

Try it with pencil and paper a few times to get the feel of it.

--

OK, say the string is something ridiculously large, like
345678912345678546789435678123432567664334343457788933333331

I make that 60 chars long. So how do we calculate the first UINT value?
Normally we would need to calculate:
(3*10^59) % (UINT_MAX+1)

The above can be calculated by doing a decimal shift on the massive number
and then multiplying the result: if we shift the massive number 50 places
to the right, we need to multiply the result by the amount shifted, 10^50.
For example:

2000 / 8 = 250;
2 / 8 = 0.25, * 10^3 = 250 // shifted only 3 places

But the problem is that we cannot multiply UINT_MAX * 10^50.
We lose precision if we divide away all our integers, because we have yet to
calculate the remainder, so we need to keep the MAX_RADIX small, yet large
enough to hold a massive integer without needing a million vector elements.
Then we still have long long for doing integer arithmetic without losing too
much precision.
That would require a vector of size 26 to store the massive 60-digit
integer.
A 64-bit int holds about 19 decimal digits, so any string with more digits
than that will need something like the decimal shift algorithm I
mentioned. But if it's to be portable, you cannot even count on a 64-bit
int.

Maybe I am missing some other way of doing this. I am not sure about your
explanation: where you say
"set the current value to 0"
you lose me. The current value of what?
Can you maybe post a simple example?
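A simple example of Pete Becker's approach, as a minimal sketch: here "the current value" is the whole big number being built, held as a vector of unsigned int limbs in base UINT_MAX + 1 (that is, 2^W with W = std::numeric_limits<unsigned>::digits), least significant limb first. `mul10_add` and `to_limbs` are hypothetical names. To respect the "no type larger than int" constraint, each limb is multiplied by 10 in two half-width pieces, so every bit of every limb is used without a wider intermediate type:

```cpp
#include <cassert>
#include <limits>
#include <string>
#include <vector>

// Hypothetical helper: limbs = limbs * 10 + digit, where limbs holds a
// number in base 2^W (W = bits in unsigned int), least significant first.
// No type wider than unsigned int is needed: each limb is processed as two
// half-width pieces, so "piece * 10 + carry" always fits.
void mul10_add(std::vector<unsigned>& limbs, unsigned digit) {
    assert(digit <= 9);
    const int W = std::numeric_limits<unsigned>::digits;
    const int H = W / 2;                      // width of the low half
    const unsigned mask = (1u << H) - 1u;
    unsigned carry = digit;                   // stays in 0..9
    for (std::vector<unsigned>::size_type i = 0; i < limbs.size(); ++i) {
        const unsigned x = limbs[i];
        const unsigned t = (x & mask) * 10u + carry;  // low half * 10 + carry
        const unsigned u = (x >> H) * 10u + (t >> H); // high half * 10 + carry from t
        limbs[i] = (u << H) | (t & mask);     // low W bits of x * 10 + carry
        carry = u >> (W - H);                 // bits that spill into the next limb
    }
    if (carry != 0)
        limbs.push_back(carry);
}

// Hypothetical name for Pete's loop: start at 0 and, for each leading
// decimal digit, multiply the current value by 10 and add the digit.
std::vector<unsigned> to_limbs(const std::string& s) {
    std::vector<unsigned> limbs(1, 0u);       // "set the current value to 0"
    for (std::string::size_type i = 0;
         i < s.size() && s[i] >= '0' && s[i] <= '9'; ++i)
        mul10_add(limbs, static_cast<unsigned>(s[i] - '0'));
    return limbs;
}
```

On a platform with a 32-bit unsigned int, to_limbs("4294967296") yields {0, 1}, i.e. 0 + 1 * 2^32, so no bits of a limb are wasted on a power-of-10 base.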
 

Anonymous

Pete Becker wrote:
On 2011-06-19 07:13:27 -0400, Anonymous said:
Use an unsigned int, and represent the values in base UINT_MAX + 1. That
uses all the bits.

To convert a text string, just use the obvious <g> approach:

set the current value to 0
set the current position in the string to the leftmost character
while the character at the current position is in '0'..'9'
    multiply the current value by 10
    add the value represented by the digit to the current value
    move the current position one place to the right

Try it with pencil and paper a few times to get the feel of it.

I don't think the algorithm you are describing can use all the bits
available. As I said in my previous thread, the algorithm you are
talking about, which is similar to the one I wrote initially, can
represent the number in base B, where the int B is a power of 10. Since the
base is a power of 10, it cannot take advantage of all the available bits
in the integer. In other words:

10^std::numeric_limits<unsigned int>::digits10 -1 <
2^std::numeric_limits<unsigned int>::digits - 1,

on my architecture:
10^9-1 < 2^32-1,
999999999 < 4294967295,

which is about two bits lost.
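The "about two bits" figure can be computed for any platform: a limb in base 10^digits10 carries digits10 * log2(10) bits of information, versus the full `digits` bits of the type. A small sketch; `bits_lost` is a hypothetical name:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Bits per unsigned int wasted by storing base 10^digits10 instead of
// base 2^digits: digits - digits10 * log2(10).
double bits_lost() {
    assert(std::numeric_limits<unsigned int>::radix == 2);
    const int bits = std::numeric_limits<unsigned int>::digits;
    const int dec = std::numeric_limits<unsigned int>::digits10;
    return bits - dec * std::log2(10.0);
}
```

With a 32-bit unsigned int this is 32 - 9 * 3.3219... which is roughly 2.1 bits per limb.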

Below is the actual algorithm again:

#include <cstddef>
#include <limits>
#include <string>
#include <vector>

typedef unsigned long Bitset; // the larger sizeof(Bitset), the faster the math ops
static const int DGTS = std::numeric_limits<Bitset>::digits10;

// Splits a decimal string into groups of DGTS digits (base 10^DGTS),
// least significant group first.
std::vector<Bitset> f(const char* p = 0) {
    std::vector<Bitset> module;
    std::size_t l = p ? std::string(p).length() : 0; // std::string(0) is undefined
    for (std::size_t i = 0; i < l;) {
        Bitset x = 0, f = 1;
        for (int j = 0; j < DGTS && i < l; j++, i++, f *= 10)
            x += (p[l - i - 1] - '0') * f;
        module.push_back(x);
    }
    return module;
}
 

Paul

You do not use Bitset to its full capacity by limiting it to digits10. For
example, an 8-bit char can represent a value range of 0..255, but limited
by digits10 it can only represent 0..99.
Imagine your Bitset was a byte, for easy counting:
if you get two '1' chars your byte is full with its maximum int value of
11 (restricted by digits10). You would have used less than 5% of its
potential 255 value. You could have squeezed 4 or 5 chars into that byte
instead of two.
 

Paul

Err, actually you couldn't get any more than 3 chars if you are not
converting it to a higher base..
 

Juha Nieminen

Paul said:
He seems to be trying to convert a string to an int,

No, he isn't. He is trying to convert a string containing a very large
(ASCII representation of an) integer into a set of ints, which is something
atoi() won't do (nor can you easily use it to implement the task in
question). If you tried to use atoi() for this and the string represents
an integer larger than fits in an int, you would not get a correct answer
(atoi's behavior is undefined when the value cannot be represented).
What is incompetent about trying to provide a helpful suggestion?

Because it was not helpful. atoi() cannot be used to solve the problem.
If you tried to give us some actual code you would see it yourself.
 

Juha Nieminen

Paul said:
double dec=10;

for (it=str.begin(); it<str.end(); it++, --power){
temp += (*it&15)*pow(dec,power);
}

You are using doubles to handle integers? Have you ever programmed in
a C family of languages? Do you understand the inherent rounding problems
associated with floating point values? (This is especially egregious since
the problem is solvable with integers, and the solution isn't any more
complicated.)
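For the small case Paul's code handles, an integer-only version is no more complicated: Horner's rule (value = value * 10 + digit) replaces pow(), with an explicit overflow check instead of silent wraparound or rounding. A minimal sketch; `parse_unsigned` and `to_base` are hypothetical names, and like Paul's code it only works while the whole number fits in one unsigned int:

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Decimal string -> unsigned, integers only; throws instead of overflowing.
unsigned parse_unsigned(const std::string& s) {
    const unsigned max = std::numeric_limits<unsigned>::max();
    unsigned value = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] < '0' || s[i] > '9')
            throw std::invalid_argument("not a decimal digit");
        const unsigned d = static_cast<unsigned>(s[i] - '0');
        if (value > (max - d) / 10)           // value * 10 + d would overflow
            throw std::overflow_error("number does not fit in unsigned int");
        value = value * 10 + d;
    }
    return value;
}

// Base-B digits of value, least significant first (as in Paul's loop),
// but returning {0} rather than an empty vector for zero.
std::vector<unsigned> to_base(unsigned value, unsigned B) {
    assert(B >= 2);
    std::vector<unsigned> v;
    do {
        v.push_back(value % B);
        value /= B;
    } while (value != 0);
    return v;
}
```

For example, to_base(parse_unsigned("253"), 127) returns {126, 1}, while a string too long for an unsigned int raises std::overflow_error instead of producing a wrong value.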
 

Juha Nieminen

Juha Nieminen said:
Maybe it's not the most efficient solution that could be, but it should
be efficient enough, as well as easy: Interpret the last character in the
string and convert it to its equivalent value between 0 and 9 (IIRC the
standard even guarantees that the characters '0' through '9' will always
be contiguous, so you can do a simple "character - '0'") and assign it to
a variable. Then take the second-to-last character and add it likewise to
the variable, but multiplied by 10, then the third-to-last, multiplied by
100 and so on. After each such addition check if the value in the variable
exceeds B, and if so, add the variable value modulo B to the vector, divide
the variable by B, and then start over (adding the next character, then
the next one multiplied by 10 and so on).

Btw, this doesn't work if adding the next digit to the value would overflow
the variable, so it can't be used if what you want is to use all the bits in
an int as the modulo.
 
