Negating a List Of Numbers

E

Eduardo Bezerra

Hi,

I'm looking for an efficient way to create the oposite of a
list of numbers.

For example, suppose I have this list of numbers:

100
200
300

I want to generate a list of all numbers between 1 and 999 that
exclude the numbers in this list.

Any suggestions are welcome.

Regards,
Eduardo
 
V

velthuijsen

Store the list of numbers in a vector, build a predicate function which
contains the list of numbers and returns true if something is compared
to one of these numbers, create an empty vector , then perform
remove_copy_if on the vector with the numbers with as target the empty
vector. Your (once) empty vector should now contain all the values not
on the list.
 
A

Andrew Koenig

I'm looking for an efficient way to create the oposite of a
list of numbers.

For example, suppose I have this list of numbers:

100
200
300

I want to generate a list of all numbers between 1 and 999 that
exclude the numbers in this list.

I think it's impossible to do this without executing a number of operations
proportional to the domain (here [1,999]). My reasoning is that if n is the
size of the domain and m is the number of elements in the list, merely
building the result must take at least n-m operations (because it has that
many elements) and checking the input must take m operations (because you
have to examine every element of the input at least once in order to know
what it is). So you must execute at least O(n) operations.

The question, then, is how close to O(n) you need to be. There is an easy
way to solve the problem in O(n+log(m)) operations: Sort the input, then
write a loop along the following lines:

vector<int>::const_iterator it = input.begin();

for (int i = 1; i <= 999; ++i) {
if (it != input.end() && i == *it)
++it;
else
output.push_back(i);
}

So my question is: Is this solution good enough? If not, what's not good
enough about it and what would you consider good enough?
 
X

Xepo

I think using sets would be the way to go about this. The best way to
go about it, as I see, is this:
------------------------------------
set<int> exclude, result;

//Fill the exclude set with the list of integers you want to exclude

for(int i=0; i<1000; i++)
if (!exclude.count(i))
result.insert(i);
------------------------------------

and then there's the insane way, that actually provides very close to
zero performance benefits (maybe even worse depending on your
compiler), but I feel like posting cause I think it's cool:
------------------------------------
/* You shouldn't *actually* use this code */
class int_it : public iterator<input_iterator_tag, int> {
public:
int i;
int_it(int c):i(c) {};
inline void operator++() { i++; };
inline int operator*() { return i; };
inline bool operator!=(const int_it &second) { return second.i != i; };
};
int main()
{
set<int> exclude, result;
int_it start_i(1), end_i(1000);

//Fill the exclude set with the list of integers you want to exclude

set_difference(start_i, end_i, exclude.begin(), exclude.end(),
inserter<int>(result, result.begin()));
}
------------------------------------

I think the first way is about as efficient as you can get, assuming
you don't know the list of excluded numbers beforehand.

If you do know the numbers beforehand and really care about the
slightest bit of efficiency (and I'm talking slight, assuming you're
not working with multi-megabyte sets), then you could use the programs
which generate a perfect hash function to generate a mathematical
formula to determine if a number should be included or not, but that's
really splitting hairs.
 
M

Mark P

Andrew said:
I'm looking for an efficient way to create the oposite of a
list of numbers.

For example, suppose I have this list of numbers:

100
200
300

I want to generate a list of all numbers between 1 and 999 that
exclude the numbers in this list.


I think it's impossible to do this without executing a number of operations
proportional to the domain (here [1,999]). My reasoning is that if n is the
size of the domain and m is the number of elements in the list, merely
building the result must take at least n-m operations (because it has that
many elements) and checking the input must take m operations (because you
have to examine every element of the input at least once in order to know
what it is). So you must execute at least O(n) operations.

The question, then, is how close to O(n) you need to be. There is an easy
way to solve the problem in O(n+log(m)) operations: Sort the input, then
write a loop along the following lines:

vector<int>::const_iterator it = input.begin();

for (int i = 1; i <= 999; ++i) {
if (it != input.end() && i == *it)
++it;
else
output.push_back(i);
}

So my question is: Is this solution good enough? If not, what's not good
enough about it and what would you consider good enough?

That works but O(n) is easily achievable too. Create a vector with 999
elements and mark each position corresponding to an input value. Then
iterate through the vector and write out the index of each unmarked
vector element.

Interestingly, this same idea (complemented) allows the list of input
values to be sorted in O(n) time. It's left as an exercise for the
reader to explain why this doesn't violate "conventional wisdom" about
the complexity of sorting.

-Mark
 
C

CrayzeeWulf

Andrew said:
So my question is: Is this solution good enough? If not, what's not good
enough about it and what would you consider good enough?

vector<int>::const_iterator it = input.begin() ;
int i = 1 ;
while( it != input.end() )
{
for(; ( i < *it ) && ( i <= 999 ); ++i )
{
output.push_back( i ) ;
}
++it ;
++i ;
}
for(; ( i <= 999 ); ++i )
{
output.push_back( i ) ;
}

Is that O(n) ?
 
M

Mark P

CrayzeeWulf said:
Andrew Koenig wrote:




vector<int>::const_iterator it = input.begin() ;
int i = 1 ;
while( it != input.end() )
{
for(; ( i < *it ) && ( i <= 999 ); ++i )
{
output.push_back( i ) ;
}
++it ;
++i ;
}
for(; ( i <= 999 ); ++i )
{
output.push_back( i ) ;
}

Is that O(n) ?

Yes, but it doesn't work unless the input is sorted.
 
K

Kannan Goundan

Like Mark P said, you can take advantage of the fact that your range is
relatively small and that integers are indexable. If "m" is the size
of the original list of numbers, and "n" is the size of the output
range, you can do it in O(m+n) time and O(n) space.

// read in original list
bool[] exclude = new bool[100];
for (int i = 0; i < 1000; i++) exclude = false;
while (more_numbers()) {
exclude[read_number()] = true;
}

// create new list
for (int i = 0; i < 1000; i++) {
if (!exclude) output.push_back(i)
}

It's basically the same algorithm, but now your "exclude" set is O(1)
for insert/lookup.
 
E

Eduardo Bezerra

Andrew Koenig said:
I'm looking for an efficient way to create the oposite of a
list of numbers.

For example, suppose I have this list of numbers:

100
200
300

I want to generate a list of all numbers between 1 and 999 that
exclude the numbers in this list.

I think it's impossible to do this without executing a number of operations
proportional to the domain (here [1,999]). My reasoning is that if n is the
size of the domain and m is the number of elements in the list, merely
building the result must take at least n-m operations (because it has that
many elements) and checking the input must take m operations (because you
have to examine every element of the input at least once in order to know
what it is). So you must execute at least O(n) operations.

The question, then, is how close to O(n) you need to be. There is an easy
way to solve the problem in O(n+log(m)) operations: Sort the input, then
write a loop along the following lines:

vector<int>::const_iterator it = input.begin();

for (int i = 1; i <= 999; ++i) {
if (it != input.end() && i == *it)
++it;
else
output.push_back(i);
}

So my question is: Is this solution good enough? If not, what's not good
enough about it and what would you consider good enough?

Mr. Koening

Actually I oversimplified my question, so let me refine it. The list is not
a list of integers, in reallity it's a list of strings numbers with a maximun
lenght of 15 characters.

For example:

"526"
"5261"
"526212"
"52621"

I was thinking about a solution based on a tree where any string number
not matching one of the tree element would be out. But I'm having trouble
implementing it. To tell you the truth I'm a little lost in here.

Best regards,
Eduardo
 
M

Mark P

Eduardo said:
Actually I oversimplified my question, so let me refine it. The list is not
a list of integers, in reallity it's a list of strings numbers with a maximun
lenght of 15 characters.

For example:

"526"
"5261"
"526212"
"52621"

So what are you trying to find? All strings of length <= 15 that aren't
in your list? That would be an enormous amount of data so I assume it's
something else. But what?
 
K

Karl Heinz Buchegger

Mark said:
So what are you trying to find? All strings of length <= 15 that aren't
in your list? That would be an enormous amount of data so I assume it's
something else. But what?

My guess is, he is on to 'set operations'.
Given two sets of items: A and B, find the difference A - B
 
E

Eduardo Bezerra

Mark P said:
So what are you trying to find? All strings of length <= 15 that aren't
in your list? That would be an enormous amount of data so I assume it's
something else. But what?

Hello Mark,

I took some time to return to this thread because the requirements
of my problem changed and now it is settled.

I need to generate a list of the numbers that are not an exact match
and also do not start with any of the numbers in a list.

For example, given a list of string numbers such as:

"12"
"5625"
"56254"
"562542"

For the above given list, the resultant "barring" list of string
numbers would be:

"0" "1" "2" "3" "4" "6" "7" "8" "9
"10" "11" "13" "14" "15" "16" "17" "18" "19"
"50" "51" "52" "53" "54" "55" "57" "58" "59"
"560" "561" "563" "564" "565" "566" "567" "568" "569"
"5620" "5621" "5622" "5623" "5624" "5626" "5627" "5628" "5629"

This generation much be very fast because it is for a real time
system.

Any comments, suggestions are much appreciated.

Best regards,
Eduardo
 
K

Karl Heinz Buchegger

Eduardo said:
Hello Mark,

I took some time to return to this thread because the requirements
of my problem changed and now it is settled.

I need to generate a list of the numbers that are not an exact match
and also do not start with any of the numbers in a list.

For example, given a list of string numbers such as:

"12"
"5625"
"56254"
"562542"

For the above given list, the resultant "barring" list of string
numbers would be:

"0" "1" "2" "3" "4" "6" "7" "8" "9
"10" "11" "13" "14" "15" "16" "17" "18" "19"
"50" "51" "52" "53" "54" "55" "57" "58" "59"
"560" "561" "563" "564" "565" "566" "567" "568" "569"
"5620" "5621" "5622" "5623" "5624" "5626" "5627" "5628" "5629"

This generation much be very fast because it is for a real time
system.

But it still is an enormous list!
Earlier you mentioned 15 characters. Is that still true?
So your above resultant list would also include:
"500"
"501" .. "599"
"5000"
"5001" .. "5999"
"50000"
"50001" .. "59999"
"500000"
....
"5000000"
....

That would be a pretty *huge* list!
 
K

Karl Heinz Buchegger

Eduardo said:
Hello Mark,

I took some time to return to this thread because the requirements
of my problem changed and now it is settled.

I need to generate a list of the numbers that are not an exact match
and also do not start with any of the numbers in a list.

For example, given a list of string numbers such as:

"12"
"5625"
"56254"
"562542"

For the above given list, the resultant "barring" list of string
numbers would be:

"0" "1" "2" "3" "4" "6" "7" "8" "9
"10" "11" "13" "14" "15" "16" "17" "18" "19"
"50" "51" "52" "53" "54" "55" "57" "58" "59"
"560" "561" "563" "564" "565" "566" "567" "568" "569"
"5620" "5621" "5622" "5623" "5624" "5626" "5627" "5628" "5629"

That 'list' is not complete, is it?
Eg. Why isn't "20" in the list. According to your description it should!
This generation much be very fast because it is for a real time
system.

But it still is an enormous list!
Earlier you mentioned 15 characters. Is that still true?
So your above resultant list would also include:
"500"
"501" .. "599"
"5000"
"5001" .. "5999"
"50000"
"50001" .. "59999"
"500000"
....
"5000000"
....

That would be a pretty *huge* list!
 
J

Jeff Flinn

Eduardo Bezerra said:
....

I need to generate a list of the numbers that are not an exact match

What do you mean by "generate"?
and also do not start with any of the numbers in a list.

For example, given a list of string numbers such as:

"12"
"5625"
"56254"
"562542"

For the above given list, the resultant "barring" list of string
numbers would be:

"0" "1" "2" "3" "4" "6" "7" "8" "9
"10" "11" "13" "14" "15" "16" "17" "18" "19"
"50" "51" "52" "53" "54" "55" "57" "58" "59"
"560" "561" "563" "564" "565" "566" "567" "568" "569"
"5620" "5621" "5622" "5623" "5624" "5626" "5627" "5628" "5629"

This generation much be very fast because it is for a real time
system.

Leaving aside the issues raised in other responses. How will you use this
information? Are you going to be printing/displaying it? Are you going to be
traversing it from beginning to end? Are you just using it to test if a
value is a member?

Answering these questions would help lead to a specific solution. IMO, the
general approach, is to store the excluded values in a set. Treating them as
delimiters of intervals of values. I'd make the data type an integer(of
appropriate size for 15 decimal digits) and convert to std::string when
needed.

Jeff Flinn
 
E

Eduardo Bezerra

Karl Heinz Buchegger said:
That 'list' is not complete, is it?
Eg. Why isn't "20" in the list. According to your description it should!


But it still is an enormous list!
Earlier you mentioned 15 characters. Is that still true?
So your above resultant list would also include:
"500"
"501" .. "599"
"5000"
"5001" .. "5999"
"50000"
"50001" .. "59999"
"500000"
...
"5000000"
...

That would be a pretty *huge* list!

That 'list' is not complete, is it?
Eg. Why isn't "20" in the list. According to your description it should!

Because there is no perfect match for "20" in the list and no number
in the list starts with (or contains) "20"
 
E

Eduardo Bezerra

Jeff Flinn said:
What do you mean by "generate"?

By "generate" I mean create or build the "barring" list.
Leaving aside the issues raised in other responses. How will you use this
information? Are you going to be printing/displaying it? Are you going to be
traversing it from beginning to end? Are you just using it to test if a
value is a member?

I'll be storing the result in a container, problably a set. It's not just
a test to see if a number is valid in the list, the goal is to build
the "barring" list.
 
E

Eduardo Bezerra

Karl Heinz Buchegger said:
That 'list' is not complete, is it?
Eg. Why isn't "20" in the list. According to your description it should!


But it still is an enormous list!
Earlier you mentioned 15 characters. Is that still true?
So your above resultant list would also include:
"500"
"501" .. "599"
"5000"
"5001" .. "5999"
"50000"
"50001" .. "59999"
"500000"
...
"5000000"
...

That would be a pretty *huge* list!

Continuing the explanation, "20" is not in the "barring" list because
any number starting with "2" is already excluded:

"0" "1" "2" "3" "4" "6" "7" "8" "9
^^^
 
J

Jeff Flinn

Eduardo Bezerra said:
By "generate" I mean create or build the "barring" list.


I'll be storing the result in a container, problably a set. It's not just
a test to see if a number is valid in the list, the goal is to build
the "barring" list.

And how do you plan on using this barring list? My point is that you may not
need to fully construct this list, which is probably antithetical to
real-time constraints. If the final use is to check if something is or is
not valid, you only need a function that provides that facility. If you do
need to present a user with values, probably only need to display a sub
range of values, perhaps implemented as an input iterator.
 
J

Jean-Sebastien Samson

For example, given a list of string numbers such as:
"12"
"5625"
"56254"
"562542"

For the above given list, the resultant "barring" list of string
numbers would be:

"0" "1" "2" "3" "4" "6" "7" "8" "9
"10" "11" "13" "14" "15" "16" "17" "18" "19"
"50" "51" "52" "53" "54" "55" "57" "58" "59"
"560" "561" "563" "564" "565" "566" "567" "568" "569"
"5620" "5621" "5622" "5623" "5624" "5626" "5627" "5628" "5629"

As I understand your problem, and assuming you have an "epsilon" value
representing a void string in the list, you need to find all the strings s
where the substring s(0,length(s)-2) is a prefix in the list and s not a
prefix in the list.

I think you could proceed that way:

1. Build a prefix tree P. (linear in the number of chars in your strings -
"epsilon" should be your root)
Each node N of P is tagged with the char it represents.
2. for each node N in P (width-first traversal)
2.1 for each char C in '0'..'9'
2.1.1 if N has a child tagged C then continue;
2.1.2 add the string made of all node from root to N + C to your list

The generation of the strings is linear in the size of the output which
might actually be exponential. I think you cannot do faster as you are bound
to the size of the output.

For instance, with the list 12, 5625, 56254, 562542 you would have the tree:

Node("", [
Node("1",[Node("2",[])]),
Node("5",[Node("6",[Node("2",[Node("5",[Node("4",[Node("2",[])])])])])])
])

You would start with node Node("",[...]), and add "0", "2", "3", ... (not
"1" and "5" since they are children of "")
Then you go with Node("1",[...]), and add "10", "11", "13", ... (not "2"
since it is a child of "1").
Then you go with Node("5",[...]), etc...
Then with Node("6") (and prefix "56")

If you don't want to have the values "120", "121", ... and so on you need
also to check if the current node is a leaf or not.

I hope it helps.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,764
Messages
2,569,564
Members
45,039
Latest member
CasimiraVa

Latest Threads

Top