string letter distribution algorithm

U

Ulterior

Hi, everyone,

I have a simple problem, which bothers me for some while. I will try
to explain it -

There is some string, whith different letters in it. Is it possible to
analyse this string and the
reffer to it knowing that it has some certain letters in it using only
some integer value?

I.e if a string is 'catpow', there is 5 unique letters 'c', 'a', 't',
'p', 'o' and 'w'. word evaluates to some integer value, lets say
1234.

I would like to filter out all words except those having letter 't' in
it using single integer value and simple arithmetical expression.

Thank you in advance.

Cheers.,
Ulterior
 
V

Victor Bazarov

Ulterior said:
I have a simple problem, which bothers me for some while. I will try
to explain it -

There is some string, whith different letters in it. Is it possible to
analyse this string and the
reffer to it knowing that it has some certain letters in it using only
some integer value?

I.e if a string is 'catpow', there is 5 unique letters 'c', 'a', 't',
'p', 'o' and 'w'.

Just curious, out of the SIX letters, which one is non-unique?
word evaluates to some integer value, lets say
1234.

I would like to filter out all words except those having letter 't' in
it using single integer value and simple arithmetical expression.

This is not really a C++ language question, is it?

To determine if a word has a certain letter all you need is a bit mask
with each bit representing the letter. An unsigned long would do.
Then use the bitwise AND operator to see if the bit designated to 't'
is set.

V
 
B

BobR

Ulterior said:
Hi, everyone,
I have a simple problem, which bothers me for some while. I will try
to explain it -

There is some string, whith different letters in it. Is it possible to
analyse this string and the
reffer to it knowing that it has some certain letters in it using only
some integer value?

All 'strings' are made of 'integers', (a 'char' is an integer type).
// cast is to 'print' number, not character. (short, long, etc. ok)
std::cout<<"\'A\'="<< static_cast<int>('A') <<std::endl;
// out: 'A'=65
( you may get a different number on your system.)
I.e if a string is 'catpow', there is 5 unique letters 'c', 'a', 't',
'p', 'o' and 'w'. word evaluates to some integer value, lets say
1234.

That's clear as mud.

int anT( 't' );
std::cout<<"anT="<<anT<<std::endl;
// out: anT=116

int anS( 'c'+'a'+'t'+'p'+'o'+'w' );
std::cout<<"anS="<<anS<<std::endl;
// out: anS=654

Now you want to find 116 in 654?

int anS2( 'c'+'a'+'p'+'o'+'w' ); // no 't'
if( ( anS - anS2 ) == anT ){
std::cout<<"\'"<< char( anT & 0xFF ) <<"\' found.\n"; // or 0x7F
}
// out: 't' found.


{
std::string wrd("catpow"); // #include <string>
for( std::size_t in(0); in < wrd.size(); ++in){
if( int( wrd.at( in ) ) == int( 't' ) ){
std::cout<<"letter \'t\' found"<<std::endl;
}
else{ std::cout<<"letter \'t\' NOT found"<<std::endl;}
}
}
/* - output -
letter 't' NOT found
letter 't' NOT found
letter 't' found
letter 't' NOT found
letter 't' NOT found
letter 't' NOT found
*/
I would like to filter out all words except those having letter 't' in
it using single integer value and simple arithmetical expression.

Good luck with that.

{
std::vector<std::string> words; // #include <vector>
words.push_back("hello");
words.push_back("world");
words.push_back("catpow");
words.push_back("clarity");
std::string FindThis( "t" );
for( std::size_t in(0); in < words.size(); ++in ){
if( words.at( in ).find( FindThis ) != std::string::npos ){
std::cout<<words.at( in )<<std::endl;
} // if(find)
} // for(in)
}
// out: catpow
// out: clarity

Now, tell us what you really really want.
 
A

Anand Hariharan

Hi, everyone,

I have a simple problem, which bothers me for some while. I will try
to explain it -

There is some string, whith different letters in it. Is it possible to
analyse this string and the
reffer to it knowing that it has some certain letters in it using only
some integer value?

I.e if a string is 'catpow', there is 5 unique letters 'c', 'a', 't',
'p', 'o' and 'w'. word evaluates to some integer value, lets say
1234.

I would like to filter out all words except those having letter 't' in
it using single integer value and simple arithmetical expression.

You might consider std::bitset<26>, and its to_ulong member for your
value.

Try comp.programming.

- Anand
 
J

Jim Langston

Ulterior said:
Hi, everyone,

I have a simple problem, which bothers me for some while. I will try
to explain it -

There is some string, whith different letters in it. Is it possible to
analyse this string and the
reffer to it knowing that it has some certain letters in it using only
some integer value?

I.e if a string is 'catpow', there is 5 unique letters 'c', 'a', 't',
'p', 'o' and 'w'. word evaluates to some integer value, lets say
1234.

I would like to filter out all words except those having letter 't' in
it using single integer value and simple arithmetical expression.

Thank you in advance.

Here is something I threw together. Not sure if it's what you want, and
it's not optimized and is pretty much just C code.

#include <iostream>
#include <map>
#include <cmath>

int WordLetters( const char* Word )
{
int Result = 0;
for ( int i = 0; Word != 0; ++i )
{
unsigned char LetterValue = static_cast<unsigned char>( toupper(
Word ) );
// Ignore special characters, only treat characters
if ( LetterValue >= 'A' && LetterValue <= 'Z' )
Result |= static_cast<int>( std::pow( 2, LetterValue - 'A' ) );
}
return Result;
}

bool LetterInWord( char Letter, int WordValue )
{
unsigned char LetterValue = static_cast<unsigned char>( toupper(
Letter ) );
// Ignore special characters, only treat characters
if ( LetterValue >= 'A' && LetterValue <= 'Z' )
return (WordValue & static_cast<int>( std::pow(2, LetterValue -
'A' ))) != 0;
else
return false;
}

int main()
{
if ( sizeof(int) < 4 )
{
std::cout << "int is not big enough. try long int or something" <<
"\n";
return 1;
}

int Catpow = WordLetters( "Catpow" );
std::cout << "X in Catpow: " << LetterInWord( 'X', Catpow ) << "\n";
std::cout << "T in Catpow: " << LetterInWord( 'T', Catpow ) << "\n";
std::cout << "C in Catpow: " << LetterInWord( 'C', Catpow ) << "\n";
std::cout << "W in Catpow: " << LetterInWord( 'w', Catpow ) << "\n";

return 0;
}

Basically all I'm doing is converting each letter 'A' through 'Z' to a
number 1 thorugh 26. Then I take it as a binary place. Which means that I
need an ordinal value with at least 26 bits, for my system int fits that
since it has 32 bits. Then I simply set the bit on for that place.
 
U

Ulterior

Hi, guys,

thanks for your kind answers, but, if you are eager to help me
further,
I will try to clarify my problem.

The point is to evaluate a string to some single value, store it in
database field
and the, using simple sql compare, retrieve only those records with
fields, which have 't'
letter in it.

There must be no functions, as there is 6mil records and a fetch must
be quick.

Rgrds,
Ulterior
 
J

Jim Langston

Ulterior said:
Hi, guys,

thanks for your kind answers, but, if you are eager to help me
further,
I will try to clarify my problem.

The point is to evaluate a string to some single value, store it in
database field
and the, using simple sql compare, retrieve only those records with
fields, which have 't'
letter in it.

There must be no functions, as there is 6mil records and a fetch must
be quick.

"single value" "simple sql compare" Well, that's a bit tricky. If SQL does
bit checking then you can just retrieve the bit value of 't' using some
function, and run some SQL like this (if thsi was legal SQl)

select * from MyTable where WordLetters | LetterValue

If SQL doesn't do bitwise checking, you'll probalby need to come up with
some other way.

This now sounds more like a SQL problem then a C++ problem, perhaps a SQL
newsgroup would be more help.
 
A

Anand Hariharan

Hi, guys,

thanks for your kind answers, but, if you are eager to help me
further,
I will try to clarify my problem.

The point is to evaluate a string to some single value, store it in
database field
and the, using simple sql compare, retrieve only those records with
fields, which have 't'
letter in it.

There must be no functions, as there is 6mil records and a fetch must
be quick.

SELECT * FROM table_name WHERE field_name LIKE '%t%'

- Anand
 
J

Jerry Coffin

[ ... ]
The point is to evaluate a string to some single value, store it in
database field and the, using simple sql compare, retrieve only those
records with fields, which have 't' letter in it.

There must be no functions, as there is 6mil records and a fetch must
be quick.

Using (or not using) functions won't make any real difference here. For
what you're doing, CPU time is virtually guaranteed to mean nothing
compared to I/O time -- if you use a dozen functions to avoid even a
little bit of I/O, it'll be a win.

Since SQL (as such) is off-topic, I'll discuss this in C++ terms, and
leave it to you to translate over to SQL (though the translation is
_mostly_ fairly trivial).

I'm going to assume that you're willing to consume some time up-front to
build an index that gives you really fast retrieval later. In that case,
let's represent your database as a vector of records:

class record {
std::string key;
void *data; // this just represents whatever other
// data goes with your key.
};

std::vector<record> database;

Now, our index is going to be a vector indexed by a character, and what
it returns as a result will be a set of records that have that character
in the key:

typedef std::set<std::size_t> record_set;

std::vector<record_set> index;

Now, we need to build our index from our data. We do that by looking at
each item in the database, and adding its record number to the
appropriate places in the index:

// for each item in the database
for (int i=0; i<database.size(); i++) {

std::string const &s = database.key;

// for each character in the key:
for (int j=0; j<key.size(); j++) {
char ch = std::toupper(key[j]);
if (std::isupper(ch))
index[ch].insert(std::toupper(i));
}
}

Now, "index['t']" is the set of all records that have 't' somewhere in
their string. For example, to print out the record numbers of all the
records that have 't' in their key, we could do something like this:

record_set records = index['t'];

std::copy(records.begin(), records.end(),
std::eek:stream_iterator<std::size_t>(std::cout, "\n"));

Now, as to the conversion to SQL: the main thing is that you don't have
record numbers, so you typically include a field that acts as a unique
key. You'll want to tell your database to index on that field. IIRC, SQL
doesn't have a convenient way of representing a set like I've used
either. You can do that pretty easily with a two-column table, with the
letter in one column and a single record number in the right column.
With the left column indexed, retrieving the set of record numbers with
a particular letter is roughly equivalent to what I've written above.

I should also point out that if a given query is likely to retrieve most
of the records in the database, the index may not do much good. Its
whole point is to read only records you care about. The more records it
avoids reading, the more good it does. If you retrieve all or most of
the records anyway, the index just adds overhead.
 
J

Jerry Coffin

[ ... ]
for (int i=0; i<database.size(); i++) {

std::string const &s = database.key;

// for each character in the key:
for (int j=0; j<key.size(); j++) {
char ch = std::toupper(key[j]);
if (std::isupper(ch))
index[ch].insert(std::toupper(i));


Oops -- that should, of course, be:
index[ch].insert(i);

My apologies for being an idiot. I should also add that I left out a few
things like resizing the index to be as large as needed (since the C++
version is really intended as something like pseudo-code for a SQL
version, not for use as-is).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,055
Latest member
SlimSparkKetoACVReview

Latest Threads

Top