Perl style hash

N

none

(Wow, did someone's spam-bot explode in this group, or what? I think it's
trying to become self aware...)


Hi all, I am wondering if anyone has implemented something like this, and
whether they'd be willing to share the code.

In perl, "hash" is a basic type. And the language supports some really
nice (and yes, easy to abuse) functionality, like:

1) Easy lookup of "nested" hashes:

$result = $my_hash{"root"}{"branch1"}{"branch2"}{"leaf"};

2) Initializing an element forces it into existence:

$my_hash{"root"}{"branch1"}{"branch2"}{"leaf"} = 5;

That second one will work even if "root", "branch1", "branch2", and "leaf"
don't exist before the assignment, or even if the entire hash is empty, or
even if "root" and "branch1" already exist with lots of other
children/siblings, but "branch2" doesn't yet exist. It will also work for
any assignment -- you could store an int, a float, a string, or even an
array of something as a leaf. And if "leaf" already existed as some other
type, it would just be replaced with the int assignment.

I'm sure something similar could be done in C++, probably using a wrapper
around std::map that provided operator overloading for maybe the []
operator, and I guess you'd need to overload the = operator to accept any
type of data that you might want to store in a leaf. Templates, maybe?
Before I set off on a week of banging my head against a wall, I thought I'd
ask here.

Thanks!
 
J

Joshua Maurice

none said:
I'm sure something similar could be done in C++, probably using a wrapper
around std::map that provided operator overloading for maybe the []
operator, and I guess you'd need to overload the = operator to accept any
type of data that you might want to store in a leaf.

No need to overload anything. Both the key or the value in a std::map can be
any class, with the only restriction that the key class must implement
operator<() (ignoring the hair-splitting requirement for the implementation
of copy constructors and assignment operators).

It's all a matter of implementing the two classes, with the appropriate
semantics. All you need to do is define a value class that implements
operator[] using your desired semantics.

Alternatively unordered_map in TR1 or various other hash maps. That
requires a little more work because they're not as supported
everywhere like std::map.
 
N

none

Sam said:
No need to overload anything. Both the key or the value in a std::map
can be any class, with the only restriction that the key class must
implement operator<() (ignoring the hair-splitting requirement for the
implementation of copy constructors and assignment operators).

Sure, but the type of the elements are determined at compile time. In
other words, you can't just do this:

std::map<std::string, Anything> x;

x["a float"] = 5.0f;
x["an int"] = 10;
x["a string"] = "some text";

And even if you could, you still wouldn't have the perl-style behavior that
I described. If you tried:

x["root"]["branch"]["leaf"] = 5;

the compiler would not know how to resolve the second (and third) set of
braces.

I think I have something workable. I've made the restriction that you can
only store strings in the leaves. So, instead of the code above, you would
have to do this:

x["root"]["branch"]["leaf"] = "5";

With that restriction, it's fairly straightforward. I created a
"DynamicHash" class containing a std::map, defined the [] operator to
return a "DynamicHash *", calling "new" if needed. So the above code would
recursively create the branches and leaves. Then I overloaded the =
operator to accept a std::string&, and the "5" basically becomes an empty
DynamicHash that can just be treated as a string. Overloading the typecast
(const char *) operator even lets me do this:

cout << x["root"]["branch"]["leaf"];

And that prints the "5". It isn't quite as flexible as perl, but it could
be if I took the time.

Thanks.
 
N

none

Sam said:
So? This calls for an overloaded Anything::eek:perator=():

class Anything {

public:
// ..

Anything &operator=(double);
Anything &operator=(int);
Anything &operator=(std::string);
};

Okee-dokee.
And even if you could, you still wouldn't have the perl-style
behavior that I described. If you tried:

x["root"]["branch"]["leaf"] = 5;

the compiler would not know how to resolve the second (and third) set
of braces.

Of course it would. It's merely a matter of implementing operator[]()
accordingly.


I like the simplicity of this approach. The way I'm doing it now, I'll
have to add an iterator class to my wrapper, etc, which stinks. Your way,
I'd just be using std::map directly.

But I still don't quite see how the second and third sets of [] would work.
The first one, x["root"], is going to return an Antyhing&. So, class
Anything would have to also have a [] operator. How would the returned
Anything object access "x", which would be of type std::map?
 
S

Stuart Redmann

Sam said:
So? This calls for an overloaded Anything::eek:perator=():
class Anything {
public:
   // ..
   Anything &operator=(double);
   Anything &operator=(int);
   Anything &operator=(std::string);
};
Okee-dokee.
And even if you could, you still wouldn't have the perl-style
behavior that I described.  If you tried:
   x["root"]["branch"]["leaf"] = 5;
the compiler would not know how to resolve the second (and third) set
of braces.
Of course it would. It's merely a matter of implementing operator[]()
accordingly.

I like the simplicity of this approach.  The way I'm doing it now, I'll
have to add an iterator class to my wrapper, etc, which stinks.  Your way,  
I'd just be using std::map directly.

But I still don't quite see how the second and third sets of [] would work.  
The first one, x["root"], is going to return an Antyhing&.  So, class
Anything would have to also have a [] operator.  How would the returned
Anything object access "x", which would be of type std::map?- Zitierten Text ausblenden -

Try this:

#include <map>
#include <string>
#include <iostream>

#include <boost\any.hpp>

class CHashMapNode
{
private:
typedef std::map<std::string, CHashMapNode*> TNodeType;
TNodeType m_Node;

boost::any m_Value;

public:
CHashMapNode& operator[] (const char* KeyName)
{
// Look up the key in our map. If it doesn't exist, create one.
CHashMapNode*& NewNode = m_Node[std::string (KeyName)];
if (!NewNode)
NewNode = new CHashMapNode;
return *NewNode;
}

CHashMapNode& operator= (const boost::any& p_NewValue)
{
m_Value = p_NewValue;
return *this;
}

boost::any GetValue () const
{
return m_Value;
}
};


int main(int argc, char* argv[])
{
CHashMapNode HashMap;
HashMap["Root"]["Subnode"] = 3;
std::cout << boost::any_cast<int>
(HashMap["Root"]["Subnode"].GetValue ());

HashMap["Root"]["Subnode"] = std::string ("Hello World");
std::cout << boost::any_cast<std::string>
(HashMap["Root"]["Subnode"].GetValue ());
return 0;
}

Regards,
Stuart
 
F

Francesco

Sam said:
So? This calls for an overloaded Anything::eek:perator=():
class Anything {
public:
   // ..
   Anything &operator=(double);
   Anything &operator=(int);
   Anything &operator=(std::string);
};
Okee-dokee.
And even if you could, you still wouldn't have the perl-style
behavior that I described.  If you tried:
   x["root"]["branch"]["leaf"] = 5;
the compiler would not know how to resolve the second (and third) set
of braces.
Of course it would. It's merely a matter of implementing operator[]()
accordingly.
I like the simplicity of this approach.  The way I'm doing it now, I'll
have to add an iterator class to my wrapper, etc, which stinks.  Your way,  
I'd just be using std::map directly.
But I still don't quite see how the second and third sets of [] would work.  
The first one, x["root"], is going to return an Antyhing&.  So, class
Anything would have to also have a [] operator.  How would the returned
Anything object access "x", which would be of type std::map?- Zitierten Text ausblenden -

Try this:

#include <map>
#include <string>
#include <iostream>

#include <boost\any.hpp>

class CHashMapNode
{
private:
  typedef std::map<std::string, CHashMapNode*> TNodeType;
  TNodeType m_Node;

  boost::any m_Value;

public:
  CHashMapNode& operator[] (const char* KeyName)
  {
    // Look up the key in our map. If it doesn't exist, create one.
    CHashMapNode*& NewNode = m_Node[std::string (KeyName)];
    if (!NewNode)
      NewNode = new CHashMapNode;
    return *NewNode;
  }

  CHashMapNode& operator= (const boost::any& p_NewValue)
  {
    m_Value = p_NewValue;
    return *this;
  }

  boost::any GetValue () const
  {
    return m_Value;
  }

};

int main(int argc, char* argv[])
{
  CHashMapNode HashMap;
  HashMap["Root"]["Subnode"] = 3;
  std::cout << boost::any_cast<int>
               (HashMap["Root"]["Subnode"].GetValue ());

  HashMap["Root"]["Subnode"] = std::string ("Hello World");
  std::cout << boost::any_cast<std::string>
               (HashMap["Root"]["Subnode"].GetValue ());
  return 0;

}

Regards,
Stuart

Hi Stuart, hi everybody.

After having read the thread OP, I've started feeling dumb. I didn't
know how to implement a tree-like map (but now I see your example
above, Stuart) and I didn't know how to implement a "any_value"
object.

The first seemed impossible and I skipped it. I've started having a
try at the second, and here is my attempt:

-------
#include <iostream>
#include <map>

using namespace std;

class base {
public:
virtual string ti_name() const = 0;
virtual string Tti_name() const = 0;
virtual const void* get_data() const = 0;
virtual ~base() {}
};

template<class T> class variant : public base {
T data;
public:
variant(const T& t) : data(t) {}
const void* get_data() const {
return &data;
}
string ti_name() const {
static const type_info& ti = typeid(this);
return ti.name();
}
string Tti_name() const {
static const type_info& ti = typeid(T);
return ti.name();
}
};

class variantmap {
map<string, base*> datamap;
public:
class error {
public:
error(const string& s = ""): msg(s) {}
const string& what() {
return msg;
}
private:
string msg;
};
~variantmap() {
typedef map<string, base*>::iterator map_iter;
for (map_iter iter = datamap.begin(), end = datamap.end();
iter != end;
++iter) {
delete iter->second;
}
}
template<class T> void write(const string& id, const T& t) {
if (datamap.find(id) != datamap.end()) {
delete datamap[id];
}
datamap[id] = new variant<T>(t);
}
template<class T> void read_in(const string& id, T* t) throw
(error) {
if (datamap.find(id) != datamap.end()) {
static const string& asked = typeid(T).name();
const string& saved = datamap[id]->Tti_name();
if (asked == saved) {
*t = *(static_cast<const T*>(datamap[id]->get_data()));
} else {
throw error("Mismatch error:\n"
" map key == " + id + "\n" +
" value id == " + saved + "\n" +
" asked id == " + asked + "\n");
}
} else {
throw error("Key error\n [" + id + "] not found\n");
}
}
};

int main()
{
variantmap vmap;
vmap.write("integer", 42);

try {
string s;
vmap.read_in("integer", &s);
cout << "s == " << s << endl;
} catch (variantmap::error e) {
cout << e.what();
}

try {
string s;
vmap.read_in("string", &s);
cout << "s == " << s << endl;
} catch (variantmap::error e) {
cout << e.what();
}

try {
int i;
vmap.read_in("integer", &i);
cout << "i == " << i << endl;
} catch (variantmap::error e) {
cout << e.what();
}

return 0;
}
-------

I'll fiddle with it merging your tree-like map to my variant template,
Stuart.

@everybody: any suggestion, correction of bad approach, bug-fix or
comment about my code will be extremely welcome.

Cheers,
Francesco
 
F

Francesco

Sam wrote:
So? This calls for an overloaded Anything::eek:perator=():
class Anything {
public:
   // ..
   Anything &operator=(double);
   Anything &operator=(int);
   Anything &operator=(std::string);
};
Okee-dokee.
And even if you could, you still wouldn't have the perl-style
behavior that I described.  If you tried:
   x["root"]["branch"]["leaf"] = 5;
the compiler would not know how to resolve the second (and third) set
of braces.
Of course it would. It's merely a matter of implementing operator[]()
accordingly.
I like the simplicity of this approach.  The way I'm doing it now, I'll
have to add an iterator class to my wrapper, etc, which stinks.  Your way,  
I'd just be using std::map directly.
But I still don't quite see how the second and third sets of [] would work.  
The first one, x["root"], is going to return an Antyhing&.  So, class
Anything would have to also have a [] operator.  How would the returned
Anything object access "x", which would be of type std::map?- Zitierten Text ausblenden -
Try this:
#include <map>
#include <string>
#include <iostream>
#include <boost\any.hpp>
class CHashMapNode
{
private:
  typedef std::map<std::string, CHashMapNode*> TNodeType;
  TNodeType m_Node;
  boost::any m_Value;
public:
  CHashMapNode& operator[] (const char* KeyName)
  {
    // Look up the key in our map. If it doesn't exist, create one.
    CHashMapNode*& NewNode = m_Node[std::string (KeyName)];
    if (!NewNode)
      NewNode = new CHashMapNode;
    return *NewNode;
  }
  CHashMapNode& operator= (const boost::any& p_NewValue)
  {
    m_Value = p_NewValue;
    return *this;
  }
  boost::any GetValue () const
  {
    return m_Value;
  }

int main(int argc, char* argv[])
{
  CHashMapNode HashMap;
  HashMap["Root"]["Subnode"] = 3;
  std::cout << boost::any_cast<int>
               (HashMap["Root"]["Subnode"].GetValue ());
  HashMap["Root"]["Subnode"] = std::string ("Hello World");
  std::cout << boost::any_cast<std::string>
               (HashMap["Root"]["Subnode"].GetValue ());
  return 0;

Regards,
Stuart

Hi Stuart, hi everybody.

After having read the thread OP, I've started feeling dumb. I didn't
know how to implement a tree-like map (but now I see your example
above, Stuart) and I didn't know how to implement a "any_value"
object.

The first seemed impossible and I skipped it. I've started having a
try at the second, and here is my attempt:

-------
#include <iostream>
#include <map>

using namespace std;

class base {
  public:
    virtual string ti_name() const = 0;
    virtual string Tti_name() const = 0;
    virtual const void* get_data() const = 0;
    virtual ~base() {}

};

template<class T> class variant : public base {
    T data;
  public:
    variant(const T& t) : data(t) {}
    const void* get_data() const {
      return &data;
    }
    string ti_name() const {
      static const type_info& ti = typeid(this);
      return ti.name();
    }
    string Tti_name() const {
      static const type_info& ti = typeid(T);
      return ti.name();
    }

};

class variantmap {
    map<string, base*> datamap;
  public:
    class error {
      public:
        error(const string& s = ""): msg(s) {}
        const string& what() {
          return msg;
        }
      private:
        string msg;
    };
    ~variantmap() {
      typedef map<string, base*>::iterator map_iter;
      for (map_iter iter = datamap.begin(), end = datamap.end();
           iter != end;
           ++iter) {
        delete iter->second;
      }
    }
    template<class T> void write(const string& id, const T& t) {
      if (datamap.find(id) != datamap.end()) {
        delete datamap[id];
      }
      datamap[id] = new variant<T>(t);
    }
    template<class T> void read_in(const string& id, T* t) throw
(error) {
      if (datamap.find(id) != datamap.end()) {
        static const string& asked = typeid(T).name();
        const string& saved = datamap[id]->Tti_name();
        if (asked == saved) {
          *t = *(static_cast<const T*>(datamap[id]->get_data()));
        } else {
          throw error("Mismatch error:\n"
                      "   map key == " + id + "\n" +
                      "  value id == " + saved + "\n" +
                      "  asked id == " + asked + "\n");
        }
      } else {
        throw error("Key error\n  [" + id + "] not found\n");
      }
    }

};

int main()
{
  variantmap vmap;
  vmap.write("integer", 42);

  try {
    string s;
    vmap.read_in("integer", &s);
    cout << "s == " << s << endl;
  } catch (variantmap::error e) {
    cout << e.what();
  }

  try {
    string s;
    vmap.read_in("string", &s);
    cout << "s == " << s << endl;
  } catch (variantmap::error e) {
    cout << e.what();
  }

  try {
    int i;
    vmap.read_in("integer", &i);
    cout << "i == " << i << endl;
  } catch (variantmap::error e) {
    cout << e.what();
  }

  return 0;}

-------

I'll fiddle with it merging your tree-like map to my variant template,
Stuart.

@everybody: any suggestion, correction of bad approach, bug-fix or
comment about my code will be extremely welcome.

I've spotted one problem: if I let variantmap to be copied I'm in
trouble, have to define constructor, copy-constructor and assignment
operator to manage the dynamic memory.

Can you spot other problems there in my code?

Francesco
 
J

James Kanze

none said:
I'm sure something similar could be done in C++, probably
using a wrapper around std::map that provided operator
overloading for maybe the [] operator, and I guess you'd
need to overload the = operator to accept any type of data
that you might want to store in a leaf.
No need to overload anything. Both the key or the value in a
std::map can be any class, with the only restriction that the
key class must implement operator<() (ignoring the
hair-splitting requirement for the implementation of copy
constructors and assignment operators).

If you're using operator[] on std::map, the mapped type must
also support default construction.
It's all a matter of implementing the two classes, with the
appropriate semantics. All you need to do is define a value
class that implements operator[] using your desired semantics.

The important thing here is that the mapped type (but not the
key type) of a map can itself be a map, so you can have
something like:
std::map< Key1, std::map< Key2, std::map< Key3, Mapped > > >

The other important point is that it is NOT a hash table; if
this is important (e.g. millions of entries), there is an
unsorted_map in TR1, and in the next version of the standard,
which will be a hash table.
 
J

James Kanze

Sure, but the type of the elements are determined at compile
time. In other words, you can't just do this:
std::map<std::string, Anything> x;
x["a float"] = 5.0f;
x["an int"] = 10;
x["a string"] = "some text";

You can; just map to boost::any (although I'm not too sure what
boost::any would do with a string literal). Except in unusual
cases, however, it's something to avoid; the fact that perl
doesn't have any real typing is a definite defect in the
language, at least if you want to use it for larger programs.
And even if you could, you still wouldn't have the perl-style
behavior that I described. If you tried:
x["root"]["branch"]["leaf"] = 5;
the compiler would not know how to resolve the second (and
third) set of braces.

You could probably define an Any class yourself which would.
Whether it's a good idea or not is another question.
 
N

none

Stuart said:
Try this:

#include <map>
#include <string>
#include <iostream>

#include <boost\any.hpp>

class CHashMapNode

[...]

That's almost identical to what I did, but without boost::any. I'm discovering that
there are some ugly corner cases to handle. For example, this is legal:

my_hash["a"]["b"]["c"] = 5;
my_hash["a"] = 10;

To duplicate perl's behavior, the second assignment should destroy nodes "b" and "c".
I just added a check in operator= to first delete all children before making the
assignment.

Here is another tricky one:

hash1["a"]["b"]["c"] = 5;
hash2["x"]["y"]["z"] = hash1;

int x = hash2["x"]["y"]["z"]["a"]["b"]["c"];

The variable x should get the value 5. I haven't fixed this case yet, but it's going
to require a recursive copy.

At the risk of showing some C++ ignorance, should I handle that recursive copy in
operator=, or in the copy constructor? What's the difference?
 
J

Jerry Coffin

(e-mail address removed)>, (e-mail address removed)
says...

[ ... ]
The other important point is that it is NOT a hash table; if
this is important (e.g. millions of entries), there is an
unsorted_map in TR1, and in the next version of the standard,
which will be a hash table.

Just in case the OP (or whoever) wants to Google for it, read about
it, etc., that's really unordered_map instead of unsorted_map.
 
R

Richard Herring

none said:
At the risk of showing some C++ ignorance, should I handle that
recursive copy in
operator=, or in the copy constructor?

For consistency, you should do it in both, so that both these lines have
the same result:

T x; x=y;
T x(y);


A common pattern is to implement the mechanics of the deep copy in the
copy constructor, define a swap() function, and then use the
copy-and-swap idiom (GIYF) in the assignment operator to exploit the
copy constructor you've already defined. This idiom also means you don't
need to handle self-assignment as a special case.
What's the difference?

The assignment operator has to dispose of the old value as well as
assigning a new one, whereas the copy constructor always starts from the
primordial vacuum ;-)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top