Search in a std::map<std::pair<unsigned int, std::string>>

blueblood · Mar 16th 2007, 10:27am

Hello,

I'm currently trying to build up and understand the SceneGraph Mr.Mike did in GCC. And I ended up with some small trouble.

The thing is that in the book Scene::FindActor(ActorID id) only searches scenenode ids, but I wanted to be able to search for names to.

So I thought I would be smart and created a map with a pair as a key

Source Code

std::map<std::par<unsigned int, std::string>> m_SceneNodeMap;

Well, here is the problem.
I can't search in the map for only the ID or Name, just both of them at the same time.

Anyone know how I can come around this? Or is this a dead end?

Source Code

shared_ptr<ISceneNode> Scene::FindSceneNode(SceneNodeId id)
{
SceneNodeInfo info(0, name);
SceneNodeMap::iterator i = m_SceneNodeMap.find(info);

This obviosly won't work, wasen't so obvious when I thought about it though.

So, can anyone enlighten me? Or is the only way out to iterate the map and compare only pair.first/pair.second? Feels like I loose alot of the purpos of the map then?

Best regards
Jesper

Kain · Mar 17th 2007, 1:08pm

It's possible that SceneNodeInfo needs operator== defined for it... possible operator<, too but I can't remember.

blueblood · Mar 18th 2007, 5:32am

Ah, that perfectly makes sense, will immedietly look into that. Thank you for the pointer

edit:
It didn't work

Source Code

bool jSceneNodeInfo::operator==(const jSceneNodeInfo &info)

was never called. Then I was thinking that maybe it uses std::map::operator==, but I can't figure out how to overload that, I guess I need to inherit from it and make a new class, instead of an ordinary typedef.

jason_hughes · Mar 19th 2007, 2:21am

You could do it with a map, but it all depends on what you're trying to look up. I'll explain in the context of the data structure, not specifically the jSceneNodeInfo, because I'm not familiar with that code specifically.

So, if you have a function that can look up something via ID, and you want one that looks up via string, you have two choices. std::map<std::string, ID> or std::map<std::string, std::map<jSceneNodeInfo>::iterator>. The second structure here may have the wrong map type, but you get my drift.

Essentially, you can always look up the ID based on the string first, then make a second lookup based on the ID, incurring two O(logN) lookups, or you can insert the iterator to the node in the ID-to-node map associated by the string in your auxiliary map. This is legal in the STL because iterators to elements in an std::map are not invalidated under any operations (except delete of that node). So, you can get it down to a single O(logN) lookup this way. Just be very careful to insert and delete in your external string-to-node lookup every time the Scene's map is modified.

FWIW, most types in STL only need the operator< to be used in data structures, since == can be phrased as !(a<b) && !(b<a). Dunno if that actually answers your question or not.

JH

rezination · Mar 20th 2007, 1:49pm

I know I'm going to get rotten fruit & vegetables thrown at me for this, but what you're proposing is going to be pretty slow in a subsystem that probably can't afford the performance hit. One of the reasons for using id's is because they're fast. A string compare is extremely slow, and doing a nested map of maps of strings will make the compiler cry.

For Barbie, we had class called SeEntity which was the template for our property system. An Entity is something that can do something interesting; the interesting stuff being defined by what properties you give it. Thus, every entity had a map of properties:

Source Code

typedef std::map< std::string, SEProperty* > SEPropertyMap;

When we profiled our code, we found that an amazing amout of time was spent walking those maps map and comparing strings. We ended up changing it into a sorted vector of pairs, and changing the std::string's into something we called StringId's. A StringId is a class that optimizes string compares. It's used for strings that are unique throughout the game, like a property name. When you assign a StringId object some string value, it hunts through the list of strings (actually a std::set which is really a red/black bst under the hood; 0(log n) average case for searching) and finds the one you want. After that initial hit, comparing StringId's is super fast; it's just a pointer compare.

For completenesses sake, here's the final definition which shipped with Barbie:

Source Code

typedef std::vector<std::pair<StringId, SEProperty *> > SEPropertyMap;

I'm not sure if this helps you at all, I just wanted to point out a potential performance hit and show you our solution. It ended up working very well for us.

-Rez

jason_hughes · Mar 20th 2007, 2:10pm

I'm not sure if you're responding to me or the parent in this thread, so apologies if you weren't.

I wasn't suggesting a map of maps, or anything like it. It was a map of strings to map::iterators. Big difference. That means he could look something up with a string and not need to convert to an ID first, that's all.

What you describe with a StringId is very common, and has been used for decades in games. My favorite implementation is a StringCRC32, which acts like a constant string, except that upon construction, it computes a CRC32 of itself. During the operator< or operator==, it first checks for equality by comparing CRC's between strings. If they match, you have a very good probability of equality. Then you check the length. If still equal, you're almost certain. If you are worried about it, you do the slower string compare to be absolutely certain. That's only two integer compares. Plenty fast. And you don't need a central repository of strings to do it.

But I can see the advantage of what you describe, because you are guaranteed after first finding the string in the set, equal means equal. If you had some ballpark figure for the number of strings, you could have made that a hash table for a bit better performance. Iterating maps is horrible for cache coherency.

There are also cheaper/faster string hash functions than CRC32. It's just a convenient hash I have laying around. And actually, if you use two different hash functions, and they agree, you can be virtually certain of equality without bothering with string compares.

JH

rezination · Mar 20th 2007, 2:47pm

I was actually responding to the original post. The map of maps of strings scenario is one I saw in our code (eek!), though it was fortunately not in a performance critical area.

I've seen similar solutions to the StringCRC32 approach. My only concern is that it has the potential to create some crazy bugs if you happen to get a bad string compare despite the odds (Murphy's Law

) I'm curious; have you ever seen any problems crop up in your implementations from false matches?

We can be pretty certain how many StringId's there will be because most of them are created during initialization, so throwing them in a hash map is a good idea. Maybe I'll sneak that in.

-Rez

jason_hughes · Mar 20th 2007, 3:46pm

Yes, it definitely happens that CRC32's can collide. It's not the most usual thing, but when it happens, it can be nasty to track down. That's why I mentioned you want to use the checksum as the quick rejection, then use another method (length or another hash function) as a sanity check. And in debug builds, I always degenerate to a full string compare just to be absolutely sure. If you don't have problems in debug, you are pretty safe in release, so I turn the slow string compare part off.

The content of the strings matters, too. CRC was primarily developed to detect line noise, so it is ideal at discovering differences of one bit in two equal length binary blocks. Its probability of producing an identical checksum increases (slowly) as more bits differ. Strings tend to be highly variable, so if one bit is different, usually many more are different. And they tend to vary in length. So, anyone suggesting that the theoretical properties developed in most of the CRC literature apply here is simply wrong. However, it does give us a nice and random hash key.

And the probability that two different hash key functions yield the same result are astronomical...

JH

rezination · Mar 21st 2007, 7:44pm

Astronomical or not, I'm always paranoid about that kind of stuff. The final string compare would definitely solve that problem. What we really need to do is stop using so many strings in Ratrace.

-Rez

blueblood · Mar 25th 2007, 3:49pm

Hi,

it was a while I posted this. And I came to think of one thing. You might have mentioned it in your replys, but I may have missed it. But here goes.

In GCC 2ned MrMike uses a hashgenerator to decompose strings to longs (EventType) to match Events with eventlisteners.

Couldnt this be possible to use as actor id? That I send in a name to each actor, and it get hashed to a long. And that long is used inside the scene graph, but what I see as the user, is the string.

What do you think about that?

Best regards
Jesper

Kain · Apr 5th 2007, 12:35am

I like.

Tarviathun · Apr 5th 2007, 1:22am

I did something similar actually. It's a general type class that holds a map of unsigned long and strings. Basically, it hashes a string and puts it in a map and hands out const references to the type (so classes are only 4 bytes larger instead of 12, I think).

If you'd like, I can post it here so that you can get a few tips from it.

Source Code

Source Code

Source Code

Source Code

Source Code

Share