wstring and string questions, when to use, why?

    • wstring and string questions, when to use, why?

      Hello all.

      I was told by someone I should convert my project to use Unicode, right now it is using Multi-Byte Char Set.

      I was wondering how it will benefit me to do so (it will take quite a long time due to the project's size).

      Also, I was wondering where I should use wstring and where I should use a normal string if I convert the project. For example, right now I store file paths and character names as string. Should I convert them all to wstring?

      I know that DirectX's load/save-from-file functions take a multi-byte string but will take a wstring if I change to Unicode, so logically I should use wstring there. But then all my loading/saving of actor names and model file paths will be affected too. How do I handle saving a wstring to file? Is it just like saving a string? Do I need to use a different class instead of the usual fstream? PS: I am using binary streaming (as in out.write((char*)&myData, sizeof(myData));), not the ">>" operator.

      The post was edited 1 time, last by Shanee ().

    • It depends on what you want to do with the project. Unicode and multi-byte character formats only matter when trying to translate to other languages, especially languages like Japanese and Mandarin which have TONS of different glyphs. If you're not planning on localizing your program, it doesn't really matter.

      If you are planning on localizing it, you have a LOT more work to do. You'll need to create a string database for each and every user-facing string. You probably have to do it for all paths as well. It's a major pain in the ass.

      Once the system is set up it's easy enough to deal with. You typically access strings by a string key which then returns the appropriate text so you can print it out.
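
      For illustration, a minimal sketch of that kind of key-based lookup in C++ could look like the block below. The StringTable class, its method names and the key format are all made up for this example; a real system would presumably load the table for the current language from a data file rather than filling it in code.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical string table: maps an ASCII key to the localized, user-facing text.
class StringTable {
public:
    // Register (or overwrite) the text for a key in the current language.
    void Add(const std::string& key, std::wstring text) {
        m_strings[key] = std::move(text);
    }

    // Return the localized text, or the key itself (widened) if it is missing,
    // so untranslated strings are easy to spot on screen.
    std::wstring Get(const std::string& key) const {
        auto it = m_strings.find(key);
        if (it != m_strings.end())
            return it->second;
        return std::wstring(key.begin(), key.end()); // keys are assumed to be plain ASCII
    }

private:
    std::unordered_map<std::string, std::wstring> m_strings;
};
```

      Game code would then call something like table.Get("Menu/Start") wherever it needs text to print, and only the table contents change per language.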

      Personally, I don't bother with any of this unless it's a professional project. I'm okay with my stuff being US English only.

      (edit)
      Whoops, just saw the bottom of your post. Yes, saving a wstring is the same, you just can't assume 1 byte per character anymore. The size of the string in bytes is no longer someString.size(), it's someString.size() * sizeof(wchar_t). Same thing with saving raw character buffers directly, you can't assume a single byte for each character.
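
      A rough sketch of what that looks like with binary streams, assuming a simple length-prefixed layout. The WriteWString and ReadWString helpers and the 32-bit length prefix are choices made for this example, not something from the book or from DirectX.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream, handy for trying the round trip in memory
#include <string>

// Write a length prefix (in characters) followed by the raw wchar_t data.
void WriteWString(std::ostream& out, const std::wstring& s) {
    const std::uint32_t length = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&length), sizeof(length));
    // Byte count is characters * sizeof(wchar_t), not just s.size().
    out.write(reinterpret_cast<const char*>(s.data()),
              static_cast<std::streamsize>(s.size() * sizeof(wchar_t)));
}

// Read back a string written by WriteWString.
std::wstring ReadWString(std::istream& in) {
    std::uint32_t length = 0;
    in.read(reinterpret_cast<char*>(&length), sizeof(length));
    std::wstring s(length, L'\0');
    if (length > 0)
        in.read(reinterpret_cast<char*>(&s[0]),
                static_cast<std::streamsize>(length * sizeof(wchar_t)));
    return s;
}
```

      The same functions work with an fstream opened with std::ios::binary, so no different stream class is needed. One caveat: sizeof(wchar_t) is 2 with Visual C++ but 4 on most Linux compilers, so files written this way are only safe to read back on the same platform.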

      -Rez

      The post was edited 1 time, last by rezination ().

    • Thank you Rez :)

      I just saw your message, I worked all night to convert the project to unicode, finally got it to work and collapsed in bed, ha!

      I am still a bit confused about where to use it. The book uses a normal string for, let's say, file loading but wstring for almost everything else, so I am wondering if I should go ahead and use a wstring for file loading anyway.

      I am guessing actor names and the like should become wstring as this way they can be easily translated to any language.

      I am a bit worried about networking, for example, that the extra bytes from wchar_t would slow things down and make the packets much bigger. I'm still not entirely sure where to use a string and where a wstring. Perhaps I should just change everything to wstring for simplicity? How is it in your programming experience?
    • I'm not sure about file loading, though I would guess that the directory paths themselves would need to be localized in some way considering that the OS has them in whatever language the user has set up. I honestly don't know about that one, though. I've never been the unlucky sap to have to deal with localization programming. ;) As an AI programmer, I'm usually off the hook for that sort of thing.

      If your actor names are visible to the user, they must be translated. If they are purely internal and never displayed except in debug strings, leave them as ASCII.

      Yeah, sending Unicode strings across the network could potentially be costly. The real question is why are you sending strings in the first place? You really should minimize traffic as much as possible across any bottleneck boundary like that. For example, why not just reference actors using a 32-bit unique ID? That's going to be MUCH more efficient than an actor string.
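
      To make the ID idea concrete, here is a small sketch under the assumption that both ends of the connection can build the same ID-to-name table. ActorId, ActorRegistry and the two size helpers are hypothetical names invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

using ActorId = std::uint32_t;

// Hypothetical registry: each side keeps the ID -> name table locally,
// so only the 4-byte ID has to go across the wire.
class ActorRegistry {
public:
    // Store the name and hand back a fresh ID for it.
    ActorId Register(const std::wstring& name) {
        const ActorId id = m_nextId++;
        m_names[id] = name;
        return id;
    }

    // Look up the display name for an ID; returns an empty string if unknown.
    std::wstring NameOf(ActorId id) const {
        auto it = m_names.find(id);
        return it != m_names.end() ? it->second : std::wstring();
    }

private:
    ActorId m_nextId = 1; // 0 is kept free as an "invalid actor" value
    std::unordered_map<ActorId, std::wstring> m_names;
};

// Payload size if the raw name were sent versus just the ID.
inline std::size_t NameBytes(const std::wstring& name) { return name.size() * sizeof(wchar_t); }
inline std::size_t IdBytes() { return sizeof(ActorId); }
```

      With this, a name like L"Village Blacksmith" costs at least 36 bytes as raw wchar_t data, while its ID is always 4 bytes no matter how long the name is or what language it is in.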

      On The Sims Medieval, we only localize what we have to. We have a bunch of static functions that deal with localization. So inside an interaction, you might see:

      Source Code

      // User-facing name: looked up in the localization database by string key.
      public override string GetInteractionName()
      {
          return Localization.GetString("Interactions/Religious/Jacoban:GiveSermon");
      }


      If it's an interaction the user never sees, you might see this:

      Source Code

      // Internal-only name: never shown to the user, so it stays a plain literal.
      public override string GetInteractionName()
      {
          return "Evict Role Sim";
      }


      We ONLY localize what we absolutely have to. Why? Because each string we localize costs us money. Some team of translators has to translate the string into like 8 languages and ensure that it's not only grammatically accurate, but culturally appropriate as well. This type of thing is pretty standard practice.

      That having been said, I *think* we're using Unicode for just about everything. Our gameplay language is C#, so we're using the string primitive when we need to deal with a string. C# strings are always Unicode (UTF-16), so there's nothing to switch on there, as opposed to C++ where it's a project setting. On the C++ side, I think we're doing the same thing, although we use typedefs to hide it from the programmer.
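
      As a guess at what hiding it behind typedefs can look like on the C++ side, here is a small sketch. GameChar, GameString, GAME_TEXT and the GAME_USE_WIDE_STRINGS macro are all invented for this example and are not taken from any real codebase; the pattern itself is the same one Windows uses with TCHAR and _T().

```cpp
#include <string>

// Hypothetical project-wide switch; flip to 0 to build with narrow strings.
#define GAME_USE_WIDE_STRINGS 1

#if GAME_USE_WIDE_STRINGS
using GameChar = wchar_t;
#define GAME_TEXT(s) L##s   // turns "abc" into L"abc"
#else
using GameChar = char;
#define GAME_TEXT(s) s
#endif

// The one string type the rest of the code is meant to use.
using GameString = std::basic_string<GameChar>;

// Code written against GameString doesn't care which character type is underneath.
inline GameString MakeGreeting(const GameString& name) {
    return GameString(GAME_TEXT("Hello, ")) + name;
}
```

      The point of the design is that gameplay code only ever mentions GameString and GAME_TEXT, so the string-versus-wstring decision lives in one header instead of being spread across the project.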

      So, in conclusion, if you're going with Unicode, you might as well just use it everywhere unless there's a really good reason not to somewhere (like if you're really dead-set on using a string for actor names the user never sees). But make sure you have a string database that has ONLY what you NEED to have translated. Localization costs money.

      -Rez
    • Don't feel bad. I spent an hour over the weekend trying to figure out why some UI wasn't rendering the alpha channel properly. Turns out I had alpha blending turned off. *facepalm*

      -Rez