Code Page problem in SetWindowText

You may want to try calling SetWindowTextW() instead.

This article may be interesting to you:

http://www.codeproject.com/string/cppstringguide1.asp

Tom
"Marco Hung" <***@gmail.com> wrote in message news:***@TK2MSFTNGP06.phx.gbl...
Hi All,

I've created an MFC project in MBCS. I need to show a set of special characters (ASCII code > 128) in a CStatic control. They show correctly on all English-locale Windows systems. However, all those special characters become "?" on non-English Windows. How can I solve this problem? Here's part of my source code:

void MySprcialCharacterDlg::OnUpdate()
{
    TCHAR stringToShow[10];
    ZeroMemory( stringToShow, sizeof(stringToShow) );
    stringToShow[0] = 129;
    stringToShow[1] = 130;
    stringToShow[2] = 131;
    stringToShow[3] = 132;
    stringToShow[4] = 133;
    stringToShow[5] = 134;
    stringToShow[6] = 135;
    stringToShow[7] = 136;
    stringToShow[8] = 137;

    ::SetWindowText( GetDlgItem(IDC_STATIC_SPECIAL_CHAR), stringToShow );
}

Many thx.
Marco

Marco Hung

2007-09-05 04:32:29 UTC

Thx Tom,

I've followed the sample, but the special characters still don't display correctly.

Mihai N.

2007-09-05 07:41:42 UTC

Post by Marco Hung
I've created a MFC project in MBCS. I need to show some set special
characters ( ASCII code > 128) in a CStatic controls.

--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

Tom Serface

2007-09-05 15:28:02 UTC

Hi Mihai,

For the most part I agree with what you say here, the only exception
being... if you are using a lot of strings and doing a lot of string
handling and don't need anything except English then using MBCS may be a bit
faster to execute, better in memory storage, and quicker to read and write
files since Unicode doubles all character sizes whether needed or not. I
wish Windows/MFC/all those good things had better handling for other methods
like UTF-8 that would give similar results as MBCS.

That said, the differences in most cases are not all that significant and
I've gone to using Unicode all the time.

Tom

Post by Marco Hung
I've created a MFC project in MBCS. I need to show some set special
characters ( ASCII code > 128) in a CStatic controls.

If you start now with a new project, there is no reason not to go Unicode.
The only reason for MBCS is to support Win 9x (with a new project?),
to learn about things that will be obsolete in 2-3 years,
or for being a masochist :-)
--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

David Ching

2007-09-05 22:37:57 UTC

Post by Tom Serface
Hi Mihai,
For the most part I agree with what you say here, the only exception
being... if you are using a lot of strings and doing a lot of string
handling and don't need anything except English then using MBCS may be a
bit faster to execute, better in memory storage, and quicker to read and
write files since Unicode doubles all character sizes whether needed or
not. I wish Windows/MFC/all those good things had better handling for
other methods like UTF-8 that would give similar results as MBCS.
That said, the differences in most cases are not all that significant and
I've gone to using Unicode all the time.

But much of the inherent speed advantage of MBCS is negated by the native
API in Win2K/XP/Vista being Unicode, so having a Unicode app allows us to
call these API's directly and not go through thunks. But I've not done
speed tests.

-- David

Tom Serface

2007-09-05 23:02:33 UTC

That's a really good point. I hadn't thought of that before. So I guess if
you are moving strings in and out of controls a lot there could even be a
performance improvement using Unicode.

Tom

Giovanni Dicanio

2007-09-05 23:37:34 UTC

Post by Tom Serface
That's a really good point. I hadn't thought of that before. So I guess
if you are moving strings in and out of controls a lot there could even be
a performance improvement using Unicode.

Hi Tom,

I completely agree with this analysis by David, at least on the (real)
operating systems like Win2K/XP/Vista, that are Unicode-native.
(Win9x "toys" are a different thing, maybe there ANSI is faster than
Unicode, because they are ANSI/MBCS-native, but the Win9x family is not
interesting for me.)

G

Joseph M. Newcomer

2007-09-06 05:17:21 UTC

I've had lots of people insist that it is faster to use ANSI apps because "the strings are
shorter". They don't realize that since all of Windows is written in Unicode, every ANSI
API has to first convert its arguments to Unicode, then call the Unicode version of the
API, so ANSI would be inherently slower.

In an experiment I ran, Unicode is on the average slightly faster than ANSI, for something
as simple as a repeated SetWindowText, although the variance of the samples is high.
joe

Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
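
To make the "ANSI API converts, then calls the Unicode API" point concrete, here is a minimal sketch of that forwarding pattern. SetTextA, SetTextW, g_controlText and g_conversionCount are hypothetical stand-ins invented for this illustration, not real Win32 functions, and the byte-by-byte widening is only a placeholder for a real code-page conversion.

```cpp
#include <string>

// Hypothetical stand-ins, not real Win32 functions: a "control" whose text is
// stored as wide characters, plus a counter of narrow-to-wide conversions.
static std::wstring g_controlText;
static int g_conversionCount = 0;

// Wide ("W") entry point: stores the text as-is.
void SetTextW(const std::wstring& text) {
    g_controlText = text;
}

// Narrow ("A") entry point: widens first, then forwards to the W version.
// The byte-by-byte widening is a placeholder for a real code-page conversion.
void SetTextA(const std::string& text) {
    std::wstring wide;
    wide.reserve(text.size());
    for (unsigned char byte : text) {
        wide.push_back(static_cast<wchar_t>(byte));
    }
    ++g_conversionCount;
    SetTextW(wide);
}
```

Every call through the narrow entry point pays for one extra conversion and one extra allocation before it reaches the wide one, which is the overhead described above. How large that cost is in practice would still have to be measured.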

Giovanni Dicanio

2007-09-06 10:09:11 UTC

Yes, Joe! The key point is the ANSI -> Unicode conversion made internally
by Windows, as you pointed out.

Giovanni

Tom Serface

2007-09-06 15:57:25 UTC

I guess it all depends on what you are going to do with the strings. If you
are manipulating them in memory and not using them for any Windows things
then certainly ANSI would save memory and time, but it's difficult to
quantify the difference and I suspect it is negligible so Unicode seems a
better way to go in my opinion. If you really need to minimize memory space
(like you're trying to run an MFC application on your watch or something)
then perhaps, but ...

Tom

Giovanni Dicanio

2007-09-06 22:04:10 UTC

but it's difficult to quantify the difference and I suspect it is
negligible so Unicode seems a better way to go in my opinion.

Hi Tom,

I agree with you.

And maybe if memory space saving is the main target, UTF-8 could be used as
the encoding for Unicode, instead of UTF-16.
But maybe for historical reasons, it seems that internal Windows format for
Unicode is UTF-16 :(
On the other side, IIRC Mac OS X and Linux tend to use UTF-8, but I may be
mistaken...

If you really need to minimize memory space (like you're trying to run an
MFC application on your watch or something) then perhaps, but ...

IIRC, Windows CE (which should be suited to embedded platforms and platforms
with memory limits, not like the "huge" 1-2 GB of RAMs in current desktop
PCs) uses Unicode (UTF-16) and not ANSI :)

Giovanni

Joseph M. Newcomer

2007-09-07 05:13:59 UTC

And how many strings do you need before you start to see any real impact on memory space?

Suppose I have 10M 'characters'. In ANSI, these would take 10MB of RAM; in Unicode they
would take 20MB of RAM. On a typical end-user machine with 1GB of memory, this means that I
would occupy about 1% of physical RAM, or 0.5% of my 2GB virtual address space, with 8-bit
strings, and in Unicode, I'd use a whopping 2% of my physical RAM and 1% of my
virtual address space. I somehow cannot get excited about this problem, given all the
additional problems of complex code, possibility of error, cost of development and
debugging, etc. that it would cost to use 8-bit characters.
joe

Tom Serface

2007-09-07 05:52:24 UTC

OK, point, set, match... I can't argue with that one :o) Not to mention
that arguing makes no sense since I'm a Unicode convert anyway.

Tom

Giovanni Dicanio

2007-09-07 09:37:54 UTC

Post by Tom Serface
OK, point, set, match... I can't argue with that one :o) Not to mention
that arguing makes no sense since I'm a Unicode convert anyway.

Hi Tom,

Yes, ANSI is kind of computer archaeology these days :)

G.

Tom Serface

2007-09-07 15:09:16 UTC

Unfortunately, we may think that philosophically, but I think there are
still more applications using MBCS than anything. The more people continue
to use VC 6 the more this will be the case in my opinion. MSFT should do
everything it can to make the VC6 people happy enough with a new version to
update. That would help the cause more than any other new feature.

Tom

Giovanni Dicanio

2007-09-07 15:16:00 UTC

Post by Tom Serface
Unfortunately, we may think that philosophically, but I think there are
still more applications using MBCS than anything. The more people
continue to use VC 6 the more this will be the case in my opinion.

Hi Tom,

VC6 has no problem with Unicode...

http://www.mihai-nita.net/article.php?artID=20060723a

...Am I missing something?

G

Tom Serface

2007-09-07 15:28:25 UTC

Indeed, but it suggests MBCS by default and can't handle Unicode RC files.
I think it encourages programs to be MBCS with this behavior.

Tom

Mihai N.

2007-09-08 05:25:44 UTC

Post by Tom Serface
Indeed, but it suggests MBCS by default and can't handle Unicode RC files.

This is still true for VS 2003.
VS 2005 was the first one to switch (and still buggy at that).

--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

Giovanni Dicanio

2007-09-08 09:20:31 UTC

Post by Tom Serface
Indeed, but it suggests MBCS by default and can't handle Unicode RC files.

This is still true for VS 2003.
VS 2005 was the first one to switch (and still buggy at that).

So, in VC6 or VS2003 Unicode-built app we can't have e.g. a string-table
resource with Japanese characters in Unicode?

Is there any workaround?

Should we use external custom file encoded e.g. in UTF-8 and read it and
convert it dynamically to UTF-16?

Thanks in advance,
Giovanni

Tom Serface

2007-09-08 16:10:16 UTC

Hi Giovanni,

In 2003 you can have a Unicode RC file, but it is initially created in MBCS
and you just have to open the .RC file in Notepad then save it back as
Unicode. The IDE will use it after that. I think 2005 creates them as
Unicode in the first place.

In VC6 and 2003 (using ANSI) it relies on the codepage and fonts to do the
correct characters so you can have Japanese, but it wouldn't be Unicode. I
think there are some characters that MBCS can't handle, but I don't know
what they are off hand.

Tom

Joseph M. Newcomer

2007-09-09 02:01:13 UTC

Actually, I think you can, but you have to use \xNNNN syntax and mark it as a wide string
because the text is stored in 8-bit characters. But the resource compiler will do the
right thing. For example

IDS_MU L"Gray Cats say \x03BC!"

will produce the right result. The problem is that I'm no longer sure how to produce the
L" form of the string short of hand-editing, and if you just type in \x it converts it to
\\x. But it works, and the correct result is displayed providing the font you have has
the Greek letter 'mu' in it.
joe
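
For comparison, the same \x escape can be used in a C++ wide string literal. This is only a sketch of the C++ side (the resource compiler has its own parsing rules, so do not read it as a description of .rc behavior), and the names kGrayCats and kMuCats are made up for the example. The one pitfall in C++ is that a hex escape consumes every hex digit that follows it.

```cpp
#include <cwchar>

// U+03BC is GREEK SMALL LETTER MU. The '!' after it is not a hex digit,
// so the escape ends where intended.
const wchar_t kGrayCats[] = L"Gray Cats say \x03BC!";

// Here the next letter ('C') is a hex digit, so the literal is split in two.
// Adjacent string literals are concatenated after escapes are processed,
// which keeps the escape from swallowing the 'C'.
const wchar_t kMuCats[] = L"\x03BC" L"Cats";
```

As in the resource case, whether the character actually shows up depends on the font containing the Greek letter mu.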

Mihai N.

2007-09-09 02:42:30 UTC

Post by Giovanni Dicanio
So, in VC6 or VS2003 Unicode-built app we can't have e.g. a string-table
resource with Japanese characters in Unicode?

The compiled resource files are always Unicode.
The source resource files (.rc) can be Unicode, but you cannot edit them with
the resource editor in VS 6/2002/2003

It's OK in the VS 2005 editor, if you know about some of the bugs:
- The RichEdit controls in dialogs are always ANSI
(http://www.mihai-nita.net/article.php?artID=20050709b)
- The .rc is not Unicode unless you ask for it
(http://www.mihai-nita.net/article.php?artID=20051030a)
- The DLGINIT used for combo-boxes in MFC is always ANSI
(I have reported it for Orcas, marked as fixed)

And only UTF-16LE is supported (no UTF-8!)

Post by Giovanni Dicanio
Is there any workaround?
Should we use external custom file encoded e.g. in UTF-8 and read it and
convert it dynamically to UTF-16?

--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

Giovanni Dicanio

2007-09-09 10:47:00 UTC

Thank you!

Giovanni

----

Post by Giovanni Dicanio
Is there any workaround?
Should we use external custom file encoded e.g. in UTF-8 and read it and
convert it dynamically to UTF-16?

Set the system locale to Japanese and reboot.
(http://www.mihai-nita.net/article.php?artID=20050611a)
It is the best option, because you need WYSIWYG for proper resizing.
http://www.mihai-nita.net/article.php?artID=20070503a
--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

Norman Diamond

2007-09-10 00:46:17 UTC

Post by Tom Serface
Indeed, but it suggests MBCS by default and can't handle Unicode RC files.

This is still true for VS 2003.
VS 2005 was the first one to switch (and still buggy at that).

So, in VC6 or VS2003 Unicode-built app we can't have e.g. a string-table
resource with Japanese characters in Unicode?

It's the other way around. In VC6 or VS2003 Unicode-built apps, we can't
have e.g. a string-table resource with any NON-JAPANESE characters in
Unicode.

Post by Giovanni Dicanio
Is there any workaround?

Use Notepad to edit the RC file. (Facts are funnier than jokes, eh?)

Tom Serface

2007-09-10 15:18:08 UTC

Hi Norman,

You'd be surprised how many times I do this sort of thing. The problem is
with pre-2005 versions if you edit the wrong resource by mistake the RC
editor would trash all of your other resources (yielding ???) unless you
were in the correct region (locale) while editing. Fortunately, this
doesn't seem to be a problem with Unicode RC files. Still, I use Notepad to
make some changes since the search and replace works so much nicer :o)

Tom

Giovanni Dicanio

2007-09-10 19:34:49 UTC

Post by Tom Serface
You'd be surprised how many times I do this sort of thing.

Well, also the graphics/image-editing capabilities of Visual Studio are not
great, so also to edit images it is good to go to external "ad hoc"
programs...

G

Tom Serface

2007-09-08 16:06:40 UTC

OK, so same point, only I think that VS 2005 allows people to use Unicode RC
files without saving them outside the IDE.

I think people have more trouble updating from VC6 to VS.NET than they do
updating to any other version since then. I think it would make sense for
Microsoft to make a really easy upgrade path from VC6/VS6 to VS 2008 to
encourage people to move up.

Tom

Giovanni Dicanio

2007-09-07 09:14:51 UTC

Post by Joseph M. Newcomer
I somehow cannot get excited about this problem, given all the
additional problems of complex code, possibility of error, cost of development and
debugging, etc. that it would cost to use 8-bit characters.

I believe that both you and me (and others, of course) use Unicode for
strings.
My point was about UTF-16 vs UTF-8 (both *Unicode*, not ANSI 8 bits).

Giovanni

Mihai N.

2007-09-07 07:29:15 UTC

Post by Giovanni Dicanio
And maybe if memory space saving is the main target, UTF-8 could be used as
the encoding for Unicode, instead of UTF-16.
But maybe for historical reasons, it seems that internal Windows format for
Unicode is UTF-16 :(
On the other side, IIRC Mac OS X and Linux tend to use UTF-8, but I may be
in mistake...

As a general rule: UTF-16 for processing, UTF-8 for transfer/storage
(and like any general rule it has exceptions, but you have to know when
to do that)

Mac OS X string API uses UTF-16. Same for Apache Xerces
(XML parsing library), ICU (IBM's International Components for Unicode),
Qt, Java.
Here is a good read: http://unicode.org/notes/tn12/tn12-1.html

--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
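
As an illustration of the "UTF-16 for processing, UTF-8 for transfer/storage" rule, here is a small portable sketch that turns UTF-16 text into UTF-8 bytes just before it is written out. Utf16ToUtf8 is a name chosen for this example. Replacing unpaired surrogates with U+FFFD is an assumption of the sketch, not something stated in the thread. On Windows, WideCharToMultiByte with CP_UTF8 performs the same kind of conversion.

```cpp
#include <cstddef>
#include <string>

// Encode UTF-16 text as UTF-8. Unpaired surrogates become U+FFFD
// (a choice made for this sketch).
std::string Utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;                                   // consumed the low surrogate
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {     // stray low surrogate
            cp = 0xFFFD;
        }
        if (cp < 0x80) {                               // 1 byte
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {                       // 2 bytes
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {                     // 3 bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                                       // 4 bytes
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

English text shrinks to one byte per character this way, while most CJK characters grow from two bytes to three, which is part of why the rule is only a general one.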

Giovanni Dicanio

2007-09-07 09:13:00 UTC

Post by Mihai N.
As a general rule: UTF-16 for processing, UTF-8 for transfer/storage
(and like any general rule it has exceptions, but you have to know when
to do that)

Yes.

Post by Mihai N.
Mac OS X string API uses UTF-16. Same for Apache Xerces
(XML parsing library), ICU (IBM's International Components for Unicode),
Qt, Java.
Here is a good read: http://unicode.org/notes/tn12/tn12-1.html

Thank you for having corrected my wrong information about Mac OS X.
I'm going to read the web page you linked.

Giovanni

Tom Serface

2007-09-06 16:08:05 UTC

Yeah, I hadn't thought of that angle. Sometimes I get all focused on local
heap memory and forget about all the automatic stuff happening behind the
scenes. I can say with a degree of certainty that going to Unicode has
worked for us over the last few years. There was some initial shock in the
transition, but it wasn't difficult to work through. Now we just do
everything Unicode so it makes life easier all around.

Tom

PackAddict

2007-09-07 16:46:06 UTC

I'm struggling with a similar issue. I have encryption algorithms peppered
throughout some legacy code (some written in VB3). They don't work when a
regional code page that is Unicode based is selected.

Is there a way to override the system regional code page setting to force a
VB 6 application to use "English (United States)"?

Joseph M. Newcomer

2007-09-07 21:28:55 UTC

Encryption should be independent of locale, so I'm curious how there could be a problem.
In addition, "algorithms peppered throughout" the code suggests that there are deeper
architectural problems, since in most cases there is exactly ONE instance of the
algorithm, in ONE place. They are probably written in terms of char*, which assumes a
NUL-terminated 8-bit character string, which makes them instantly obsolete. They should
be written in terms of counted byte strings, not character strings.
joe

PackAddict

2007-09-10 15:52:14 UTC

I probably described the problem wrong.

A better description might be that there are Asc and Chr function calls
throughout the code, not duplication of algorithms throughout the code.
Those Asc and Chr function calls cause problems when a Chinese code page is
set as the default language. Each time that we hit a hex value with no
corresponding character in the code page, we get a "?" returned.
Needless to say, that causes some significant discrepancies when
encrypting/decrypting a string of data.

I figured that I was going to have to move to byte arrays, but thought I'd
take a stab in the dark at a solution that would allow me to just override the
code page.
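
The byte-array approach mentioned here might look roughly like the sketch below. It is written in C++ rather than VB, and XorBytes is a toy transform used purely for illustration (it is not real encryption and not the poster's algorithm). Bytes, XorBytes and ToHex are names invented for the example. The point is that the data never passes through a character conversion, so no code page can turn a byte into "?".

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Toy XOR transform, for illustration only (not real encryption).
// Applying it twice with the same key restores the input.
Bytes XorBytes(const Bytes& data, const Bytes& key) {
    Bytes out(data);
    if (key.empty()) return out;
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] ^= key[i % key.size()];
    }
    return out;
}

// Hex text is plain ASCII, so it survives any code page unchanged
// if the result has to be stored or displayed as a string.
std::string ToHex(const Bytes& data) {
    static const char kDigits[] = "0123456789ABCDEF";
    std::string hex;
    hex.reserve(data.size() * 2);
    for (std::size_t i = 0; i < data.size(); ++i) {
        hex.push_back(kDigits[data[i] >> 4]);
        hex.push_back(kDigits[data[i] & 0x0F]);
    }
    return hex;
}
```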

Joseph M. Newcomer

2007-09-16 03:56:54 UTC

Since VB3 is an antique, I'm not at all sure what it was doing with its functions. So I
can't even begin to think about what might be going on (I thought VB3 was the Win16
version!)
joe

Giovanni Dicanio

2007-09-05 07:52:02 UTC

Post by Marco Hung
I've created a MFC project in MBCS. I need to show some set special
characters
( ASCII code > 128) in a CStatic controls. It shows correctly in all
English locale window.
However all those special character becames "?" in non-English window.
How to solve this problem?

Hi,

This is a classical example of the importance of using *Unicode* to store
characters and strings.
IMHO, you should forget about ANSI (or MBCS), and consider *Unicode* as the
type for characters and strings (like modern programming languages like
Java, Python, C#, etc. do).

Basically, Unicode provides a *unique* number for every character, no matter
what the programming language, or the operating system, etc.

I don't know what character you want to display, but e.g. suppose that you
want to display a lower-case Greek "omega" (kind of "w").
In Unicode UTF-16 encoding, the "unique number" associated with this character
is 0x03C9 (hex, note that it's 16 bits, not 8 bits like for ANSI).

The C++ code to display that character in a message-box is like so:

// Build a string of Unicode UTF-16 characters:
// "omega" (0x03C9), end-of-string (0x0000)
wchar_t omega[] = { 0x03C9, 0x0000 };

// Display Unicode text (note the W and the L)
MessageBoxW( NULL, omega, L"Unicode Test", MB_OK );

The L before "Unicode Test" string literal identifies this string as Unicode
and not ANSI.
The W after MessageBox is a Win32 API naming convention to identify the
Unicode (and not the ANSI) version of MessageBox API.

If you compile in Unicode mode, you can avoid the W and just write
MessageBox; the C/C++ preprocessor will expand MessageBox as MessageBoxW.

You might find both the Unicode FAQ http://unicode.org/faq/ and Mihai's blog
http://www.mihai-nita.net/ interesting.

Giovanni
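
The way a generic name such as MessageBox resolves to its A or W form can be sketched with a pair of ordinary functions and a macro. TextLengthA, TextLengthW, TextLength and MY_TEXT are hypothetical names invented for this illustration. The real Windows headers follow the same pattern with the UNICODE symbol, but this block does not include or depend on them.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical API pair (not real Win32 functions): each returns the number
// of code units in its argument.
std::size_t TextLengthA(const char* text)    { return std::strlen(text); }
std::size_t TextLengthW(const wchar_t* text) { return std::wcslen(text); }

// One generic name that resolves to the A or W form at compile time,
// plus a literal macro that adds the L prefix only in Unicode builds.
#ifdef UNICODE
#define TextLength TextLengthW
#define MY_TEXT(s) L##s
#else
#define TextLength TextLengthA
#define MY_TEXT(s) s
#endif
```

A call written as TextLength(MY_TEXT("omega")) compiles in both kinds of build. Calling TextLengthW directly, like calling MessageBoxW directly, works regardless of the build setting.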

Joseph M. Newcomer

2007-09-05 13:15:13 UTC

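There are many problems. First, there is no reason to fill the array one element at a time as you show; you could
just as easily have written

TCHAR stringToShow[] = { 129, 130, 131, 132, 133, 134, 135, 136, 137, 0 };
or

TCHAR stringToShow[] = _T("\x81\x82\x83\x84\x85\x86\x87\x88\x89");

(Note the terminating 0 in the first form; the string literal in the second form supplies it automatically.)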

You should NOT be using GetDlgItem; this should be considered as obsolete except in very
rare and exotic situations, which you do not have an instance of. Create member
variables.

Part of the problem is that you are using MBCS, which means that character codes >=128 are
not actual characters, but part of a multibyte encoding, and therefore they are going to
be misinterpreted in all kinds of fascinating ways.
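To make the misinterpretation concrete, here is a minimal sketch that assumes the system code page is 932 (Shift-JIS), where bytes 0x81-0x9F and 0xE0-0xFC are lead bytes of two-byte characters. IsCp932LeadByte, ScanResult and ScanAsCp932 are hypothetical names; the scan only pairs bytes up and does not check whether each pair is a defined character.

```cpp
#include <cstddef>

// Lead-byte ranges of code page 932 (Shift-JIS): 0x81-0x9F and 0xE0-0xFC.
inline bool IsCp932LeadByte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

struct ScanResult {
    std::size_t singles;   // bytes read as one-byte characters
    std::size_t pairs;     // lead byte + trail byte read as one character
    bool danglingLead;     // buffer ended right after a lead byte
};

// Walk the buffer the way a DBCS-aware parser would: a lead byte
// swallows the byte that follows it. Trail bytes are not validated.
inline ScanResult ScanAsCp932(const unsigned char* s, std::size_t n) {
    ScanResult r{0, 0, false};
    std::size_t i = 0;
    while (i < n) {
        if (IsCp932LeadByte(s[i])) {
            if (i + 1 < n) { ++r.pairs; i += 2; }
            else           { r.danglingLead = true; ++i; }
        } else {
            ++r.singles;
            ++i;
        }
    }
    return r;
}
```

Under that assumption, the nine bytes 0x81 through 0x89 from the original post are read as four two-byte pairs followed by a dangling lead byte, not as nine separate characters.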

As already pointed out, forget that MBCS exists. It is dead technology. Use Unicode.
There is no real choice these days.
joe

Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

Marco Hung

2007-09-06 01:29:02 UTC

Thx all of you.

I understand that Unicode is the best way to handle strings in a modern
application. However, my application needs to communicate with an "old" system
through some API calls, which always return strings in "single character"
format. I think MBCS may be the only choice for it.

I've tried to convert the string to Unicode using functions like
"MultiByteToWideChar" and "SetWindowTextW", but the output on the display is the same.
Is there any way to make the conversion work correctly on all language versions of Windows?

Marco

David Wilkinson

2007-09-06 01:41:09 UTC

Marco:

If you know the code page of the 8-bit strings, then
MultiByteToWideChar() should work. If you don't you are in trouble.

--
David Wilkinson
Visual C++ MVP

Joseph M. Newcomer

2007-09-06 04:20:20 UTC

Note that MBCS is not the same as "ANSI" (a bad name choice). MBCS uses sequences of
8-bit characters to represent characters, and as far as I know, there are no API calls
that take MBCS strings. They take either ANSI or Unicode.

You can't just say "MultiByteToWideChar" since there are critical parameters that you have
omitted telling us about, such as what code page you specified, and whether or not you
have true MBCS (e.g., UTF-7, UTF-8) or just 8-bit characters. Certainly the example you
gave of 128, 129, 130, ...137 is not UTF-8, and in fact these code points are not defined
in most character sets (although 128 is the official Euro symbol in a lot of fonts), so
you have supplied rather incomplete information on what you are doing, trying to do, and
how you are doing it. MBCS is *not* a substitute for ANSI, since there are no APIs that
actually use it. So you need to say a lot more about what is going on here before the
question even begins to make sense.
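The point that these bytes cannot be UTF-8 can be checked mechanically: values 0x80-0xBF are continuation bytes and may only appear after a lead byte. The following is a minimal sketch with a hypothetical helper name, LooksLikeUtf8. It checks only the lead/continuation structure and does not reject overlong forms or surrogates, so it is not a full validator.

```cpp
#include <cstddef>

// Returns true if the buffer has valid UTF-8 lead/continuation structure.
// Simplified on purpose: overlong encodings and surrogates are not rejected.
inline bool LooksLikeUtf8(const unsigned char* s, std::size_t n) {
    std::size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        std::size_t extra;                       // continuation bytes expected
        if (b < 0x80)                extra = 0;  // ASCII
        else if ((b & 0xE0) == 0xC0) extra = 1;  // 110xxxxx
        else if ((b & 0xF0) == 0xE0) extra = 2;  // 1110xxxx
        else if ((b & 0xF8) == 0xF0) extra = 3;  // 11110xxx
        else return false;  // stray continuation byte (0x80-0xBF) or invalid lead

        if (extra > n - i - 1) return false;     // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((s[i + k] & 0xC0) != 0x80) return false;  // must be 10xxxxxx
        }
        i += extra + 1;
    }
    return true;
}
```

A buffer holding 129, 130, ... 137 fails on its first byte, whereas the two bytes 0xCF 0x89 (the UTF-8 form of U+03C9) pass.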
joe

Marco Hung

2007-09-06 06:22:57 UTC

Sorry for my misleading question. Let me explain my problem in more detail.

My application calls an external DLL, which returns a string as its
result ( should be a list of ASCII code from 0~255 ). My application
then displays the result in an edit box.

The result consists only of the characters A~Z plus 2 special characters
( ? (0x87) & € (0xA4) ). The edit box displays them correctly if I run my
application on English Windows. However, on a non-English system, both of these
characters are displayed as "?".

Here's my exact coding in my application.

OnStart(CString strCommand)
{
    CMyLiberaryObject MyLibObj;
    char *strResult = MyLibObj.ProcessCommand( (LPCTSTR) strCommand ); // return type is char*

    BSTR bstr = NULL;
    int nConvertedLen = MultiByteToWideChar(1252, MB_COMPOSITE, strResult, -1, NULL, NULL);
    bstr = ::SysAllocStringLen(NULL, nConvertedLen);
    if (bstr != NULL)
        MultiByteToWideChar(1252, MB_COMPOSITE, (LPCTSTR)strResult, -1, bstr, nConvertedLen);
    SetWindowTextW(GetDlgItem(IDC_ED_CMDRESULT)->GetSafeHwnd(), bstr);
    SysFreeString(bstr);

    MyLibObj.Complete();
}

Rgds,
Marco

David Wilkinson

2007-09-06 09:34:13 UTC

Marco:

If your characters are all ISO-8859-1 characters (as would seem to be
the case) then, as I said before, you should just be able to copy (not
convert) them into an array of wchar_t, and use SetWindowTextW. This is
because the first 256 code points of Unicode (and the UTF-16 encoding of
it) are the same as ISO-8859-1. Or you could use MultiByteToWideChar()
with the code page always set to English. You do not want to use
MultiByteToWideChar() with the local code page.
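One caveat on the copy approach: it relies on the bytes really being ISO-8859-1. Windows code page 1252 agrees with ISO-8859-1 everywhere except 0x80-0x9F, so a byte such as 0x87 becomes the control code U+0087 by plain copy but U+2021 (double dagger) under 1252. Byte 0xA4 is the currency sign U+00A4 in both; it is the euro sign only in ISO-8859-15. The sketch below contrasts the two mappings in portable C++. WidenLatin1 and DecodeCp1252 are hypothetical helper names, and returning 0 for the five bytes that 1252 leaves undefined is a choice made for this example, not a statement about what MultiByteToWideChar() returns.

```cpp
#include <string>

// Widen by plain copy: correct for ISO-8859-1, where byte value == code point.
inline std::u16string WidenLatin1(const std::string& bytes) {
    std::u16string out;
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char16_t>(b));
    }
    return out;
}

// Decode one byte as Windows-1252. Only 0x80-0x9F differ from ISO-8859-1.
// 0 marks the bytes that 1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
inline char16_t DecodeCp1252(unsigned char b) {
    static const char16_t high[32] = {
        0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
        0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
    };
    if (b >= 0x80 && b <= 0x9F) {
        return high[b - 0x80];
    }
    return b;  // identical to ISO-8859-1 outside 0x80-0x9F
}
```

If the legacy data uses 0x80-0x9F at all, that is a hint that it is 1252 rather than true ISO-8859-1, and a table-driven or MultiByteToWideChar()-based conversion is the safer route.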

Actually, I am confused by your code. The only purpose of using TCHAR,
LPCTSTR, etc., is to have an app that will compile as both ANSI and
Unicode. This surely cannot be the case for you, as this would mean that
your legacy CMyLiberaryObject::ProcessCommand() would have to accept a
const wchar_t* and return a char*.

I think you would be best to write your whole app in Unicode and do what
you have to to convert to and from 8-bit strings only when using your
legacy library.

--
David Wilkinson
Visual C++ MVP

David Wilkinson

2007-09-06 14:36:06 UTC

Marco:

I see you are already converting using code page 1252 (I didn't notice
that before). This should work if you do it correctly, but I'm not sure
you are (see Joe's reply).

--
David Wilkinson
Visual C++ MVP

Joseph M. Newcomer

2007-09-06 14:13:58 UTC

See below...

****
There's a problem here. What is the parameter of the function ProcessCommand? Is it
really LPCTSTR (8-bit or Unicode depending on compilation mode)? Or is it 8-bit? The
LPCTSTR cast would be dangerous in a Unicode build if the function takes char *.

Given it returns a char *, who is freeing it? This is inherently dangerous that it would
return a pointer to a fixed buffer, so it should really be returning a CStringA, or at the
very least a char * on the heap which needs to be freed.

Code that returns a pointer to a fixed buffer is not thread-safe, and should be considered
*dangerously obsolete* at this point (think Unicode, think multithreading, ALWAYS)
***

Post by Marco Hung
BSTR bstr = NULL;

****
Why are you allocating a BSTR here? Why not an LPWSTR? BSTRs have additional overheads,
such as reference counting, and since you are not using any of those, an LPWSTR would be
fine.
****

Post by Marco Hung
int nConvertedLen = MultiByteToWideChar(1252, MB_COMPOSITE, strResult
, -1, NULL, NULL);

****
This converts the string using code page 1252 (Windows Western European, which matches
ISO-8859-1 (Latin-1) except in the range 0x80-0x9F). Given
that you have said that you only use A-Z and two special characters, MB_COMPOSITE has no
meaning here, and should be omitted.
****

Post by Marco Hung
bstr = ::SysAllocStringLen(NULL, nConvertedLen);

****
LPWSTR bstr = new WCHAR[nConvertedLen];

There was no need to declare and initialize a pointer before it is used, and there is
certainly no need for a BSTR, so get rid of it.
****

Post by Marco Hung
if (bstr != NULL)
MultiByteToWideChar(1252, MB_COMPOSITE, (LPCTSTR)strResult , -1,
bstr, nConvertedLen);

****
Get rid of the MB_COMPOSITE
****

Post by Marco Hung
SetWindowTextW(GetDlgItem(IDC_ED_CMDRESULT)->GetSafeHwnd(), bstr);

****
Create a control variable. Generally, assume that if you have written GetDlgItem, except
in EXTREMELY RARE CIRCUMSTANCES (of which this is not one) you have made a fundamental
design error. Because you are trying to write a Unicode string in an ANSI app, you would
need to write
::SetWindowTextW(c_Result.m_hWnd, bstr);
although it would make much more sense to compile this as a Unicode app (beware the
parameter issue already mentioned!) and just write
c_Result.SetWindowText(bstr);
****

Post by Marco Hung
SysFreeString(bstr);

*****
delete [] bstr;
why use something as complicated as a BSTR for such a trivial purpose?

Now you've got some other issues here. For example, what font is loaded into the edit
control? Is the result of the MultiByteToWideChar correct, or does it already have the
erroneous '?' in it? There are too many variables here and you have not isolated the
problem adequately.
****

Post by Marco Hung
MyLibObj.Complete();
}
Rgds,
Marco

Norman Diamond

2007-09-07 00:18:14 UTC

Post by Marco Hung
( should be a list of ASCII code from 0~255 )

ASCII codes are 0~127.

If you're having code page problems it's because you're dealing with ANSI
code pages other than ASCII. Some code pages (mostly European) are 0~255.
Some (Asian) are basically 0~65535, but of course some portions of that
range can't be used, so they use 0~127 and part of 32768~65535.

If a value isn't a valid character in your code page (for example number 529
in code page 1252 or number 129 in code page 932) then of course you get
garbage.

Uwe Kotyczka

2020-04-27 09:16:18 UTC

[...]
wchar_t* str = [...]; // generate unicode string properly
SetWindowTextW(GetDlgItem(IDC_ED_CMDRESULT)->GetSafeHwnd(), str);

I know, I'm a little late (almost 13 years in fact). But I just had
to face the same problem and found a solution, which is not mentioned
in this old thread.

In my case I wanted to show an "arrow up" sign on a square button.
I found that SetWindowTextW was working and was happy. But then I
compiled the very same project on another computer and found that
the button did not show an "arrow up", but a "question mark" instead.
Digging around with GetWindowTextW I found that the button returned
different wide character strings on the two machines.

I found this working on both computers:

BOOL WINAPI SafeSetWindowTextW(HWND hWnd, LPCWSTR lpString)
{
// switch to DefWindowProcW
LONG_PTR originalWndProc = GetWindowLongPtrW(hWnd, GWLP_WNDPROC);
SetWindowLongPtrW(hWnd, GWLP_WNDPROC, (LONG_PTR) DefWindowProcW);

// set window text
BOOL bResult = SetWindowTextW(hWnd, lpString);

// switch back to originalWndProc
SetWindowLongPtrW(hWnd, GWLP_WNDPROC, originalWndProc);

return bResult;
}

LOGFONT lf;
HFONT hOrigFont = (HFONT)SendMessage(GetDlgItem(hDlg, IDC_BUTTON_SHIFT_UP), WM_GETFONT, 0, 0);
GetObject(hOrigFont, sizeof(lf), &lf);
lf.lfHeight *= 2;
memset(lf.lfFaceName, 0, sizeof(lf.lfFaceName));
strncpy(lf.lfFaceName, "Consolas", min(sizeof(lf.lfFaceName)-1, strlen("Consolas")));
HFONT hOtherFont = CreateFontIndirect(&lf);
SendMessage(GetDlgItem(hDlg, IDC_BUTTON_SHIFT_UP), WM_SETFONT, (WPARAM)hOtherFont, 0);
SendMessage(GetDlgItem(hDlg, IDC_BUTTON_SHIFT_DOWN), WM_SETFONT, (WPARAM)hOtherFont, 0);
SafeSetWindowTextW(GetDlgItem(hDlg, IDC_BUTTON_SHIFT_UP), L"\x2191");
SafeSetWindowTextW(GetDlgItem(hDlg, IDC_BUTTON_SHIFT_DOWN), L"\x2193");

Of course this will not work on Win9x, but it shouldn't be a problem nowadays.
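The thread does not say why swapping the window procedure helps. One plausible reading, offered here purely as an assumption, is that the control was an ANSI window, so the wide text was converted to the system's ANSI code page on its way to the window procedure, and characters missing from that code page came back as "?". The sketch below only models that kind of lossy round trip in portable C++. It uses Latin-1 as a stand-in code page and calls no Windows API; NarrowToLatin1, WidenBytes and RoundTripThroughAnsi are hypothetical names.

```cpp
#include <string>

// Narrow UTF-16 to a single-byte code page (Latin-1 as a stand-in).
// Anything the code page cannot represent is replaced by '?'.
inline std::string NarrowToLatin1(const std::u16string& wide) {
    std::string out;
    for (char16_t c : wide) {
        out.push_back(c <= 0xFF ? static_cast<char>(c) : '?');
    }
    return out;
}

// Widen the single-byte text back to UTF-16.
inline std::u16string WidenBytes(const std::string& bytes) {
    std::u16string out;
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char16_t>(b));
    }
    return out;
}

// Model of wide text passing through an 8-bit stage and back.
inline std::u16string RoundTripThroughAnsi(const std::u16string& wide) {
    return WidenBytes(NarrowToLatin1(wide));
}
```

In this model U+2191 (the up arrow) does not survive the round trip, while text that fits in the 8-bit code page does, which would match a symptom that differs from one machine to another.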

HTH

Pitzelpatz

2020-05-20 12:09:03 UTC

Post by Uwe Kotyczka
// switch to DefWindowProcW
LONG_PTR originalWndProc = GetWindowLongPtrW(hWnd, GWLP_WNDPROC);
SetWindowLongPtrW(hWnd, GWLP_WNDPROC, (LONG_PTR) DefWindowProcW);
// set window text
BOOL bResult = SetWindowTextW(hWnd, lpString);
// switch to back to originalWndProc
SetWindowLongPtrW(hWnd, GWLP_WNDPROC, originalWndProc);

Hi Uwe,

great! Thank you, I had the same Problem today and your solution worked
for me.

Br
Christian

Mihai N.

2007-09-06 06:49:30 UTC

This is again the Windows lingo.
- SBCS = single byte character set
- DBCS = double byte character set
- MBCS = multi-byte character set

So DBCS is a particular case of MBCS.

The only encoding supported by Windows that is MBCS without being DBCS is
GB-18030 (and I bet this is the one used by the legacy thing :-)
When you define DBCS in an application there are very few APIs affected.

Technically UTF-7/UTF-8 are not MBCS/DBCS/SBCS, because they are not CS
(character sets).
The code pages used for CCJK are MBCS.

In the Windows lingo ANSI code page means "the default system code page"
So some DBCS can be ANSI (932 Jp, 936 SC, 949 Ko, 950 TC)

My best bet: the old legacy thing uses ANSI (because all the system
calls used to be ANSI).
So you can probably use the ANSI code page (CP_ACP or as returned by GetACP)
If the old thing was designed to only handle English, then you can probably
safely use 1252 for conversion.

In any case, you should expect some data loss.

--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

Norman Diamond

2007-09-07 00:12:45 UTC

Post by Joseph M. Newcomer
Note that MBCS is not the same as "ANSI"

Huh?

Post by Joseph M. Newcomer
(a bad name choice).

Yes "ANSI" is a bad name choice, but the meaning is the same as MBCS.

Post by Joseph M. Newcomer
MBCS uses sequences of 8-bit characters to represent characters,

Yes.

Post by Joseph M. Newcomer
and as far as I know, there are no API calls that take MBCS strings.

The ones that end in "A" take MBCS strings. Most of them work by converting
to Unicode before calling NT internal routines and converting back to MBCS
before returning to the caller. Some such as WTSQuerySessionInformationA
don't work. (ANSI applications have to call WTSQuerySessionInformationW
explicitly, including the W, and do the conversions themselves.)

Post by Joseph M. Newcomer
They take either ANSI or Unicode.

Yes. The ones that end in A take "ANSI" i.e. MBCS, and the ones that end in
W take Unicode i.e. UTF-16.

Mihai N.

2007-09-07 07:01:30 UTC

Post by Norman Diamond
Yes "ANSI" is a bad name choice, but the meaning is the same as MBCS.

Almost.

ANSI can be SBCS or MBCS. But it is one of them.
The system has one ANSI code page and only one at a certain time
(the system code page), and changing it requires a reboot.

932 (Shift-JIS), 950 (Big5), etc, are all MBCS.
Any one of them can be the ANSI code page in a certain session.
But not all of them.
Then you have other code pages, like EUC-JP or GBK, that are DBCS,
but cannot be ANSI (they can never be used as system locale).

But this is just lingo.

For a programmer using Dev Studio the lingo means something else.

If you go in Dev Studio you only have 3 options for Character set
1. Not set (nothing defined)
2. Multi-Byte Character Set (_MBCS defined)
3. Unicode Character Set (_UNICODE and UNICODE defined)

In most cases there is no difference between 1. and 2.
If you use MessageBox, for 1. and 2. it will become MessageBoxA,
and for 3. it will become MessageBoxW.
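
A rough, portable sketch of how that name switching works. ShowTextA, ShowTextW, MY_TCHAR and MY_TEXT are made-up stand-ins (the real headers use names such as MessageBoxA/MessageBoxW, TCHAR and TEXT); only the UNICODE macro test mirrors what the Windows headers do.

```cpp
#include <string>

// Hypothetical stand-ins for an A/W API pair such as MessageBoxA / MessageBoxW.
inline std::string ShowTextA(const char*)    { return "A (narrow) version"; }
inline std::string ShowTextW(const wchar_t*) { return "W (wide) version"; }

// One generic name, resolved by the preprocessor depending on UNICODE.
#ifdef UNICODE
  #define ShowText   ShowTextW
  #define MY_TEXT(s) L##s
  typedef wchar_t MY_TCHAR;
#else
  #define ShowText   ShowTextA
  #define MY_TEXT(s) s
  typedef char MY_TCHAR;
#endif

// Source code written against the generic names compiles either way.
inline std::string WhichVersion() {
    const MY_TCHAR* msg = MY_TEXT("hello");
    return ShowText(msg);
}
```

Built without UNICODE defined (cases 1. and 2.), WhichVersion() goes through the A stand-in; built with UNICODE defined (case 3.), the same source goes through the W stand-in.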

But look at things like _tcsclen.
In case 1. it will become strlen, in case 2. it becomes _mbslen,
and in case 3. it becomes wcslen.

This is why sometimes you have to be very careful about what you use
when you convert to generic text handling. Will you replace strlen
with _tcslen, or with _tcsclen?
(in most cases the answer is _tcslen, but there are exceptions)
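
To make the difference concrete, here is a small portable sketch that counts bytes versus characters in a code page 932 (Shift-JIS) string. CountCp932Chars is a hypothetical helper written for illustration, not a CRT function; it only approximates what _mbslen reports when the multibyte code page is 932.

```cpp
#include <cstddef>
#include <cstring>

// Lead-byte ranges for code page 932 (Shift-JIS): 0x81-0x9F and 0xE0-0xFC.
inline bool IsCp932LeadByte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Counts characters rather than bytes (hypothetical helper, illustration only).
inline std::size_t CountCp932Chars(const char* s) {
    std::size_t count = 0;
    while (*s) {
        // A lead byte followed by a non-NUL byte forms one two-byte character.
        if (IsCp932LeadByte(static_cast<unsigned char>(*s)) && s[1] != '\0') {
            s += 2;
        } else {
            s += 1;  // single-byte character (or a dangling lead byte)
        }
        ++count;
    }
    return count;
}
```

For the four bytes "\x93\xFA\x96\x7B" (two kanji in Shift-JIS), strlen reports 4 while CountCp932Chars reports 2. Buffer sizes need the byte count (_tcslen); anything that counts what the user sees needs the character count (_tcsclen).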

--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

Tom Serface

2007-09-07 15:07:34 UTC

If I remember correctly, although technically incorrect, in regards to
Windows and MFC specifically, ANSI is MBCS. I've always considered SBCS to
just be a subset of MBCS.

Of course any program I've ever done has either MBCS or UNICODE defined so
perhaps that's where I'm getting it.

Tom

Mihai N.

2007-09-08 05:32:25 UTC

Post by Tom Serface
If I remember correctly, although technically incorrect, in regards to
Windows and MFC specifically, ANSI is MBCS. I've always considered SBCS to
just be a subset of MBCS.

I would agree that SBCS is just a subset of DBCS,
and DBCS a subset of MBCS.
ANSI is the MBCS that is currently the system code page :-)

The MS lingo in this area is a mess, so one should be pretty
flexible with the definitions here :-)

For a programmer the only important part is: what are the implications
of defining _MBCS / UNICODE / _UNICODE / nothing?

--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

Tom Serface

2007-09-08 16:08:04 UTC

No doubt about it being a mess. I think it would be wise to just abandon
ANSI/MBCS/Whatever and go to Unicode with good file handling for UTF-8 in a
future version of MFC. Dropping ANSI doesn't seem to keep anyone from using
C# and .NET.

Tom

Mihai N.

2007-09-06 06:40:11 UTC

Post by Marco Hung
I understand that Unicode is the best way of string operations for modern
applications. However, my application needs to communicate with an "Old" system
through some API calls, which will always return strings in "single character"
format. I think MBCS may be the only choice for it.
I've tried to convert the string to Unicode using functions like
"MultiByteToWideChar" and "SetWindowTextW", but the display output is the same.
Is there any way to make the conversion work correctly in all language windows?

Then maybe the best thing is to make the whole application Unicode, and
convert back and forth when you communicate with the legacy part.
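
A minimal, portable sketch of converting at the boundary with a fixed code page instead of whatever the system's ANSI code page happens to be. The thread never says which code page the legacy system uses; code page 437 is assumed here purely for illustration, and the table covers only bytes 0x81-0x89 (the values from the original post). DecodeLegacyByte and LegacyToUtf16 are hypothetical names.

```cpp
#include <string>

// ASSUMPTION: the legacy system sends bytes in code page 437.
// Partial table: bytes 0x81..0x89 map to u-umlaut, e-acute, a-circumflex,
// a-umlaut, a-grave, a-ring, c-cedilla, e-circumflex, e-umlaut.
inline char16_t DecodeLegacyByte(unsigned char b) {
    static const char16_t kHigh[] = {
        0x00FC, 0x00E9, 0x00E2, 0x00E4, 0x00E0,
        0x00E5, 0x00E7, 0x00EA, 0x00EB
    };
    if (b < 0x80) return b;                              // plain ASCII
    if (b >= 0x81 && b <= 0x89) return kHigh[b - 0x81];  // covered range
    return 0xFFFD;  // replacement character: not in this partial table
}

// Converts a legacy byte string to UTF-16 without consulting the system locale.
inline std::u16string LegacyToUtf16(const std::string& bytes) {
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        out.push_back(DecodeLegacyByte(b));
    }
    return out;
}
```

Because the mapping does not depend on the system locale, the result is the same on every language version of Windows. In a real Windows build the same job would normally go to MultiByteToWideChar called with the legacy system's actual code page number rather than CP_ACP, with the result passed to SetWindowTextW.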

--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

Tom Serface

2007-09-06 16:06:23 UTC

Hi Marco,

You can get this to work so long as you know the code page you need for the
language or you are running only on the machine where that language is
installed and the correct region is set. We tried this for years and could
never get it to work right since our software was installed in so many
configurations so we finally went to Unicode and we just convert the
external strings and files to Unicode to use them rather than trying to go
the other way. So far this approach has worked well. So to answer your
question, yes you can theoretically get it to work, but the number of
parameters involved is often difficult to control.

Tom

David Wilkinson

2007-09-06 01:33:51 UTC