Discussion:
wprintf() cannot display Unicode characters above 255 on the console?
David
2006-01-12 10:58:29 UTC
Hi,

I have a problem using wprintf() to display Unicode characters above 255 in
my console app. I am sure I have defined _UNICODE in my VC6 project
settings.

I searched with Google and found that my problem is similar to this
case:
http://blog.kalmbachnet.de/?postid=23 .

Is it true that wprintf() has bugs?

Thanks for your help.

David
Vipin
2006-01-12 11:56:25 UTC
I assume you are writing a C++ program. Can you prefer the ostream objects
cout/wcout over printf/wprintf?

Use wcout like this:

wcout << L"How are you?";
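A minimal, self-contained sketch of that suggestion. Writing to a std::wostream& instead of hard-coding std::wcout is my own choice here (the function name greet is hypothetical), so the same code can target the console or an in-memory string. As later replies point out, whether characters above 255 actually appear still depends on the console's code page, the C++ locale and the console font, not on the choice of API.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Writes the greeting to any wide output stream: std::wcout for the
// console, or a std::wostringstream when the text is needed in memory.
void greet(std::wostream& out) {
    out << L"How are you?";
}
```

Usage: `greet(std::wcout);` prints the text on the console.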
--
Vipin Aravind
MVP
Joseph M. Newcomer
2006-01-14 07:52:49 UTC
Why use something as dead-obsolete as wprintf anyway? Use CString::Format instead.
wprintf was a kludge for 16-bit Windows and is maintained only for backward compatibility
with obsolete code.

It is probably safe to assume that wprintf has bugs. But since there is absolutely no
sane reason to consider using it, forget it exists.
joe
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
Mihai N.
2006-01-15 02:04:48 UTC
Post by Joseph M. Newcomer
Why use something as dead-obsolete as wprintf anyway?
Use CString::Format instead.
wprintf was a kludge for 16-bit Windows and is maintained only for backward
compatibility with obsolete code.
wprintf is ANSI C, not obsolete. It deals with Unicode strings, which were
not even there in 16-bit Windows, so I think you are confusing it with
something else.

For Unicode console output, CString::Format, wprintf and wcout are equivalent;
the limitations belong to the console.

On the other hand, I agree that if you don't need cross-platform code,
CString::FormatMessage is better, because it allows you to change the order
of the placeholders without changing the order of the parameters.
This is good for internationalization.
CString::Format and printf are equivalent, and cout (iostream) is crap,
because it encourages concatenation.

===================================================================
This is not directed to Joe, but to whoever thinks iostream is ok:
-------------------------------------------------------------------
C++ die-hards will complain here, and I know iostream is type-safe (unlike
the other options). But it is a mess to use for internationalization.
If you really want "C++ purity", then you can go with the Boost Format library.

And before attacking me on this, read the two items on string formatters in
"Exceptional C++ Style" :-)
-------------------------------------------------------------------
--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
David
2006-01-15 08:56:21 UTC
Thank you all.

I had to use WideCharToMultiByte() to convert the Unicode strings to MBCS
before printing them to the console with printf(). It works fine now. Maybe
this is a by-design defect of the Win2K command prompt. :-)
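A sketch of the conversion described above, assuming the target code page is passed in explicitly (CP_OEMCP for the console, CP_ACP for ANSI text). The helper name to_mbcs is hypothetical, and the non-Windows branch is my own portable stand-in: it uses wcstombs with the current C locale and ignores the code page argument.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Converts a wide string to a multibyte (MBCS) string suitable for printf().
std::string to_mbcs(const std::wstring& wide, unsigned codepage) {
    if (wide.empty()) return std::string();
#ifdef _WIN32
    // First call asks for the buffer size; -1 means "NUL-terminated input",
    // so the returned size includes the terminator.
    int size = WideCharToMultiByte(codepage, 0, wide.c_str(), -1,
                                   nullptr, 0, nullptr, nullptr);
    if (size <= 0) return std::string();
    std::vector<char> buf(size);
    WideCharToMultiByte(codepage, 0, wide.c_str(), -1,
                        buf.data(), size, nullptr, nullptr);
    return std::string(buf.data());  // stops at the terminating NUL
#else
    (void)codepage;  // no code pages here: the C locale decides the encoding
    std::size_t size = std::wcstombs(nullptr, wide.c_str(), 0);
    if (size == static_cast<std::size_t>(-1)) return std::string();  // unconvertible char
    std::vector<char> buf(size + 1);
    std::wcstombs(buf.data(), wide.c_str(), buf.size());
    return std::string(buf.data(), size);
#endif
}
```

On Windows the call would be something like `printf("%s\n", to_mbcs(text, CP_OEMCP).c_str());`. Which code page to pass is exactly the ANSI vs. OEM question raised in the next reply.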
Mihai N.
2006-01-15 19:29:58 UTC
Post by David
I have to use WideCharToMultiByte() to convert the unicode strings to MBCS
before I print them to console with printf(). It works fine now. Maybe this
is a defect of DOS Command of Win2K by design. :-)
I would say it is the command prompt. And Win2K (and newer) cannot really
change this because of backward compatibility. And maybe it is low priority:
who cares about the console anymore?

Personally, I can hardly wait for Monad.

--------------------------------

You "convert the unicode strings to MBCS", but you should be careful what
code page you use, ANSI or OEM.
Alternatively, you might try changing the console code page with chcp or
programmatically (SetConsoleCP, SetConsoleOutputCP, SetFileApisToANSI).
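A minimal sketch of the programmatic route, assuming a Win32 console. SetConsoleOutputCP and GetConsoleOutputCP are the real Win32 calls; the wrapper name set_console_output_code_page is hypothetical. Note that SetConsoleCP only changes the input code page, and the characters will still only show up if the console font (a TrueType font rather than the raster font) has the glyphs.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#endif

// Switches the console's output code page and reports whether the change
// took effect. Off Windows there is no Win32 console, so it returns false.
bool set_console_output_code_page(unsigned codepage) {
#ifdef _WIN32
    if (!SetConsoleOutputCP(codepage)) return false;  // e.g. no console attached
    return GetConsoleOutputCP() == codepage;
#else
    (void)codepage;
    return false;
#endif
}
```

For example, `set_console_output_code_page(GetACP());` makes the console interpret bytes converted with CP_ACP, which is the alternative to converting with CP_OEMCP.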

Here are some links about console and international text:

Consoling people about their troubles with the console.
(http://blogs.msdn.com/michkap/archive/2005/06/29/433669.aspx)

Working with the OEMCP and the ACP
(http://blogs.msdn.com/michkap/archive/2006/01/02/508576.aspx)

Why ACP != OEMCP (usually)
(http://blogs.msdn.com/michkap/archive/2005/02/08/369197.aspx)
Alexander Grigoriev
2006-01-16 03:30:07 UTC
The console uses the OEM code page by default (a braindead remainder from
MS-DOS times).
Use SetConsoleOutputCP to change it to the ANSI code page (SetConsoleCP only
affects input).
Isaac Chen
2006-01-22 04:17:41 UTC
David,

wprintf does have some Unicode support.

You don't need to use WideCharToMultiByte if you
1. call setlocale with a proper locale (one that contains the characters you
want to output)
2. chcp to a proper code page
3. choose a proper console font

Since you can use WideCharToMultiByte to achieve what you want, I believe
you already have 2 and 3 above. As to 1, most likely a
setlocale(LC_CTYPE, ""); is enough.
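A sketch of step 1, assuming the user's default locale already covers the characters to print. The helper name print_wide_line is hypothetical. The format uses %ls because it means "wide string" both in ISO C and in the Microsoft CRT (where a plain %s inside wprintf also means wide, unlike ISO C).

```cpp
#include <cassert>
#include <clocale>
#include <cwchar>

// Prints a wide string followed by a newline after selecting the user's
// default locale for character conversion (done once per process).
// Returns the number of wide characters written, or a negative value on error.
int print_wide_line(const wchar_t* text) {
    static const char* locale = std::setlocale(LC_CTYPE, "");
    (void)locale;  // nullptr if the environment's locale is unavailable
    return std::wprintf(L"%ls\n", text);
}
```

One caution with this approach: once wprintf has written to stdout, the stream is wide-oriented, so mixing printf and wprintf on the same stream fails.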

Isaac Chen
David Wilkinson
2006-01-15 22:15:45 UTC
Mihai:

I have to say that this thread seems to be confusing outputting to a
console window with formatting strings.

To format a string, in a non-Unicode app, the following are alternative
methodologies:

CString::Format()
sprintf
std::ostringstream

and for a Unicode app

CString::Format()
swprintf
std::wostringstream

If we make the typedef

typedef std::basic_ostringstream<TCHAR> tostringstream;

then we have generic alternatives

CString::Format()
_stprintf
tostringstream
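A sketch of the third alternative, using the tostringstream typedef from above. On Windows, TCHAR and _T() come from <tchar.h>; the non-Windows branch is an assumption of mine that mimics a _UNICODE build so the code compiles elsewhere. The function name click_message is hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#ifdef _WIN32
#include <tchar.h>      // TCHAR and _T() follow the _UNICODE setting
#else
typedef wchar_t TCHAR;  // assumption: stand-in mirroring a _UNICODE build
#define _T(x) L##x
#endif

typedef std::basic_ostringstream<TCHAR> tostringstream;
typedef std::basic_string<TCHAR> tstring;

// The same source builds as narrow (char) or wide (wchar_t) text,
// depending on what TCHAR is.
tstring click_message(const tstring& action, int count) {
    tostringstream os;
    os << _T("You should click Ok to ") << action << _T(" ") << count << _T(" files");
    return os.str();
}
```

Note that the sentence is split into four literals here, which is the concatenation problem Mihai describes for internationalization.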

What is better about any of these three as opposed to the others?
Personally, I see no reason to choose non-portable code in a pure
console application, so I would always go the iostream route. I do not
have the book "Exceptional C++ style"; what does it have to say here? I
can see that tostringstream might lose on efficiency (usually not an
issue), but why is it worse for internationalization?

The question of writing wide characters to the console is a separate
question, I think. I have never written a console app using wide
characters, so I have no opinion. But I have to say I am shocked if
Win2000/XP cannot output a correctly formatted wide-character string to
the Console, using either wcout or wprintf. Referring to your other
post, what does backward compatibility have to do with this? Apart from
the UTF-16/UCS-2 issue, there has only ever been one kind of wide
character string in NT/2000/XP.

David Wilkinson
Mihai N.
2006-01-16 09:32:28 UTC
Post by David Wilkinson
I have to say that this thread seems to be confusing outputting to a
console window with formatting strings.
True. Unicode and the Windows console are a tricky business.
And my post was more of an answer to "use wcout" (Vipin)
and "use CString::Format" (joe).
Neither of them is a solution, because the problem is the console, not the API.
As a side note, CString::Format is not Unicode-specific; it is a generic-text
API.
Post by David Wilkinson
What is better about any of these three as opposed to the others?
Personally, I see no reason to choose non-portable code in a pure
console application, so I would always go the iostream route. I do not
have the book "Exceptional C++ style"; what does it have to say here? I
can see that tostringstream might lose on efficiency (usually not an
issue), but why is it worse for internationalization?
I know how generic text/API works :-) (http://www.mihai-nita.net/20050306b.shtml)

=======================================================

I will add CString::FormatMessage, and boost::format, then I will do the same
thing with all 5 APIs:
- load a string from resources (or message catalog, or whatever, I am using a
"generic" thing, GetTranslation)
- replace parameters
- display it
You tell me which gives the bigger mess.

The string should have two parameters, something like this:
"You should click Ok to %s %d files"
and let's assume it has to change into Yoda-speak, something like this:
"FILES %d TO %s CLICK OK YOU SHOULD"
And there are real languages that require such reordering.

=== iostream ===

string action = "delete";
int ncount = 12;
cout << "You should click Ok to " << action << " " << ncount << " files";

To localize:
IDS_CLICKOK = "You should click Ok to "
IDS_ACTION_DELETE = "delete"
IDS_ACTION_FILES = "files"

string s1, s2, s3, s4;
GetTranslation( s1, IDS_CLICKOK );
GetTranslation( s2, IDS_ACTION_DELETE );
GetTranslation( s3, IDS_ACTION_FILES );
cout << s1 << s2 << " " << ncount << " " << s3;

But in Japanese there should be no spaces, so this is wrong.
And in Yoda-speak the order changes completely, so this is wrong again.
And the translator has to translate bits and pieces of a sentence, quite a
mess.
And it is ugly and difficult to read if you need some special output (like
printing the number in hex, or controlling precision), because then you also
have to output control flags. Plus, the flags are not under the control of
the localizer (for instance, currency values need 2 decimals in most locales,
but 3 are used in some Arab countries).

=== printf ===

printf( "You should click Ok to %s %d files", action.c_str(), ncount );
Much more readable.
To localize:
IDS_CLICKTOACT = "You should click Ok to %s %d files"
IDS_ACTION_DELETE = "delete"

string msg, action;
GetTranslation( msg, IDS_CLICKTOACT );
GetTranslation( action, IDS_ACTION_DELETE );
printf( msg.c_str(), action.c_str(), ncount );

In Japanese you should have no spaces, and the translator can remove them.
In Yoda-speak the order changes completely, so this is still wrong.
And the translator gets an (almost) full sentence to translate, good.
And it is easy to read and to control the format, because the control flags
are part of the string.
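Standard C printf indeed cannot reorder its arguments. For completeness, POSIX systems extend the printf family with positional conversions (%1$s, %2$d), which handle the Yoda-speak case; this is not ISO C, and the Microsoft CRT offers a separate _printf_p family for the same purpose instead. A sketch assuming a POSIX C library; the helper name format_click is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats a translated message whose format string may reorder the
// arguments with POSIX positional conversions. If one conversion is
// positional, POSIX requires all of them to be.
std::string format_click(const char* fmt, const char* action, int count) {
    char buf[256];
    int n = std::snprintf(buf, sizeof buf, fmt, action, count);
    if (n < 0) return std::string();  // format or encoding error
    return std::string(buf);          // silently truncated past 255 chars
}
```

The calling code stays the same while the translated format string moves the arguments around, much like the FormatMessage approach below.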

=== CString::Format ===

It is pretty much like printf, just loading the localized string is a bit
easier:

CString msg, action;
action.LoadString( IDS_ACTION_DELETE );
msg.Format( IDS_CLICKTOACT, (LPCTSTR)action, ncount );

Same drawbacks otherwise.

=== CString::FormatMessage ===

Ideal!

To localize:
IDS_CLICKTOACT = "You should click Ok to %1!s! %2!d! files"
IDS_ACTION_DELETE = "delete"

CString msg, action;
action.LoadString( IDS_ACTION_DELETE );
msg.FormatMessage( IDS_CLICKTOACT, (LPCTSTR)action, ncount );

In Japanese you should have no spaces, and the translator can remove them.
And the translator gets an (almost) full sentence to translate, good.
And it is easy to read and to control the format, because the control flags
are part of the string.

But this solves the Yoda-speak:
IDS_CLICKTOACT = "FILES %2!d! TO %1!s! CLICK OK YOU SHOULD"
IDS_ACTION_DELETE = "DELETE"

This works with the same code as above. Nice!

=== boost::format ===

Second best after CString::FormatMessage, especially if you have to be
cross-platform (and if you don't care about the funny % between parameters :-)

IDS_CLICKTOACT = "FILES %2% TO %1% CLICK OK YOU SHOULD"
IDS_ACTION_DELETE = "DELETE"

string msg, action;
GetTranslation( msg, IDS_CLICKTOACT );
GetTranslation( action, IDS_ACTION_DELETE );
cout << boost::format( msg ) % action % ncount;

In Japanese you should have no spaces, and the translator can remove them.
In Yoda-speak the order changes completely, and that is possible, so it is ok.
And the translator gets an (almost) full sentence to translate, good.
And it is easy to read, but not to control the format, because the control
flags are still hard-coded.

=======================================================

Sutter compares sprintf, snprintf, stringstream, strstream,
boost::lexical_cast (he does not care about platform api :-)
No clue why he ignores boost::format.

His analysis takes 16 pages, in two parts, so you can imagine it is quite
serious. He looks at ease of use, code clarity, standard or not (C90, C99,
C++03, C++0x), efficiency, length safety, type safety, and usability with
templates.

And he does not even look at internationalization!

His conclusion:
- to convert a value to a string => boost::lexical_cast
- simple formatting => stringstream, strstream
"the code will be more verbose and harder to grasp"
- for more complex formatting => snprintf
- never sprintf
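The "never sprintf" point comes down to length safety: snprintf never writes past the buffer and returns the length the full result would have needed, so truncation can be detected, while sprintf would overflow. A small sketch of that check; the helper name format_count is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Formats "<count> files" into a buffer of `size` bytes and reports
// whether the result had to be cut off.
std::string format_count(int count, std::size_t size, bool& truncated) {
    assert(size > 0);  // keep at least one byte so the result is NUL-terminated
    std::vector<char> buf(size);
    int needed = std::snprintf(buf.data(), buf.size(), "%d files", count);
    truncated = needed < 0 || static_cast<std::size_t>(needed) >= size;
    return needed < 0 ? std::string() : std::string(buf.data());
}
```

With an 8-byte buffer, formatting 12345 yields "12345 f" and reports truncation, because the full text needs 11 characters plus the terminator.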

=======================================================
Post by David Wilkinson
Referring to your other
post, what does backward compatibility have to do with this?
The console is used to emulate the behavior of the old DOS prompt.
And that was not Unicode, and used the OEM code page.
This is why the current console uses the OEM code page as default, even in
the latest Windows versions.

And since I don't work at MS, I am not sure what the real reason is for
proper Unicode support not being added to the console: "backward
compatibility" or "who cares about the console anymore".
David Wilkinson
2006-01-16 15:05:35 UTC

Thanks Mihai. I see now why the printf-type APIs are better than
iostream for internationalization: it is because the format string
itself can be internationalized.
Post by Mihai N.
Post by David Wilkinson
Referring to your other
post, what does backward compatibility have to do with this?
The console is used to emulate the behavior of the old DOS prompt.
And that was not Unicode, and used the OEM code page.
This is why the actual console uses OEM code page as default, even in the
latest Windows versions.
And since I don't work at MS, I am not sure what is the real reason for
proper Unicode support not being added to the console: "backward
compatibility" and "who cares about console anymore".
I didn't know the console used the OEM code page! But I guess my console projects
only use ASCII characters, so it doesn't bother me too much. Maybe the
8-bit interface is required to be this way for backward DOS
compatibility, but I am still shocked that the NT-type platforms do not
have proper wide-character support for the console.

Thanks again.

David Wilkinson
Mihai N.
2006-01-17 03:11:56 UTC
Post by David Wilkinson
I didn't know console used OEM codepage! But I guess my console projects
only use ASCII characters, so it doesn't bother me too much. Maybe the
8-bit interface is required to be this way for backward DOS
compatibility, but I am still shocked that the NT-type platforms do not
have proper wide-character support for the console.
Even worse. The console should be able to accommodate all ancient programs,
some of them writing directly to the video memory, or using old BIOS
interrupts. And all that was 8 bits.
We can only hope Monad will solve this (and before it gets to version 3 :-)
Post by David Wilkinson
Thanks again.
Glad if I can help :-)
Norman Diamond
2006-01-17 08:47:03 UTC
Post by Mihai N.
The console should be able to accommodate all ancient programs, some of
them writing directly to the video memory, or using old BIOS interrupts.
And all that was 8 bits.
Some of that was 8 bits and some of that was 16 bits.

In some computers made by NEC, Fujitsu, Panasonic, etc., even the BIOS setup
screens use 16-bit characters.
Mihai N.
2006-01-18 04:15:49 UTC
Post by Norman Diamond
Some of that was 8 bits and some of that was 16 bits.
In some computers made by NEC, Fujitsu, Panasonic, etc., even the BIOS setup
screens use 16-bit characters.
Are we talking about PC and MS-DOS here?
I remember that there was a Japanese version of DOS for NEC systems (Windows
version, up to 3.x).
But I guess it is a bit much to ask the XP console to emulate that :-)
Norman Diamond
2006-01-18 07:27:30 UTC
"Mihai N." <***@yahoo.com> wrote in message news:***@207.46.248.16...
[Norman Diamond:]
Post by Mihai N.
Post by Norman Diamond
Some of that was 8 bits and some of that was 16 bits.
In some computers made by NEC, Fujitsu, Panasonic, etc., even the BIOS
setup screens use 16-bit characters.
Are we talking about PC and MS-DOS here?
Fundamentally I was talking PC and BIOS, but of course MS-DOS depended very
heavily on BIOSes for such matters and MS-DOS used the same 16-bit
characters.
Post by Mihai N.
I remember that there was a Japanese version of DOS for NEC systems
(Windows version, up to 3.x).
Up to Windows 2000, including Windows 2000 Server. The PC98 architecture is
no longer on the market (except for used ones of course). But my statement
is true of the DOS/V architecture too. In some PCs currently made by NEC,
Fujitsu, Panasonic, etc., even the BIOS setup screens use 16-bit characters.
Post by Mihai N.
But I guess it is a bit much to ask the XP console to emulate that :-)
But the XP console does emulate that. Windows setup (i.e. when installing
Windows) puts file BOOTFONT.BIN in the root of the C:\ drive, together with
NTLDR, NTDETECT.COM, and BOOT.INI. By the way that makes it fun if you
install an English language Windows version from MSDN, which rewrites NTLDR
and NTDETECT.COM to English-language versions but neglects to delete
BOOTFONT.BIN.
Mihai N.
2006-01-19 07:49:38 UTC
Post by Norman Diamond
Post by Mihai N.
But I guess it is a bit much to ask the XP console to emulate that :-)
But the XP console does emulate that.
You mean it emulates the NEC style of BIOS and DOS? I would be surprised.
Norman Diamond
2006-01-19 10:19:33 UTC
Post by Mihai N.
Post by Norman Diamond
Post by Mihai N.
But I guess it is a bit much to ask the XP console to emulate that :-)
But the XP console does emulate that.
You mean it emulates the NEC style of BIOS and DOS? I would be surprised.
Not most of that, but it displays 16-bit characters on the screen even
before the kernel starts, just as the BIOS can do and as DOS used to do.

Oh I see, I should check if the pseudo-BIOS that XP emulates for NTVDM
supports those BIOS interrupts. Or maybe I should check if the real BIOS
supports 16-bit character display in those interrupts, as the real BIOS does
for itself. Wow I wish I had time to play with things like this.
Norman Diamond
2006-01-16 04:01:52 UTC
[Joe Newcomer:]
Post by Joseph M. Newcomer
Why use something as dead-obsolete as wprintf anyway?
Use CString::Format instead.
wprintf was a kludge for 16-bit Windows and is maintained only for
backward compatibility with obsolete code.
Post by Mihai N.
wprintf is ANSI C, not obsolete.
Yes.
Post by Mihai N.
It deals with Unicode strings, which were not even there in 16-bit
Windows, so I think you are confusing it with something else.
But you're equally confused. wprintf dealt with wide characters before
Unicode was invented. Somewhere around 12 years ago Microsoft redesigned
the use of wide characters in Windows so that they would be encoded in
Unicode instead of a simple arithmetic calculation from the bytes that made
up their multibyte encodings. In Windows 16-bit, and in Unix since
somewhere around 25 years ago, wprintf and other wide functions handled wide
characters with their original meanings.
Mihai N.
2006-01-16 08:34:10 UTC
Post by Norman Diamond
But you're equally confused. wprintf dealt with wide characters before
Unicode was invented.
...
Post by Norman Diamond
Somewhere around 12 years ago Microsoft redesigned
the use of wide characters in Windows so that they would be encoded in
Unicode instead of a simple arithmetic calculation from the bytes that made
up their multibyte encodings.
You are right. In fact wchar_t can even be 8 bits, according to the standard.
Quite crappy, if you ask me.
But it seems there is serious talk in the C++ standards committee about adding
a real wide character type, with a specified size and Unicode encoding.
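For later readers: C++11 did add char16_t and char32_t, with u"" and U"" literals encoded as UTF-16 and UTF-32, while the width of wchar_t stays implementation-defined (16 bits with MSVC, 32 bits with glibc). A small sketch; the helper names utf16_units and utf32_units are hypothetical. It also shows the UTF-16 vs. UCS-2 issue mentioned earlier in the thread: a character outside the BMP takes two UTF-16 code units.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// char16_t and char32_t have the same size as uint_least16_t and
// uint_least32_t, unlike wchar_t, whose width varies by platform.
static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t), "char16_t width");
static_assert(sizeof(char32_t) == sizeof(std::uint_least32_t), "char32_t width");

// Number of code units a string occupies in UTF-16 and in UTF-32.
std::size_t utf16_units(const std::u16string& s) { return s.size(); }
std::size_t utf32_units(const std::u32string& s) { return s.size(); }
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF) is one UTF-32 unit but a surrogate pair, two units, in UTF-16.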
Norman Diamond
2006-01-17 00:43:00 UTC
Post by Mihai N.
But it seems there is serious talk in the C++ standards committee about adding
a real wide character type, with a specified size and Unicode encoding.
Good news. And I didn't even post to comp.std.c++ 10 years ago to say that
such a thing was needed. I only posted that to comp.std.c.