Some rare kanji are not displayed (font issue?)
  • Hi Kim, all,

    I have noticed that some kanji are not displayed properly; I get a question mark (?) instead. Example: 㒵

    I think, but I may be wrong, that it is a font issue: the JIS X 0213 standard (which my Asian fonts comply with) contains 11,223 characters in total, whereas KanjiDic2 has 13,108 kanji entries, so I suspect these "extra" kanji are the ones I cannot see. (Note, however, that the kanji in the example above does have a JIS X 0213-2000 kuten code ¿?)

    What do you think is the cause of this? Can you see the kanji in the example correctly?

    Thanks a lot, and great site!
    (You may think that the jouyou list alone is difficult enough to learn, and you are right, but I am just curious. ;-) )
  • I can see it on my Windows box.
  • It does not show up on my box.

    I'm running Windows XP Pro SP2 and Firefox 3. I do not get a question mark, however; I get the same symbol I get whenever a Unicode character can't actually be displayed.


    The above shows up as two boxes for me with 34B5 inside.
  • On my Ubuntu box I get a last resort glyph (from the last resort font).

    This character falls under the CJK extension A block.

    The question mark and the boxes are both placeholder ("no glyph") glyphs, shown when no available font provides the character.
  • I put out a call for some open source fonts supporting CJK Ext A and/or B.

    There's a Chinese font at that should support Ext A. The name is DFSongSd.ttf, about 21 MB.
  • Hi asmodai,

    Your font really helped. Now I can see many more kanji (like the one in the example) just by installing it.

    There are still some entries from KanjiDic2 which I cannot see, but it looks like some of these are not even considered in (there is not much information about them, such as meanings or readings, just some jis_213 and Unicode codes).
    Some others are considered but not displayed at all, like this one: (can you see this kanji on your computer?), but again I doubt we will find these out in the real world. :-)

    Thanks a lot for your help!
  • That's one from the extension B set of CJK within Unicode.

    I think Wen Quan Yi is working on this: (they might even have finished already if On-going Projects is to be believed).
  • Actually, I think Kim has a bug in Jisho for that U+27614 codepoint (and perhaps a lot of others).

    The link to Unihan, for example, gives you this:
    A codepoint of 0 is wrong, of course. Also, the td elements are empty; they contain only whitespace.
    When I look at my browser displays the kanji without problems.
  • asmodai is right, there is a bug, but it's not mine :)

    MySQL 5.1 and earlier only support characters in the Basic Multilingual Plane (BMP) of Unicode and will silently drop any character outside the BMP on insert. That is why you get a hit for it, but nothing where the character should be.

    It has been fixed in MySQL 6, but that is only available as an alpha at this point and I don't know when it will be released.

    I did some tests some time ago and came up with this list of kanji that MySQL will silently drop:
  • Hehe, I tried pasting the kanji into the above post, but since this forum runs on MySQL with the utf8 charset, they disappear on saving :)
  • I haven't decided yet if I should fix this by switching to PostgreSQL or just wait for MySQL 6 to be released.

    Fortunately, the characters affected are rare enough for this not to be a huge problem.
  • Ahhh, right. I totally forgot about MySQL's lame Unicode handling.

    I have sworn to only use SQLite or PostgreSQL for my own stuff.
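
The two code points discussed in this thread, U+34B5 and U+27614, can be checked with a short script. What follows is a minimal Python sketch, not anything taken from Jisho itself: the names `CJK_BLOCKS`, `describe`, and `outside_bmp` are made up for illustration, and the block ranges are the ones given in the Unicode block listing. It reports which CJK block a code point belongs to, whether it lies inside the Basic Multilingual Plane, and how many bytes it needs in UTF-8.

```python
# Illustrative sketch only; the names below are hypothetical helpers.

# A few CJK block ranges from the Unicode block listing.
CJK_BLOCKS = [
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
]


def describe(codepoint):
    """Return (block name, inside BMP?, UTF-8 byte length) for a code point."""
    block = "other"
    for start, end, name in CJK_BLOCKS:
        if start <= codepoint <= end:
            block = name
            break
    in_bmp = codepoint <= 0xFFFF  # the BMP is U+0000..U+FFFF
    utf8_len = len(chr(codepoint).encode("utf-8"))
    return block, in_bmp, utf8_len


def outside_bmp(text):
    """Return the characters of `text` that lie outside the BMP."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


if __name__ == "__main__":
    for cp in (0x34B5, 0x27614):
        block, in_bmp, utf8_len = describe(cp)
        print(f"U+{cp:04X}: {block}, in BMP: {in_bmp}, UTF-8 bytes: {utf8_len}")
```

A filter like `outside_bmp` could be run over text before saving it, to spot characters that a BMP-only store would lose. Whether that matches MySQL's exact behaviour is an assumption based on the description in this thread.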
