Denshi Jisho — Online Japanese dictionary

The Denshi Jisho Forum is a place for people to help each other with Japanese, and a great way for users to get together and socialize.

    • CommentAuthordotanuki
    • CommentTimeAug 2nd 2008
     # 1

    Hi Kim, all,

    I have noticed that some kanji are not displayed properly; I get a question mark (?) instead. Example:

    http://www.jisho.org/kanji/details/%E3%92%B5

    I think, though I may be wrong, that it is a font issue: the JIS X 0213 standard (which my Asian fonts comply with) contains 11,223 characters in total, whereas KanjiDic2 has 13,108 entries (kanji), so I suspect these "extra" kanji are the ones I cannot see. (Note, however, that the kanji in the example above does have a JIS X 0213-2000 kuten code, which puzzles me.)

    What do you think is the cause of this? Can you see the kanji in the example correctly?

    Thanks a lot, and great site!
    (You may think that the jouyou list alone is difficult enough to learn, and you are right, but I am just curious. ;-) )

    •  
      CommentAuthorasmodai
    • CommentTimeAug 3rd 2008
     # 2

    I can see it on my Windows box.

    Yes, I am a bluntly honest type.
    • CommentAuthorTobberoth
    • CommentTimeAug 3rd 2008
     # 3

    It does not show up on my box.

    I'm running Windows XP Pro SP 2 and Firefox 3. I do not get a question mark, however; I get the same symbol I get every time the browser tries to display a Unicode character it has no glyph for.

    㒵㒵

    The above shows up as two boxes for me with 34B5 inside.
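
For what it's worth, the 34B5 inside those boxes is the Unicode code point of the character, and it corresponds to the percent-encoded bytes in the URL from the first post. A minimal Python sketch to check this, using only the standard library (the variable names are just for illustration):

```python
from urllib.parse import unquote

# Percent-encoded path segment taken from the URL in the first post.
encoded = "%E3%92%B5"

# unquote() decodes the escapes as UTF-8 by default.
char = unquote(encoded)

code_point = f"U+{ord(char):04X}"
utf8_bytes = char.encode("utf-8").hex(" ").upper()

print(code_point)  # U+34B5
print(utf8_bytes)  # E3 92 B5
```

So the page is sending the right character; whether it is drawn depends on the fonts installed locally.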

    •  
      CommentAuthorasmodai
    • CommentTimeAug 4th 2008
     # 4

    On my Ubuntu box I get a last resort glyph (from the last resort font).

    This character falls under the CJK extension A block.

    The question mark and the boxes are both placeholder (no-glyph) glyphs.

    Yes, I am a bluntly honest type.
    •  
      CommentAuthorasmodai
    • CommentTimeAug 4th 2008
     # 5

    I put out a call for some open source fonts supporting CJK Ext A and/or B.

    There's a Chinese font at http://glyph.iso10646hk.net/english/download.jsp that should support Ext A. The name is DFSongSd.ttf, about 21 MB.

    Yes, I am a bluntly honest type.
    • CommentAuthordotanuki
    • CommentTimeAug 4th 2008
     # 6

    Hi asmodai,

    Your font really helped. Now I can see many more kanji (like the one in the example) just by installing it.

    There are still some entries from KanjiDic2 which I cannot see, but it looks like some of these are not even covered by www.jisho.org (there is not much information about them, such as meanings or readings, just some jis_213 and Unicode codes).
    Some others are covered but not displayed at all, like this one: http://www.jisho.org/kanji?rt=jap&reading=&mt=en&meaning=&ct=ucs&code=27614 (can you see this kanji on your computer?), but again I doubt we will ever come across these out in the real world. :-)

    Thanks a lot for your help!

    •  
      CommentAuthorasmodai
    • CommentTimeAug 5th 2008 edited
     # 7

    That's one from the extension B set of CJK within Unicode.

    I think Wen Quan Yi is working on this: http://wqy.sourceforge.net/cgi-bin/enindex.cgi (they might even have finished already if On-going Projects is to be believed).
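
The block assignments mentioned in this thread (Extension A for U+34B5, Extension B for U+27614) can be checked with a few lines of Python. This is only a sketch: the table below lists just three CJK blocks with their ranges from the Unicode Standard, and `cjk_block` is a made-up helper name, not part of any library.

```python
# (name, first code point, last code point) for a few CJK blocks.
CJK_BLOCKS = [
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
]


def cjk_block(code_point: int) -> str:
    """Return the name of the listed block containing code_point."""
    for name, first, last in CJK_BLOCKS:
        if first <= code_point <= last:
            return name
    return "not in the blocks listed here"


print(cjk_block(0x34B5))   # CJK Unified Ideographs Extension A
print(cjk_block(0x27614))  # CJK Unified Ideographs Extension B
```

A font needs glyphs for the relevant block before the character can be displayed, which is why a font covering Extension A alone does not help with U+27614.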

    Yes, I am a bluntly honest type.
    •  
      CommentAuthorasmodai
    • CommentTimeAug 10th 2008
     # 8

    Another one might be Han Nom: http://vietunicode.sourceforge.net/fonts/fonts_hannom.html

    Yes, I am a bluntly honest type.
    •  
      CommentAuthorasmodai
    • CommentTimeAug 10th 2008
     # 9

    Actually, I think Kim has a bug in Jisho for that U+27614 codepoint (and perhaps a lot of others).

    The link to Unihan, for example, gives you this: http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=0&useutf8=true
    A codepoint of 0 is wrong of course. Also, the td elements are empty; they contain only whitespace.
    When I look at http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=0x27614&useutf8=true my browser displays the kanji without problems.
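
To illustrate what a correct link looks like, here is a small Python sketch that builds the Unihan URL from a character. I don't know how Jisho actually generates its links, so `unihan_url` is purely a hypothetical helper; the URL pattern is simply copied from the working link above.

```python
def unihan_url(char: str) -> str:
    """Build a Unihan lookup URL for a single character (hypothetical helper)."""
    if len(char) != 1:
        # An empty string (for example, a character lost somewhere along
        # the way) should be rejected rather than turned into a bogus link.
        raise ValueError(f"expected exactly one character, got {char!r}")
    return (
        "http://www.unicode.org/cgi-bin/GetUnihanData.pl"
        f"?codepoint=0x{ord(char):X}&useutf8=true"
    )


print(unihan_url("\U00027614"))
```

A link with codepoint=0 would be consistent with the page having no character to work with at that point, though that is a guess on my part.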

    Yes, I am a bluntly honest type.
    •  
      CommentAuthorKim
    • CommentTimeAug 23rd 2008 edited
     # 10

    asmodai is right, there is a bug, but it's not mine :)

    MySQL 5.1 and earlier only support characters in the Basic Multilingual Plane (BMP) of Unicode and will silently drop any character outside of the BMP on inserts, which is why you get a hit for it, but nothing where the character should be.

    http://dev.mysql.com/doc/refman/5.1/en/charset-unicode.html
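
A quick way to see which characters are affected is to look for code points above U+FFFF, which is where the BMP ends. Such characters need four bytes in UTF-8, whereas the utf8 charset described in the linked documentation stores at most three bytes per character. The Python sketch below is only an illustration; `outside_bmp` is a made-up helper and the sample string is just the two kanji discussed in this thread.

```python
def outside_bmp(text: str) -> list[str]:
    """Return the characters in text whose code point is above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


# One Extension A character (inside the BMP) and one Extension B
# character (outside the BMP).
sample = "\u34B5\U00027614"

for ch in sample:
    n_bytes = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: {n_bytes} bytes in UTF-8")

print([f"U+{ord(ch):04X}" for ch in outside_bmp(sample)])  # ['U+27614']
```

Running a check like this over the data before inserting would at least tell you which entries are going to lose their character.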

    It has been fixed in MySQL 6, but that is only available as an alpha at this point and I don't know when it will be released.

    http://dev.mysql.com/doc/refman/6.0/en/charset-unicode.html

    I did some tests some time ago and came up with this list of kanji that MySQL will silently drop:

    I make Denshi Jisho.
    •  
      CommentAuthorKim
    • CommentTimeAug 23rd 2008
     # 11

    Hehe, I tried pasting the kanji into the above post, but since this forum runs on MySQL with the utf8 charset, they disappeared on saving :)

    I make Denshi Jisho.
    •  
      CommentAuthorKim
    • CommentTimeAug 23rd 2008
     # 12

    I haven't decided yet if I should fix this by switching to PostgreSQL or just wait for MySQL 6 to be released.

    Fortunately, the characters affected are rare enough for this not to be a huge problem.

    I make Denshi Jisho.
    •  
      CommentAuthorasmodai
    • CommentTimeAug 27th 2008
     # 13

    Ahhh, right. I totally forgot about MySQL's lame Unicode handling.

    I have sworn to only use SQLite or PostgreSQL for my own stuff.

    Yes, I am a bluntly honest type.