Two improvement requests for the "Kanji by radicals" page
  • mongrel July 2009

    First, is it possible to provide an option to display in the found kanji area, below, ONLY the Jouyou kanji? I know that the Jouyou and other kanji are colored with different shades of blue, but it is still very distracting when you are looking for a relatively common kanji.

    Second, is it possible to create a similar page where the radicals are arranged not by their stroke count but by their frequency? This way, for those who want it, it may be faster to track down a frequently used radical at the top of the table than to memorize the stroke count of all these radicals or even their location in the table.

    Thanks.

  • I've done some code-slinging in my day :^)
    First thing is possible for sure. Probably wouldn't be too hard to do. I'd like to see that too.

    The second thing depends on whether that frequency information is contained within the Unicode that is used to encode a kanji in the computer (everything in a computer is broken down into numbers in the long run; a 26-letter, ordered alphabet is really easy to assign numbers to).
    So a kanji like 明 might be like 02F377A1 or something like that in hexadecimal encoding.
    I don't really know the specifics of how kanji are encoded, but I'd imagine a mess of kanji is much more difficult to order, so the radical that is used in old-fashioned dictionaries probably also appears as some kind of pattern in the number. So maybe all kanji with the radical 日 have the first few Unicode numbers / letters in common. Using the previous example, 02F3XXXX might refer to the family of kanji with 日 (such as 時、曜、明 etc.) because they all have the first few numbers in common. Whether the rest of the numbers contain any info about the frequency of the radical is unknown to me. If not, then that frequency information would have to be obtained and inserted into the software manually in order to sort the radicals by frequency.
    My guess is that #2 won't happen any time soon, but you'd have to ask Kim directly :^)

  • Tobberoth July 2009

    No, frequency is definitely not encoded in Unicode. However, there are TONS of frequency lists for Japanese and Chinese built on various corpora, so finding the general frequency of each radical shouldn't be hard.
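    To make that concrete, here is a minimal Python sketch that prints the code points of the kanji mentioned in the earlier post. The helper name `code_point` is made up for this example; only `ord` and `unicodedata.name` come from the standard library.

```python
import unicodedata

# Kanji from the earlier post that all contain 日.
samples = ["日", "明", "時", "曜"]

def code_point(ch: str) -> str:
    """Return the code point of a single character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"

for ch in samples:
    # Prints the character, its code point, and its official Unicode name.
    print(ch, code_point(ch), unicodedata.name(ch))
```

    The values land close to each other because the main CJK block is ordered roughly by dictionary radical and stroke count, but the number is only a position in that ordering. It says nothing about how common a kanji or a radical is.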

    That's what I thought. Too bad. Yeah, I imagined finding the frequency wouldn't be too hard in this day and age, but manually entering all that crap into your program is what is a pain in the ass ;^) Programmers usually hate doing that. If it were stored in a file in tabular form (or XML or something), and easy to process with simple file I/O, that would be awesome. Come to think of it though, if my previous guess of how kanji are stored in Unicode were correct, the frequency could just be obtained through calculations on the dictionary from within the software (i.e. create a two-dimensional array that stores a sequence that represents a radical, and the number of times it occurred). That could be expensive and time consuming for radicals that appear in many kanji. I'm not sure. Also, that way you'd get the real frequency of radicals in the dictionary you have on hand, not how many exist in the Japanese language. However, that's a more accurate thing to display to the user, since the frequencies would reflect the true limits of the dictionary.
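    Counting from the dictionary itself does not need anything encoded in the code points, only a table that says which pieces each kanji contains. A rough Python sketch, assuming such a table exists: `KANJI_PARTS` and its contents are invented sample data, not the site's real database, and the decompositions are illustrative rather than authoritative.

```python
from collections import Counter

# Hypothetical kanji -> parts table; a real one would be loaded from a file.
KANJI_PARTS = {
    "明": ["日", "月"],
    "時": ["日", "土", "寸"],
    "曜": ["日", "隹"],
    "案": ["宀", "女", "木"],
    "品": ["口"],
}

def part_frequency(kanji_parts):
    """Count in how many kanji each part appears."""
    counts = Counter()
    for parts in kanji_parts.values():
        counts.update(set(parts))  # count each part once per kanji
    return counts

freq = part_frequency(KANJI_PARTS)
for part, n in freq.most_common(3):
    print(part, n)
```

    This uses a counter keyed by part instead of a two-dimensional array, and it is a single pass over the table, so it can be run once and the result cached. As noted above, the counts describe the dictionary on hand, not the language as a whole.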

    Also, "kanji by radicals" is a pretty foreigner friendly program. The pieces you click do not have to actually comprise a true radical.
    Although 宀 is not the radical for 案, 案 still appears in the list when 宀 is clicked.
    I was doing an elementary school kanji drill book question with "choose the radical" and got tripped up on that one. The radical isn't always what you (or even your Japanese friends! :-P) might think. So really it's "kanji by pieces that appear in that kanji you're looking for". Of course, if it were as strict as a 国語 teacher in Japan, it wouldn't be very useful to beginners. If it isn't already available, finding out the frequency of those individual pieces (i.e. all the buttons you can press on 'kanji by radicals') might be a major pain in the ass.

  • Tobberoth July 2009

    Yeah, it should technically be renamed to kanji by 部首 (and not even that is entirely correct, I think), but just like having only the real radicals, it wouldn't be as useful for beginners who have no idea what 部首 means.

  • mongrel July 2009

    "Also, "kanji by radicals" is a pretty foreigner friendly program. The pieces you click do not have to actually comprise a true radical.
    Although 宀 is not the radical for 案, 案 still appears in the list when 宀 is clicked."

    Yes, that's exactly my point (if I may interject); "kanji by radicals" could be made REALLY useful if the ability to search by - let's call them "parts" for this purpose - were implemented across the board and the traditional radicals just abandoned.

    As an example, the fact that 黹 is a radical is - to me as a foreign aspiring kanji student - totally useless since scanning the table trying to track it down all the way to the bottom is time consuming; one doesn't know, at the start of the search in the table, that this kanji is itself a radical, so the searcher tries to break 黹 into (what he guesses to be) its component "parts" and first tries to locate these imaginary parts in the table, click and reset the table multiple times before finally reaching the 12 stroke row and - if he's sufficiently sharp-eyed - realizing that the radical for 黹 is itself 黹! And if, heaven forbid, the kanji he is looking for is 黼 - of which 黹 is a radical - it's highly likely that by that time the user will miss that fact and not be able to identify it.

    The solution, IMHO, is to break kanji down to their most primitive "parts" and arrange the table, if not by the frequency of these "parts" (if that's too difficult to do) then at least by their stroke count, complexity or even just their appearance. For example, 品 is a kanji that (unless I'm missing something obvious) cannot be found by using the table, except as its own 9-stroke radical. If you click on the 3-stroke radicals 口 or 囗 (or both), 品 does NOT come up as an option. Yet it's clear that if one could click on "parts" that look like the upper and/or lower rectangles, 品 would have easily come up as an option. Similarly, when you click on 月 you should get 用 as an option; even better, a "part" which looks like a hollowed-out 月 (that is, with the two lower horizontal lines removed) should be available for clicking and, when clicked, display both 月 and 用 as options; you should then be able to click | and eliminate 月.

    In an ideal world - that's what a "kanji by parts" table (regardless of the order of appearance of parts) should look like.
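    The lookup behind such a "kanji by parts" table can be sketched in a few lines of Python. Everything named here is an assumption made for the example: the `KANJI_PARTS` decompositions are invented (冂 stands in for the "hollowed-out 月" part), `JOUYOU_SAMPLE` is a tiny stand-in for a real Jouyou list, and none of it reflects how the site is actually implemented.

```python
from collections import defaultdict

# Hypothetical decomposition table, for illustration only.
KANJI_PARTS = {
    "品": {"口"},
    "叶": {"口", "十"},
    "月": {"冂", "二"},
    "用": {"冂", "二", "丨"},
}

# Hypothetical subset standing in for a full Jouyou list.
JOUYOU_SAMPLE = {"品", "月", "用"}

def build_index(kanji_parts):
    """Invert the table: part -> set of kanji that contain it."""
    index = defaultdict(set)
    for kanji, parts in kanji_parts.items():
        for part in parts:
            index[part].add(kanji)
    return index

def find_kanji(index, selected, allowed=None):
    """Return the kanji containing every selected part.

    If `allowed` is given, only kanji in that set are returned.
    """
    if not selected:
        return set()
    result = set.intersection(*(index.get(p, set()) for p in selected))
    return result if allowed is None else result & allowed

INDEX = build_index(KANJI_PARTS)
print(find_kanji(INDEX, ["口"]))                 # 品 and 叶
print(find_kanji(INDEX, ["冂"]))                 # 月 and 用
print(find_kanji(INDEX, ["冂", "丨"]))           # only 用 is left
print(find_kanji(INDEX, ["口"], JOUYOU_SAMPLE))  # only 品 is left
```

    Each extra click is just another set intersection, so the hard part is the decomposition data rather than the search. The optional `allowed` set is one way the Jouyou-only option from the first post could be handled.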

  • Tobberoth July 2009

    I don't really agree with mongrel's opinion. While I agree that it's awesome to make the radical table as newbie friendly as possible, I think it's important to keep it "professional". If you're good enough at Japanese to know that 黹 is a radical and you need it to find a certain kanji, you shouldn't be held back by those who don't. In the case of the name, I don't see a problem. Most of us here realize that "radical" is not the proper term for what is supplied, but it doesn't matter to us when we use it. If radicals we know exist aren't in the table, however, we will be annoyed and stop using it.

    I guess what I'm saying is: keep it as newbie friendly as possible without making it harder to use for those with more knowledge. Make it logical, but don't dumb it down.

  • Richard July 2009

    I agree with Tobberoth, and I think it could get confusing if you included parts that aren't actually radicals. Learning what radicals exist is not too hard a task, but I remember from my paper dictionary days that the designated radical for a kanji is often infuriatingly arbitrary. I think the balance is about right just now.

  • mongrel July 2009

    I have to respectfully disagree with both of you - I don't see why making things considerably simpler for those who want or need it should be considered "dumbing down" or "unprofessional".

    Be that as it may, the more important question is - why not have both? There's unlimited space on the net - you could keep the current "kanji by radicals" page and add a "kanji by parts" page and let the users choose them in accordance with their perceived needs. This way, both rank amateurs and seasoned professionals could have their cakes and eat them, too...

  • paulusmaximus August 2009

    That's kind of a cool idea mongrel has. I have run into the type of problem mongrel was describing with 口 and 品 before. I think the current system is a decent mix of "Kanji, narrowed down by what pieces (you think) appear" and also "old infuriating arbitrary radical look-up". If there were a noob version (like mongrel wants) for beginners, the current version for intermediates, and something that works like a real 国語辞典 for those with high thresholds for pain, like those guys studying bungo, that would be awesome. It's all just a matter of writing the software ;-)

    Is it 'dumbing down?' Well on one hand if you can improve something that's inefficient... why not? But on the other, it's like English spelling. Lots of it makes NO sense and takes memorization, BUT if you get good then you look like a master pimp. In this case you'll probably be considered a master pimp nerd in Japan hahah but that's all it takes ;-D

    I don't find the real Japanese system very friendly (even to Japanese people; kids seem to struggle with it) and I'm wondering how it came about. I asked a Chinese guy in my Japanese class about how they describe 部首 and look stuff up in a dictionary in Chinese, and he said it was a bit different from Japanese. He told me they don't even use the suffix "hen" (?!)
    I didn't really get the Chinese system, but I thought it HAS to be similar, so I'll have to ask more next time ;^)

  • mongrel August 2009

    "It's all just a matter of writing the software ;-)"

    Yes, indeed! That - and creating the "simplified kanji parts" database. Now, interestingly, something like that was done by this guy:

    http://pomax.nihongoresources.com/index.php?entry=1224334063

    He refers to it as "kanji decomposing" - a term which I find oddly appropriate :) Anyway, the idea would be to take his "decomposed parts", arrange them in a table (like the "find kanji by radicals" table) according to some logical order (simplicity, frequency, whatever) and use the table to track down the kanji composed of the "decomposed" parts. Unfortunately, my programming skills are not up to this type of challenge (not even close) or I would have gladly done it myself.

    As to the Japanese writing system as a whole - I have to say that (although I'm a fan and a relatively serious part-time student) it is probably the most atrocious system on this earth. If I understand correctly, it is - in some respects - much worse than the various Chinese languages using the same characters. There's a very interesting book covering exactly this topic: Asia's Orthographic Dilemma by William C. Hannas. Highly recommended. If you do read it, please let us know what you think.

  • Richard August 2009

    Posted By: mongrel
    As to the Japanese writing system as a whole - I have to say that (although I'm a fan and a relatively serious part-time student) it is probably the most atrocious system on this earth. If I understand correctly, it is - in some respects - much worse than the various Chinese languages using the same characters. There's a very interesting book covering exactly this topic: Asia's Orthographic Dilemma by William C. Hannas. Highly recommended. If you do read it, please let us know what you think.


    That's quite a sweeping statement. I was interested enough to go and have a look at Google Books for this title:

    http://books.google.co.uk/books?id=aJfv8Iyd2m4C&dq=Asia%27s+Orthographic+Dilemma&printsec=frontcover&source=bl&ots=iV6ols0fdq&sig=BBOsFN_JqEUUsCq9kobiEwoM-iU&hl=en&ei=3r98SqX0BKXm6gOIzN1A&sa=X&oi=book_result&ct=result&resnum=4#v=onepage&q=japanese&f=false

    However, I wasn't interested enough to read much more than the foreword. After skimming through the Japanese section trying to find out what his point was, I went back to the foreword. This sentence caught my eye: 'Arguing that the Asian cultures based on these systems are therefore in crisis, he builds a particularly strong case for the conclusion that the intractable incompatibility between characters and computers will be a primary factor in the eventual demise of the former.' I have to say I don't see any major incompatibility.

    Incidentally, does he offer any practical suggestions at any point for solving the crisis? I'd be interested to know what solutions he might come up with.
