searching for a word in mixed kanji and kana
  • This has probably been asked before, but...
    You can search jisho.org for words in pure kana like "にんぎょうげき"
    or pure kanji like 人形劇, but you can't mix them up. Searching for
    "にん形げき" will get nothing.

    Is there any hope that this limitation might be removed at some point?
  • This is a question that comes up now and then, so it's always in the back of my mind as I'm working on the next version of the site.

    However, it's a very tricky problem to solve. Maybe one day I'll add support for it, but not tomorrow :)
  • As a workaround you can search on Google and it will probably say 'Did you mean xxxx ?' giving you the word to look for on Jisho. Of course, it takes a bit longer.
  • Ok, thanks for the replies.
    Can you outline for me what the difficulties are? Naively, I would think the following algorithm should work for most cases:
    1) For each kanji in the search expression, look up the list of its possible readings. This generates a list of possible hiragana readings of the whole search expression.
    2) Search for each reading on the list. For each dictionary entry that you find, look at its kanji representation, and check that it matches the parts of the original search expression that were given as kanji.
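For anyone who wants to try the two steps above, here is a minimal sketch in Python. The tables `KANJI_READINGS` and `ENTRIES` are made-up toy data, and the function names are hypothetical; a real dictionary would supply the readings and entries. The kanji check in step 2 is simplified to "the entry contains every kanji that was typed".

```python
from itertools import product

# Hypothetical toy data, not taken from any real dictionary file.
KANJI_READINGS = {
    "形": ["けい", "ぎょう", "かた", "かたち"],
    "大": ["だい", "たい", "おお"],
}
# (written form, reading) pairs
ENTRIES = [
    ("人形劇", "にんぎょうげき"),
    ("大人", "おとな"),
]


def is_kanji(ch):
    # Rough check: CJK Unified Ideographs block only.
    return "\u4e00" <= ch <= "\u9fff"


def candidate_readings(query):
    # Step 1: each kanji expands to its known readings; kana stay as typed.
    options = [KANJI_READINGS.get(ch, []) if is_kanji(ch) else [ch] for ch in query]
    return ["".join(parts) for parts in product(*options)]


def naive_search(query):
    typed_kanji = [ch for ch in query if is_kanji(ch)]
    readings = set(candidate_readings(query))
    # Step 2: keep entries whose reading is a candidate and whose written
    # form contains every kanji from the query (a simplified check).
    return [
        written
        for written, reading in ENTRIES
        if reading in readings and all(k in written for k in typed_kanji)
    ]


print(naive_search("にん形げき"))  # ['人形劇']
print(naive_search("大な"))        # []
```

With this toy data the second query finds nothing, because none of the listed readings of 大 combines with な to give おとな.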
  • While that might work for some words that use standard readings, it could not find a word such as 大人 from the query "大な", because the reading おとな is not built from the standard readings of the individual kanji. I would implement an algorithm like this:

    Given a search string S, such as "大な"
    a) Let A be the set of results of a search for "*な", that is String.Replace(S,Kanji,"*")
    b) Let B be the set of results of a search for "大*", that is String.Replace(S,Kana,"*")
    c) A∩B are the desired search results, that is all results that turned up both in (a) and (b)
    Notes:
    - The wildcard "*" should represent any string, excluding the null string.
    - Multiple wildcards such as "***" should be treated as a single wildcard "*".
    - Or, for speed, run only the first search and filter its results by whether they match the second search string.

    We can already search on jisho.org using the wildcards "?" (one character) and "*" (multiple characters).
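Steps (a) to (c) can be sketched roughly as follows. This is only an illustration under assumptions: `ENTRIES` is made-up toy data, the function names are hypothetical, and a wildcard pattern is assumed to match an entry if it matches either the written form or the reading. Replacing a whole run of kanji or kana at once gives a single "*", which also covers the note about multiple wildcards.

```python
import re

# Hypothetical toy data: (written form, reading) pairs.
ENTRIES = [
    ("大人", "おとな"),
    ("大穴", "おおあな"),
    ("大きな", "おおきな"),
    ("大学", "だいがく"),
    ("女", "おんな"),
]

KANJI_RUN = r"[\u4e00-\u9fff]+"  # CJK Unified Ideographs only
KANA_RUN = r"[\u3040-\u30ff]+"   # hiragana and katakana blocks


def to_regex(pattern):
    # "*" stands for any non-empty string; everything else is literal.
    return re.compile("".join(".+" if ch == "*" else re.escape(ch) for ch in pattern))


def wildcard_search(pattern):
    rx = to_regex(pattern)
    return {w for w, r in ENTRIES if rx.fullmatch(w) or rx.fullmatch(r)}


def mixed_search(query):
    kana_only = re.sub(KANJI_RUN, "*", query)   # "大な" -> "*な"
    kanji_only = re.sub(KANA_RUN, "*", query)   # "大な" -> "大*"
    # (c): keep only entries found by both searches.
    return wildcard_search(kana_only) & wildcard_search(kanji_only)


print(sorted(mixed_search("大な"), key=len))  # ['大人', '大穴', '大きな']
```

Here 大学 is dropped because it fails the "*な" search, and 女 is dropped because it fails the "大*" search.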
  • @aruberto
    It will not always work. If it were that easy, there would be plenty of solutions already.

    The trick is that kanji themselves are tricky. Such an algorithm would work only if the word exists in the dictionary and fits the input. Think of a compound in which each kanji has more than 5 readings: that means too many loops unless the algorithm is well constructed.

    You don't need professional PHP knowledge to test your algorithms, so if you find some time, try it out. Maybe you will be the one who eventually finds a "cure" for this problem :-)
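To put a number on the "too many loops" concern: if every kanji is expanded into all of its readings, the number of candidate readings is the product of the per-kanji counts. A tiny sketch, where the counts are illustrative and not real figures for any particular kanji:

```python
from math import prod


def candidate_count(readings_per_kanji):
    # One candidate per combination of readings, so the counts multiply.
    return prod(readings_per_kanji)


print(candidate_count([6, 6]))        # 36
print(candidate_count([6, 6, 6, 6]))  # 1296
```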
  • Just as an example, with the algorithm I was describing above...

    ...a search for "大な" would yield:
    大穴
    大綱
    大女
    大店
    大人
    大きな
    大横綱
    大旦那
    大陸棚
    大阪しろ菜

    ...a search for "使よう" would yield:
    使用
    使い様

    ...a search for "にん形げき" would yield:
    人形劇

    ...a search for "般若はら蜜多" would yield:
    般若波羅蜜多

    ...a search for "か愛い" would yield:
    可愛い
    他愛ない
    可愛らしい
    可愛さ余って憎さ百倍

    As we can see here, the results could be improved by sorting by length, and by filtering. When searching for "*愛*", take the part of each entry that corresponds to the wildcard(s), for example "さ余って憎さ百倍", and compare it with what the wildcard(s) stand for in the original search string "か愛い", namely "い". Filter out the entry if that part includes kana not found there, or if it is longer (with length defined by counting kana).

    Yet even the simple algorithm would already produce good results imo.
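The sorting and filtering idea from this post could look something like the sketch below. It is one possible reading of the rule, not a tested design: the names `plausible` and `refine` are hypothetical, each kana run in the query becomes one wildcard, and when a kanji occurs more than once in an entry the regex simply picks one way to split it.

```python
import re

KANA_SPLIT = re.compile(r"([\u3040-\u30ff]+)")  # hiragana and katakana runs


def is_kana(ch):
    return "\u3040" <= ch <= "\u30ff"


def plausible(query, written):
    # Splitting on a capturing group puts the kana runs at odd indices.
    tokens = KANA_SPLIT.split(query)          # "か愛い" -> ["", "か", "愛", "い", ""]
    typed_runs = tokens[1::2]                 # ["か", "い"]
    regex = "".join("(.+)" if i % 2 else re.escape(t) for i, t in enumerate(tokens))
    match = re.fullmatch(regex, written)      # here: (.+)愛(.+)
    if match is None:
        return False
    for typed, captured in zip(typed_runs, match.groups()):
        kana_in_entry = [ch for ch in captured if is_kana(ch)]
        # Reject kana that were never typed in this position.
        if any(ch not in typed for ch in kana_in_entry):
            return False
        # Reject parts that are longer, counting kana only.
        if len(kana_in_entry) > len(typed):
            return False
    return True


def refine(query, results):
    # Filter first, then list the shortest entries first.
    return sorted((w for w in results if plausible(query, w)), key=len)


results = ["可愛さ余って憎さ百倍", "可愛らしい", "他愛ない", "可愛い"]
print(refine("か愛い", results))  # ['可愛い']
```

Note that this strict form of the filter also drops 可愛らしい and 他愛ない, since ら, し and な do not appear in the typed "い"; a looser rule may be preferable in practice.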
