Thursday, 7 August 2014

common kanji : a free plain text with 875 characters for KanjiRecog exercises


漢字 · かんじ

Aule Kanji Pages · Kanji Recog Pages

After removing the kanji that appear neither in KLC nor in the old list of 2,500 common newspaper kanji, we are left with a file of 875 characters with MANY visible duplicates.
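The filtering step can be sketched in a few lines of Python. This is only a sketch under assumptions: the function name `keep_allowed` and the tiny `allowed` set are made up for illustration and stand in for a real reference list, which is not reproduced here.

```python
def keep_allowed(text: str, allowed: set) -> str:
    """Return `text` with every character that is not in `allowed` removed."""
    return "".join(ch for ch in text if ch in allowed)

# Hypothetical stand-in for a real reference list of common kanji.
allowed = set("遊后菜培実質恭")

# A fragment of one row from the practice file.
sample = "遊后蔬菜培実質允恭"

print(keep_allowed(sample, allowed))  # 遊后菜培実質恭
```

With a real list loaded from a file, the same function could be run over the whole practice text; line breaks would have to be added to `allowed` or restored afterwards.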

Can you find a row with only ONE pair of duplicates ?

Can you scan slowly for a row whose duplicates are all unknown to you ? Never seen before, or just of unknown meaning or reading ?

弥哀徳尊庶霊沼鳥飛歴魚沼満跳故恭遊后菜培実質恭傍華
段階妥推弥呉橋斉斉済帰畔弥呉橋根欄橋蘇我馬珍評判録
掘済崇否争崇蘇我飛鳥蘇我済推創浮鳥舞軸線溶鳥隠背岳
雄御身得示能測隆伊賀掘遺跡城之越遺跡遺跡指保護遺跡
属落状護岬屈添改武悲舎残荒掘受継録浜荒幹瀬頃追憶努
展先駆示飛鳥城跡掘得知昭掘査城条坊坪最幅細屈底玉掘
催線複雑湾底玉縁奈濃緑紫層属盆縁層質冷院朱雀院院残
富湧貯巧往姿郊離別荘頃覚離遺貴遺曽跡貴寝普遍寝寝遣
院塩釜松浮条院丹橋線奈受継催情緒換活絵絵漢詩仮院毛
越極拝架橋渡御越詳俊綱残割遣秘得張貫乞求求克遣展先
駆示情緒遣筋身父院藤頼藤頼専武打頼鎌倉受継永福頼尊
毛越無院精舎荘激死弟藤泰衡将鎮魂鎌倉育委員階阿弥薬
藍遺認眼掘査継続約掘果藍徐鎌倉資領荘得第巨富富御臨
釣松訪藤瑠澄比類激増鏡眺望巧将軍足満荘譲受拡荘第層
楼閣舎閣閣望楼閣視橋閣往能竜鏡湖丸極難足劣満洞御迎
幸仰満死鹿閣際破壊放職章承鎌倉僧隆盛墨詩院院院狭凝
院院余刈段橋架横橋浮照夢窓疎輩夢窓疎愛芳傑測知里似
夢窓疎遊苦録残他芸匹最峰夢窓疎芳鎌倉帰僧蘭渓隆徳院
養院座視智院衆湯飲器客湯寄屋湯客玄別専細座向機能趣
待庵訓郡官休庵将軍城屋際遊遊盛屋遊遊照兼栗香松趣熊
熊屋随壊状郎職務余暇録資収退職究励収資余単究伊勢栗
催展身故郷桑松信華究漢漢詩晩詩激争展身横芸阿弥藤紹
衷渋栄庸菱親睦網別慶雲朋無庵湖芝雲荘坂浜慶雲城亘郎
各受継扇湖荘郎座掲匠視展掲眺望視線類趣強飾郎津荘継
昭頃寅郎推雑継急速雑全強類求運搬容易照項雑保鉄狭浮
活改善運動視覚評機能視視打活改善盟綱領項協協究宇精
執筆保奇屋紹究全測昭院院批判打兼雪準慮指摘兼茨城別
照毛越郡院郡駒根乗谷倉福福芳恵指梨甲妙退蔵院曹徳院
鹿閣徳院慈照銀閣徳院乗院奈奈養院庫粉竹知知院粉福芳
徳院玉養福福福福賀青滋賀軒蘭滋賀浜玄滋賀根滋賀津離
院離宝院願院条城丸御院徳詩松根城丸紅渓養音院鳥鳥徳
城徳城御閣徳徳松玄栗条城城丸紅渓音院渉慶雲滋賀浜無
庵雲荘慶阪阪荘依奈奈裏妙庫登録念最登録件温荘潮遊個
最無慶温荘昭福松尾松城根阪根足根根専誌念照欧鼓橋奔
放制職招制幕摩動伊郎雇革真真視覚販売際第遊遊盛屋遊
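To check your answers to the questions above, a short script can list the duplicates in each row. A minimal sketch, assuming each row is a plain string; the name `row_duplicates` is made up for this example, and only the first two rows are included here.

```python
from collections import Counter

def row_duplicates(row: str) -> dict:
    """Map each kanji that occurs more than once in `row` to its count."""
    return {ch: n for ch, n in Counter(row).items() if n > 1}

rows = [
    "弥哀徳尊庶霊沼鳥飛歴魚沼満跳故恭遊后菜培実質恭傍華",
    "段階妥推弥呉橋斉斉済帰畔弥呉橋根欄橋蘇我馬珍評判録",
]

for row in rows:
    print(row, row_duplicates(row))
```

A row with only ONE pair of duplicates is one whose result has a single entry with a count of 2.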




Wednesday, 6 August 2014

Kanji duplicate reduction script


With 975 characters left in the file (fewer than 40 rows of 25), it is time to consider removing those kanji that are neither frequent in use nor rated 'general use'. But that requires a software script ... or an app !

弥哀徳尊庶霊沼鳥飛歴魚沼満跳故恭遊后蔬菜培実質允恭
傍櫻華段階妥推弥呉橋斉斉済帰畔弥呉橋根欄橋蘇我馬嶋
珍評判録坦掘済崇否争崇蘇我飛鳥蘇我済推創浮鳥舞軸線
溶鳥隠背岳雄御身得示能測澤隆伊賀掘遺跡城之越遺跡遺
跡指保護遺跡属箇涌落涌状護岬屈添改武悲舎残荒橘掘受
継録浜荒幹瀬頃追憶努展先駆示飛鳥城跡掘得知昭掘査城
条坊坪最幅細屈底玉掘催汀線複雑湾底玉縁奈濃緑紫層属
盆縁層質冷院朱雀院淳院残富湧貯巧往姿郊離別荘頃嵯峨
覚嵯峨離遺貴遺曽跡貴寝普遍寝寝遣院塩釜松浮条院丹橋
線奈受継莫催情緒換活絵絵漢詩仮院毛越極拝架橋渡御曼
越詳橘俊綱残割遣秘得張貫乞求求克遣展先駆示情緒遣筋
身父院藤頼藤頼専武打頼鎌倉受継永福頼尊毛越無院精舎
荘激死弟藤泰衡将鎮魂鎌倉育委員階阿弥陀薬伽藍遺認眼
掘査継続約掘果伽藍徐鎌倉卿資領荘得第巨富富御臨釣松
訪藤瑠璃澄比類激増鏡眺望巧将軍足満荘譲受拡荘第層楼
閣舎閣閣望楼閣俯瞰視橋閣往能竜瀑鏡湖丸極難足劣満洞
御迎幸仰満死鹿閣際破壊放職鳳章承鎌倉僧隆盛宋墨詩院
院院狭凝院院余刈段橋架堰横橋浮照夢窓疎輩夢窓疎愛芳
傑測知里似夢窓疎遊苦録残他芸匹最峰夢窓疎芳龍瑞鎌倉
帰僧蘭渓隆徳院龍養院座視智院堺衆湯飲器客湯寄屋湯客
玄別専細座向機能趣待庵府訓郡官休庵将軍城屋際遊廻遊
盛屋遊遊照兼栗香松趣熊熊屋随壊状圭郎職務余暇録資収
退職究励収資余単究伊勢栗催展身故郷桑松信九華究漢詣
漢詩晩詩激争展身横芸阿弥藤紹衷渋栄庸菱親睦網別慶雲
縣朋無鄰庵琵琶湖疏芝碧雲荘坂浜慶雲甥城亘郎各受継扇
湖荘郎座掲匠視展掲眺望視線類趣強飾郎津蘆荘継昭頃寅
郎推雑継急速雑全強類求運搬容易照項雑保鉄狭浮活改善
運動視覚評機能視視打活改善盟綱領項協協究宇精執筆保
也奇屋紹究全測昭院院批判打兼雪準慮指摘兼偕茨城別照
毛越磐郡院磐郡駒根乗谷倉福福芳苔府龍府恵指梨甲妙退
蔵院府龍曹府徳院府鹿閣府徳龍院府慈照銀閣府圓徳院府
府乗院奈奈養院庫粉竹知知院粉福芳徳龍院玉養浩福福福
福敦賀青滋賀軒蘭滋賀浜玄滋賀彦根滋賀津桂離府院離府
醍醐宝院府願院府条城丸御府府院府徳府詩府府松府幡根
城丸紅渓養翠音院鳥鳥徳城徳城御閣徳徳松玄栗条城城丸
紅渓音院渉慶雲滋賀浜無鄰庵府府府碧雲荘府慶阪府阪荘
府依奈奈裏妙庫登録念最登録件温荘潮遊個最無鄰菴慶温
荘昭福府松尾松府城府根阪府堺堺根足根根専誌念照欧鼓
橋篭奔放制職招制幕府薩摩動伊郎雇革真真視覚販売際第

The duplicate kanji are VERY visible now. 

府 is one example, in the row

紅渓音院渉慶雲滋賀浜無鄰庵府府府碧雲荘府慶阪府阪荘

In the almost 40 rows above, 府 occurs 33 times. Run through the lines again. Do you start to see them ? Try separating the lines with a blank line. Try shortening the lines.
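Counting by hand is error-prone, so a short script can tally how often each kanji occurs. A minimal sketch, using just two rows of the file as a stand-in for the whole text; the name `kanji_counts` is made up for this example.

```python
from collections import Counter

def kanji_counts(text: str) -> Counter:
    """Count every non-whitespace character in `text`."""
    return Counter(ch for ch in text if not ch.isspace())

# Two rows copied from the file above (not the whole file).
excerpt = """醍醐宝院府願院府条城丸御府府院府徳府詩府府松府幡根
紅渓音院渉慶雲滋賀浜無鄰庵府府府碧雲荘府慶阪府阪荘"""

counts = kanji_counts(excerpt)
print(counts.most_common(2))  # [('府', 14), ('院', 4)]
```

Pasting the full file into `excerpt` gives the counts for all the rows at once.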

You could first remove all '\n' linefeeds ('\r\n' in a file with Windows line endings), and then use a regexp such as

FIND  expression :   (..........)
REPLACE expr:     \1\n

which in Notepad++ (with the search mode set to "Regular expression") will give almost 100 rows of 10 characters for the file above. It says "for every 10 characters, return that selection followed by a linefeed." Any leftover of fewer than 10 characters at the very end is not matched and simply stays as the last row.

The 1 in \1 refers to the first expression in parentheses. If we'd had a second, it would have been \2.
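The same find-and-replace can be tried outside Notepad++. A minimal sketch in Python, using the standard `re` module; the function name `rewrap` and its `width` parameter are made up for this example.

```python
import re

def rewrap(text: str, width: int = 10) -> str:
    """Remove existing line breaks, then break after every `width` characters."""
    flat = text.replace("\r", "").replace("\n", "")
    # (.{width}) captures `width` characters; \1 puts them back, followed by "\n".
    return re.sub(r"(.{%d})" % width, r"\1\n", flat)

sample = "弥哀徳尊庶霊沼鳥飛歴魚沼満跳故恭遊后蔬菜培実質允恭"
print(rewrap(sample))
# 弥哀徳尊庶霊沼鳥飛歴
# 魚沼満跳故恭遊后蔬菜
# 培実質允恭
```

As in the editor, the last 5 characters of this 25-character row are fewer than 10, so they are left as a short final row.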

What features make complex kanji appear to be the same ? Are some easily confused if not next to each other ?

Funny, but when I read 鳥鳥 side-by-side I just KNOW that is not horse ! But seen alone, I can be unsure ... horse or bird ? Crow ?




kana-free Text 3 reduced to 1,125 kanji


These are the end rows of the text file when we remove duplicates to the point at which the character count reaches 1,125 kanji :

浮活改善運動住視覚評機能視視打活改善盟綱領項協協究
宇精執筆保也奇屋紹究全測昭院院主批判打兼雪準江慮指
摘兼偕茨城著別照毛越磐郡旧院磐郡光駒根乗谷倉福福芳
苔府龍丈府右恵林指梨甲州妙退蔵院府右龍曹府右徳院府
鹿閣府徳龍院丈府慈照銀閣府圓徳院府巴府旧乗院奈良奈
良養院庫粉竹林知知旧院粉福芳徳龍院玉養浩福福福福敦
賀青滋賀軒蘭滋賀浜玄滋賀彦根滋賀津桂離府院離府醍醐
宝院府願院府条城丸御府丈府院府徳丈府詩府府松府幡根
城丸紅渓養翠音院鳥鳥徳城旧徳城御千閣徳徳松玄栗林条
城城丸紅渓音院渉慶雲滋賀浜無鄰庵府府府碧雲荘府慶阪
府阪荘府依奈良奈良裏千又妙庫登録念最登録件温荘潮遊
個最無鄰菴慶温荘昭福丈府松尾松府城府根阪府堺堺根足
根根専誌念照欧鼓橋篭奔放制職招制江幕府薩摩動伊郎雇
革真真視覚販売児玉英際第復刻談復刻刊英諸採旅惹経験
著協役割果龍松助究育傍著英訳紹第姉妹衆愛専誌項幽玄

Many, many duplicates now leap out at you, right ? Do you recall any mnemonic for them or for some related kanji ? A topic ? A story ?

Grab two rows from the file and, scanning along each row, try to say whether it contains kanji that also appear in the other row. Take a moment. Look forward. Glance back (we often have to glance back when learning to read in a second or third or distantly-related script !).

These two rows are easy :

府阪荘府依奈良奈良裏千又妙庫登録念最登録件温荘潮遊
個最無鄰菴慶温荘昭福丈府松尾松府城府根阪府堺堺根足
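To check a scan of the two rows above, the kanji they share can be listed by a script. A minimal sketch, assuming each row is a plain string; the name `shared_kanji` is made up for this example.

```python
def shared_kanji(row_a: str, row_b: str) -> list:
    """Kanji that occur in both rows, in the order they first appear in row_a."""
    in_b = set(row_b)
    seen = set()
    shared = []
    for ch in row_a:
        if ch in in_b and ch not in seen:
            seen.add(ch)
            shared.append(ch)
    return shared

row_a = "府阪荘府依奈良奈良裏千又妙庫登録念最登録件温荘潮遊"
row_b = "個最無鄰菴慶温荘昭福丈府松尾松府城府根阪府堺堺根足"

print("".join(shared_kanji(row_a, row_b)))  # 府阪荘最温
```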

These two rows are at the top of my practice file :

弥哀徳尊庶霊沼鳥飛歴魚沼満跳故恭遊后蔬菜培実質允恭
傍櫻華段階妥推弥呉橋斉斉百済帰畔弥呉橋根欄干橋蘇我

This morning, I don't recognize 蔬 ... BUT : I did note its features in comparison to 4 other kanji in these last 2 rows, and THAT is what I need to develop : the capability to note features and respond to features, aware or unaware as I may be that these are distinctive for my reading. As a professor of mine once wrote, "to be is to be distinguished."