Showing posts with label plain text. Show all posts

Monday, 1 September 2014

Adobe Reader PDF Text COPY : kanji + furigana copied


漢字 · かんじ

Aule Kanji Pages · Kanji Recog Pages

The image below shows a PDF open in the latest Adobe Reader. As you can see, selecting text for copying also grabs the furigana, which gives problematic results when pasted as plain text.

Web page copies tend to place the furigana in-line, which is even worse!

A typical result from Adobe Reader follows:

夜の 
ちやう
帳にささめき尽きし星の今を 
げかい
下界 の人の鬢のほつれ

Is the last item furigana or not?

Compare my Curl markup example in this blog post. The MIT Curl browser plugin is available at the Tokyo SCSK site.



Sunday, 31 August 2014

vertical poem with non-selectable furigana



The more conventional presentation of my previous post :



Here the applet is running fullscreen in Firefox.
The Japanese poem ( a tanka ) is by Yosano Akiko.



Curl markup for non-selectable furigana in plain text copy



The image below shows a poem selected with a mouse swipe WITHOUT the furigana text being selected.



Because Curl poetry markup uses macros, this can be set by a user to copy furigana INSTEAD of the kanji !

The size of the furigana and its visibility can be user options.




Saturday, 30 August 2014

Copy kanji plain text without furigana text



Curl for poem formatting : select poem text, but not furigana text.


This allows the user to copy the poem and not the furigana. This can be set as a user preference or option.

If you copy the HTML poem in my last post, what do you find pasted into your text editor afterwards ?

Curl at Tokyo's SCSK Corp has a Japanese web site here.



Thursday, 7 August 2014

plain text 800 kanji



This is my exercise page reduced to 800 kanji of which 360 are unique :

弥哀徳尊庶霊沼鳥飛歴沼満跳故恭遊后菜培実
質恭傍華段階推弥呉橋斉斉済帰畔弥呉橋根欄
橋蘇我馬珍評判録掘済崇否争崇蘇我飛鳥蘇我
済推創浮鳥舞軸線溶鳥隠背岳雄御得能測隆伊
賀掘遺跡城越遺跡遺跡保護遺跡属落状護岬屈
添武悲舎残荒掘受継録浜荒幹瀬頃追憶努展駆
飛鳥城跡掘得知昭掘査城条坊坪最幅細屈底掘
催線複雑湾底縁奈濃緑紫層属盆縁層質冷院朱
雀院院残富湧貯巧往姿郊離別荘頃覚離遺貴遺
曽跡貴寝普遍寝寝遣院塩釜松浮条院橋線奈受
継催情緒換絵絵漢仮院越極拝架橋渡御越詳俊
綱残割遣秘得張貫求求克遣展駆情緒遣筋院藤
頼藤頼専武頼鎌倉受継福頼尊越無院精舎荘激
死弟藤泰衡将鎮魂鎌倉育委員階阿弥薬藍遺認
眼掘査継続約掘果藍徐鎌倉資領荘得富富御臨
釣松訪藤瑠澄類激増鏡眺望巧将軍満荘譲受拡
荘層楼閣舎閣閣望楼閣橋閣往能竜鏡湖極難劣
満洞御迎仰満死鹿閣際破壊放職承鎌倉僧隆盛
墨院院院狭凝院院余刈段橋架横橋浮照夢窓疎
輩夢窓疎愛芳傑測知似夢窓疎遊苦録残他芸匹
最峰夢窓疎芳鎌倉帰僧蘭渓隆徳院養院座智院
衆湯飲器客湯寄屋湯客玄別専細座向機能趣待
庵訓郡官休庵将軍城屋際遊遊盛屋遊遊照兼栗
香松趣熊熊屋随壊状郎職務余暇録資収退職究
励収資余単究伊勢栗催展故郷桑松信華究漢漢
晩激争展横芸阿弥藤紹衷渋栄庸菱親睦網別慶
雲朋無庵湖芝雲荘坂浜慶雲城亘郎受継扇湖荘
郎座掲匠展掲眺望線類趣強飾郎津荘継昭頃寅
郎推雑継急速雑強類求運搬容易照項雑保鉄狭
浮善運動覚評機能善盟綱領項協協究精執筆保
奇屋紹究測昭院院批判兼雪準慮摘兼茨城別照
越郡院郡駒根乗谷倉福福芳恵梨甲妙退蔵院曹
徳院鹿閣徳院慈照銀閣徳院乗院奈奈養院庫粉
知知院粉福芳徳院養福福福福賀青滋賀軒蘭滋
賀浜玄滋賀根滋賀津離院離宝院願院条城御院
徳松根城紅渓養音院鳥鳥徳城徳城御閣徳徳松
玄栗条城城紅渓音院渉慶雲滋賀浜無庵雲慶阪
阪荘依奈奈裏妙庫登録念最登録件温潮遊個最
無慶温昭福松尾松城根阪専誌念照欧鼓橋奔放
制職招制幕摩動伊郎雇革覚販売際遊遊盛屋遊




Wednesday, 6 August 2014

Kanji duplicate reduction script



With 975 characters left in the file (39 rows of 25), it is time to consider removing those kanji that are neither frequent in use nor rated 'general use'. But that requires a software script ... or an app !

弥哀徳尊庶霊沼鳥飛歴魚沼満跳故恭遊后蔬菜培実質允恭
傍櫻華段階妥推弥呉橋斉斉済帰畔弥呉橋根欄橋蘇我馬嶋
珍評判録坦掘済崇否争崇蘇我飛鳥蘇我済推創浮鳥舞軸線
溶鳥隠背岳雄御身得示能測澤隆伊賀掘遺跡城之越遺跡遺
跡指保護遺跡属箇涌落涌状護岬屈添改武悲舎残荒橘掘受
継録浜荒幹瀬頃追憶努展先駆示飛鳥城跡掘得知昭掘査城
条坊坪最幅細屈底玉掘催汀線複雑湾底玉縁奈濃緑紫層属
盆縁層質冷院朱雀院淳院残富湧貯巧往姿郊離別荘頃嵯峨
覚嵯峨離遺貴遺曽跡貴寝普遍寝寝遣院塩釜松浮条院丹橋
線奈受継莫催情緒換活絵絵漢詩仮院毛越極拝架橋渡御曼
越詳橘俊綱残割遣秘得張貫乞求求克遣展先駆示情緒遣筋
身父院藤頼藤頼専武打頼鎌倉受継永福頼尊毛越無院精舎
荘激死弟藤泰衡将鎮魂鎌倉育委員階阿弥陀薬伽藍遺認眼
掘査継続約掘果伽藍徐鎌倉卿資領荘得第巨富富御臨釣松
訪藤瑠璃澄比類激増鏡眺望巧将軍足満荘譲受拡荘第層楼
閣舎閣閣望楼閣俯瞰視橋閣往能竜瀑鏡湖丸極難足劣満洞
御迎幸仰満死鹿閣際破壊放職鳳章承鎌倉僧隆盛宋墨詩院
院院狭凝院院余刈段橋架堰横橋浮照夢窓疎輩夢窓疎愛芳
傑測知里似夢窓疎遊苦録残他芸匹最峰夢窓疎芳龍瑞鎌倉
帰僧蘭渓隆徳院龍養院座視智院堺衆湯飲器客湯寄屋湯客
玄別専細座向機能趣待庵府訓郡官休庵将軍城屋際遊廻遊
盛屋遊遊照兼栗香松趣熊熊屋随壊状圭郎職務余暇録資収
退職究励収資余単究伊勢栗催展身故郷桑松信九華究漢詣
漢詩晩詩激争展身横芸阿弥藤紹衷渋栄庸菱親睦網別慶雲
縣朋無鄰庵琵琶湖疏芝碧雲荘坂浜慶雲甥城亘郎各受継扇
湖荘郎座掲匠視展掲眺望視線類趣強飾郎津蘆荘継昭頃寅
郎推雑継急速雑全強類求運搬容易照項雑保鉄狭浮活改善
運動視覚評機能視視打活改善盟綱領項協協究宇精執筆保
也奇屋紹究全測昭院院批判打兼雪準慮指摘兼偕茨城別照
毛越磐郡院磐郡駒根乗谷倉福福芳苔府龍府恵指梨甲妙退
蔵院府龍曹府徳院府鹿閣府徳龍院府慈照銀閣府圓徳院府
府乗院奈奈養院庫粉竹知知院粉福芳徳龍院玉養浩福福福
福敦賀青滋賀軒蘭滋賀浜玄滋賀彦根滋賀津桂離府院離府
醍醐宝院府願院府条城丸御府府院府徳府詩府府松府幡根
城丸紅渓養翠音院鳥鳥徳城徳城御閣徳徳松玄栗条城城丸
紅渓音院渉慶雲滋賀浜無鄰庵府府府碧雲荘府慶阪府阪荘
府依奈奈裏妙庫登録念最登録件温荘潮遊個最無鄰菴慶温
荘昭福府松尾松府城府根阪府堺堺根足根根専誌念照欧鼓
橋篭奔放制職招制幕府薩摩動伊郎雇革真真視覚販売際第

The duplicate kanji are VERY visible now. 

府 is one example in the row

紅渓音院渉慶雲滋賀浜無鄰庵府府府碧雲荘府慶阪府阪荘

In the almost 40 rows above, 府 occurs 33 times. Run through the lines again. Do you start to see them ? Try separating the lines with a blank line. Try shortening the lines.
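If you would rather let a script do the counting, here is a minimal Python sketch. It is only an illustration, not the tool used for this page, and the function name `duplicate_counts` is my own invention. It reports every character that occurs more than once in a piece of text, ignoring line breaks.

```python
from collections import Counter


def duplicate_counts(text: str) -> dict[str, int]:
    """Return {character: count} for every character seen more than once.

    Whitespace (including linefeeds) is ignored, so the whole practice
    file can be passed in as one string.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    return {ch: n for ch, n in counts.items() if n > 1}


# Hypothetical usage: the example row quoted in this post.
row = "紅渓音院渉慶雲滋賀浜無鄰庵府府府碧雲荘府慶阪府阪荘"
for kanji, n in duplicate_counts(row).items():
    print(kanji, n)
```

Pasting the full file into `row` instead of a single row gives the totals for the whole exercise page.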

You could first remove all '\n' linefeeds, and then use a regexp such as

FIND  expression :   (..........)
REPLACE expr:     \1\n

which in Notepad++ will give almost 100 rows of 10 characters for the file above. It says "for every 10 characters, return that selection followed by a linefeed."

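The \1 in the REPLACE expression refers to whatever was matched by the first expression in parentheses. If we'd had a second pair of parentheses, its match would have been \2. Any characters left over at the end ( fewer than 10 ) simply stay on the last line.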
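The same find-and-replace can be tried outside Notepad++. The sketch below uses Python's standard `re` module; the function name `reflow` and the `width` parameter are my own choices for illustration. It first strips the linefeeds and then applies the same "group of N characters, followed by a linefeed" replacement.

```python
import re


def reflow(text: str, width: int = 10) -> str:
    """Re-break text into rows of `width` characters."""
    # Step 1: remove all linefeeds (and any other whitespace).
    joined = re.sub(r"\s+", "", text)
    # Step 2: same idea as FIND (..........) / REPLACE \1\n,
    # written as (.{10}) so the width can be changed.
    return re.sub(rf"(.{{{width}}})", r"\1\n", joined)


# Hypothetical usage on one 25-character row from the file.
row = "紅渓音院渉慶雲滋賀浜無鄰庵府府府碧雲荘府慶阪府阪荘"
print(reflow(row))
```

A leftover tail shorter than `width` is not matched by the pattern, so it ends up as a short final row without a trailing linefeed.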

What features make complex kanji appear to be the same ? Are some easily confused if not next to each other ?

Funny, but when I read 鳥鳥 side-by-side I just KNOW that is not horse ( 馬 ) ! But seen alone, I can be unsure ... horse or bird ( 鳥 ) ? Crow ( 烏 ) ?




kana-free Text 3 reduced to 1,125 kanji



These are the end rows of the text file when we remove duplicates to the point at which the character count reaches 1,125 kanji :

浮活改善運動住視覚評機能視視打活改善盟綱領項協協究
宇精執筆保也奇屋紹究全測昭院院主批判打兼雪準江慮指
摘兼偕茨城著別照毛越磐郡旧院磐郡光駒根乗谷倉福福芳
苔府龍丈府右恵林指梨甲州妙退蔵院府右龍曹府右徳院府
鹿閣府徳龍院丈府慈照銀閣府圓徳院府巴府旧乗院奈良奈
良養院庫粉竹林知知旧院粉福芳徳龍院玉養浩福福福福敦
賀青滋賀軒蘭滋賀浜玄滋賀彦根滋賀津桂離府院離府醍醐
宝院府願院府条城丸御府丈府院府徳丈府詩府府松府幡根
城丸紅渓養翠音院鳥鳥徳城旧徳城御千閣徳徳松玄栗林条
城城丸紅渓音院渉慶雲滋賀浜無鄰庵府府府碧雲荘府慶阪
府阪荘府依奈良奈良裏千又妙庫登録念最登録件温荘潮遊
個最無鄰菴慶温荘昭福丈府松尾松府城府根阪府堺堺根足
根根専誌念照欧鼓橋篭奔放制職招制江幕府薩摩動伊郎雇
革真真視覚販売児玉英際第復刻談復刻刊英諸採旅惹経験
著協役割果龍松助究育傍著英訳紹第姉妹衆愛専誌項幽玄

Many, many duplicates now leap out at you, right ? Do you recall any mnemonic for them or for some related kanji ? A topic ? A story ?

Grab two rows from the file and try to tell, by scanning along each row, whether it contains kanji that also appear in the other row. Take a moment. Look forward. Glance back ( we often have to glance back when learning to read in a second or third or distantly-related script ! )

These two rows are easy :

府阪荘府依奈良奈良裏千又妙庫登録念最登録件温荘潮遊
個最無鄰菴慶温荘昭福丈府松尾松府城府根阪府堺堺根足
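To check your answer after scanning by eye, a small Python sketch can list the kanji that two rows have in common. This is just a checking aid under my own naming ( `shared_kanji` is a hypothetical helper ), not part of the exercise itself.

```python
def shared_kanji(row_a: str, row_b: str) -> str:
    """Return the characters of row_a that also occur in row_b.

    Each shared character is listed once, in the order it first
    appears in row_a.
    """
    in_b = set(row_b)
    seen = set()
    shared = []
    for ch in row_a:
        if ch in in_b and ch not in seen:
            seen.add(ch)
            shared.append(ch)
    return "".join(shared)


# Hypothetical usage: the two "easy" rows.
row_a = "府阪荘府依奈良奈良裏千又妙庫登録念最登録件温荘潮遊"
row_b = "個最無鄰菴慶温荘昭福丈府松尾松府城府根阪府堺堺根足"
print(shared_kanji(row_a, row_b))
```

Do the scan yourself first; the script is only there to confirm what your eye found.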

These two rows are at the top of my practice file :

弥哀徳尊庶霊沼鳥飛歴魚沼満跳故恭遊后蔬菜培実質允恭
傍櫻華段階妥推弥呉橋斉斉百済帰畔弥呉橋根欄干橋蘇我

This morning I don't recognize 蔬 ... BUT : I did note its features in comparison to 4 other kanji in these last 2 rows, and THAT is what I need to develop : the capability to note features and respond to features, whether or not I am aware that these are distinctive for my reading. As a professor of mine once wrote, "to be is to be distinguished."




Tuesday, 5 August 2014

kanji index for our KanjiRecog page 2



The 1,189 kanji in our TEXT PAGE 2 for KanjiRecog are

丁七万三上下不与世丙中丸丹主久之乗九也乱了予争事二五井亜交京人仁仇今介仕他付仙代令以仮仲件任企伊会伝伯伴伸似位住佐体何余佛作佳併使例侍侑供価侵係俊保信修俳俵俺倉個倍倒候倣倫健側偶傑備傳傷働像僕僚億優元兄充兆先光免児入全八公六共兵具典内円冊再冒冗写冠凝凡処凱出刀分切刊初判別利到制刷刻則前剣創劇力功加助努労効勇勉動勘務勝募勤勲化北匠区十千午半卒卓協南単卜印原厳去参又及友反収取受叙口古句可台史右号司合吉同名向君否含呂告周味呼命和哀品員唄唐商問啓喜喬営嗜嗣嘆嘉四回団困図固国國園土在圭地坂坊型垣城執基堀堂堯報場塔塗塩境墓増墨壊士声売変夏夕外多夢大天太夫失奈契女奴好如妥妨妻委姿娘娯婚嫌子字存孝季学孫宅宇守安完官宙定宝実客室宮宴家容宿寄寒寛實審寶寺対封専射将尊尋導小少尚尽尾局居屋屍展属履山岡岩岸峰島崇崎崑崩嵐嶺川州巡巣工左巧巨差巳巻市希帝師席帰常幅幡平年幸幹幼広底店府度座庫康廃廉延建弁式弓引弘弟弥張強当形彦彫彰影役彼待後従得復徳徴徹心必忍志応忠念怒怖思性恋恐恩息恵悔悠悩悪悼情惚惠想意愛感態慰憎憶懐成我戚戦戯戸戻房所手才打扱批承技投折抜抵担拒招拠括拾持指挙挟挫振授掛採接控推描提摩撃撮擦支改放政敏敗教敬数敵敷文斎料断新方於施旅旋族旗既日旧旨早旬旺昇昌明星映春昭是時晃晋晩景晴智暁暇暮暴曜曰曲更書曾最月有朋服朗望朝期木未末本杉村条来東松板林枚果枝枠枷柳査栄校根格桂桃案桑條棄棒棚森植検椿楠業極楼楽概榎構様樋模権横樹橋機櫃欠次欣欲歌止正武歳歴死残殴段殺殿母毎比民気水汁求江池汰決沙没沢河油治沼沿況泉法波注泳洋津活派浅浜浦浩浪浴海消涯淀淑淡深淳混添清済減渡湯満準溝滝演漢潜潤潮潰澤激濡瀬点為無然焼煙照煮熊熱燃爆父版牛牧物特犬状狂独猪献獄獅獣玉王玲珍現理瑞璧瓶甚生産甥用田由申男町画界番異疑痴療癌発登白百的益盗盛盟監目直相省看県眞真眠眼着督矢知石砂研砕砦破確磨示礼社神祭福秀私秋科秘秦称移税稚種稲稿穂究空突窓窮立章端竹笑笠第筆等筑答策算管節築篤籍米粂粵精系紀約納純級素細紳紹終組経結絞絡給統絵絶絹継続綴緊総緒線締編練縁繁繋繰續置署罵羅美群義翌習翻翼考者聖聞聰職肇肉肖育背能脂脇脚脱脳腕腰膳臨自興舎舘舞舟航般船良色芥花芳芸若苦英茅茉茶草荏荒荘莉菅菊菜華萌萩萬落葉著蒙蓮蔵薦薩薫藝藤蘭虎虹蛛蜘蝦蝶蟇衆行術街衛衝衣表衰裁装裏裕補裳製複西要見規視覚覧親観角解触言計訊記訣訪設許訳訴証評詞詩話該詳誉誌認誕誘語誤説読課調談論諦諸諾謎講謝識議譲護谷豆豊豚象貞負財販貫責貴買貸費賀資賛賞質購赤走起超越趣足跡路踏躍身車軋軍軒転載轟轢辰農辺辻込辿近返迫述追退送逃通逝造連週進逸遂遅遇運過道達遠遣適遭選遺邦邸郎郡部郷都配酒酔酷醉醜里重野量金釜針鈴鉄銀鋭録鍵鎌鑑長門閉開間関閲闘阿附降限院陣除陳陶陽隆隊階際隠雄雅集雑離難雨雪電露青静非面音響頁頃項順須頓頭頻頼題額顔顕願類風飛食飯飲養館香馬駿騎験骨高鰤鶏鷲黄黎黒龍
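The index above appears to be in Unicode code point order. A list like it can be sketched in Python as follows. I am assuming here that "kanji" means the main CJK Unified Ideographs block ( U+4E00 to U+9FFF ); the name `kanji_index` is hypothetical and this is not necessarily how the index on this page was produced.

```python
import re

# Assumption: only the main CJK Unified Ideographs block counts as kanji.
KANJI = re.compile(r"[\u4e00-\u9fff]")


def kanji_index(text: str) -> str:
    """Return each distinct kanji in `text` once, sorted by code point."""
    return "".join(sorted(set(KANJI.findall(text))))


# Hypothetical usage on a short mixed string.
sample = "飛鳥の城跡、鳥"
index = kanji_index(sample)
print(len(index), index)
```

The length of the returned string is the number of distinct kanji in the text.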

These kanji you may need to look up outside the usual books :

佛侑傳國堯實寶屍崑惠曰曾枷條櫃甥粂粵續聰茉莉蛛蜘蟇裳訣軋轢辿醉鰤

But remember : the point of the exercise is recognition as an aid to reading. Once you are used to "reducing" a text with "Replace with nothing" edits, you may want to grab large pieces of text, remove all of the kana and most of the kanji you have mastered, and then work down through the rest, removing what you recognize until you are forced to identify kanji by their features and your own mnemonics.
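The "reducing" step can also be sketched as a short Python script instead of repeated Replace edits. This is an illustration under stated assumptions: kana are taken to be the Unicode Hiragana and Katakana blocks ( U+3040 to U+30FF ), and `reduce_text` and its `mastered` parameter are names I made up for the example.

```python
import re

# Assumption: kana = Hiragana block + Katakana block.
KANA = re.compile(r"[\u3040-\u30ff]")


def reduce_text(text: str, mastered: str = "") -> str:
    """Strip kana, whitespace, and any kanji listed in `mastered`."""
    without_kana = KANA.sub("", text)
    without_space = re.sub(r"\s+", "", without_kana)
    known = set(mastered)
    return "".join(ch for ch in without_space if ch not in known)


# Hypothetical usage: a line of the tanka with the furigana left out.
line = "夜の帳にささめき尽きし星の今を下界の人の鬢のほつれ"
print(reduce_text(line))                      # kana removed
print(reduce_text(line, mastered="夜星今下人"))  # mastered kanji removed too
```

Growing the `mastered` string as you go mirrors working down through the file by hand.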