Sunday, 14 October 2012

bug or feature ?


Three pages of kanji definitions have the first meaning repeated, but the second lost.

三 4E09, 23, three, three ; %e4%b8%89
上 4E0A, 37, above, above, up ; %e4%b8%8a
下 4E0B, 7, below, below, down, descend, give, low, inferior ; %e4%b8%8b
不 4E0D, 572, negative, negative, non-, bad, ugly, clumsy ; %e4%b8%8d
 
Yet somehow the first meaning is more eye-catching for me. An unintended feature ?
 
The cause is an error in the HTML generation script that runs after the XML parse of kanjidic2.
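For comparison, here is a minimal Python sketch of how one such line could be built so that each meaning appears exactly once. It is not the original generation script (that was written in Curl). The XML fragment is a hypothetical sample modeled on a kanjidic2 `<character>` entry, the element paths are my assumption about that layout, and the number after the UCS value is assumed to be the Henshall number.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical fragment modeled on a kanjidic2 <character> entry (assumed layout).
SAMPLE = """
<character>
  <literal>上</literal>
  <codepoint><cp_value cp_type="ucs">4e0a</cp_value></codepoint>
  <dic_number><dic_ref dr_type="henshall">37</dic_ref></dic_number>
  <reading_meaning><rmgroup>
    <meaning>above</meaning>
    <meaning>up</meaning>
  </rmgroup></reading_meaning>
</character>
""".strip()


def format_entry(character):
    """Build one 'kanji UCS, number, meanings ; urlencoded' line."""
    literal = character.findtext("literal")
    ucs = character.findtext("codepoint/cp_value[@cp_type='ucs']").upper()
    number = character.findtext("dic_number/dic_ref[@dr_type='henshall']")
    # Assumption: English meanings are the <meaning> elements without m_lang.
    meanings = [m.text for m in character.iter("meaning")
                if "m_lang" not in m.attrib]
    # quote() percent-encodes the UTF-8 bytes; lower() matches the page style.
    encoded = quote(literal).lower()
    return f"{literal} {ucs}, {number}, {', '.join(meanings)} ; {encoded}"


line = format_entry(ET.fromstring(SAMPLE))
print(line)
```

Collecting the meanings into a list first and joining them once makes it easy to check the count before any HTML is written.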




Thursday, 11 October 2012

kanji by UCS

Here are some of the first basic Japanese kanji sorted by their UCS value (their Unicode code point, which for these characters is also the UTF-16 code unit):


一  丁  七  万  丈  三  上  下  不  与  世  丘  丙  両  並  中  丸  丹  主  久  乏  乗  乙  九  乱  乳  乾  事  二  亜  享  京  亭  人  仁  今

Is a particular pattern evident or helpful ?
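The ordering above can be reproduced in a couple of lines. This is only a sketch in Python (not the Curl script behind the pages), and the function name `sort_by_code_point` is made up for illustration.

```python
def sort_by_code_point(chars):
    """Return the characters ordered by their Unicode code point."""
    return sorted(chars, key=ord)


# A few kanji from the list above, deliberately out of order.
sample = ["下", "三", "一", "上", "不"]
ordered = sort_by_code_point(sample)

for ch in ordered:
    # Print each kanji with its code point as four uppercase hex digits.
    print(ch, f"{ord(ch):04X}")
```

Python already compares strings by code point, so `key=ord` is there only to make the intent explicit.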

I am in the process of adding a UCS-to-kanji-to-urlencoded-UTF-8 page over at kanji.aule-browser.com, which uses the Curl web content language (only a few lines of declarative script and a wee bit of procedural script are required).

UPDATE : that page with the HTML urlencoding for each character is at http://www.aule-browser.com/kanji/henshall-sorted-urlencoded.html .
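The UCS-to-kanji-to-urlencoded chain is short enough to sketch. The following is a minimal Python illustration, not the Curl code used for the page, and `ucs_to_urlencoded` is a hypothetical helper name.

```python
from urllib.parse import unquote


def ucs_to_urlencoded(ucs_hex):
    """Map a UCS hex value to (kanji, percent-encoded UTF-8)."""
    ch = chr(int(ucs_hex, 16))          # UCS value -> character
    utf8 = ch.encode("utf-8")           # character -> UTF-8 bytes
    encoded = "".join(f"%{b:02x}" for b in utf8)  # bytes -> %xx%xx%xx
    return ch, encoded


kanji, encoded = ucs_to_urlencoded("4E09")
print(kanji, encoded)

# Decoding the percent-encoded form gives the kanji back.
print(unquote(encoded) == kanji)
```

Every kanji in this range takes three UTF-8 bytes, which is why each urlencoded form has three `%xx` groups.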

A simpler plain HTML page is http://www.aule-browser.com/kanji/henshall-sorted-by-unicode.html .

Another safe, plain HTML page with no scripts, images, ads or other nuisance, listing the 1,945 Henshall basic Japanese kanji sorted as they appear in the book (by their so-called Henshall number), is at http://www.aule-browser.com/kanji/henshall-sorted-by-id.html .

By viewing the page source in your browser you can see that there are no script or image elements to worry about - so you can safely copy this HTML text to your local machine to edit as you see fit.
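If you would rather check a saved copy mechanically than read the source by eye, something like the following Python sketch would do. The class and function names are hypothetical, and it only counts `script` and `img` start tags, so treat it as a quick sanity check and not a full audit.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags for a small set of watched element names."""

    def __init__(self, watched=("script", "img")):
        super().__init__()
        self.counts = {name: 0 for name in watched}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        if tag in self.counts:
            self.counts[tag] += 1


def count_active_tags(html_text):
    """Return a dict of watched tag names and how often each occurs."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


plain = "<html><body><p>一 丁 七</p></body></html>"
print(count_active_tags(plain))
```

To use it on a downloaded page, read the file as UTF-8 text and pass the string to `count_active_tags`.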

The HTML text was generated using a Curl applet running off-line and parsing the Kanjidic2 XML.