Two programs for analyzing files encoded according to the Indian Script Code for Information Interchange (ISCII), the Indian national standard.
IsciiName identifies each code, printing its byte offset, the code in hexadecimal, and an explanation of its meaning. ATR (attribute) codes, which switch the writing system and display mode, are also interpreted. Typical output looks like this:
000792 0xC2 TA (voiceless unaspirated alveolar stop)
000793 0xE1 Vowel sign EY
000794 0xCF RA (alveolar rhotic)
000795 0xDA Vowel sign long A
000796 0x2C Comma
000797 0x20 Space
000798 0xC9 PHA (voiceless aspirated bilabial stop)
000799 0xE9 Nukta (subscript dot used to derive additional letters)
000800 0xDB Vowel sign I
000801 0xCF RA (alveolar rhotic)
000802 0x20 Space
000803 0xC8 PA (voiceless unaspirated biblabial stop)
000804 0xE8 Virama (suppresses default short A)
000805 0xCF RA (alveolar rhotic)
000806 0xD7 SA (voiceless alveolar fricative)
000807 0xDA Vowel sign long A
000808 0xC4 DA (voiced alveolar stop)
000809 0x20 Space
000810 0xBA JA (voiced palatal affricate)
000811 0xB5 GA (voiced velar stop)
000812 0x20 Space
000813 0xC8 PA (voiceless unaspirated biblabial stop)
000814 0xDA Vowel sign long A
000815 0xAC Vowel letter EY
000816 0xB5 GA (voiced velar stop)
000817 0xDA Vowel sign long A
000818 0x2C Comma
000819 0x0D Carriage return
000820 0x0A Newline (writing system and display mode are reset to default)

CountIsciiChars counts the codes in an ISCII file and classifies them according to their type and function. The original purpose was computing accurate letter counts for reading studies, but this information may also be useful when processing or converting ISCII-encoded text. Typical output looks like this:
Indian letters: 6366
  Consonants: 3756
  Vowel letters: 220
  Vowel signs: 2159
  Anusvara: 50
  Avagraha: 0
  Chandrabindu: 180
  Visarga: 1
Indian digits: 0
Indian punctuation: 0
Virama/Halant/Pulli: 185
Nukta: 49
OM symbol: 0
ASCII letters: 636
ASCII digits: 92
ASCII punctuation: 591
ASCII whitespace: 1666
ASCII carriage return: 267
ASCII line feed: 267
ASCII control characters: 0
ATR: 0
EXT: 0
INV: 0
ATR/EXT arguments: 0
Invalid codes: 0
Total: 10119
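The per-byte work behind both programs can be sketched in a few lines of Python. This is a minimal illustration, not the programs' actual source: it assumes the ISCII-91 (IS 13194:1991) Devanagari code layout, assumes that ATR and EXT are each followed by exactly one argument byte, and uses hypothetical names (`classify`, `count_categories`, `describe`, `NAMES`). The name table holds only a few entries copied from the sample output above. The sketch does not look at multi-byte sequences, so it never reports Avagraha or OM symbol, and `describe` does not decode ATR arguments the way IsciiName does.

```python
from collections import Counter
import string

# Assumed ISCII-91 code ranges (Devanagari layout); verify against IS 13194:1991.
ATR, EXT = 0xEF, 0xF0
INDIC_RANGES = [
    (0xA1, 0xA1, "Chandrabindu"),
    (0xA2, 0xA2, "Anusvara"),
    (0xA3, 0xA3, "Visarga"),
    (0xA4, 0xB2, "Vowel letters"),
    (0xB3, 0xD8, "Consonants"),
    (0xD9, 0xD9, "INV"),
    (0xDA, 0xE7, "Vowel signs"),
    (0xE8, 0xE8, "Virama/Halant/Pulli"),
    (0xE9, 0xE9, "Nukta"),
    (0xEA, 0xEA, "Indian punctuation"),
    (ATR, ATR, "ATR"),
    (EXT, EXT, "EXT"),
    (0xF1, 0xFA, "Indian digits"),
]
# Categories summed into "Indian letters" (Avagraha is not detected here).
LETTER_PARTS = ("Consonants", "Vowel letters", "Vowel signs",
                "Anusvara", "Chandrabindu", "Visarga")

# Hypothetical partial name table, taken from the sample IsciiName output.
NAMES = {
    0x20: "Space",
    0x2C: "Comma",
    0xC2: "TA (voiceless unaspirated alveolar stop)",
    0xCF: "RA (alveolar rhotic)",
    0xDA: "Vowel sign long A",
    0xE1: "Vowel sign EY",
}


def classify(code: int) -> str:
    """Map one byte to a CountIsciiChars-style category name."""
    if code < 0x80:
        ch = chr(code)
        if ch in string.ascii_letters:
            return "ASCII letters"
        if ch in string.digits:
            return "ASCII digits"
        if ch in string.punctuation:
            return "ASCII punctuation"
        if code == 0x0D:
            return "ASCII carriage return"
        if code == 0x0A:
            return "ASCII line feed"
        if ch in " \t\v\f":
            return "ASCII whitespace"
        return "ASCII control characters"  # remaining C0 controls and DEL
    for low, high, category in INDIC_RANGES:
        if low <= code <= high:
            return category
    return "Invalid codes"  # 0x80-0xA0, 0xEB-0xEE, 0xFB-0xFF


def count_categories(data: bytes) -> Counter:
    """Count categories; the byte after ATR or EXT is counted as its argument."""
    counts = Counter()
    expect_arg = False
    for code in data:
        if expect_arg:
            counts["ATR/EXT arguments"] += 1
            expect_arg = False
            continue
        counts[classify(code)] += 1
        expect_arg = code in (ATR, EXT)
    counts["Indian letters"] = sum(counts[k] for k in LETTER_PARTS)
    counts["Total"] = len(data)
    return counts


def describe(data: bytes) -> list[str]:
    """Return IsciiName-style lines: decimal offset, hex code, label."""
    return [f"{offset:06d} 0x{code:02X} {NAMES.get(code, classify(code))}"
            for offset, code in enumerate(data)]
```

Both functions take raw bytes, so a file would be read in binary mode, for example `count_categories(open(path, "rb").read())`. Keeping the category table as data rather than branching logic makes it easy to swap in the layout for another script or to extend the classifier to recognize multi-byte sequences.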
Language: Python
Environment: OS Independent
Current version: 2.2
Last modified: 2005-10-05
License: GNU General Public License