ISCII Utilities

Description

Two programs for analyzing files encoded according to the Indian Script Code for Information Interchange (ISCII), the Indian national standard.

IsciiName identifies each code in a file, printing its byte offset, its value in hex, and an explanation of its meaning. ATR codes that switch the writing system or display mode are interpreted as well. Typical output looks like this:

000792 0xC2 TA (voiceless unaspirated alveolar stop)
000793 0xE1 Vowel sign EY
000794 0xCF RA (alveolar rhotic)
000795 0xDA Vowel sign long A
000796 0x2C Comma
000797 0x20 Space
000798 0xC9 PHA (voiceless aspirated bilabial stop)
000799 0xE9 Nukta (subscript dot used to derive additional letters)
000800 0xDB Vowel sign I
000801 0xCF RA (alveolar rhotic)
000802 0x20 Space
000803 0xC8 PA (voiceless unaspirated biblabial stop)
000804 0xE8 Virama (suppresses default short A)
000805 0xCF RA (alveolar rhotic)
000806 0xD7 SA (voiceless alveolar fricative)
000807 0xDA Vowel sign long A
000808 0xC4 DA (voiced alveolar stop)
000809 0x20 Space
000810 0xBA JA (voiced palatal affricate)
000811 0xB5 GA (voiced velar stop)
000812 0x20 Space
000813 0xC8 PA (voiceless unaspirated biblabial stop)
000814 0xDA Vowel sign long A
000815 0xAC Vowel letter EY
000816 0xB5 GA (voiced velar stop)
000817 0xDA Vowel sign long A
000818 0x2C Comma
000819 0x0D Carriage return
000820 0x0A Newline (writing system and display mode are reset to default)
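
The offset/hex/description layout of the listing above can be reproduced with a few lines of Python. The following is only a minimal sketch and is not taken from the IsciiName source: the names `SAMPLE_NAMES` and `describe` are hypothetical, and the table holds just a handful of the codes that appear in the sample output rather than the full ISCII code chart.

```python
# Hypothetical sketch of an IsciiName-style dump; not the actual program.
# The table lists only some codes attested in the sample output above.
SAMPLE_NAMES = {
    0x20: "Space",
    0x2C: "Comma",
    0xC2: "TA (voiceless unaspirated alveolar stop)",
    0xC9: "PHA (voiceless aspirated bilabial stop)",
    0xCF: "RA (alveolar rhotic)",
    0xDA: "Vowel sign long A",
    0xDB: "Vowel sign I",
    0xE1: "Vowel sign EY",
    0xE8: "Virama (suppresses default short A)",
    0xE9: "Nukta (subscript dot used to derive additional letters)",
}


def describe(data, start=0):
    """Return one 'offset hex description' line per byte of data."""
    lines = []
    for offset, code in enumerate(data, start):
        # Codes missing from the partial table get a placeholder label.
        name = SAMPLE_NAMES.get(code, "(not in this sample table)")
        lines.append("%06d 0x%02X %s" % (offset, code, name))
    return lines


if __name__ == "__main__":
    for line in describe(bytes([0xC2, 0xE1, 0xCF, 0xDA, 0x2C, 0x20]), start=792):
        print(line)
```

A real tool would also need to track state across bytes, for example to interpret the argument that follows an ATR code; this sketch treats every byte independently.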

CountIsciiChars counts the codes in an ISCII file and classifies them according to their type and function. The original purpose was computing accurate letter counts for reading studies, but this information may also be useful when processing or converting ISCII-encoded text. Typical output looks like this:

Indian letters:			      6366
	 Consonants:		      3756
	 Vowel letters:		       220
	 Vowel signs:		      2159
	 Anusvara:		        50
	 Avagraha:		         0
	 Chandrabindu:		       180
	 Visarga:		         1
Indian digits:			         0
Indian punctuation:		         0
Virama/Halant/Pulli:		       185
Nukta:				        49
OM symbol:			         0
ASCII letters:			       636
ASCII digits:			        92
ASCII punctuation:		       591
ASCII whitespace:		      1666
ASCII carriage return:		       267
ASCII line feed:		       267
ASCII control characters:	         0
ATR:				         0
EXT:				         0
INV:				         0
ATR/EXT arguments:		         0
Invalid codes:			         0
Total:				     10119
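
A classification of this kind can be sketched as a per-byte lookup feeding a counter. The code below is an illustration under stated assumptions, not the CountIsciiChars source: the function names `classify` and `count_codes` are hypothetical, the code ranges reflect one reading of the ISCII-91 code chart and should be checked against the standard, and it assumes that each ATR or EXT code is followed by exactly one argument byte. Characters that ISCII builds from two codes (such as those derived with Nukta) are not recognized as units, and the top-level "Indian letters" and "Total" lines are not computed.

```python
from collections import Counter

ATR, EXT = 0xEF, 0xF0  # assumed ISCII-91 values; verify against the standard

# Codes that form a category on their own (assumed ISCII-91 assignments).
SINGLE = {
    0xA1: "Chandrabindu", 0xA2: "Anusvara", 0xA3: "Visarga",
    0xD9: "INV", 0xE8: "Virama/Halant/Pulli", 0xE9: "Nukta",
    0xEA: "Indian punctuation", ATR: "ATR", EXT: "EXT",
}


def classify(code):
    """Return a category label for a single byte (simplified)."""
    if code < 0x80:
        ch = chr(code)
        if code == 0x0D:
            return "ASCII carriage return"
        if code == 0x0A:
            return "ASCII line feed"
        if ch in " \t\v\f":
            return "ASCII whitespace"
        if code < 0x20 or code == 0x7F:
            return "ASCII control characters"
        if ch.isalpha():
            return "ASCII letters"
        if ch.isdigit():
            return "ASCII digits"
        return "ASCII punctuation"
    if code in SINGLE:
        return SINGLE[code]
    if 0xA4 <= code <= 0xB2:
        return "Vowel letters"
    if 0xB3 <= code <= 0xD8:
        return "Consonants"
    if 0xDA <= code <= 0xE7:
        return "Vowel signs"
    if 0xF1 <= code <= 0xFA:
        return "Indian digits"
    return "Invalid codes"


def count_codes(data):
    """Count bytes per category; the byte after ATR/EXT is an argument."""
    counts = Counter()
    expect_argument = False
    for code in data:
        if expect_argument:
            counts["ATR/EXT arguments"] += 1
            expect_argument = False
            continue
        counts[classify(code)] += 1
        expect_argument = code in (ATR, EXT)
    return counts
```

Calling `count_codes` on the contents of a file opened in binary mode returns a `Counter` keyed by the category labels, which can then be printed in whatever layout is needed.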

Changes

Version 2.2

Version 2.1

Details

Language: Python
Environment: OS Independent
Current version: 2.2
Last modified: 2005-10-05
License: GNU General Public License

Download



Back to Bill Poser's software page.