Libuninum

News
Description
Details
Environment
Documentation
Downloads
Change Log
Bugs
Roadmap

News

Version 2.7 adds support for Kayah Li, Lepcha, Ol Chiki, Saurashtra, Shan, Sundanese, and Vai. Full width characters are now accepted in Western numbers.

Description

This is a library for converting Unicode strings to numbers and numbers to Unicode strings. Standard functions like strtoul, strtod, and sprintf do this for numbers written in the usual Western number system using the Indo-Arabic numerals, but they do not handle other number systems. The main functions take as input a UTF-32 Unicode string and compute the corresponding unsigned integer. For example, they will convert the Chinese string 五十九万四千三百二十一 to the integer 594,321 and the Devanagari string ७८४९२ to the integer 78,492. Internal computation is done using arbitrary precision arithmetic, so there is no limit on the size of the integer that can be converted.
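For a place-value script such as Devanagari, the core of the conversion is a digit-by-digit mapping. The following is a minimal sketch of that idea, not the library's own code: the function name place_digits_to_ascii and its signature are hypothetical, and uint32_t stands in for a UTF-32 code unit. Because it writes ASCII digits rather than accumulating a machine integer, the length of the input is limited only by the output buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libuninum: map a zero-terminated UTF-32
   string of place-value digits onto ASCII '0'..'9'. `zero` is the code point
   of the script's digit zero (U+0966 for Devanagari).
   Returns 0 on success, -1 on an empty string, a non-digit, or a buffer
   that is too small. */
int place_digits_to_ascii(const uint32_t *s, uint32_t zero,
                          char *out, size_t outlen)
{
    size_t i;

    for (i = 0; s[i] != 0; i++) {
        if (i + 1 >= outlen)
            return -1;                      /* keep room for the '\0' */
        if (s[i] < zero || s[i] > zero + 9)
            return -1;                      /* not a digit of this script */
        out[i] = (char)('0' + (s[i] - zero));
    }
    if (i == 0 || outlen == 0)
        return -1;                          /* an empty string is not a number */
    out[i] = '\0';
    return 0;
}
```

Called with the code points of ७८४९२ and a zero of 0x0966, the sketch fills the buffer with the ASCII string 78492. Systems that are not place-based, such as the ordinary Chinese numbers, need more than a table lookup.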

The value of the string is returned in one of three forms. One option is a string of ASCII characters containing the decimal representation of the integer using the Indo-Arabic digits. This option has the virtue of avoiding any possibility of overflow or truncation. The second option is to obtain the value as a GNU MP mpz_t object. This is only useful if you are going to do further computation using GNU MP. The final option is to obtain the value as an unsigned long integer. If you are going to do internal calculations, this is probably the most convenient option, but some numbers (in fact, infinitely many) will not fit into an unsigned long integer. The library guarantees that no overflow or truncation will occur; if the number will not fit, it sets an error flag and returns 0.
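The "error flag and return 0" behaviour for the unsigned long form can be illustrated with a short sketch. The flag sketch_err, its codes, and the function ascii_to_ulong_checked are names invented for this example; the library's own identifiers may differ. The point is only that the range check happens before each multiplication, so the value never wraps around.

```c
#include <limits.h>

/* Hypothetical error flag and codes for this sketch only. */
enum { SKETCH_OK = 0, SKETCH_BADCHAR = 1, SKETCH_RANGE = 2 };
int sketch_err = SKETCH_OK;

/* Convert an ASCII decimal string to unsigned long.
   On any error, set sketch_err and return 0. */
unsigned long ascii_to_ulong_checked(const char *s)
{
    unsigned long value = 0;

    sketch_err = SKETCH_OK;
    if (*s == '\0') {
        sketch_err = SKETCH_BADCHAR;
        return 0;
    }
    for (; *s != '\0'; s++) {
        unsigned long d;

        if (*s < '0' || *s > '9') {
            sketch_err = SKETCH_BADCHAR;
            return 0;
        }
        d = (unsigned long)(*s - '0');
        /* value * 10 + d would exceed ULONG_MAX: report it, do not wrap */
        if (value > (ULONG_MAX - d) / 10) {
            sketch_err = SKETCH_RANGE;
            return 0;
        }
        value = value * 10 + d;
    }
    return value;
}
```

Since 0 is also a legitimate value, a caller has to inspect the flag rather than the return value to tell the two cases apart.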

An inverse function accepts as input an unsigned long integer, an mpz_t object, or an ASCII decimal string and converts it to a Unicode string in a selected number system.
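For place-value scripts, the inverse direction amounts to ordinary base-10 digit extraction with a different digit zero. The sketch below assumes a hypothetical helper ulong_to_place_digits (not a libuninum function) and again uses uint32_t for UTF-32 code units. It covers only the unsigned long input form and only scripts whose ten digits are contiguous in Unicode.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libuninum: write `n` as a zero-terminated
   UTF-32 string using the ten digits that start at code point `zero`
   (U+0E50 for Thai). Returns the number of digits written, or 0 if `out`
   is too small. */
size_t ulong_to_place_digits(unsigned long n, uint32_t zero,
                             uint32_t *out, size_t outlen)
{
    /* Three decimal digits per byte is more than any unsigned long needs. */
    uint32_t tmp[3 * sizeof(unsigned long) + 1];
    size_t len = 0, i;

    do {                                    /* least significant digit first */
        tmp[len++] = zero + (uint32_t)(n % 10);
        n /= 10;
    } while (n != 0);

    if (len + 1 > outlen)
        return 0;
    for (i = 0; i < len; i++)               /* reverse into the output */
        out[i] = tmp[len - 1 - i];
    out[len] = 0;
    return len;
}
```

With n = 546 and zero = 0x0E50 the output holds U+0E55, U+0E54, U+0E56, the Thai string ๕๔๖.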

If you use the library, I would be interested in knowing what you are using it for. My own application is in my sort utility msort.

In addition to the library, the command-line program numconv is provided both as an example of use of the library and as a utility possibly of use in its own right. In addition to the number system conversions that are its main use, numconv provides a convenient way to delimit numbers generated by other programs without delimitation or with delimitation inappropriate for the locale. To do this, set both input and output to Western numbers and either set the output delimitation parameters directly on the command line or use the -L flag to obtain them from the locale. For example, both:

echo "123456789" | numconv -f Western_Lower -t Western_Lower -g 2 -G 3 -s ' '

and

echo "123,456,789" | numconv -f Western_Lower -t Western_Lower -g 2 -G 3 -s ' '

will produce the output:

12 34 56 789

which might be appropriate in an Indian locale.
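The regrouping itself can be sketched in a few lines of C. This is an illustration under assumptions, not numconv's code: the helper regroup_digits is hypothetical, and it assumes, as the example output suggests, that -G gives the size of the rightmost group and -g the size of every group to its left. Non-digit characters in the input, such as existing commas, are simply dropped.

```c
#include <stddef.h>

/* Hypothetical helper, not part of numconv: regroup a string of ASCII
   digits. Counting from the right, the first group has `first` digits and
   every further group has `general` digits. Non-digits in the input are
   dropped. Returns 0 on success, -1 on a bad group size or if a buffer
   is too small. */
int regroup_digits(const char *in, int first, int general, char sep,
                   char *out, size_t outlen)
{
    char digits[128];
    size_t n = 0, i, o = 0;

    if (first <= 0 || general <= 0)
        return -1;
    for (i = 0; in[i] != '\0'; i++) {       /* strip old delimiters */
        if (in[i] >= '0' && in[i] <= '9') {
            if (n == sizeof digits)
                return -1;
            digits[n++] = in[i];
        }
    }
    for (i = 0; i < n; i++) {
        size_t remaining = n - i;           /* digits from i to the end */

        /* A separator precedes digit i when what remains is exactly the
           first group plus a whole number of general groups. */
        if (i > 0 && remaining >= (size_t)first &&
            (remaining - (size_t)first) % (size_t)general == 0) {
            if (o + 1 >= outlen)
                return -1;
            out[o++] = sep;
        }
        if (o + 1 >= outlen)
            return -1;
        out[o++] = digits[i];
    }
    if (o >= outlen)
        return -1;
    out[o] = '\0';
    return 0;
}
```

Calling regroup_digits with first = 3, general = 2 and a space as separator turns both 123456789 and 123,456,789 into 12 34 56 789.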

There is also a graphical number converter, NumberConverter, which performs a similar function to numconv.

The number systems currently supported (with some variants omitted) are the following. (Unless you have an unusually comprehensive set of fonts, your browser will not display all of them.)

Aegean	𐄝𐄓𐄌
Arabic	٥٤٦
Arabic Alphabetic	ثمو
Armenian Alphabetic	ՇԽԶ
Balinese	᭕᭔᭖
Bengali / Assamese	৫৪৬
Burmese	၅၄၆
Chinese	五百四十六
Chinese Accounting	伍佰肆拾陸
Chinese Counting Rods	𝍤𝍬𝍥
Chinese Place	五四六
Chinese Suzhou	〥〤〦
Common Braille	⠑⠙⠋
Cyrillic Alphabetic	ФМЅ
Devanagari (Hindi, Marathi, Sanskrit)	५४६
Egyptian (hieroglyphic)	𔌻𔌻𔌻𔌻𔌻 𔍓𔍓𔍓𔍓 𔎡𔎡𔎡𔎡𔎡𔎡
Ethiopic	፭፻፬፲፮
Ewellic Decimal	
Ewellic Hexadecimal	`
French/Czech Braille	⠱⠹⠫
Georgian (Mxedruli)	ფმვ
Georgian (Xucuri)	ႴႫႥ
Glagolitic Alphabetic	ⰗⰍⰅ
Greek Alphabetic	ΦΜϚ
Gujarati	૫૪૬
Gurmukhi	੫੪੬
Hebrew	רתמו
Hexadecimal	0x222
Hungarian Runes	
Kannada	೫೪೬
Kayah Li	꤅꤄꤆
Kharoshthi	‭𐩀𐩀𐩃𐩅𐩅𐩆𐩀𐩃
Khmer	៥៤៦
Klingon	
Lao	໕໔໖
Lepcha	᱅᱄᱆
Limbu	᥋᥊᥌
Malayalam	൫൪൬
Mongolian	᠕᠔᠖
New Tai Lue	᧕᧔᧖
Nko	߅߄߆
Ol Chiki	᱕᱔᱖
Old Italic	𐌣𐌣𐌣𐌣𐌣𐌣𐌣𐌣𐌣𐌣𐌢𐌢𐌢𐌢𐌡 𐌠
Old Persian	𐏕𐏕𐏕𐏕𐏕 𐏔𐏔 𐏒𐏒𐏒
Oriya	୫୪୬
Osmanya	𐒥𐒤𐒦
Perso-Arabic	۵۴۶
Phoenician	𐤙𐤙𐤙𐤙𐤙𐤘𐤘𐤘𐤘𐤖𐤖𐤖𐤖𐤖𐤖
Roman numerals	DXLVI
Russian Braille	⠢⠲⠖
Saurashtra	꣕꣔꣖
Shan	႕႔႖
Sinhala	෫෾෸෬
Sundanese	᮵᮴᮶
Tamil Place	௫௪௬
Tamil Traditional	௫௱௪௰௬
Telugu	౫౪౬
Tengwar (mortal)	
Tengwar (Elvish)	
Thai	๕๔๖
Tibetan	༥༤༦
Vai	꘥꘤꘦
Verdurian	
Western	546

Ewellic, Klingon, Tengwar, and Verdurian do not have official Unicode encodings. The library assumes that they are encoded in the Private Use Area in accordance with the encodings registered with the Conscript registry. The Hungarian Runes do not yet have an official Unicode encoding. They are encoded in the Private Use Area in accordance with the proposal of Gaspar Sinai. Kayah Li, Lepcha, Ol Chiki, Saurashtra, Shan, Sinhala, Sundanese, and Vai are encoded according to the not-quite-final draft of Unicode 5.1.

In some cases, both traditional non-place based systems and their modern place-based counterparts are supported. In addition to the specialized Counting Rod and Suzhou numbers, a total of fifteen variants of the "ordinary" Chinese numbers are supported.
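The difference between the two kinds of system shows up in the parsing logic. In a place-based string such as 五四六 each character is a digit, while in the traditional form 五百四十六 the characters 十, 百, 千 and 万 act as multipliers. The sketch below handles a small subset of the traditional form (the digits 一 to 九 and those four multipliers) to show the idea. The function names are hypothetical, the input is not validated for well-formedness, and the library itself supports many more variants and uses arbitrary precision arithmetic.

```c
#include <stdint.h>

/* Value of a Chinese digit 一..九, or 0 if `cp` is not one of them. */
static unsigned long han_digit(uint32_t cp)
{
    static const uint32_t digits[9] = {
        0x4E00, 0x4E8C, 0x4E09, 0x56DB, 0x4E94,   /* 一 二 三 四 五 */
        0x516D, 0x4E03, 0x516B, 0x4E5D            /* 六 七 八 九 */
    };
    unsigned long i;

    for (i = 0; i < 9; i++)
        if (digits[i] == cp)
            return i + 1;
    return 0;
}

/* Hypothetical helper, not part of libuninum: parse a zero-terminated
   UTF-32 string in the traditional multiplier notation.
   Returns 0 on success, -1 on an unsupported character. */
int han_to_ulong(const uint32_t *s, unsigned long *result)
{
    unsigned long total = 0, section = 0, digit = 0, d;

    for (; *s != 0; s++) {
        unsigned long mult = 0;

        if ((d = han_digit(*s)) != 0) {     /* remember a pending digit */
            digit = d;
            continue;
        }
        switch (*s) {
        case 0x5341: mult = 10;   break;    /* 十 */
        case 0x767E: mult = 100;  break;    /* 百 */
        case 0x5343: mult = 1000; break;    /* 千 */
        case 0x4E07:                        /* 万 closes a section */
            total += (section + digit) * 10000UL;
            section = digit = 0;
            continue;
        default:
            return -1;
        }
        /* A bare multiplier, as in 十 for ten, counts as one of itself. */
        section += (digit ? digit : 1) * mult;
        digit = 0;
    }
    *result = total + section + digit;
    return 0;
}
```

Applied to 五十九万四千三百二十一, the sketch accumulates 59 before 万, scales it to 590,000, and then adds 4,321, giving 594,321.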

The basic interface is from C but a Tcl interface is also provided.

Details

Language	C, Tcl
Dependencies	GMP arbitrary precision arithmetic library
Current version	2.7
Last modified	2007-12-08
License	GNU Lesser General Public License

Environment

The GNU arbitrary precision arithmetic package GMP is required. The library should work on any POSIX-compliant system on which GMP is available, which means just about any POSIX-compliant system. Kernels on which it is reported to work include: FreeBSD, Linux, Mac OS X, OpenBSD. I would appreciate reports of success or failure on other systems.

The installation process seems not to work properly on OpenBSD. First, the configure script may not detect the presence of GNU MP, even if it is properly installed. Second, the -I and -L flags need to be given to gcc but are not automatically added to the makefile by autoconf. I haven't yet figured out how to make things work automatically on OpenBSD. If you don't know either, please bear with me. If you do know, you might tell me.

Documentation

Numconv has a manual page. For the library, for the time being, consult the README files and the sample programs in the Examples directory, as well as numconv.c.

Math Forum listing

Downloads

Source

libuninum-2.7.tar.gz

libuninum-2.7.tar.bz2

libuninum-2.7.zip

If you would like to be notified of new releases, subscribe to libuninum at Freshmeat.

Packages

Debian: Debian packages
Fedora Core: RPMs
FreeBSD: Freshport
Mac OS X: Mac OS X, Darwinports
Redhat: RPMs
Solaris (SPARC and Intel): Solaris Package Index
T2: T2

Changes

2.7

Adds support for Kayah Li, Lepcha, Ol Chiki, Saurashtra, Shan, Sundanese, and Vai.
Full width numerals and hex digits are now accepted on input.

2.6

When generating Roman numerals, M is now the default for thousands. The variable Uninum_Generate_Roman_With_Bar_P now controls the choice between M and a bar over the unit character. A parallel option has been added to numconv and to NumberConverter (only via the init file).
A README file containing documentation for NumberConverter has been added.

Fix Ethiopic output.
Improve documentation.
Add systems added in recent Unicode revisions.

Back to Bill Poser's software page.
