2005/11/13

Special support for Unicode characters

I've just made certain Unicode characters map directly to certain language tokens. This lets Unicode-enabled source code editors display those tokens (mostly mathematical ones, such as the comparison operators, logical AND, and logical OR) as single glyphs.

For instance, the '<=' token, normally written as the two separate ASCII characters '<' and '=', may also be written as the single Unicode character ≤ (U+02264). The full set of mappings follows:

Token   Unicode   Glyph
&       U+02227   ∧
|       U+02228   ∨
!=      U+02260   ≠
<=      U+02264   ≤
>=      U+02265   ≥
!<      U+0226E   ≮
!>      U+0226F   ≯
!<=     U+02270   ≰
!>=     U+02271   ≱
<>      U+02276   ≶
><      U+02277   ≷
!<>     U+02278   ≸
!><     U+02279   ≹
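
To make the mapping concrete, here is a minimal sketch in Python (not Iris, and not the actual Iris lexer) of how a lexer could fold the single-character forms into their ASCII spellings. The names UNICODE_TOKENS and canonical_token are made up for illustration; only the code points and token spellings come from the table above.

```python
# Hypothetical lookup table: one Unicode code point per ASCII token spelling.
# The pairs are taken from the table above; the names are illustrative only.
UNICODE_TOKENS = {
    "\u2227": "&",
    "\u2228": "|",
    "\u2260": "!=",
    "\u2264": "<=",
    "\u2265": ">=",
    "\u226E": "!<",
    "\u226F": "!>",
    "\u2270": "!<=",
    "\u2271": "!>=",
    "\u2276": "<>",
    "\u2277": "><",
    "\u2278": "!<>",
    "\u2279": "!><",
}


def canonical_token(lexeme: str) -> str:
    """Return the ASCII spelling of a token, leaving other lexemes unchanged."""
    return UNICODE_TOKENS.get(lexeme, lexeme)
```

With a table like this, the rest of the compiler only ever sees one spelling per token, so nothing downstream of the lexer needs to know which form the programmer typed.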


As you may have noticed, I haven't included any of the mathematical function symbols (the summation, product, and integral signs, for example) in the token mapping, because I don't believe they belong there. These characters are ripe for use in Iris' future standard library as aliases for common mathematical functions. So the Greek capital gamma can really represent the gamma function from mathematics! The same logic extends to the Jacobian, pi, infinity, etc. Further, the summation symbol (Greek capital sigma) can be used as an alias for a generic vector sum function. The possibilities are endless!
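
As a rough illustration of the alias idea, here is how it looks in Python, which already accepts Greek letters in identifiers. This is only an analogy: Iris' standard library doesn't exist yet, and the particular aliases below are my own picks, not a planned API.

```python
import math

# Greek capital gamma as an alias for the gamma function (illustrative choice).
Γ = math.gamma

# Greek capital sigma as an alias for a generic sum over a sequence.
Σ = sum

# Lowercase pi as an alias for the constant.
π = math.pi
```

Note that these are the Greek letters (U+0393, U+03A3, U+03C0), not the dedicated mathematical operator characters, which Python does not allow in identifiers.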

As of right now, as far as the lexer is concerned, you may name your variables in Iris just as you would in mathematics: with Greek letters! Unicode classifies them as alphabetic characters, so they are perfectly legal in Iris identifiers.
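
Here is a small Python sketch of that kind of check. The exact rule (an alphabetic character or underscore first, then alphanumerics or underscores) is an assumption on my part and not necessarily what the Iris lexer does; is_identifier is a made-up name.

```python
def is_identifier(text: str) -> bool:
    """Check a candidate identifier using Unicode-aware character classes.

    Assumed rule (hypothetical, not the Iris spec): the first character is
    alphabetic or '_', and the remaining ones are alphanumeric or '_'.
    """
    if not text:
        return False
    first, rest = text[0], text[1:]
    if not (first.isalpha() or first == "_"):
        return False
    return all(ch.isalnum() or ch == "_" for ch in rest)
```

Because str.isalpha() follows the Unicode character database, Greek letters pass the check while operator characters such as ≤ do not, which keeps the token glyphs and the identifier alphabet separate.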

Despite the lack of complete Unicode character support from Blogger, I have provided images of the glyphs, which are hosted on www.w3.org. Unfortunately, I also haven't been able to view most of these symbols on my own Windows XP SP2 (English) system.

While I do develop concurrently on both Linux and Windows, I work on Linux through a PuTTY remote shell. The output sent to the shell is correct, but it renders incorrectly because my Windows fonts lack these characters. Some of the more common characters do appear correctly, while the others appear as a square, indicating that the glyph is missing.

BTW, the Unicode reference I'm using is http://www.w3.org/TR/MathML2/chapter6.html. It has a very useful and convenient table mapping Unicode character values to their respective glyphs. I'll be referring to this table often in the future.

1 Comment:

At 11/22/2005 04:37:00 PM, Blogger Jon Heizer said...

Also a cool idea. Keep it up!

 
