Friday, December 19, 2008

Encoding name normalization in Python

Python provides the codecs module, which contains the codec registry. You can query the registry with the codecs.lookup function, which takes an encoding name as its argument.

The venerable IANA, our Internet Assigned Numbers Authority, maintains official names for character sets. On the other hand, nobody really cares. According to IANA, "UTF-8" is an encoding name, with no aliases. Therefore, "UTF8", "UTF_8" or any such aliases should be invalid. That doesn't really work in the real world.

So Python normalizes encoding names passed to codecs.lookup. How exactly this is done isn't really specified. It turns out that CPython does normalization in two separate places: once in Python in the standard library, and once in C in the implementation. There is a normalize_encoding function in Lib/encodings/__init__.py, and a normalizestring function in Python/codecs.c. Moreover, these two functions perform different normalizations.

IronPython, being an implementation of Python that tries to be compatible with CPython, needs to cope with this. You will be surprised by the number of ways things can go wrong if you don't exactly match how this is done. But did you know that the following code works with CPython? (I don't recommend this!!!)

import codecs
codecs.lookup('utf!!!8')


Yes, those are three exclamation marks. I'm not kidding...
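To see the normalization from the outside, here is a minimal sketch that asks the registry for the canonical name of a few spellings. It assumes a current CPython 3 rather than the 2008-era interpreter this post was written against, and canonical_name is a helper name I made up for illustration. Other versions or implementations may normalize differently, which is exactly the problem described above.

```python
import codecs
import encodings


def canonical_name(name):
    """Return the registry's canonical codec name, or None if lookup fails.

    Hypothetical helper for illustration; not part of the codecs module.
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


# Compare what the registry resolves with what the pure-Python
# normalize_encoding helper in Lib/encodings/__init__.py produces.
for spelling in ("UTF-8", "utf8", "UTF_8", "utf!!!8", "no-such-codec"):
    print(repr(spelling), "->", canonical_name(spelling),
          "| normalize_encoding:", encodings.normalize_encoding(spelling))
```

Running this on another implementation and comparing the output is a quick way to spot mismatches.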

Sunday, June 29, 2008

My reading list

I haven't blogged for a while, so here is a lame blog post listing the feeds I am currently subscribed to. I am a Liferea user, by the way. I tried to migrate to an online feed reader, but somehow it didn't seem more convenient.

Software projects


I am subscribed to two software project feeds. One is of course IronPython, which I mainly use to monitor the issue tracker. I would prefer mail notification for this, but it's not implemented in CodePlex.

Another is PyPy. Quoting somebody on YouTube (hah!), "PyPy is the most ambitious project any language has ever had", and I believe it. Even if you don't believe it, it's well worth subscribing if you are into programming languages. By the way, the YouTube video is PyPy - Automatic Generation of VMs for Dynamic Languages.

Programming language developers


I am subscribed to blogs of developers implementing programming languages. For IronPython, I have Dino Viehland and Martin Maly on the list. For JRuby, Charles Nutter and Ola Bini are great.

Like just about everybody else, I read John Lam to follow IronRuby. Among Mono developers, I only read Jb Evain. If the signal-to-noise ratio were a bit higher for me, I would read Miguel de Icaza, but I don't.

For the JVM, I read Gary Benson and John Rose, although both are usually over my head. The last two blogs in this category are from GCC developers (among other things): Ian Lance Taylor and Tom Tromey. Surprisingly, Ian and Tom are usually not over my head! I thank them for their generosity; I am entirely sure that they are capable of writing posts inscrutable to me. :)

Others


Being a science fiction fan, I read Charles Stross. There are many great science fiction writers, but that set somehow doesn't seem to intersect with the set of great bloggers a lot.

Being a fan of mathematics, I read Terence Tao. I don't pretend to understand the technical material there, but the occasional posts directed to the "public" are simply great. For computer science fans (as opposed to software engineering!) I recommend Scott Aaronson. Be sure to check out his lecture notes!

I read Alp Toker for no particular reason. His posts have always been enjoyable to me. I temporarily have Antonio Cangiano on the roll, mainly so as not to miss his Ruby shootout.

Korean blogs


That leaves some Korean blogs. Hye-Shik Chang, a Python developer and a FreeBSD port maintainer, is the best Korean blogger in his niche. Park, Seong Chan is a theoretical physicist who writes great, approachable posts on physics news. Kim Gyuhang is a progressive columnist. I sympathize with his political views. He is also my model for writing clear and affecting Korean prose.

Monday, March 24, 2008

Inlining in Mono JIT

It seems that as of Mono 1.9, the inliner in the Mono JIT compiler never inlines functions containing any branching opcode. To those in a position to know, I ask:

1. Is this true?
2. If it is true, should I manually inline functions like the following?

public static int Min(int x, int y) {
    if (x < y) return x;
    else return y;
}

Sunday, January 13, 2008

CLR Add-In

Today I came across CLR Add-In. You can start reading from the "System.AddIn Resources" link on the left sidebar.

Its discovery and adaptation model looks rather comparable to that of zope.interface, but it also deals with isolation. Reading the blog, it's interesting to see what design choices there are, and how and why the CLR Add-In team made those decisions.

Friday, January 11, 2008

Using AT-SPI from IronPython (2)

Before continuing, let me mention that all the relevant code is in the FePy repository:
https://fepy.svn.sourceforge.net/svnroot/fepy/trunk/atspi/

Now that IIOP.NET is built, it's time to compile the IDL. IDL stands for "Interface Description Language", and it is used to (surprise) describe the interface. AT-SPI's CORBA interface is described in /usr/share/idl/at-spi-1.0/Accessibility.idl, which includes a bunch of other files. Some of these files are in different directories, so one needs to specify those directories.

The built compiler is under IDLToCLSCompiler/IDLCompiler/bin. Copy these files (IIOPChannel.dll, IDLPreprocessor.dll, IDLToCLSCompiler.exe) to the current directory, and run:

$ mono IDLToCLSCompiler.exe \
-idir /usr/share/idl/bonobo-2.0 \
-idir /usr/share/idl/bonobo-activation-2.0 \
Accessibility /usr/share/idl/at-spi-1.0/Accessibility.idl


This should produce Accessibility.dll in the same directory. build.sh in the repository automates the process up to this point (download, build, and IDL compilation).

So how does one connect to the server? This is "a well known problem" that has its own FAQ entry. Basically, one obtains an IOR, "Interoperable Object Reference", by out-of-band means, much as one gets a URL from a bookmark. Once one has the first object reference, one can follow links to other objects.

It turns out that AT-SPI publishes the IOR as a property of the X root window under the name "AT_SPI_IOR". Now one could go read the X protocol specification and manually construct a GetProperty request (opcode 20), etc., but there is an easier way. The xprop utility can display X properties, so one runs "xprop -root AT_SPI_IOR" and parses the output. xprop.py in the repository implements this.
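For a rough idea of the xprop approach, here is a minimal sketch. It is not the xprop.py from the repository: it is written for a current CPython 3, the function names are made up for illustration, and it assumes xprop prints a string property in the form NAME(TYPE) = "value".

```python
import re
import subprocess

# Assumed output shape for a string property: NAME(TYPE) = "value"
_STRING_PROPERTY = re.compile(r'^\s*\w+\(\w+\)\s*=\s*"(.*)"\s*$')


def parse_xprop_output(output):
    """Extract the quoted value from xprop output (hypothetical helper)."""
    for line in output.splitlines():
        match = _STRING_PROPERTY.match(line)
        if match:
            return match.group(1)
    raise ValueError("no string property found in xprop output")


def get_root_property(name):
    """Run `xprop -root NAME` and return the property value as a string."""
    result = subprocess.run(["xprop", "-root", name],
                            capture_output=True, text=True, check=True)
    return parse_xprop_output(result.stdout)
```

With a running registry, get_root_property("AT_SPI_IOR") would return the stringified IOR. Keeping the parser separate from the subprocess call makes it testable without an X server.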

An IOR is a long sequence of hexadecimal digits, and one needs a tool to decode it. ior-decode-2 in the orbit2 package can do so. If you decode the IOR from xprop, you will notice a problem. AT-SPI (actually the CORBA implementation it is using, namely ORBit2) uses Unix domain sockets by default, but IIOP.NET can't use them. One solution is in the ORBit2 FAQ I linked above. Create a .orbitrc file containing this line:

ORBIIOPIPv4=1
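As an aside on what that string of hexadecimal digits contains: the sketch below reads only the outer layer of a stringified IOR, namely the type ID and the tag of each profile, following my understanding of the CDR encoding in the CORBA specification. It is not a replacement for ior-decode-2, decode_ior is a name I made up, and it assumes a well-formed IOR and a current CPython 3.

```python
import struct


def decode_ior(ior):
    """Return (type_id, [profile_tag, ...]) for a stringified IOR.

    Hypothetical helper; assumes well-formed input.
    """
    if not ior.startswith("IOR:"):
        raise ValueError("not a stringified IOR")
    data = bytes.fromhex(ior[4:])
    # First octet of the encapsulation: 0 = big-endian, 1 = little-endian.
    order = "<" if data[0] & 1 else ">"
    pos = 1

    def read_ulong():
        nonlocal pos
        pos = (pos + 3) & ~3  # CDR aligns unsigned long to 4 bytes
        (value,) = struct.unpack_from(order + "I", data, pos)
        pos += 4
        return value

    length = read_ulong()  # string length, including the trailing NUL
    type_id = data[pos:pos + length - 1].decode("ascii")
    pos += length

    tags = []
    for _ in range(read_ulong()):  # number of tagged profiles
        tags.append(read_ulong())  # profile tag
        size = read_ulong()        # length of the opaque profile body
        pos += size                # skip the body
    return type_id, tags
```

A profile tag of 0 is TAG_INTERNET_IOP, the plain IIOP profile that IIOP.NET needs. Decoding the host and port inside the profile body is left out to keep the sketch short.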


This post is already quite long, so let's quickly skim the rest. corba.py implements the necessary initialization from the IIOP.NET documentation. typed.py is a workaround for a limitation of IronPython (namely the lack of a cast operator), first suggested by Dino Viehland. And this is the meat of cliatspi.py:

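# Set up IIOP.NET and get the ORB
orb = corba.init()
# Read the stringified IOR from the X root window
ior = xprop.get('AT_SPI_IOR')
obj = orb.string_to_object(ior)
# Stand-in for a cast to Accessibility.Registry
registry = typed.typedproxy(obj, Accessibility.Registry)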


tree.py is an example AT-SPI client I wrote, which prints a tree of accessible objects on the current desktop. It imports cliatspi on IronPython, but imports an existing AT-SPI binding on CPython. (I used the one in Debian's python-at-spi package.) As IDL is language-neutral, this script actually runs identically on both CPython and IronPython. Extending tree.py into a useful tool like UISpy on Windows is left as an exercise for the reader.
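The switch between bindings can be done by checking sys.platform, which is "cli" on IronPython. A minimal sketch follows. It is not the actual code in tree.py, and the CPython module name "atspi" is an assumption on my part.

```python
import sys


def binding_module_name(platform=sys.platform):
    """Pick which AT-SPI binding module to import (hypothetical helper)."""
    if platform == "cli":
        # IronPython: use the IIOP.NET-based binding from this post
        return "cliatspi"
    # CPython: module name of the existing binding is an assumption
    return "atspi"


print(binding_module_name())
```

The chosen name could then be passed to importlib.import_module, so the rest of the script stays the same on both implementations.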

Thursday, January 10, 2008

Using AT-SPI from IronPython (1)

This adventure started when Jim Hugunin mentioned the System.Windows.Automation library, new in .NET 3.0.

GUI test automation and assistive technology (such as screen readers) share some common needs. While UI Automation is named after the former, AT-SPI is named after the latter. AT-SPI, which stands for "Assistive Technology Service Provider Interface" (the first of the lengthy acronyms that will appear in this post), is an accessibility standard for the Unix/X world. Initially developed by the GNOME project, it is now also supported by Java, Mozilla, OpenOffice.org, and Qt 4.

While the Microsoft-Novell interoperability agreement announced an intention to implement UI Automation on Linux (see the Wikipedia links above for details), that's not available today on my Debian GNU/Linux desktop. So I looked for a way to use AT-SPI from IronPython on Mono.

The first thing I did was install the at-spi package from the Debian repository. That was obvious... Less obvious was how to use it after installation, especially because I am not using the GNOME desktop (I am an IceWM user). After some searching, I added the following two lines to my .xsession:

export GTK_MODULES=gail:atk-bridge
/usr/lib/at-spi/at-spi-registryd &


Now AT-SPI has an accessibility broker which clients talk to, and it talks CORBA. CORBA, which stands for (I warned you) "Common Object Request Broker Architecture", is like a big brother of IPC mechanisms. CORBA has been around for a long time, and while it is sometimes accused of bloat, its bloat is nothing compared to a certain XML-based "Simple" Object Access Protocol.

So how does one use CORBA from Mono? A little searching turned up a nice project named IIOP.NET, which "allows a seamless interoperation between .NET, CORBA and J2EE distributed objects." Cool. This project even has a support table for Mono on its status page! The download page mentions both binary and source releases, but I couldn't find the binary release. No problem. Download the source release, unzip, and run "make -f Makefile.mono". Note that the plain Makefile is for nmake, a Microsoft dialect of make, which is not compatible with GNU make. The build finished with no problem.

Bah, this is getting too long. Let's continue on the next post.