Monday, May 6, 2013

Porting a Module to Perl 6

CPAN is a huge draw for Perl 5, with approximately umpteen zillion modules available for a wide array of purposes.  It's probably the biggest draw for the Perl 5 language these days, given the newer, hipper scripting languages out there like Ruby, Python, and of course INTERCAL.

The problem is, these modules are not yet usable in Perl 6 directly.  There is an ongoing project to allow Perl 5 code to run in Rakudo, but so far only the most basic code works: simple loops, quite a few builtins, backticks, and so on.  It does inherit from the Perl 6 object system, which is pretty cool, so $foo->WHAT can tell you if you have a Str, Int, or IO::Handle.

So for right now, the only practical way to use Perl 5 modules is to rewrite them in Perl 6.  I just finished porting File::Spec, one of Perl 5's core modules, which helps deal with file paths on different operating systems.  FROGGS++ did much of the initial work on it, but he's moved on to the P5 in P6 project mentioned above, so I picked up the slack.  The end goal of the project is for me to integrate functionality like Perl 5's Path::Class into the core language, so that OS interoperability comes naturally when using the native functions.

As I got further into the port, I became convinced that porting the module is a much better choice than relying on the Perl 5 code being integrated.  There are several reasons for this:

Code Cruft


There is a lot of support for operating systems that are now out of date.  This isn't a bad thing.  I'm sure that there's some hobbyist who will want to run Perl 6 on their OS/2 Warp system.  The problem comes when you look inside the code for the OS2 module:
    $path =~ s/^([a-z]:)/\l$1/s;
This little no-op snippet from canonpath (to produce the canonically correct path) converts a lowercase drive letter to lowercase.  It's not harmful, but it does illustrate the fact that no one has edited this code in 9 years.
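
A quick way to convince yourself that the substitution changes nothing is to run it over a few sample paths.  This is only a sketch of mine, not code from File::Spec: the wrapper name os2_drive_fix and the sample paths are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution quoted above from OS2.pm.
sub os2_drive_fix {
    my ($path) = @_;
    $path =~ s/^([a-z]:)/\l$1/s;    # \l lowercases the first character of $1
    return $path;
}

# The pattern only matches a lowercase drive letter, which \l leaves alone.
# An uppercase drive letter does not match at all, so nothing ever changes.
for my $path ('c:/config.sys', 'C:/CONFIG.SYS', '/no/drive/here') {
    my $fixed  = os2_drive_fix($path);
    my $status = $fixed eq $path ? 'unchanged' : 'changed';
    print "$path => $fixed ($status)\n";
}
```

Every path comes back exactly as it went in, which is what makes the line a no-op.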

This isn't the fault of the Perl 5 Porters -- they have plenty of better things to do than to support outdated OSes when not even bug tickets are coming in.  But translating the code sure gives a great opportunity to notice these problems.

In the end, I cut the entire OS2 module and delegated to Win32.pm, because it had support for things like UNC paths (//server/share) that OS2.pm had only half-implemented.  And so a huge block of code cruft bit the dust.

Readability and Maintainability


Part of the reason these issues happen in the first place is that it's harder to see what's going on in a given piece of code.

An example I came across was in this helper for tmpdir, a method to return the first temporary directory that's writable in a list of parameters.  In Perl 5, we get:

sub _tmpdir {
    my $self = shift;
    my @dirlist = @_;
    my $tmpdir;

    foreach (@dirlist) {
        next unless defined && -d && -w _;
        $tmpdir = $_;
        last;
    }
    return $self->canonpath($tmpdir);
}


That's actually good, idiomatic code for Perl 5, though it can look like spooky action at a distance if you're not aware of what's going on with $_, @_, and shift.
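
If the implicit variables are unfamiliar, it may help to see the same loop with everything spelled out.  The following is a sketch rather than File::Spec's code: first_writable_dir is a hypothetical standalone function, so it has no $self and skips the canonpath call.

```perl
use strict;
use warnings;

# Hypothetical spelled-out variant of the loop above, written as a plain
# function (no $self, no canonpath) so that it runs on its own.
sub first_writable_dir {
    my @dirlist = @_;                # @_ holds the arguments passed in

    for my $dir (@dirlist) {         # a named variable instead of $_
        next unless defined $dir;    # bare "defined" was testing $_
        next unless -d $dir;         # bare "-d" was also testing $_
        next unless -w _;            # "_" reuses the stat data from the -d test
        return $dir;
    }
    return;                          # no candidate qualified
}

my $found = first_writable_dir(undef, '/no/such/directory', '.');
print defined $found ? "found: $found\n" : "nothing suitable\n";
```

The special filehandle _ is the least obvious part: it tells -w to reuse the stat result from the previous file test instead of hitting the filesystem again.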

Equivalent code in Perl 6 looks like this:

method !tmpdir( *@dirlist ) {
    my $tmpdir = first { .defined && .IO.w && .IO.d }, @dirlist;
    return self.canonpath($tmpdir);
}


No messing about with parameters and keeping track of the object -- it all happens in the signature.  You no longer have to read through a loop to understand the code either -- in Perl 6 you can just say that you want the first matching candidate, and first() will lazily test the list for you.

The P6 version gets to the point much faster, and it's much closer to natural language: "set $tmpdir to the first defined writable directory in @dirlist."  Shorter, easier-to-read code is easier to maintain.

Changing Old Features


At some point, your code was working perfectly and passed all the tests.  But then the computer world changed around you, and the code no longer makes any sense.  You would like to refactor, but people rely on the old functionality.

This is exactly what happened with File::Spec's case_tolerant function.  It essentially looks at the operating system alone, and uses that to determine whether the filesystem is case-sensitive.  That made perfect sense in the old days, when Macs used HFS+, Windows used FAT, and Unix used ufs or a variant.  But my computer runs Mac OS X and Windows and has several drive partitions in different formats.  Heck, the NTFS drives are case sensitive in POSIX-land, but as soon as I boot Windows they become case insensitive.

The only reasonable way to check this now is to test the filesystem behind a specific directory, since widespread support for symlinks means a path can end up on any of them.  This breaks the old functionality.  But there's no time like a major language revision to break old APIs and replace them with shiny new ones.
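
To make the idea of checking a specific directory concrete, here is one way such a probe could look in Perl 5.  It is a sketch under my own assumptions, not the API that File::Spec or the port provides: probe_case_tolerant is a hypothetical name, and the approach (create a scratch file, then look for it under an uppercased name) only works in directories you can write to.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp ();

# Hypothetical helper, not part of File::Spec: probe one directory to see
# whether the filesystem behind it ignores case in file names.
# Returns 1 (case-tolerant), 0 (case-sensitive), or nothing if the
# directory cannot be probed.
sub probe_case_tolerant {
    my ($dir) = @_;
    return unless defined $dir && -d $dir && -w _;

    # Scratch file with a lowercase prefix; it is deleted automatically
    # when $tmp goes out of scope.
    my $tmp = eval {
        File::Temp->new( TEMPLATE => 'casetestXXXXXXXX', DIR => $dir );
    };
    return unless defined $tmp;

    # Ask for the same file under an all-uppercase name.  Only a
    # case-tolerant filesystem will report that it exists.
    my $name    = ( File::Spec->splitpath( $tmp->filename ) )[2];
    my $shouted = File::Spec->catfile( $dir, uc $name );
    return ( -e $shouted ) ? 1 : 0;
}

my $dir    = File::Spec->tmpdir;
my $answer = probe_case_tolerant($dir);
my $label  = !defined $answer ? 'could not be probed'
           : $answer          ? 'case-tolerant'
           :                    'case-sensitive';
print "$dir: $label\n";
```

The answer applies to the directory that was probed and nothing else, which is exactly why an OS-wide yes or no no longer works.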

However, there are a couple of major downsides to porting:

This is really time-consuming


Sure, you don't have to implement the algorithm from scratch, and you have plenty of tests to help your development.  It would be possible to just translate the existing code, because things aren't that different.  Change an if( $foo ) to if $foo, etc.

However, a major reason for doing the porting is to use the Perl 6 idioms instead, especially in function declarations and regular expressions where it makes a major difference in code readability.

Dependencies aren't available


Sometimes your code relies on separate modules that aren't available, or on functions that aren't implemented yet.  Your choice is then to either implement the functionality yourself and embark on yet another yak-shaving expedition, or mark it as todo and wait for the appropriate functionality to arrive.

This has become a much smaller problem of late as the core language matures.  But "done enough" is not really "done".



Now that I've written this, I've realized that my own project is a microcosm of the Perl 6 saga.  Making a better codebase takes a lot of time, but it ultimately seems worth the effort.

Of course, once I had gotten this far, I realized that File::Spec -- or something very much like it -- would be needed to implement IO::Path objects for non-unixlike OSes.  So stay tuned for the next part in this saga: How to add File::Spec to Rakudo.

Update: It ended up turning into two posts:  One was a simple guide on How to Start Hacking Rakudo Perl 6, and the other covered my follies in trying to add to the compiler for the first time.  But the short story is that IO::Path is now added to Perl 6 and implemented in Rakudo -- this means that both File::Spec and Path::Class' behavior are now available in the core language without adding modules.

2 comments:

  1. This comment has been removed by the author.

  2. Kudos for porting File::Spec!

    I am myself in a gradual quest of porting all CPAN modules I have been using in my main projects, before porting my own work as well (currently looking into the XML::LibXML family).

    From time to time the occasional "gotchas" creep in, so I started a "porting diary" of sorts, which I hope to make a community effort:
    https://github.com/dginev/perl6-Porting-Pearls

    It's realized via a Github wiki, so everyone is welcome to fork and contribute, and of course use it as a reference while porting.
