Irregular Expression: May 2013

Wednesday, May 8, 2013

How to start hacking on Rakudo Perl 6

In the course of writing modules, I finally got the urge to start implementing features I wanted in Rakudo itself. And since there wasn't a real guide on how to set up and patch Rakudo, I decided to share what I had learned in the process.

The nice thing about Perl 6 implementations is that a significant portion of them is written in Perl 6. (Well, one nice thing anyway.) This means that if you're comfortable writing Perl 6 modules and classes, you should feel pretty much at home in the source.

This guide assumes so, and that you have a basic familiarity with Github, git, and make -- enough to commit to repositories and build a software package, anyway.

Getting Started

The first thing is to get your own fork of Rakudo to work on. So go to the Rakudo repository and click the fork button in the upper right. Relax while Github photocopies a book. Once that's done, find an appropriate directory to git clone it to on your own machine.

Go ahead and cd into the new rakudo directory. There are a few setup things that you'll want to do. First of all, go ahead and build Rakudo, using the normal steps:

    perl ./Configure.pl --gen-parrot
    make
    make install

That will pull a copy of NQP and Parrot, and make sure that everything is working okay to begin with. Now that that's done, you'll want to add the new perl6 to your system $PATH environment variable. If you don't know how to do that -- well, here's Google. In particular, you'll need to add the full path to the rakudo/install/bin directory.
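
For a Bourne-style shell, the $PATH change might look like the sketch below. It assumes Rakudo was cloned into $HOME/rakudo, which is a hypothetical location; substitute wherever your clone actually lives. The helper name prepend_path is made up for this example.

```sh
#!/bin/sh
# Hypothetical clone location -- adjust to wherever you ran `git clone`.
RAKUDO_DIR="$HOME/rakudo"

# prepend_path DIR: print a PATH value with DIR in front,
# unless DIR is already one of the entries.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) printf '%s\n' "$PATH" ;;
        *)        printf '%s\n' "$1:$PATH" ;;
    esac
}

PATH="$(prepend_path "$RAKUDO_DIR/install/bin")"
export PATH
```

Putting the same lines in your shell's startup file (~/.profile, for instance) makes the change stick between sessions. The check for an existing entry keeps $PATH from growing every time the file is sourced.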

There are a couple more things you'll want to do now. First of all:

    make spectest

You don't have to run the full tests now, but let it download the roast repository into your t/spec before hitting ^C. You will need these tests later to make sure you didn't break anything.

Next, you'll want to set up a link back to the main Rakudo repository, so you can pull changes from there. So do:

    git remote add upstream git://github.com/rakudo/rakudo.git

You'll also want the module installer, Panda. Now, obviously, you shouldn't add anything to Rakudo that depends on an outside module. But Panda is the one piece of software you really don't want to break, ever. People will still want to be able to download modules even if functionality changes. We will have to go through a deprecation cycle if you intentionally change something to cause Panda to start failing its tests. So to download and install it:

    git clone git://github.com/tadzik/panda.git
    cd panda
    perl6 bootstrap.pl

This will set up Panda's dependencies, and test all of those modules. The bootstrap script will tell you a path to add to your $PATH environment variable -- add it too, so that panda will run from anywhere.

Finally, you really should set up a new branch to work on, so you can switch back to a working Rakudo if you need to. Move back into the rakudo directory and run:

    git checkout -b mynewbranchname

A very short overview of the source

Now that all the setup is done, let's take a quick look around. Most of what we build into Perl 6 lives in the rakudo/src folder, so this is where you'll want to edit the contents.

The vm directory contains files specific to the virtual machines Rakudo runs under. At the time of this writing, there's only one thing in there, parrot, but very soon there will also be a jvm directory. Exciting! Most of the purpose of this code is to map functions to lower-level operations, in either Parrot or Java.
The Perl6 directory contains the grammar and actions used to build the language, as well as the object metamodel. The contents of this folder are written in NQP, or Not Quite Perl. This section determines how the language is parsed.
The core directory contains the files that will be built into the core setting. You'll find classes or subroutines in here for just about everything in Perl: strings, operators like eq, filehandles, sets, and more. Individual files look similar to modules, but these are "modules" that are available to every Perl 6 program.
The gen directory contains files that are created in the compilation process. The core setting lives here, creatively named CORE.setting. And if you look at it, it's just a concatenation of the files in core, put together in the order specified in rakudo/tools/build/Makefile.in. While these files can and do get overwritten in the build process, it's often a good idea to keep a copy of CORE.setting open so you can find what you're looking for faster -- and then go edit it in core.
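
The concatenation step is simple enough to imitate in a few lines of shell. This is only an illustrative sketch, not Rakudo's actual build rule: the function name build_setting, the list file, and the file names are all invented here, and the real ordering lives in Makefile.in as described above.

```sh
#!/bin/sh
# build_setting LISTFILE SRCDIR: concatenate the files named in LISTFILE
# (one per line, in that order) from SRCDIR and print the result.
# Illustrative only; every name here is hypothetical.
build_setting() {
    while IFS= read -r name; do
        [ -n "$name" ] || continue      # skip blank lines
        cat "$2/$name"
    done < "$1"
}

# Tiny demonstration with throwaway files.
demo=$(mktemp -d)
printf '# contents of first.pm\n'  > "$demo/first.pm"
printf '# contents of second.pm\n' > "$demo/second.pm"
printf 'first.pm\nsecond.pm\n'     > "$demo/order.txt"
build_setting "$demo/order.txt" "$demo"
rm -rf "$demo"
```

The thing to take away is that the list, not the file names, decides what ends up first in the output. Anything a file depends on has to appear earlier in that list.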

Let's start hacking!

Now's the time to start changing Rakudo. Have the appropriate amount of fun! Be sure to commit functioning changes occasionally, so that you can git bisect for problems later. And push your edits to Github as a free backup. If you get stuck, drop by #perl6 on irc.freenode.net and ask questions.

If it's your first time, you have to fi^W^W^W^W you will probably make a lot of mistakes. I know I did on my first project, as explained in detail in a previous post. But I promise you, the learning curve is surprisingly easy, and your compiler-fu will increase to fuchsia-belt level in no time. (What? We're not just giving black belts away... and Camelia likes fuchsias.)

Testing and Specs

When you think you're finished with your code, the first thing you should do is merge in the upstream rakudo, and rebuild:

    git fetch upstream
    git merge upstream/nom
    perl Configure.pl
    make
    make spectest

The spectests will make sure that you didn't accidentally break the codebase. You should pass, or at least not fail worse than the current roast data.

You should add your own tests into the roast repository about now. You do have unit tests, right? Writing tests is "optional", just like brushing your teeth -- you don't have to do it, but if you never do it you're in for a lot of pain later. Here's a fine and elegantly crafted hyperlink to S24 (Testing) for reference.

When editing a file that already exists in roast, you may need to fudge the tests for Niecza and Pugs. This tells us "we know the test failed or fails to parse, nothing has changed". Just add lines like the following above broken tests:

    #?pugs 1 skip 'reason'
    #?niecza 1 skip 'reason'

The "1" is actually the number of tests you want to skip, but really, look at the README in roast for more details.

If you want to add a whole new test file, you'll need to add it into rakudo/t/spectest.data. If your code fixes broken tests, then you'll want to *unfudge* by removing the #?rakudo skip lines above the relevant tests.
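
Before unfudging, it helps to see where the Rakudo fudge lines are. A plain grep does the job; the sketch below wraps it in a hypothetical helper, list_rakudo_fudges, and assumes the directives start at the beginning of the line the way the #?pugs and #?niecza examples above do.

```sh
#!/bin/sh
# list_rakudo_fudges FILE: print "line-number:directive" for every
# line in FILE that starts with the #?rakudo fudge marker.
# Hypothetical helper name, for illustration only.
list_rakudo_fudges() {
    # [?] keeps the question mark literal in any grep flavor.
    grep -n '^#[?]rakudo' "$1"
}
```

Call it with the path to a test file inside your t/spec checkout. grep exits non-zero when it finds nothing, so the function can also serve as a quick "is this file still fudged?" check in a script.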

You should also test that Panda is still working. Since you'll have to rebuild panda after recompiling Rakudo anyway, just check the rebootstrap for test failures:

    perl6 panda/rebootstrap.pl

Committing to Rakudo

The easiest way to get your code merged is to push it back to Github, and then send a pull request into Rakudo. If you're really committed to committing, consider sending in a Contributor License Agreement to The Perl Foundation. This makes you eligible for a commit bit to push directly to the Rakudo repo.

If there's a problem, someone will get back to you pretty fast on the Github issues page. Hopefully, these problems will be easy to fix, and a standard git commit; git push will add it to the ticket. If there aren't any problems, someone will just merge it in a couple days.

Huzzah! \o/ A Rakudo Hacker is you!

Implementing IO::Path in Rakudo

I started off implementing the File::Spec module for Perl 6, as explained in the last blog post, but what I really wanted to do was to get some sanity in working with paths through IO::Path objects. And if I was going to do this, I needed to actually edit the core modules of Rakudo.

Starting with a module which does stringy operations on directory paths, I set out with the goal of making some sort of easy-to-use, path manipulation class in the core. Something like how Path::Class works in Perl 5. Then I looked at S32::IO, and realized that IO::Path was exactly what I was seeking. But it was only partially implemented, and only for POSIX.

So I'm going to walk through the steps I took to integrate multiple-OS path support into Rakudo, in the hopes that it will help other people to avoid my mistakes. Which were fairly numerous. :/

This was my first foray into hacking a compiler, and I must confess it was fairly intimidating. I'm no script kiddie, but I hadn't worked on any large open-source projects before. However, the setting is made of Perl 6 code, so it was more or less a matter of integrating what FROGGS and I had already written.

Baby's first steps

I had played around with using File::Spec as a backend to an IO::Path in the IO::Path::More module, so it became clear that this was the best way forward for Rakudo.

I realized right away that IO::Path's interface would have to change to include systems with a concept of volume. I did a small edit to IO.pm to add a $.volume attribute, changed a few lines of code in sub dir, and compiled. Everything worked. I sent a pull request into Rakudo, just to get the interface-changing out of the way first. It tested okay, and was accepted. Wow, I'm good at this!

Naturally, it all got worse from there.

Problem 1: Biting off more than I can chew

The next step was to add the File::Spec modules into the core. So I just started by copying over the .pm files into the core directory. Unlike in normal Perl code, the modules aren't included with use Module;. Instead, I edited the Makefile.in to add the modules in the correct order.

Since the File::Spec object needs to inherit from the subclasses, and the other subclasses inherit from File::Spec::Unix, I went with this order: File::Spec::Unix, File::Spec::Win32, File::Spec::Cygwin, File::Spec. Now, some of you might already see a problem with that. To them, I say: "Shh! No spoilers!"

I realized that I couldn't use the file-scoped class definition (class Foo;) if it was going to end up all in one file, so I switched those out for curly braces. Then I rebuilt the makefile and compiled rakudo.

That generated a whole mess of errors. And not all nice errors like before -- Exception.pm hadn't even loaded yet! This was a bunch of nqp recursion errors. I tried scaling back a little at a time, even commenting out the entire inside of classes, but I still had issues building Rakudo. Eventually, I had to scale back my approach, and add files to the build one at a time.

Problem 2: Inheritance

It turned out that each file in my additions had its own unique problems. The first was that File::Spec::Unix just, well, disappeared. It only worked if I completely removed the File::Spec class.

When you declare a class with a nested name such as File::Spec::Unix, you're actually adding to the outer class's package. So File::Spec::Unix is really File::Spec.WHO<Unix>. So if you declare File::Spec after File::Spec::Unix, it nukes the previous package and its symbol table. This problem was a lot of no fun to figure out, and I'm glad moritz++ and jnthn++ walked me through it.

The solution here was simple enough -- stub out File::Spec with class File::Spec { ... } before creating File::Spec::Unix. This is enough to make sure File::Spec will be able to refer to its children.

Although... the last thing I need is some yahoo doing my class File and then complaining about why they can't load File::Spec. So it was at this time I decided to change File::Spec to IO::Spec. Making a File class I can see -- if you decide to replace class IO, then you deserve what you get.

Problem 3: The language is in the process of building

The setting may feel like normal Perl code, but it's not. It's still in the process of being built. It's like a house in the process of construction. If there's only a wood frame, you can still hang a portrait on the "walls" -- but this will only get in the way when it's time to hang the drywall. Things need to come together in the correct order.

I encountered these problems in a couple of different ways. The first was in using rx// to precompile some patterns before Regex.pm was loaded. Windows-style paths really need this for readability, because they use both kinds of slashes as separators and the concept of volume is fairly complicated. I tried a lot of different ways of formatting, each of which made the build fail in new and unique ways. Then I discovered MAKE_REGEX() had loaded a bit before the IO modules. This particular problem seemed to be solved.

The next couple of problems were caused by $*OS not being in scope at build time, as terms.pm was way, way down at the end. It works just fine in method calls, but if it's needed as a class attribute, it's simply not in scope when you're building the class. I ended up replicating the same op used in terms.pm to get the kernel string, so I could have it available earlier. Early enough to figure out which subclass of IO::Spec to use for the main object.

So remember, object building happens right away, but subs and methods can carry references to things that happen later.

Problem 4: `Br`eaking `Pa`nda

Everything seemed like it was working pretty okay at this point. Until I got to the day of the masakism IRC seminar. It was at this point that I tried to install a module for the class, so I couldn't help but notice that Panda seemed to die horribly. I checked out and built the nom branch to use for the duration, but I really had no idea what was going on.

When I golfed the breakage in Panda, it came down to its "use lib" line -- and lib.pm is shockingly simple. Running use lib 'foo' in the REPL alternated between three different errors from the NQP level. Something was seriously wrong.

My only choice was to work backwards, and see what was causing the problem. I would say git bisect here, but I hadn't actually been making enough commits to effectively get at the problem. So that was the first learning experience here -- commit any time you think you have functional code.

Anyway, it took a lot of edits, and I got most of the way through a novel while waiting for Rakudo to recompile, but I eventually traced it back to the precompiled regexes that were giving me a problem earlier. At this point I was about to give up, and make long, ugly regexes. Finally jnthn++ noted that Regex.pm hadn't loaded when this was trying to run, so I should just move all of the IO modules to later in the build.

So I swapped back in the rx// syntax, and naturally, it all worked. The lesson here is that running some real software can pick up bugs (although the spectests would have shown it too). And that you really do need to make sure that dependencies come earlier. And most of all, if you're stuck, just ask in #perl6.

Spectesting

Moving the methods I developed for IO::Path::More into IO::Path went painlessly. I ended up writing an additional set of methods for IO::Spec -- .split and .join, to replace .splitpath and .catpath but with basename and dirname syntax. That allows IO::Path.basename to always have the current item in question, and all of the trailing slashes are gone.

It was at this point where I started thinking about testing. IO::Spec had literally hundreds of tests from File::Spec in Perl 5. But ironically, IO::Spec wasn't actually specced. So the question became, should IO::Spec be just a backend, or a fully specified part of Perl 6?

Implementations in Perl 6 are supposed to inform the spec, as well as the other way around. And the more I thought about it, *something* has to do the low-level string operations on paths. And there is no reason to hide it, either. Rakudo already provides access to all of its lower layers via nqp or pir ops, so it made sense to include it as a specced part of Perl 6.

So I went ahead and edited the Specification for S32::IO, adding IO::Spec and several methods for manipulating IO::Paths. Lots of text. And then even more went into writing tests for IO::Path. Naturally, these uncovered some more minor bugs, but that's what tests are for.

Patch Approved

It didn't take all that long for my pull request to get merged, especially after I started writing tests. This whole process took about three weeks. I have a few minor cleanups to do in the next couple of days, as I resolve a bug in using IO::Spec::Unix.rel2abs. Parrot just added a readlink op on my request, so IO::Path.resolve should be working soon.

And what we have to show for all this work is a Perl 6 implementation that does file path modifications on Linux, Cygwin, or Windows/DOS.

    # On Linux:
    "/foo/./bar//"\   .path.cleanup.parent;  # yields "/foo"
    # On Windows:
    "C:/foo\\.\\bar\\".path.cleanup.parent;  # yields "C:\foo"
    # On any platform:
    IO::Path::Win32.new("C:/baz").volume;    # yields "C:"

I never finished VMS, or Mac Classic, but at this point, they can just be dropped in, by adding a new IO::Spec subclass.

So there it is, at long last: sanity in file paths in Perl. I think, if I had known how it was going to go from the beginning, I would have been even more intimidated. Even so, it was just the same kind of debugging I'm used to in modules. Only without the safety rails of the parser, and with a much longer build time.

But if you can write a module and a class in Perl 6, you already have most of the skills to contribute to the setting. A compiler with internals that feel like a high-level scripting language: that, like digital watches, is a pretty neat idea.

Monday, May 6, 2013

Porting a Module to Perl 6

CPAN is a huge draw for Perl 5, with approximately umpteen zillion modules available for a wide array of purposes. It's probably the biggest draw for the Perl 5 language these days, given the newer, hipper scripting languages out there like Ruby, Python, and of course INTERCAL.

The problem is, these modules are not yet usable in Perl 6 directly. There is an ongoing project to allow Perl 5 code to run in Rakudo, but so far only the most basic code works: like basic loops, quite a few builtins, backticks, etc. It does inherit from the Perl 6 object system, which is pretty cool, so $foo->WHAT can tell you if you have a Str, Int, or IO::Handle.

So for right now, the only practical way to use Perl 5 modules is to rewrite them in Perl 6. I just finished porting the File::Spec module, one of Perl 5's core modules, to help deal with file paths on different operating systems. FROGGS++ did much of the initial work on it, but he's moved on to the P5 in P6 project mentioned above, so I picked up the slack. The end goal of the project is for me to integrate functionality like Perl 5's Path::Class into the core language, so that OS interoperability comes naturally when using the native functions.

As I got further into the port, I became convinced that porting the module is a much better choice than relying on the Perl 5 code being integrated. There are several reasons for this:

Code Cruft

There is a lot of support for operating systems that are now out of date. This isn't a bad thing. I'm sure that there's some hobbyist who will want to run Perl 6 on their OS/2 Warp system. The problem comes when you look inside the code for the OS2 module:

    $path =~ s/^([a-z]:)/\l$1/s;

This little no-op snippet from canonpath (to produce the canonically correct path) converts a lowercase drive letter to lowercase. It's not harmful, but it does illustrate the fact that no one has edited this code in 9 years.

This isn't the fault of the Perl 5 Porters -- they have plenty of better things to do than to support outdated OSes when not even bug tickets are coming in. But translating the code sure gives a great opportunity to notice these problems.

In the end, I cut the entire OS2 module and delegated to Win32.pm, because it had support for things like UNC paths (//server/share) that OS2.pm had only half-implemented. And so a huge block of code cruft bit the dust.

Readability and Maintainability

Part of the reason these issues happen in the first place is that it's harder to see what's going on in a given piece of code.

An example I came across was in this helper for tmpdir, a method to return the first temporary directory that's writable in a list of parameters. In Perl 5, we get:

    sub _tmpdir {
        my $self = shift;
        my @dirlist = @_;
        my $tmpdir;

        foreach (@dirlist) {
            next unless defined && -d && -w _;
            $tmpdir = $_;
            last;
        }
        return $self->canonpath($tmpdir);
    }

That's actually good, idiomatic code for Perl 5, though it can look like spooky action at a distance if you're not aware of what's going on with $_, @_, and shift.

Equivalent code in Perl 6 looks like this:

    method !tmpdir( *@dirlist ) {
        my $tmpdir = first { .defined && .IO.w && .IO.d }, @dirlist;
        return self.canonpath($tmpdir);
    }

No messing about with parameters and keeping track of the object -- it all happens in the signature. You no longer have to read through a loop to understand the code either -- in Perl 6 you can just say that you want the first matching candidate, and first() will lazily test the list for you.

The P6 version gets to the point much faster, and it's much closer to natural language: "set $tmpdir to the first defined writable directory in @dirlist." Shorter, easier-to-read code is easier to maintain.

Changing Old Features

At some point, your code was working perfectly and passed all the tests. But then the computer world changes around you, and it no longer makes any sense. And you would like to refactor, but people rely on the old functionality.

This is exactly what happened for File::Spec's case_tolerant function. It essentially looks at the operating system alone, and uses that to determine if the filesystem is case-sensitive. Which in the old days made perfect sense when Macs used HFS+, Windows used FAT, and Unix used ufs or a variant. But my computer runs Mac OS X and Windows and has several drive partitions in different formats. Heck, the NTFS drives are case sensitive in POSIX-land, but as soon as I boot Windows they become case insensitive.

The only reasonable way to check this now is to actually check the filesystem for a specific directory, given widespread support for symlinks. This breaks the old functionality. But there's no time like a major language revision to break old APIs and replace them with shiny new ones.
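
To make "actually check the filesystem" concrete, here is a rough shell sketch of one way to probe a directory. It is not how File::Spec or IO::Spec does it; the function name is_case_tolerant and the probe file names are invented for illustration, and it assumes you are allowed to create a temporary file in the directory being tested.

```sh
#!/bin/sh
# is_case_tolerant DIR: exit 0 if the filesystem holding DIR treats
# names case-insensitively, exit 1 if it is case-sensitive, and exit 2
# if a probe file cannot be created there.
# Hypothetical helper, for illustration only.
is_case_tolerant() {
    probe="$1/CaseProbe.$$"
    upper="$1/CASEPROBE.$$"
    touch "$probe" 2>/dev/null || return 2   # create the mixed-case file
    if [ -e "$upper" ]; then                 # visible under another casing?
        rm -f "$probe"
        return 0
    fi
    rm -f "$probe"
    return 1
}
```

Something like `is_case_tolerant /tmp && echo "case-insensitive"` would use it. Because the answer comes from the directory itself rather than from the OS name, two directories on the same machine can legitimately give different results.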

However, there are a couple of major downsides to porting:

This is really time-consuming

Sure, you don't have to implement the algorithm from scratch, and you have plenty of tests to help your development. It would be possible to just translate the existing code, because things aren't that different. Change an if( $foo ) to if $foo, etc.

However, a major reason for doing the porting is to use the Perl 6 idioms instead, especially in function declarations and regular expressions where it makes a major difference in code readability.

Dependencies aren't available

Sometimes your code relies on separate modules not available, or on not yet implemented functions. Your choice becomes to either implement the functionality yourself and embark on yet another yak-shaving expedition, or mark it as todo and wait for the appropriate functionality to arrive.

This has become a much smaller problem as of late as the core language matures. But "done enough" is not really "done".

Now that I've written this, I've realized that my own project is a microcosm of the Perl 6 saga. Making a better codebase takes a lot of time, but it ultimately seems worth the effort.

Of course, once I had gotten this far, I realized that File::Spec -- or something very much like it -- would be needed to implement IO::Path objects for non-unixlike OSes. So stay tuned for the next part in this saga: How to add File::Spec to Rakudo.

Update: It ended up turning into two posts: One was a simple guide on How to Start Hacking Rakudo Perl 6, and the other covered my follies in trying to add to the compiler for the first time. But the short story is that IO::Path is now added to Perl 6 and implemented in Rakudo -- this means that both File::Spec and Path::Class' behavior are now available in the core language without adding modules.