Wednesday, May 8, 2013

Implementing IO::Path in Rakudo

I started off implementing the File::Spec module for Perl 6, as explained in the last blog post, but what I really wanted to do was to get some sanity in working with paths through IO::Path objects.  And if I was going to do this, I needed to actually edit the core modules of Rakudo.

Starting with a module which does stringy operations on directory paths, I set out with the goal of making some sort of easy-to-use, path manipulation class in the core.  Something like how Path::Class works in Perl 5.  Then I looked at S32::IO, and realized that IO::Path was exactly what I was seeking.  But it was only partially implemented, and only for POSIX.

So I'm going to walk through the steps I took to integrate multiple-OS path support into Rakudo, in the hopes that it will help other people to avoid my mistakes.  Which were fairly numerous. :/

This was my first foray into hacking a compiler, and I must confess it was fairly intimidating.  I'm no script kiddie, but I hadn't worked on any large open-source projects before.  However, the setting is made of Perl 6 code, so it was more or less a matter of integrating what FROGGS and I had already written.

Baby's first steps

I had played around with using File::Spec as a backend to an IO::Path in the IO::Path::More module, so it became clear that this was the best way forward for Rakudo. 

I realized right away that IO::Path's interface would have to change to include systems with a concept of volume.  I made a small edit to add a $.volume attribute, changed a few lines of code in sub dir, and compiled.  Everything worked.  I sent a pull request into Rakudo, just to get the interface-changing out of the way first.  It tested okay, and was accepted.  Wow, I'm good at this!

Naturally, it all got worse from there.

Problem 1: Biting off more than I can chew

The next step was to add the File::Spec modules into the core.  So I just started by copying over the .pm files into the core directory.  Unlike in normal Perl code, the modules aren't included with use Module;.  Instead, I edited the build's list of source files to add the modules in the correct order.

Since the File::Spec object needs to inherit from the subclasses, and the other subclasses inherit from File::Spec::Unix, I went with this order:  File::Spec::Unix, File::Spec::Win32, File::Spec::Cygwin, File::Spec.  Now, some of you might already see a problem with that.  To them, I say: "Shh!  No spoilers!"

I realized that I couldn't use the file-scoped class definition (class Foo;) if it was going to end up all in one file, so I switched those out for curly braces.  Then I rebuilt the makefile and compiled rakudo.

That generated a whole mess of errors.  And not nice errors like before, either -- whatever produces those hadn't even loaded yet!  This was a bunch of nqp recursion errors.  I tried scaling back a little at a time, even commenting out the entire inside of classes, but I still had issues building Rakudo.  Eventually, I had to scale back my approach, and add files to the build one at a time.

Problem 2: Inheritance

It turned out that each file in my additions had its own unique problems.  The first was that File::Spec::Unix just, well, disappeared.  Unless I completely removed the File::Spec class, and then it worked.

When you declare a subclass, you're actually adding to the main class' package.  So File::Spec::Unix is really File::Spec.WHO<Unix>.  So if you initialize File::Spec after File::Spec::Unix, it nukes the previous package and its symbol table.  This problem was a lot of no fun to figure out, and I'm glad moritz++ and jnthn++ walked me through it.

The solution here was simple enough -- stub out File::Spec with class File::Spec { ... } before creating File::Spec::Unix.  This is enough to make sure File::Spec will be able to refer to its children.
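
A minimal sketch of the stub trick, written against a current Rakudo rather than the 2013 setting, so treat it as an illustration and not the code from the patch.  The curdir method and its delegation are assumptions added only to give the classes something to do.

```perl6
# Stub the outer class first, so its package exists before any children.
class File::Spec { ... }

# The child is installed into the stubbed class's package (File::Spec.WHO).
class File::Spec::Unix {
    method curdir { '.' }
}

# Complete the stub; the package and the child inside it are kept.
class File::Spec {
    method curdir { File::Spec::Unix.curdir }
}

say File::Spec.WHO<Unix>.curdir;   # the child, reached through the parent's package
say File::Spec.curdir;             # the parent, delegating to the child
```

Both calls should reach the same Unix class, which is exactly what went missing when File::Spec was declared afterwards without the stub.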

Although... the last thing I need is some yahoo doing my class File and then complaining about why they can't load File::Spec.  So it was at this time I decided to change File::Spec to IO::Spec.  Making a File class I can see -- if you decide to replace class IO, then you deserve what you get.

Problem 3: The language is in the process of building

The setting may feel like normal Perl code, but it's not.  It's still in the process of being built.  It's like a house in the process of construction.  If there's only a wood frame, you can still hang a portrait on the "walls" -- but this will only get in the way when it's time to hang the drywall.  Things need to come together in the correct order.

I encountered these problems in a couple of different ways.  The first was in using rx// to precompile some patterns before the code they depend on was loaded.  Windows-style paths really need this for readability, because they use both kinds of slashes as separators and the concept of volume is fairly complicated.  I tried a lot of different ways of formatting, each of which made the build fail in new and unique ways.  Then I discovered MAKE_REGEX() had loaded a bit before the IO modules.  This particular problem seemed to be solved.
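
To show why precompiled patterns help readability here, this is a rough sketch of the idea in ordinary Perl 6 code on a current Rakudo.  The pattern names and the volume-of helper are hypothetical, and the patterns are much simpler than what a real Win32 spec needs.

```perl6
# Hypothetical pattern names; not the patterns from the actual patch.
my $slash       = rx/ <[ \/ \\ ]> /;            # either kind of separator
my $driveletter = rx/ <[ A..Z a..z ]> ':' /;    # e.g. C:
my $unc         = rx/ <$slash> ** 2 <-[ \/ \\ ]>+ <$slash> <-[ \/ \\ ]>+ /;   # e.g. \\server\share
my $volume      = rx/ <$driveletter> || <$unc> /;

# Return the volume at the start of a path, or '' if there is none.
sub volume-of(Str $path) {
    my $m = $path ~~ / ^ <$volume> /;
    $m ?? ~$m !! ''
}

say volume-of('C:/foo\\bar');
say volume-of('\\\\server\\share\\dir');
say volume-of('relative\\path');
```

Building the volume pattern out of small named pieces keeps each one short enough to read, which is the whole point of wanting rx// that early.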

The next couple of problems were caused by $*OS not being in scope at build time, as the file that defines it was way, way down at the end of the build order.  It works just fine in method calls, but if it's needed as a class attribute, it's simply not in scope when you're building the class.  I ended up replicating the same op used there to get the kernel string, so I could have it available earlier -- early enough to figure out which subclass of IO::Spec to use for the main object.

So remember, object building happens right away, but subs and methods can carry references to things that happen later.
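
Here is a small sketch of that timing difference in plain Perl 6 on a current Rakudo.  The $*LATE variable and the Demo class are made-up stand-ins for $*OS and the IO classes; nothing here is taken from the setting itself.

```perl6
my @events;

class Demo {
    # Class-body statements run as soon as execution reaches the class.
    @events.push: 'class body sees ' ~ ($*LATE // 'nothing yet');

    # A method body runs only when called, so later setup is visible to it.
    method lookup { $*LATE }
}

# Stand-in for something that is only set up late in the build.
my $*LATE = 'late value';
@events.push: 'method sees ' ~ Demo.lookup;

.say for @events;
```

The class body comes up empty-handed, while the method, called after the variable is set, finds the value.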

Problem 4: Breaking Panda

Everything seemed like it was working pretty okay at this point.  Until I got to the day of the masakism IRC seminar.  It was at this point that I tried to install a module for the class, so I couldn't help but notice that Panda seemed to die horribly.  I checked out and built the nom branch to use for the duration, but I really had no idea what was going on.

When I golfed the breakage in Panda, it came down to its "use lib" line -- and the golfed case is shockingly simple.  Running use lib 'foo' in the REPL alternated between three different errors from the NQP level.  Something was seriously wrong.

My only choice was to work backwards, and see what was causing the problem.  I would say git bisect here, but I hadn't actually been making enough commits to effectively get at the problem.  So that was the first learning experience here -- commit any time you think you have functional code.

Anyway, it took a lot of edits, and I got most of the way through a novel while waiting for Rakudo to recompile, but I eventually traced it back to the precompiled regexes that were giving me a problem earlier.  At this point I was about to give up, and make long, ugly regexes.  Finally jnthn++ noted that the code the regexes rely on hadn't loaded when this was trying to run, so I should just move all of the IO modules to later in the build.

So I swapped back in the rx// syntax, and naturally, it all worked.  The lesson here is that running some real software can pick up bugs (although the spectests would have shown it too).  And that you really do need to make sure that dependencies come earlier.  And most of all, if you're stuck, just ask in #perl6.


Moving the methods I developed for IO::Path::More over to IO::Path went painlessly.  I ended up writing an additional set of methods for IO::Spec -- .split and .join, to replace .splitpath and .catpath but with basename and dirname syntax.  That allows IO::Path.basename to always have the current item in question, and all of the trailing slashes are gone.
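
To make the basename and dirname idea concrete, here is a POSIX-only sketch.  The split-path and join-path subs are hypothetical helpers written for this post, not the IO::Spec methods themselves, and they ignore volumes and edge cases such as the bare root directory.

```perl6
# Hypothetical POSIX-only helpers; a sketch of the idea, not IO::Spec itself.
sub split-path(Str $path is copy) {
    # Drop trailing slashes, but keep a lone "/" intact.
    $path = $path.chop while $path.chars > 1 && $path.ends-with('/');

    my $cut = $path.rindex('/');
    return %( dirname => '.', basename => $path )           without $cut;
    return %( dirname => '/', basename => $path.substr(1) ) if $cut == 0;
    %( dirname => $path.substr(0, $cut), basename => $path.substr($cut + 1) )
}

# Put the two pieces back together without doubling the root slash.
sub join-path(Str $dirname, Str $basename) {
    $dirname eq '/' ?? "/$basename" !! "$dirname/$basename"
}

my %parts = split-path('/foo/bar/');
say %parts<dirname>;
say %parts<basename>;
say join-path(%parts<dirname>, %parts<basename>);
```

With the trailing slashes stripped before splitting, the basename is always the last real item in the path instead of an empty string.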

It was at this point where I started thinking about testing.  IO::Spec had literally hundreds of tests from File::Spec in Perl 5.  But ironically, IO::Spec wasn't actually specced.  So the question became, should IO::Spec be just a backend, or a fully specified part of Perl 6?

Implementations in Perl 6 are supposed to inform the spec, as well as the other way around.  And the more I thought about it, *something* has to do the low-level string operations on paths.  And there is no reason to hide it, either.  Rakudo already provides access to all of its lower layers via nqp or pir ops, so it made sense to include it as a specced part of Perl 6.

So I went ahead and edited the Specification for S32::IO, adding IO::Spec and several methods for manipulating IO::Paths.  Lots of text.  And then even more went into writing tests for IO::Path.  Naturally, these uncovered some more minor bugs, but that's what tests are for.

Patch Approved

It didn't take all that long for my pull request to get merged, especially after I started writing tests.  This whole process took about three weeks.  I still have some minor cleanup to do in the next couple of days, as I resolve a bug in using IO::Spec::Unix.rel2abs.  Parrot just added a readlink op on my request, so IO::Path.resolve should be working soon.

And what we have to show for all this work is a Perl 6 implementation that does file path modifications on Linux, Cygwin, or Windows/DOS.
    On Linux:
    "/foo/./bar//"\   .path.cleanup.parent;  #yields "/foo"
    On Windows:
    "C:/foo\\.\\bar\\".path.cleanup.parent;  #yields "C:\foo"
    On any platform:
    "C:/baz").volume;    #yields "C:"

I never finished VMS, or Mac Classic, but at this point, they can just be dropped in, by adding a new IO::Spec subclass.

So there it is, at long last: sanity in file paths in Perl.  I think, if I had known how it was going to go from the beginning, I would have been even more intimidated.  Even so, it was just the same kind of debugging I'm used to in modules.  Only without the safety rails of the parser, and with a much longer build time.

But if you can write a module and a class in Perl 6, you already have most of the skills to contribute to the setting.  A compiler with internals that feel like a high-level scripting language:  that, like digital watches, is a pretty neat idea.
