Sunday, July 15, 2012

Debugging with a dishwasher

Debugging, like washing the dishes or doing the laundry, is one of those things that programmers hate doing but just have to do anyway [1]. Fortunately, just as with household chores, applying the right tool --- gdb, washing machine, dishwasher --- goes a long way toward ameliorating our pain. In our daily lives there are (sadly) myriad things to clean, each with its own washing tool, yet when it comes to debugging the myriad programming languages and paradigms, we seem to think the same strategy of state inspection at user-specified execution breakpoints applies equally to every one of them. Not good.

For instance, suppose a complex nested SQL query fails to produce a tuple it should have and we want to know why. In this case it is not very helpful to step through the source lines of the query, or even its sub-queries. We already know the sequence of execution: each operation (join, select, filter, sort, etc.) is going to be executed exactly once, presumably in bottom-up order. What we want to know is where the tuple got lost. Can we inspect the intermediate states? It depends; if the queries are large and complex enough, that would not be too easy either.

Another example is Haskell, the quintessential functional language. Functional languages often eschew the loops and branches of imperative languages in favor of recursion and pattern matching. Since, unlike loops and branches, related patterns do not need to be defined in the same syntactic locale, tracing through the execution of a Haskell program will send the programmer jumping around in the source file, with little hope of knowing what to expect next. This is also true for Prolog debuggers that signal when a rule fires. Additionally, Haskell possesses an incredibly powerful and complex type system whose type declarations almost constitute a Turing-complete sub-language of their own. One of the biggest frustrations novices face is the cryptic error messages Haskell emits whenever they get one of these type declarations wrong, a problem that breakpoint-based debuggers can do little to help with. Some of these issues may explain why, despite the availability of more traditional debuggers like Hood, Haskell programmers still find a testing tool like QuickCheck to be their weapon of choice.

Perhaps even more poignant examples come from DSLs. How would we run a debugger on an Ant script or a C++ makefile? What about an Antlr grammar that contains an ambiguity? Or a Django template that renders the wrong page? Skeptics will now say we should stop coddling programmers with debuggers and tell them to read their code carefully. Right. With a large enough code base and without the right tool support, we find ourselves back to scrubbing the bathroom floor with a toothbrush.

So are we using the dishwasher to do the laundry?


[1] That said, some of the most brilliant hackers, who seemingly never need to debug their code, also seemingly never do any of these other things. Which is why we give them their own smell-proof cubicles.

Monday, May 7, 2012

Installing wxRuby for Ubuntu

WxRuby is required for the Piping and Instrumentation Diagram (Piping) example in Enso, but it is no mean feat to install this Ruby gem under Ubuntu. The blame can probably be split between a project that was last updated in 2009 and a distro that is notorious for using unusual packages. This post is a record of what I did, primarily as a personal memory aid. If you found this document useful, or if you know how to improve it, leave a comment.

1. Installing the pre-compiled gem does not work
The latest pre-compiled gem I found on the wxruby website is version 2.0.1, dated 2009. The wx libraries appear to have changed since then (I don't know which version the gem was compiled against), so attempting to use it results in a symbol error:
LoadError: /var/lib/gems/1.9.1/gems/wxruby-ruby19-2.0.1-x86-linux/lib/wxruby2.so: symbol _ZN16wxStyledTextCtrl7SendMsgEill, version WXU_2.8 not defined in file libwx_gtk2u_stc-2.8.so.0 with link time reference - /var/lib/gems/1.9.1/gems/wxruby-ruby19-2.0.1-x86-linux/lib/wxruby2.so
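For the record, the failure shows up as soon as you try to load the gem (a sketch; the gem file name is the one that appears in the error path above):

sudo gem install wxruby-ruby19-2.0.1-x86-linux.gem
irb
irb(main):001:0> require 'wx'     # raises the LoadError above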
2. Required dependencies
To compile wxRuby, you will need:
  • WxRuby sources from their website (link)
  • Wx Gtk libraries (libwxgtk2.8-dev)
  • SWIG, a C/C++ wrapper generator (link; specifically version 1.3.38)
  • Development versions for:
    • GLib (libglib2.0-dev)
    • Pango (libpango1.0-dev)
    • Gtk2 and Gtk3 dev libraries (libgtk2.0-dev, libgtk-3-dev)
If you are installing wxRuby for Enso, you will need the Ruby 1.9 version.
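On Ubuntu, the library dependencies can usually be pulled in with apt (a sketch; the package names are simply the ones listed above):

sudo apt-get install libwxgtk2.8-dev libglib2.0-dev libpango1.0-dev libgtk2.0-dev libgtk-3-dev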

Note also that you need a fairly specific version of SWIG, which I had to compile from source. Make sure the resulting swig binary is on your PATH.
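Building SWIG from source follows the usual autotools routine (a sketch, assuming you have unpacked the swig-1.3.38 tarball; the install prefix here is just an example):

cd swig-1.3.38
./configure --prefix=$HOME/opt/swig-1.3.38
make
make install
export PATH=$HOME/opt/swig-1.3.38/bin:$PATH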

3. Environment variables
We want to ignore all references to the OpenGL library, which are not available in the Unicode build:

export WXRUBY_EXCLUDED=GLCanvas

WxRuby requires you to provide a version number, but as usual things are never easy:

~/Desktop/wxruby-2.0.1$ rake gem
rake aborted!
Cannot build a package without a version being specified
Create a version by running rake with WXRUBY_VERSION=x.x.x

~/Desktop/wxruby-2.0.1$ export WXRUBY_VERSION=2.0.1
~/Desktop/wxruby-2.0.1$ rake gem
rake aborted!
can't modify frozen String



Thankfully, Ruby is a programmer-friendly scripting language. A quick trace locates the lines in Ruby's rubygems source that you can change to make things work:

/usr/lib/ruby/1.9.1/rubygems/version.rb:
190,191c190,191 
<     @version = version.to_s 
<     @version.strip!
--- 
>     @version = version.to_s.strip
>     #@version.strip!

These were the lines I changed, but this is by no means authoritative, and I immediately changed them back once I got wxRuby up and running. Caveat emptor.

4. Compiling and installing WxRuby
You may now compile wxRuby by running:

rake gem
sudo gem install <the .gem file that rake just produced>


Make sure to uninstall any other version of wxruby, including the Ruby 1.8 version, before you install the new gem.
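To see what is there and clear out old copies (a sketch; wxruby and wxruby-ruby19 are the gem names the project used for Ruby 1.8 and 1.9 respectively):

gem list | grep -i wxruby
sudo gem uninstall wxruby
sudo gem uninstall wxruby-ruby19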

Test the installation using irb:

~/Desktop/wxruby-2.0.1$ irb
irb(main):001:0> require 'wx'
=> true


Alternatively, you may test this directly with Enso:

ruby applications/Piping/code/visualize.rb



Friday, March 30, 2012

LWC-2 + CG 2012

I've just attended Code Generation 2012 (#cg2012), and it was somewhat different from what I expected. For a start, I have only ever discussed MDE and DSL ideas with academics, and I had no idea what people who actually make a living building these things think. Turns out that they think quite differently from me. Which is good, because this means I get to learn more things. These were incredibly smart folks with possibly more combined experience building MDE tools than the rest of the world put together, so it was good just talking to them. Though I was a bit too shy, since I'm a newcomer.

These are things I was specifically watching out for:

1. Tool support (for using languages).
Specifically: 1) debuggers, 2) version control, 3) refactoring
Technically hard problems. Debuggers I will blog about next time, with the appropriate passion; suffice it to say I have no intention of running gdb on generated code, thankyouverymuch. But version control I will expound on in more detail here.

First of all, delta-ing models is hard. No one has a clue how to do it properly, a long-held sentiment of mine that was echoed during this conference. Of the tool leaders, MPS and Essential have no provision for version control of any kind that I know of. MetaEdit and OOmega avoid the problem by assuming all users are connected to each other through a real-time server (their tools require it). From my own experience, this seemed a bit unrealistic. More importantly, however, it still did not solve the versioning problem.

Versioning is not just about enabling parallel edits, it is also about maintaining revisions. Unless the svn-equivalent for MetaEdit models is going to store a complete copy of the graph for each version (which might require a data center or two), I don't see how they can do this without some way to delta models. Not only that, deltas are needed to do patching, eg for continuous integration, and refactoring. Refactoring is the other big tool that is missing, but I admit I don't really know what I am looking for in a DSL refactoring tool.

These are technically hard problems, and most tools provided great working compromises. But I think there are still some open problems there.

2. Co-evolution
The one person I spoke to on this (I can't remember who!) basically did not care. In fact he did not care with such gusto that I was too embarrassed to ask anyone else. After all, if you do change your DSL/metamodel you should jolly well be expected to update your entire legacy code base written in that DSL. Otherwise you'd better make certain your DSL is backward compatible. Isn't that what they did for C? In fact even C++ (a new language) is compatible with C. If you had to break the DSL because of a requirements update and your DSL code base is 500 kloc, well, tough luck.

3. Language reuse
At the lowest level, this means composition: how to build one language on top of another. At a higher level, this should include cross-cutting composition and parametricity. There are two parts to composition: composing the model and composing its semantics. I will be bold and daring and claim here that syntax/parsing/graphical editing falls under "semantics". I will leave the details for another blog post, but the general feeling I got from folks was like this:

a) people recognize this is important, but no one seems to be doing anything about it or to know what to do about it

b) they obsess over composing the concrete syntax, ie grammars, graphics, projectional editing, etc. Ironically, I don't really care about composing syntax. I am perfectly happy to edit two models in separate files or views so long as they behave correctly when they run. If this sounds incredibly impractical, it probably is.

c) semantic composition, which to me is the real challenge, no one seems to worry about. Code generators are not composable. Templates are probably impossible to compose. DSL-type code generation (xtend, merl) is slightly more hopeful, but the path to victory is not clear. There are more composable ways to define semantics: transformation rules, rewrite rules, attribute grammars, etc, but frankly developers don't use these things.

4. Multi-DSL environments
There was some talk about separation of concerns. To me, DSLs are all about separation of concerns. I'm not talking about the PIM/PSM stuff, but separating user interface specifications from data models from navigation by having a different DSL for each task. Aspects, if I might so abuse the term. By generating code, DSLs can implement aspects in an even more powerful and far messier way than AOP.

What I found out:

1. Data modeling
This I really did not expect. It seems that there is a fairly large group that believes "modeling" == "data modeling". From a pragmatic point of view the data modeling perspective is entirely valid. Most of the people I spoke to who were doing MDE used it to generate HTML views and SQL from a data model. This is the most common use case, although personally I am more interested in the general one.

2. Tool support
Formatting and syntax highlighting are already industry standard. So is constraint checking, which appears to be the generalized variant of type checking for MDE. Enso has none of these, which is an embarrassment, especially since everyone kept calling me out on it. I also admit that they have a point. Personally I think these are the easy problems we already know how to solve by throwing man-hours at them. Syntax highlighting a DSL isn't much different from syntax highlighting a GPL. Consistency checking is a more interesting game, especially if it is cross-model, but support levels vary between tools.

3. Learning curve
This was a learning point for me. I didn't realize how much the learning curve can affect adoption. As someone who is forced to deal with meta-levels all day long, it is difficult to see how challenging things can be to non-tool builders. Listening to potential MDE users complain about the "user-unfriendliness" of MPS pained me a little, especially since I considered it to be one of the most well-thought-out and best-built tools (yes, I really like their stuff). Maybe the real takeaway is that non-MDE folks (including research folks who don't do MDE) prefer tools that are similar to what they already have, eg language extensions for C.

---
Overall, CG2012 has certainly taught me a lot. I got more insights into the problems that real users face.