Scrapheap Challenge at OOPSLA 2005

Ivan Moore and I ran Scrapheap Challenge, a workshop in Postmodern Programming, at OOPSLA 2005 last month.

The workshop explored the usability of software components. Secretly it was about reuse, but the term "reuse" has become a dirty word - in my experience software written to be reusable is usually useless and the effort of writing reusable components is mostly wasted.

There's lots of existing software out there, and much of it is used by people who are quite able to build applications with it without starting enterprise reuse projects. Instead of "reuse", we think it's more productive to ask what makes a software component useful, and to learn how to use software components by actually writing software and reflecting on the experience. We hoped that would give us insight into how to build systems from existing software and how to write software that other people can use.

It certainly did.

We set the participants three challenges:

I'm swamped by the email I receive. To help me manage it, I want a tool that notifies me of any messages asking me a question that I haven't answered in three days or more.
Write me a tool that solves the daily Sudoku puzzle in the Daily Telegraph newspaper so that I can look really clever in front of other commuters when taking the train in to work.
Write an "integrationometer" that graphically displays how far the code in my local development workspace has diverged from that in the team's source code repository.

The participants were excellent. Between them, the pairs managed to solve all three challenges. The group had different backgrounds and used a variety of technologies, so both successful and unsuccessful attempts generated a lot of interesting comparisons.

I expected the winning solutions to use a dynamic language to compose objects from well-packaged libraries and maybe use some REST-style web services. That's not exactly what happened.

Winning pairs did use a dynamic language with an interactive interpreter that allowed them to explore the available APIs, and winning solutions did use some nicely packaged libraries. But the solutions to two out of the three challenges used Unix (or Cygwin) pipelines to compose existing components and used Python to write small filters to provide functionality not met by existing components. The other used GreaseMonkey to plug a bit of JavaScript into the client side of Google Mail.
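To give a flavour of what a "small filter" in such a pipeline can look like, here is a minimal sketch in Python. It is not code from the workshop: the function names and the rule "a question is a line ending in a question mark" are assumptions made purely for illustration.

```python
import sys


def question_lines(lines):
    """Yield, stripped of surrounding whitespace, the lines that end in '?'.

    Hypothetical heuristic: a line counts as a question if it ends with '?'.
    """
    for line in lines:
        stripped = line.strip()
        if stripped.endswith("?"):
            yield stripped


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Act as a Unix filter: read lines from stdin, write matches to stdout."""
    for line in question_lines(stdin):
        stdout.write(line + "\n")
```

Saved as a script that calls main() on its last line, a filter like this could sit between existing commands, with another program producing the text on its standard output and something like sort or wc consuming the result.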

Here's my interpretation of what seemed to work well:

Examples over Documentation: Attempts to use components were helped not by documentation but by the availability of example code that could be copied and pasted into the application and then tweaked to fit the situation. In some cases documentation was actually a hindrance. Misleading documentation found through Google led one team down a blind alley trying to use an inappropriate library that didn't actually do what was required. Another team tried to use a component that had huge amounts of well-written documentation that was entirely useless and so slowed them down because they had to wade through a lot of text before finding out that they were wasting time. A team using components that had been developed test-first found the tests to be useful both as documentation and as a good source of example code.

Source Code over Binary Components: The ability to look inside a component to understand what it did was a big help. No matter how well documented components were, participants turned to the source when it came to the crunch.

Loosely Structured Data over Highly Structured Data: Two out of the three winning solutions used Unix pipelines and Python's basic I/O and string processing APIs to slice and dice data. Attempts to use highly structured data, such as XML, did not work well. In fact, one team explicitly removed markup from input data to make it easier to parse with simple string processing functions. However, one pair failed to get far with Visual Basic because they had difficulties connecting two components that each used a different binary representation of text strings. The sweet spot seemed to be semi-structured text with no explicit markup and a common underlying encoding. I think there are two reasons for this. Firstly, if the structure of data that is required by consumers is not exactly the same as that provided by the producer, the structure just hinders extracting the required information. Secondly, when working iteratively it's a big help if you can look at data that you have to process and visually see the structure. Markup usually obfuscates the structure so that you have to render the document before you can see it.
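As an illustration of slicing semi-structured text with nothing more than basic string functions, here is a small sketch. The "Name: value" layout, the sample text and the function name are all made up for the example and are not taken from any of the workshop solutions.

```python
def parse_fields(text):
    """Turn 'Name: value' lines into a dict, skipping lines without a colon."""
    fields = {}
    for line in text.splitlines():
        # partition splits on the first colon only, so values may contain colons
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip().lower()] = value.strip()
    return fields


# Hypothetical sample data: the structure is visible just by looking at it.
sample = """\
From: alice@example.com
Subject: Can you send the report?
Date: 2005-11-02
(no colon on this line)
"""

fields = parse_fields(sample)
print(fields["subject"])  # prints: Can you send the report?
```

Because the input is plain text, the same data can be eyeballed in a terminal, piped through other tools, or pasted into the interactive interpreter while the parsing code is still being worked out.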

Dynamic Typing over Static Typing: Pairs using a statically typed language got bogged down solving type compatibility problems, getting the right versions of class libraries installed and so forth. Pairs using dynamically typed languages were able to experiment with their partially implemented programs and grow their solutions bit by bit as they learned about the problem and technologies.

Focused Components over Frameworks: It was easier to combine components if they were self-contained, did one thing well and did not expect the application to be designed a specific way.

Compose Components over Modify Existing Applications: It was easier to compose focused components into a solution with a bit of scripting glue than it was to take an existing application that seemed to do most of what was required and then try to change it to fit. Modifying an existing application required understanding how the entire application worked, but writing some glue code that coordinated components was simpler because each component was simple and programmers didn't have to understand how all the components worked.

Rich Component Library over Programming Tools: Solutions using Unix pipelines beat solutions using Java or Visual Basic because Unix (or Cygwin) already comes with a huge set of existing components. It didn't matter that the Unix programmers had to use Vi and so didn't have all the assistance that Java IDEs provide.

Actual Capabilities over Intended Use: The successful solutions involved quite a bit of lateral thinking. Components were used for what they could do, not for what the author intended them to be used for. For example, one solution used a terminal-mode web browser not for interactive browsing but to strip HTML tags from a document to make it easier to extract required data from the text.

Simplify the Problem over Functional Areas: A successful approach was to use components to simplify the problem to the point that it could be addressed with a bit of custom code. A textbook approach to design would be to divide the program into modules that perform easily identifiable tasks. For example, a program to solve a Sudoku puzzle on a web page might be designed as four modules that individually download the HTML document, parse the puzzle from the HTML, solve a Sudoku puzzle and display the solution. The successful solution to that puzzle instead used the terminal-mode web browser to both download the HTML and render the HTML into raw text. The web browser component straddled two functional areas and made it easy to extract the puzzle from the web page with simple string manipulation functions.
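To show how little custom code is left once the page has been reduced to raw text, here is a sketch of the extraction step. It is not the participants' code. It assumes, hypothetically, that the browser renders each puzzle row on one line as nine whitespace-separated cells, with "." standing for a blank; the real page layout may well have differed. Solving the puzzle is left out.

```python
VALID_CELLS = set("123456789.")


def extract_grid(text):
    """Collect the lines that look like Sudoku rows from rendered page text.

    Assumed format: nine whitespace-separated tokens per row, each a digit
    1-9 or '.' for a blank. Blanks are returned as 0.
    """
    grid = []
    for line in text.splitlines():
        cells = line.split()
        if len(cells) == 9 and all(cell in VALID_CELLS for cell in cells):
            grid.append([0 if cell == "." else int(cell) for cell in cells])
    return grid


# Hypothetical text as a terminal-mode browser might dump it.
rendered = """\
Daily Sudoku

  5 3 . . 7 . . . .
  6 . . 1 9 5 . . .
  . 9 8 . . . . 6 .
  8 . . . 6 . . . 3
  4 . . 8 . 3 . . 1
  7 . . . 2 . . . 6
  . 6 . . . . 2 8 .
  . . . 4 1 9 . . 5
  . . . . 8 . . 7 9

Back to puzzles
"""

grid = extract_grid(rendered)
print(len(grid), grid[0])  # prints: 9 [5, 3, 0, 0, 7, 0, 0, 0, 0]
```

The headings and navigation text around the puzzle fall away on their own because they do not look like rows, so no knowledge of the page's HTML structure is needed.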

Obviously this is not an exhaustive list. It is also biased by the choice of problems, the mix of participants and the small size of the workshop. I'm sure that an experienced Java programmer who had a lot of class libraries installed on their machine could have solved the problems in the time given. We're going to run the workshop a few more times to better understand the results.

Mistaeks I Hav Made
