Technopinions: 2006

Saturday, December 16, 2006

How I Learned to Stop PrintFing and Love Testing

Output Woes
Back when I was a young buck cutting my teeth on silicon and spare cycles, I thought testing consisted of clever, or at least abundant, use of printing to console. Such debugging has the advantage only of being easy to implement. Test maintenance, relevance, repeatability, automatic checking: all these things were rumors, mythical weapons akin to Excalibur or a Vorpal Blade. Since then, I have learned of the simplicity of breakpoints, the grace of call stacks, the depth of profilers, and, more than anything, the breathtaking wonder of Automated Tests. Lo, and suddenly the mythical became the real, and gruesome beasts were slain.

Back to my young buck days. Many times I would write some code, test and debug it, and everything seemed beautiful. I would test inline, printing variables, conditions, paths, and I was good. My tests would even self-proclaim their results: "Testing megasort: SUCCESS!" or "Testman saith: ABJECT FAILURE!" Certain of my overwhelming tests, I would blot out the test code so as to keep my final source pristine and move on. Later I might notice a small change that needed to be made. Then, the day before (or worse, the day after) a deadline, I noticed that everything was broken. The small change had, of course, affected something I hadn't thought about, something I had had a test for, but had stopped running and removed from the system.

"No problem" I said in my sophomoric exuberance. The next time, I didn't blot the code out. I surrounded it with the ever imaginative "if(DEBUG)" statement. Define DEBUG once for the whole system, and now I could test everything. Of course, I still had to inspect everything, and the tests remained in my code...

Automated Vigor
Thousands of iterations later I might have come up with Automated Tests. I didn't, of course; people much wiser than me forged this weapon, but I gained the benefit.

An Automated Test is a test that's run (gasp) automatically, usually at build time. Automated Test frameworks abstract tests away from the code, so we can keep our code sharp and unblemished from the smelting fires of test code. The tests report to the frameworks, rather than to standard out or an error file, and the framework, not the human, does the checking. The framework only tells us when something goes wrong. Since the tests are always run, we get notified when any tested code breaks, no matter the age. They are a powerful tool of our craft.
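To make the idea concrete, here is a minimal sketch of what a framework does for us. It is not JUnit or any real framework; the class `TinyTestRunner` and its `register`, `runAll`, and `check` methods are hypothetical names made up for illustration. The point is only that the tests report to the runner, the runner does the checking, and we hear nothing unless something breaks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a real framework: the runner, not a human, checks results.
class TinyTestRunner {
    private final Map<String, Runnable> tests = new LinkedHashMap<>();

    void register(String name, Runnable test) {
        tests.put(name, test);
    }

    // Runs every registered test and returns descriptions of only the failures.
    List<String> runAll() {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, Runnable> entry : tests.entrySet()) {
            try {
                entry.getValue().run();
            } catch (RuntimeException | AssertionError problem) {
                failures.add(entry.getKey() + ": " + problem.getMessage());
            }
        }
        return failures;
    }

    // Tests call this instead of printing values for a human to inspect.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        TinyTestRunner runner = new TinyTestRunner();
        runner.register("maxPicksLarger", () -> check(Math.max(2, 3) == 3, "max(2, 3) should be 3"));
        runner.register("absOfNegative", () -> check(Math.abs(-4) == 4, "abs(-4) should be 4"));

        List<String> failures = runner.runAll();
        // Silence on success: we only hear about it when something goes wrong.
        failures.forEach(System.out::println);
        if (!failures.isEmpty()) {
            System.exit(1); // a failing test fails the whole run
        }
    }
}
```

A real framework adds discovery, reporting, and IDE integration on top, but the division of labor is the same: no "Testman saith" lines to read by eye.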

I began using Automated Tests when I took over the build process for a medium-sized Java program. Whenever the build broke, I needed to show the developer exactly how it broke. This is when I discovered how often the same bug would pop up over and over. I would show a developer his error, and the next week, it would reappear. After I introduced Automated Tests (in the form of JUnit) into the code, my life started easing up. I no longer had to recreate a test over and over for the same bug - I could simply point to the test. Then we integrated Automated Tests into our IDE, and things really started looking up. Slowly the development team started running the tests before checking in code, and gradually our productivity increased, our code was cleaner, our bugs fewer.

Automated tests are really just another extension of a Refactoring mindset: if something is worth coding multiple times, it's only worth coding once. If we're going to do it several times, we're better off just doing it once and calling it several times.

Of course, there are dark sides to anything. Automated Tests are code, and code must be maintained. It's possible to spend all our time writing tests that are repeated, or unnecessary. Typically Automated Tests are not deliverable, which might throw off statistics for line-accountants. Occasionally an ill-conceived test can send developers scampering off after ghost-bugs. However, treated with the respect they deserve, Automated Tests can help us slaughter bugs and guard against our tendency to resurrect them.

Wednesday, October 04, 2006

Building Ideals

One of the failing points of the classic Waterfall Software Development Process is the separation of implementation and testing. Many processes since have tried to overcome the obvious limitations imposed: the easiest of these by far is Frequent Builds, and the most useful of these is Automated Tests. Bold claims!

Frequent Builds

We know building our code ensures that everything still compiles; it's like saying the sun makes light, or traffic today will crawl. But as our development effort grows, this seemingly obvious check becomes more important. It pays, then, to make sure the check is happening often, and more, to make sure it's happening right. Luckily, or providentially, we have more and more sophisticated auto-build tools, such as Ant, NAnt, Visual Build, and so on, which take the pain of auto-building away. Combined with simple scheduling, Frequent Builds are almost trivial. Build once a week, once a day, once an hour. Take your pick.

Automated Tests

Automated Tests give us a whole new level of power. Every time the build runs, run your tests; then every test you write is one case you never have to worry about again. We won't talk now about Zen and the art of writing automated tests; we'll pretend we all already know when and how to do that rigorously. When we talk about building, we should be including our automated tests as well. If a build doesn't pass its tests, it didn't build correctly.

What does it all mean?

But, even assuming everything we've said to this point is perfectly accurate and true, there are still simple questions left unanswered. What's the best way to build? How often should we build? What does a build consist of, anyway?

First thing's last!

Builds should include everything needed for the program to run. This includes source code (of course), libraries, image files, and anything else needed while the program runs. This does not include extra documentation, readme files, or pictures of Chichi, the dancing skeleton.

Frequency of builds is a touchy subject. I have been ostracized by bug-eyed zealots for suggesting that builds could happen more than once a day. I have bemoaned weekly progress emails. I haven't heard of wars being fought over the subject, but I'm sure minor skirmishes have. When it comes down to it, you want your source tree to always successfully build; if processor time was nothing, we'd build whenever the source tree was updated (commit, checkin, whatever). Of course, we can't do that... yet... so we need to content ourselves with something somewhat less ambitious. For active development projects, building twice a day is probably sufficient. For any project under even mild development, builds should happen at least once a day; there's no reason for less.

Now, for the Great Act of Building, itself. What should this ritual consist of?

Start Fresh

We want this build, and all builds, to be a perfect representation of the products being built. We don't want old, mildewed code to spoil our pristine masterpiece. Therefore, we always need to ensure we're building from the latest, greatest stuff; make sure every old thing is cleaned up - everything compiled previously and everything previously retrieved from source control.

Trust and Accountability

Removing all previously built components is well understood. Mixed code components tend to make for weird errors. But remove everything previously retrieved from source control? It really depends on how much we trust our source control mechanisms. If we knew that SourceSafe would always correctly synchronize our local build, we wouldn't worry about this. But too many things can go wrong: software bugs in source control, network failures, power failures. We need to trust that when we start building, everything we have is brand new, and nothing slipped through the wires. This way, if a build breaks in some way, we know it's not related to retrieving the source, just as we know it's not from building the source.

Avoid Conflicts

When we do get the source, we want to make sure everything we're getting is on the same footing. We don't want to start synchronizing and have someone else check in code in the middle, such that we have half of that person's updates. Ideally, of course, source control would deal with this problem, too. Again, we can't take the risk of it being wrong. Avoid these kinds of conflicts by labeling the entire source we're about to build: use a label like "Auto-Build version 34 attempt January 6, 2003". Then perform a "Get by label" from source control. Other engineers can stick more into source control without us worrying.

Ensure Validity

Of course, we still have to build the system. A failed build tells us the system is not valid. Hopefully this doesn't happen very often. But here, also, is where we run our nice set of automated tests. The more complete the test set is, the more confidence we have that the system we build is good. Here's an important step, though: if even one test fails, we need to treat it as if the build failed in the first place. The system is not valid. This may seem extreme at first; we can worry about the 20% of the time when a fail doesn't matter once we're catching the 80% of the time when it does. At that point, our tests should be catching these boundary cases for us, and we can still expect 100% of tests passing. We want our code to be Ivy League material, after all.

Be Ready to Go

We want our build process to be as complete as possible - best case is that the build not only turns the source into a runnable system, not only tests that runnable system, but finally packages the system into the distribution form. Java systems can get packaged into jars and even wrapped into the installation programs. Native code can be built into the executable form and zipped up with the necessary README file. We want to be able to grab a build from yesterday, or three weeks ago, and run it just as if it were the final release. Because it might be.

Store Results

Now that our build is done, we need to keep everything in a place we can find it again. Consider your source control for this; if we're Ready to Go with our installation files, we can check them in and let source control maintain the history. Be sure to use labels with this to avoid configuration issues. At the very least, we should keep a list of our builds available and named by date and build number. With the price of storage today, redundancy is more than worth the cost tradeoff.

Minimal Cost Builds

Ultimately, we want our builds to cost very little in terms of time and space. If two builds occur before source changes, we want to store only one result. When a build is complete, we want to clean up our unneeded build materials and the source. Why did we clean up first thing, then, if we're cleaning up last thing, too? In case something went wrong and we never got to the clean up step. If we're going to clean up first then, why bother to clean up at the end? Ideally we want a pristine system before and after the build; we really want our build to just get going, rather than taking the time to clean up after an old build. Hopefully everyone will clean up after himself.

The Process

So, what then is the ideal build process?

Step 1) Clean up the build space (Start Fresh)

Step 2) Update the version number (Avoid Conflicts)

Step 3) Label the build components (Avoid Conflicts)

Step 4) Do a fresh full "Get" from source control (Start Fresh, Trust and Accountability)

Step 5) Build (Ensure Validity)

Step 6) Test (Ensure Validity)

Step 7) Package (Be Ready to Go, Store Results)

Step 8) Clean up the build space (Minimal Cost Builds)
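The steps above can be sketched as a small pipeline. This is only an illustration under assumptions: `BuildPipeline` and its `step` and `run` methods are hypothetical, and each step is a placeholder callback that returns true on success, standing in for whatever Ant target or script does the real work. The sketch shows two rules from the discussion: a failed test stops the build exactly like a failed compile, and the final cleanup is attempted no matter how the build ends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Sketch of the eight-step process; every step here is a hypothetical placeholder.
class BuildPipeline {
    private final List<String> names = new ArrayList<>();
    private final List<BooleanSupplier> actions = new ArrayList<>();
    private final List<String> log = new ArrayList<>();

    BuildPipeline step(String name, BooleanSupplier action) {
        names.add(name);
        actions.add(action);
        return this;
    }

    // Runs steps in order, stops at the first failure, and always runs the final cleanup.
    boolean run(Runnable finalCleanup) {
        try {
            for (int i = 0; i < names.size(); i++) {
                log.add(names.get(i));
                if (!actions.get(i).getAsBoolean()) {
                    log.add("FAILED: " + names.get(i));
                    return false; // one failed step means the build is not valid
                }
            }
            return true;
        } finally {
            finalCleanup.run(); // Step 8, attempted even when an earlier step failed
            log.add("clean up the build space (end)");
        }
    }

    List<String> log() {
        return log;
    }

    public static void main(String[] args) {
        BuildPipeline pipeline = new BuildPipeline()
            .step("clean up the build space (start)", () -> true)
            .step("update the version number", () -> true)
            .step("label the build components", () -> true)
            .step("fresh full get from source control", () -> true)
            .step("build", () -> true)
            .step("test", () -> false) // pretend a single test failed
            .step("package", () -> true);

        boolean valid = pipeline.run(() -> { /* delete temporary files here */ });
        pipeline.log().forEach(System.out::println);
        System.out.println(valid ? "build is valid" : "build failed");
    }
}
```

The first cleanup stays in the list even though the last one runs in a `finally` block: a power failure or killed process can still skip the end-of-build cleanup, which is the reason for starting fresh every time.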

Friday, September 15, 2006

What is a Software Process?

Definition

In the beginning, there was nothing. When the nothing was filled with problems, someone thought of how software could help. This is where Software Process begins. Software Process is the process of taking problems and solving them with software. Problems come in, software comes out. Typically, when people talk about Software Process, they mean something like:

Software Process: A series of steps which produce software.

But

In The Literature, in business, and in practice, we who talk about Software Process mean more. We don't want "a series of steps"; such vague wording makes the lawyer and accountant in us shudder. We want very specific results from a process, things like repeatability, predictability and improvement. We want to know when our software will be done; we want to know that the software will be awesome, and we want next time to be done faster with yet more awesome software. Of course, there are many Software Processes, but despite claims, they're not all trying to deliver these same goals.

Bytes

Even so, it's no wonder that elements from any one process look similar to elements in another. Most of the time, processes cover specific tasks, like requirements gathering, software design, testing procedure, user feedback, and so on. Some processes try to hide certain steps or roll them together, but when it comes down to it, the old-fashioned "Waterfall" process had these steps down pretty well: Discover Requirements, Design, Code, and Test. Any software process is going to have to deal with these steps somehow; otherwise we're not really developing software. Let's call this stuff Software Development Process.

Beyond Bytes

Software Process includes more than that, though. Notice, our typical steps did not do anything for our extended goals. Where do we get them, then? In many ways, that depends on the process. The Personal Software Process deals with these directly. More time and attention is given to recording process step results than to defining the process steps themselves. In this process, we record everything we can: how long we take to perform every step, how many defects we found in every step, how we fixed those defects and how long that took, and how long we spent distracted by the guy who wanted to gab about TV. All this data gets recorded, compiled into a large and growing data set, and consulted whenever new work begins. Sounds menacing to me, but it buys us predictability and reliability: we can say with increasing confidence how long it will take us to complete work and how many mistakes that work will have. Some other processes try to do similar types of analysis but eliminate some of the paperwork overhead, but when it comes down to it, what PSP advocates is probably the best way to get good, hard, reliable numbers. All this has to do with measuring our performance: let's call it Software Process Tracking.

And yet, there's more! Tracking progress is never an end in itself; writing a number down always leads to a question, "How do I improve this number?" Whether we're dealing explicitly with these numbers, or implicitly within our process, improvement is a goal. We always want to deliver better software faster; the only way to do that is to improve. Usually, improvement mechanisms are codified in "Best Practices" or coping mechanisms. When processes explicitly track things, we can confirm improvement equally explicitly. Either way, let's call this Software Process Improvement.

Definition Redux

Software Process: Includes Software Development Process and may include Software Process Tracking and/or Software Process Improvement

Software Development Process: A series of steps which produce software

Software Process Tracking: A series of steps to record and analyze a Software Process which produces estimates of time, reliability, and other performance measures.

Software Process Improvement: Tasks and practices that aim to improve individual and team performance in a Software Process.

There's a lot to software processes, and not just whether they're Agile or not. In fact, none of what we've talked about deals with Agile-ness of processes; that comes later. But if we're going to develop software, and so use a Software Process, it's good to know what we're talking about and what we can get out of it.

Tuesday, August 15, 2006

Unpacking Packages

Many programming languages define ways of organizing code components into larger structures. In Java, these are packages. In C#, they are namespaces. If our application is large enough to need multiple packages, how we define those packages needs serious consideration. What are the goals of multiple packages in an application setting? What should we look for?

Goal: Conceptual Integrity

First and foremost, we want our packages to mean something, to make a Statement. The Statement should be simple and bold, and everything within the package should be geared toward that goal. Statements like "THIS is how the user interface works!" or "Interact with networks here!" are good. Statements like "talk to Databases and interact with users" are probably too complex, but we might be able to rephrase with "Let users manage databases" and be okay. Conceptual Integrity allows people looking for solutions to know if this package is right for them, and it allows everyone working on our application to know if they're developing for this package. It allows us to partition requirements into logical pieces, such that one requirement change will impact the smallest number of packages, and thus hopefully the smallest amount of code. It's probably the single most important reason to have multiple packages.

Goal: Reuse

Just like we want methods reused and classes reused, we want our packages to be reused when appropriate. We want to take that package we wrote for feature 1 and use it for feature 2. Probably this means some refactoring – the first pass may have been pretty feature specific. It's worth the effort, though:

Debugging one shared package will be much more fruitful than debugging two.

Experience and statistics tell us that repeated code will result in repeated bugs.

Testing one package is faster than testing two.

Removing a defect in one package removes it from all components that use that package.

We would love for that package to be useful in a completely different application. If we can get someone else using it, any defect they find is one we don't have to look for. Fix it, and make the package better for everyone. Don't waste time trying to write the ultimate reusable package, though. We're supposed to be writing an application, not a library.

Good Practice: Cautious Outer-App Package Dependencies

Maintenance is not just about requirements: we also have to worry about packages we depend on. Every time a package dependency changes, we need to at least run through an entire testing suite. At worst, we need to do a very big rewrite. This means we want to rely on packages that aren't going to change much. Note this doesn't necessarily apply to packages within our developing application; this means all those 3rd party packages we're trying to exploit. Make sure they're solid.

Good Practice: Avoid Dependency Cycles

Dependency cycle: package A depends on package B which depends on package C, which depends on package A. Avoid them. For one thing, as my colleague Brian points out, if we have dependency cycles, we're probably violating conceptual integrity in some of those packages, and so we're losing the biggest use for our packages.

Another reason cycles are bad: possible infinite testing. If we have control over all the packages involved, we can cheat here. Sadly, we don't always have that luxury. Here's the problem: Package A is updated, and package C needs to change in response. Now package B has to be checked and also gets updated. Suddenly we're back to package A – dependency cycle. Without exclusive control over everything involved, this could go on ad nauseam. The whole set, packages A, B, and C, might as well be collapsed into one big package while such a lock exists, and that huge package will not have a simple Statement. We've just broken Conceptual Integrity. Cry.
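Watching for cycles by eye gets old fast, so here is a rough sketch of how a check could be automated. It assumes we can somehow produce a map from each package name to the packages it depends on; how that map gets built (parsing imports, reading build files) is left out, and the class name `PackageCycles` is made up for the example. The check itself is an ordinary depth-first search that notices when it walks back onto its own path.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PackageCycles {
    // Returns true if following "depends on" edges from some package leads back to it.
    static boolean hasCycle(Map<String, List<String>> dependsOn) {
        Set<String> finished = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String pkg : dependsOn.keySet()) {
            if (visit(pkg, dependsOn, inProgress, finished)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String pkg, Map<String, List<String>> dependsOn,
                                 Set<String> inProgress, Set<String> finished) {
        if (finished.contains(pkg)) {
            return false; // already fully explored, no cycle through here
        }
        if (!inProgress.add(pkg)) {
            return true; // pkg is already on the current path: we came back around
        }
        for (String dependency : dependsOn.getOrDefault(pkg, List.of())) {
            if (visit(dependency, dependsOn, inProgress, finished)) {
                return true;
            }
        }
        inProgress.remove(pkg);
        finished.add(pkg);
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> cyclic = Map.of(
            "A", List.of("B"),
            "B", List.of("C"),
            "C", List.of("A"));
        Map<String, List<String>> layered = Map.of(
            "A", List.of("B"),
            "B", List.of("C"));

        System.out.println(hasCycle(cyclic));  // prints true
        System.out.println(hasCycle(layered)); // prints false
    }
}
```

Something like this could run as one more automated test in the build, so a new cycle breaks the build the day it is introduced.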

Good Practice: Package Ownership

For each package, define one person to be the "Owner". The primary responsibility is to ensure the package maintains its conceptual integrity: that it remains true to its cause. This person waves the red flag if something goes wrong with the package; this person watches for dependency cycles; this person probably writes lots of tests for the package. This person is the package's mother.

Question: Where's the glue?

"Reuse is great and all, but we can't take Bob's Database Package and use it out of the box – we have to write special code to make it do what we want to do." It's this glue code that makes an application something useful to Housewife Sally, who doesn't know or care about code.

Whoa, slow down big fella. Our packages ARE the glue code. We're not writing libraries, remember? We're writing an app. It's okay to have a package where various pieces come together; somewhere in all this, after all, we need to RUN the program. That can be a package Statement: "This package makes the program go."

Question: Dependency paths and fixing bugs

So, we've got this great package, but we found a bug. And we fix the bug. Good for us! We're even pretty sure we fixed the bug without any repercussions; our whole super testing suite is solid and passes everything just fine. This all means that packages that depend on us don't have to worry about anything, right?

Wrong. For one thing, we changed code. "That means we may have injected a defect", says someone with super clean hands. Small chance, maybe, but dependent packages can't take the risk – they're at least going to run through their test suite.

For another thing, we may have changed something circumstantial that outside packages depend on. Maybe we used to return a list sorted by last name, and now we return it sorted by first. Doesn't matter to us, all we promised was a list, but somewhere someone realized they were all sorted one way, and used that fact. Now that assumption (bad on their part) is broken, and they have to deal with it. Not our fault, per se, but something they have to take into account.
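Here is a small sketch of that sorting trap. Everything in it is invented for illustration: the `names` method plays the part of our package, with a flag to switch between the old and new behavior, and the two callers show the difference between leaning on an accident and asking for what you need.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class OrderingAssumption {
    // Hypothetical package API: the contract promises a list of names, nothing about order.
    static List<String> names(boolean afterBugFix) {
        return afterBugFix
            ? List.of("Alan Turing", "Grace Hopper")   // now happens to be sorted by first name
            : List.of("Grace Hopper", "Alan Turing");  // used to happen to be sorted by last name
    }

    // Fragile caller: assumes the list arrives sorted by last name.
    static String firstByLastNameFragile(List<String> names) {
        return names.get(0);
    }

    // Robust caller: sorts for itself, relying only on what was promised.
    static String firstByLastNameRobust(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparing(OrderingAssumption::lastName));
        return copy.get(0);
    }

    private static String lastName(String fullName) {
        return fullName.substring(fullName.lastIndexOf(' ') + 1);
    }

    public static void main(String[] args) {
        System.out.println(firstByLastNameFragile(names(false))); // prints Grace Hopper
        System.out.println(firstByLastNameFragile(names(true)));  // prints Alan Turing
        System.out.println(firstByLastNameRobust(names(true)));   // prints Grace Hopper
    }
}
```

The fragile caller passes its tests right up until our bug fix ships, which is exactly why dependent packages rerun their suites when we change anything.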

Summary

Developing to packages within our application is harder than developing just to our application. Packages are intended to be reusable, which means more flexible. More flexibility means more paths of use, and in turn means more possibility of defects. This means more initial design, more development, and more testing. Why program to packages, then? All the standard, good reasons: Abstraction, Encapsulation, Information hiding, Maintainability, Understanding. Working hard to ensure we have a well architected system will save us down the road. But we need to keep a good head on our shoulders - it's easy to try to make things too flexible, too general when our app doesn't need it. And the app is the reason we're here at all.

Tuesday, August 01, 2006

Password Protected

Today, passwords are like wild flowers. Take your pick, you can have as many as you want. Solely for work, I have four passwords - email, timesheet, and 2 different project repositories. Wait, make that five, access to internal machines. Purists will tell me that's ridiculous; all those (at least the first four) should be the same account, and so the same password. Perhaps, but my IT staff aren't purists; politics (i.e. real life) gets in the way. Besides, I don't want to complain about such things, I just want to know what my password is.

When I get home, I have another four or five passwords I use regularly: (personal) email, bank accounts, the all important nethack, Ebay, B&N, ... You get the idea. I have so many passwords, I forget my last name. Add on these important Good Password Tips:

No two passwords should be the same
No password should be written down
Should not contain any word in the dictionary
Passwords should be impossible to remember

Now add some of my favorite rules REQUIRED by some systems:

System A:

Total length must be at least 6 characters
Case sensitive
Must contain a number
Password must change every 90 days

System B:

Must use three or more character classes (upper case, lower case, symbol, number)
Total length must be between 7 and 23 characters
Cannot be any of the most recent 3 passwords
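Rules like these are easy to write down as code, which at least makes it clear how differently two systems can judge the same password. The sketch below covers only the rules that can be checked from the password alone; the 90-day expiry and the "most recent 3" history need stored state and are left out. The class and method names are made up, and treating anything that is not a letter or digit as a "symbol" is my assumption, not something either system documents.

```java
class PasswordRules {
    // Counts how many of the four classes (upper, lower, digit, symbol) appear.
    // Assumption: anything that is not a letter or digit counts as a symbol.
    static int characterClasses(String password) {
        boolean upper = false, lower = false, digit = false, symbol = false;
        for (char c : password.toCharArray()) {
            if (Character.isUpperCase(c)) {
                upper = true;
            } else if (Character.isLowerCase(c)) {
                lower = true;
            } else if (Character.isDigit(c)) {
                digit = true;
            } else {
                symbol = true;
            }
        }
        return (upper ? 1 : 0) + (lower ? 1 : 0) + (digit ? 1 : 0) + (symbol ? 1 : 0);
    }

    // System A: at least 6 characters and must contain a number.
    static boolean satisfiesSystemA(String password) {
        return password.length() >= 6 && password.chars().anyMatch(Character::isDigit);
    }

    // System B: 7 to 23 characters and at least three character classes.
    static boolean satisfiesSystemB(String password) {
        return password.length() >= 7 && password.length() <= 23
            && characterClasses(password) >= 3;
    }

    public static void main(String[] args) {
        // Two throwaway example passwords, not recommendations.
        for (String candidate : new String[] {"abc123", "Abc,defg"}) {
            System.out.println(candidate
                + " A=" + satisfiesSystemA(candidate)
                + " B=" + satisfiesSystemB(candidate));
        }
    }
}
```

The first example passes System A and fails System B; the second does the reverse. One password habit will not fit every system.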

If you're like me, the set of rules is completely unrealistic. I have to have 10-20 passwords that are not written down, impossible to memorize, longer than 6 characters, and tied to a particular account.

Online tools that create pseudo random passphrases are okay, but every 90 days I have to remember another arbitrary string. That's painful. Here's the system I use to deal with all this.

Pick a song, poem, speech, or passage to memorize. It should be relatively arbitrary, but relevant to you. In other words, it should be something you want to memorize. The Gettysburg Address, a Psalm, a Shakespeare sonnet, Jabberwocky, anything works as long as it has multiple sentences and is at least a good paragraph length, say 50 words. We'll use the following example:

Once upon a midnight dreary, while I pondered, weak and weary, Over many a quaint and curious volume of forgotten lore, While I nodded, nearly napping, suddenly there came a tapping, As of someone gently rapping, rapping at my chamber door. " 'Tis some visitor," I muttered, "tapping at my chamber door; Only this, and nothing more."

Now, whenever you need a password, pick a number. If today is the eighth, choose 8. Then we'll start with the eighth word, treating punctuation like words: I. Now how many letters need to be in the password? Say 10. Then we look at the phrase starting at I and counting for 10 words (again treating punctuation like words):

I pondered, weak and weary, Over many a

Take the first letter of each word, including punctuation again:

Ip,waw,Oma

Hmm. That looks like a pretty good password. Three character classes, fairly random. Also, it's easy to memorize the pass-phrase: you already know it. Here's another key: as long as no one knows your pass-poem (Poe's The Raven in this case), you can write down key information. Here, we start at the eighth word and use ten words. Write down 8,10. I usually write down lots more, because there's lots you have to remember if you don't use the password often:

ebay, icarus, 8, 10

The advantage of this system is that lots is kept in your head, but the tricky stuff is written down. You memorize one poem and write down keys into that poem. And, unless you're like me and tell everyone your system, people don't know what 8,10 means at all. The pass-phrase is cryptic and the algorithm is cryptic. It's not foolproof, but it's a lot easier than other methods I've tried for the long term. (Don't write down your key next to your pass-poem!)

Special cases: No Punctuation
Some systems don't allow punctuation: then when counting for the phrase, skip punctuation. Write down a minus to indicate you didn't use punctuation. From before, using 8 and 10 again, we get

pondered weak and weary Over many a quaint and curious

which becomes:

pwawOmaqac

and you write down:

ebay, icarus, 8,10-

Note the minus at the end indicates you ignored punctuation.
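For the curious, the whole scheme fits in a few lines of code. This is a sketch, not a tool to paste your real pass-poem into: `PassPoem` and `derive` are invented names, and the tokenizer assumes a word is a run of letters and every other non-space character is its own punctuation token (so an apostrophe counts as punctuation). It reproduces both worked examples from the opening lines of the poem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PassPoem {
    // Only the opening of the poem, enough for the worked examples.
    static final String OPENING =
        "Once upon a midnight dreary, while I pondered, weak and weary, "
        + "Over many a quaint and curious volume of forgotten lore,";

    // Assumption: a word is a run of letters; any other non-space character is one token.
    private static final Pattern TOKEN = Pattern.compile("\\p{L}+|[^\\s\\p{L}]");

    // start is 1-based; count is the password length; usePunctuation=false is the "minus" key.
    static String derive(String poem, int start, int count, boolean usePunctuation) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(poem);
        while (matcher.find()) {
            String token = matcher.group();
            boolean isWord = Character.isLetter(token.charAt(0));
            if (isWord || usePunctuation) {
                tokens.add(token);
            }
        }
        if (start < 1 || count < 0 || start - 1 + count > tokens.size()) {
            throw new IllegalArgumentException("key runs past the end of the poem");
        }
        StringBuilder password = new StringBuilder();
        for (int i = start - 1; i < start - 1 + count; i++) {
            password.append(tokens.get(i).charAt(0)); // first letter, or the punctuation mark itself
        }
        return password.toString();
    }

    public static void main(String[] args) {
        System.out.println(derive(OPENING, 8, 10, true));  // prints Ip,waw,Oma
        System.out.println(derive(OPENING, 8, 10, false)); // prints pwawOmaqac
    }
}
```

Note that the same key, 8 and 10, starts at a different word in each mode, because skipping punctuation shifts the count.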

Monday, July 31, 2006

C# Properties

The first time someone waved C# properties in my face, I smelled trouble. It reminded me of other silver bullets I've watched fly my direction, and I wondered what low-grade thought process crafted those gleaming orbs from marketing hype.

First, rough background. In Java, you write:

public class MyClass {
    private int myValue;
    public int getValue() { return myValue; }
    public void setValue(int value) { myValue = value; }
}

In C#, with Properties, you write:

public class MyClass {
    private int myValue;
    public int Value {
        get { return myValue; }
        set { myValue = value; }
    }
}

One of the "big benefits" of Properties is that you can treat them like fields:

MyClass.Value = x;

But, when compiled, the properties get replaced with get_ and set_ methods. This is a Big Deal (tm) to me; it means Properties are lying. They're lying because you're calling a method, but you think you're calling a field. The overhead possibilities are terrible. You said

MyClass.Value;

thinking you're getting a fast value back, but that call could be running off to a database, worse, on another machine. You said

MyClass.Value+=x;

and you just hit that database twice. Convention says they shouldn't do that, but convention also says people shouldn't fly planes into buildings. Who are you going to trust?
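To see the "twice" spelled out, here is roughly what that compound assignment amounts to when written with plain accessors. This is a Java sketch, not decompiled C# output: `RemoteValue` is a made-up class, and its counter merely pretends that every get and set is a trip to a database.

```java
class RemoteValue {
    private int stored;     // stands in for a value living in a database
    private int roundTrips; // counts simulated trips to that database

    int getValue() {
        roundTrips++;
        return stored;
    }

    void setValue(int value) {
        roundTrips++;
        stored = value;
    }

    int roundTrips() {
        return roundTrips;
    }

    public static void main(String[] args) {
        RemoteValue remote = new RemoteValue();
        // Roughly what a property-style "Value += 5" turns into: one get, then one set.
        remote.setValue(remote.getValue() + 5);
        System.out.println(remote.roundTrips()); // prints 2
    }
}
```

With the accessors written out, both calls are sitting right there on the line. With the property syntax, they hide behind something that looks like a field.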

This doesn't bother me nearly so much:

MyClass.getValue();

is telling me there may be some work involved. I'm more okay with it running to a database and coming back. I'm absolutely fine with it just returning a value. My expectation of a longish call was bettered by immediate response. It's like getting a Wendy's Value Meal, and they give you a free ice cream. It makes you happy.

Properties do the opposite. I expect them to come back immediately; when they hit a database, it's like getting that Value Meal and having no bread on the sandwich. I'm kinda ticked, and I'm not going to go to that Wendy's again.

So keep your Properties to yourself. I'm a happy man with my accessors and excessive parentheses and ice cream. Take your silver bullet and get the jerk who stole your bread.
