Software Integrity

 

Swift: Close to greatness in programming language design, Part 3


Welcome back

Ahead of Coverity Static Analysis support for the Swift programming language, we are examining design decisions in the language from the perspective of defect patterns detectable with static analysis. Before digging into Part 3, I recommend reading Part 1 and Part 2 in this series if you have not already.

Defect patterns part 3: heavy

Now we turn to defect patterns I consider “heavy” because they pose a challenge in language design, a challenge in static analysis, or both. Many of these defects are quite common in production code.

Avoided: monitor-style parallel programming bugs

Languages like C# and Java were designed from the start for parallel programming via threads, locks, and monitor variables. This “wrong by default” paradigm is an almost limitless source of programming errors, many of which can be detected with static analysis. Just as one example from real code:

synchronized (clients) {
    HashMap m = new HashMap(clients);
    m.put(dc, cq);
    clients = m;
}

I think the flawed thinking goes like this: “I need to lock something for this assignment, but I don’t think I need to lock the whole ‘this’ object. I can just lock this field while I assign to it.” Of course, this code will lock the object referenced by the field, which is useless for guarding an overwrite of that field.

And what’s up with copying the whole structure? It looks like they want to be able to “atomically” update this field (by creating a new object on each update) so that readers do not have to lock the field to read it. Even if the memory model guaranteed that another thread reading the updated reference would also read a fully coherent and updated object when accessing it (IIRC, Java has no such guarantee and even C# does not because the update is after the constructor finishes), there’s still a problem.

Suppose two threads hit this code. Thread 1 acquires the lock while thread 2 blocks waiting for it. Thread 1 completes the block, replacing the object stored in the field clients. A third thread (or thread 1 again) then enters the code and locks the new object referenced by the field. Two threads are now executing the same code on the same ‘this’ object while holding two different locks, so their executions can freely interleave and lose one of the updates to the HashMap.

Anyway, it appears that Swift has wisely avoided the whole monitor-style parallel-programming hell. I haven’t closely studied the alternatives it does offer, but it seems there’s a large class of defect patterns around locking hygiene and discipline that will not apply to Swift, thankfully!

Likely substantially avoided: null reference bugs

Almost all industrial programming languages have what I call “the implicit option” on reference/pointer variables; in other words, they can be NULL, nullptr, null, nil, or None. C++ references are kind of a way to specify a non-NULL pointer, but for technological or idiomatic reasons, they do not seem to solve the implicit-option problem in practice.

Code inconsistencies related to the implicit option (null), many of which indicate a null dereference is possible, are a dominant source of static analysis defects. You might think that other runtime errors in safe languages, such as array out-of-bounds or failed casts, would be a similar trove of defects. Our data, over and over, indicate that there’s something different about nullity, the implicit option, that makes it uniquely problematic.

I believe two aspects make it so. First, the implicitness of the option can create ambiguity about who (which part of the code) is responsible for handling the null case; each side of a call might assume the other is taking care of it. Second, nullness is often tied to exceptional cases: the normal case is non-null, but in some reasonable yet uncommon circumstances null is also possible. That makes the null case difficult to test, or at least demands a specific effort to test it. It’s almost as if “null” was specifically designed to trick experienced programmers into mishandling certain exceptional cases in their code.

The Swift designers seem to have put significant effort into blocking this trick. Swift variables that are possibly nil have to be dealt with in a specific way before data is accessed from them. There are a number of idioms for intelligently dealing with nil cases, and it’s easy enough to “forcibly unwrap” an option with a trailing ‘!’, but I suspect that forcing people to think more about their use of optional types will significantly reduce errors in handling of nil. I don’t think they will be zero, and I suspect static analysis will still be able to find some bugs. Nevertheless, this is a big improvement over other popular industrial languages.
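In concrete terms, those idioms look something like the following sketch. The lookupUser and greet functions and their data are hypothetical, purely for illustration:

```swift
// A function whose result is explicitly optional: the type system
// records that the value may be nil, and callers must deal with that.
func lookupUser(_ id: Int) -> String? {
    let users = [1: "alice", 2: "bob"]   // hypothetical data
    return users[id]                     // dictionary lookup is optional by nature
}

// 1. Optional binding: the non-nil case gets its own scoped name.
if let name = lookupUser(1) {
    print("found \(name)")
}

// 2. Nil-coalescing: supply a default for the nil case.
let display = lookupUser(99) ?? "unknown"

// 3. Guard: dispatch the nil case early and exit the scope.
func greet(_ id: Int) -> String {
    guard let name = lookupUser(id) else { return "who?" }
    return "hello, \(name)"
}

// 4. Forced unwrap with '!': traps at runtime if the value is nil,
//    so it should be reserved for values known to be non-nil.
let known = lookupUser(2)!
```

The point is that every one of these forms makes the nil case visible at the use site, instead of leaving it implicit as null does.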

Maybe somewhat avoided: resource leak bugs

If nullity bugs are the bread of static analysis bugs, resource leaks are the butter. Especially in safe, garbage-collected languages like C# and Java, it is easy to neglect disposing of objects that require explicit disposal (such as objects that hold handles to open files or network sockets). Having the garbage collector reclaim memory for you automatically is great, until it lulls you into complacency about objects requiring explicit cleanup. (The garbage collector can literally take forever to close them.) This is the “wrong by default” dark side of safe languages like C# and Java, and it’s probably why (roughly speaking) your Android phone sometimes drains the battery like crazy for no apparent reason.

Java 7 (year 2011) finally made it not-crazy-difficult to do the right thing (in most cases) with resources like this, with the try-with-resources statement. I believe the C# equivalent “using” statement was in the language from the start (year 2000). With a few caveats, such as ambiguity around what is actually a resource that needs closing, these features make it easy to do the right thing when you remember. Also, since resource leak bugs like this tend to involve an isolated portion of code, they are pretty well detectable with static analysis.
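For comparison, Swift’s nearest analogue to these constructs is the defer statement, which schedules cleanup to run when the enclosing scope exits by any path. A minimal sketch, with a hypothetical resource type standing in for a real file handle:

```swift
// A hypothetical resource that must be explicitly closed.
final class FileHandleLike {
    private(set) var isOpen = true
    func close() { isOpen = false }
    func read() -> String { return isOpen ? "data" : "" }
}

func readOnce(from handle: FileHandleLike) -> String {
    // defer runs when the scope exits, whether normally, via an early
    // return, or by throwing, so the close cannot be forgotten on one path.
    defer { handle.close() }
    return handle.read()
}
```

Like try-with-resources, this handles cleanup only when you remember to write it; it does not make cleanup the default.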

Swift has taken a bold design stance when it comes to resource leaks, with the language relying on automatic reference counting (ARC). I would call this idea neither new nor magical. It does solve the resource leak problem as we know it from Java and C#, but with the potential for new kinds of errors and resource leaks. Resources will be reclaimed quickly and automatically assuming there are no strong reference cycles, but reference cycles are common in non-trivial object-oriented code. Therefore, there is a new burden to carefully inject “weak” or “unowned” references to ensure automatic reclamation of memory and other resources.
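A minimal sketch of such a cycle and the weak annotation that breaks it. The Parent/Child types are hypothetical, and a counter incremented from deinit stands in for observing reclamation:

```swift
var reclaimedCount = 0   // incremented from deinit so reclamation is observable

final class Parent {
    var child: Child?
    deinit { reclaimedCount += 1 }
}

final class Child {
    // 'weak' breaks the cycle: Child does not keep Parent alive.
    // With a strong back-reference here instead, neither deinit
    // would ever run and both objects would leak.
    weak var parent: Parent?
    deinit { reclaimedCount += 1 }
}

var parent: Parent? = Parent()
parent!.child = Child()
parent!.child!.parent = parent

// Dropping the last strong reference reclaims both objects promptly:
// Parent deinitializes, releasing its only strong reference to Child.
parent = nil
```

The burden the article describes is exactly the choice of that one keyword: nothing in the code forces you to write weak, and omitting it compiles cleanly while silently leaking.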

I have mixed feelings about this system. It is easier for simple code to avoid resource leaks, and I expect the rate of people writing resource leaks into their code to go down significantly. But it will be notably more difficult to avoid resource leaks in complex code. The required referencing discipline is unfortunately reminiscent of locking discipline in monitor-style parallel programming, but in this case the discipline has the potential to permeate all parts of the program. Reference cycles do not require some specialized construct or API; they are easy to create, even subtly or inadvertently.

It is likely that static analysis could find many of the “strong reference cycle” resource leaks, but I suspect it will be a smaller fraction of all bugs compared to resource leaks in Java or C#. This concerns me. My guess is that someone not using static analysis might be better off on average with an ARC language like Swift, but someone using static analysis would be better off with tracing garbage collection, as in Java and C#, because the average kind of resource leak defect is simpler, more easily detectable with static analysis, and more easily fixable.

If there is (or will be) good dynamic tooling for detecting reference cycles in Swift, then it might be a better experience overall. Or if there’s some technological improvement that reduces or eliminates the need for manually annotating weak/unowned references or explicitly closing resources, Swift seems more equipped to absorb such an improvement than C#/Java. I appreciate the bold attack on this problem, but without one of these things, I’m not convinced this is a real improvement in ease-of-correctness vs. C#/Java.

Largely avoided: swapped arguments

A somewhat rare but interesting bug is passing function arguments in the wrong order, which Coverity can sometimes detect when the compiler can’t. We don’t generally think of this as a problem addressable by the programming language, but Swift does substantively address it, likely owing to its Objective-C / Smalltalk roots. Declared functions have to opt in to allowing positional parameter passing; by default, callers must pass parameters by name. C#, Python, and possibly others have something like this feature, but not to the same degree as Swift.

Unless some people idiomatically opt out of this feature, I expect this to virtually eliminate bugs in which arguments with compatible types are accidentally passed in the wrong order. There is a cost in convenience (conciseness), but there’s no doubt a small improvement in likely correctness.
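A sketch of how the labels work in practice; transfer is a hypothetical function:

```swift
// By default, every parameter has an external label that callers must write.
func transfer(amount: Int, from source: String, to destination: String) -> String {
    return "\(amount) moved \(source) -> \(destination)"
}

// The labels make swapping 'from' and 'to' a compile-time error:
// transfer(amount: 10, to: "a", from: "b")   // does not compile: labels out of order
let receipt = transfer(amount: 10, from: "savings", to: "checking")

// A declaration opts out of a label with '_', restoring positional passing
// (and with it, the chance to swap type-compatible arguments silently).
func transferPositional(_ amount: Int, _ source: String, _ destination: String) -> String {
    return "\(amount) moved \(source) -> \(destination)"
}
let receipt2 = transferPositional(10, "savings", "checking")
```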

Somewhat avoided: intended side effect in call

Another classic mistake in Java and C# is related to the “expressions as statements” problem discussed earlier in this series, but seems more like a language design trade-off than a hole:

public void setPassword(String password) {
    if (!StringUtils.isBlank(password)) {
        password.trim();
    }
    this.password = password;
}

Because Java Strings are immutable, trim() returns a new String with whitespace removed; it does not modify the receiver. The call password.trim() therefore has no useful effect, since its return value is ignored (the fix is password = password.trim()). Note that since Java does not have complex value types, the alternative would be Strings as mutable reference types, which would likely be more error-prone because of mutation in the presence of aliasing.

Strings in Swift are structs, or value types, so they behave much like C++ strings, which can be modified “in place” without worrying about aliasing. (I assume there’s reference counting and copy-on-write under the hood for performance.) For example, the append() method modifies a String in place, while the appending() method returns a new String and leaves its inputs unmodified. The lowercased() method likewise returns a new String with the result, though there’s no side-effecting counterpart for it. These naming conventions are nice and should help, but they are far from perfect.
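A short sketch of these value semantics, using append() (mutating) and lowercased() (non-mutating):

```swift
var greeting = "Hello"
let copy = greeting          // value semantics: a logical copy, not an alias

greeting.append(", world")   // mutating: changes 'greeting' in place;
                             // 'copy' is unaffected, so no aliasing hazard

let quiet = greeting.lowercased()   // non-mutating: returns a new String,
                                    // leaving 'greeting' unchanged
```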

A more subtle case detected by our static analysis (Java):

public void apply(float dt, SpringPoint node) {
    windDirection.x += dt * (FastMath.nextRandomFloat() - 0.5f);
    windDirection.z += dt * (FastMath.nextRandomFloat() - 0.5f);
    windDirection.normalize();
    ...

The normalize() method creates a new Vector3f object with the normalized result, leaving the receiver object unchanged. Coverity is smart enough to recognize this kind of inter-procedural programming mistake.

Swift has a related compiler warning, issued when a non-mutating function is called on a value type and the result is ignored, but it does not apply to reference types, and compiler warnings are too often ignored.

Defects like this, including those involving built-in APIs such as String, are common, though still dwarfed by nullity issues and resource leaks. In Java, they’re about as common as all the above “basic” defect patterns combined. Therefore, language designers should do more to address this problem. I would suggest that non-void functions by default have a non-ignorable return value, meaning it’s an error to simply discard their result as part of an expression statement (or similar). Return values considered legitimately ignorable can be annotated as such, perhaps with parentheses, such as “mutating func popFirst() -> (Set.Element?).” And there should be some concise way of intentionally ignoring non-ignorable return, perhaps with parentheses as in “(foo.toString()); // cache result.”
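Later Swift versions moved further in this direction, though only at warning strength: since Swift 3, discarding the non-Void result of a call draws a compiler warning unless the function is marked @discardableResult, and `_ =` expresses an intentional discard. A sketch with hypothetical functions:

```swift
// Unmarked: discarding this result draws a compiler warning,
// which catches the trim()-style mistake shown above.
func trimmed(_ s: String) -> String {
    return s.filter { $0 != " " }
}

// Marked: callers may ignore the result without a warning
// (the function is typically called for its side effect on the array).
@discardableResult
func appendLog(_ line: String, to log: inout [String]) -> Int {
    log.append(line)
    return log.count
}

var log: [String] = []
appendLog("started", to: &log)   // result ignored: no warning, by annotation
_ = trimmed(" a b ")             // explicit, intentional discard
```

This is the annotate-the-ignorable-case design suggested above, except that the default is a warning rather than a hard error.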

Avoided, like other static languages: typos in identifiers

Nothing special here about Swift, but I have to bring it up as part of the good design choices in Swift.

Our static analysis data on open source software in JavaScript, Python, PHP and Ruby indicates that programming in a dynamic language creates a real hazard for mistyping identifiers and not noticing the problem right away. Even in Python, the most strict in terms of reporting possible correctness issues as runtime errors, it is surprisingly easy for mistyped identifiers to go unnoticed, as in this brilliant example that Coverity Static Analysis finds in an open-source Python program:

def pc_finished(self):
    try:
        self.pc.finished.diconnect()
    except:
        pass
    self.stack.setCurrentIndex(0)
    self.stack.removeWidget(self.pc)
    self.pc = None

Spot the error? Care to wager that you’ll spot it 95% of the time in a code review or unit tests?

It says “diconnect” instead of “disconnect.” And because this is (with good intentions) done in a context in which exceptions are ignored, the AttributeError from the typo will be caught and ignored, and the disconnect operation quietly omitted.

If your dynamic language allows references to arbitrary globals and also allows arbitrary expressions to be statements (ahem, JavaScript), then you can accidentally write the statement “breal;” instead of “break;” and have it accepted. Really, I wish I were making this one up.

Note that our Coverity checker for this problem is now patent-pending.

Summary

Coverity Static Analysis delivers a lot of value in covering design holes left by programming languages. In general, newer languages avoid more of these holes, and that is reflected in measurably fewer defects reported by static analysis. However, some languages like Ruby and PHP (both year 1995) have more than their share of problems for their age. (Java was also born in 1995.)

Of languages we have worked on for static analysis, Swift (year 2014) is by far the youngest and likewise seems to have the fewest design holes that lead to known bug patterns and create obvious opportunities for static analysis. This presents a challenge for us in adding Swift support to our product, because there’s less “low-hanging fruit.” Even so, we have a number of existing checkers, most not described above, that should apply nicely to Swift. For example, we have a great heuristic checker for finding copy-paste bugs, and it’s hard to imagine programming language improvements that would mitigate those. (Update: just today a customer shared a copy-paste bug, and my colleague Charles-Henri Gros pointed out that ML-style pattern matching would have eliminated the redundancy that enabled the inconsistency.)

But Swift falls short of greatness, in my opinion, because our experience shows some design choices that will clearly lead to some senseless bugs. In particular, I’m referring to (a) allowing any expression as a legal statement and (b) allowing unreachable code. This is generally unfortunate, but happens to be good for us as sellers of an industry-leading static analysis tool.

Learn more about the Coverity Static Analysis tool. 

Acknowledgments

Charles-Henri Gros, Tom Hildebrandt, Quinn Jackson, and Duckki Oe helped to spot some errors and unclear statements. Thanks!