Hacker News
Programming Languages vs. Fat Fingers (spinellis.gr)
42 points by petsos on Dec 9, 2012 | 20 comments


As a historical note, the Mercury period-vs-comma¹ bug probably wasn't a typo as such. In that era, programmers didn't type their programs; they wrote them on paper, using special printed forms² so that their intent would be clear to the professional typists who entered them. To prevent typos, everything would be typed at least twice; in the days before diff, there was special-purpose hardware for this.³

I can think of three possibilities for the programmer making such an error: actually writing it wrong (which seems unlikely to me because the mental context of writing a loop range doesn't lend itself to writing a single number); a skipping pen; or a locale issue — i.e. a programmer of European origin mixing up the characters.

¹ http://catless.ncl.ac.uk/Risks/9.54.html#subj1.1
² http://en.wikipedia.org/wiki/File:FortranCodingForm.png
³ http://en.wikipedia.org/wiki/Keypunch#IBM_056_Card_Verifier


I cannot find how they controlled for the wordiness of the different languages. They changed one token in each file, but the number of tokens per file might be different. For example, Python likely will be shorter than Java due to its significant whitespace.

Also, the 'replace a single character in a token by noise' change may have hugely different effects, not only because of differences in keywords (begin…end vs {…}) but also, and probably more so, because of average variable and function name length. For the languages tested this is a cultural issue, but it would not surprise me if the effect were large: you won't find 'FooFactory' in a Perl program.
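
To make the name-length point concrete, here is a minimal sketch in Python. It is not the paper's actual fuzzer: it swaps one character of an identifier for a random lowercase letter and counts how often the result happens to be another declared name, which is the case a compiler or interpreter is least likely to catch. The name lists and function names are made up for illustration.

```python
import random
import string

# Hypothetical identifier sets: terse one-letter names vs. longer descriptive ones.
SHORT_NAMES = ["i", "j", "k", "n", "x", "y"]
LONG_NAMES = ["index", "count", "total", "width", "height", "result"]


def fuzz_one_char(token: str, rng: random.Random) -> str:
    """Replace one character of `token` with a different random lowercase letter."""
    pos = rng.randrange(len(token))
    choices = [c for c in string.ascii_lowercase if c != token[pos]]
    return token[:pos] + rng.choice(choices) + token[pos + 1:]


def collision_rate(names: list[str], trials: int, seed: int = 0) -> float:
    """Fraction of single-character mutations that land on another declared name."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        original = rng.choice(names)
        if fuzz_one_char(original, rng) in names:
            hits += 1
    return hits / trials


print("short names:", collision_rate(SHORT_NAMES, 2000))
print("long names: ", collision_rate(LONG_NAMES, 2000))
```

With these made-up lists, only the one-letter names can mutate into each other; no two of the longer names are a single substitution apart, so every mutation of them produces an undeclared identifier.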


Using my complete lack of statistical knowledge, I multiplied the wrong output rate % by the total lines of code in the examples from the original paper here http://www.spinellis.gr/pubs/conf/2012-PLATEAU-Fuzzer/pub/ht... to get a very bad approximation of fat fingering adjusted for program length. You'd expect more typos in a longer program; the original experiment always introduced 1 typo per run regardless of program length.

You guys enjoy while I prepare for the lynch mob of Statisticians :-)

    Lang       Err %    LOC    LOC-adjusted Err %
    Ruby       0.17     159    27.03
    Python     0.15     161    24.15
    Perl       0.22     156    34.32
    PHP        0.36     224    80.64
    JS         0.18     102    18.36
    Java       0.1      331    33.1
    Haskell    0.15     114    17.1
    C#         0.095    389    36.955
    C++        0.08     461    36.88
    C          0.1      458    45.8
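
For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the last column as rate times LOC and sorts by it. The numbers are copied from the table; the names `ROWS` and `loc_adjusted` are just mine for illustration.

```python
# (language, wrong-output rate, lines of code), copied from the table above.
ROWS = [
    ("Ruby", 0.17, 159),
    ("Python", 0.15, 161),
    ("Perl", 0.22, 156),
    ("PHP", 0.36, 224),
    ("JS", 0.18, 102),
    ("Java", 0.1, 331),
    ("Haskell", 0.15, 114),
    ("C#", 0.095, 389),
    ("C++", 0.08, 461),
    ("C", 0.1, 458),
]


def loc_adjusted(rate: float, loc: int) -> float:
    """Crude length adjustment: wrong-output rate multiplied by lines of code."""
    return rate * loc


# Sort from lowest to highest adjusted value.
ranked = sorted(ROWS, key=lambda row: loc_adjusted(row[1], row[2]))
ranking = [lang for lang, _, _ in ranked]

for lang, rate, loc in ranked:
    print(f"{lang:<8}{rate:>7}{loc:>6}{loc_adjusted(rate, loc):>10.3f}")
```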


Looks like an improvement to my (completely unbiased, of course) eyes. Haskell moves away from C++/Java, C moves away from them in the reverse direction, and PHP moves into its own league.

The surprises, IMO, are JavaScript (I would place it close to PHP) and Perl (apparently, it is easy to come up with character sequences that are not valid Perl :-))

Thinking of ways to get a perfect language according to this metric: the way to get there is to introduce lots of redundancies in the grammar. For example, if one requires two exact copies of the same source before code compiles, any single change will give compilation errors. However, programmers would build tools to defeat such strategies.

Maybe one should scale for actual content, e.g. by weighting against the size of the gzipped source code?
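
A rough sketch of what that could look like in Python, assuming the adjustment is simply rate times compressed size (by analogy with the rate times LOC table; that choice is my assumption, not something from the paper). It uses `zlib`, which implements the same DEFLATE algorithm as gzip without the gzip file header. The function names and the sample string are hypothetical.

```python
import zlib


def compressed_size(source: str) -> int:
    """Bytes of `source` after DEFLATE compression: a crude proxy for its real content."""
    return len(zlib.compress(source.encode("utf-8"), 9))


def content_adjusted(rate: float, source: str) -> float:
    """Wrong-output rate scaled by compressed size instead of by lines of code."""
    return rate * compressed_size(source)


# Made-up, highly repetitive "source" to show that boilerplate compresses away.
boilerplate = "public static final int value = 0;\n" * 50
print(len(boilerplate), compressed_size(boilerplate))
```

Repetitive boilerplate shrinks a lot under compression, so a verbose language would be penalised less by this measure than by raw line counts.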


Yeah, look at the JavaScript LOC! Who wrote the Rosetta Code for those, Brendan Eich?!

This hints at another way to optimize for this metric: make the language as expressive as possible. Fewer characters should translate into fewer typos. Paul Graham strikes again! (http://www.paulgraham.com/power.html)

As to your point about redundancy, I think the researchers are in agreement with you on that one if you consider unit tests to be a sort of redundancy, expressing the same concept in two different ways. They bring this up repeatedly in their report.

Obligatory Perl jab: It surprised me that any of the Perl solutions used more than one line. :-P


  > Thinking of ways to get a perfect language according to 
  > this metric: the way to get there is to introduce lots of 
  > redundancies in the grammar. For example, if one requires 
  > two exact copies of the same source before code compiles

C header files T__T


So, assume that every inserted typo is in fact a serious error in the program, and that the output comparison they used plays the role of a unit test. Then an interesting number to look at is:

    Errors remaining = Successful run - Faults caught in unit test

In that regard most languages in the study are quite equal. With the static languages, though, you most likely catch the errors much earlier, and you probably get a much better hint of where the error is instead of just an assertion that you have an error. And of course this assumes that you do have unit tests.


This basically studies how impactful typos are for these various languages. Well, typos are always trivial to find and fix.

It's more serious issues that I would want a type system to help me with (like null references, concurrent access, etc.)


A typo is trivial to find when you know there exists a typo.

The scary part of this is the percent of programs that ran successfully but produced the wrong output. Output errors can go unnoticed.

I don't know if I am scared enough to stop using Python though.


Nice article that shows some of the virtues of statically-typed languages.


Although 'strongly typed' shouldn't be confused with 'dynamically typed'.


I find it funny that Java scored very similarly to Haskell, when otherwise the languages are quite different.


Java: lots of syntax to fuzz, and all of it has to be right.

Haskell: few tokens, stronger checking (i.e. no coercions), though Rosetta Code is not as type-heavy as real Haskell code.

I have a conjecture that Haskell examples designed by experts, with types in mind as we do in production systems, would have lower compile rates than the Rosetta examples, which are written mostly by non-experts without regard to maintainability.

In production Haskell code, I will usually wrap Double values with newtypes, e.g. to distinguish currency amounts, percent data and ratios from each other, specifically to guard against typos where I accidentally pass Double parameters in the wrong order. Designing code with an intent to make it less vulnerable to fat fingers is certainly possible.
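
For readers who don't write Haskell, here is a rough Python analogue of the wrapper idea, offered as a sketch only. Haskell newtypes are checked at compile time at no runtime cost; this version checks at runtime instead, and the names `Currency`, `Percent` and `apply_discount` are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Currency:
    amount: float


@dataclass(frozen=True)
class Percent:
    value: float


def apply_discount(price: Currency, discount: Percent) -> Currency:
    """Reduce `price` by `discount` percent, rejecting swapped arguments."""
    # The runtime check stands in for what Haskell's type checker does at compile time.
    if not isinstance(price, Currency) or not isinstance(discount, Percent):
        raise TypeError("expected (Currency, Percent)")
    return Currency(price.amount * (1 - discount.value / 100))


print(apply_discount(Currency(200.0), Percent(25.0)))

try:
    # Arguments in the wrong order: with bare floats this would silently compute nonsense.
    apply_discount(Percent(25.0), Currency(200.0))
except TypeError as err:
    print("rejected:", err)
```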


I don't see any value or meaning in this "experiment". What should it demonstrate?


Features such as automatic creation of mentioned variables and dynamic typing allow a mistake in code to change a correct program into a syntactically valid program that does the wrong thing.
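
A small Python illustration of that failure mode (function names made up for the example): a one-character typo in an assignment target creates a new variable instead of raising an error, so the program still runs and quietly returns the wrong answer.

```python
def total_correct(values):
    total = 0
    for v in values:
        total = total + v
    return total


def total_with_typo(values):
    total = 0
    for v in values:
        # Typo: `totl` is silently created as a new variable; `total` never changes.
        totl = total + v
    return total


print(total_correct([1, 2, 3]))    # 6
print(total_with_typo([1, 2, 3]))  # 0
```

In a language that requires declarations, the assignment to the undeclared `totl` would be rejected at compile time.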


It doesn't mean that if you write C you get fewer bugs. IMHO it's not a meaningful experiment, and it makes no sense in practice.


Now show that this is a significant cause of costly errors in practice...


I know my evidence is just anecdotal, but I see it a lot. Man, do I miss my C++ and my Haskell when I have to use Python and JavaScript at work.

Most people seem to disagree with that happening "a lot", so maybe I'm working with bad codebases or just being unlucky. Indeed, a controlled study would yield more reliable information.


It is a comparison of several programming languages regarding how likely it is that a typo leaves a program that still compiles and runs but produces wrong output. If a typo goes unnoticed by the programmer, you really want the parser to catch it, or at least to get obviously wrong behaviour at runtime (e.g. a crash), instead of the code silently running and producing the wrong results.


>> I think that the most significant outcome of our study is the demonstration of the potential of comparative language fuzz testing for evaluating programming language designs.

> What should it demonstrate

E.g. Java is better than Haskell.



