Hello, I’m Mr. Null. My Name Makes Me Invisible to Computers (wired.com)
238 points by BerislavLopac on Sept 4, 2016 | 208 comments


10 years ago I worked for Jagex on RuneScape, an RPG. There are rules for that game about conduct and so on - no swearing, no scamming, blah blah. If you see someone doing something "bad" you can report them and the customer support team reviews the event and can allocate "black marks" on players who infringe.

One particular user accumulated thousands of black marks. We thought they were doing something incredibly bad. A black mark gets you banned for a short period of time, this player was banned for a millennium. But they kept sending messages saying that it was unjust, they did nothing wrong.

Turns out their playername was 'null' and half of other players' offences were attributed to them due to null in Java being cast to 'null' curing concatenation...


Wow. How was working for JaGeX? As a kid I was wayyy too obsessed with that game, even staying up past curfew times to play RS.


This is entirely secondhand from people I knew who worked there, but: developers treated well, support staff not so great. They've struggled to make a success of anything since Runescape.

There were also rumours of an internal scripting metalanguage for RS, that had no local variables and all the other hallmarks of something put together in a hurry that ends up being a success and then a burden.


my busted computer barely ran Runescape when I was growing up, I played The Realm Online instead since it ran on basically anything at the time.

I'd save Christmas and birthday cash to pay for my subscription annually.

Then World of Warcraft came around...looking back, that was the end of the Realm.

It's still around, believe it or not. The player count is usually ~20 people. Sometimes I get nostalgic and install it just to hear the music when it launches.

Ahh, memories. Damn shame the game is owned by a private family now instead of actual developers, the game would be a massive hit on mobile with a f2p model.


> It's still around, believe it or not. The player count is usually ~20 people. Sometimes I get nostalgic and install it just to hear the music when it launches.

At least you can still log in and meander around. I'd love to be able to do that with MxO, hyperjump around the city a bit and just enjoy the sounds and settings. Alas, the servers got shut down by SOE and that was the end of that. A shame when I reckon you could run it off a sodding azure box for sod all cost these days.


What was it like working with Andrew Gower? He strikes me as a technical genius with some of the stuff Jagex was doing with Java circa early 2000's [1].

[1]:https://www.youtube.com/watch?v=yrVUegwSKlY


Holy crap, I wrote that forum software. I left in 2007 so I had no idea they kept it but if it wasn't rebooted for years, probably.

The whole ethos of the company, driven by Andrew, was "minimal software for maximum output". I learned a lot about how much you can benefit from concentrating on efficiency up front compared to how much that ends up creating problems down the road.


I couldn't quite understand the video. Why did they clone Maya and maintain feature parity and use the same formats?


They didn't clone Maya itself, just the features they needed from it. Being able to build everything, including all 3D material, in their own environment made it easy to link everything together. And when something changes that requires an update to a 3D model, they don't have to go back to Maya to edit the model there.


Wasn't expecting the Chuck Norris joke in the end. 2009 was an interesting year.


Damn, that's impressive. Thanks for linking it.


I remember being obsessed with Runescape 10 years ago!

I can remember pretty peculiar stuff from the game, now from my perspective as a programmer.

I remember that the anti-swearing algorithm detected bad words in so many ways that even normal conversations would get censored (oh man, I hated those asterisks so much). You would always find a way to say the bad word anyway... I guess that with today's advances in machine learning those ambiguities could be resolved in a better way.

Love the 'null' story by the way.


Yeah the 'bad words' thing was hard, it's your classic Scunthorpe problem. Since the game was mainly targeted at people 12 years old and up, it was important to get right. However no matter how good the filter got, people would find creative ways to get around it, including a short period where players started using lots of fires and arranging them to spell out swear words.
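A minimal sketch of why naive substring filtering misfires (hypothetical helper, not Jagex's actual filter):

```java
public class NaiveFilter {
    // Naive substring censor: replaces every match with asterisks,
    // even inside innocent words (the classic Scunthorpe problem)
    static String censor(String text, String bad) {
        return text.replace(bad, "*".repeat(bad.length()));
    }

    public static void main(String[] args) {
        // An entirely innocent place name gets mangled
        System.out.println(censor("Welcome to Scunthorpe", "cunt"));
        // prints "Welcome to S****horpe"
    }
}
```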


> Since the game was mainly targeted at people 12 years old and up, it was important to get right

Let's be honest, this is not important. Even if 12 year olds didn't constantly see/hear/use "bad words", there are really no ill effects of lack of censorship in that way.


It's important for the subscription revenue - which comes from their parents.


Exactly, this wasn't puritanical zeal from Jagex but a realistic decision based on the fact that parents are the ones who control the credit cards.


Then it's even less important to get "right", since the consumers who actually see the results don't care if they accidentally see swear words, and almost any token effort at censorship (a "parental control" option mode, for example) will probably be satisfactory. You might lose some subscribers from extremely zealous parents who bother to read logs and see some stuff "accidentally" get through because the filter is too permissive, but I imagine the vast majority will give you the benefit of the doubt.


Are you a parent? Speaking as one, it doesn't work like that.

One parent sees a dick pic/swearing in a game, they ban the game, tell all their friends and then someone puts it in a school FB group, which is then shared around the country.

(Yes, Runescape was before much of this. That just made it slower, but the same process happens)


The best thing here, I think, would be to create an informational pamphlet kids could print out and give to their parents, showing examples of dick pics and swearing in literally every game imaginable, with the message of "this is just what 12-year-olds do to other 12-year-olds when given microphones and the ability to upload pictures; there is no way to stop this. Even if you ban them from games altogether, other 12-year-olds are just going to swear and draw graffiti dicks in places your child can see at school. Just give up already."


> Turns out their playername was 'null' and half of other players' offences were attributed to them due to null in Java being cast to 'null' curing concatenation...

Why did this happen though? I mean, why was it cast to begin with, and why was there a null value?


Usually what happens there is String.valueOf(null) has been called somewhere (which will return the string "null").

Nulls are fairly normal in Java, and modern Java systems often have ways to define if something can return nulls or not. In the old days you just had to know.
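A minimal sketch of how that plays out (variable names are hypothetical; the actual Jagex code isn't public):

```java
public class NullName {
    public static void main(String[] args) {
        // Suppose the offender lookup failed and returned null
        String offenderName = null;

        // String concatenation silently converts the null reference
        // into the four-character string "null"
        String report = "Offence committed by: " + offenderName;

        System.out.println(report); // prints "Offence committed by: null"
        // A player actually named "null" now matches this report text
        System.out.println(report.endsWith("null")); // prints "true"
    }
}
```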


> half of other players' offences were attributed to them due to null in Java being cast to 'null' curing concatenation

Why would half of the players have a playername of null?


He meant "attributed to him", I think.


No, they don't know whether the player "null" was male or female, therefore they used the pronoun "they", which is correct.


> the pronoun "they", which is correct

It's also highly ambiguous in this specific case (I had trouble understanding the sentence, too), and a simple "him/her" would have been preferable.


Got it (I really didn't know about this use of "they").

Reading back, perhaps "the player" would be better than "they", if it was just one player having problems.


I would love to hear more stories about working for jagex back then - that game was my life and what got me into programming :)


>being cast to 'null' curing concatenation

What does this mean?


"" + null is "null"


Aren't you glad Java doesn't allow operator overloading, so this sort of mischief is impossible? /s


You're kidding, surely? Java really does that?

"" + null should be null (or throw an exception). Null is null and nothing else.


Nobody writes "" + null. They write "" + myVariableRef. And myVariableRef is null. As I posted earlier, the code is compiled to "" + String.valueOf(myVariableRef).

Since the empty String is a valid String, and String.valueOf() also returns a String, what other behavior would you ever imagine being sane or possible?
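To see the equivalence concretely (this is roughly what the compiler emits; the exact desugaring varies by javac version, via StringBuilder or StringConcatFactory):

```java
public class ConcatDesugar {
    public static void main(String[] args) {
        Object myVariableRef = null;

        // What you write:
        String a = "" + myVariableRef;
        // Roughly what the compiler emits:
        String b = "" + String.valueOf(myVariableRef);

        System.out.println(a.equals(b)); // prints "true"
        System.out.println(a);           // prints "null"
    }
}
```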


> Since the empty String is a valid String, and String.valueOf() also returns a String, what other behavior would you ever imagine being sane or possible?

IMHO, String.valueOf() throwing NPE upon encountering a null reference would be both sane and possible.

Unfortunately, it does indeed appear that it returns "null" instead [1].

[1] https://docs.oracle.com/javase/7/docs/api/java/lang/String.h...


On the other hand, it's quite convenient to not have to do null checks in debug messages. Compare:

log.trace("doStuff(" + obj1 + ", " + obj2 + ")");

to this:

log.trace("doStuff(" + (obj1 != null ? obj1 : "null") + ", " + (obj2 != null ? obj2 : "null") + ")");

The second one is IMHO quite hard to read, even in this short example (and code readability is important when you do code reviews)


That is why you should use string formatting instead of concatenation.

For string formatting, rendering "null" (or "") makes sense; for coercion not.
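For example, in plain Java, String.format routes null through the same null-to-"null" conversion, so the readable version needs no ternaries (placeholder-style loggers such as SLF4J behave similarly):

```java
public class LogFormat {
    public static void main(String[] args) {
        Object obj1 = null;
        Object obj2 = "x";

        // %s renders a null argument as "null", so no explicit
        // null checks clutter the call site
        String msg = String.format("doStuff(%s, %s)", obj1, obj2);

        System.out.println(msg); // prints "doStuff(null, x)"
    }
}
```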


That just resolves into everyone always doing formatting, and never doing concatenation, which yields the same problem.


It should just throw NPE. I understand why people writing quick debug printlns could be infuriated by that, but the way it works now leads to much more subtle bugs.


An empty string, if it HAS to be a string, would make infinitely more sense. However, a type error wouldn't be out of place in a strongly typed language.


I think it should be an exception as those types can not be concatenated.

But still , "" + null being "null" is just plain crazy. I could see "" + null == "" (as you are concatenating nothing to the string), or "" + null == null as you are concatenating a string to null, but "null" is bonkers.


This is why I shrug when people complain about JavaScript casting weirdnesses


The + operator in this case is compiled to call String.valueOf() which returns "null" when given null.



Should have been "during" but I cannot edit my comment so I suppose that typo will live in prosperity.


Posterity?


That's the joke


Do you have a 10 year cape?


Tony Hoare, null's creator, regrets its invention:

“I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.”

https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retra...

https://www.infoq.com/presentations/Null-References-The-Bill...


Including "null" is one of the most unfortunate things about Go. I'm glad to see that other modern languages (Swift, Rust, and so on) are avoiding it.


I too find it annoying in Go, though I'm not sure what the default value of a reference in a struct would be otherwise.

However, I do see the value of NULL in a database context even though it makes database interfaces harder -- especially in Go, where the standard marshaling paradigm means anything NULLable has to be a reference and thus have a nil-check every time it's used.

The conceptual match is so awkward that when I write anything database-related for Go, if I have the option then at the same time I make everything in the database NOT NULL; even though that screws with the database.

Ah, NULL. When I think about the pain it causes, balanced against its utility, I sometimes wish I'd never heard of it.

And I'm sure learned people said the same thing about Zero, once upon a time.


Go's wholesale embrace of null is kinda jarring in this day and age, since it doesn't even have safe null dereferencing operators like Groovy, C#, Kotlin et al. It's like Java all over again.

Considering Rob Pike is a huge fan of Tony Hoare, and the inevitable mountain of pain caused by null references, that's kinda surprising.

But I guess "Worse is Better" in the sense of "simplicity of implementation is valued over all else" is still the guiding principle of Go. As Tony Hoare himself said: "But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement". It seems like we're doomed to repeating this mistake again and again.


> I too find it annoying in Go, though I'm not sure what the default value of a reference in a struct would be otherwise.

The way it works in Rust is that you can't have a reference without the thing you're referring to. There isn't a default value and because of that it's an error to try to have a reference without something to refer to.

The way Rust works with this is to have a type called Option (I think this is a monad in Haskell?) that lets you say this can be None or Some, so you have to explicitly handle the None case whenever you use it (either by panicking, or matching, or some other method).


You’re thinking of Maybe[1]:

  data Maybe a = Just a | Nothing
      deriving (Eq, Ord)
Yes, it is a monad[2].

  return :: a -> Maybe a
  return x  = Just x

  (>>=)  :: Maybe a -> (a -> Maybe b) -> Maybe b
  (>>=) m g = case m of
                 Nothing -> Nothing
                 Just x  -> g x
[1]: https://wiki.haskell.org/Maybe

[2]: https://en.wikibooks.org/wiki/Haskell/Understanding_monads/M...


It's a monad in Rust too (with Some being return, and and_then[1] being >>=). Rust's generics just aren't yet[2] flexible enough to abstract over monads within the language.

[1] https://doc.rust-lang.org/std/option/enum.Option.html#method... [2] https://github.com/rust-lang/rfcs/issues/324


For anyone who has the same reaction as I did when first hearing of this, "But isn't that just like a nullable value?" (don't worry, this post will not contain any monad analogies)

Yes. Ish. In languages that have a null/nil/None then almost anything you have passed to any function could be null. Every function you write could have a null passed into it and you either have to do checks on everything or trust that other people will never pass those values in.

That's pretty OK until something unexpectedly receives one and hands it off to another function that doesn't deal with it gracefully.

In Haskell (and I assume others) this can only happen to Maybe types, they're the only ones that can have the value of Nothing. So the compiler knows which things can and cannot be Nothing, and therefore can throw an error on compilation if you are trying to pass a Maybe String (something that can be either a "Just String" that you can get a String from or a Nothing) into a function which only understands how to process a String.

This feels like it might be restrictive, but there are generic functions you can use to take functions which only understand how to process a String and turn it into one that handles a Maybe.

It's quite a nice system, even though I get flashbacks to java complaining I haven't dealt with all the exceptions.

Haskell doesn't completely stop you from shooting yourself in the foot though, you can still ask for an item from an empty list and break things. There are interesting approaches to problems like that however: http://goto.ucsd.edu/~rjhala/liquid/haskell/blog/about/

Finally, it's worth pointing out that Maybe and Nothing and Just and all that aren't built into the language, they're defined using the code that gbacon wrote. So in a way, Haskell doesn't have a Maybe type, people have written types that work really well for this kind of problem and everyone uses it because it's so useful.

[disclaimer: I've probably written 'function', 'type', 'value' and other terms with quite specific meanings in a very general way. Apologies if this hurts the understanding, and I would appreciate corrections if that's the case, but just assume I'm not being too precise if things don't make sense]


> Finally, it's worth pointing out that Maybe and Nothing and Just and all that aren't built into the language, they're defined using the code that gbacon wrote. So in a way, Haskell doesn't have a Maybe type, people have written types that work really well for this kind of problem and everyone uses it because it's so useful.

Yep, and I believe it's the same with with Rust, it's been put into the standard library but it's not some special thing that only the compiler can make.

And like you said, the whole idea is that the normal case is that you can't have null values. If you need them for something you declare that need explicitly and have to handle it explicitly or it's a compile error. That way it can be statically checked that you've handled things.


Option (and/or option) is the usual name of Maybe in ML style languages like SML, ocaml, etc.


What's wrong with using an Option/Maybe type? Can't Go do that?


You could (just use an interface). However, it would be a pain to use because Go doesn't have generics (if a type X implements the Maybe interface, it is not true that []X can be cast to []Maybe). So you would have to always make temporary slices to copy all of your Xs into.


Go effectively has Option types for database query results: https://golang.org/pkg/database/sql/#NullBool


This is not a generic option type, but rather a tri-state bool, or an Option&lt;bool&gt;. Go has no user-defined generics, so you can't write a general Option&lt;T&gt; type. It does have built-in "magical" generics, namely arrays, slices, maps and channels, but no option/maybe. Language-level option types are not unheard of (C#, Swift and Kotlin all have a notion of this sort, although they all support proper user-defined generics as well).


> Language-level option types are not unheard of (C#, Swift and Kotlin all have a notion of this sort)

Swift's Optional is a library type: https://github.com/apple/swift/blob/master/stdlib/public/cor...

Though the compiler is aware of it and it does have language-level support (nil, !, ?, if let)


Swift most definitely has a null: nil. The difference is that you can (and it is the default to) declare variables of type non-nil object (non-optional), unlike Objective-C.


nil is just shorthand for Optional.None, which is Just Another Value. (sort of, really you can make nil translate to any type, but please don't)


Fair but I don't think the original claim was intended that pedantically! :)


The point of the original claim was that modern language (like Swift or Rust) tend not to make null part of every (reference) type. That Swift has a shorthand for Optional.None doesn't change that, nil isn't a valid value for e.g. a String (as it would be in Java or Go), only for String? (which is a shorthand for Optional<String>)


What you are describing is no different than modern Java, where variables are marked @Nullable and it's a compiler or linter error to dereference one outside a null-check conditional. If you don't use this in Java, it is just the same as having String? as your type everywhere.


> What you are describing is no different than modern Java

It is in fact quite different. String/String? is not an optional annotation requiring the use of a linter and integration within an entire ecosystem which mostly doesn't use or care for it, it's a core part of the language's type system.

> If you don't use this in Java it is just the same as having String? as your type everywhere.

Except using String? everywhere is less convenient than using String everywhere, whereas not using @Nullable and a linter is much easier than doing so.


String isn't a reference type in Swift, but yes.


Well, while Rust does have pointer types[1], it doesn't allow them to be used in typical code (i.e. dereferenced), except within "unsafe" blocks. A null pointer does exist[2]. I believe this is needed for such things as interoperability with C code.

[1] https://doc.rust-lang.org/std/primitive.pointer.html

[2] https://doc.rust-lang.org/std/ptr/fn.null.html


The only use case I've had for these personally is having a global config "constant" which can be reloaded behind a mutex. unsafe + Box::new() + mem::transmute() on a static mutable pointer. I believe I copied this from Servo based on a suggestion on IRC.

IIRC this was pre-1.0 Rust, so there's probably a better way to do it now.


I've recently played with using nil pointer references as a form of default overloading. The process in the function block was something like:

1) assign default value to variable

2) check if pointer parameter is nil

3) if pointer is not nil assign pointer value to variable

4) do work with variable

Am I overlooking a simpler way to do this? I feel like lib.LibraryFunction(nil, nil) is nicer than lib.LibraryFunction(lib.LibraryDefaultOne, lib.LibraryDefaultTwo), though admittedly the explicitness of the second option is appealing.


Could you just create well-named functions which call LibraryFunction() internally rather than relying on the caller to specify 1 or more default arguments?


Yeah. It's funny I'm currently working on two Go libraries one uses the convention you listed [1] and the other [2] uses nil pointers.

[1]: https://github.com/b3ntly/go-sms/blob/master/client.go

[2]: https://github.com/b3ntly/mLock/blob/master/lock.go

Disclaimer: Both libraries are in the pre-alpha stages and are extremely light on tests and/or broken.


Hiding it in the implementation detail won't get rid of it.

While you can do pointer arithmetic there will be null pointers, and while memory is addressable there will be pointer arithmetic.

And while I'm a programmer, I want access to all the functionality the CPU offers.


> While you can do pointer arithmatic there will be null pointers

There is no reason why the concept of a null pointer has to exist. If there were no special null pointers it would be perfectly okay for the operating system to actually allocate memory at 0x00000000. With unrestricted pointer arithmetic you can of course make invalid pointers, but a reasonable restriction is to only allow `pointer+integer -> pointer` and `pointer-pointer -> integer`. You can't make a null pointer with just those.


It was meant to be there, it's a feature. Going for option/maybe types would have been too much, according to its designers.


I hear the "too complicated" argument a lot from defenders of null, but it doesn't quite make sense to me.

Good code is generally agreed upon to always check whether function inputs or the results of function calls are null (if they are nullable). Why not make it a compile-time error if you don't check rather than hoping that the fallible programmer is vigilant about checking and delaying potential crashes until runtime?

Go is extremely pedantic about a number of things like unused library imports, trailing commas, etc. which have absolutely no bearing on the actual compiled code, but it intentionally leaves things like this up to programmers who have shown that they can't be trusted to deal with it properly.

Having to manually deal with null is much more complicated than having an Option/Optional type in my opinion. We've also seen that it's far less safe.
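Java 8's Optional sketches the same idea, though unlike Rust's Option it can't stop callers from passing raw nulls (the names below are hypothetical):

```java
import java.util.Optional;

public class MaybeUser {
    // Hypothetical lookup: returns an empty Optional instead of null
    static Optional<String> findName(int id) {
        return id == 1 ? Optional.of("alice") : Optional.empty();
    }

    public static void main(String[] args) {
        // The absent case must be handled explicitly at the call site,
        // instead of surfacing later as a NullPointerException
        String name = findName(2).orElse("<unknown>");
        System.out.println(name); // prints "<unknown>"
    }
}
```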


> Go is extremely pedantic about a number of things like unused library imports, trailing commas, etc. which have absolutely no bearing on the actual compiled code

Trailing commas, agreed. But unused library imports have the (probably unintended) side effect that their `func init()` will execute. Which is also why there is an idiomatic way to import a module without giving it an identifier, just to have this side effect.


Good point. And there can actually be more than one init() function per package!


Yes, but I guess that they're just concatenated at compile-time.


I have to disagree with Hoare sensei there. The number is nowhere near as low as a billion. I personally know of one case that by itself totaled a billion.


Would love to learn more about this single billion dollar mistake


SQL summation without coalescing null addends to zero. The sum result was null. This had been going on for years. Oops!

The lesson/mitigation was to add a NOT NULL attribute on top of a DEFAULT 0.


Hoare was talking about null references specifically. Or, to be even more precise, making references nullable by default, and allowing all operations on them in the type system, with U.B. or runtime errors if they actually happen.

NULL in SQL is a very different and largely unrelated concept. Though probably a mistake just as bad - not the concept itself even, but the name. Why did they ever think that "NULL" was a smart name to give to an unknown value? For developers, by the time SQL was a thing, "null" was already a fairly established way to describe null pointers/references. For non-developers, whom SQL presumably targeted, "null" just means zero, which is emphatically not the same thing as "unknown". They really should have called it "UNKNOWN" - then its semantics would make sense, and people wouldn't use it to denote a value that is missing (but known).


Sounds like this could have also been mitigated by either the SQL server warning about SUM() operating over a nullable column or a "where foo is not null" clause. Your solution is best, though.


Whatever you may think of `null` (ahem https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retra...), it is not a string type. It is horrifying to me that all of the problems described in the article are cases of allowing a user to execute arbitrary code by string input, and the article fails to point this out.


It is also security policies put together by overly cautious people. They say "I can't trust our developers to write properly secure code, and it won't hurt anyway so I'll just institute a blanket policy to block everything that might appear in an SQL injection attack. Let's see... we'll block quotes, ";", the word "null"... maybe a few more things."


This is also horrifying, as it indicates that they're taking wildly destructive precautions but still putting user input into SQL queries.


I don't think the suggestion was that the policy is created because the data is going into sql queries. The policies are often in place because the people in charge of the policy flat out don't trust the developers. Say the policy comes out of the CTO's office, they have two options: 1) liberal password policy, risk a crappy developer screwing up, or 2) conservative policy, not have to worry at night.


Slight change of topic here. Your comment made me realize that once a system gets too complicated the only change that can be made is to add more complexity.

In this case, it would be better to ensure all SQL is properly escaped, but because that isn't a trivial task, you instead end up adding another layer of complexity.


Right, and in larger organizations, the "they" are two completely different groups of people. My current situation: developers in City A are expected to use secure coding standards. But the IT group in City B, halfway across the country, installs an intruder-alert proxy which blocks URLs which might contain HTML or SQL. The CTO is probably too busy playing golf with Oracle salesfolks to care; everyone is just looking after their own end of the business.


I think you're misreading what the parent said. It sounded like even with prepared queries (like PDO), some people don't trust the developers to Do The Right Thing(tm).


If they're using prepared statements, they shouldn't be doing anything about SQL injection attacks, as either the prepared statements work or they don't.


Exactly -- "if they are". Are they? How can you, as a manager, know? How can you make 100% sure that every developer to ever work on the code always does the right thing?

Then consider the downside and cost of filtering the input, vs. the downside and cost of injection attack times a probability of 0.01% of a developer messing up -- and a policy, however idiotic it may seem to the individual developer, may make sense on a business level.


A manager can just as easily create a policy about prepared statements as create a policy about mangling text to "protect" user input that is being put into SQL queries. And neither is easier to monitor. Adding the second as an additional policy because of uncertainty about compliance with the first is silly -- it buys you nothing where the first is complied with and gets you another monitoring problem. Better to just expend the same additional effort you'd spend monitoring the second policy on improving monitoring of the first policy.


It seems like you are ready to declare a "second line of defense" a silly thing in principle?

Sometimes adding a second line of defense at 90% efficiency is cheaper than increasing the first line from 90% to 99%.


I was responding to the premise as given, not rejecting second lines of defense in principle. Shrug.


You can use 100% prepared statements, and still be vulnerable to sql injection. Imagine SQL dynamically building SQL. It's real and I've seen it.


We had to integrate a webshop with the customer's existing SAP installation. The consultants in charge of SAP gave us a SOAP webservice that, among other things, had a command called ExecuteSQL - we could pass a raw SQL string and the SAP server would execute it. They "protected" it by blacklisting common naughty words like DROP, ALTER, etc., as well as semicolons and CAST (I don't really know why). When we needed to actually include a semicolon in the query as part of a string literal, I had to work around their filter by replacing the string literal with CONVERT(VARCHAR(MAX), 0x.......) .


Seems far more likely to be type coercion, or over enthusiastic validation, rather than arbitrary code execution.


Type coercion does not necessarily imply the contents were executed as code.


I'm not aware of any language that coerces the string `'null'` to the value `null`. Are you aware of any?


ColdFusion does. There was a famous stackoverflow thread on that "We have an employee whose last name is Null. Our employee lookup application is killed when that last name is used as the search term (which happens to be quite often now)"

http://stackoverflow.com/questions/4456438/how-to-pass-null-...


A badly coded website could go in the other direction, eg in JS, someone writes a poor string equality check:

    function normalise(s) {
        return (s + "").toLowerCase();
    }

    function isEqual(a, b) {
        return normalise(a) == normalise(b);
    }
And then someone else uses it in some form validation logic:

    if (isEqual(surname, null)) {
        return error;
    }


Then as the next step, some programmer somewhere notices that all the NULLs in the database have been converted into strings somewhere, and puts UPDATE table SET column = NULL WHERE column = 'NULL' somewhere, maybe even in a trigger, and that's that.

The problem with data destruction is that only one thing in the entire data pipeline has to spuriously map two distinct values to the same output value, and the data is destroyed and can not be recovered. Very few programmers think carefully about whether a serialization or data transfer format maps every single possible input to a different output. It may only take one oversight to ruin your year.
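A minimal sketch of such a lossy mapping (hypothetical serializer):

```java
public class LossyRoundTrip {
    // A naive serializer: every value goes through String.valueOf
    static String serialize(String value) {
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        // Two distinct inputs...
        String a = null;
        String b = "null";

        // ...map to the same output, so no reader downstream can
        // recover which one was originally stored
        System.out.println(serialize(a).equals(serialize(b))); // prints "true"
    }
}
```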


It tends not to be programming languages but textual data formats (including many standard implementations of comma-separated files), and even databases that by nature can only hold string values but also want a way to represent null (and fail to fix this by escaping the hell out of everything with quotation marks). And yet it isn't that "code" is executing.


Rails at least used to do this when reading back from a database (eg if you had a nullable string field it would come back as nil if it happened to be the literal "nil")

Edit: IIRC there was a directive that would prevent the behavior though?


Java and JavaScript do the opposite: null values are represented by the string "null" ( http://stackoverflow.com/questions/24591212/why-the-coercion... ) - it's conceivable many programs processing strings would assume "null" in a string comes from a null value originally.


Your inaccurate description gave me a small heart attack. We all know JavaScript is horrible, but that is next-level horrible.

null is coerced to "null" when it's needed as a string. This behaviour is similar to printf in C. It's not the case that "null" === null like True == 1 in Python. (This would be my interpretation of "x is represented by y".)
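
To be concrete, the coercion only runs in one direction; the string never compares equal to the value:

```javascript
// null stringifies to "null" in string contexts...
null + "" === "null";     // true
String(null) === "null";  // true

// ...but the string "null" never compares equal to the value null:
"null" == null;           // false, even with loose equality
"null" === null;          // false
```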


If you want 'next-level horrible', open your browser's console and take a look at what you get back from "typeof null".


JavaScriptCore:

    >>> typeof null
    object
    >>> typeof "null"
    string
    >>> typeof (null + "")
    string


Something always bugs me about this criticism:

#1 arguably makes sense. Unless "null" were a "type" (which it isn't), the other possible options for this ("undefined", or "string", "number", "array", etc.) all make less sense.

#2 is expected/correct.

#3 makes sense if you accept that the "+" operator aggressively casts its arguments to strings when they are anything other than two numbers. Arguably that's a bad design, but it's not unexpected or unpredictable.

The one completely unexplainable "+" behavior I know of in JS (i.e., a recent node/v8) is the second of these:

  > [] + {}
  '[object Object]' // makes sense because the string representation of [] is ''
  > {} + []
  0 // ???
  > var x = {} + []
  undefined
  > x
  '[object Object]' // but assigning to a variable first makes sense again


> #1 arguably makes sense. Unless "null" were a "type" (which it isn't), the other possible options for this ("undefined", or "string", "number", "array", etc.) all make less sense.

`typeof` gets the primitive type of a value, and null, exactly like undefined, is a primitive value with its own primitive type. Just as `typeof undefined` returns `"undefined"`, `typeof null` was supposed to return `"null"`. It doesn't, due to a bug in the original implementation which was then enshrined in the spec because fixing it would break too much stuff: http://www.2ality.com/2013/10/typeof-null.html


This is because:

> [] + {}

Becomes:

> [].toString() + {}.toString()

Empty array to a string is "" (empty string). Object to string is "[object Object]".

> {} + []

Becomes:

> +[]

The prefix turns the array into a number, same as Number([]). This seems to be because {} is interpreted as a code block, not an object, due to it being the first thing.

Put it in parentheses and it'll do toString like before.
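
The statement-vs-expression split is easy to verify: in statement position `{}` parses as an empty block, and parentheses (or an assignment) force it back into expression position:

```javascript
// Statement position: {} is an empty block, leaving +[] which is 0
eval("{} + []");      // 0

// Parenthesized, {} is an object literal and + concatenates strings
eval("({}) + []");    // "[object Object]"

// Assignment also forces expression position
var x = {} + [];      // "[object Object]"
```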


And now the obligatory WAT video: https://www.destroyallsoftware.com/talks/wat


The first of which is just my point. Null cannot be treated safely as an object in any way. To attempt such elicits an exception. Yet its type is 'object' nonetheless, needlessly complicating every type check that has to do with objects. I've been working with Javascript, professionally and as a hobbyist, since about 1995, and have yet to encounter any satisfactory reason why this should be the case.

Edit: And a comment nearby explains that it is the way it is for the same reason Make barfs on leading spaces. Well, at least that's a reason! A terrible, terrible reason.


No, it might be a conversion from a database to CSV that's doing it or something similar.


"Types", sounds like someone doesn't understand dynamic languages, which are some of the most popular languages today.


To be more clear than my previous response, I am well aware that dynamically typed languages do all sorts of nonsense, but nearly none of them consider this `null` to equal `"null"`, certainly not the widely popular ones.

Side note: please refrain from "sounds like someone doesn't understand". It is extremely condescending, dissuades discussion by most people whether they actually know the topic or not, and can be especially deflating to people who are less educated but currently learning.

If there is something that you feel I misunderstood about the way popular languages treat the string `"null"`, feel free to explain it (there are many other response comments I've gotten that are good examples). But if the purpose of your comment is to make me or anyone else feel ignorant, please just keep it to yourself.


I strongly believe parent was being sarcastic.


Ugh, on re-reading, very probably. I'm super sensitive to this stuff with all the "inclusion is a joke" nonsense going around today, and probably overreacted to a jest. Apologies if so.


Sadly, all of my professional work to date has been in dynamic languages. They still have types, and `null` is still not `"null"`.

Edit: whoops, forgot the one time I wrote a one-off Mac app in Swift.


I was kinda sorta being sarcastic, though that kind of thing is a big problem in dynamic languages. Javascript has the concept of "truthy" values of ... whatever rather than a simple boolean type. Realistically this particular problem is not due to dynamic languages per se. Most likely it's due to some stage of a process where data was serialized/deserialized without sufficient care. In general it's more of an "in band signaling" problem.


Two weeks ago, when I applied for a new passport (in Germany), I was asked to make sure that my personal information on a printed out form was correct.

Immediately the line saying "Artist's/Religious name: null" caught my eye. I told the officer: "Well, the info about my artist's name is wrong. I don't have one."

As I was expecting beforehand, I was told: "That's exactly what 'null' means here." To which I replied: "And what if my artist's name was 'null'?". To which I got no satisfactory response. I guess these problems are only apparent to programmers...

So, if anyone wants to cause confusion, now you know how.


Reminds me of a tweet I saw a while back where someone was told their four digit pin couldn't be a year. To which they replied "aren't they all years?"



Well, technically, any PIN starting with one or more zeroes is not a year.


Based on my own interactions with the Bürgeramt I think it would be most fun to have a Religious Name of "null null" -- at which point it's win-win, since you can confuse the bureaucrats and/or demand that your full Religious Name be recognized since, well, religion.

(Or, to up the ante, one might consider "Keine Angabe" as a Religious Name. It would stand out amongst the nulls, while also being the actual null they are trying to point to.)


If I remember correctly, "null" is only on the printout but not displayed on screen. As applications are handled completely electronically there is some chance that those two cases are actually distinguished (i.e. only the largely irrelevant printing part of the software has this problem).


What if someone has an "artist's name" and a "religious name", and they're not the same?


You have 2 lines of 30 characters each to work with. Likely they will be separated by a slash but as far as I know details for this are not regulated.

Note that you need to submit proof. For an artist's name it needs to overshadow your common name in at least some parts of life and entry is at the discretion of the officer. Famously, a sex worker was not allowed to add her artist's name into her ID card.


If your artist's name was 'null', then "Artist's/Religious name: null" would be appropriate, would it not?


In this particular case, yes. That's not the issue. In the general case, the issue would now lie in deciding whether null is actually a name or a 'value' indicating the absence of one.


> For those of you unwise in the ways of programming, the problem is that “null” is one of those famously “reserved” text strings in many programming languages.

It's not a text string, but there are some systems that present it as one, and those are the issue. Lazy programming. I remember working for a financial institution and being told to add some validation to a service that handled adding investment plans to someone's account. The third party service we had to use spoke SOAP, and NULL values were represented by, you guessed it, putting the literal string NULL in the XML instead of simply omitting the element itself.

No amount of protest would make the third party service change the way it worked. "It's always worked like that and we haven't had an issue" and the "if we changed it thousands of our customers would be negatively affected! We can't do that!" were the primary excuses.


Pretty much any language that evaluates this as true:

    "null" == null


Python doesn't, and any language that evaluates this as true is crazy. Not even JavaScript does this.


It is, sort of, in both. It happens when programmers get superstitious about what the actual type of a variable is and `str()` it, so they "know" what they're working with. (Alas, now it's the string "None" or "null".)


Not even PHP (it took me about 10 minutes to remember how to PHP before I could test it to be sure).


What's the difference between yourself and a Wordpress plugin?

None, doing things in PHP makes you really insecure.


Based on the replies you've received, I'd say that actually only a very small minority of languages do this.


The only time I can think of that happening is poorly written SQL queries built out of concatenating strings. And if a system is doing that it has far bigger problems than the last name "Null"


Even at that, in SQL 'null' and null are definitely not the same thing.


Ah yeah, you're right, as even in that situation you'd wrap the name in quotes.


Not C, C++, or D.


This is silly. In many languages not even `null == null` is true.



I love the "Falsehoods programmers believe about X" articles. They are basically ready-made failure test cases. Whenever a junior co-worker would say something like "I know! I'll just write our own date and time handling code," I'd point him at [1], let him stew on it for an hour or so, and then, sure enough, when I went back, lo and behold, he'd be convinced to just use a ready-made third party library instead.

1: http://infiniteundo.com/post/25326999628/falsehoods-programm...


Poor time handling has big consequences as well. In Nevada, for example, overtime calculations use a rolling 24-hour period, and as a result many workers are incorrectly paid each year because their employer's system doesn't account for DST changes.


I'm all for good testing, and for encouraging programmers to think more carefully, but most of these items strike me as needlessly pedantic. My goal is to build software that works in practice. If I have the time to build in edge cases that accommodate super-minorities, I'll probably start by fixing my app in IE6 to pick up that extra 2%. My guess is that'll be a bigger uptick in signups than I'll get for somehow handling names that can't be represented by "all Unicode code points."

If you live in the U.S. and your name can't be represented by 7-bit ASCII (forget Unicode!), you've probably had to adopt an alternate name that can, or else you've spent a good chunk of your life banging your head against every private and public company that needs you to fill out a form. When you sign up for my app, you'll probably use that alternate name out of habit.


Your points best apply to situations where the end user is the same as the person named. While this may seem to be the case for most web applications, note the interesting results on, say, Facebook when trying even to search for already-made friends whose '7-bit use names' might not be the ones you expect. (In fact, I recall odd results from Facebook when searching without accents, which generally require special attention to enter on a US keyboard.)

Further, living in the US doesn't mean you've lived there long. Are you ready to cope when the user hasn't settled on, or perhaps doesn't even understand, the need for said use name? Heck, the US is only 4.4% of the world's population -- who's focused on super-minorities now?


We had a customer named Echo. He couldn't make a payment because our credit card processor looked for common Unix shell commands and filtered them out. I think we ended up creating a discount code just for him so that he'd get the item for free to make up for the annoyance (zero amount due meant no call to the card company).


...filtering out... common Unix commands? By the credit card processor?

I would ask 'why', but every time my brain goes there it comes up with terrible, terrible scenarios that I just don't want to think about.


Do you mean you're not supposed to have a system(3) call with a command built by concatenating user-provided strings as part of your validation process?


DON'T WANT TO THINK ABOUT.


Their explanation was that someone could send them a command line and it might get executed (like a SQL injection), so they disallowed those strings early in their processing to prevent it.

Is this likely? I'm sure someone has tried it at some point. And an excess of caution is something that I sort of want in a card processor.


But why only the common Unix commands? I can't think of a level of caution where filtering 'echo' is appropriate but filtering a more obscure command, perhaps 'dc', is not.

An excess of caution would be to put the process in its own chroot'ed jail, with only the bare minimum of tools for needed functionality. And '/bin/echo' is not one of those tools.


Probably because they use a web application firewall that's full of strange rules to try to protect any random crappy application behind it.


The comments on Wired shed some light on what might be happening: legacy systems used the string 'NULL' to represent null, and now the legacy data has intermingled and can't be cleaned up. Gross!

This guy is basically a walking fuzzer on forms and input systems. He could find vulnerabilities without actually doing anything illegal ;)


Lots of Irish people are walking SQL injection testers, and don't even know it.


And people from The Hague.


as in "'s Gravenhage"?



That made my day.


When it comes to email validation: wow, why? The only real validation is whether the recipient gets the email, and you have to send it either way. If you still want to do more, validate the hostname against DNS, as that's quick and, compared to a human typing, not a performance issue (don't forget IDN). Beyond that, you are only really left with size checks and a few control characters.

https://github.com/beached/validate_email is about the most I think one should validate, short of sending the email you have to send anyway.


For those thinking that only a complete idiot would confuse the string "null" with the value null when processing form input, you're pretty much correct. Most languages, even scripting ones, don't rely on the value of the string, but on the value of the reference. And while there are probably many web front-ends that happily work with "null" as a field, that's likely not the issue here.

What will most likely strike are all of the assumptions your front-end team doesn't care about: downstream systems. This typically comes from software that is packaged for explicit purposes. In healthcare we have things called MPIs (master patient index), and as a rule we have data scrubbers, because in the MPI world we HAVE to work with bad, self-entered data. We don't have the luxury of relying on primary keys for everything. Names like "Baby Smith" often come in as the yet-to-be-named baby of the Smith family. I won't go into HealthIT 101 here, but suffice to say, we must do a lot of data scrubbing and/or annotation.

What compounds the issue even more is that most of these software components, be it healthcare or other industries, are typically resold and therefore have customers that keep piling on more and more use-cases. There are solutions to the problems, but many of them involve breaking human-end workflows (how do you tell a customer that you'll now require every patient to remember their patient ID, for example, before you admit them?). Those workflow changes tend to be non-starters and from there, the chain of never-ending bad coding compromises enter the scene.

Now I'm not defending the practice of treating "baby" or "null" or any other string as a special case, but being part of some of these workflow issues for years has led me to believe that not only are coders around the world going to continue these practices, there's not much they can do about them until more and more of our software gets more built-in intelligence. Maybe in the future we can actually detect it better when the name "baby" actually is their first name! (And actually with all of the deep learning going on lately, I'd imagine someone is building something like this now!)

All I'm saying is that it takes a broader understanding of the issue here to fully get why it's not just a front-end coder's error, but rather a deeper, as-of-now hard-to-solve problem. Think about when you start integrating with outside systems that don't follow your organization's understanding of a name. What about when you get into analytics? When you connect to external systems that perform mailings? Do YOU know the assumptions and use-cases that went into the platforms you're about to hop on board with? Probably not all of them.


This is kind of finding that "little Bobby tables" exists in real life :-)

[1] https://xkcd.com/327/


Mr. Null's second cousin.


wired, don't ever change. I'm not running ad blockers but wired blocked me claiming I am. Made me remember nothing in wired is worth reading and I should do something more useful or enjoyable.


Agreed. No ad blockers here, but the web site claims there are.


Ironically, I have an adblocker but Wired doesn't complain about it, and let me view the content just fine.


The other version of this is the Stack Exchange question.

An employee, whose last name is Null, kills our employee lookup: https://news.ycombinator.com/item?id=3900224

We have an employee whose last name is Null. He kills our employee lookup (2012): https://news.ycombinator.com/item?id=6140631


Type coercion and a misguided approach to removing character counts is a pain.

It is embarrassing how little of the code I have seen uses type unions, either explicitly or implicitly. How often crucial checks and balances are removed in example or test code. How often copy and paste is used over abstraction.

This is an embarrassment to all of us, and I am not sure that it is absent from my work.


Wired must listen to NPR http://www.npr.org/2016/04/02/472716929/bluff-the-listener

Edit: or the other way around if you read the dates...


My friend posted a screenshot a couple of days ago of a system rejecting name input values that were considered rude or offensive. His last name? 'Wang'.

He had to lie about his last name to progress the form.


Sounds like another addition for the "Falsehoods Programmers Believe About Names" article [1].

Other problematic names: "O'Neill" (apostrophes), "Gómez" (accents), "Rees-Mogg" (hyphens).

[1] https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...


Isn't Wang kind of a common name in... somewhere? China, I guess? (The founder of Wang computers was of Chinese descent, IIRC...)


Wang is an extremely common Chinese surname, yes. It's a romanisation of a couple of names which also sometimes become Wong and Ong.

https://en.wikipedia.org/wiki/Wang_(surname)


This is the funniest article I have ever read on HN. Christopher Null, you are a wonderful humorist and you may be destined for immortality (at least in some circles).


Isn't a name in a database and/or variable just a string?

How can it become the data type Null without a literal casting?

Is this just affecting scripting languages?


I think you underestimate how much stuff relies on string concatenation or string interpolation to build SQL statements that are also case-insensitive by nature. Yes, people should probably use an ORM and yes they should properly quote input, but so much code exists that doesn't. Even a strongly typed host language doesn't preclude stringly typed SQL.


No, people shouldn't quote input. They should bind input.


Please explain! I assume you mean bind to a type, but how does that help when you build a SQL statement? Surely whatever type you use has a 'toString' (or language equivalent) that is explicitly or implicitly used with string concatenation or string interpolation.


No, you put in placeholders like ? in the query string, and supply replacement parameters out-of-band separately. See for example the bindValue api in the PHP DBAL API: http://docs.doctrine-project.org/projects/doctrine-dbal/en/l...

This way, possibly-user-supplied values are never mixed with the query string, and you don't have to worry about quoting.


Haha, I didn't know that was what that was called. It has been a few years since I did database driven development.


You build your SQL statement using named parameters for the values you're going to pass in, and you use your database's client API to create a command and bind parameter values to it. Then you ask the API to execute the command.

The binding step accepts values in your language's native data types, and handles conversions to the database's data types for you in a safe manner.
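
A toy sketch of the principle (the `prepare` "driver" here is hypothetical; real client APIs do this on your behalf): the SQL text stays constant, and the value travels out-of-band, so a surname like "Null" is only ever data:

```javascript
// The query template never changes, whatever the user typed:
const sql = "SELECT * FROM employees WHERE last_name = ?";

// A toy stand-in for a database client's bind step. Real drivers send
// the template and the parameters to the server separately, with no
// string concatenation anywhere.
function prepare(query, params) {
    return { query: query, params: params };
}

const cmd = prepare(sql, ["Null"]);
// cmd.query is still the untouched template; "Null" is just a value
```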


Gotcha. I didn't know what this was called. Basically it is what I meant, but a good portion of deployed code doesn't use this from my experience. Personally I'd use an ORM, but that's just me.


Scripting languages shouldn't be confused by this either


For those blocked by the ad blocker detector: take the title of the article, paste it into Google, and then view the cached page.


Judging from the comments so far, the best name for a privacy conscious person seems to be Null O'Bash.


I once had to (try to) explain to a colleague why he couldn't have a Windows username corresponding to his preferred initials - "PRN" (he settled for "PN"). I imagine someone with the initials "CON" would have a similar problem.


Any site that registers his last name as a problem is vulnerable to SQL injection, no?

Doesn't make sense to me that he says most big companies have a problem with his last name; they generally do not have this vulnerability.


Big companies probably have more pieces of software that increase the chance of one of them having trouble with null, or they're overly cautious because they know they have that risk.


I feel like this is the real life incarnation of the classic sci-fi trope "does not compute".


I had a math teacher named Mr. Null.


Meh, in what language does ('Null'==Null) evaluate to true?


I'm guessing this mostly happens during serialization to and from strings. One programmer does String.valueOf(x) instead of x.toString() to prevent NullPointerExceptions. This works pretty well until the next guy does x == null || x.equals("null") because "null"s pop up in UI. At this point this is irreversible as nobody can tell what is null and what is "null".


It doesn't have to be at the language level. It's not unreasonable to assume that someone, somewhere in the mists of time decided that the string "NULL" was a reasonable choice for representing missing data in some input data and things kind of snowballed from there.


I've seen this in code I've reviewed before, where people will check for `== "empty"` instead of just using types like `null` which were designed for this (in javascript).


A practice that may come from shell scripting, where `[ $x == "" ]` is an invalid command: the unquoted empty `$x` expands to nothing, leaving `[` with a malformed expression.


That may be true in older shells, but in modern shells [ $x == "" ] is valid, and "" is a single token whose value is the empty string. But I've seen old code that does things like

    if [ x$x == "x" ] ; then ...
apparently because it was necessary at one time.


It depends on what you mean by "older" shells. More modern shells tend to support this properly, but the baseline for most shell scripts is /bin/sh, which is not guaranteed to be modern – it is often a minimal POSIX sh.


Anyone who's done even a trivial amount of programming in the shell knows to do: [ "$x" == "" ]


That does not help, as far as POSIX is concerned.

    $ [ "$x" == "" ]
    sh: 2: [: unexpected operator
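
For what it's worth, the portable spelling uses a single `=` (POSIX `test` defines `=`, not `==`), or `-z` for emptiness:

```shell
x=""
# POSIX-portable string comparison: single =, and quote the expansion
[ "$x" = "" ] && echo "empty"
# or, equivalently, test for zero length
[ -z "$x" ] && echo "empty"
```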


Normally through coercion, and where the idea of a missing field hasn't been considered by a parser.


Until something gets stored as CSV at some point, where there's no straightforward way to store NULL.


An empty field?


No, an empty field represents "" - the empty string. NULL and "" are not the same thing.
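
A sketch of the ambiguity with a hand-rolled writer (not a real CSV library): once null and "" both serialize to an empty field, the reader cannot tell them apart:

```javascript
// Writing: both null and "" become an empty field
function toCsvRow(values) {
    return values.map(function (v) {
        return v === null ? "" : String(v);
    }).join(",");
}

// Reading: every empty field comes back as "" - the null is gone
function fromCsvRow(row) {
    return row.split(",");
}

toCsvRow(["Smith", null]);  // "Smith,"
toCsvRow(["Smith", ""]);    // "Smith," (identical)
fromCsvRow("Smith,");       // ["Smith", ""]
```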


ColdFusion, if I remember correctly.



