> You can just make it implementation defined

No you can't. Let me repeat again....

dzaima · on March 31, 2025

But an implementation wouldn't choose to specify its behavior for an operation to be something that it knows it cannot do efficiently, if the goal is to be efficient, for extremely obvious reasons.

If your hardware does twos complement, the compiler should choose to define the implementation-defined behavior to be twos complement. If the hardware traps, the implementation-defined behavior would be to trap. And so on.

And, as long as the goal isn't to exploit the overflows for fancy optimizations unrelated to what the hardware does, that'll be perfectly fine and produce code that doesn't do anything extra over what you'd expect, and won't ever summon cthulhu (as long as the behavior the implementation defines isn't to summon cthulhu).

(that's all assuming that there's a single best definable specific behavior for a given op for a given architecture, which isn't actually true; e.g. ARM scalar shifts do `shift %= width`, but SIMD shifts shift the other direction on a negative amount, and give 0 on `abs(shift) >= width`, only looking at the low 8 bits of the shift amount)
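To make the two shift behaviours described above concrete, here is a rough C sketch. Both helpers (`shl_mod`, `shl_by_signed_byte`) are hypothetical names invented for illustration, and they only model the behaviour as the comment states it for 32-bit lanes; they are not taken from any ARM documentation or compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scalar-style shift, amount reduced modulo the width
   (the `shift %= width` behaviour described above). The reduced amount is
   always in 0..31, so the C shift itself is well defined. */
static uint32_t shl_mod(uint32_t value, unsigned shift) {
    return value << (shift % 32u);
}

/* Hypothetical helper: SIMD-style shift as described above. The amount is a
   signed byte (the "low 8 bits"); a negative amount shifts right instead,
   and a magnitude of 32 or more yields 0. */
static uint32_t shl_by_signed_byte(uint32_t value, int8_t shift) {
    if (shift >= 32 || shift <= -32) {
        return 0;
    }
    if (shift < 0) {
        return value >> (unsigned)(-(int)shift);
    }
    return value << (unsigned)shift;
}

/* A few checks showing how the two models disagree for the same amount. */
static void self_check(void) {
    assert(shl_mod(1u, 33u) == 2u);            /* 33 % 32 == 1 */
    assert(shl_by_signed_byte(1u, 33) == 0u);  /* magnitude >= 32 gives 0 */
    assert(shl_by_signed_byte(8u, -1) == 4u);  /* negative shifts right */
    assert(shl_by_signed_byte(1u, 4) == 16u);
}
```

The same source-level `x << n` would have to pick one of these two results for `n == 33`, which is the "no single best behaviour" problem.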

eru · on March 31, 2025

> (that's all assuming that there's a single best definable specific behavior for a given op for a given architecture, which isn't actually true; e.g. ARM scalar shifts do `shift %= width`, but SIMD shifts shift the other direction on a negative amount, and give 0 on `abs(shift) >= width`, only looking at the low 8 bits of the shift amount)

In that case, your implementation defined behaviour could still be that your compiler is allowed to non-deterministically pick between any of the allowed behaviours. But nasal demons would still be verboten.

grandempire · on March 31, 2025

> your implementation defined behaviour could still be that your compiler is allowed to non-deterministically

No. Non-determinism is a key difference between undefined and implementation defined.

Remember back in school - a function is well defined if it has exactly one output for every input.

Let me turn this question around since we keep going in circles. If the standard said the words you just said instead of “undefined” how would that make C better? Is there some bad implication I am missing?

eru · on April 1, 2025

> No. Non-determinism is a key difference between undefined and implementation defined.

Actually, I think that's the difference in the C spec between implementation defined and unspecified. Eg function arguments can be evaluated in any order the compiler feels like, but it's not UB (because the only thing that's allowed to vary is the order, nasal demons are still verboten.)
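A small C sketch of the argument-order case, assuming nothing beyond the standard library. The names `note`, `sum2` and `run_demo` are made up for illustration. The order in which the two `note` calls run is unspecified, but the calls do not overlap and the result is the same either way.

```c
#include <assert.h>

static int call_log[2];
static int call_count = 0;

/* Records the order in which it was called, then returns its argument. */
static int note(int id) {
    assert(call_count < 2);
    call_log[call_count++] = id;
    return id;
}

static int sum2(int a, int b) {
    return a + b;
}

/* The compiler may evaluate note(1) before note(2) or the other way round. */
static int run_demo(void) {
    call_count = 0;
    return sum2(note(1), note(2));
}

static void self_check(void) {
    int result = run_demo();
    assert(result == 3);     /* same value whichever order was picked */
    assert(call_count == 2); /* both calls happened exactly once */
    /* Only the order may vary: {1, 2} or {2, 1}, nothing else. */
    assert((call_log[0] == 1 && call_log[1] == 2) ||
           (call_log[0] == 2 && call_log[1] == 1));
}
```

The set of allowed outcomes is small and documented, which is the property being contrasted with UB here.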

You are right that it seems like the C standard expects implementation defined behaviour to be deterministic. That's annoying and should probably be fixed.

> If the standard said the words you just said instead of “undefined” how would that make C better? Is there some bad implication I am missing?

A lot better! If you know that eg shifts (or signed integer addition) always produce a value (even if non-deterministically) out of a documented set of valid options, that would make your program much easier for the programmer to reason about than UB's nasal demons that fly back in time.

Informally, many people already try to reason about their C programs like this.

See also how eg malloc returns an address, but apart from some basic requirements on alignment etc, the spec doesn't specify what address gets returned: the implementation can pick non-deterministically.

grandempire · on March 31, 2025

Yes, we are in complete agreement.

eru · on March 31, 2025

> If your hardware does something else, `fwrapv` will not help you.

Yes, and both you and your compiler know what your hardware is doing.

If you have hardware that does something else, you could do another implementation defined thing. Eg one's complement or trapping or whatever it is your hardware is doing.

>> Unsigned integer overflow doesn't have any conditionals either.

> That's because there is no detection needed. The instruction set for unsigned integers is designed to do that.

It's designed to do that on some (well, almost all) computers. Just like it's designed to do the same for signed integers. Most computers don't even have a special instruction for signed integer addition, because they are twos-complement machines.
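One way to see the "same add for signed and unsigned" point from C, as a sketch: add the raw bit patterns with unsigned arithmetic (which is defined to wrap) and compare against the bit pattern of the signed result. `bits_of` and `add_bits` are hypothetical helper names; `int32_t` is required to be two's complement wherever it exists, so this does not rely on signed overflow at all.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret an int32_t as its raw 32-bit pattern. */
static uint32_t bits_of(int32_t v) {
    uint32_t u;
    memcpy(&u, &v, sizeof u);
    return u;
}

/* Hypothetical helper: add two signed values using only unsigned addition
   on their bit patterns. Unsigned arithmetic wraps modulo 2^32. */
static uint32_t add_bits(int32_t a, int32_t b) {
    return bits_of(a) + bits_of(b);
}

static void self_check(void) {
    /* In-range sums: the unsigned add yields the signed result's bits. */
    assert(add_bits(-5, 3) == bits_of(-2));
    assert(add_bits(-1, 1) == bits_of(0));
    assert(add_bits(1000, -1000) == 0u);
    /* At the edge, the same add produces the bit pattern of INT32_MIN. */
    assert(add_bits(INT32_MAX, 1) == bits_of(INT32_MIN));
}
```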

---

Implementation defined could also be:

'Do whatever your processor does with an 'add' instruction; but don't let anything weird travel backwards in time, or make additional assumptions about execution paths not happening.'

grandempire · on March 31, 2025

> Do whatever your processor does with an 'add' instruction; but don't let anything weird travel backwards in time, or make additional assumptions about execution paths not happening.'

That’s exactly what the standard means by undefined. Anything more you are reading into it is an absurd hypothetical. If you find such an implementation - let me know so I can avoid it. Clang and gcc do reasonable things - as you have pointed out there are helpful flags to configure it too.

eru · on March 31, 2025

> Clang and gcc do reasonable things - as you have pointed out there are helpful flags to configure it too.

Without these specific flags, they don't do 'reasonable things'. They happily assume that UB can't happen and use that information to declare certain code paths dead and optimise them away.

The simplest example is something like this:

    signed char x = 20;
    while(x < x+1) {
        do_something();
        x++;
    }

GCC and Clang when told to optimise will happily assume that you have an infinite loop here. (Unless explicitly told otherwise via the compiler flags we mentioned.)

grandempire · on March 31, 2025

That’s a completely reasonable thing to happen - no computer melting here. And as you said you can configure your desired behavior.

Did you know you can accidentally write infinite loops in rust?

dzaima · on March 31, 2025

The parent post has a mistake - it should use an "int" type instead of "signed char", otherwise the implicit promotion done by a+b breaks the desired showcase.

That fixed, it wouldn't be an infinite loop if taking the "x+1" to do what the hardware would do, i.e. twos complement; "x < x+1" would become false for x==INT_MAX; and yet clang considers it UB, and thus the loop is UB once it hits the wraparound point, and that can result in unintentional computer melting: https://godbolt.org/z/833KzYY5G
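For comparison, a sketch of what "do what the hardware would do" looks like when spelled out explicitly. `wrapping_add` and `increases` are hypothetical names. The addition is done in unsigned arithmetic, so there is no signed overflow; the conversion back to `int` is implementation-defined, and this sketch assumes the modulo-2^N reduction that GCC documents (Clang is assumed to behave the same way).

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: two's-complement wrapping add without signed overflow. */
static int wrapping_add(int a, int b) {
    unsigned sum = (unsigned)a + (unsigned)b; /* unsigned add wraps by definition */
    /* Implementation-defined conversion; assumed to reduce modulo 2^N. */
    return (int)sum;
}

/* The loop condition from the example, written with the wrapping add. */
static int increases(int x) {
    return x < wrapping_add(x, 1);
}

static void self_check(void) {
    assert(increases(20));
    assert(wrapping_add(INT_MAX, 1) == INT_MIN); /* assumes modulo conversion */
    assert(!increases(INT_MAX));                 /* so the loop would stop here */
}
```

With the plain `x < x + 1`, an optimising compiler is allowed to fold the condition to true, which is the difference being argued about.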

Of course you still generally need to have whatever harmful code you don't want run here to still be present in the binary somewhere, and "melt_computer();" is a rather weird thing to have, but a "perform_heavy_work();" or "tolerate_high_temps_for_a_bit();" or "void debug_override_temperature_readings(){...}" are more realistic. (of course you might need special permissions to do those, but, uh, it's possible to get said permissions, especially if the point of the software is to operate those things.. and there are plenty of harmful things one can do without special perms)

(edit: actually, even without that char↔int fix, the parent post still shows UB because infinite loops are UB in C (as long as the condition isn't a compile-time constant) (also the x++ is UB on reaching the char limit); copy-pasting it directly into harmless_loop still results in computer melting)

eru · on April 1, 2025

No, infinite loops aren't generally UB in C. Only when they essentially "don't do anything".

Assume that `do_something();` does some IO, and C is perfectly happy with that loop.

You could have also added extra conditions to make the loop look less infinite to the C compiler. Eg do a check in the loop body, and optionally break from it.

> Of course you still generally need to have whatever harmful code you don't want ran here to still be present in the binary somewhere, [...]

Well, assume that we are eg looking at the 'sudo' program. In any case, almost any code can become harmful, if memory gets corrupted, and if attackers control the input.

dzaima · on April 1, 2025

Oops, yeah, need to remove the `do_something();` for the loop to be UB.

> Well, assume that we are eg looking at the 'sudo' program. In any case, almost any code can become harmful, if memory gets corrupted, and if attackers control the input.

Not really; OOB stores, sure, but, other than that, even something basic like username comparison or something resulting in OOB loads on too-long names could technically be completely safe if OOB loads were defined as "either returns an arbitrary value, or crashes", just resulting in superfluous rejections.

And, in a safe language, no matter how wrongly you'd implement flag/argument parsing (besides some equivalent of just straight up passing the args to system() or equivalent), as long as the final thing actually processing the request & comparing passwords doesn't assume anything specific of the parsed internal data format, it could cause no actual exploitable issues.

grandempire · on March 31, 2025

> and that can result in unintentional computer melting:

If your program already has the capability to melt your computer, then you can accidentally trigger that with a bug. That’s a massive if, and that’s a risk of any bug in such a program.

Once again, you can already configure the behavior to avoid this (`-ftrapv` or `-fwrapv`) which is as much as rust will do.
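Besides the whole-program flags, overflow can also be detected per operation. A minimal sketch, assuming GCC or Clang, since `__builtin_add_overflow` is a compiler builtin and not standard C; `checked_add` is a hypothetical wrapper name chosen to resemble the Rust method of the same name.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical wrapper: returns true and stores the sum in *out if it fits
   in an int, returns false on overflow. Relies on the GCC/Clang builtin,
   which itself returns true when the addition overflowed. */
static bool checked_add(int a, int b, int *out) {
    return !__builtin_add_overflow(a, b, out);
}

static void self_check(void) {
    int r = 0;
    assert(checked_add(2, 3, &r) && r == 5);
    assert(!checked_add(INT_MAX, 1, &r));   /* overflow is reported, not UB */
    assert(!checked_add(INT_MIN, -1, &r));
}
```

This keeps the decision about what to do on overflow in the source code, at the cost of writing every such addition through the wrapper.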

dzaima · on March 31, 2025

But UB extends that to "a bug anywhere can potentially cause anything to go wrong", whereas traditional logic bugs require the bug to be functionally related to the bad-if-misused code (for which you can do things like carefully vet the potentially-harmful code and be sure your program will never do anything harmful regardless of how many bugs all other code in the program has).

See all the exploits that go from a single OOB write (or other sources of UB) to arbitrary code execution; it's really not that hard. Whereas such spooky-bugs-at-a-distance is just plain impossible in safe Rust or Java.

grandempire · on March 31, 2025

> Whereas such spooky-bugs-at-a-distance is just plain impossible in safe Rust or Java.

There are all kinds of bugs in these compilers all the time. Not to mention what Java might do with a multithreading bug. I don’t think you can assert that.

And once again - you can match those languages' intended treatment of signed integers by using a compiler flag.

I think clang should insert a ret at the end of the function. Gcc chooses to infinite loop instead which is far better. This is unfortunate for clang.

dzaima · on March 31, 2025

> There are all kinds of bugs in these compilers all the time. I don’t think you can assert that.

That's a separate discussion; generally you need specific conditions in the source code to have compiler bugs affect your code (and for any given compiler bug it's much more likely for it to visibly break your program (i.e. be a clear release blocker) than quietly break on some specific user input; assuming you have appropriate testing).

I don't think I've ever even heard of any exploitable cases of spooky-bug-at-a-distance in Java, whereas such are commonplace for C projects.

Bad multi-threading in Java still won't produce spooky bugs at a distance - worst you may get is a long torn on a 32-bit boundary on 32-bit systems, or reordered reads/writes.

> I think clang should insert a ret at the end of the function.

......But UB means it's not required to do that... ...that's, like, the main thing the discussion is about. This isn't any form of issue or bug in clang to be fixed, it's intentionally chosen that that behavior is fine on UB.

(to be clear, despite all this, I think UB is a fair enough thing, and am perfectly fine with compilers "exploiting" it for perf (I've even had a case of compiler optimizations on integer overflow fixing a bug! found out when I realized I didn't run certain tests on ubsan/debug builds), and, indeed, outside of OOB stores, is generally not too exploitable; but it's still very far from not being potentially very problematic (and the "potential"ness is generally independent from specific code), especially for projects where safety matters)

formerly_proven · on March 31, 2025

Yeah you definitely wanna review section 3 of the C standard again because it literally doesn't say that lol

grandempire · on March 31, 2025

Please read the context. let me quote myself

> A conforming implementation may choose to melt your computer - and you can choose to not use that implementation. But clang and gcc will never melt your Linux.

And for the 3rd time, a real world case that might ruin your computer is an embedded device without memory protection.

jenadine · on March 31, 2025

Your conception of UB is flawed.

This has nothing to do with memory protection. If a process has the ability to do something, then it may do so. It wouldn't be because GCC and clang choose to melt your computer, it is because your program has a bug, and the consequences of that bug can be anything, including jumping into random executable code that happens to ruin the computer.

grandempire · on March 31, 2025

What are you responding to here?

> If a process has the ability to do something, then it may do so

You agree with me. Once again, on an embedded device writing over a buffer may do absolutely anything. Thanks to memory protection, what a process can do is more limited. The C spec is written for both cases.

No, gcc and clang will not insert code to start rm’ing files when you overflow a signed integer.

jenadine · on April 1, 2025

> gcc and clang will not insert code to start rm’ing files when you overflow a signed integer.

It could. If it detects that a code path cannot be taken without causing overflow, it will assume this code path cannot be taken and will optimise it by removing it. No need even to return from the function. If you reach this code anyway, it can run whatever function is in the binary after it. If you're unlucky that's a function to remove temporary files which would be run with bogus arguments.

eru · on April 1, 2025

If you are unlucky, it's going to execute whatever user data a malicious attacker added by carefully crafted input.

(Yes, these days that's harder, but you can still do a more complicated version of the same attacks. See return oriented programming.)