Rust is easy to understand as "a language by browser writers for writing browser...

pcwalton · on March 30, 2025

> I have been working on some software in Rust recently that needs bit and byte manipulation, and we have "unsafe" everywhere and hugely complicated spaghetti compared to the equivalent code in C.

I'm curious what makes this so different from my experience. I rarely ever have to write "unsafe", and I'm writing quite low-level engine code that certainly uses bit manipulation. In fact, crates like bitflags and fixedbitset make it so easy that I tend to get dinged in code reviews for using bit flags when structs of booleans would be simpler :)
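
For readers unfamiliar with the trade-off mentioned above, here is a minimal hand-rolled sketch of bit flags in safe Rust. It is not the `bitflags` crate's API; the `Flags` type and the flag names are hypothetical and chosen only for illustration.

```rust
/// Hypothetical flag set stored in one byte (hand-rolled, not the `bitflags` API).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Flags(u8);

impl Flags {
    pub const EMPTY: Flags = Flags(0);
    pub const VISIBLE: Flags = Flags(1 << 0);
    pub const DIRTY: Flags = Flags(1 << 1);
    pub const STATIC: Flags = Flags(1 << 2);

    /// Returns a copy with every bit of `other` set.
    pub fn with(self, other: Flags) -> Flags {
        Flags(self.0 | other.0)
    }

    /// Returns a copy with every bit of `other` cleared.
    pub fn without(self, other: Flags) -> Flags {
        Flags(self.0 & !other.0)
    }

    /// True if all bits of `other` are set in `self`.
    pub fn contains(self, other: Flags) -> bool {
        (self.0 & other.0) == other.0
    }
}
```

A struct of three `bool` fields would express the same state with plainer field access, at the cost of more bytes per value.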

eru · on March 30, 2025

> I'm curious what makes this so different from my experience. I rarely ever have to write "unsafe", and I'm writing quite low-level engine code that certainly uses bit manipulation.

Perhaps your usecase is similar enough to eg JavaScript engines? Because that's a usecase that browser writers would at least have in mind?

Ygg2 · on March 30, 2025

It's not. It's Bevy.

What affects this is separating your unsafe into unsafe abstractions. If you're careful you don't have to write too much unsafe. Granted, it's not easy.

pclmulqdq · on March 30, 2025

A game engine is much more similar to a browser engine than you think. The operations you have discussed, using bits as flags, are also the most basic forms of binary-level manipulation out there. Things like tagged pointers and bit flags fit nicely and neatly into an encapsulated unsafe abstraction (provided you want to add an extra 1000 lines of code for it).

As far as I can tell, any time you want to rely on the exact binary layout of something in memory, you need unsafe. As a corollary, any time you want to bit cast from one type to another, you need unsafe. This means that things like succinct data structures and building network protocols need quite a bit of unsafe everywhere. The former needs you to do things like "store 14 bits of X here and 12 bits there..." The latter needs control of bit and byte layout because you want to carefully eliminate implementation-defined compiler behavior.
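
As a point of reference for the "store 14 bits of X here and 12 bits there" case, the sketch below packs two fields into a `u32` with shifts and masks in safe Rust. The field widths come from the comment; the bit positions, the `pack`/`unpack` names, and the choice of `u32` are assumptions made for illustration, and this says nothing about whether it covers the poster's actual workload.

```rust
const X_BITS: u32 = 14;
const Y_BITS: u32 = 12;
const X_MASK: u32 = (1 << X_BITS) - 1; // 0x3FFF
const Y_MASK: u32 = (1 << Y_BITS) - 1; // 0x0FFF

/// Packs `x` into bits 0..14 and `y` into bits 14..26 of a u32.
/// Returns None if either value does not fit its field.
pub fn pack(x: u32, y: u32) -> Option<u32> {
    if x > X_MASK || y > Y_MASK {
        return None;
    }
    Some(x | (y << X_BITS))
}

/// Extracts the (x, y) fields written by `pack`.
pub fn unpack(word: u32) -> (u32, u32) {
    (word & X_MASK, (word >> X_BITS) & Y_MASK)
}
```

Because the layout is expressed as arithmetic on an integer, it does not depend on how the compiler lays out a struct.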

CryZe · on March 30, 2025

Take a look into bytemuck or zerocopy. I haven't used unsafe when doing byte level manipulation in a long time.

pcwalton · on March 30, 2025

> As far as I can tell, any time you want to rely on the exact binary layout of something in memory, you need unsafe. As a corollary, any time you want to bit cast from one type to another, you need unsafe.

No, that's what bytemuck is for. If bytemuck didn't exist, sure, I'd be using a lot of unsafe.

pclmulqdq · on March 31, 2025

Bytemuck handles the latter, not the former. In the applications I am working on, both matter.

pcwalton · on March 31, 2025

I can't quite parse your statement, but if you mean that bytemuck doesn't let you "rely on the exact binary layout of something in memory", then yes, it does: it lets you cast from a Pod type to another Pod type, which exposes the memory layout of both types.
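
bytemuck is a third-party crate, so it is not shown here. As a hedged illustration of the narrower point that some layout-dependent conversions are available in safe Rust, the sketch below uses only standard-library methods (`f32::to_bits`, `to_le_bytes`, `from_le_bytes`). The 6-byte header format and the function names are hypothetical.

```rust
/// Bit-casts an f32 to its IEEE 754 bit pattern without `unsafe`.
pub fn float_bits(x: f32) -> u32 {
    x.to_bits()
}

/// Hypothetical 6-byte header: a u16 tag followed by a u32 length, both little-endian.
pub fn encode_header(tag: u16, len: u32) -> [u8; 6] {
    let mut out = [0u8; 6];
    out[..2].copy_from_slice(&tag.to_le_bytes());
    out[2..].copy_from_slice(&len.to_le_bytes());
    out
}

/// Inverse of `encode_header`; the byte order is fixed regardless of the host CPU.
pub fn decode_header(bytes: [u8; 6]) -> (u16, u32) {
    let tag = u16::from_le_bytes([bytes[0], bytes[1]]);
    let len = u32::from_le_bytes([bytes[2], bytes[3], bytes[4], bytes[5]]);
    (tag, len)
}
```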

Ygg2 · on March 30, 2025

And a jet engine and IC engine are engines, but putting your car engine into a Boeing and vice versa would be an unwise decision.

I'd argue you're over-abstracting the differences. They have different purposes. Game engines need high performance, while browsers need to enable a wide selection of APIs.

eru · on April 1, 2025

> And a jet engine and IC engine are engines, but putting your car engine into a Boeing and vice versa would be an unwise decision.

Sure, but you'd expect the design software for a car engine to be at least half-way relevant for a jet engine in a pinch.

Compared to eg the software an architect would use to design a bridge.

pclmulqdq · on March 31, 2025

Browsers need very high performance, too.

Ygg2 · on March 31, 2025

To be fair to my analogy, both ICE and Jet engines need high RPMs too, and pull power but not on the same level.

Browser layout and rendering will have some elements similar to those of a game, but not at the same level. A game can usually assume that it's trusted to run that shader; in a browser that's a vector for attack. Security design will impact game engines and browsers differently.

Granted, people are doing their darnedest to make games in the lowest common denominator technology, i.e. Electron.

eru · on April 4, 2025

C was designed to write Unix, but they didn't overfit it for that very specific case only. So it has seen application elsewhere.

Neither was Rust overfitted to exactly only writing browser engines.

motorest · on March 30, 2025

> * Javascript-like syntax and a Javascript-like package manager

I think this is not serious criticism. The "javascript-like package manager" reference actually refers to fixing a major problem with the developer experience plaguing legacy programming languages such as C or C++. Java has those, .NET has those, every single mainstream programming language has those. Except C or C++.

Rust might be riddled with "the emperor has no clothes" aspects, but having a package manager is not it.

pclmulqdq · on March 30, 2025

That's not a negative aspect of Rust, and I'm not picking out a list of things I dislike about Rust. Making a statement that isn't unequivocally positive does not equal criticizing something.

Cargo is, however, a similarity with JS. On the whole a good one. Also, cargo works much more like npm than maven, for example.

pornel · on March 30, 2025

Cargo has been co-created by Yehuda Katz, who worked on Ruby's Bundler before. Cargo has been designed after npm, so it definitely took lessons from it, but it doesn't make sense to just broadly attribute this to Rust being JavaScript-adjacent.

The Rust syntax does not come from JavaScript. It even conflicts with it, using `let` and `const` differently, since the `let` in Rust comes from OCaml, not JS.

Both JS and Rust copy from the same C/C++ roots. Rust's curly brace flavor is more similar to Go and Swift. The original author of Rust liked a lot of languages with different syntaxes, but the C-like syntax has been a pragmatic choice to avoid putting off the target audience of C++ programmers:

http://venge.net/graydon/talks/intro-talk-2.pdf

pjmlp · on March 30, 2025

Most likely because many keep forgetting that bit and byte manipulation in C is a mix of implementation defined and UB, depending on how it is coded.

motorest · on March 30, 2025

What's the problem of using toolchain-specific features instead of behavior defined by the standard? Isn't this the bread and butter of embedded development and the reason why some behavior is purposely left undefined in the standard?

pjmlp · on March 30, 2025

Security exploits are the problem, because what common people who don't read WG14 mailings think the word undefined means, and what everyone involved with creating compilers understands they are allowed to do, are not the same.

motorest · on March 30, 2025

> Security exploits are the problem, because what common people who don't read WG14 mailings think the word undefined means, and what everyone involved with creating compilers understands they are allowed to do, are not the same.

Your comment lacks credibility. Your hypothetical scenario would only be conceivable if a) a team was well versed enough in C to adopt a specific toolchain to leverage implementation-defined behavior that relied on behavior left undefined by the C standard, and b) somehow the same team decided on a whim to replace their toolset with some other random toolset without even being aware of their toolset-specific code. This is far from a realistic scenario, and reads more like mindless complaints about UB coming from a place of ignorance.

pjmlp · on March 30, 2025

CVE database, the ongoing liability laws in cybersecurity across several nations, and companies that also happen to be C and C++ compiler vendors, are my credibility.

It is incredible how for the last 50 years we keep getting ad hominems from folks who think they actually know anything at all about security, and that only clueless junior developers don't know what they are doing.

eru · on March 30, 2025

> [...] and the reason why some behavior is purposely left undefined in the standard?

It's one of the saner reasons why there's UB in C and C++. But there's lots of crazier reasons.

motorest · on March 30, 2025

> It's one of the saner reasons why there's UB in C and C++.

It is the reason why the standard purposely leaves some areas undefined.

C and C++ detractors talk a lot about UB but they always show their understanding of the subject is at best superficial. They parrot UB as if it was this major gotcha, when it is literally behavior that the standardization committees decide not to define, purposely leaving it open so that any implementation can still be conforming even if it decides to implement behavior not defined in the standard. It is that simple, but somehow people parrot UB as if it was this major gotcha. Baffling.

pjmlp · on March 30, 2025

Baffling is the ignorance of the ways of WG14 and WG21, while pretending to be a know-it-all. Maybe update yourself on the relevant papers for C2y and C++26, aimed at replacing references to UB with erroneous behaviour or implementation-defined behaviour.

formerly_proven · on March 30, 2025

This subthread is kinda strange with people claiming IB and UB are the same, considering that C/C++ have had clearly delineated definitions of undefined behavior versus implementation-defined behavior for decades in their term definitions.

grandempire · on March 30, 2025

No, we do understand the difference and the comment still stands. UB is a choice by the standard committee.

Signed integer overflow is undefined because it’s not even clear if you can detect it happens in all implementations. Do you want a conditional after every integer add?

eru · on March 31, 2025

> Signed integer overflow is undefined because it’s not even clear if you can detect it happens in all implementations. Do you want a conditional after every integer add?

Huh? You can just make it implementation defined, and most implementations would declare it to work like twos-complement wrap-around. Just like unsigned integer overflow is already defined in the standard.

Where does the conditional you are talking about come from? Unsigned integer overflow doesn't have any conditionals either.

Compilers like GCC already support wrap-around for signed integers with a command line option. It's spelled "-fwrapv", and I don't think it involves any conditionals.
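
For comparison only, Rust exposes two's-complement wrap-around as explicit methods on its integer types. The sketch below is an analogy to the behaviour described for `-fwrapv`, not a statement about what GCC emits; the function names are hypothetical wrappers around the standard `wrapping_add`.

```rust
/// Signed addition with two's-complement wrap-around.
pub fn add_wrapping_signed(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Unsigned addition with wrap-around modulo 2^32.
pub fn add_wrapping_unsigned(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

/// The same signed addition, performed on the unsigned reinterpretation of the bits.
/// With two's complement, both routes yield the same bit pattern.
pub fn add_via_unsigned(a: i32, b: i32) -> i32 {
    (a as u32).wrapping_add(b as u32) as i32
}
```

None of these functions contains a branch in the source; the wrap-around is simply the defined result of the operation.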

grandempire · on March 31, 2025

> You can just make it implementation defined

No you can't. Let me repeat again. One very possible implementation is to crash on signed integer overflow (`-ftrapv`). But if it cannot easily detect it, then it must either: a) check every integer add. b) use some less reliable detection method (random sampling, etc).

To be well defined means the implementation will reliably do the same thing. A procedure which sometimes does one thing or another, is not well defined.

> Unsigned integer overflow doesn't have any conditionals either.

That's because there is no detection needed. The instruction set for unsigned integers is designed to do that.

> -fwrapv

Here is the documentation:

> This option instructs the compiler to assume that signed arithmetic overflow of addition, subtraction and multiplication wraps around using twos-complement representation.

https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html

What this means is that if your hardware wraps on signed integer overflow, you can tell GCC that you are perfectly happy with that behavior. Then GCC understands and will handle it just fine.

If your hardware does something else, `fwrapv` will not help you.
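
The "detect it or wrap" choice discussed here is also visible in Rust's integer API, where detection is something the programmer asks for explicitly. This is a minimal sketch using the standard `checked_add` and `overflowing_add` methods; the wrapper names are hypothetical. (Plain `+` in Rust panics on overflow in default debug builds and wraps in default release builds, controlled by the `overflow-checks` setting.)

```rust
/// Returns None on overflow; the detection is explicit in the source.
pub fn add_checked(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

/// Returns the wrapped result plus a flag saying whether overflow occurred.
pub fn add_overflowing(a: i32, b: i32) -> (i32, bool) {
    a.overflowing_add(b)
}
```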

dzaima · on March 31, 2025

But an implementation wouldn't choose to specify its behavior for an operation to be something that it knows it cannot do efficiently, if the goal is to be efficient, for extremely obvious reasons.

If your hardware does twos complement, the compiler should choose to define the implementation-defined behavior to be twos complement. If the hardware traps, the implementation-defined behavior would be to trap. And so on.

And, as long as the goal isn't to exploit the overflows for fancy optimizations unrelated to what the hardware does, that'll be perfectly fine and produce code that doesn't do anything extra over what you'd expect, and won't ever summon cthulhu (as long as the behavior the implementation defines isn't to summon cthulhu).

(that's all assuming that there's a single best definable specific behavior for a given op for a given architecture, which isn't actually true; e.g. ARM scalar shifts do `shift %= width`, but SIMD shifts shift the other direction on a negative amount, and give 0 on `abs(shift) >= width`, only looking at the low 8 bits of the shift amount)
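
To make the competing shift behaviours concrete, here is a sketch using Rust's standard shift methods, which let the caller pick one definition by name. It illustrates the "several defensible definitions" point only; it does not model the ARM SIMD semantics described above, and the wrapper names are hypothetical.

```rust
/// Shift amount is reduced modulo the bit width (32 here), like `shift %= width`.
pub fn shl_masked(x: u32, shift: u32) -> u32 {
    x.wrapping_shl(shift)
}

/// Returns None when `shift` is at least the bit width.
pub fn shl_checked(x: u32, shift: u32) -> Option<u32> {
    x.checked_shl(shift)
}

/// A "give 0 on an oversized shift" definition, built on top of the checked form.
pub fn shl_zero_on_oversize(x: u32, shift: u32) -> u32 {
    x.checked_shl(shift).unwrap_or(0)
}
```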

eru · on March 31, 2025

> (that's all assuming that there's a single best definable specific behavior for a given op for a given architecture, which isn't actually true; e.g. ARM scalar shifts do `shift %= width`, but SIMD shifts shift the other direction on a negative amount, and give 0 on `abs(shift) >= width`, only looking at the low 8 bits of the shift amount)

In that case, your implementation defined behaviour could still be that your compiler is allowed to non-deterministically pick between any of the allowed behaviours. But nasal demons would still be verboten.

grandempire · on March 31, 2025

> your implementation defined behaviour could still be that your compiler is allowed to non-deterministically

No. Non-determinism is a key difference between undefined and implementation defined.

Remember back in school - a function is well defined if it has exactly one output for every input.

Let me turn this question around since we keep going in circles. If the standard said the words you just said instead of “undefined” how would that make C better? Is there some bad implication I am missing?

eru · on April 1, 2025

> No. Non-determinism is a key difference between undefined and implementation defined.

Actually, I think that's the difference in the C spec between implementation defined and unspecified. Eg function arguments can be evaluated in any order the compiler feels like, but it's not UB (because the only thing that's allowed to vary is the order, nasal demons are still verboten.)

You are right that it seems like the C standard expects implementation defined behaviour to be deterministic. That's annoying and should probably be fixed.

> If the standard said the words you just said instead of “undefined” how would that make C better? Is there some bad implication I am missing?

A lot better! If you know that eg shifts (or signed integer addition) always produce a value (even if non-deterministically) out of a documented set of valid options, that would make your program much more easy to reason about for the programmer, than UB's nasal demons that fly back in time.

Informally, many people already try to reason about their C programs like this.

See also how eg malloc returns an address, but apart from some basic requirements on alignment etc, the spec doesn't specify what address gets returned: the implementation can pick non-deterministically.

grandempire · on March 31, 2025

Yes, we are in complete agreement.

eru · on March 31, 2025

> If your hardware does something else, `fwrapv` will not help you.

Yes, and both you and your compiler know what your hardware is doing.

If you have hardware that does something else, you could do another implementation defined thing. Eg one's complement or trapping or whatever it is your hardware is doing.

>> Unsigned integer overflow doesn't have any conditionals either.

> That's because there is no detection needed. The instruction set for unsigned integers is designed to do that.

It's designed to do that on some (well, almost all) computers. Just like it's designed to do the same for signed integers. Most computers don't even have a special instruction for signed integer addition, because they are twos-complement machines.

---

Implementation defined could also be:

'Do whatever your processor does with an 'add' instruction; but don't let anything weird travel backwards in time, or make additional assumptions about execution paths not happening.'

grandempire · on March 31, 2025

> Do whatever your processor does with an 'add' instruction; but don't let anything weird travel backwards in time, or make additional assumptions about execution paths not happening.'

That’s exactly what the standard means by undefined. Anything more you are reading into it is an absurd hypothetical. If you find such an implementation - let me know so I can avoid it. Clang and gcc do reasonable things - as you have pointed out there are helpful flags to configure it too.

eru · on March 31, 2025

> Clang and gcc do reasonable things - as you have pointed out there are helpful flags to configure it too.

Without these specific flags, they don't do 'reasonable things'. They happily assume that UB can't happen and use that information to declare certain code paths dead and optimise them away.

The simplest example is something like this:

    signed char x = 20;
    while(x < x+1) {
        do_something();
        x++;
    }

GCC and Clang when told to optimise will happily assume that you have an infinite loop here. (Unless explicitly told otherwise via the compiler flags we mentioned.)

grandempire · on March 31, 2025

That’s a completely reasonable thing to happen - no computer melting here. And as you said you can configure your desired behavior.

Did you know you can accidentally write infinite loops in rust?

dzaima · on March 31, 2025

The parent post has a mistake - it should use an "int" type instead of "signed char", otherwise the implicit promotion done by a+b breaks the desired showcase.

That fixed, it wouldn't be an infinite loop if taking the "x+1" to do what the hardware would do, i.e. twos complement; "x < x+1" would become false for x==INT_MAX; and yet clang considers it UB, and thus the loop is UB once it hits the wraparound point, and that can result in unintentional computer melting: https://godbolt.org/z/833KzYY5G

Of course you still generally need to have whatever harmful code you don't want ran here to still be present in the binary somewhere, and "melt_computer();" is a rather weird thing to have, but a "perform_heavy_work();" or "tolerate_high_temps_for_a_bit();" or "void debug_override_temperature_readings(){...}" are more realistic. (of course you might need special permissions to do those, but, uh, it's possible to get said permissions, especially if the point of the software is to operate those things.. and there are plenty of harmful things one can do without special perms)

(edit: actually, even without that char↔int fix, the parent post still shows UB because infinite loops are UB in C (as long as the condition isn't a compile-time constant) (also the x++ is UB on reaching the char limit); copy-pasting it directly into harmless_loop still results in computer melting)
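
The "what the hardware would do" reading of `x < x+1` can be sketched in Rust, where wrap-around has to be requested explicitly. This is an analogy, not a reproduction of the C program: it uses `i8` because Rust has no implicit integer promotion, and `count_iterations` is a hypothetical name.

```rust
/// Counts iterations of `while x < x + 1` for an i8 when the addition wraps around.
pub fn count_iterations(start: i8) -> u32 {
    let mut x = start;
    let mut n = 0;
    // At x == i8::MAX the wrapped sum is i8::MIN, so the comparison is false
    // and the loop ends instead of running forever.
    while x < x.wrapping_add(1) {
        n += 1;
        x = x.wrapping_add(1);
    }
    n
}
```

Starting from 20, the loop body runs for x = 20 through 126, which is 107 iterations, and then stops at 127.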

eru · on April 1, 2025

No, infinite loops aren't generally UB in C. Only when they essentially "don't do anything".

Assume that `do_something();` does some IO, and C is perfectly happy with that loop.

You could have also added extra conditions to make the loop look less infinite to the C compiler. Eg do a check in the loop body, and optionally break from it.

> Of course you still generally need to have whatever harmful code you don't want ran here to still be present in the binary somewhere, [...]

Well, assume that we are eg looking at the 'sudo' program. In any case, almost any code can become harmful, if memory gets corrupted, and if attackers control the input.

dzaima · on April 1, 2025

Oops, yeah, need to remove the `do_something();` for the loop to be UB.

> Well, assume that we are eg looking at the 'sudo' program. In any case, almost any code can become harmful, if memory gets corrupted, and if attackers control the input.

Not really; OOB stores, sure, but, other than that, even something basic like username comparison or something resulting in OOB loads on too-long names could technically be completely safe if OOB loads were defined as "either returns an arbitrary value, or crashes", just resulting in superfluous rejections.

And, in a safe language, no matter how wrongly you'd implement flag/argument parsing (besides some equivalent of just straight up passing the args to system() or equivalent), as long as the final thing actually processing the request & comparing passwords doesn't assume anything specific of the parsed internal data format, it could cause no actual exploitable issues.
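
A rough sketch of the "superfluous rejection" outcome in safe Rust: an out-of-range load yields `None` rather than arbitrary memory, so a sloppy comparison rejects an over-long name instead of reading past the buffer. The `load` and `names_match` helpers are hypothetical and deliberately naive.

```rust
/// Bounds-checked load: None instead of reading out of bounds.
pub fn load(buf: &[u8], index: usize) -> Option<u8> {
    buf.get(index).copied()
}

/// Naive comparison that indexes `expected` by positions of `candidate`,
/// which may run past the end of `expected` when the candidate is too long.
pub fn names_match(expected: &[u8], candidate: &[u8]) -> bool {
    candidate.len() >= expected.len()
        && (0..candidate.len()).all(|i| load(expected, i) == Some(candidate[i]))
}
```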

grandempire · on March 31, 2025

> and that can result in unintentional computer melting:

If your program already has the capability to melt your computer, then you can accidentally trigger that with a bug. That’s a massive if, and that’s a risk of any bug in such a program.

Once again, you can already configure the behavior to avoid this (ftrapv, or fwrapv) which is as much as rust will do.

dzaima · on March 31, 2025

But UB extends that to "a bug anywhere can potentially cause anything to go wrong", whereas traditional logic bugs require the bug to be functionally related to the bad-if-misused code (for which you can do things like carefully vet the potentially-harmful code and be sure your program will never do anything harmful regardless of how many bugs all other code in the program has).

See all the exploits that go from a single OOB write (or other sources of UB) to arbitrary code execution; it's really not that hard. Whereas such spooky-bugs-at-a-distance is just plain impossible in safe Rust or Java.

grandempire · on March 31, 2025

> Whereas such spooky-bugs-at-a-distance is just plain impossible in safe Rust or Java.

There are all kinds of bugs in these compilers all the time. Not to mention what Java might do with a multithreading bug. I don’t think you can assert that.

And once again - you can match those languages intended treatment of signed integers by using a compiler flag.

I think clang should insert a ret at the end of the function. Gcc chooses to infinite loop instead which is far better. This is unfortunate for clang.

dzaima · on March 31, 2025

> There are all kinds of bugs in these compilers all the time. I don’t think you can assert that.

That's a separate discussion; generally you need specific conditions in the source code to have compiler bugs affect your code (and for any given compiler bug it's much more likely for it to visibly break your program (i.e. be a clear release blocker) than quietly break on some specific user input; assuming you have appropriate testing).

I don't think I've ever even heard of any exploitable cases of spooky-bug-at-a-distance in Java, whereas such are commonplace for C projects.

Bad multi-threading in Java still won't produce spooky bugs at a distance - worst you may get is a long torn on a 32-bit boundary on 32-bit systems, or reordered reads/writes.

> I think clang should insert a ret at the end of the function.

......But UB means it's not required to do that... ...that's, like, the main thing the discussion is about. This isn't any form of issue or bug in clang to be fixed, it's intentionally chosen that that behavior is fine on UB.

(to be clear, despite all this, I think UB is a fair enough thing, and am perfectly fine with compilers "exploiting" it for perf (I've even had a case of compiler optimizations on integer overflow fixing a bug! found out when I realized I didn't run certain tests on ubsan/debug builds), and, indeed, outside of OOB stores, is generally not too exploitable; but it's still very far from not being potentially very problematic (and the "potential"ness is generally independent from specific code), especially for projects where safety matters)

formerly_proven · on March 31, 2025

Yeah you definitely wanna review section 3 of the C standard again because it literally doesn't say that lol

grandempire · on March 31, 2025

Please read the context. let me quote myself

> A conforming implementation may choose to melt your computer - and you can choose to not use that implementation. But clang and gcc will never melt your Linux.

And for the 3rd time, a real world case that might ruin your computer is an embedded device without memory protection.

jenadine · on March 31, 2025

Your conception of UB is flawed.

This has nothing to do with memory protection. If a process has the ability to do something, then it may do so. It wouldn't be because GCC and clang choose to melt your computer, it is because your program has a bug, and the consequences of that bug can be anything, including jumping into random executable code that happens to ruin the computer.

grandempire · on March 31, 2025

What are you responding to here?

> If a process has the ability to do something, then it may do so

You agree with me. Once again, on an embedded device writing over a buffer may do absolutely anything. Thanks to memory protection, what a process can do is more limited. The C spec is written for both cases.

No, gcc and clang will not insert code to start rm’ing files when you overflow a signed integer.

jenadine · on April 1, 2025

> gcc and clang will not insert code to start rm’ing files when you overflow a signed integer.

It could. If it detects that a code path cannot be taken without causing overflow, it will assume this code path cannot be taken and will optimise it by removing it. No need even to return from the function. If you reach this code anyway, it can run whatever function is in the binary after it. If you're unlucky, that's a function to remove temporary files, which would be run with bogus arguments.

eru · on April 1, 2025

If you are unlucky, it's going to execute whatever user data a malicious attacker added by carefully crafted input.

(Yes, these days that's harder, but you can still do a more complicated version of the same attacks. See return oriented programming.)

grandempire · on March 30, 2025

You are upset we aren’t familiar with papers and proposals which are not yet agreed on as standard? And for unreleased C++ versions?

jenadine · on March 30, 2025

What's upsetting is comments that confidently state inaccurate or clearly wrong statements, thereby spreading misconceptions

motorest · on March 31, 2025

> What's upsetting is comments that confidently state inaccurate or clearly wrong statements, thereby spreading misconceptions

The bulk of the people in this thread clearly have no grasp of what UB is, even though they are very vocal in the way they parrot their misconceptions and outright absurdities. They would never write half the absurd and misguided claims they did if they were even aware the standards explicitly use the term "non-portable" to define UB.

pjmlp · on March 30, 2025

Nah, thankfully liability legislation is finally happening so folks will actually bother to learn standards and how compilers approach them.

Or eventually face consequences.

grandempire · on March 30, 2025

Now we are waiting for new legislation and overhaul of the tech system to make your point?

pjmlp · on March 31, 2025

Nah, it is already here depending on the country one lives in, I don't need to make any point.

Also I stand by my comments even if they hurt the feelings of people who probably never opened a single copy of the ISO standards, not even the table of contents.

grandempire · on March 31, 2025

Ok, I thought you were telling us we won't get UB until we read papers which are proposals for future standards. If you find any corrections in this thread about existing standards, please comment.

wrs · on March 30, 2025

You’re describing implementation-defined behavior, not undefined behavior. IB does something but the committee doesn’t say what. UB does anything and the compiler can assume you never intended to cause it.

grandempire · on March 30, 2025

Exactly. The classic example is invalid pointer deref. It’s too costly to check every deref against every allocation (outside of special debug modes). So the system usually can only detect if there is a virtual memory page fault, in which case it can crash.

In embedded systems without virtual memory there is no validity checking at all. Or it could periodically check (random sampling).

So the standard makes the reasonable choice to leave it undefined. If you can detect it and crash, that’s great. If not and you accidentally overwrite your programs instructions, anything can happen.

Rust only avoids UB in so far as it relies on default clang behavior on modern hardware.

jenadine · on March 30, 2025

> Rust only avoids UB in so far as it relies on default clang behavior on modern hardware.

That's not true.

Rust avoids UB by refusing to compile an invalid reference deref, because it checks lifetimes.

Clang's behaviour is likely a crash but anything can still happen. It is literally undefined.
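
As a small illustration of the lifetime checking mentioned here, the sketch below compiles because every reference is used only while its referent is alive; the commented-out lines show the kind of dangling reference the compiler rejects. The `longer` and `demo` functions are hypothetical examples, not taken from any project in this thread.

```rust
/// Returns the longer of two string slices; the lifetime `'a` ties the result to both inputs.
pub fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

pub fn demo() -> usize {
    let outer = String::from("outer value");
    let len;
    {
        let inner = String::from("in");
        // The borrow of `inner` ends inside this block, so this compiles.
        len = longer(&outer, &inner).len();
    }
    // Rejected at compile time if uncommented
    // (error E0597: `inner` does not live long enough):
    // let dangling: &str;
    // { let inner = String::from("x"); dangling = &inner; }
    // println!("{dangling}");
    len
}
```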

grandempire · on March 30, 2025

> Clang's behaviour is likely a crash but anything can still happen.

You act as if clang itself is random. Clang will do a handful of things, none of which will melt your computer.

A given implementation may choose to do anything. Clang and gcc on major operating systems do reasonable things.

> It is literally undefined.

Ok but I’m explaining why undefined is the right thing.

jenadine · on March 30, 2025

The code generated by clang can do anything.

Ok, clang will not directly melt my computer because clang just generates code, but imagine this:

    if (temperature_too_high)
        lower_temperature();

if somewhere else the code accesses an invalid pointer, clang may decide to remove this condition altogether because it thinks it must be dead code, for example. And running this code will melt the computer even though the programmer thought this wouldn't be possible.

grandempire · on March 30, 2025

That’s not true.

A conforming implementation may choose to melt your computer - and you can choose to not use that implementation. But clang and gcc will never melt your Linux.

Once again, a real life example where that is relevant is an embedded device with no memory protection. If you overwrite a buffer you can overwrite the OS code - breaking your computer and requiring you to flash the memory.

It seems there is a lot of misguided accusations going around about who misunderstands UB.

eru · on March 31, 2025

> But clang and gcc will never melt your Linux.

If you use gcc or clang to compile your kernel, they will happily melt your Linux.

grandempire · on March 31, 2025

If you're referring to what might happen when you overwrite a buffer in privileged kernel code, yes - but that is true regardless of clang, gcc, or what the C standard says about UB.

steveklabnik · on March 30, 2025

Re Rust, that’s simply not true. Rust relies on compile time validity checks that have nothing to do with virtual memory.

grandempire · on March 30, 2025

You confused two parts. I am not saying that Rust has the same pointer deref, I’m saying that if you try to specify rust you will find parts that implicitly rely on clang or hardware default. In other words the behavior is unspecified.

steveklabnik · on March 30, 2025

The borrow checker has a formal proof and it is completely independent from hardware.

grandempire · on March 30, 2025

Gatorade gives you electrolytes.

eru · on March 31, 2025

> Exactly. The classic example is invalid pointer deref. It’s too costly to check every deref against every allocation (outside of special debug modes). So the system usually can only detect if there is a virtual memory page fault, in which case it can crash.

You could still define what happens.

Eg you could define that whatever error might happen because of invalid pointer deref can't travel back in time. (UB famously can travel back in time: UB at any time during the execution makes the whole execution UB.)

dzaima · on March 31, 2025

UB doesn't allow arbitrary time travel; any time travel you see should be explainable by happening to run code that undoes what was done previously, and where it's not possible to explain it as such it's a compiler bug (ref: wg14 member: https://news.ycombinator.com/item?id=40836898).

This is now clearly standardized, but even without that it's pretty trivial to see how compilers would generally abide by this - if you call an external function that does arbitrary things, it may exit(), so the call can't be optimized out even if it's followed by UB. (printf & co are effectively arbitrary calls for these purposes for multiple reasons). And all other behavior is just writing things to memory, which can be undone by happening to write what was there before. (ok there's also volatile, and a bug in clang (not gcc though): https://github.com/llvm/llvm-project/issues/102237)
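To make that concrete, here is a minimal sketch of the shape being described: an opaque call followed by an operation that may be UB. The function name `report_then_divide` is made up for illustration and does not come from the linked discussion.

```c
#include <stdio.h>

/* Hypothetical helper: makes an I/O call, then divides. */
int report_then_divide(int x, int d) {
    /* To the optimizer printf is effectively an arbitrary call: it might
       block or never return, which is the argument for why it has to
       stay in place even on the path where d == 0. */
    printf("dividing %d by %d\n", x, d);
    /* Undefined behavior only if d == 0 (or x == INT_MIN and d == -1). */
    return x / d;
}
```

With a nonzero divisor everything here is defined. The claim above is only about the d == 0 path: the output should still be observable before anything goes wrong.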

eru · on April 2, 2025

Check the ensuing discussion under the comment you linked to. Some limited time travel is still allowed for UB.

dzaima · on April 2, 2025

There's still no time travel in any of those posts; some may require an explanation of "the compiler replaced the UB division with a call to 'bar'" or something, but "calling 'bar'" is still a subset of the "anything" that UB can result in. If there's a specific example you think doesn't follow that, I'd appreciate a specific reference.

(and singron's mention of SIGSEGV handling is entirely irrelevant; the C standard only cares about the abstract machine; in general real memory state can temporarily appear in arbitrarily weird states in practice if interrupted by a signal even without UB, e.g. https://godbolt.org/z/91ModhbEY will have all a[0..15] written even if there's a handled trap on the write to b[0] (or otherwise an interrupt between the two SIMD writes) (using restrict for simplicity, but taking an 'int n' argument and using that for the loop bound will result in ~the same thing without restrict, just with less direct assembly); the same applies even to Rust, though of course with Rust you shouldn't be able to have a situation where you trap on a safe write; could still have a random interrupt happen in the middle though)

immibis · on March 30, 2025

And then it was a decision on the part of compiler vendors to define it to do insane things like time travel. Not the standards committee. Compiler vendors could have just as well defined it to wrap.

jenadine · on March 30, 2025

Some compilers did decide to define overflow as wrapping, such as GCC/clang when passing the -fwrapv flag.

Most projects don't use that flag though, why not?

Note that if you assume -fwrapv, you're not writing in C anymore, you're using a vendor-specific dialect.
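For comparison, wrapping signed addition can be had without -fwrapv by doing the arithmetic in unsigned. A minimal sketch, assuming the platform provides `int32_t`; the helper name `wrapping_add_i32` is made up for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: two's-complement wrapping add without -fwrapv. */
int32_t wrapping_add_i32(int32_t a, int32_t b) {
    /* The sum is reduced modulo 2^32 with no undefined behavior. */
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    /* int32_t is required to be two's complement with no padding bits,
       so copying the bits back yields the wrapped signed value. */
    int32_t result;
    memcpy(&result, &sum, sizeof result);
    return result;
}
```

The cost is verbosity at every call site, which is probably part of the answer to "why not?".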

immibis · on March 30, 2025

Everyone writes in vendor-specific dialects of C. Both POSIX and Win32 define behaviour that isn't defined in C (such as rules for unaligned pointers), while also undefining behaviour that is defined in C (such as what happens if you call fopen when one of the functions in your program is called "open"). Even on embedded platforms, there's no "main" function in freestanding C - your entry point is a vendor-specific extension.

eru · on March 31, 2025

> Everyone writes in vendor-specific dialects of C.

That's not true. Some people try to write fully spec compliant C programs that would behave the same on every compliant C compiler.

immibis · on April 2, 2025

Do they define functions called "open"?

eru · on March 31, 2025

Time travel is a consequence of UB anywhere in an execution making the whole execution UB.

The committee could have said that UB only makes the rest of the execution UB, not what happened before.

eru · on March 31, 2025

> It is the reason why the standard purposely leaves some areas undefined.

No. It is _one_ of the reasons, not _the_ reason.

For example, signed integer overflow. If all you were worried about was the behaviour of the underlying hardware, you would declare this _implementation defined_. (With the vast majority of implementations today picking twos-complement.)

Instead, even compilers that target twos-complement machines leave this deliberately undefined, so they can exploit this corner for performance gains, without actually having to properly prove that leaving out certain checks would be sound under twos-complement.
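The textbook illustration of this is a comparison that can only be false if overflow happens. A sketch, with function names made up for the example:

```c
#include <limits.h>

/* Signed: overflow is undefined, so a compiler may assume x + 1 never
   overflows and fold this to "return 1" without any range proof. */
int next_is_greater_signed(int x) {
    return x + 1 > x;   /* UB when x == INT_MAX */
}

/* Unsigned: wraparound is defined, so the comparison has to be honoured;
   it is false exactly when x == UINT_MAX. */
int next_is_greater_unsigned(unsigned x) {
    return x + 1u > x;
}
```

Under twos-complement wrapping the signed version would have to return 0 for INT_MAX, so the fold would need a proof that x is never INT_MAX. With overflow undefined, no such proof is required.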

pclmulqdq · on March 30, 2025

Exactly this. Most of the art of it is avoiding UB and sticking to implementation-defined behavior.
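As a small example of what avoiding UB looks like in practice, here is a sketch of an overflow-checked add that tests the range before doing the addition. The name `checked_add_int` is made up for illustration; this particular version happens to be fully defined rather than implementation-defined.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores a + b in *out and returns true, or returns
   false (leaving *out untouched) if the sum would not fit in an int. */
bool checked_add_int(int a, int b, int *out) {
    assert(out != NULL);
    if (b > 0 && a > INT_MAX - b) return false;  /* would overflow upward */
    if (b < 0 && a < INT_MIN - b) return false;  /* would overflow downward */
    *out = a + b;                                /* now known to be in range */
    return true;
}
```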

1932812267 · on March 30, 2025

I love your username, btw :)