At a very quick look, no evidence is given that the "bugs" found in requests are in fact reachable, i.e. not prevented by construction. And sure enough, the very first one is impossible because of a validating guard[1]: `address_in_network` only gets called after `is_valid_cidr`, which enforces the presence of a slash.
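(For readers without the source open: the shape of the code under discussion is roughly the following. This is a condensed sketch for illustration, using the stdlib `ipaddress` module rather than quoting requests' actual implementation; only the function names match.)

```python
import ipaddress

def is_valid_cidr(string_network):
    # The guard: accept only "a.b.c.d/nn" with an integer mask in 1..32.
    if string_network.count("/") != 1:
        return False
    addr, _, mask = string_network.partition("/")
    try:
        ipaddress.IPv4Address(addr)
        return 1 <= int(mask) <= 32
    except ValueError:
        return False

def address_in_network(ip, net):
    # Precondition: `net` contains a slash. The caller is expected to have
    # checked is_valid_cidr(net) first, so this unpack cannot fail there.
    netaddr, bits = net.split("/")
    network = ipaddress.IPv4Network(f"{netaddr}/{bits}", strict=False)
    return ipaddress.IPv4Address(ip) in network

def bypasses(ip, net):
    # The call-site guard, as in the proxy-bypass logic: the validator
    # always runs before address_in_network ever sees the string.
    return is_valid_cidr(net) and address_in_network(ip, net)
```

Calling `address_in_network` directly with a slash-less string fails, but `bypasses` can never reach it with one.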
I think we should hold claims about effective static analysis and/or program verification to a higher standard than this.
> the very first one is impossible because of a validating guard[1]: `address_in_network` only gets called after `is_valid_cidr`, which enforces the presence of a slash.
It’s correct to flag this code. The check is performed manually outside of the function in question. If you call the function directly, the bug surfaces.
There is no mention in the function documentation of the validation requirement, making it easy to call incorrectly. Also, if it is required to call the validator before calling this function, then the function could just call it itself.
In short, it’s possible to make this code safe by definition, but instead it relies upon the developer to always make the undocumented right choices every single time it is called. I would expect something more rigorous from verified code.
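Concretely, the "safe by definition" version is a small change. This is a hypothetical patch, not what the library actually does; it leans on the stdlib `ipaddress` module for the remaining validation:

```python
import ipaddress

def address_in_network(ip, net):
    """Return True if `ip` falls inside the CIDR block `net`.

    Validates its own input instead of relying on callers to have
    pre-checked it, so direct misuse fails with a clear error.
    """
    if "/" not in net:
        raise ValueError(f"expected CIDR notation with a mask, got {net!r}")
    # ipaddress enforces the remaining invariants (parseable address,
    # mask in range) and raises ValueError with a useful message.
    return ipaddress.ip_address(ip) in ipaddress.ip_network(net, strict=False)
```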
> It’s correct to flag this code. The check is performed manually outside of the function in question. If you call the function directly, the bug surfaces.
No, that’s just called a precondition. I’m not aware of a single program that doesn’t have functions like these, particularly internal APIs.
(It should go without saying, but it’s not even an issue in this case: it’ll cause an IndexError, but so will thousands of other APIs. Python very explicitly doesn’t have exceptions in its type contract; anything can always raise anything.)
> I would expect something more rigorous from verified code.
Nobody said that requests is “formally verified.” The only place where that claim is made is in the AI-generated blog post above.
> I would expect something more rigorous from verified code.
I think you just want the illusion of safety :p
A big advantage of verified code is that it enables you to write sketchy and dangerous-looking code BECAUSE it's proven correct.
In fact, skipping as many safety checks as possible is highly desirable. For performance, yes, but also because it's less code to maintain.
Our tools already do this to some extent, for performance. E.g. compilers that remove your bounds or type checks in the generated code when they can prove they're not needed.
That doesn't mean there's a problem with the code, only with the documentation. So the article is wrong to call it a "real bug". At most it's poor code style that could theoretically lead to a bug in the future.
There's nothing inherently wrong with a function throwing an exception when it receives invalid input. The math.sqrt function isn't buggy because it fails if you pass it a negative argument.
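(The stdlib really does behave this way: the precondition is documented, and violating it raises.)

```python
import math

# A documented precondition: math.sqrt requires a non-negative argument.
# Violating it raises ValueError rather than returning a garbage value.
print(math.sqrt(9))  # 3.0
try:
    math.sqrt(-1)
except ValueError as exc:
    print(exc)  # "math domain error"
```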
“Two popular AES libraries, aes-js and pyaes, “helpfully” provide a default IV in their AES-CTR API, leading to a large number of key/IV reuse bugs. These bugs potentially affect thousands of downstream projects.”
Would you call that “poor code style that could theoretically lead to a bug in the future”, too?
The API in question is almost certainly internal. The only reason it isn’t marked as such is because Python doesn’t have great facilities for that kind of encapsulation.
Invariant-preserving types are always going to be the right way to eliminate certain classes of bugs, but they’re also completely overkill in this context given that the “bug” in question doesn’t even cause unsound program behavior; it just raises an exception, which is completely sound and well-defined.
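For comparison, the invariant-preserving version would look something like the sketch below (hypothetical; nothing in the library works this way). A CIDR value can only come into existence through a parse step, so every downstream use is slash-safe by construction:

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class CIDR:
    """A network block that is valid by construction."""
    network: ipaddress.IPv4Network

    @classmethod
    def parse(cls, s: str) -> "CIDR":
        # All validation happens once, at the boundary. A CIDR instance
        # cannot exist unless this succeeded.
        if "/" not in s:
            raise ValueError(f"missing prefix length in {s!r}")
        return cls(ipaddress.IPv4Network(s, strict=False))

    def contains(self, ip: str) -> bool:
        # No re-validation needed anywhere downstream.
        return ipaddress.ip_address(ip) in self.network
```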
Most (all?) static analyzers are conservative, and reducing your false positive rate is always a struggle. You should never expect a false positive rate of zero (for any interesting property that's essentially impossible), but you shouldn't present your false positives as successes either.
Sure, but this one doesn’t pass the sniff test. I’ve written plenty of static analysis tools (including ones that do symbolic execution), and one of the first things you do to ensure that your results are valid is create some model of tainting/reachability. Even an analysis that’s 1-callsite sensitive would have caught this and discarded it as a false positive.
(In case it isn’t clear, I’m saying this is slop that someone whipped up and didn’t even bother to spot check.)
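To make the bar concrete: even a crude syntactic version of that reachability idea (a toy on Python's `ast` module, nothing like a production analyzer) is enough to discard this particular finding, by suppressing reports for calls that sit under an `is_valid_cidr` guard:

```python
import ast

# Toy 1-callsite-sensitive check: flag calls to address_in_network(...)
# only when they are NOT nested under an `if is_valid_cidr(...)` guard.
def unguarded_calls(src):
    tree = ast.parse(src)
    guarded = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.If) and "is_valid_cidr" in ast.dump(node.test):
            for inner in ast.walk(node):
                if (isinstance(inner, ast.Call)
                        and getattr(inner.func, "id", None) == "address_in_network"):
                    guarded.add(inner)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and getattr(node.func, "id", None) == "address_in_network"
        and node not in guarded
    ]

snippet = """\
if is_valid_cidr(net):
    address_in_network(ip, net)   # guarded: should not be reported
address_in_network(ip, raw)       # unguarded: a legitimate finding
"""
print(unguarded_calls(snippet))  # [3]
```

A real analyzer would track dataflow rather than match guard names syntactically, but even this filters the exact pattern in requests.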
What if you asked your favorite AI agent to produce mathematics at the level of Vladimir Voevodsky, Fields Medal-winning, foundation-shaking work but directed toward something the legendary Nikolaj Bjørner (co-creator of Z3) could actually use?
Well, you'd get this embarrassing mess, apparently.
I miss the days when humans submitted things they had done to this site, instead of long slop articles generated in 5 minutes (‘LLM-based code synthesis—while mind-numbingly effective—’) about slop code they generated in 5 minutes (or, worse, in hours) with foolish prompts like ‘Produce mathematics at the level of Vladimir Voevodsky, Fields Medal-winning, foundation-shaking work’.
Should we even read this, or should we get an LLM to summarise it into a few bullet points again?
This bit was interesting in illuminating the human authors’ credulity (assuming they believe in their own article):
‘The central move was elegant:
stop asking only “is the system safe?”, start asking “how far is it from safety?”’
This ersatz profundity couched in a false opposition is common in generated text: does it have anything at all to do with the generated code, or is it all just convincing bullshit?
[1]: https://github.com/psf/requests/blob/4bd79e397304d46dfccd76f...