The `realloc` problem isn't the only one - it also breaks many formerly-well-def...

usefulcat · on June 27, 2024

> it also breaks many formerly-well-defined programs that use `malloc_usable_size`

Does it? According to the documentation for malloc_usable_size, it can be used to find out the actual size of an allocation, but you still have to call realloc before writing to bytes beyond the size passed to malloc.

"This function is intended to only be used for diagnostics and statistics; writing to the excess memory without first calling realloc(3) to resize the allocation is not supported."

o11c · on June 27, 2024

Yes, that's what the documentation has been retroactively changed to say. It used to say that the memory was usable immediately.

Also, `realloc` is not guaranteed to actually operate in-place.
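For concreteness, the pattern the current man page asks for could be sketched roughly as below. This is only an illustration: `claim_slack` is a made-up helper name, and `malloc_usable_size` is a glibc extension declared in `<malloc.h>`, not ISO C. Note that the pointer has to be reassigned, because `realloc` may move the block.

```c
#include <assert.h>
#include <malloc.h>   /* malloc_usable_size: glibc extension, not ISO C */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: resize *buf to its usable size before the caller
   touches the excess bytes. `requested` is the size originally passed to
   malloc. Returns the new capacity, or 0 on failure (*buf is unchanged). */
size_t claim_slack(char **buf, size_t requested)
{
    size_t usable = malloc_usable_size(*buf);  /* treated as a hint only */
    assert(usable >= requested);

    char *p = realloc(*buf, usable);           /* may return a new address */
    if (p == NULL)
        return 0;

    *buf = p;
    return usable;
}
```

After a successful call, the caller may write to all `usable` bytes through the updated pointer; any old copies of the pointer must be considered stale.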

dwattttt · on June 26, 2024

> give me a buffer of semi-arbitrary size and tell me how big it is; I promise to resize it later

I'm not sure how you could use this usefully. If you don't care what size you get, why would you allocate in the first place? And if you do have a minimum size you need right now, and a need for it to be bigger later, isn't that what a malloc/realloc dance is for?

o11c · on June 26, 2024

Cases where the exact buffer size doesn't matter are ubiquitous, for example:

* read a file(-like) in streaming mode. Whenever the buffer is empty, fill it. The actual allocated size does not matter at all for most kinds of file.

* push objects onto a vector; when the capacity is used up, reallocate at a larger size and keep pushing. The actual allocated capacity doesn't matter at any point.

* implement a bloom filter for approximate set membership. If the allocator happens to give you a little more than your mathematical estimation for the chances you want, you might as well use it.

In fact, I dare say: every allocation size that is neither `0` nor `sizeof(T)` doesn't fundamentally care about the size (a given implementation may care, but if the standard bothered to implement useful new functionality we would change the implementations).

This is unlike, say, my desire for skewed-alignment allocators, which is not particularly useful for most programs.

dataflow · on June 27, 2024

> In fact, I dare say: every allocation size that is neither `0` or `sizeof(T)` doesn't fundamentally care about the size

Not true? When you copy an array (common case: string) that is passed to you, you want the copy to be the same size. No more (that wastes memory) and no less (then you don't have a copy), just as with sizeof(T).

nickelpro · on June 27, 2024

I hadn't thought about the `sizeof(T)` point before, and I can think of a handful of exceptions, but it's a great and expressive rule of thumb.

freeone3000 · on June 26, 2024

Example: reading a file line-by-line. You don’t, can’t, know how big a line is. Your best option is to allocate some random chunk of memory, like 4 kilobytes, read 4 kilobytes from the file into memory, and see if you happened across a newline in there. If you did, shuffle everything around a little / realloc / ringbuffer shenanigans; do your favourite. If you didn’t, make the buffer bigger (by 1.5x? 2x? Log2x?), and try again.

This dance is super common with variable-length protocols over unframed streams, i.e. most things over TCP. So this is an exceptionally common pattern in IO operations.
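A minimal sketch of that dance in C, under some simplifying assumptions: it reads one byte at a time with `fgetc` instead of 4-kilobyte chunks, starts at an arbitrary 16 bytes, and doubles on every grow. `read_line` is just an illustrative name, not a standard function.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line (without the trailing '\n') from `in` into a heap buffer
   that doubles whenever it fills up. Returns NULL at end of file with
   nothing read, or on allocation failure. The caller frees the result. */
char *read_line(FILE *in)
{
    size_t cap = 16;   /* arbitrary starting size; the exact value is not important */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;
    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (len + 1 == cap) {              /* keep one byte for the terminator */
            size_t new_cap = cap * 2;
            char *p = realloc(buf, new_cap);
            if (p == NULL) {
                free(buf);
                return NULL;
            }
            buf = p;
            cap = new_cap;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0) {            /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Nothing in the loop depends on the capacity being exactly 16, 32, or 64; it only needs to know what the current capacity is.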

Other common times this pattern happens: finding all items in a list that satisfy a predicate; tracking items a user has added to an interface; cycle detection in graphs; …

dwattttt · on June 27, 2024

I don't think this or the sibling would be improved with a function that gave you an arbitrarily sized allocation where you also need to query the size of that allocation though? In all the cases you need to know what the size of the buffer is, even though you don't care whether it's 1kb, 4kb, or 8kb (although I imagine you'd care if you got 16b or 16gb)

nickelpro · on June 27, 2024

It's not about querying the size, it's about what the allocator has available without having to ask the operating system for new pages.

It's got a contiguous block of 6371 bytes? Cool I'll take that, the specific size didn't matter that much.

nine_k · on June 27, 2024

E.g. "Give me something in the ballpark of 4 KiB; more is OK, but tell me how much".

It would allow the allocator to return a chunk it has handy, instead of cutting it to size and fragmenting things, only to be asked to make the chunk larger a few microseconds later.

It will save both on housekeeping and fragmentation handling if the caller knows that the chunk will likely need to grow.
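As a rough sketch of what such an interface could look like in C: `alloc_at_least` below is a hypothetical name, not an existing libc function, and the rounding to multiples of 64 bytes is only a stand-in for whatever chunk a real allocator happens to have handy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical interface: return a block of at least `min` bytes and store
   the size actually granted in *actual. This stand-in simply rounds the
   request up to a multiple of 64; a real allocator would report the size
   of the chunk it chose. */
void *alloc_at_least(size_t min, size_t *actual)
{
    assert(actual != NULL);
    *actual = 0;

    if (min > SIZE_MAX - 63)               /* rounding would overflow */
        return NULL;

    size_t granted = (min + 63) / 64 * 64;
    if (granted == 0)
        granted = 64;

    void *p = malloc(granted);
    if (p != NULL)
        *actual = granted;
    return p;
}
```

A caller asking for "about 4 KiB" would pass its minimum and then use `*actual` as the buffer capacity from that point on.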

celrod · on June 27, 2024

C++23 added `allocate_at_least`: https://en.cppreference.com/w/cpp/memory/allocator_traits/al...

I'm not sure if any standard libraries have an implementation that takes advantage of the "at least" yet.

kzrdude · on June 27, 2024

It's a performance optimization for growing datastructures. If you can use all the space that was actually allocated, you can (on average) call realloc less often when the container is growing dynamically.
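A toy model makes the effect visible. Everything here is an assumption for illustration: the allocator is pretended to hand out multiples of 48 bytes, items are 4 bytes, and the container grows by roughly 1.5x. No memory is actually allocated; the function only counts how often a grow step would happen.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed size-class rule for this model: blocks come in multiples of
   48 bytes. Real allocators have their own classes. */
static size_t granted_bytes(size_t requested)
{
    return (requested + 47) / 48 * 48;
}

/* Count the (re)allocations needed to push `n` 4-byte items into a
   container that grows by about 1.5x. If use_slack is nonzero, the new
   capacity is derived from the granted size rather than the requested one. */
size_t count_growths(size_t n, int use_slack)
{
    const size_t item = 4;
    size_t cap = 0;
    size_t growths = 0;

    for (size_t len = 0; len < n; len++) {
        if (len == cap) {                          /* container is full */
            size_t want = cap + cap / 2 + 1;       /* in elements */
            size_t bytes = want * item;
            cap = use_slack ? granted_bytes(bytes) / item : want;
            growths++;
        }
    }
    return growths;
}
```

In this model the slack-aware version never needs more grow steps than the exact-size version, and for small containers it needs noticeably fewer.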

uecker · on June 27, 2024

Life would be much simpler for many of us if people would stop complaining on internet forums and start contributing to open-source or standardization efforts.

matheusmoreira · on June 28, 2024

Over many years I've made several attempts to contribute to GNU projects and I don't think I've ever succeeded. At some point I started to wonder if I just suck at all this. I don't seem to encounter any problems when I interact with any other project though so that can't be it.

And I don't mean simple reports either, I mean I've sent patches which ranged from bug fixes to minor and major features. Most recent example: sent a patch that added a separate path variable to bash specifically for sourcing bash scripts, thereby creating a simple library/module system. At some point people called my idea "schizophrenic" and I just left.

I was developing some GCC patches to add builtins that emit Linux system call code. This is just something I'm personally interested in. Lost the work when my hard drive crashed and I'm unsure whether to restart that little project. The people on the mailing list didn't seem to agree with it very much when I tried to justify it.

Honestly the idea that I might spend all this time and effort figuring out and hacking on the absolutely gigantic GCC codebase, only to end up with zero results to show for it, makes complaining on internet forums a very attractive alternative. Who knows, maybe someone who's already involved will read the comments and be convinced. Someone like you.

uecker · on June 28, 2024

Fair enough. It is just that I got involved for similar reasons. I wanted certain things to work and nobody listened to me or fixed the bugs I filed. Now that I know first hand what a tremendous amount of work it is to change anything, I am not so excited about comments such as "the standards committee should simply". It is never that simple, for a variety of reasons. It would also be very important to file bugs against compilers, and this is a simple way to contribute, even if it can be frustrating because often nothing happens for a long time.

matheusmoreira · on June 29, 2024

I see what you mean. Yeah it's extremely difficult. I guess I learned that the hard way. Thought everything would proceed smoothly if I just showed up with the code already working. All they had to do is review it and apply it if they found no issues, right? Not so.

I've also filed a few bugs and feature requests on the LLVM issue tracker and GNU mailing lists. LLVM has like a trillion open issues, it's like shouting into the void. GNU almost always tells me the existing stuff should be enough and that I don't really need whatever it is I'm asking for.

For example, I requested a way to make the linker add extra PT_NULL segments to the ELF output file so that I could find and patch those segments later. LLVM has yet to respond to the issue I created, and GNU says the arcane linker scripting is enough even though the example they gave me didn't work. Only person who responded favorably to me was the maintainer of the mold linker, he added a couple lines of code and suddenly I could easily patch ELF segments. As a result, I fully integrated the mold linker into my makefile and switched to it.

My point is this:

> I am not so excited about comments such as "the standards committee should simply"

I can relate to that feeling. It's just that we're also not so excited by comments along the lines of "pull requests welcome". That's a direct challenge to rise up and directly participate. When we do, it often happens that we are not even given the time of day. To say it's frustrating would be an understatement.

I'm not going to enumerate every negative experience I've had, better to leave it all in the past. However, I've concluded that many times effort does not equal reward, and that it might just be easier to ask the insiders to change things instead and leave things be if they don't seem convinced. Change is difficult enough for the insiders to make, for an outsider it's orders of magnitude more difficult.

uecker · on June 30, 2024

This is certainly true. I am also not in the position to see all my contributions accepted. But it is also unclear how to improve the situation. Big open-source projects certainly are dominated by commercial interests. I found GCC, compared to others, relatively welcoming to outsiders - there are still quite some people with the original hacker spirit involved. To some degree one has to accept that in a community, one can not always get what one wants. But I certainly agree that it should be easier to get involved and be included.

In my opinion there are two major problems: First, the overall community should value openness more and not simply prefer projects which may have some technical advantage but are then often controlled by only a couple of commercial entities. If we do not value openness, we do not get it. Second, I think complexity is a huge problem. I think we have to decompose and split our software into smaller parts, where each can more easily be replaced. This would take away power from the people controlling huge frameworks such as LLVM and I think this would be a good thing. And finally, people need to be braver. You are not getting heard when you give up too easily. And if you do not get your patch in, maybe create your own project, or maintain your fork, etc.

matheusmoreira · on July 2, 2024

> But it is also unclear how to improve the situation.

Then please allow me to make a few suggestions based on my limited experience.

I'll begin by saying it's not really my intention to just show up out of nowhere and demand things. When everything is done, I'll go away and the maintainers will remain. They'll continue being responsible for the project while I get to not think about it anymore. It would be disrespectful to demand that they maintain code I wrote.

The key point I want to make is: it's profoundly demotivating when maintainers don't even engage with the contributors. People spend time and effort learning a project, making the change and sending in the patches in good faith. We ask only for their genuine consideration.

If the patch has issues, it's more than enough to reply with a short review detailing what needs to change in order for the patch to be considered. The most likely result of such a review is a v2 patch set being sent with those exact changes implemented. That is the nature of peer review.

What usually happens in my experience is the patches get straight up ignored and forgotten about for an untold amount of time. Then the maintainer suddenly shows up and implements the thing himself! That's nice, in the end I got the feature I wanted, I guess. I just didn't get to become a contributor. I leave wondering why I even bothered to do all that work.

That treatment makes me feel like I'm beneath them, beneath their consideration. I spent days discussing a patch on a mailing list. One person had numerous objections to the idea I was proposing, I tried to address them all but then it turned out he hadn't even read the patch I sent. That seriously almost made me quit on the spot.

The word peer in peer review is extremely important. Reading the work, considering it and offering genuine thoughts, this is how people treat their peers. That is true respect, even when nothing but criticism is offered. Someone who refuses to even read our code doesn't really see us as equals. We're not fellow programmers with ideas worth considering, we're schizophrenics posting crazy talk and submitting nonsense code.

Someone contributed a bug fix to one of my projects about 6 months ago. As soon as I saw the pull request, I acknowledged it and reviewed it. I requested some simple changes, he made those changes and then I accepted it. I made it a point to engage with him so that he could get the commits into the repository and be fully credited for the contribution in the git history. His eyeballs rendered one bug shallow, I felt like that was the least I could do. I think this is a good way to improve the situation. It's certainly the way that I would like to be treated.

It's also important to be honest with oneself and the scope of the project. Sometimes people want perfectly reasonable things but the maintainer has no plans to implement them because of limited time or because they feel it's out of scope for the project. I think it's important to be polite but firm in those cases. If there's no chance that a feature will be accepted, maintainers need to make the decision as early as possible and communicate it clearly. That way people won't waste time and effort implementing a feature that will never be accepted.

> And finally, people need to be braver. You are not getting heard when you give up too easily. And if do not get your patch in, maybe create your own project, or maintain your fork, etc.

Absolutely agree. Especially the part where people create their own projects. My website is powered by a fork of an unmaintained templating engine. I actually want to rewrite all of it some day. Hopefully in my own programming language which I'm also working on as often as time allows.

o11c · on June 28, 2024

Is "don't go out and actively break programs you used to promise would work" such a great ask?

uecker · on June 28, 2024

What exactly are you talking about? What was actively broken?

o11c · on June 28, 2024

`__builtin_dynamic_object_size` has been buggy since it was first implemented, breaking programs that relied on the previously-documented behavior of `malloc_usable_size`.

Instead of fixing the bug, the developers decided to remove the documentation and replace it with something else, breaking backwards compatibility.

Now, it's not that we can't ever break backwards compatibility - but it needs to be done with great deliberation and a transition period, and an alternative needs to be provided. I gave an example of an alternative.

uecker · on June 29, 2024

This does not seem entirely accurate. "malloc_usable_size" was only recommended for use for statistics even before the change in the man page: "Although the excess bytes can be overwritten by the application without ill effects, this is not good programming practice:" and "The main use of this function is for debugging and introspection". The new version makes it clearer. You can find the change here: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/...

Also no code is "actively broken" by a documentation change. If it happened to work before it still works today. Also I do not see how __builtin_dynamic_object_size is broken. It works as intended and can be used with conforming code just fine. It is simply not compatible with some questionable use of "malloc_usable_size".

o11c · on June 29, 2024

It's not the documentation change that's the problem. It's the fact that they changed implementation-defined behavior to undefined behavior in the first place (then changed the documentation to follow). Or equivalently, they changed "recommended" to "required".

In particular, the "without ill effects" is no longer true. It's possible to #ifdef to detect broken libc/compiler combinations, but I'm not confident that avoiding explicit use of __builtin_dynamic_object_size will prevent optimizations from taking advantage of false assumptions based on __attribute__((malloc)).