You could implement it as a generic 'application metadata' field in the IP header. From the perspective of IP, it is just one more length-prefixed field in the header. Routers may interpret it in conjunction with the value of the protocol field; otherwise they are just required to leave it unchanged in the header (including in all fragments).
For packets that don't want to use it, this is just 1 byte of overhead to set the size to 0.
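A minimal sketch of that idea, assuming a made-up wire format (one length byte, then the metadata, then the payload). The names `encode_meta`, `decode_meta` and `fragment` are hypothetical and not part of any real IP stack:

```python
from typing import List, Tuple


def encode_meta(meta: bytes) -> bytes:
    """Length-prefixed field: one length byte followed by the metadata."""
    if len(meta) > 255:
        raise ValueError("metadata must fit a 1-byte length prefix")
    return bytes([len(meta)]) + meta


def decode_meta(buf: bytes) -> Tuple[bytes, bytes]:
    """Split a buffer into (metadata, remainder)."""
    n = buf[0]
    return buf[1:1 + n], buf[1 + n:]


def fragment(meta: bytes, payload: bytes, max_payload: int) -> List[bytes]:
    """Cut the payload into pieces and repeat the metadata field in each one."""
    prefix = encode_meta(meta)
    chunks = [payload[i:i + max_payload]
              for i in range(0, len(payload), max_payload)] or [b""]
    return [prefix + chunk for chunk in chunks]
```

A packet that doesn't use the field pays exactly one byte (`encode_meta(b"")` is `b"\x00"`), and a router that doesn't understand the metadata only has to copy `prefix` verbatim.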
You could design a network protocol that fragments by capturing a variable number of bytes from the next header, and ICMP already does something like that.
(None of this would fix the real problem with fragmentation, which is that you can't efficiently segment out a large frame without having some kind of reliability layer).
If I were revisiting this, I'd probably eradicate the layering and pick a fixed number of flow types with distinct headers and state machines. The layers were a reasonable choice given the understanding of the time, but in hindsight I think you can make a strong case they're cut at the wrong places.
It's just a dumb mistake. All it takes is a "next layer header length" field. It would have been very simple.
You don't even really need that. As proof, take ICMP, which was designed as part of IP and actually does do this: routers are already required to copy and include the header of the packet that triggered an ICMP error.
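For reference, the ICMP behaviour comes from RFC 792: an error message quotes the offending packet's IP header plus the first 64 bits of its data. A rough sketch of that slicing for IPv4 (the function name `icmp_error_quote` is made up for illustration):

```python
def icmp_error_quote(ip_packet: bytes) -> bytes:
    """Return the part of an IPv4 packet that RFC 792 quotes in an ICMP error."""
    # The low nibble of the first byte is the IHL, counted in 32-bit words.
    ihl_bytes = (ip_packet[0] & 0x0F) * 4
    # IP header plus the first 8 bytes (64 bits) of whatever follows it.
    return ip_packet[:ihl_bytes + 8]
```

Those 8 bytes happen to cover the whole UDP header, or the ports and sequence number of a TCP header, without the router knowing anything about either protocol.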
If you always chop 100 bytes and prepend 100 bytes, then it's even more massively inefficient than the problem it solves. The router would at least need every protocol to start with a header length value. Otherwise, if you just take the first 100 bytes and stick them in the front of each packet and the header was only 57 bytes, then you've suddenly got 43 bytes of garbage in the next layer's payload when you reassemble.
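To make the arithmetic concrete, here is a toy model of that failure, assuming a 57-byte upper-layer header, a router that blindly copies a fixed prefix, and a receiver that strips the true header length from every fragment. Both functions are hypothetical:

```python
from typing import List


def fragment_with_prefix(packet: bytes, copy_len: int, chunk: int) -> List[bytes]:
    """Copy the first copy_len bytes into every fragment of the remainder."""
    head, rest = packet[:copy_len], packet[copy_len:]
    return [head + rest[i:i + chunk] for i in range(0, len(rest), chunk)]


def reassemble(frags: List[bytes], header_len: int) -> bytes:
    """Keep one header, then concatenate what follows it in each fragment."""
    header = frags[0][:header_len]
    body = b"".join(f[header_len:] for f in frags)
    return header + body


header = b"H" * 57
payload = bytes(range(256)) + bytes(44)          # 300 bytes of payload
packet = header + payload

bad = reassemble(fragment_with_prefix(packet, 100, 100), 57)
good = reassemble(fragment_with_prefix(packet, 57, 100), 57)
```

In this model the blind 100-byte copy yields three fragments, and every fragment after the first leaves 43 stray bytes in the reassembled payload, so `bad` comes out 86 bytes longer than `packet`. Copying exactly the 57 header bytes round-trips cleanly, which is why the router needs to be told the length.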
Keep in mind, most routers don't even bother supporting existing fragmentation because it's costly to implement in high-speed hardware. So while you could theoretically have that dynamic next-protocol header length field, it'd only be complicating something hardware makers already think is too complicated to be worth it. Making things unappealingly complex is one of the common results of layering violations.
Actually, that's not a bad thing. UDP is small enough to have nearly no overhead, but complex enough to let firewalls do their job. Six of the eight bytes in its header would probably be in the header of any transport layer protocol anyways (only the checksum might be unnecessary).
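For anyone who hasn't looked at it recently, the whole UDP header is four 16-bit fields: source port, destination port, length, checksum. A small parsing sketch (the `UdpHeader` and `parse_udp_header` names are my own):

```python
import struct
from typing import NamedTuple


class UdpHeader(NamedTuple):
    src_port: int
    dst_port: int
    length: int      # header plus payload, in bytes
    checksum: int


# Four unsigned 16-bit fields in network byte order: 8 bytes total.
UDP_HEADER = struct.Struct("!HHHH")


def parse_udp_header(datagram: bytes) -> UdpHeader:
    """Decode the fixed 8-byte header at the start of a UDP datagram."""
    return UdpHeader(*UDP_HEADER.unpack_from(datagram))
```

The ports and length are the six bytes almost any transport would want anyway; only the last two are arguably optional.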
Wikipedia lists over 100 assigned IP protocol numbers [1], and while it would break existing firewalls, adding a new protocol would certainly require less work than the transition from IPv4 to IPv6. But UDP is already simple enough that there's very little benefit in not just building on that.
For example, IP routers often peek at UDP/TCP port numbers to calculate ECMP flow hashing. This is technically naughty but it's read-only and it's only an optimization that isn't required for correct forwarding.
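A rough sketch of what that flow hashing looks like, assuming a CRC32 over the 5-tuple. Real routers use vendor-specific hash functions, and `ecmp_next_hop` is a hypothetical name:

```python
import ipaddress
import struct
import zlib
from typing import Sequence


def ecmp_next_hop(src_ip: str, dst_ip: str, proto: int,
                  src_port: int, dst_port: int,
                  next_hops: Sequence[str]) -> str:
    """Pick one of several equal-cost next hops from a hash of the 5-tuple."""
    key = (ipaddress.ip_address(src_ip).packed
           + ipaddress.ip_address(dst_ip).packed
           + struct.pack("!BHH", proto, src_port, dst_port))
    # Same flow, same key, same path, so packets in a flow stay in order.
    return next_hops[zlib.crc32(key) % len(next_hops)]
```

The catch is that only the first fragment of a fragmented packet carries the transport header, so the ports aren't available for later fragments and a router like this has to fall back to hashing on addresses and protocol alone.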
That decision alone would’ve made fragments so much simpler on network devices and appliances, and much less likely for them to get dropped.