Hacker News | a_t48's comments

Hi Guanming/Bill. Would love to chat about what you're doing for actually running the models. I'm in a similar space, speeding up the `docker pull` component of inference deployment on edge devices (among other things!) If you're interested, shoot me an email at kyle@clipper.dev

Does `COPY --link` allow this with BuildKit? In principle it should, in practice I’d guess it ends up pulling the base image.


I can't comment on BuildKit, unfortunately, since I haven't used it. My experience comes from building bespoke systems (an image builder and a custom registry) fully from scratch, because we needed to have full control in order to achieve the performance we were aiming at.


I haven't gone full custom (yet), just forks (except for the pull client), but totally understand.


Controlling both the builder and the registry is super nice btw, because they can work together. If the builder knows some of the layers already exist in the registry, it merely has to create and push the remaining ones (without downloading any of the other layers, not even those from the base image). That gives you near-instant builds once the biggest layers are cached in the registry!

Since builds usually happen in CI, and pulls happen elsewhere (e.g., a kubernetes node), in the end layers are only downloaded when the resulting container image is actually used.
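To make the "only push what's missing" idea concrete, here is a minimal sketch in Go. The `Registry` interface, `memRegistry`, and `LayersToPush` are hypothetical names made up for illustration, not the actual builder or registry described above. Against a real OCI registry the existence check would typically be a HEAD request on the blob, but that part is left out here.

```go
package main

import "fmt"

// Registry is a hypothetical view of a registry: it only answers whether a
// blob with the given digest is already stored.
type Registry interface {
	HasBlob(digest string) bool
}

// memRegistry is an in-memory stand-in used purely for illustration.
type memRegistry map[string]bool

func (r memRegistry) HasBlob(digest string) bool { return r[digest] }

// LayersToPush returns, in order, the digests the registry does not have yet.
// Nothing is downloaded: the builder only needs the digests of known layers.
func LayersToPush(reg Registry, layers []string) []string {
	var missing []string
	for _, d := range layers {
		if !reg.HasBlob(d) {
			missing = append(missing, d)
		}
	}
	return missing
}

func main() {
	reg := memRegistry{"sha256:base": true, "sha256:deps": true}
	image := []string{"sha256:base", "sha256:deps", "sha256:app"}
	fmt.Println(LayersToPush(reg, image)) // only the app layer is left to push
}
```

The point of the sketch is that the build cost scales with the layers that changed, not with the size of the base image.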


Base-image-independent layers are something I’ve pondered, but they don’t feel compatible with things like apt. This conversation is giving me more reason to go implement lazy/FUSE base layers though. My exports are already pretty fast due to not using tar, plus deduplication with similar layers, but pulling the base can still take several minutes.


You can, by using FUSE and lazy pulling files as they are opened. I'm working on doing this, myself.
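Rough sketch of the "fetch on first open" part, leaving FUSE itself out. `LazyStore` and `FetchFunc` are hypothetical names; in a real setup the filesystem layer would call something like `Open` from its open/read handlers, and the fetch would hit the registry instead of a closure.

```go
package main

import (
	"fmt"
	"sync"
)

// FetchFunc is a hypothetical stand-in for a remote fetch (e.g. a registry request).
type FetchFunc func(path string) ([]byte, error)

// LazyStore fetches a file's contents the first time it is opened and
// serves later opens from a local cache.
type LazyStore struct {
	mu    sync.Mutex
	cache map[string][]byte
	fetch FetchFunc
}

func NewLazyStore(fetch FetchFunc) *LazyStore {
	return &LazyStore{cache: make(map[string][]byte), fetch: fetch}
}

// Open returns the file contents, fetching them only on a cache miss.
// The lock is held during the fetch to keep the sketch simple; this
// serializes concurrent opens.
func (s *LazyStore) Open(path string) ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if data, ok := s.cache[path]; ok {
		return data, nil
	}
	data, err := s.fetch(path)
	if err != nil {
		return nil, err
	}
	s.cache[path] = data
	return data, nil
}

func main() {
	fetches := 0
	store := NewLazyStore(func(path string) ([]byte, error) {
		fetches++
		return []byte("contents of " + path), nil
	})
	store.Open("/usr/lib/libfoo.so")
	store.Open("/usr/lib/libfoo.so")
	fmt.Println("remote fetches:", fetches)
}
```

Files that are never opened are never fetched, which is where the pull-time savings would come from.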


The existing suggestions are good, but what about Cribbage?


A little bit of a nit - when tracking progress you can ditch the mutex by using atomic integers for the download progress. Instead of using locks, your test for a chunk being done becomes something like "downloaded == chunk_size".

You also never call UpdateChunkProgress, FYI! The code is a little confusing to read because of that dead variable, at first I thought that chunks are never resumed. ProgressTracker and State/ChunkInfo do similar things, if I were in your shoes I'd make it so that there's clear ownership over where progress is actually tracked.

...I'd probably move State to be saved on the _start_ of the download, have it solely be responsible for tracking the arguments that the download was called with. Rename it to DownloadArguments, delete all the runtime state tracking, as it's already inferred by the downloader on start.
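For the atomic counter suggestion, something along these lines is what I mean. This is a sketch assuming Go 1.19+ (for `atomic.Int64`); the `Chunk` type and its methods are made up here and are not the types from the code being reviewed.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Chunk is a hypothetical chunk record. Progress lives in an atomic counter,
// so writers and readers need no mutex.
type Chunk struct {
	Size       int64
	downloaded atomic.Int64
}

// Add records n more downloaded bytes.
func (c *Chunk) Add(n int64) { c.downloaded.Add(n) }

// Done reports whether the whole chunk has been downloaded.
func (c *Chunk) Done() bool { return c.downloaded.Load() == c.Size }

func main() {
	c := &Chunk{Size: 4096}
	var wg sync.WaitGroup
	// Simulate four writers reporting progress concurrently.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1024; j++ {
				c.Add(1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(c.Done())
}
```

Note that `Chunk` has to be passed by pointer, since a struct holding an `atomic.Int64` must not be copied.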


I've found that tar processing tends to dominate the time used to do anything with standard OCI layers. I have a more efficient format (that splits apart the layer into metadata+chunks) that I'm open sourcing soon if y'all are interested in using it.


interested. is the split for dedup, parallel pulls, or lazy loading specific files? maybe all.

we've played with some chunking ideas on our end but haven't landed on a format. drop a link when it's out.


All of the above, plus being able to reflink to skip copies of large files, plus not having to round trip from disk a few times for tar layers, plus a number of other side benefits. Only using lazy loading for buildkit right now, as it does require FUSE and I want it to be opt in (for robotics contexts, for instance, you never want to lazy load).
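To illustrate the general metadata+chunks idea (not the actual format, which isn't described here), a toy content-addressed version might look like this. `FileEntry`, `ChunkStore`, the fixed chunk size, and the SHA-256 keys are all assumptions made for the sketch.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// FileEntry is hypothetical metadata: a path plus the digests of its chunks.
type FileEntry struct {
	Path   string
	Chunks []string
}

// ChunkStore holds chunk bytes keyed by SHA-256 digest, so identical chunks
// are stored once no matter how many files or layers reference them.
type ChunkStore map[string][]byte

// AddFile splits data into fixed-size chunks (chunkSize must be > 0), stores
// the ones not seen before, and returns the metadata entry for the file.
func (s ChunkStore) AddFile(path string, data []byte, chunkSize int) FileEntry {
	entry := FileEntry{Path: path}
	for off := 0; off < len(data); off += chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		sum := sha256.Sum256(data[off:end])
		digest := hex.EncodeToString(sum[:])
		if _, ok := s[digest]; !ok {
			s[digest] = data[off:end]
		}
		entry.Chunks = append(entry.Chunks, digest)
	}
	return entry
}

func main() {
	store := ChunkStore{}
	a := store.AddFile("v1/model.bin", []byte("aaaabbbbcccc"), 4)
	b := store.AddFile("v2/model.bin", []byte("aaaabbbbdddd"), 4)
	// Two files of three chunks each, but only four distinct chunks stored.
	fmt.Println(len(a.Chunks), len(b.Chunks), len(store))
}
```

Because the metadata is separate from the chunk bytes, a client can read the file list up front and then fetch chunks in parallel, skip ones it already has, or defer them until a file is opened.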


I wonder if they aren't using the macOS keychain, while Safari does.


A friend sent this to me yesterday - I was very disappointed that the video didn't show off Minecraft in Minecraft.


It is minecraft, even if you open Minecraft it will not work.


If you're the only person using the project...not really doing it wrong, your preference. If you have to share it, you can encode the supported python version(s), exclude-newer, etc, in pyproject.toml. Using a lockfile also helps against supply chain attacks - restricting the danger window to only when running upgrade rather than on any install. It also stops accidental breakages from occurring. If you lock once, you know that anyone else can install using the same lockfile, no matter what other versions have gone out in the wild. (Pyproject also can encode things like package groups, which IIRC doesn't work so well with requirements.txt)

I personally don't really use `uv add` and `uv lock --upgrade`, in the past I'd just hand edit the pyproject to pull forward my dependencies and let `uv lock` figure out the rest.

A good third of my last job was spent chasing after projects that weren't using pyproject. It typically turned multiple steps of "install python, upgrade pip, install this one special library, install requirements" inside of a Dockerfile or bash script into one `uv` command. And was more reliable afterwards, to boot!
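For reference, the settings mentioned above could look roughly like this in a `pyproject.toml`. The project name, package versions, and cutoff date are placeholders, not recommendations.

```toml
[project]
name = "example-app"              # placeholder name
version = "0.1.0"
requires-python = ">=3.11,<3.13"  # supported Python versions
dependencies = ["requests>=2.31"]

[dependency-groups]
dev = ["pytest>=8"]               # a package group, e.g. for development

[tool.uv]
exclude-newer = "2025-01-01T00:00:00Z"  # ignore releases published after this date
```

With that in place, `uv lock` writes the lockfile and everyone installing from it gets the same resolved versions.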


This is a continual fight for me. At nearly every company I've had to compromise on using a graveyard repo for packages within a monorepo, even though git has the whole history already.


The problem with history is that you need to know when to look. If you're looking for some old code that you know existed but you don't know exactly what it was, you can't just browse to go and find it.


Sure, but beyond a certain point the code that's there isn't just drop in compatible.

