Hacker News | fleabitdev's comments

Experienced software developer, currently available for freelance work.

    Location: UK
    Remote: Yes
    Willing to relocate: No
    Resume/CV: By request
    Email: (my username) at protonmail dot com

Particularly good at cleaning up messes! Suitable contracts might include legacy module rewrites, performance improvements, or the daunting new feature which nobody on your team wants to tackle.

I can be flexible on time zones. No interest in relocating long-term, but I'd be happy to visit your team in person for a few days to break the ice.

My specialist skills:

- The Rust language, which has been my daily driver for more than a decade.

- Multimedia (video, audio, 2D rendering...)

- Performance optimisation (SIMD, GPGPU, parallel programming...)

Fields in which I'm highly experienced, but below specialist level:

- Web development, with a frontend bias (TypeScript, React, WebGL, WebAssembly...)

- Native development, especially low-level Win32.

- Communication and technical writing, both learned in a previous career.

I also have a modest level of experience in computer vision, greenfield R&D, game engine development, programming language development, and data compression. I can offer a 50% discount for any contract which seems highly educational; my current areas of interest include DSP, embedded programming, Svelte, functional languages, Swift, and React Native.

Thanks for reading, and I look forward to hearing from you :-)


Wikimedia Commons has this feature. Editors can manually bless certain combinations of traits as "subcategories".

For example, https://commons.wikimedia.org/wiki/Category:Paintings_of_cas... contains the subcategories "Paintings of castles by country" (nested hierarchy), "Frescos of castles" (a medium), "Paintings of Château de Chillon" (a subject), and "Young Knight in a Landscape by Carpaccio" (multiple views onto a specific item). Each item may appear in multiple subcategories. As far as I can tell, the UI won't let you search for frescos of Italian castles (unless somebody's made a subcategory for that), or view all paintings of castles regardless of their subcategory.

I'm not very fond of this approach. I'd prefer for each item to have an unstructured set of tags ("fresco", "depiction of a castle", "depiction of Italy"), with automatic derivation of parent tags ("fresco" implies "painting") and the option to search by multiple tags. It should be possible to automatically discover tags which best refine a search, so that the UI can still suggest them to the user, as it does today.
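To make that concrete, here's a minimal sketch of the tag model in Python. Everything in it is hypothetical: the item names, the `PARENT_TAGS` table, and the "prefer tags which split the results roughly in half" heuristic are assumptions for illustration, not anything Wikimedia Commons actually implements.

```python
# Hypothetical data: which tags imply which parent tags.
PARENT_TAGS = {"fresco": {"painting"}, "oil painting": {"painting"}}

# Hypothetical items, each with an unstructured set of tags.
ITEMS = {
    "chillon_fresco": {"fresco", "depiction of a castle", "depiction of Switzerland"},
    "italian_fresco": {"fresco", "depiction of a castle", "depiction of Italy"},
    "italian_oil": {"oil painting", "depiction of a castle", "depiction of Italy"},
    "italian_portrait": {"oil painting", "depiction of Italy"},
}


def expand(tags):
    """Return the tags plus every parent tag they imply, transitively."""
    result = set()
    stack = list(tags)
    while stack:
        tag = stack.pop()
        if tag not in result:
            result.add(tag)
            stack.extend(PARENT_TAGS.get(tag, ()))
    return result


def search(query):
    """Sorted names of items whose expanded tag set contains every query tag."""
    query = set(query)
    return sorted(name for name, tags in ITEMS.items() if query <= expand(tags))


def refinements(query):
    """Tags which would narrow the current results, most even split first."""
    results = search(query)
    counts = {}
    for name in results:
        for tag in expand(ITEMS[name]) - set(query):
            counts[tag] = counts.get(tag, 0) + 1
    # A tag carried by every result refines nothing, so drop it.
    useful = {tag: n for tag, n in counts.items() if n < len(results)}
    return sorted(useful, key=lambda tag: (abs(useful[tag] - len(results) / 2), tag))
```

With this data, `search(["fresco", "depiction of Italy"])` finds the Italian fresco without anybody having created a subcategory for it, and `search(["painting", "depiction of a castle"])` returns every castle painting regardless of medium.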


> I'd prefer for each item to have an unstructured set of tags ("fresco", "depiction of a castle", "depiction of Italy"), with automatic derivation of parent tags ("fresco" implies "painting") and the option to search by multiple tags.

It's definitely possible to do this. IMSLP (a large repository of freely available sheet music, which varies along cross-cutting features such as genre, historical period, contributors (composers and others), instrumentation, etc.) is MediaWiki-based and has a plugin that does exactly that. These days they would probably want to host all the tags on Wikidata, so that they can be multilingual and queryable out of the box, though.


This is actually done on Commons; it just isn't very popular (on images, click the structured data tab and then look at "depicts"). Admittedly, I think a big part of the problem is the implementation choices and UI decisions.


That's only "depicts" claims and is nowhere near comprehensive. It doesn't even come close to matching what's currently stated using categories. Running searches on that data is also hard compared to what IMSLP gives you for their own system.


The Library of Congress uses both approaches, to an extent.

The cataloguing system uses a hierarchical classification, based on one originally developed by Thomas Jefferson, on whose initial donation the Library of Congress is based. This is known as the Library of Congress Classification, and is used to specifically locate a given title or work within the stacks, that is, each item has one and only one location.

There are also subject headings, which are more tag-based, though they also draw on a controlled vocabulary. A given work is assigned a relatively small number of subjects with which it's associated. These are not hierarchical, though of course the listing of subject headings itself follows a sequence. Unlike the classification, which assigns a single location to each work, the headings are a search aid for patrons looking for a set of related works within a subject heading, and they help a search branch out to possibly related subjects.

Tagging systems, especially ad hoc tags supplied by untrained users, are popular but tend to produce numerous issues over time. Not that formal systems (as with the LoC systems mentioned here) are immune to same. One feature of the LoC systems is that they've evolved processes for managing change over time. Examples would be terminology or classifications which are now deprecated, or of regions and polities which have changed or no longer exist (e.g., the Austro-Hungarian empire, the USSR), or of changes in underlying classifications (e.g., of chemical elements or of biological classifications, both of which have evolved significantly over the life of the Library of Congress).

The history of hierarchical information classifications is long and IMO fascinating, dating at least to Aristotle and his Categories, as well as numerous variants used in classifications of knowledge (such as Francis Bacon's) or encyclopedias, including Diderot's and Britannica.


Good catch - looks like it's a PNG image, with an alpha channel for the rounded corners, and a subtle gradient in the background. The gradient is rendered with dithering, to prevent colour banding. The dither pattern is random, which introduces lots of noise. Since noise can't be losslessly compressed, the PNG is an enormous 6.2 bits per pixel.
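The effect is easy to reproduce. Here's a rough sketch which compresses a subtle greyscale gradient with and without random dithering. It's only an approximation of the real situation: it uses raw `zlib` (DEFLATE) rather than a full PNG encoder with filters, and the image size, the 32-shade gradient and the random-threshold dither are all assumptions of mine, not details of the image being discussed.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 256


def gradient(dither, seed=0):
    """8-bit greyscale horizontal gradient covering only 32 shades."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    pixels = bytearray()
    for _y in range(HEIGHT):
        for x in range(WIDTH):
            ideal = 100 + 32 * x / WIDTH  # fractional target shade
            if dither:
                # Random-threshold rounding: hides banding, adds noise.
                value = int(ideal + rng.random())
            else:
                # Plain rounding: visible bands, but very regular data.
                value = int(ideal + 0.5)
            pixels.append(value)
    return bytes(pixels)


def bits_per_pixel(pixels):
    """Compressed size in bits, divided by the number of pixels."""
    return 8 * len(zlib.compress(pixels, 9)) / len(pixels)


smooth_bpp = bits_per_pixel(gradient(dither=False))
dithered_bpp = bits_per_pixel(gradient(dither=True))
print(f"smooth: {smooth_bpp:.3f} bpp, dithered: {dithered_bpp:.3f} bpp")
```

The dithered version should come out many times larger, even though both images look nearly identical at normal viewing distance.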

While working on a web-based graphics editor, I've noticed that users upload a lot of PNG assets with this problem. I've never tracked down the cause... is there a popular raster image editor which recently switched to dithered rendering of gradients?


My reasoning is that, once upon a time, I was using Macromedia Fireworks, and PNGs gave far, far better results than JPGs did, at least in terms of output quality. That was almost certainly because I didn't understand JPG compression, but for web work in the mid-2000s, PNGs became my favourite. Not to mention proper alpha channels!

...and so it's stuck, two decades on haha


I only recently learned that JPEG and MPEG-1 were designed for near-lossless compression, so the massive bitrate reductions which came further down the road had nothing to do with the original design.

"Inelegant" is the right word; it's hard to shake off the feeling that we might have missed something important. I suspect the next big breakthrough might be waiting for researchers to focus on lower-quality compression specifically, rather than requiring every new codec to improve the state of the art in near-lossless compression.


> for researchers to focus on lower-quality compression specifically

JPEG XL already does this, because it uses VarDCT (Variable-size Discrete Cosine Transform), aka adaptive block sizes (2×2 up to 256×256). Large smooth areas use huge blocks, and fine detail uses small blocks to preserve detail. JXL spends bits where your eyes care most, instead of evenly across the image. It also uses several techniques aimed specifically at keeping edges sharp.


JPEG XL achieves about half the bitrate of an equal-quality JPEG, even at lower quality levels. That's a real achievement, but the complexity cost is high; I'd estimate that JPEG XL decoders are at least ten times more complex than JPEG decoders. Modern lossy image codecs are "JPEG, with three decades of patch notes" :-)

I think we're badly in need of an entirely new image compression technique; the block-based DCT has serious flaws, such as its high coding cost for edges and its tendency to create block artefacts. The modern hardware landscape is quite different from 1992, so it's plausible that the original researchers might have missed something important, all those years ago.


The discrete wavelet transform (DWT) compresses an image by repeatedly downscaling it, and storing the information which was lost during downscaling. Here's an image which has been downscaled twice, with its difference images (residuals): https://commons.wikimedia.org/wiki/File:Jpeg2000_2-level_wav.... To decompress that image, you essentially just 2x-upscale it, and then use the residuals to restore its fine details.
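Here's a minimal sketch of that downscale-and-store-residuals idea on a single row of pixels. It uses the Haar wavelet because it's the simplest one to read; JPEG 2000 itself uses more sophisticated wavelets, so treat this as an illustration of the principle rather than the codec's actual transform. The function names are my own.

```python
def haar_forward(signal):
    """One level: half-resolution averages, plus the detail lost by averaging."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) / 2 for a, b in pairs]   # the 2x-downscaled signal
    high = [(a - b) / 2 for a, b in pairs]  # residuals needed to undo it
    return low, high


def haar_inverse(low, high):
    """Upscale by 2x, using the residuals to restore the fine detail."""
    out = []
    for average, difference in zip(low, high):
        out += [average + difference, average - difference]
    return out


def decompose(signal, levels):
    """Repeatedly downscale, keeping one list of residuals per level."""
    residuals = []
    for _ in range(levels):
        signal, high = haar_forward(signal)
        residuals.append(high)
    return signal, residuals


def reconstruct(low, residuals):
    """Undo decompose(), coarsest level first."""
    for high in reversed(residuals):
        low = haar_inverse(low, high)
    return low


row = [10, 12, 14, 16, 80, 82, 84, 86]  # two smooth ramps with a sharp edge
low, residuals = decompose(row, levels=2)
print(low, residuals)
```

For this row, the residuals are small everywhere, and the sharp edge survives in the two-pixel downscaled signal. A lossy codec would quantise the residuals (or drop them entirely at very low bitrates) before storing them.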

Wavelet compression is better than the block-based DCT for preserving sharp edges and gradients, but worse for preserving fine texture (noise). The DCT can emulate noise by storing just a couple of high-frequency coefficients for a 64-pixel block, but the DWT would need to store dozens of coefficients to achieve noise synthesis of similar quality.
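As a rough illustration of the DCT half of that claim, the sketch below rebuilds a 64-pixel block from just three coefficients: one DC value for the average brightness, and two high-frequency values which fill the block with texture. The coefficient positions and amplitudes are arbitrary choices of mine, and the code is a plain textbook inverse DCT rather than any particular codec's implementation.

```python
import math

N = 8  # JPEG-style 8x8 block


def scale(k):
    """Normalisation factor which makes the DCT basis orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def idct_2d(coeffs):
    """Inverse 2D DCT: sum the cosine basis images, weighted by coeffs[v][u]."""
    block = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            total = 0.0
            for v in range(N):
                for u in range(N):
                    if coeffs[v][u] != 0:
                        total += (
                            scale(u) * scale(v) * coeffs[v][u]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                        )
            block[y][x] = total
    return block


coeffs = [[0.0] * N for _ in range(N)]
coeffs[0][0] = 8 * 128  # DC term: block average of 128 (mid-grey)
coeffs[6][7] = 40.0     # two arbitrary high-frequency terms...
coeffs[7][5] = -30.0    # ...which together produce a busy texture

block = idct_2d(coeffs)
for row in block:
    print(" ".join(f"{value:6.1f}" for value in row))
```

Three stored numbers produce 64 pixels which vary from one sample to the next. The pattern is periodic rather than truly random, but at a glance it reads as texture.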

The end result is that JPEG and JPEG 2000 achieve roughly the same lossy compression ratio before image artefacts show up. JPEG blurs edges, JPEG 2000 blurs texture. At very low bitrates, JPEG becomes blocky, and JPEG 2000 looks like a low-resolution image which has been upscaled (because it's hardly storing any residuals at all!)

FFmpeg has a `jpeg2000` codec; if you're interested in image compression, running a manual comparison between JPEG and JPEG 2000 is a worthwhile way to spend an hour or two.


I found a JPEG 2000 reference PDF somewhere. It may as well have been written in Mandarin. I got as far as extracting the width and height. It's much more advanced than JPEG. Forget about writing a decoder.



What about JPEG XL or AVIF? Do they use DCT or DWT, or perhaps something else?


Both formats are DCT-based (except for lossless JPEG XL). JPEG 2000's use of the DWT was unusual; in general, still-image lossy compression research has spent the last 35 years iteratively improving on JPEG's design. This is partly for compatibility reasons, but it's also because the original design was very good.

Since JPEG, improvements have included better lossless compression (entropy coding) of the DCT coefficients; deblocking filters, which blur the image across block boundaries; predicting the contents of DCT blocks from their neighbours, especially prediction of sharp edges; variable DCT block sizes, rather than a fixed 8x8 grid; the ability to compress some DCT blocks more aggressively than others within the same image; encoding colour channels together, rather than splitting them into three completely separate images; and the option to synthesise fake noise in the decoder, since real noise can't be compressed.

You might be interested in this paper: https://arxiv.org/pdf/2506.05987. It's a very approachable summary of JPEG XL, which is roughly the state of the art in still-image compression.


Thanks. The paper is fascinating. I only skimmed around so far and it is full of interesting details. Even beyond compression. They really tried hard to make the USB of image formats, by supporting as many features and use cases as possible. Even things like multiple layers and non-destructive cropping. I like the section where they talk about previous image formats, why many of them failed and how they tried to learn from past mistakes.

Regarding algorithms: Searching for "learned image compression", there are a lot of research papers which use neural networks rather than analytic algorithms like DCT. The compression rates seem to already outperform conventional compression. I guess the bottleneck is more slow decoding speed than compression rate. At least that's the issue with neural video compression.


As I understand it, very small neural networks have already been incorporated into both VVC and AV2 for intra prediction. You're correct that this strategy is limited by decoding performance, especially when predicting large blocks.

In general, I'm pessimistic about prediction-and-residuals strategies for lossy compression. They tend to amplify noise; they create data dependencies, which interfere with parallel decoding; they require non-local optimisation in the encoder; really good prediction involves expensive analysis of a large number of decoded pixels; and it all feels theoretically unsound (because predictors usually produce just one value, rather than a probability distribution).

I'm more optimistic about lossy image codecs based on explicitly-coded summary statistics, with very little prediction. That approach worked well for lossy JPEG XL.


Everything after JPEG is still fundamentally the same, but individual parts of the algorithm are supercharged.

JPEG has 8x8 blocks, modern codecs have variable-sized blocks from 4x4 to 128x128.

JPEG has RLE+Huffman, modern codecs have context-adaptive variations of arithmetic coding.

JPEG has a single quality scale for the whole image, modern codecs allow quality to be tweaked in different areas of the image.

JPEG applies block coefficients on top of a single flat color per block (DC coefficient), modern codecs use a "prediction" made by smearing the previous couple of blocks as the starting point.

They're JPEGs with more of everything.


I'd describe that as a trend, rather than a consensus.

It wasn't an entirely bad idea, because comments carry a high maintenance cost. They usually need to be rewritten when nearby code is edited, and they sometimes need to be rewritten when remote code is edited - a form of coupling which can't be checked by the compiler. It's easy to squander this high cost by writing comments which are more noise than signal.

However, there's plenty of useful information which can only be communicated using prose. "Avoid unnecessary comments" is a very good suggestion, but I think a lot of people over-corrected, distorting the message into "never write comments" or "comments are a code smell".


You've rediscovered a state-of-the-art technique, currently used by JPEG XL, AV1, and the HEVC range extensions. It's called "chroma from luma" or "cross-component prediction".

This technique has a weakness: the most interesting and high-entropy data shared between the luma and chroma planes is their edge geometry. To suppress block artefacts near edges, you need to code an approximation of the edge contours. This is the purpose of your quadtree structure.

In a codec which compresses both luma and chroma, you can re-use the luma quadtree as a chroma quadtree, but the quadtree itself is not the main cost here. For each block touched by a particular edge, you're redundantly coding that edge's chroma slope value, `(chroma_inside - chroma_outside) / (luma_inside - luma_outside)`. Small blocks can tolerate a lower-precision slope, but it's a general rule that coding many imprecise values is more expensive than coding a few precise values, so this strategy costs a lot of bits.
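To spell out what that slope value buys you, here's a small sketch of the formula above. Given one slope and one reference pair for a block, every chroma sample can be predicted from the luma sample at the same position. The function names and the sample values are hypothetical, and this is a simplification for illustration, not how AV1 or JPEG XL actually code it.

```python
def chroma_slope(luma_inside, luma_outside, chroma_inside, chroma_outside):
    """Chroma change per unit of luma change across an edge."""
    if luma_inside == luma_outside:
        raise ValueError("no luma contrast across the edge, slope is undefined")
    return (chroma_inside - chroma_outside) / (luma_inside - luma_outside)


def predict_chroma(luma_block, slope, luma_ref, chroma_ref):
    """Linear chroma prediction from decoded luma, anchored at a reference pair."""
    return [chroma_ref + slope * (luma - luma_ref) for luma in luma_block]


# Hypothetical edge: bright, desaturated inside; darker, more colourful outside.
slope = chroma_slope(luma_inside=0.8, luma_outside=0.5,
                     chroma_inside=0.2, chroma_outside=0.5)

# Outside pixel, anti-aliased edge pixel, inside pixel.
predicted = predict_chroma([0.5, 0.65, 0.8], slope, luma_ref=0.5, chroma_ref=0.5)
print(slope, predicted)
```

The anti-aliased pixel gets a sensible intermediate chroma value for free, which is why the technique handles edges well. The cost is that every block touched by the edge has to carry its own copy of the slope.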

JPEG XL compensates for this problem by representing the local chroma-from-luma slope as a low-resolution 2D image, which is then recursively compressed as a lossless JPEG XL image. This is similar to your idea of using PNG-like compression (delta prediction, followed by DEFLATE).

Of course, since you're capable of rediscovering the state of the art, you're also capable of improving on it :-)

One idea would be to write a function which, given a block of luma pixels, can detect when the block contains two discrete luma shades (e.g. "30% of these pixels have a luminance value close to 0.8, 65% have a luminance value close to 0.5, and the remaining 5% seem to be anti-aliased edge pixels"). If you run an identical shade-detection algorithm in both the encoder and decoder, you can then code chroma information separately for each side of the edge. Because this would reduce edge artefacts, it might enable you to make your quadtree leaf nodes much larger, reducing your overall data rate.
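A minimal sketch of what such a shade detector might look like, using a few rounds of one-dimensional 2-means clustering. The thresholds (`tolerance`, `max_edge_fraction`) and the clustering approach are assumptions of mine; I haven't tested this on real images. A real codec would also want integer arithmetic, so that the encoder and decoder are guaranteed to agree bit-for-bit.

```python
def detect_two_shades(luma, tolerance=0.05, max_edge_fraction=0.25, iterations=10):
    """Try to split a block of luma samples into two discrete shades.

    Returns (low_shade, high_shade, labels), where each label is 0 (close to
    the low shade), 1 (close to the high shade) or None (probably an
    anti-aliased edge pixel). Returns None if the block doesn't fit the model.
    """
    lo, hi = min(luma), max(luma)
    if hi - lo <= 2 * tolerance:
        return None  # effectively one flat shade, nothing to separate

    # 1D 2-means: assign each sample to the nearer centre, then re-average.
    for _ in range(iterations):
        low_group = [v for v in luma if abs(v - lo) <= abs(v - hi)]
        high_group = [v for v in luma if abs(v - lo) > abs(v - hi)]
        lo = sum(low_group) / len(low_group)
        hi = sum(high_group) / len(high_group)

    labels = []
    for v in luma:
        if abs(v - lo) <= tolerance:
            labels.append(0)
        elif abs(v - hi) <= tolerance:
            labels.append(1)
        else:
            labels.append(None)

    # Too many in-between pixels means this is a gradient or texture, not an edge.
    if labels.count(None) > max_edge_fraction * len(luma):
        return None
    return lo, hi, labels


# Hypothetical 64-pixel block: two shades, plus three anti-aliased pixels.
block = [0.8] * 19 + [0.5] * 42 + [0.65, 0.7, 0.6]
print(detect_two_shades(block))
```

Because the function only looks at decoded luma, the decoder can run it too, so the labels themselves never need to be transmitted. Only the per-shade chroma values would be coded.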


Thanks for the feedback, and the interesting ideas. It's good to know that I was on to something and not completely off :-)

I'm mostly doing this for learning purposes, but a hidden agenda is to create a low-latency codec that can be used in conjunction with other codecs that deal primarily with luma information. AV1 and friends are usually too heavy in those settings, so I try to keep things simple.


There was a constraint - since 2009, the Joint Photographic Experts Group had published JPEG XR, JPEG XT and JPEG XS, and they were probably reluctant to break that naming scheme.

They're running out of good options, but I hope they stick with it long enough to release "JPEG XP" :-)


JPEG XP would have been a nice name for a successor of JPEG 2000, I suppose :)

There's also a JPEG XE now (https://jpeg.org/jpegxe/index.html), by the way.


Incidentally, JPEG Vista would be thematically appropriate.


They can tack on more letters, or increment the X, as required.


Good one - made me and a coworker both LOL (in the literal sense) :D


JPEG ME


The rods are only active in low-light conditions; they're fully active under the moon and stars, or partially active under a dim street light. Under normal lighting conditions, every rod is fully saturated, so they make no contribution to vision. (Some recent papers have pushed back against this orthodox model of rods and cones, but it's good enough for practical use.)

This assumption that rods are "the luminance cells" is an easy mistake to make. It's particularly annoying that the rods have a sensitivity peak between the blue and green cones [1], so it feels like they should contribute to colour perception, but they just don't.

[1]: https://en.wikipedia.org/wiki/Rod_cell#/media/File:Cone-abso...


Consider myself educated, thanks!


Protanopia and protanomaly shift luminance perception away from the longest wavelengths of visible light, which causes highly-saturated red colours to appear dark or black. Deuteranopia and deuteranomaly don't have this effect. [1]

Blue cones make little or no contribution to luminance. Red cones are sensitive across the full spectrum of visible light, but green cones have no sensitivity to the longest wavelengths [2]. Since protans don't have the "hardware" to sense long wavelengths, it's inevitable that they'd have unusual luminance perception.

I'm not sure why deutans have such a normal luminous efficiency curve (and I can't find anything in a quick literature search), but it must involve the blue cones, because there's no way to produce that curve from the red-cone response alone.

[1]: https://en.wikipedia.org/wiki/Luminous_efficiency_function#C...

[2]: https://commons.wikimedia.org/wiki/File:Cone-fundamentals-wi...

