Hacker News | rmunn's comments

What I've done is ask the LLM a question like "How do I configure EF Core in this particular way?", then when it tells me the answer, I go and look up that function in the EF Core docs and learn by reading the docs. (Which also tells me whether it's correct or hallucinating; one time the LLM told me "You can do X like this" and the documentation said "We don't yet support doing X, but in a future version you'll be able to do it like this"). Here, I'm using the LLM to compensate for the fact that MSDN search is awful and the bits of info you need are scattered across three different articles, none of which link directly to each other.

Same here on not spotting Emacs; I was too focused on the fact that the commands looked right. (A `ps` to find out which process it was, then a `kill -9` of the PID.) It was nice to see realistic Unix commands rather than Hollywood Hacking, for once.

At some point the subscription model is going to become unsustainable for the frontier companies to continue (we just saw that happen with GitHub Copilot), and they will move everyone to a pay-per-token model. And then everyone will suddenly discover that they can get so much more value out of locally-hosted models, and they'll be willing to pay the $50,000 (or whatever) upfront on hardware to host it. (Not most individuals, obviously. But most companies can probably afford to spend that much on hardware if they think they'll benefit long-term). That's going to put a serious crimp in the frontier companies' ability to continue as they have been.

I don't know when that will happen, but I don't think it'll be more than a decade. Maybe 3-5 years. (Though you shouldn't take my word for it: I was predicting the dotcom bubble bursting in 1998 and it lasted at least two years longer than I would have predicted).

EDIT to clarify: I don't mean "in 1998, I was predicting the dotcom bubble would collapse and I was right". I mean "I was predicting that 1998 would be the year the dotcom bubble would collapse, and I was off by at least two years".


GitHub Copilot's challenge is that they weren't selling access to their own models, they were selling access to models from OpenAI and Anthropic which they presumably had to pay list price for (or maybe a slightly reduced rate that they negotiated).

They also had a pricing plan which they had designed pre-coding-agent, when it was rare for a single prompt to burn $10+ of tokens in an agent loop.

OpenAI and Anthropic are at least selling their own models directly, so they can discount a whole lot more since there's no-one else getting compensated in the middle.


> At some point the subscription model is going to become unsustainable for the frontier companies to continue (we just saw that happen with GitHub Copilot), and they will move everyone to a pay-per-token model.

From what I understand, Enterprise (above 150 seats, I think?) already has to pay per-token pricing.

Subscriptions are the premium "free tier" marketing of the AI world, so that employees can collectively request their large enterprise to subscribe to Claude, Codex, or Cursor, and presumably be billed at per-token prices then.


Great article, until I got to the last paragraph where he claimed "Fable is arguably smarter and hence more suspicious of potentially malicious instructions". Arguably smarter, I have no problem with. But he's making a category error in jumping from there to "more suspicious of potentially malicious instructions". That doesn't follow at all; the word "hence" is incorrect.

To use D&D scores as an analogy, LLMs have an INT score of 20 and a WIS score of 0. Not even 1, zero. They will follow any instruction given to them. The only reason they reject certain instructions, like "tell me how to build a nuclear weapon", is because they have instructions baked into the model telling them "you are not allowed to disclose how to build weapons, or how to recreate your model, or (laundry list of other things the trainers have decided to put guardrails around)". It's not the model's intelligence that is causing it to reject malicious instructions, it is the guardrails put into place before the model was released to the public.

LLMs are not human, and do not think the way that humans do. The fact that they can put together words that sound like what a human would write often makes us forget that they aren't human. But they have only intelligence, they do not have wisdom. It's hard to define in formal terms the difference between those two, but most people know there's a difference. The old joke is a pretty good summary of the difference: "Intelligence is knowing that tomatoes are a fruit. Wisdom is knowing that tomatoes don't belong in a fruit salad."

It takes wisdom, not intelligence, to discern whether a set of instructions is malicious. Are you being asked to hack this machine as part of an authorized pentest? Or are you being social-engineered into thinking it's an authorized pentest, but actually the person requesting you to do it doesn't have permission? That's something where you need to apply wisdom, to notice the clues that will tell you "This guy is acting a little bit off, maybe I'd better pick up the phone and call someone to check if he's telling the truth." The only way the LLM will know to do that is because of the guidelines and guardrails programmed into it; it doesn't have the lived experience to acquire wisdom and figure those things out for itself.

INT 20, WIS 0. Keep that in mind. (And always sandbox your agents).


One of the big mysteries of the last few years is this: considering how serious prompt injections are as a vulnerability class, why haven't we heard more stories of them being actively exploited in the wild?

(The best one I can think of is probably that recent Instagram account takeover hack, but that was so stupid it hardly even qualifies as a prompt injection!)

Having spent a bunch of time trying to build out examples of prompt injections, my current best guess is that the leading models are actually surprisingly good at spotting them.

I've had to drop back to smaller, weaker models for demos recently - it's definitely possible to prompt inject a frontier GPT or Claude but it's frustratingly difficult. I don't have the patience to figure it out myself!

So yeah, I do think it's likely that Mythos/Fable are "safer" than other models because they're better at spotting when they're being subverted.

That certainly doesn't mean that they're safe!


Go to GitHub and look for model jailbreaks on the NEWEST models. Try them out. You'll be surprised by the results.

You're correct that it's gotten substantially harder to social engineer frontier models (I can only reliably do it to Opus <=4.6), but there are some techniques that seem to consistently work (hint: extremely large complex prompts, context with tons of malicious files mixed into ordinary context).


> They will follow any instruction given to them.

They can ignore instructions which are silly/contradictory/underspecified to compensate for the possibility the user made a mistake. Don't ask how I know.


Right. Sometimes all you need is to edit a couple lines in a config file and get out, in which case hjkl, i/a, and Esc (and then :wq) are all the editor really has to implement. (And a few more movement tools like w/b and so on). Plugins? Colorschemes? You don't need 'em to edit a couple lines in a config file. (I'll grant that syntax highlighting that makes the comments a different color from the actual lines can be helpful, but if it comes at the cost of a much larger binary it's not always worth the cost on those resource-constrained systems).

Article published in the Summer 2001 edition of California Management Review, yet it never mentioned Y2K, the first thing I thought of when I read the line "fixing problems that never happened". Perhaps it was actually written in 1999 and took a while to get published, because otherwise that seems a very strange omission. The Y2K problem was very much over-hyped by the American news media at the time (no, at no point would airplanes have been falling out of the sky — I literally heard someone say that would happen once — even if no effort had been put into fixing the bug).

But in recent years I have seen people (elsewhere, not on HN) claim that Y2K was a big nothingburger, and all the money spent on fixing the bug was wasted. No, that's not true either. All the money spent on fixing the bug was why it turned into a big nothingburger. Sure, some of that money was wasted, by executives who wanted an "official" Y2K-certified certificate, issued by a consulting firm that had nothing "official" about it except their own say-so. And so they spent $2 million learning what their own employees could have told them for $2,000. THAT money was wasted. But a lot of banks were running old COBOL code that used 2-digit years, and needed to be fixed. The fact that in January 2000, everyone's bank interest was still calculated correctly, and not calculated as if it was January 1900? THAT was entirely due to the vast amounts of money spent paying old COBOL coders to come out of retirement and fix the 2-digit years.
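To make the 2-digit-year failure concrete, here is a minimal sketch in Python rather than COBOL, assuming years are stored as two digits and elapsed time is computed by plain subtraction. It also shows "windowing", one common style of Y2K remediation, where two-digit years below a pivot are read as 20xx. The pivot of 50 and all function names are illustrative assumptions, not what any particular bank used.

```python
def years_elapsed_naive(start_yy: int, end_yy: int) -> int:
    """Elapsed years when both years are stored as two digits (the Y2K bug)."""
    return end_yy - start_yy


def expand_year(yy: int, pivot: int = 50) -> int:
    """Windowing fix: two-digit years below `pivot` are 20xx, the rest 19xx.

    The pivot of 50 is an arbitrary illustrative choice.
    """
    return 2000 + yy if yy < pivot else 1900 + yy


def years_elapsed_windowed(start_yy: int, end_yy: int, pivot: int = 50) -> int:
    """Elapsed years after expanding both two-digit years with the window."""
    return expand_year(end_yy, pivot) - expand_year(start_yy, pivot)


# A deposit opened in 1995 ("95") and checked in January 2000 ("00").
print(years_elapsed_naive(95, 0))     # -95: "00" is effectively read as 1900
print(years_elapsed_windowed(95, 0))  # 5
```

Any interest calculation fed that -95 goes wrong in exactly the way described above. Windowing only postpones the problem until the pivot year comes around; widening the stored field to four digits is the more thorough fix.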

The lesson I learned from that is that it's possible for a problem to be overhyped, even massively overhyped, and yet still be a serious problem. The other lesson I should have learned is that people rarely get credit (I won't go so far as the article authors and say "nobody ever gets credit") for fixing problems that never happened.


The problem is that a lot of people have a very binary view on life. Either something is a complete success or a complete waste of money; rarely do we accept that most projects fall somewhere in the middle.

That binary view mostly holds, except for events or problems people are themselves familiar with. There is a term for this, but I can't for the life of me remember it: people think the problems they are dealing with are infinitely more nuanced, complex and unique than the problems other people are dealing with.

And even worse, they don't think probability is a thing. If something happens, it was certain to happen and we just failed to predict it correctly.

So when someone predicts something will happen with a 90% probability, and then the 10% chance happens and the predicted event does not happen, people will talk about what a bad prediction that was and how they were clearly wrong.

It's the same logic that causes people to say vaccines don't work because they don't stop a disease with 100% effectiveness, or that there's no point in wearing a seatbelt because people still die while wearing one.
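A quick way to see why one miss doesn't make a 90% forecast "bad": simulate many independent events that each really do happen with probability 0.9. A perfectly calibrated forecaster who says "90%" every time is still wrong about a tenth of the time. This is a toy simulation; the trial count and seed are arbitrary choices.

```python
import random


def miss_rate(p: float, trials: int, seed: int = 0) -> float:
    """Fraction of trials in which an event with true probability `p` fails to happen.

    A forecaster who says "p" every time is perfectly calibrated here,
    yet is "wrong" on exactly these trials.
    """
    rng = random.Random(seed)
    misses = sum(1 for _ in range(trials) if rng.random() >= p)
    return misses / trials


rate = miss_rate(0.9, 100_000)
print(f"{rate:.3f}")  # close to 0.100
```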


My issue with this version of explaining the lack of severity of Y2K is that there were lots of countries that were being derided for not taking the issue seriously but did not seem to suffer any ill effects.

This is interesting, do you have any links?

A couple of possible confounding factors I can think of:

1. Plenty of countries use software developed elsewhere.

2. I suspect that the more recently you computerised your economy, the less likely it would be to have code vulnerable to Y2K.


It's also possible that in some places there were a few issues, but people looked at bills for 100 years of electrical service and said "Yeah right," and fixed the now-easier-to-find code that still used 2-digit dates. If that only happened a few times, the extra work involved in working out the January bill by hand (or waiting until February then billing for 2 months) wouldn't cause too many issues in the economy, and anyone looking in from outside wouldn't even realize there had been an issue. If it happened everywhere the economic impact would be more noticeable from outside.

>no, at no point would airplanes have been falling out of the sky

The assertion may have been unfounded, but I think it's just as unreasonable to assert the opposite. Bugs have cascading effects and in a sufficiently complex piece of software they can create chaos with unpredictable outcomes.


In the one case I'm aware of where a software glitch did cause a plane crash, pilot error compounded the problem. Air France flight 447 was an Airbus A330 flying from Rio de Janeiro to Paris, and while high over the Atlantic, the software recorded inconsistent data in its airspeed measurements. (The official crash analysis team concluded that the inconsistent data was likely due to ice crystals blocking the pitot tubes on the plane). The inconsistent data made the autopilot disengage. Pilot error then caused a stall. One pilot then tried the correct move to recover from a stall, pushing forward on the stick to nose down and regain speed. The other pilot was pulling up on the stick to stop the dive, not realizing that that's exactly the wrong thing to do in a stall (or more likely forgetting his training due to panic; he had a lot less experience). The flight software, receiving conflicting inputs from both controls, added them together, so the opposing inputs cancelled each other out. (It also sounded the "Dual Input" alarm, but the pilots were too preoccupied with their own controls to figure out what that meant at first, and by the time they figured out what was going on it was too late to recover before the plane hit the water).
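A toy model of why opposing stick inputs can cancel out: this Python sketch combines two pilots' pitch commands, each normalized to the range -1.0 (full nose down) to +1.0 (full nose up), either by averaging or by summing and clamping to one stick's full travel. The normalization and both combination rules are simplifying assumptions for illustration, not Airbus's actual flight-control law; the point is only that with exactly opposite inputs, either rule yields no net command.

```python
def clamp(x: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Limit a command to the range of a single stick's full travel."""
    return max(lo, min(hi, x))


def combine_average(a: float, b: float) -> float:
    """Net pitch command if the two sticks' inputs were averaged."""
    return (a + b) / 2


def combine_sum_clamped(a: float, b: float) -> float:
    """Net pitch command if the inputs were added, capped at full deflection."""
    return clamp(a + b)


nose_down, nose_up = -1.0, 1.0  # one pilot pushes, the other pulls, both fully
print(combine_average(nose_down, nose_up))      # 0.0
print(combine_sum_clamped(nose_down, nose_up))  # 0.0

# The two rules differ when the inputs don't oppose each other exactly:
print(combine_average(0.5, 0.5), combine_sum_clamped(0.5, 0.5))  # 0.5 1.0
```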

https://news.ycombinator.com/item?id=4224707 has some discussion of the events, including the fact that the control design (where each pilot has an independent stick) was part of the problem. On a design like Boeing's, where both sets of controls move together, the experienced pilot would have noticed the less-experienced pilot pulling up on the stick because his own stick would be moving, and he would have said "No, nose down." And if they had nosed down to recover speed while still high enough in the air, they almost certainly could have regained control of the plane and saved 228 lives (including their own).

So in retrospect, I think my first sentence was wrong. The software did not glitch, it did exactly what it was supposed to do. It was pilot error that caused the initial stall, and multiple pilot errors that caused the failure to recover from the stall.

There may be examples of software errors that have caused planes to fall out of the sky, but I don't know of any. The only plane crashes whose causes I know of were due to hardware failure or pilot error, usually a combination of the two.


I think your conclusion is upside down. Air safety is based on the "Swiss cheese" model. Multiple layers of safety nets are in place to compensate for issues in one layer. In particular, technical safeguards are there to prevent disasters if the human in the loop makes a mistake which will eventually happen. Any weakening of any technical safeguard makes the system less safe. No matter if the human ultimately made a mistake -- the technical system failing contributed to the accident just as much.
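Under the simplifying assumption that the layers fail independently (real failures are often correlated, which is part of why the model talks about holes "lining up"), an accident requires every layer to fail at once, so its probability is the product of the per-layer failure probabilities. The sketch below uses made-up per-layer numbers purely to show how weakening a single layer scales the overall risk.

```python
from math import prod


def accident_probability(layer_failure_probs: list[float]) -> float:
    """P(every layer fails at once), assuming the layers fail independently."""
    return prod(layer_failure_probs)


# Hypothetical per-layer failure probabilities (illustrative only).
layers = [0.01, 0.05, 0.1]  # e.g. sensors, automation, crew
baseline = accident_probability(layers)

weakened = layers.copy()
weakened[1] = 0.5  # one technical safeguard becomes 10x less reliable
print(accident_probability(weakened) / baseline)  # about 10: the whole system is ~10x riskier
```

Under this model, making any one layer worse multiplies the overall risk, no matter which layer ends up being blamed.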

Y2K is especially interesting because the fact that the year 2000 would one day occur was entirely foreseeable, and no less probable in 1990 than in 1999. I can hardly think of anything with closer to 100% probability of happening.

To be fair, there was a non-zero chance that society could have ended (or your company, or the tech became obsolete) before 2000, which would be higher the earlier before 2000 you were.

The tech being obsolete is why Y2K was a smaller problem than it would have been otherwise. Most places were no longer running much COBOL code. But banks are famously slow to upgrade their tech, and for good reason much of the time, so most of the world's remaining COBOL code (and other code too, COBOL is just what I'm most familiar with, not that I'm all that familiar with it) was in banks and other financial institutions.

Year 2038 says hi.
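For anyone who hasn't met it: systems that store Unix time as a signed 32-bit integer run out of room one second after 2038-01-19 03:14:07 UTC, and the counter wraps around to a date in December 1901. This Python sketch simulates the 32-bit wraparound arithmetically, since Python's own integers don't overflow; the helper names are just for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def to_int32(n: int) -> int:
    """Wrap an integer the way a signed 32-bit two's-complement counter would."""
    return (n + 2**31) % 2**32 - 2**31


def as_utc(seconds: int) -> datetime:
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


print(as_utc(INT32_MAX))                # 2038-01-19 03:14:07+00:00
print(as_utc(to_int32(INT32_MAX + 1)))  # 1901-12-13 20:45:52+00:00
```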

My first thought too. I've met a few people who assert that Y2K was a complete waste of money.

I earned my first house deposit helping the team that was fixing the systems of the water and gas company in Wales, UK. Their entire system was running off a set of COBOL programs on a mainframe, none of which had been properly documented over the years, and the whole thing used 2-digit dates. It would have caused actual deaths if not fixed; everything would have shut down, and no water and no heating in a British winter is potentially lethal. And then it would have sent everyone in Wales a bill for 100 years of water and gas.

They were bribing retired software devs to come out of retirement with huge stacks of money, because that was cheaper than training new COBOL devs and getting them familiar with the spaghetti system.

It worked, no-one died, life went on. So obviously it was all fake rolls eyes


I'm curious why things would have shut down when the system thought it was 1900. What part of the logic had the effect of "shut the system down if current date is less than (X date)?" (If you can remember the code 25+ years later, that is).

I only worked with the team making changes to the billing system (and even then, I only maintained a database of code modules, who worked on them, and what changes had been made - this was before git and we did version control painfully). As you can imagine, the billing system was definitely not going to survive the date suddenly being 99 years older than it was last month. So I don't really know why the rest of the system would fail.

But the project management team were extremely careful about only changing parts of the system that needed to be changed. Partly so that the scope was contained and second-order effects limited, and partly because the people making the changes were being paid vast sums to do this, and any reduction in work was saving real money. So when they say that it would all have stopped if the work wasn't done, I believe them ;)


waves Vim and spaces over here as well. Emacs is a great tool, I just can't stand its keyboard shortcuts. In theory I should try out evil mode, but I've got NeoVim configured how I like it and I don't want to spend the time on switching.

As for tabs vs spaces, in theory, tabs are more flexible than spaces and allow everyone to view the file with their preferred indentation levels. In practice, I have only seen one codebase — ONE — in all my years of programming that was using tabs and yet did not end up with spaces getting mixed in with those tabs at some point along the way. (In the indentation, I mean: obviously once the non-indentation part of the line starts, you want spaces there). And that codebase had precisely two people committing regularly to it. Occasional PRs from other contributors, but only two primary maintainers.

Every other tab-using codebase I've seen (of non-trivial size and complexity, that is), someone, somewhere, had been lazy, or had a misconfigured editor, or something, and spaces snuck into the tabs. The worst offender I ever saw was a file that had been edited by multiple people over the years, who must have had different tab settings in their editors. There was one section where they had tried to line up a bunch of variable assignments and values. (Yes, I know, bad idea, but stick with me for a minute, I'm getting to the punchline). None of the pieces of code that were supposed to line up were actually lined up. (This was C# code, so indentation didn't truly matter like it would in F#, or Python, or ... well, I won't list all of them since I'm trying to get to the point). Here's the really hilarious part. I tried all sorts of tab settings to see if I could get that file to line up. I tried 8. I tried 4. I tried 2. I even tried 3, the setting for the people who can't make their minds up between 4 and 2. Then I tried really oddball settings like 16, 5, or even 7. Nothing worked. There was no tab-size setting I could use that would make the code line up. Which entirely negates the whole point of tabs, that you can set your own indentation.
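Python's `str.expandtabs` makes that experiment easy to reproduce. In this sketch, three hypothetical lines were "aligned" by people with different editor habits (a tab, a space, two tabs); checking every tab size from 1 to 16 finds none at which the `=` signs line up, while a pair that consistently uses one tab after the name lines up at any tab size of 4 or more. The function names are made up for the example.

```python
def equals_columns(lines: list[str], tabsize: int) -> list[int]:
    """Column of the first '=' in each line after expanding tabs to `tabsize`."""
    return [line.expandtabs(tabsize).index("=") for line in lines]


def aligning_tab_sizes(lines: list[str], sizes=range(1, 17)) -> list[int]:
    """Tab sizes (from `sizes`) at which every line's '=' lands in the same column."""
    return [n for n in sizes if len(set(equals_columns(lines, n))) == 1]


# Hypothetical lines typed with different editor settings:
# one used a tab, one used a space, one used two tabs.
mixed = [
    "a\t= 1",
    "bb = 2",
    "c\t\t= 3",
]
consistent = ["a\t= 1", "bbb\t= 2"]

print(aligning_tab_sizes(mixed))       # []
print(aligning_tab_sizes(consistent))  # [4, 5, 6, ..., 16]
```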

That was the day I said "Forget about tabs, just use spaces, you won't have that problem with spaces." Tabs have great promise, but in practice, in my experience at least, you end up having to tell your colleagues "hey, you need to set your tabs to 4" (or 8, or 2, or ... well, you better not be using any other numbers) "before editing this file". Which basically negates the whole point of tabs. They're great in theory, but I've only seen ONE codebase that made them work in practice.


I’ll overlook your poor choice in editors. :-) The rest of this is spot on. Yea, that’s bad. A project like that screams for a code formatter to be plumbed into a pre-commit hook. Like you say, I’ve never really found a great argument for tabs either, and with more and more indentation-sensitive languages, any argument to the effect that programmers can choose their own indentation really goes out the window. These days, I program in Clojure with Emacs’s aggressive-indent-mode minor mode. It re-indents the whole top-level function you’re working in every time you pause typing. It keeps everything properly indented pretty much all the time. Sometimes your commit diffs will be larger, though.
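Short of wiring in a real formatter, the cheapest possible guard a pre-commit hook could run is a check like the one sketched here, rejecting files whose indentation mixes tabs and spaces. The function names and behavior are hypothetical; an actual hook would also need to read the staged file list from git and exit non-zero on failure.

```python
import re

LEADING_WS = re.compile(r"^[ \t]*")


def mixed_indent_lines(text: str) -> list[int]:
    """1-based numbers of lines whose leading whitespace mixes tabs and spaces."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        indent = LEADING_WS.match(line).group()
        if " " in indent and "\t" in indent:
            bad.append(number)
    return bad


def indent_styles(text: str) -> set[str]:
    """Which characters start indented lines anywhere in the file ("tab", "space")."""
    styles = set()
    for line in text.splitlines():
        if line.startswith("\t"):
            styles.add("tab")
        elif line.startswith(" "):
            styles.add("space")
    return styles


sample = "class Foo {\n\tint a;\n\t    int b;\n    int c;\n}\n"
print(mixed_indent_lines(sample))     # [3]
print(sorted(indent_styles(sample)))  # ['space', 'tab']
```

The first check only catches mixing within a single line; the second catches a file where some lines start with tabs and others with spaces (lines 2 and 4 of the sample), which is the slower-burning version of the problem.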

> you end up having to tell your colleagues "hey, you need to set your tabs to 4"

Vim's modelines (and Emacs' equivalent) can solve this part at least, no?


Yes, but then how many different editors are people using? What about VSCode? SublimeText? Etc. You’re just trading one problem for another.

These days .editorconfig compliance is everywhere, thankfully, so there's a cross-platform solution. At least for that bit. (Note that while usually "cross-platform" means OSes, this time it means editors).

Just yesterday I stumbled across an article from 2005 titled "Why Ruby is an acceptable LISP": https://www.randomhacks.net/2005/12/03/why-ruby-is-an-accept.... I don't agree with all of his points about macros, e.g. I think his line about "The most common use of LISP macros is to avoid typing lambda quite so much" is simply incorrect. But his point about how Ruby allows building DSLs, and so it gives you quite a lot of what you want from Lisp macros, is broadly correct, I think.

And now it's more clear to me why that is.


Having skimmed the article, I think he's correct about the most common use of macros (by far the single most common type of macro I write in CL is a body-to-lambda transformation, though being able to tweak the sugar makes a difference too), but then I think he kinda equivocates in implications between “80% of the usage” and “80% of the impact”. I also think Ruby DSLs cover a big chunk of that last gap (and it sounds like you might agree with me). Part of the classic Lisp Curse is that easy access to advanced metaprogramming indirectly increases social fragmentation, but part of the Blub Curse is that lack of access to advanced features causes people to have to solve the same dumb problems over and over again, so you lose efficiency and create different fragmentation. Having fancier metaprogramming functionality require a bunch of rigamarole but be possible to work through when you need it might plausibly hit a sweet spot in the middle there.

Now that I've read through the Common Lisp HyperSpec and realized that there's no standard shortcut syntax for lambda (you have to spell it out every time, or define l as lambda), I can see that yes, that is probably the most common use. I'm also starting to understand why Janet has the `short-fn` macro with special reader syntax for it: |(> $ 0) is short for (lambda (x) (> x 0)), which is SO much less typing. (It also handles multiple arguments: you can say |(> $0 $1) if for some reason you're allergic to just typing > to accomplish the same thing).

In fact, I think next time I'm writing Common Lisp code, I'm going to figure out how to create Janet's | as a reader macro.
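Here's a rough Python sketch of the expansion step such a macro performs, with Lisp forms modeled as nested lists of strings. It's an assumption-laden toy, not Janet's actual `short-fn` source: it only handles `$` and `$0`, `$1`, ..., treats `$` as an alias for `$0`, and emits a Janet-style `fn` form whose parameter list runs up to the highest index used.

```python
import re

ARG = re.compile(r"^\$(\d*)$")  # matches "$", "$0", "$1", ...


def arg_indices(form):
    """Yield the positional index of every $-symbol in a nested list form."""
    if isinstance(form, list):
        for sub in form:
            yield from arg_indices(sub)
    elif isinstance(form, str):
        m = ARG.match(form)
        if m:
            yield int(m.group(1) or 0)  # bare "$" counts as index 0


def rename(form):
    """Rewrite bare "$" to "$0" so every parameter has an explicit index."""
    if isinstance(form, list):
        return [rename(sub) for sub in form]
    return "$0" if form == "$" else form


def expand_short_fn(body):
    """Turn |(body) into (fn [$0 ... $k] body), where k is the highest index used."""
    highest = max(arg_indices(body), default=-1)
    params = [f"${i}" for i in range(highest + 1)]
    return ["fn", params, rename(body)]


print(expand_short_fn([">", "$", "0"]))
# ['fn', ['$0'], ['>', '$0', '0']]
print(expand_short_fn([">", "$0", "$1"]))
# ['fn', ['$0', '$1'], ['>', '$0', '$1']]
```

A Common Lisp reader macro would do the same walk over the form it just read, then return a `(lambda ...)` form instead of a list of strings.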


FWIW, the common utility library Serapeum offers ‘op’, which it says comes from GOO. It's a quite similar short positional-function macro, but because it isn't a reader macro, it has less potential for character clashes: (op (> _ 0)). (But don't let that stop you from recreating it if you wanted to do so for educational purposes!)

The problem (okay, one of the problems) with renting other people's models is, as you mentioned, that they can and will change out the model without notifying you ahead of time, and you don't always get to control which model you use. (They might decide to retire it, and you won't be able to get it back if they do).

Which is why (well, part of why) I think the long-term trend will be towards self-hosting models. Right now the frontier models are far enough ahead of the self-hosted ones that there are lots of people willing to pay by the token to rent someone else's model, because they get more value for money from that than from self-hosting models.

But the frontier companies won't be able to keep up their current levels of expenditure forever. At some point the investors are going to say "Hey, so, um, when am I going to see some return on my investment?" and then the current subsidized subscriptions (including the one my employer uses) are going to go away, much like what happened with Copilot this month.

And then the locally-hosted models are going to suddenly look a lot more attractive. Because where you might have been willing to spend $100/month/employee to rent time on models in someone else's data center, you might suddenly balk at spending $500/month/employee. You might say "Hey, you know what? A $50,000 up-front capital investment is only, what, one month's worth of subscriptions for our 100 employees? Yeah, okay, I'll approve the hardware purchase. Get that self-hosted model set up and then we'll cancel the subscription and switch over."
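The back-of-the-envelope arithmetic is simple enough to sketch: 100 employees at $500/month is $50,000/month, so $50,000 of hardware pays for itself in about one month, versus about five months at $100/month. The optional running-cost parameter (power, maintenance, staff time) is an added assumption, not a figure from the comment, and real planning would also have to weigh the model-quality gap and hardware depreciation.

```python
def months_to_break_even(hardware_cost: float, employees: int,
                         subscription_per_employee: float,
                         monthly_running_cost: float = 0.0) -> float:
    """Months until buying hardware costs less than renting (ignores financing).

    `monthly_running_cost` is a hypothetical knob; it defaults to 0 to match
    the back-of-the-envelope figure in the text.
    """
    monthly_savings = employees * subscription_per_employee - monthly_running_cost
    if monthly_savings <= 0:
        return float("inf")  # self-hosting never pays for itself
    return hardware_cost / monthly_savings


print(months_to_break_even(50_000, 100, 500))          # 1.0
print(months_to_break_even(50_000, 100, 100))          # 5.0
print(months_to_break_even(50_000, 100, 500, 10_000))  # 1.25
```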

Not everyone is going to do that. But once the locally-hosted models are good enough, the first few people who do so and report success are going to start a snowball effect. It will likely be driven by money first, but people will slowly discover a side benefit: you can better predict how the model you're using will behave. It will continue to work the same way next year that it is working this year; or if it doesn't, it's because you chose to install the new version.

And when that happens (I'm saying "when", not "if" because although it might take some time, I think it's inevitable in the long run), the frontier-model rental companies are going to struggle to stay afloat. Except for the ones who saw this coming and transitioned to a non-subscription income source somehow (maybe by selling licenses to self-host their frontier models for $$BIGNUM), or who have some other revenue stream besides renting out models.


Let's just stop this conversation right here before it derails into ideological battle.

No, I think we should definitely find a creative way to drag at least abortion and freedom of speech into this "conversation". Fight fire with fire, so to speak.

Well technically killing someone is just a really late abortion.
