Hacker News | gen220's comments

Actually anecdata I gather on my job from myself and coworkers is the only benchmark I trust anymore, because it so heavily diverges from the “benchmarks”.

That’s your call; just don’t expect anyone to ever take that seriously. It’s not like we don’t have exact evaluations like this.

I would encourage you to look into the open evals of some of these benchmarks (find one that actually is open-data, this is itself a good challenge), read the results generated and assess them for yourself.

This is what my coworkers and I (and many other people in this thread) are doing on a daily basis with real stakes and real tasks – which these benchmarks are all aiming to be a proxy for. There's a real, tangible [cost]benefit to [not] using the highest-ROI models and harnesses.

The people with real incentives and skin in the game are telling you that the data diverges from "the data".

I don't mind if you don't take it seriously, our jobs are more important to us than a benchmark is.

But I wouldn't opt-out of using your own eyes and the eyes of others so easily, especially when there are literally hundreds of billions of dollars in invested capital with an interest in a certain outcome... this is how you end up in "Emperor's New Clothes" situations.


Investigating your specific use cases, codebases, workflows and tasks is important. There is nothing wrong with this, and in fact it’s more important than benchmarks if you can do it well. But the point is that it is very hard to do well, and easy to totally fool yourself and go down a suboptimal path. I understand that people are going to do it regardless; I certainly do. And I have looked at more raw benchmark data than I can really even stomach; I can see annotation data in my dreams now.

The eyes and ears of others are incredibly important. But you still seem to think benchmarks are somehow part of some giant conspiratorial cabal. You have institutions without ANY skin in the game making extremely high quality benchmarks. Consider that in academia there is little else to do outside of partnerships with these companies. But benchmarks you can do completely independently and with university grant level money (it costs maybe $10-100k for a reasonable benchmark in many cases). Not only that, “real tasks” are what many benchmarks measure. You have these companies with extremely good logging and well scaled measurements to really look at what works and what doesn’t.


At this point I have a workflow that is fairly rote. I've yet to use a model newer than 4.6-1M-XHIGH that I trust to earn a higher ROI on that workflow, and not for lack of trying!

I personally don't believe in any sort of cabal (Occam's Razor hasn't let me down yet). Ultimately, I don't really care *why* they're wrong as much as I care *that* they have diverged from my rubber-meets-the-road measures of value.

That is concerning to me, because people are investing 100s of B's of capital based on the putative RoI putatively available to people like ourselves. When the benchmarks support this RoI thesis, but none of the anecdata does... that's really concerning!

Re: academics, I don't think any of the data academics have access to are good proxies for the work real people are doing. And for the data that are good proxies, the model labs certainly have access to the same data, and therefore the benchmark performance against those data is irrelevant.


A big part of the frontier labs' ability to charge 80% gross margins on inference is having the cornered resource of frontier models.

If that inference becomes popular and valuable enough that those companies make billions of dollars in profit, those companies could use that profit to fund the building of alternative products and platforms that dis-intermediate google's relationship with the customer.

Google already has an 80% gross margin business, the biggest one in the world. Everybody wants a slice of it.

By offering frontier inference closer to cost and open-sourcing everything that's sub-frontier, they're commoditizing frontier labs' models, which inhibits their ability to durably make high gross margins on inference.

It's a strategic play.


A 12B-sized model is a far cry from "frontier inference". That's more like DeepSeek V4 Pro territory which is a 1.6T model. Or for multi-modal models, Kimi 2.6 which is 1T.

at risk of quoting myself... :)

> By offering frontier inference closer to cost *and* open-sourcing everything that's sub-frontier

It's two prongs! One prong is that their frontier inference pricing is significantly cheaper/closer-to-at-cost than Anthropic's.

The subject of this thread is the other prong: offering compelling models that are sub-frontier and self-hostable.

Self-hosting models and at-cost frontier models are the low-end and high-end disruptions, respectively, to Ant/OAI/etc.'s business models.


Google needed an antitrust breakup about 10 years ago.

They need one more than ever now.

This is ridiculously anti-competitive.


This is literally competition

1. Google is dumping on the market to weaken OpenAI and Anthropic.

2. Every time you search for Claude or ChatGPT, you get presented with an AdWords bidding war.

3. Google is deploying its models in Search, Docs/Drive/Office, YouTube, Chrome, ...


1. This isn't dumping

2. I'm not sure what this has to do with the case, unless you're arguing Google has an ads monopoly, in which case the best argument would likely not be that adwords lead to bidding wars because that just sounds like they're selling a product people really want to pay for

3. There's nothing criminal about being a very diversified business


You're right that it's not literally frontier. But like recent Qwen releases, it is a lot more capable than anybody thought models of this size could be a year ago, like capable enough to set a ceiling on what you can charge for AI for certain applications. Others still clearly justify a stronger model, but this trend may continue, etc.

Don't think it's that.

Basically with upcoming spark laptops, the smaller models will likely get fine tuned to interface with google services. Then, Google can essentially make Chromebook software include those models, which is the same use case as android.

And you better believe that they will be collecting user data and building advertising models.


How many failed foundation model training run cycles do you think these companies can tank before the bubble pops and deepseek/etc. catch up to frontier quality?

If Ant, OAI, etc. aren't able to make 20-30% improvements on Opus 4.6 in 2026, does the music stop playing altogether? It seems like they'd lose their ability to charge >10% gross margin on inference in a span of 3-6 months.


It’s not easy to buy such a large tranche of shares at a fixed and fair price in a single transaction!

Both parties get something they want in this transaction. Alphabet gets the Berkshire halo effect and a guaranteed buyer of $10 billion worth of equities, Berkshire gets a large tranche of equity at a price they believe is fair.

I think they view Alphabet as their next Apple, and a relatively safe place to ride out whatever happens with AI: Alphabet is fairly well positioned for the upturn or the downturn, especially now with this expanded warchest of cash.


> why don't we try to protect human dignity and move towards a more humane future?

I hear and have a lot of respect for what you’re saying, but I’d like to propose that we thoroughly explore every other alternative first, just to make sure we aren’t missing out on something bigger and better and leaving anything on the table.

Sigh.


As a New Yorker I'm thrilled. LVT/Landlording Tax next pls :)

Edit: Actually, as a property tax on nonprimary residences, is this not also effectively a Landlording tax? Will my landlord's tax bill go up because he's not residing in my building, if my building is above the threshold assessed value of $1mm? Or are >$1mm "multi-family homes" (significant % of housing of New Yorkers in BK/Queens) exempt and this only applies to condos?


Will not the landlords eventually pass the expense of the new tax on to you, the tenants? They won't, like, dip into their savings to pay it, will they?

Eh, maybe for the more luxurious properties? But plenty of landlords are operating on tight margins and they’re not legally allowed to raise rent by more than some measure of inflation reported by the state each year.

But you’re right, the tax would have to be much more punitive to crossover into the red.

If it does make it more challenging to justify the business of being a landlord, I’m all for it though. Steps towards the end goal of more New Yorkers who want to own their primary residence actually owning it.


I'm curious to poll HN on this issue. Do you feel like we've had meaningful/noticeable gains in terms of your programming workflows between 4.5 and 4.7?

My 2¢, I personally feel like all of the productivity gains since 4.5's release (in November 2025!!) have come from improvements to the harnesses (cc, cursor cli, codex, opencode, whatever) AND from the context window expansion from 200k to 1M.

But the actual "raw" intelligence of the model / ability to make good decisions feels like it has plateaued since 4.5. 4.6 was maybe a small improvement, but hard to differentiate from in-context-learning with the 1M window. 4.7 if anything felt like a regression in wisdom for me and my coworkers, with it consistently making worse/lazier decisions.


For long-running tasks, yes, 4.7 has been a noticeable improvement. Goes off the rails a lot less than 4.6 does. For shorter-sized windows, I haven't felt as much, and agree that the harness improvements have been the biggest lever.

When doing big long-running workflows, especially with plan mode, 4.7 was a clear improvement. It’s considerably worse for underspecified tasks and responds to a couple of sentences with 10+ paragraphs in explanatory-type discussions.

Opus 4.7+ Max is a 10x engineer who wants to be left alone to work. When you talk to him, he infodumps on you to get you (his pointy haired idiot Dilbert boss) to go away.

OR they deliberately increased token usage to inflate pre-IPO numbers.

In my experience, 4.7 was a noticeable step down from 4.6.

I was one of those people for whom Claude would never finish anything and would just randomly say, "This is a good stopping point, I think you should go to bed."

And then I'd tell it to continue, and it would burn tons of tokens, make no progress and say, "This is a really good stopping point..."

Canceled and switched to Codex and have been pretty happy with it. It doesn't plan as well as Claude, but I think it does better implementation - and neither of them can actually come up with good plans without a ton of help...

Codex is also way faster.


To me 4.5 was mind-blowing, 4.6 noticeable, 4.7 more like a style/personality change regarding how much it asks back, how much it assumes, how eager it is to jump to action etc., but not really in terms of my perception of its smartness.

Yes. You and some random indigenous guy in the Amazon likely share the same intelligence but you are more capable because you have access to writing/reading, computer, car etc. Intelligence is more than raw intelligence. It's harness, skills, tools, memory etc. If you improve all the latter but keep the raw intelligence (LLM) fixed, you certainly get better results. Same with us humans.

Of course, I’m not trying to dismiss gains from harness, actually the opposite.

But the narrative that 4.Y is an improvement over 4.X is essential to keep the model training music playing.

If 90+% of the gains come from the harness, how can you continue to justify spending billions of dollars on training and an 80% gross margin on inference on the latest model? (Reportedly what Anthropic commands on the top tier of their frontier model API billing).

So differentiating between the two (what I’m trying to do here) is really consequential!


Except LLMs are simulacra of actual intelligence. Frequently in a single conversation working on a single narrowly scoped task, I am both surprised by a few insights and cursing at how it can miss obvious issues. The "raw intelligence" of LLMs leaves much to be desired.

I actually don't see any personal productivity improvements from using opus over sonnet for coding. If you're keeping tasks small and conversations short, reading the code and correcting before changes go in, whatever advantages opus has aren't practically significant. It's also just talky as hell, overexplains anything it touches and every token produced this way increases the surface area for hallucination so you need to have your guard up even more with it.

There's a sweet spot of complexity for low importance tasks where it's just big enough I don't want to do it and just simple enough to have opus plan/delegate/review with another model. So possibly model improvements will grow this window, but currently I don't do much in there.


They all feel, more or less, the same to me in terms of output capabilities. Mostly get simple things right, can get more complex things right with nudging, eventually get stuck hard on something that takes a bunch of iterations through it/logging/etc or me fixing the code manually.

For my day to day tasks 4.6 feels sufficient.

I have limited enterprise budget and Claude 4.7 costs 7x more. So unless there's close to 7x improvement, it doesn't make sense to switch to 4.7.

I actually gave 4.6 a really complex task. It kept on thinking for several minutes before I hit the brakes. I then gave 4.7 the same task, and didn't notice any difference in behavior. Clearly not worth the 7x premium.

I hope 4.6 becomes cheaper/free at some point because I'm starting to see a push towards optimizing token expenditures across the board. While frontier models are still the default for developing new workflows, everybody is starting to ask how to automate repetitive tasks without using tokens.


I'm actually currently studying this :)

Honestly... not that dramatically. Each release is much more marginal. And the quoted official benchmarks don't translate very well into the real world.

4.7 regressed hard in some ways. But a compounding factor too is that the claude code harness seems to nerf the model after a few months. Probably to reduce token use.

So far 4.8 seems less verbose but we'll see in practice what it translates into meaningfully.


4.6 felt a bit better than 4.5 but slower. 4.7 doesn't feel better than 4.6.

https://isgithubcooked.com

Normally I defend GH in the comments of these incidents but it’s been an impressively bad month by their standards, even when you filter for critical components and filter out sev-2’s and 3’s.


It's not physically possible to run post-mortems for issues at those rates.

They should install OpenClaw for that as well.


AI: The cause of, and solution to, all of your tech debt.

Perhaps best to simply declare indefinite-mortem

They should be running pre-mortems every morning at this point

The operational equivalent of pre-crime.

> It's not physically possible to run post-mortems for issues at those rates.

Not at all, you merely move the goalposts on which layer the "root cause" could actually come from! At that speed, it's always something short and sweet, whereas when you actually want to address things long-term, you need time to investigate organizational issues or whatever the actual problems stem from.

But you have half a day? "Post-mortem: Push X wasn't properly analyzed before deployment, in future more testing" and call it a day.


“A failure occurred. This was caused by something going wrong. Changes to operating guidelines have been instituted to ensure that things will not go wrong in the future unless we happen to do the same thing again.”

Of all the sites/graphs I've seen of GH outages, this one is the most striking IMO:

https://damrnelson.github.io/github-historical-uptime/

Unfortunately, it doesn't look like it's being updated with new data. But it wouldn't look any better for GH if it was.


FWIW, I'm not convinced that chart is necessarily an accurate representation of pre-acquisition reality. It would really surprise me if GitHub did not have a single sev-0 pre-acquisition, but it wouldn't surprise me if they were not formally captured and reported in a format that would make its way into their current status page's database.

Sure, but it isn't completely wrong either.

GH going down used to be quite rare. If it failed to load I'd spend a bunch of time trying to figure out what was wrong with my internet connection, just to read on HN that it was down for everyone.

This week GH failed to load and I automatically assumed it was a GH issue - just for it to be followed up a few minutes later by a marketing coworker complaining about internet connectivity. Turns out the office internet connection was dropping about 50% of all packets.

It is bad enough that business-side managers are noticing that GH issues are slowing work down. That would've been unimaginable a few years ago.



I wonder what the cause of this was? Microsoft Politics? Bureaucracy? Forced move to azure?

My guess would be an obnoxious and lethal mixture of all of the above.

Also AI mandates.

Wow, it seems that 100% of sev-3 ("critical") incidents in the last year (=365 days) have occurred between April 22, 2026 and now.

Is it possible that there has been a change in the way the data are collected/recorded that even partially accounts for this sudden onset?


One tangent, I believe sev-0 is actually "critical" (at least as how I'm used to reading it), and the higher you go the less critical something is.

IMO as a github-watcher, I think they changed their definition of what constitutes a sev-0 versus a sev-1, for the better. In particular, they had a few "sev-1"'s around the turn of the year that would be classified as sev-0's if they happened today.

Pre-4/22 GitHub sev-1 was a normal SaaS company's sev-0, imo. So I think their new system is more reflective of reality. My guess is that a few of their big customers bullied them to have more accurate SEV categorization.


Ah, thank you for the correction on sev-0.

To be clear, your observation that "they changed their definition of what constitutes a sev-0" is based just on your external observation of incidents and their designations, correct? I.e. they haven't officially released a statement saying they have changed their standards


Waves around it had to break eventually eh?

May has been filled with critical issues. It seems it's getting worse over time.


Yeah, but that's not really an excuse, is it? They offer a service, (some) people pay for that service and should therefore expect it to work. If GitHub cannot keep up with the growth then they could disable new account registrations or start reducing free tiers so people either use the free tier more mindfully or need to pay for usage-based products like Actions, which would allow GitHub to scale.

I mean it's an easy problem to solve when it's just speculating solutions. But there's a very possible reality where in 5 years guys are making YouTube video essays about the fall of Github caused by their "obviously stupid decision" to throttle access to people who were trying to use their service in record numbers, leaving opportunity for someone else to come in and take their lunch.

I don't envy their position of having to scale that fast on something that has to be instant and real-time. As far as I know, you can't do CDN/edge caching shenanigans with a remote git repository like Google can with a YouTube video. It's gotta always be reading/writing to the latest, single source of truth.


Sure, backseat commenting is easier and I wouldn't wanna be in charge at GitHub right now, but on the other side there's also a reality where we'd see video essays about GitHub's downfall because their reliability crashed so hard that businesses could not trust them and moved to competitors / self-hosted instances, which then meant fewer paid users to subsidize the ever-growing demand of the free users.

It’s not quite as cacheable as YouTube, but a lot of it is still pretty cacheable. Actions aren’t. Issues, wikis, and READMEs are. File views are. Most projects aren’t changing daily. The few that change daily aren’t changing hourly. There are a few that change constantly and would require constant cache updates. But the long tail is pretty static.

Yes it's potentially a write-heavy workload which also needs to be consistent aka the worst case scenario.

The easy solutions like caching and read replicas don't work and you're forced to go the route of sharding or similar techniques that have much more painful tradeoffs.

I'm not sure if that's why everything keeps breaking but at that scale write-heavy workloads are never going to be easy


They are highly responsible for all of that. They have diversified into a lot of random things instead of focusing on their core business. They have actively pushed people to use the service and its features more.

Think about the countless actions that have to run at almost every push and PR push! Also, remember that we used to use external services for "actions", and they basically killed the competition by offering their own CI actions at no cost to most users.

Also, they did a lot of reworks in the last years, not necessarily for the best like the PR diff page, and probably not in the most efficient way.


Not a valid excuse without knowing what their historical growth rate has been. And how much of the instability is load related.

GitHub has been publishing their growth numbers since at least 2016: https://octoverse.github.com/2016/

However, they have reported numbers along rather inconsistent dimensions. Like, historically they've focused on number of repos and users and later PR's and issues, and often catch-all terms like "contributions" which includes all of those + comments etc... but the number of commits alone (which apparently is the main culprit now?) has been mentioned very sporadically. This has made it hard to get a consistent sense of historical growth.

Without any other information, however, it is reasonable to assume that a 14x in commits is the prime candidate for instability. Especially since commits are write traffic, which is much harder to scale than read traffic. Plus every 3 - 5x increase in scale can reveal bottlenecks in your distributed systems that you never knew existed, so they probably have like 2 - 3 "generations" of bottlenecks to figure out!
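As a rough sanity check on the "generations" estimate: if a new bottleneck shows up each time load multiplies by some factor k, then total growth G crosses about log_k(G) of those thresholds. A minimal sketch, assuming the 14x commit figure quoted above and treating the 3 - 5x rule of thumb as exact (the function name is made up for illustration):

```python
import math

def bottleneck_generations(total_growth: float, growth_per_generation: float) -> float:
    """How many times load must multiply by growth_per_generation to reach total_growth."""
    return math.log(total_growth) / math.log(growth_per_generation)

# Assumed inputs: 14x total growth, one new bottleneck per 3x or 5x of scale.
for factor in (3, 5):
    generations = bottleneck_generations(14, factor)
    print(f"{factor}x per generation: {generations:.2f} generations")
```

Under those assumptions it comes out to roughly 1.6 to 2.4, so "about two generations" is the right ballpark, give or take.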


They are already cooked, as this has been happening ever since the Microsoft acquisition and it was run into the ground before 2023.

At this point you would get better uptime by just self-hosting your own GitLab, Forgejo or Codeberg instance instead of dealing with Github's unreliability.

There is no defending them with their clear neglect and carelessness of the platform.


If all you need is a repository, you don't even need any of these. You need SSH access to a server, and optionally, one of several web front-ends. Git comes with a CGI script that handles public anonymous checkouts via HTTP(S), although since nginx doesn't support CGI, integrating those is a little bit tricky as you need a FastCGI wrapper.

I moved most of my projects off GitHub to Forgejo and will be using Tangled too for public repositories. I don’t think people realize that if you self host Forgejo, you get 99% of the functionality of GitHub with zero of the limitations. Especially if you have the hardware to spare for CI runners. And if self hosting isn’t your thing you can always just use Codeberg and Tangled directly.

I’m working on an open source Forgejo browser called Joui. It’s coming along nicely, and is so much snappier than GitHub in every single way.


Is the “streak” days of continuous uptime, or of days with at least one downtime incident? I think it’s the latter :]

It looks like it is the number of consecutive days with no incident. If you look at 31 Dec 2025, that corresponds to an 8-day period with no incidents.

I guess that also means this year GitHub has not yet made it a single week without an outage of some kind.

It's a streak for continuous uptime, and yeah it is fairly depressing to imagine overseeing that :/
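For what it's worth, the "consecutive days with no incident" reading can be sketched like this. It is only an illustration of the definition, not the site's code; the function name and the incident dates are made up:

```python
from datetime import date, timedelta

def longest_clean_streak(start: date, end: date, incident_days: set[date]) -> int:
    """Longest run of consecutive days in [start, end] with no incident."""
    best = current = 0
    day = start
    while day <= end:
        # An incident day resets the streak; a clean day extends it.
        current = 0 if day in incident_days else current + 1
        best = max(best, current)
        day += timedelta(days=1)
    return best

# Hypothetical data: incidents on Jan 3 and Jan 10 within a 14-day window.
incidents = {date(2026, 1, 3), date(2026, 1, 10)}
print(longest_clean_streak(date(2026, 1, 1), date(2026, 1, 14), incidents))  # 6
```

Here the longest clean run is Jan 4 through Jan 9, so a year that never gets past 6 would mean no full week without an incident.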

Name one thing Microsoft didn't run into the ground post-acquisition

hey now, LinkedIn was terrible before Microsoft.

Java or Bedrock edition, and have you tried logging into your EntraID Microsoft Teams for Xbox account lately? Make sure to check the box to keep you logged in!

Last I heard UK Minecraft players aren't even allowed to talk anymore without ID verification.

And if someone makes a server that doesn't do the chat verification, Microsoft blacklists that server in the client-side server address textbox. This system was developed to destroy pay-to-win servers, but they're now applying it against servers that refuse to censor "fuck".

Not as bad as it is now. All I see are suggested posts from people I never connected with and those are full of instagramesque self-promoting banal vibes.

TBH, even LinkedIn seemed to provide me with posts advertising events that happened two weeks ago a bit less pre-acquisition.

GH was acquired by microsoft some eight years ago. It has been working quite well until recently.

People may have had complaints about functionality, features, commercial issues, but the thing used to at least have a decent uptime until recently.


Has nothing to do with Microsoft acquisition... AI usage has increased demand and load. More PRs, more Action runners, more of everything firing. GitHub just wasn't ready for the scale and are now having issues catching up with it as it continues to increase exponentially.

This is a convenient lie that GH likes to tell. Growth is nothing like exponential; it's at most 300% over several years according to their own public numbers (presented misleadingly on graphs).

But a couple of years ago they were crowing about how much work they were doing to prepare for “a billion developers”. If they had actually done that then the actual load from agents should have been no problem.


Is this growth in resource usage or growth in revenue? Because those numbers aren't necessarily coupled. I.e., most action runners are free.

usage

There was an X post in another thread under this post that showed all the standard usage numbers are way up: 14x, 2.1x, etc. And the OP hinted at the usage growth being non-linear for 2026.

Are you sure? Seems like they "completed" a migration about the same time all these problems started to become daily. https://www.theverge.com/tech/796119/microsoft-github-azure-...

Yeah, that and Microsoft has been slow to move the infrastructure to something that scales better to handle that load.

The more surprising part is that Microsoft hasn't figured out a way to manage/contain the AI-sourced traffic better so it doesn't create all these noisy-neighbor problems for non-AI usage/users.


Github's core platform doesn't really make that separation, anything a human can leverage on github an AI agent can as well, just faster and with heavier usage. End of day agents and humans are using the same services.

Sure, you still need to enable access to the same info, but it feels like bucketing the clients into

bucket1 = clients that were working just fine before (users and whatever automation they had in place)
bucket2 = AI clients that contributed to, if not flat out caused, the scale problems

then slowing down/limiting the bucket2 clients while keeping the bucket1 clients rolling as-is, is both doable and keeps existing customers happy while the underlying infra gets the scale/perf improvements needed to support AI clients at scale.
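To make the bucketing idea concrete, here is a minimal sketch of per-class rate limiting with token buckets. Everything in it is an assumption for illustration: the class names, the limits, and especially the idea that a client can be reliably labelled bucket1 or bucket2 up front, which is the hard part and is not solved here. It says nothing about how GitHub actually throttles.

```python
from dataclasses import dataclass

@dataclass
class TokenBucket:
    rate: float        # tokens refilled per second
    capacity: float    # maximum burst size
    tokens: float = 0.0
    last: float = 0.0  # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Hypothetical (rate, capacity) per class: generous for existing clients, tight for AI clients.
LIMITS = {"bucket1": (100.0, 200.0), "bucket2": (5.0, 10.0)}

class Throttle:
    def __init__(self) -> None:
        self.buckets: dict[tuple[str, str], TokenBucket] = {}

    def allow(self, client_id: str, client_class: str, now: float) -> bool:
        """One token bucket per client, sized by the client's class."""
        key = (client_class, client_id)
        if key not in self.buckets:
            rate, capacity = LIMITS[client_class]
            self.buckets[key] = TokenBucket(rate, capacity, tokens=capacity, last=now)
        return self.buckets[key].allow(now)

throttle = Throttle()
burst = [throttle.allow("agent-1", "bucket2", now=0.0) for _ in range(15)]
print(sum(burst))  # 10: the burst capacity is spent, the rest are rejected
```

The timestamp is passed in rather than read from a clock so the behaviour is easy to reason about; a real limiter would also need shared state across servers.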


MSFT is also forcing its subsidiaries to “lean into AI” so that they can fire people to cover for Satya’s bad investments

> It has been working quite well until recently.

I'm not sure how reliable the data is, but average uptime seems to have dipped measurably starting within a year of the acquisition, according to https://damrnelson.github.io/github-historical-uptime/


They moved to Azure. Nothing improves on Azure.

It also used to be run as an independent company with access to MS's resources.

Now it's a unit in their AI hype machine.


MSFT was pretty arms length for the first 5-6 years. I was honestly kind of impressed and it made my opinion of MSFT better. But then AI made it too attractive of a target and MSFT couldn't help but make it a place the former CEO wanted to leave (and it has been running headless for about a year now).

It's quite disappointing objectively, but I expected worse from MSFT.


I think Minecraft is still in good shape

I wouldn't know, somehow this game I bought maybe 15 years ago is no longer playable for me, my account was supposed to be migrated from Mojang to Microsoft or similar, but then that never happened or something, and trying to login now asks me to contact Microsoft support, which I've tried 3-4 times, never had anyone respond to me so who knows how the game is today? I stopped trying at this point...

Personally, once a game I own is yanked from my hands because of organizational decisions, that's when I stop considering the game "in good shape", but I'm sure the people who had to buy the same game a second time still enjoy it.


Yes, the account migration was a mess. Support response times were at least 30 days, if you ever actually received a response at all (I never did). I did buy the game a second time in order to play with my kids.

They deleted my account from 2010 because I didn't convert it to a Microsoft one. They baked an incredibly aggressive chat filter into multiplayer, even if you're not playing on official servers. They've added microtransactions for things that were previously free (skins, resource packs). They force you into their shitty, bloated, user-hostile launcher with adverts.

It's been nonstop content-slop since the acquisition. New mobs, new blocks, new items, new blocks, new items, new mobs, new mobs, new biomes. Some of them are good but the totality of adding a bunch of stuff has been to destroy the simplicity that was one of the draws of the original game. Now it's an exploration and niche-mechanics-exploitation game more than a virtual legos game. You don't go mining any more, you find trading loops with villagers.

This was happening to some degree pre-acquisition, but since the acquisition it's been this non-stop.

Some of it's good. The Nether and the oceans were really boring before their respective updates.

They should have called Minecraft "done" around the acquisition time and started on Minecraft 2.


Dave Cutler?

The UI of that page is so nice, should build a github competitor.

The user profile / contributions and PR UX is pretty much the entire "hub" product since git is a fully separate offline app.


> The UI of that page is so nice

Is it? Seems a text description of "Make a website outlining 'How cooked GitHub' is with a modern style" given to basically any LLM would produce exactly that UI and design. A human had no influence on literally any of that design, besides whoever selected the training data the LLMs were trained on.

I think most of us who've tried using LLMs for web-design can recognize that style and design at this point, regardless of model actually used.


Oh wow, I'm in the position to be able to give a peek behind the curtain of something (validly!!) critiqued as AI slop! Exciting.

I originally made the core data functionality of this site for myself because I was curious what the uptime stats for each service were (I build something that heavily depends on GitHub), and to viz the distribution/severity of those incidents, again per-service, over time.

It involved a lot of back-and-forth, and is not a one-shotter; maybe closer to 40-50 shots over maybe ~10 hours of human time. A couple memorable things that made it complicated, irrespective of the UI: sneaky bugs around double-counting time for overlapping incidents, no GitHub API for incidents so you need to puppeteer-scrape the backlog of incidents to get historical data. Although, you all are right to call out that the CSS was three shots, though, and it shows :) I thought it looked so cool in ~January 2026 and now it gives me the ick, too!
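For anyone curious what the double-counting fix looks like in principle, here is a minimal sketch, not the site's actual code. It assumes each incident is a (start, end) pair for a single service; the function name `merged_downtime` and the sample timestamps are hypothetical. The idea is to sort the incidents and merge any that overlap before summing, so a minute covered by two incidents only counts once.

```python
from datetime import datetime, timedelta


def merged_downtime(incidents):
    """Return the total time covered by at least one incident.

    `incidents` is an iterable of (start, end) datetime pairs.
    Overlapping or touching incidents are merged so shared time is counted once.
    """
    total = timedelta(0)
    current_start = current_end = None
    for start, end in sorted(incidents):
        if current_end is None or start > current_end:
            # Gap before this incident: close out the previous merged window.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the current window instead of adding a new one.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Hypothetical incidents on one day: the first two overlap by 30 minutes.
day = datetime(2026, 1, 15)
incidents = [
    (day.replace(hour=10), day.replace(hour=11)),
    (day.replace(hour=10, minute=30), day.replace(hour=12)),
    (day.replace(hour=14), day.replace(hour=14, minute=30)),
]

naive = sum((end - start for start, end in incidents), timedelta(0))
merged = merged_downtime(incidents)
print(naive, merged)
```

Summing the durations naively counts the 10:30 to 11:00 overlap twice; merging first does not.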

For people who are curious about how much direction went into the information architecture/presentation, it was fairly substantial. I wanted a contribution graph style viz and it took many turns to get it working the way I wanted. The swimlane viz for selected-day-incident visualization was also me, because I love swimlane graphs.

I ended up sharing it with some folks and they wanted to reference it, so I put it on a website. So it's jokey for sure, but I take my jokes seriously! I'm grateful that people have feedback on how it can be better functionally and visually :)


> Although, you all are right to call out that the CSS was three shots, though, and it shows :)

Totally, my comment was all about the styling and design literally, and is in no way a comment about the data or actual contents of the website, hope you didn't take it that way as well, as it does seem proper in that regard!

Thank you for sharing it, and even greater thank you for sharing the process behind building it, for me that's more interesting almost :)


The Bootstrap of the 2020s.

At least Bootstrap pages were readable ;)

Compared to near unusable pages that large organizations produce, yes this page is highly effective at conveying information. Who cares how it was produced?

> Who cares how it was produced?

Well, we're at least two people who care, since we were conversing about how good/bad the webdesign is, then you jumped in here :) If you don't care, why bother to reply to people who seemingly do care? What kind of conversation are you expecting here, "Yeah, do tooo"? :|


Could you explain why you care?

Why I care about understanding people who think differently than me? I don't know, always did, not sure exactly why, always been interesting to understand people's perspectives, especially when I personally feel differently, tends to help myself understand me better too, so it's basically a win-win to get people to explain their reasoning.

A lack of effort put into presentation is a signal for low quality. In the past, this might've manifested in using default Wordpress templates, and now it manifests as stock LLM templates instead. Can you make high quality content and present it on a stock template? Sure, you can, but without any prior reason to believe that you put more effort into the content than the template, I'm liable to believe the content is LLM-generated slop as well and therefore untrustworthy and not worth my time.

In this case, it appears more effort was put into content than presentation, which is a possibility and the creator is in the comments saying as much, but humans operate on heuristics by default. The majority of sites that look like this have been a complete waste of my time and I usually just click away at this point.


A lot of software engineers do still care how software is produced. That's a good thing!

People who make web sites care? Isn't this a place to talk about how tech things are made!?

> this page is highly effective at conveying information

Is it though? If the page is near unreadable?

* Almost pure-black background rendering every not-pure-white colour barely readable

* Dark-grey and low saturation colours used almost everywhere, for both fonts and other coloured elements (the orange cells in the calendar are the most readable thing)

* Thin fonts - coupled with the dark grey colours this just adds to the readability issues

* Yet another incredibly long info-dump of a page

And then as far as actual information:

* Vanity metrics as the main information, that is a lot of things with no context or historical information

* A lot of aggregates and rollups that aren't that useful

No, I haven't tried Reader Mode.

It's a good demo for UI state syncing though, I'll give it that.


What really grinds my gears is how easy it is to get better designs out of LLMs. But if you don't ask, you get the default.

as someone who doesn't know how to get better design out of LLMs, can you elaborate?

Have an opinion on the design, imagine something, then tell it to do just that, then iterate. It's when you're unspecific you get the generic, bland and typical LLM design, you just have to be subjective and influence it in some (human) direction.

Also check out https://impeccable.style/, it's really good

what would you ask to get a better design?

Here is a provocative thought - maybe these are the so-called "better designs" from LLMs? It's not like writing English sentences is some huge secret you are sitting on that no one else knows.

> It's not like writing English sentences is some huge secret you are sitting on that no one else knows.

I'd actually say what really makes an excellent engineer stick out among many great engineers is their ability to communicate clearly and to know what needs to be communicated vs not, basically being way better at language and communication in general, and they also understand the importance of it.


I agree. But I was talking about the "super secret" ability to write prompts, which pretty much anyone can do.

My point being that not everyone writes as good prompts as everyone else. The way you communicate, how clear and how exact you are, matters a lot, much more than you seem to be under the impression of.

Same goes for the "LLM does web design" example from before: a web designer with great communication skills in web design will (naturally) have a better prompt for something that could potentially look good, compared to a web designer that isn't as good at communicating what they actually want.


Outside design systems I rarely get good CSS from LLMs.

3D type stuff too, it's useless outside boilerplate.

Very little spatial reasoning training, no end-user subjective reasoning inference (Google is starting to though even in unrelated chats), so it's no surprise the LLM doesn't know what you want.

Since I don't even know what I want half the time until I see it, the subjective reasoning piece is key - that is, being able to predict what I'll want to pretty good accuracy. Then you have your agents etc.


I’m actively working on an alternative Frontend for Forgejo at the moment, completely self hostable, free, and open source.

Moving everything from GitHub to Forgejo and Tangled for now. These outages haven’t affected me for the past month because of this.


Can you elaborate on how your Forgejo frontend will be different than the default one? I'm asking because I've only ever used GitHub, GitLab and Forgejo for longer periods and Forgejo was the fastest and easiest to use for me.

It’s still early days, but I already have it in a useable state so I could share more such as early screencaps.

I plan on focusing primarily on these areas:

- mobile experience is first class, even on old/slow devices

- diff viewer is fast even on extremely large pull requests

- stacked pull request support

- user interface is modern, accessible, and theme-able with a light touch of whimsy

- search is accessible from anywhere

- opinionated keyboard shortcuts and command-K palette from day one

Many other longer term goals that I’m not mentioning here for now while the roadmap is forming.


I'm deleting my GitHub repos today (been planned for a bit) in favour of a local Forgejo Git. I also have not experienced any service disruption since I migrated well over a month ago.

>"The UI of that page is so nice"

Most of the screen is taken up by a picture. Contrast ratio is really low. Hard to read. Should they remove that useless banner, the current status, which is the most interesting part, could've been made visible right away.

I would call this whole thing highly un-ergonomic
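To put a number on the contrast complaint, here is a rough sketch assuming the WCAG 2.x definitions of relative luminance and contrast ratio. The two colours below are hypothetical stand-ins for dark-grey text on a near-black background, not values measured from the page.

```python
def _linearize(channel):
    """Convert one 8-bit sRGB channel to its linear-light value (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) colour with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG contrast ratio between two colours, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical colours: dark-grey text on a near-black background.
grey_text = (85, 85, 85)
near_black = (10, 10, 10)
print(round(contrast_ratio(grey_text, near_black), 2))
```

WCAG AA asks for at least 4.5:1 for normal-size body text, so any pairing that scores below that is fair to call hard to read.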


The UI is in the default Claude Code style

Lol it's pretty bad UI

Like those aviators who draw a picture on flightradar24, if you filter by All Services - Critical, somebody is almost about to draw a swastika just in May... Are the AI agents revolting?

I think of debt-avoidance sort of like teetotaling: I understand where it comes from and empathize with it but I tend to agree with you that total/dogmatic avoidance feels unnecessary and maybe even deleterious in the limit.

Like alcohol or drugs, debt can easily be abused, and there is no shortage of people and corporations waiting to make a profit from selling you debt, alcohol and drugs in difficult or joyous circumstances.

Using debt as a tool requires a degree of "know thyself" wisdom and financial literacy that many people struggle to possess in their best times, let alone hold onto in their worst times. So the "overcorrecting" edicts ("avoid debt like the plague") probably do more harm than good, because most people don't care about the finer/nuanced details of these things and want simple rules to follow through good times and (especially) bad times.

The motivation behind the statement is all about avoiding ruin, not maximizing opportunities or even happiness. They're different goals but it's easy to confuse one for the other.


As a New Yorker who loves pizza and could talk about it for hours, the median pizza place in Naples is way better than the median pizza place in New York. :P

"Italy" as a whole, I make no claims on.

