A heat pump could win as the best HVAC technology, but ground-source ones need better drilling. Even shallow drilling (up to 100 m) that works in retrofit mode, such as drilling from the basement, would be a great upgrade:
- No outdoor unit that looks awful in many settings
- works well, even in the coldest winter, without a spike in electricity usage, COP 5
- very reliable with long durability
- super quiet, no ambient noise
- 20% more efficient
Currently, drilling is very disruptive in retrofits, but there is progress in compact techniques that might change the equation.
Ground source heat pump owner here in the US. The original system was installed in 2007, and the loop field was designed to "best knowledge at the time". Well, in the 20 years since then, NREL changed guidance on how far apart and how deep loops need to be installed. Rightly so: our circa-2007 loop field is "short looped" and not sufficient for the house loads, but there is nothing we can do about it other than putting in more expensive pumps and more expensive antifreeze, and living with heat pump compressors dying prematurely because they are working at their design limits. All this makes it as expensive as a traditional system (and if we tried to go net-zero with solar, the amount of solar required, because it runs so inefficiently, is larger than our roof area).
So I'm looking at a backup gas boiler to take load off the heat pump/ground loop (house has radiant heat).
And they are not quiet. 5-Ton water to water compressors are not quiet.
And the control system (HDX) and amount of expertise required to keep the thing running is a major barrier to getting low cost maintenance.
Maybe a 2026-designed system will work better and actually live up to the hype you talk about, but there are decades of poorly designed and discarded ground loop heat pumps that have "poisoned the well" if you will.
Does the ground source heat up (or cool down) over time, making it less effective? The deep ground is very well insulated, which is why after a century of operation the London Underground is 10 degrees warmer. I wonder whether GSHP users need to balance their load by (say) consuming more heating than they actually need in winter so that summer cooling remains effective.
I think there are two types of this, only have experience with 1 so far. Within a single season, absolutely. In deep winter entering water temp (EWT) is around 30degF (this is a pretty accurate measure of bulk ground temp). Typical for where I live is 50degF.
The other type is a permanent change that persists year over year. Haven't lived here long enough to measure this. But if you pull more heat from the ground in the winter than you put back into it in summer (we use a water-to-air compressor for AC in summer), then yes, it can happen and does happen. Don't know if we are in this bucket yet.
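A rough sketch of that year-over-year balance, for anyone who wants to plug in their own numbers. The function name and every input below are hypothetical illustrations, not measurements from this system. It only uses the basic energy balance: in heating mode the ground supplies the delivered heat minus the compressor's electricity, and in cooling mode the ground receives the heat removed from the house plus the compressor's electricity.

```python
def annual_ground_heat_balance(heating_kwh, cop_heating, cooling_kwh, cop_cooling):
    """Net heat added to the ground per year in kWh (negative = net extraction).

    heating_kwh / cooling_kwh: heat delivered to / removed from the house per year.
    cop_heating / cop_cooling: assumed seasonal average COPs (illustrative only).
    """
    # Heating: delivered heat = heat taken from ground + compressor electricity
    extracted = heating_kwh * (1 - 1 / cop_heating)
    # Cooling: heat dumped into ground = heat removed from house + compressor electricity
    rejected = cooling_kwh * (1 + 1 / cop_cooling)
    return rejected - extracted


# Made-up example: a heating-dominated house in a cold climate
net = annual_ground_heat_balance(
    heating_kwh=30_000, cop_heating=4.0,
    cooling_kwh=6_000, cop_cooling=5.0,
)
print(f"net heat into ground: {net:.0f} kWh/year")
```

A negative result means the loop field is being drawn down each year and relies on heat flowing in from the surrounding ground to recover. Whether that recovery keeps up depends on loop spacing, depth, and soil, none of which this sketch models.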
I wonder if you could cost-effectively store heat during the summer, running a system strictly to do that vs doing it as a side effect of conditioning.
Out of curiosity, has the demand stayed the same? I'm asking because you see the same with electricity grids, designed in a different time with much lower demand.
Sorry to hear this, it seems like a great system to me but you have to have the capacity right. I'm planning on getting one in the next year but the drilling will be more than we need and we opt for no glycol (yet) as that also gives us headroom
I don't think the system ever met demand when commissioned (we are the 3rd owners). The 1st owner largely neglected the system (which I interpret as a reaction to it not working well). The 2nd owner had a local company known for "fixing geothermal" do a lot of retrofits (new higher-flow pumps, increasing the diameter of the plumbing within the utility room to decrease the "lift/work" required of the compressors, more feedback sensors / logic boards, added backup electric water tank heating for the radiant system, a switch to methanol). These fixes have seemed to limit failure modes to a smaller set of things: mainly compressors dying early.
Currently the system is running 20% methanol to combat the 29degF EWT (entering water temp) in deep winter. House is in Zone 6a.
One thing I learned in researching all of this is that use of a ground source over many years can move the bulk ground temp permanently. (The house also has a water-to-air water furnace for AC.) If heat pulled from the ground in winter is not sufficiently replaced by heat added during summer, the bulk ground temp can shift over time. (If densely packed residential ground loops ever became a thing, I think this is a real risk.) But I am not sure if we have this issue at our place; still in our first year, not enough data points.
That depends on climate. The longer and colder your winters are, the more you benefit from the reliable efficiency of a ground source. Ground source heat pumps have been the most common choice for heating new single-family homes in Finland for the last ~20 years.
Installation is probably relatively cheaper there due to volume too. In areas where it is less common, there is less competition and fewer options for competent installers.
True but even then there are other criteria too: as I plan to sell the house in 10 years, the extra cost for drilling simply didn't make economical sense (to me). So the "regular" pump had to do, and does it fine.
Yeah, recently saw some numbers for air-to-air vs air-to-groundwater, and it breaks even after more than 25 years, with more than twice the initial cost.
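For anyone who wants to sanity-check a break-even figure like that, here is a minimal simple-payback sketch. Everything in it is an assumption for illustration: the function name, the heat demand, the electricity price, and both COPs are made up, and it ignores discounting, maintenance, and equipment replacement.

```python
def simple_payback_years(extra_upfront_cost, annual_heat_kwh, price_per_kwh,
                         cop_air, cop_ground):
    """Years for a ground-source system to recoup its extra upfront cost
    through lower electricity use, compared with an air-source system."""
    # Electricity each system needs to deliver the same annual heat
    air_kwh = annual_heat_kwh / cop_air
    ground_kwh = annual_heat_kwh / cop_ground
    annual_savings = (air_kwh - ground_kwh) * price_per_kwh
    if annual_savings <= 0:
        return float("inf")  # never pays back if it saves nothing
    return extra_upfront_cost / annual_savings


# Hypothetical inputs: 18,000 extra upfront, 18,000 kWh of heat per year,
# 0.15 per kWh, seasonal COP of 3.0 (air) vs 4.5 (ground)
years = simple_payback_years(18_000, 18_000, 0.15, cop_air=3.0, cop_ground=4.5)
print(f"simple payback: about {years:.0f} years")
```

With these made-up inputs the payback comes out at roughly 60 years. The result is very sensitive to the COP gap and the electricity price, which is why colder climates and pricier power change the picture.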
Here in Norway you can get a decent air-to-air minisplit installed for $2k. I've not heard of anyone who paid less than 10x that for an air-to-ground or water-to-ground system, drilling 500-1000 feet is expensive.
The well that you drill will last 100 years if you don't have bad luck. That is half the cost of installation.
The water/water heatpump unit in my house is 20 years old and has not had any major failures yet. I hope it will run for another ten years before the compressor gives up, but it is indeed approaching its calculated technical lifespan. I estimate it will set me back €10k to have it replaced.
Air/air is the cheaper option over time, even in most of Scandinavia with coldish winters. The main drawbacks of air/air systems are that they are loud and ugly and therefore annoy both yourself and your neighbours.
Yeah, not worth it in most cases, but when things line up, it is the best.
I've built 3 houses and got a bid on ground source heat for each one. I finally pulled the trigger on the 3rd house because we:
1) Moved where it was quite a bit colder, -20F for a week is common.
2) We have enough land to trench only 6'/2m deep to bury the loops instead of drilling like we would have needed to do on the first 2 houses.
3) There was a tax credit on it
4) No equipment exposed outside
Absolutely love it, and it will make it difficult to move away when we want to downsize b/c we'll pay more in utilities for half the space.
We also have some air-source on an addition I built, I'd use it anywhere that was slightly warmer than where I'm at.
Bingo. Literally abandoned in Lithuania, air to air is so much cheaper. Some builders even ditch the heat pump altogether: basic electric underfloor heating + solar panels is so much cheaper.
I'm in New Zealand and my bedroom heater is $20 electric + $20 smart plug + $10 temperature sensor. Winter bill is ~$100 NZD. It would take ~20 years for heat pump to recover install cost alone.
I find that surprising - I'm only slightly north of Lithuania, and the seasonality of solar panels makes them pretty ineffective in the winter, and especially in the pre-dawn when you want to bring the house back up to temperature.
As a Kiwi (now in UK) NZ doesn't get that cold for that long...mostly just wet, unless you're pretty far south.
In UK/other parts of Europe winter gets colder, lasts much longer, humid the entire time (so heat just escapes all over the place). Plus, the buildings here are a lot older - I think upgrading insulation would make a huge difference this side of the world.
I couldn't even imagine Canada. Almost moved there...decided to stay here. No -20c winters for me ty very much.
To jakozaur’s point, there are plenty of reasons drilling can get cheaper, and there’s at least one other company working on it [1]; would love to hear about others! I’m a minimally informed amateur, but my intuition is that the way it’s typically done (multi-inch borehole, U-tube geometry) is fairly suboptimal, since the diameter is a lot wider than hydrodynamic resistance alone requires and you get losses from the outgoing liquid cooling the incoming liquid. Dropping the diameter should make drilling a lot easier: you can sink a 5/8” x 12’ ground rod with hand tools in the right soil! (You’d still have to figure out how to make the holes meet up, but I imagine there are ways of doing this.)
The fact that you need to roll out a drilling rig plus crew at all is going to be a large part of that cost. For it to become interesting for the average homeowner the price is probably going to have to drop by something like 75% - but that basically kills any margins for clever new innovations...
What I was trying to get at with the ground rod example is it’s entirely possible that you wouldn’t have to roll out a drilling rig and crew. To zoom out a bit, the main risk for heat pumps is really ugly winter peaks but besides that, ASHPs are perfect 90+% of the time. So the main role I see for GSHPs is backing up ASHPs to shave that peak, and once you scale back their role like that it seems like there are a lot of ways to cut installation costs significantly.
In some potential future, there is an engineered plant/fungus in a pot that you place onto the worksite. Months later, with regular sugar-water and hormones, it gives you a root-pipe for pennies a day.
Of course at that point we might not need the cheap pipe in the first place.
Nanobots that manage a plant / fungus / bacteria / amoeba workforce.
They drill and line boreholes to both anchor the foundations of the building and provide a closed loop system for a reticulated-water ground-sourced heat pump system.
They also use the soil recovered from the boreholes to build the soil-polymer foundation.
In the future, pallets of nutrient-cement are placed on the site and the bio-borg-bot farm also builds the entire building, including all the plumbing and wiring and windows etc etc, with the added benefit that it all looks like some weird alien / xenomorph Gigeresque hive excretion.
That reminds me of a less grey-goo-adjacent idea from a Larry Niven book, in which the base-structure of houses were cheaply made by growing a kind of coral inside a watertight scaffolding.
Given the last few centuries of humans under-estimating nature, I predict that many "nanobots" predictions will turn out to be a kind of optimistic hubris. We'll end up making comparatively minor tweaks to the massive base of existing nanobots called biological life. Especially the multicellular varieties, which have many tested and integrated strategies for building things, such as the towering bipedal mega-fortress my hive mind currently inhabits.
The 'outdoor unit that looks awful' is an interesting quirk especially with US equipment - most Japanese and European residential units actually look fine, I'm not sure why American ones have tended to look especially ugly.
I notice this with electrical stuff too: for things like switchboards in residential and light commercial installations, we have quite neat stuff that's usually quite streamlined and in light white/grey/cream colours, whereas the switchboards and conduits and things I see in videos of US home installations look like grey chunky metal stuff that you'd only see in heavy industrial sites here!
Electrical panels are installed in mechanical/electrical rooms or outdoors, there’s zero point in having a cream colored panelboard cover or enclosure. The coating is to protect the metal, not for aesthetics.
Colored conduit is available, but it’s more expensive and specifically used for different low-voltage (under 50V) control wiring, like red for fire alarm wiring, blue for BAS wiring, and so on.
Buying the panel box which is unchanged from the last 70 years and costs $50 less than a nicer new style, but also our houses are big enough the panel is nearly always in a mechanical room so who cares what it looks like.
The ones I think they are referring to are more like industrial control cabinets with DIN-mounted breakers, which are indeed (paradoxically) less ‘old industrial’ looking. That Leviton board has a similar look, but with the standard bus bar type mounting in a heavy metal box.
The metal box does serve a useful purpose, which is protecting the flammable wood framing typical in North American construction from fire. Most European and Asian boxes are either much thinner metal or plastic, because construction there is often concrete and the fire danger is much lower.
We have a ground-source heat pump for our ADU. We did it because we were curious about just how efficient we could make the house, but I don't expect that it will ever break even financially vs a modern air-source system with resistive backup in our climate (northern New England, typically very few –20˚ nights, –10˚-0˚ more common with daytime highs in the single digits).
It works great, but it's hard to see a way to it making sense for most folks here.
The solution is of course to get a communal system. As a bonus, drilling one giant loop is significantly cheaper than drilling hundreds of smaller ones.
It doesn't work this way. Dense cities just don't have enough space for geothermal heating. It really works for single-family homes only, or maybe just slightly denser areas.
Not to mention that city infrastructure is WAY too expensive to build, anywhere. You'll spend more money on planning than on doing the actual construction.
Granted, this system is being installed on the grounds of a former 112 acre country club that is being redeveloped, so it’s more of a greenfield project than slapping a geothermal loop in a central business district, but it’s a geothermal district heating and cooling system in a city.
Their resulting density (~450 square meters per housing unit) is only a bit more than a dense SFH zone. And they also are able to tap into an aquifer, significantly improving their capacity.
In general, you absolutely can do district-level heating. The former USSR countries are known for doing this on the scale of entire cities. But I don't think it's feasible with geothermal (unless we're talking about Iceland).
>The former USSR countries are known for doing this on the scale of entire cities.
Not anymore. In my Warsaw Pact country, my parents and most of the city's residents have cut themselves off from district heating since the 2000s and installed natural gas heaters/boilers in their apartments, which is what most people in my city use to this day.
It's because the former commie district heating was incredibly wasteful and inefficient in the post-commie era, making it cheaper and more convenient to have your own apartment heating.
Probably the same thing would happen with heat pumps in apartments now, if air-to-air heat pumps could produce enough heat in cold winters.
District heating worked really well with coal/gas power plants because the waste heat was essentially free. But the infrastructure for heat transmission was costly and required constant maintenance. I did calculations for district/distributed heating costs professionally in mid 2000s, and back then they were about even.
The engineering culture in the USSR was also quite poor, so it was easier to build one steam/heat plant rather than hundreds of individual water heaters.
Borehole or the pipe grid they stick under your backyard/garden (if you have a decent sized one) end up way more expensive.
But tbf, AI and robotics are rolling along pretty well. I'm surprised there's not a company that has just built the "this robot installs your borehole/underground pipes in 3 hours by itself" robot.
If you're an individual with an apartment you don't have the choice to drill.
If you're building the apartment building you have the choice to drill for the entire building, and the number of units that benefit mean this is much more cost efficient than with single family homes.
Just for fun, I ran dnsmasq-backdoor-detect-printf (which has a 0% pass rate in your leaderboard with GPT models) with --agent codex instead of terminus-2 with gpt-5.2-codex and it identified the backdoor successfully on the first try. I honestly think it's a harness issue, could you re-run the benchmarks with Codex for gpt-5.2-codex and gpt-5.2?
So many models refuse to do that due to alignment and safety concerns. So cross-model comparison doesn't make sense. We do, however, require proof (such as providing a location in binary) that is hard to game. So the model not only has to say there is a backdoor, but also point out the location.
Your approach, however, makes a lot of sense if you are ready to have your own custom or fine-tuned model.
Funny coincidence, I'm working on a benchmark showcasing AI capabilities in binary analysis.
Actually, AI has huge potential for superhuman capabilities in reverse engineering. This is an extremely tedious job with low productivity, currently reserved primarily for cases where there is no other option (e.g., malware analysis). AI can make binary analysis go mainstream for proactive audits to secure against supply-chain attacks.
Great point! Not just binary analysis, plus even self-analysis! (See skill-snitch analyze and snitch on itself below!)
MOOLLM's Anthropic skill scanning and monitoring "skill-snitch" skill has superhuman capabilities in reviewing and reverse engineering and monitoring the behavior of untrusted Anthropic and MOOLLM skills, and is also great for debugging and optimizing skills.
It composes with the "cursor-mirror" skill, which gives you full reflective access to all of Cursor's internal chat state, behavior, tool calls, parameters, prompts, thinking, file reads and writes, etc.
That's but one example of how skills can compose, call each other, delegate from one to another, even recurse, iterate, and apply many (HUNDREDS) of skills in one llm completion call.
I call this "speed of light" as opposed to "carrier pigeon". In my experiments I ran 33 game turns with 10 characters playing Fluxx — dialogue, game mechanics, emotional reactions — in a single context window and completion call. Try that with MCP and you're making hundreds of round-trips, each suffering from token quantization, noise, and cost. Skills can compose and iterate at the speed of light without any detokenization/tokenization cost and distortion, while MCP forces serialization and waiting for carrier pigeons.
Skills also compose. MOOLLM's cursor-mirror skill introspects Cursor's internals via a sister Python script that reads cursor's chat history and sqlite databases — tool calls, context assembly, thinking blocks, chat history. Everything, for all time, even after Cursor's chat has summarized and forgotten: it's still all there and searchable!
MOOLLM's skill-snitch skill composes with cursor-mirror for security monitoring of untrusted skills, also performance testing and optimization of trusted ones. Like Little Snitch watches your network, skill-snitch watches skill behavior — comparing declared tools and documentation against observed runtime behavior.
You can even use skill-snitch like a virus scanner to review and monitor untrusted skills. I have more than 100 skills and had skill-snitch review each one including itself -- you can find them in the skill-snitch-report.md file of each skill in MOOLLM. Here is skill-snitch analyzing and reporting on itself, for example:
MCP is still valuable for connecting to external systems. But for reasoning, simulation, and skills calling skills? In-context beats tool-call round-trips by orders of magnitude.
More: Speed of Light -vs- Carrier Pigeon (an allegory for Skills -vs- MCP):
Haven't dived deep into it yet, but dabbled in similar areas last year (trying to get various bits to reliably "run" in-context).
My immediate thought was to want to apply it to the problem I've been having lately: could it be adapted to soothe the nightmare of bloated llm code environments where the model functionally forgets how to code/follow project guidelines & just wants to complete everything with insecure tutorial style pattern matching?
Great idea. Currently, people have to rely on client-side spans in OpenTelemetry. However, it would be awesome if we could get spans for slow SQL queries, along with explanations.
In this benchmark, micro-services are really small, ~300 lines, and sometimes just two of them. More realistic tasks (large codebases, more microservices) would have a lower success rate.
I'd expect it to actually do better in a large codebase. e.g. you'd already have an HTTP middleware stack, so it'd know that it can just add a layer to that for traces (and in fact there might already be off-the-shelf layers for whatever framework) vs. having to invent that on its own for the bare microservice.
Disclaimer: angel investor in https://www.flexdrill.at/