> I [...] didn't really rug pull on your response so I think it's fine.
No you didn't. All good. And I learned a lot from the extended answer. So I am thankful for the explanation.
> Developing heuristics for this is a bit of a hobby horse of mine. It feels tantalizingly almost doable with just a little bit more resources and time than I have.
I can totally understand the feeling. There are quite a few things that I'd like to go deeper into either at work or in private. But alas time.
> Now this is a proper difficult problem with (probably) fairly subjective answers.
I agree. And I don't have answers ready. A lot boils down to preference. Personally, for example, I prefer written content over video, except in a few areas where I like (some) explanatory videos. To me it comes down to the question of how easily I can skim the content when I am looking for an answer.
On the other hand - for deep immersion into a topic I use multiple media formats.
In terms of web search, I sadly nowadays need to sift through a lot of SEO-fied content that is there either to build a (personal) brand or to attract clicks for advertising/affiliate revenue.
So in principle I agree with you on the noise problem. Still, I also believe that there are real gems to be found in the long tail. I still feel like I came late to the party, but when I started out on the web in '97 there were so many lovely, quirky sites. So many places that people had put a lot of time, energy and thought into. And sites so packed full of information that I came away not only with more knowledge, but in awe that somebody would give this knowledge away for free.
There were also quite a number of horrible sites (my first ones probably included). So there was a noise vs. signal problem back then, too. Maybe not to the same extent as today, though.
> The machine it's on is a Ryzen 3900X with 128 GB RAM. Most of the index is on a single 1 TB consumer grade SSD.
Call me impressed. Sounds absolutely cool.
So even with a RAID setup for redundancy, this is doable.
May I ask how you decide to add new content? Do you follow links? Do you use other search engines' results as a starting point?
I could probably shoot many more questions, but don't want to be a nuisance.
> May I ask how you decide to add new content? Do you follow links? Do you use other search engines' results as a starting point?
I initially did basically a DFS-walk originating at a few websites I liked, with some filtering criteria that deprioritized websites that didn't look too interesting. Now that I have a fairly comprehensive mapping of the space I want to index, I use a few factors like frequent outbound links from highly ranking domains to inform which new sites to index.
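As a rough illustration of that last step, here is a minimal sketch, not the actual implementation. The function name `rank_candidates`, the rank scores, and both thresholds are made up for the example; the only idea taken from the description above is "count outbound links from highly ranking domains to sites that are not yet indexed".

```python
from collections import Counter


def rank_candidates(outbound_links, domain_rank, indexed,
                    rank_threshold=0.5, min_links=2):
    """Suggest new domains to index.

    outbound_links: dict mapping a source domain to the domains it links to
    domain_rank:    dict mapping a domain to a score in [0, 1] (hypothetical scale)
    indexed:        set of domains that are already in the index
    """
    votes = Counter()
    for source, targets in outbound_links.items():
        # Only sufficiently well-ranked domains get to "vote".
        if domain_rank.get(source, 0.0) < rank_threshold:
            continue
        # set() so that one source counts at most once per target.
        for target in set(targets):
            if target not in indexed and target != source:
                votes[target] += 1
    # Keep candidates with enough votes; most-linked first, ties by name.
    candidates = [(d, n) for d, n in votes.items() if n >= min_links]
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    links = {
        "a.example": ["new1.example", "new2.example", "b.example"],
        "b.example": ["new1.example", "new1.example"],
        "c.example": ["new2.example", "new3.example"],
        "spam.example": ["new3.example", "new3.example"],
    }
    ranks = {"a.example": 0.9, "b.example": 0.8,
             "c.example": 0.7, "spam.example": 0.1}
    already_indexed = set(links)
    for domain, count in rank_candidates(links, ranks, already_indexed):
        print(domain, count)
```

Counting each source domain only once per target keeps a single link-heavy site from dominating the result, and ignoring low-ranked sources is one simple way to keep link farms from voting at all.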
> I could probably shoot many more questions, but don't want to be a nuisance.
Thanks for your time already.