
As a former Wikia employee, I am somewhat of a MediaWiki insider. I sped Wikia's search engine up by several orders of magnitude and then went on to pilot a number of NLP/machine learning initiatives in the company.

Jimmy Wales already tried to make a "Google Killer" ten years ago. It was tilting at windmills, to say the least. Letting individuals help manage algorithmic search results was harder than you could imagine. Let's not even get into the difficulty of building an effective crawler.

One of Wikia's former CEOs, Gil Penchina, notoriously undervalued search as a result of this very public gaffe. By the time I came in, it took over five seconds to do a simple on-wiki search. Searching across wikis took so long they actually just sent the search to Google and had you abandon the site. I personally fixed a lot of these problems, and that part was pretty cool.

So now let's get to the subject at hand, which is a search feature based on an authoritative knowledge graph. Something like this should adequately surface factual information in an intuitive manner -- optimally based on natural language. Wikia already tried this, too. They brought on a very seasoned advisor who played a crucial role in the semantic web movement far back into the early oughts. I remember going to semantic web meetups in Austin when I was in grad school quite some time ago now to hear this guy talk.

This guy was essentially the SF-based manager or lead for a small team located in Poland whose job it was to take some of the "structured data" at Wikia and attempt to build some kind of knowledge graph on top of it. This project was unsuccessful.

So why did it fail? We'll start with a lack of product direction. Wikia had and probably still has a very junior product organization that is mostly interested in the site's UI and (recently) a focus on "fandom" (yuck). The team allocated to the project was based in Poland (Poznan, to be exact), and was made up primarily of kids coming out of a technical school in their first job. Your assumption about communication being a problem would be correct. However, the subject matter expert was so entrenched in his area of specialization that the problem was compounded even further on the native English-speaker side. There was too much getting into the weeds, and not enough focus on incremental progress.

To make things worse, they tried using a proprietary, not-ready-for-primetime data store because it most closely matched the SME's preconceptions on how the data should be structured. There was absolutely not an existing business use case for this data store, and problems getting it to work turned even building a simple demo into a death march.

Either way, what I'm saying is, $250,000 is not enough to solve this problem. We have attempted to solve this problem before in the MediaWiki world. It's not going to magically get better. To make something like this work, you need:

1) Best-in-class UX people who would know how a knowledge graph provides a significant improvement over existing solutions
2) Leadership that can bridge the gap between SMEs and implementers
3) Very skilled engineering resources with backgrounds in less conventional technologies

This is a massive investment that no one is willing to spend on what is essentially a media play.

About six months later, I had built a proof-of-concept that sucked data out of MediaWiki Infobox templates into Neo4j, a well supported graph database. I was able to answer questions like, "Which cartoon characters are rabbits", and "What movie won the most Oscars in 1968" using the Cypher query language.
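For a rough idea of what that kind of extraction involves, here is a minimal Python sketch. It is not the actual proof-of-concept: the page titles, field names, and the `Character` label in the Cypher string are all made up for illustration. The regex only handles flat infoboxes with no nested templates, and an in-memory list of triples stands in for Neo4j.

```python
import re

# Matches a flat "{{Infobox <kind> | key = value | ... }}" template.
# Assumption: no nested templates or wiki links containing "|" inside the infobox.
INFOBOX_RE = re.compile(r"\{\{\s*Infobox\s+([^|\n]+?)\s*\|(.*?)\}\}", re.S | re.I)


def parse_infobox(wikitext):
    """Return (kind, {field: value}) for the first infobox, or None if absent."""
    m = INFOBOX_RE.search(wikitext)
    if m is None:
        return None
    kind = m.group(1).strip().lower()
    fields = {}
    for part in m.group(2).split("|"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip().lower()] = value.strip()
    return kind, fields


def to_triples(title, wikitext):
    """Turn one page's infobox into (subject, predicate, object) triples."""
    parsed = parse_infobox(wikitext)
    if parsed is None:
        return []
    kind, fields = parsed
    triples = [(title, "type", kind)]
    triples += [(title, k, v) for k, v in fields.items() if v]
    return triples


def subjects_where(triples, **conditions):
    """Return sorted subjects whose properties match every condition (case-insensitive)."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o.lower()
    return sorted(
        s for s, props in by_subject.items()
        if all(props.get(k) == v.lower() for k, v in conditions.items())
    )


# Hypothetical sample pages; real infoboxes are far messier than this.
PAGES = {
    "Bugs Bunny": "{{Infobox character\n| species = Rabbit\n}}",
    "Daffy Duck": "{{Infobox character\n| species = Duck\n}}",
    "Roger Rabbit": "{{Infobox character\n| species = Rabbit\n}}",
}

TRIPLES = [t for title, text in PAGES.items() for t in to_triples(title, text)]
RABBITS = subjects_where(TRIPLES, type="character", species="Rabbit")

# With the same data loaded into a graph database under a hypothetical schema,
# the equivalent Cypher query (not executed here) might look like:
CYPHER_EXAMPLE = "MATCH (c:Character {species: 'Rabbit'}) RETURN c.name"

if __name__ == "__main__":
    print(RABBITS)
```

The hard part in practice is not the query but the normalization: every wiki names its infobox fields differently, so "species" on one wiki is "race" or "kind" on another.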

At that point in time, Wikia had decided they were tired of investing in structured data, and wanted to re-skin the site for a third time in as many years to make it look more like BuzzFeed.

Structured data is cool. In many cases, unsupervised learning may be what you're actually looking for. But in the end it has to satisfy a real user's needs.

Wikipedia has five million English articles. Wikia has over 20 million. As far as capitalizing on this wealth of knowledge, the devil is truly in the details. But it's a real shame that all of that information isn't put to better use than to encourage the socially maladjusted to take quizzes over which anime character they're more like.



How did you arrive at 20 million? This sounds like one of those "technically true" facts that are cooked up for investors. http://wikis.wikia.com/wiki/List_of_Wikia_wikis puts the combined total of the top 1,000 wikis (in all languages) at 12.4m.


20 million pages, not wikis -- sorry if I mistyped?


There aren't 20 million pages. Read my comment again.


There are over 300,000 wikis. I usually worked with the top ten thousand English wikis, which had over 15 million pages.

Or I'm just making the number up. Doesn't really matter to me.



