I don't understand how this is considered fair. AlphaGo has been trained on a database that includes every recorded game Sedol has ever played while Sedol is seeing AlphaGo's play style for the first time. Sedol should have been allowed to play against AlphaGo for a few months before the match so he could study its style.
Go AIs weren't expected to reach this level for at least another 10 years.
Before AlphaGo, Zen and Crazy Stone (the previous Go AIs) could only play against top-level professionals with a significant 4-5 stone starting handicap, and this was less than 3 years ago. A 4-5 stone handicap is basically taking control of half the board before the game has even started.
It really shows how the neural network approach made a huge difference in such a short time.
Part of this timing jump is Google throwing hardware at the problem with a large 280 GPU + 1920 CPU cluster. I would venture this is almost 100x bigger than most of the Go AI hardware we've seen to date. The Nature paper suggests that without this cluster it would be playing competitively with other single-workstation Go AIs, but nowhere near top-level players.
> throwing hardware at the problem with a large 280 GPU + 1920 CPU cluster.
You have a trillion-connection neural net wrapped in 2 pounds of flesh inside your head. That is massively more hardware than just about any other animal has. Throwing hardware at a problem is a solution to intelligence.
I'm not comparing brain wetware to hardware. The parent post asked how we achieved such strong Go AI performance today when it was supposed to take another 10 years. If you look at the components that fueled this, the system's performance was advanced significantly by additional hardware, both in training the policy and value networks on billions of synthetic Go games and at runtime.
I don't like the biological comparison, but using your metaphor it would be like God saying "Hey I've created a brain but only have 10 billion synapses. Evolution would normally take 10 years to get to human-scale at our current organic rate but if I throw money at building a bigger brain cavity I can squeeze in the 1 trillion to get there today!"
Extrapolating Deep Blue's 11 GFLOPs supercomputer to today with Moore's law would be equivalent to a 70 TFLOPs cluster. AlphaGo is using 1+ PFLOPs of compute. While they likely aren't actually achieving that throughput, to put it in perspective, this is the compute scale used to run huge geophysics simulations covering an 800 km x 400 km x 100 km volume with 8+ billion grid points around the San Andreas.
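The extrapolation above checks out if you assume one doubling every ~18 months between Deep Blue (1997) and today; a quick back-of-the-envelope sketch:

```python
deep_blue_gflops = 11                # Deep Blue's peak compute, circa 1997
years_elapsed = 2016 - 1997
doublings = years_elapsed / 1.5      # assume Moore's law doubling every ~18 months
extrapolated_gflops = deep_blue_gflops * 2 ** doublings
print(f"{extrapolated_gflops / 1000:.0f} TFLOPs")  # lands right around 70 TFLOPs
```

With a 24-month doubling assumption instead, you'd get under 10 TFLOPs, so the figure is sensitive to which version of Moore's law you pick.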
At the very least, it's interesting to see how much more accessible computation has become. Back when I was in school I could only dream of having a cluster of 280 GPUs. When the dream did occasionally come true and you got access to a cluster, you had to wait your turn in the job queue and hope you had enough of your resource quota left to keep your job from being terminated.
Now I could spin up a 280-GPU cluster on AWS (after dealing with pesky utilization limits) for only $182/hour. If researchers at Google had been doing this nonstop for the past year, they would have "racked up" $1.6M on compute alone. That's a drop in the bucket for a marketing department given the publicity they have achieved. I don't think normal Go AI developers have access to those resources :)
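The $1.6M figure is just that hourly rate run around the clock for a year (the $182/hour for a 280-GPU cluster is the assumption here):

```python
hourly_rate = 182                  # assumed $/hour for the whole 280-GPU cluster
hours_per_year = 24 * 365
annual_cost = hourly_rate * hours_per_year
print(f"${annual_cost:,}")         # $1,594,320 -- roughly the $1.6M quoted
```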
Don't underestimate algorithmic improvements. Today's chess engines running on Deep Blue's hardware outperform Deep Blue running on today's hardware.
Modern chess engines are built on a testing infrastructure that makes it possible to measure how each potential change affects the playing strength. This "Testing revolution" has brought massive improvements in playing strength.
For AlphaGo, it's probably the training that requires the most computational resources. The 'distilled knowledge' could perhaps run on a desktop PC. The program would search fewer variations and would be weaker, but if AlphaGo improves further, that version might still be stronger than any human.
My understanding is that the significant part was that before this, throwing more hardware at the available Go AIs still didn't make them competitive against high level players.
Also, it feels like training the AI on many games with lots of hardware is somewhat equivalent to a human professional who engrossed themselves in the game and trained since childhood.
One member of the DeepMind team responded to this very question during the interview at the beginning of part 3.
He said that the training data set is much, much larger than the number of Lee Sedol games. They are like a drop in the ocean, not enough to significantly influence the resulting policy network.
Perhaps the computer didn't know that it was playing against Lee Sedol, but from Wikipedia "As of February 2016, he ranks second in international titles (18), behind only Lee Chang-ho (21)."
I don't know the details of the algorithm and perhaps it doesn't give more explicit weight to his games. But I wouldn't be surprised if after some iterations the algorithm decided to give more implicit weight to the games of the world leaders.
They mentioned the training input to AlphaGo in the paper: a database of a few hundred thousand games from dan-ranked players on KGS. This means mostly amateur players, though there are a few professionals who play on KGS as well.
However, this only gets you so far, and the training set is fairly small compared to what you want to really train both the policy and value networks well. So then they had it play millions of games against different versions of itself, training both a new policy network and the value network based on that.
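The two-stage pipeline described above can be sketched roughly like this. Everything here (the stub trainer, the toy "positions" as board indices, the random self-play outcomes) is a hypothetical stand-in for the real networks and a real Go engine; it only shows the shape of bootstrap-then-self-play:

```python
import random

# Hypothetical stand-in: a real system trains a deep policy network,
# not a lookup table.
def supervised_step(policy, position, human_move):
    """Nudge the policy toward the move a human actually played."""
    policy[position] = human_move
    return policy

def self_play_game(policy):
    """Play one toy game of the current policy against itself;
    return the positions visited and a random winner (+1 / -1)."""
    positions = [random.randrange(361) for _ in range(5)]
    return positions, random.choice([+1, -1])

# Phase 1: supervised learning on a database of human (position, move) pairs,
# standing in for the KGS games.
kgs_games = [(random.randrange(361), random.randrange(361)) for _ in range(1000)]
policy = {}
for position, move in kgs_games:
    policy = supervised_step(policy, position, move)

# Phase 2: self-play; the (position, outcome) pairs collected here are what
# the value network is trained on to predict the winner from a position.
value_data = []
for _ in range(100):
    positions, winner = self_play_game(policy)
    value_data += [(p, winner) for p in positions]
```

The point of the second phase is that the self-play data can be generated in effectively unlimited quantity, which is how the team got past the small size of the human game database.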
It's unlikely that Lee Sedol's games made much if any impact on AlphaGo's training. It was bootstrapped off of high-level amateur, and some casual pro, play, but from then on it just trained against itself.
The implication is that because Sedol wins more tournaments than almost anyone else, if you feed "tournament winner" or Elo rating etc. as a feature, his games will be weighted much more heavily than others.
So even if AlphaGo isn't explicitly designed to play against his style, it's implicitly trained in it.
But it's pretty subtle and I'm guessing that the volume of high-level tournaments overwhelms any effect, as the AlphaGo team said.
I don't think this is unfair, but I think other people are replying to a suggestion that AlphaGo might be optimized to beat Lee that I don't see in the parent comment.
What does seem right is that whatever strategic insights are in Lee's play are reflected in current games--his and the younger generation who came up in his shadow. Whatever strategic novelties shape AlphaGo, they are totally new to Lee.
I don't think that would make the difference: 1) there's no trick to be learned and 2) the same thing happens with human players to some extent--when Lee Changho appeared on the international go scene, his style was misunderstood and underestimated, even when his games were public.
However, it is true that there is a real asymmetry--AlphaGo may not know Lee from other players, but it has had the opportunity to "study" the best games of the current players, and no one outside of DeepMind has had an opportunity to study its games.
As happened in chess, machines will beat humans in Go no matter how much knowledge about their inner workings is provided to the human. (I've watched this story unfold in chess, with all its hopes about how humans are still somehow better. $1K sez Go is exactly the same story. You can't beat a machine in a formal universe with a defined goal.)
It's actually entirely possible that if the program were unsupervised, i.e. had to learn Go "from scratch" without relying on any human games, it would be even stronger than it is now.
DeepMind is going to work on that next (according to one of the staff interviewed on the broadcast). It will be interesting to see whether other playing styles develop.
I can't help but feel training it with previous human games is fair as that seems the equivalent of how humans are taught. You don't just explain the rules of Go to somebody and leave them to learn on their own without playing anyone or picking up tips that have been passed along for centuries.
Even more importantly, the policy network that chooses which moves to explore must choose human-like moves in order to function correctly, because it must explore the moves AlphaGo's opponent is actually likely to play.
That's not right. It just needs to choose equally good or better moves in playouts. It doesn't need to anticipate when its opponent plays bad moves, that's just a bonus. Basically: if you're good enough you don't need psychology, you just play the winning move.
I don't think that follows. To beat the machine the move must be both unpredicted and profitable. Random moves are not profitable. Training purely by reinforcement learning rather than on humans could create a policy network that ignores more subtrees that are profitable than the current one does. In short, it isn't good enough for the AI to be good at playing itself, it has to be good at playing every possible player, and while it is playing humans it is sufficient for it to be good at playing every human player.
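For concreteness, AlphaGo's search scales each move's exploration bonus by the policy network's prior (a PUCT-style selection rule). A minimal sketch with made-up numbers, showing how a move the policy assigns a tiny prior gets starved of visits even when it is unexplored; that is exactly the subtree-pruning risk being argued about above:

```python
import math

def puct_select(children, c_puct=1.0):
    """Pick the child maximizing Q + U, where the exploration bonus U
    is scaled by the policy network's prior probability for that move.
    Each child is a tuple (prior_p, visit_count_n, mean_value_q)."""
    total_n = sum(n for _, n, _ in children)
    def score(child):
        p, n, q = child
        u = c_puct * p * math.sqrt(total_n) / (1 + n)
        return q + u
    return max(children, key=score)

# A move with a near-zero prior loses to already-visited moves,
# even though it has never been explored:
children = [(0.6, 10, 0.5), (0.39, 5, 0.4), (0.01, 0, 0.0)]
best = puct_select(children)
```

If the third move were actually the winning one, the search would rarely look at it, which is why the quality of the policy prior matters and not just raw playout strength.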
That depends on why humans make flaws. If human flaws are mostly failures of our meat (stress, lack of focus, jittery nerves) that keep us from reading in depth, then the algorithm will easily exploit the good points in each game, and with its superior ability to look deep into the future the result is predictable: the machine will win every time.
So what? I could watch every game Roger Federer played in any tennis tournament and still lose all sets to love.
It used to be that computers could only do combinatorics better, but that where 'intuition' played a strong part, there was still hope for us humans...
Well, guess I will have to start playing Calvinball...
Also assuming you are a tennis player...if you do not study your opponents' shots during warm up, you are doing it wrong.
The way they would play against a lefty is different than against a righty. Someone with lots of topspin vs someone who hits flat, a pusher vs a power hitter, etc.
Lee Sedol didn't complain about this. In fact, even a few days before the match Lee Sedol was predicting a 5-0 win for him.
If someone is clearly better than you, you can study its style all you want; it won't make a difference. You probably wouldn't even understand its style.
That's a very amateurish and closed minded comment.
There's a reason professional sports teams and players study the styles of their opponents. Teams employ statisticians for this exact role. Before games, teams study the style of their next opponent. After every game, teams go and study the recordings of the game they just played.
Which is why I qualified my comment with "clearly better". You can study Usain Bolt's running style all you want, it will still leave you in the dust. Studying makes sense only when the difference is small.
But I think a computer as a training buddy is indeed a good idea for improving Lee's skill. I don't know how Lee feels right now; he's defeated, and there must be enormous pressure on him, but I think he will appreciate the challenge, because he needs more challenge! He has played against top players all over the world for the 20+ years since earning his 1 dan rank. Also, the computer plays against itself and tries many random moves. Unlike human players, one would expect the computer to play unconventionally, since it can look so far ahead when maximizing its probability of winning.
It's probably not that different from some new hotshot kid that has no recorded history of playing, but got really good playing on the street and studying the masters. Kid gets discovered by some credible major promoter who challenges Sedol to a $1M game.
Sedol's games actually have zero bias on how AlphaGo plays. The team lead for AlphaGo himself said that they're like a drop in the ocean for AlphaGo.