I don't understand how this is considered fair. AlphaGo has been trained on a database that includes every recorded game Sedol has ever played while Sedol is seeing AlphaGo's play style for the first time. Sedol should have been allowed to play against AlphaGo for a few months before the match so he could study its style.
Go AIs weren't expected to reach this level for at least another 10 years.
Before AlphaGo, Zen and Crazy Stone (the previous Go AIs) could only play against top-level professionals with a significant 4-5 stone starting handicap, and this was less than 3 years ago. A 4-5 stone handicap is basically taking control of half the board before the game has even started.
It really shows how the neural network approach made a huge difference in such a short time.
Part of this timing jump is Google throwing hardware at the problem with a large 280 GPU + 1920 CPU cluster. I would venture this is almost 100x bigger than most of the Go AI hardware we've seen to date. The Nature paper suggests that without this cluster it would be playing competitively with other single-workstation Go AIs, but nowhere near top-level players.
> throwing hardware at the problem with a large 280 GPU + 1920 CPU cluster.
You have a trillion-connection neural net wrapped in 2 pounds of flesh inside your head. That is massively more hardware than just about any other animal has. Throwing hardware at a problem is a solution to intelligence.
I'm not comparing brain wetware to hardware. The parent post asked how we achieved such strong Go AI performance today when it was supposed to take another 10 years. If you look at the components that fueled this, the system's performance was advanced significantly by additional hardware, both in training the policy and value networks on billions of synthetic Go games and at runtime.
I don't like the biological comparison, but using your metaphor it would be like God saying "Hey I've created a brain but only have 10 billion synapses. Evolution would normally take 10 years to get to human-scale at our current organic rate but if I throw money at building a bigger brain cavity I can squeeze in the 1 trillion to get there today!"
Extrapolating Deep Blue's 11 GFLOPs supercomputer to today with Moore's law would be equivalent to a 70 TFLOPs cluster. AlphaGo is using 1+ PFLOPs of compute. While they likely aren't actually achieving that throughput, to put it in perspective, this is the compute scale used to run huge geophysics simulations covering an 800 km x 400 km x 100 km volume with 8+ billion grid points around the San Andreas.
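The extrapolation above checks out if you assume one doubling every ~18 months between Deep Blue (1997) and today; a quick back-of-the-envelope sketch:

```python
deep_blue_gflops = 11                # Deep Blue's peak compute, circa 1997
years_elapsed = 2016 - 1997
doublings = years_elapsed / 1.5      # assume Moore's law doubling every ~18 months
extrapolated_gflops = deep_blue_gflops * 2 ** doublings
print(f"{extrapolated_gflops / 1000:.0f} TFLOPs")  # lands right around 70 TFLOPs
```

With a 24-month doubling assumption instead, you'd get under 10 TFLOPs, so the figure is sensitive to which version of Moore's law you pick.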
At the very least, it's interesting to see how much more accessible computation has become. Back when I was in school I could only dream of having a cluster of 280 GPUs. When the dream did occasionally come true and you got access to a cluster, you had to wait your turn in the job queue and hope you had enough of your resource quota left to keep your job from being terminated.
Now I could spin up a 280-GPU cluster on AWS (after dealing with pesky utilization limits) for only $182/hour. If researchers at Google had been doing this nonstop for the past year, they would have "racked up" $1.6M on compute alone. That's a drop in the bucket for a marketing department given the publicity they have achieved. I don't think normal Go AI developers have access to those resources :)
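The $1.6M figure is just that hourly rate run around the clock for a year (the $182/hour for a 280-GPU cluster is the assumption here):

```python
hourly_rate = 182                  # assumed $/hour for the whole 280-GPU cluster
hours_per_year = 24 * 365
annual_cost = hourly_rate * hours_per_year
print(f"${annual_cost:,}")         # $1,594,320 -- roughly the $1.6M quoted
```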
Don't underestimate algorithmic improvements. Today's chess engines running on Deep Blue's hardware outperform Deep Blue running on today's hardware.
Modern chess engines are built on a testing infrastructure that makes it possible to measure how each potential change affects the playing strength. This "Testing revolution" has brought massive improvements in playing strength.
For AlphaGo, it's probably the training that requires the most computational resources. The 'distilled knowledge' could perhaps run on a desktop PC. The program would search fewer variations and would be weaker, but if AlphaGo improves further, that version might still be stronger than any human.
My understanding is that the significant part was that before this, throwing more hardware at the available Go AIs still didn't make them competitive against high level players.
Also, it feels like training the AI on many games with lots of hardware is somewhat equivalent to a human professional who engrossed themselves in the game and trained since childhood.
One member of the DeepMind team responded to this very question during the interview at the beginning of part 3.
He said that the training data set is much, much larger than the number of Lee Sedol games. They are like a drop in the ocean, not enough to significantly influence the resulting policy network.
Perhaps the computer didn't know that it was playing against Lee Sedol, but from Wikipedia "As of February 2016, he ranks second in international titles (18), behind only Lee Chang-ho (21)."
I don't know the details of the algorithm and perhaps it doesn't give more explicit weight to his games. But I wouldn't be surprised if after some iterations the algorithm decided to give more implicit weight to the games of the world leaders.
They mentioned the training input to AlphaGo in the paper: a database of a few hundred thousand games from dan-ranked players on KGS. This means mostly amateur players, though there are a few professionals who play on KGS as well.
However, this only gets you so far, and the training set is fairly small compared to what you want to really train both the policy and value networks well. So then they had it play millions of games against different versions of itself, training both a new policy network and the value network based on that.
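The two-stage pipeline described above can be sketched roughly like this. Everything here (the stub trainer, the toy "positions" as board indices, the random self-play outcomes) is a hypothetical stand-in for the real networks and a real Go engine; it only shows the shape of bootstrap-then-self-play:

```python
import random

# Hypothetical stand-in: a real system trains a deep policy network,
# not a lookup table.
def supervised_step(policy, position, human_move):
    """Nudge the policy toward the move a human actually played."""
    policy[position] = human_move
    return policy

def self_play_game(policy):
    """Play one toy game of the current policy against itself;
    return the positions visited and a random winner (+1 / -1)."""
    positions = [random.randrange(361) for _ in range(5)]
    return positions, random.choice([+1, -1])

# Phase 1: supervised learning on a database of human (position, move) pairs,
# standing in for the KGS games.
kgs_games = [(random.randrange(361), random.randrange(361)) for _ in range(1000)]
policy = {}
for position, move in kgs_games:
    policy = supervised_step(policy, position, move)

# Phase 2: self-play; the (position, outcome) pairs collected here are what
# the value network is trained on to predict the winner from a position.
value_data = []
for _ in range(100):
    positions, winner = self_play_game(policy)
    value_data += [(p, winner) for p in positions]
```

The point of the second phase is that the self-play data can be generated in effectively unlimited quantity, which is how the team got past the small size of the human game database.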
It's unlikely that Lee Sedol's games made much if any impact on AlphaGo's training. It was bootstrapped off of high-level amateur, and some casual pro, play, but from then on it just trained against itself.
The implication is that because Sedol wins more tournaments than almost anyone else, if you feed "tournament winner" or Elo rating etc. as a feature, his games will be weighted much more heavily than others.
So even if AlphaGo isn't explicitly designed to play against his style, it's implicitly trained in it.
But it's pretty subtle and I'm guessing that the volume of high-level tournaments overwhelms any effect, as the AlphaGo team said.
I don't think this is unfair, but I think other people are replying to a suggestion that AlphaGo might be optimized to beat Lee that I don't see in the parent comment.
What does seem right is that whatever strategic insights are in Lee's play are reflected in current games--his and the younger generation who came up in his shadow. Whatever strategic novelties shape AlphaGo, they are totally new to Lee.
I don't think that would make the difference: 1) there's no trick to be learned and 2) the same thing happens with human players to some extent--when Lee Changho appeared on the international go scene, his style was misunderstood and underestimated, even when his games were public.
However, it is true that there is a real asymmetry--AlphaGo may not know Lee from other players, but it has had the opportunity to "study" the best games of the current players, and no one outside of DeepMind has had an opportunity to study its games.
As happened in chess, machines will beat humans in Go no matter how much knowledge about their inner workings is provided to the human. (I've watched this story unfold in chess, with all its hopes about how humans are still somehow better. $1K sez Go is exactly the same story. You can't beat a machine in a formal universe with a defined goal.)
It's actually entirely possible that if the program were unsupervised, i.e. had to learn Go "from scratch" without relying on any human games, it would be even stronger than it is now.
DeepMind is going to work on that next (according to one of the staff interviewed on the broadcast). It will be interesting to see whether other playing styles develop.
I can't help but feel training it with previous human games is fair as that seems the equivalent of how humans are taught. You don't just explain the rules of Go to somebody and leave them to learn on their own without playing anyone or picking up tips that have been passed along for centuries.
Even more importantly, the policy network that chooses which moves to explore must choose human-like moves in order to function correctly, because it must explore the moves AlphaGo's opponent is actually likely to play.
That's not right. It just needs to choose equally good or better moves in playouts. It doesn't need to anticipate when its opponent plays bad moves, that's just a bonus. Basically: if you're good enough you don't need psychology, you just play the winning move.
I don't think that follows. To beat the machine the move must be both unpredicted and profitable. Random moves are not profitable. Training purely by reinforcement learning rather than on humans could create a policy network that ignores more subtrees that are profitable than the current one does. In short, it isn't good enough for the AI to be good at playing itself, it has to be good at playing every possible player, and while it is playing humans it is sufficient for it to be good at playing every human player.
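For concreteness, AlphaGo's search scales each move's exploration bonus by the policy network's prior (a PUCT-style selection rule). A minimal sketch with made-up numbers, showing how a move the policy assigns a tiny prior gets starved of visits even when it is unexplored; that is exactly the subtree-pruning risk being argued about above:

```python
import math

def puct_select(children, c_puct=1.0):
    """Pick the child maximizing Q + U, where the exploration bonus U
    is scaled by the policy network's prior probability for that move.
    Each child is a tuple (prior_p, visit_count_n, mean_value_q)."""
    total_n = sum(n for _, n, _ in children)
    def score(child):
        p, n, q = child
        u = c_puct * p * math.sqrt(total_n) / (1 + n)
        return q + u
    return max(children, key=score)

# A move with a near-zero prior loses to already-visited moves,
# even though it has never been explored:
children = [(0.6, 10, 0.5), (0.39, 5, 0.4), (0.01, 0, 0.0)]
best = puct_select(children)
```

If the third move were actually the winning one, the search would rarely look at it, which is why the quality of the policy prior matters and not just raw playout strength.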
That depends on why humans make flaws. If human flaws are mostly failures of our meat (stress, lack of focus, jittery nerves) that keep us from reading in depth, then the algorithm will easily exploit the good points in each game, and with its superior ability to look deep into the future the result is predictable: the machine will win every time.
So what? I could watch every game Roger Federer played in any tennis tournament and still lose all sets to love.
It used to be that computers could only do combinatorics better, but that where 'intuition' played a strong part, there was still hope for us humans...
Well, guess I will have to start playing Calvinball...
Also assuming you are a tennis player...if you do not study your opponents' shots during warm up, you are doing it wrong.
The way they would play against a lefty is different than against a righty. Someone with lots of topspin vs someone who hits flat, a pusher vs a power hitter, etc.
Lee Sedol didn't complain about this. In fact, even a few days before the match Lee Sedol was predicting a 5-0 win for him.
If someone is clearly better than you, you can study its style all you want; it won't make a difference. You probably wouldn't even understand its style.
That's a very amateurish and closed minded comment.
There's a reason professional sports teams and players study the styles of their opponents. Teams employ statisticians for this exact role. Before games, teams study the style of their next opponent. After every game, teams go and study the recordings of the game they just played.
Which is why I qualified my comment with "clearly better". You can study Usain Bolt's running style all you want, it will still leave you in the dust. Studying makes sense only when the difference is small.
But I think a computer as a training buddy is indeed a good idea for improving Lee's skill. I don't know how Lee feels right now; he's defeated, and there must be enormous pressure on him, but I think he will appreciate the challenge, because he needs more challenge! He has played against top players all over the world for the 20+ years since earning his 1 dan rank. Also, the computer plays against itself and tries many random moves. Unlike human players, one would expect the computer to play unconventionally, since it can look so far ahead when maximizing its probability of winning.
It's probably not that different from some new hotshot kid that has no recorded history of playing, but got really good playing on the street and studying the masters. Kid gets discovered by some credible major promoter who challenges Sedol to a $1M game.
Sedol's games actually have zero bias on how AlphaGo plays. The team lead for AlphaGo himself said that they're like a drop in the ocean for AlphaGo.