Dopamine and Temporal Difference Learning

BSVogler · on Feb 9, 2020

I am currently working on my master's thesis on the topic of applying this type of knowledge to build a reinforcement learning system with spiking neural networks. The role of dopamine is crucial in learning.

By combining spike-timing-dependent plasticity (STDP) with a reward signal (R-STDP), it is possible to address the spatial credit-assignment problem (which synapse caused the reward?) with an eligibility trace. However, the design of the reward/utility function is a core issue, and this is where this paper probably advances the field. I am currently looking into ways to balance the reward. Without balancing, each trial causes more long-term depression than long-term potentiation, so that after a while the network dies. A different reward function results in the opposite. Another issue is the distal reward problem: which event corresponds to which activity? I would be very glad to discuss related questions or exchange ideas with other experts or newcomers to this growing field. My email address is in my profile.
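To make the balancing problem concrete, here is a minimal Python sketch of reward-modulated STDP on a single synapse. It assumes a pair-based STDP kernel, an exponentially decaying eligibility trace, and a reward that is unrelated to spike timing; all constants and names (`stdp`, `eligibility`, `simulate`) are illustrative assumptions, not taken from the paper or the thesis. LTD is deliberately made slightly stronger than LTP, so modulating by the raw reward lets depression accumulate, while subtracting a running mean of the reward (one common balancing choice) keeps the expected update near zero.

```python
import math
import random

# Illustrative constants (assumptions, not values from the paper or thesis).
A_PLUS, A_MINUS = 0.010, 0.012   # LTP/LTD amplitudes; LTD slightly stronger
TAU_STDP = 20.0                  # ms, width of the STDP window
TAU_ELIG = 500.0                 # ms, decay of the eligibility trace
T_REWARD = 200.0                 # ms, reward arrives after all spike pairs


def stdp(dt):
    """Pair-based STDP kernel; dt = t_post - t_pre in ms."""
    if dt > 0:
        return A_PLUS * math.exp(-dt / TAU_STDP)   # pre before post: LTP
    return -A_MINUS * math.exp(dt / TAU_STDP)      # post before pre: LTD


def eligibility(pairs):
    """Sum STDP contributions, each decayed until the reward arrives."""
    return sum(stdp(dt) * math.exp(-(T_REWARD - t) / TAU_ELIG) for t, dt in pairs)


def simulate(use_baseline, n_trials=1000, seed=0):
    """Run many trials; the same seed gives both variants identical inputs."""
    rng = random.Random(seed)
    w, baseline = 1.0, 0.0
    for _ in range(n_trials):
        # 10 random spike pairings per trial, unrelated to the reward
        pairs = [(rng.uniform(0, 100), rng.uniform(-50, 50)) for _ in range(10)]
        reward = 1.0 if rng.random() < 0.5 else 0.0
        modulator = reward - baseline if use_baseline else reward
        w = min(max(w + modulator * eligibility(pairs), 0.0), 2.0)  # keep w in [0, 2]
        baseline += 0.05 * (reward - baseline)                      # running mean reward
    return w


print(simulate(use_baseline=False))
print(simulate(use_baseline=True))
```

With these settings the raw-reward weight tends to collapse toward the lower bound, while the baseline-corrected weight stays around its starting value. Whether a simple running baseline is enough in a full spiking network is a separate question.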

This paper also got me thinking about whether this mechanism might explain why we enjoy listening to music. Music follows and breaks rules; our brain continuously tries to predict it, and most of the time it succeeds.

longtom · on Feb 10, 2020

> This paper also got me thinking about whether this mechanism might explain why we enjoy listening to music. Music follows and breaks rules; our brain continuously tries to predict it, and most of the time it succeeds.

The link between predictability and art has been raised various times in the literature, even before Schmidhuber, but he had particularly interesting ideas about it: http://people.idsia.ch/~juergen/beauty.html

jcims · on Feb 10, 2020

Great episode with Schmidhuber on Lex Fridman’s podcast.

https://youtu.be/3FIo6evmweo

I’m coming at this from a career in infosec and had never heard of the guy, but I really enjoyed his thought process.

temporaryPotato · on Feb 10, 2020

> This paper also got me thinking about whether this mechanism might explain why we enjoy listening to music. Music follows and breaks rules; our brain continuously tries to predict it, and most of the time it succeeds.

I think that this explains my style to some extent [1]. One of the main themes behind what I write is that it should 1) sound original (less familiar) and 2) have an interplay between musical closure and tension, especially with half tones.

[1] https://soundcloud.com/phillip-janvanzyl/spiral

so_tired · on Feb 10, 2020

Is there any research suggesting that spiking/dopamine-type learning is good at "animal-level" behaviour, but that abstract and complex thinking is enabled by different mechanisms?

deephony · on Feb 12, 2020

On the AI side of the fence, the approach has been "let's see just how far reinforcement learning can take us, and then start making up stories (hypotheses) about what the missing secret ingredients are." On the neuroscience side, my sense is that this is not a question that can be answered empirically any time soon. This experiment was interesting because they knew what they were looking for going in. "What algo are these cells running?" is a hard question; "are these cells' firing activities consistent with this given algo?" is comparatively easy. Inference vs. hypothesis testing.

cs702 · on Feb 9, 2020

Great blog post on great research. Worth reading in its entirety.

Summarizing at a very high level of abstraction: this work compares a mechanism that deep reinforcement learning systems use to learn probability distributions of expected rewards with the dopamine reward mechanism in mouse brains.
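The core mechanism can be sketched with a population of value predictors that scale positive and negative prediction errors differently. This is a minimal Python sketch assuming the expectile-style formulation: predictor i multiplies positive errors by `tau` and negative errors by `1 - tau`, so optimistic predictors (large `tau`) settle above the mean reward and pessimistic ones below it. The two-mode reward task, the function names, and all parameter values are hypothetical, and treating each predictor as one "dopamine cell" is only an analogy.

```python
import random


def learn_distributional_values(sample_reward, taus, steps=20000, lr=0.01, seed=0):
    """Train one value predictor per asymmetry level tau on a single-state task."""
    rng = random.Random(seed)
    values = [0.0] * len(taus)
    for _ in range(steps):
        r = sample_reward(rng)
        for i, tau in enumerate(taus):
            delta = r - values[i]                    # reward prediction error of predictor i
            scale = tau if delta > 0 else 1.0 - tau  # optimistic vs pessimistic weighting
            values[i] += lr * scale * delta
    return values


def bimodal_reward(rng):
    """Hypothetical task: a small reward 70% of the time, a large one 30%."""
    return rng.gauss(1.0, 0.2) if rng.random() < 0.7 else rng.gauss(5.0, 0.5)


taus = [0.1, 0.3, 0.5, 0.7, 0.9]
values = learn_distributional_values(bimodal_reward, taus)
print([round(v, 2) for v in values])  # increases with tau
```

With `tau = 0.5` both signs are weighted equally, so that predictor should settle near the mean reward (0.7 * 1.0 + 0.3 * 5.0 = 2.2 here). The spread of the other predictors carries information about the shape of the distribution, which is what the decoding in the post exploits.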

This passage near the end, in particular, caught my eye:

> ...our final question was if we could decode the reward distribution from the firing rates of dopamine cells [in mice brains]. As shown in Figure 5, we found that it was indeed possible, using only the firing rates of dopamine cells, to reconstruct a reward distribution (blue trace) which was a very close match to the actual distribution of rewards (grey area) in the task that the mice were engaged in. This reconstruction relied on interpreting the firing rates of dopamine cells as the reward prediction errors of a distributional TD model, and performing inference to determine what distribution that model had learned about.

In other words, mouse brains seem to use the same mechanism, and it appears we can decode the probability distribution of expected rewards that those brains have learned by measuring only the firing rates of dopamine cells.

Very exciting!

keenmaster · on Feb 9, 2020

Can we use brain scanners and ML to A/B test online lectures to perfection?

For example, you can show the top 20 Calculus 2 courses to groups of 50 people each, all donning brain scanners, and create a “brain activation map” for each class from each professor. Among students of the top 10% of professors (as measured by exam results and brain activation), we can analyze the most engaging moments in each course and hybridize them into a master course. Furthermore, we can analyze differential learning outcomes in males, females, students of different races, and K-clustered psychographic profiles (based on DMN activity and other neurological measures taken before the course).

If the learning outcomes are significantly different, then it may be more appropriate to create several different master courses for the groups that showed different learning outcomes. A class can be recommended, Netflix-style, based on your demographics, neural activation patterns, and learning velocity from past coursework.

Everyone would have the best Calculus 2 class in perpetuity after one cycle of experimentation and master class creation. The same can be done for all canonical coursework, from kindergarten through the top undergrad majors to Med/Law/MBAs. The aforementioned learning outcome differentials would be lessened by the “no child left behind” effect of each kid getting a solid grasp of math and science early on with neurologically tailored coursework.

sjg007 · on Feb 9, 2020

You could probably just scan facial or eye movement reactions.

taurath · on Feb 9, 2020

Why not brain scanners to optimize ads? Oh right they are :(

keenmaster · on Feb 10, 2020

Neural marketing uses EEG headsets and eye tracking. I searched for EEG + focus detection and found the 2018 study linked below. Excerpt: “the best obtained classification accuracies were 77% and 83%, respectively, using SVM binary classifiers.” They used ML for attentiveness classification. I’m sure greater accuracy can be achieved with time, but that’s a good start. It would be even better if we had wearable MRIs.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6263653/#!po=0....

taurath · on Feb 10, 2020

Maybe instead, we just freaking shouldn’t allow applications of this sort of research at all. This is stuff that would have been a scandal if the CIA did it; just because it makes money doesn’t mean it’s okay.

keenmaster · on Feb 10, 2020

Don’t get me wrong, I am not a fan of neural marketing. I just mentioned it in response to your comment, and only to indicate that the same scanning hardware can be used in the education space.

zeristor · on Feb 10, 2020

Will this end up with Max Headroom's blipverts?

keenmaster · on Feb 10, 2020

Without regulations, the question isn’t “will” but “when?”

undergrowth54 · on Feb 9, 2020

I can partly understand some of this based on EE101-level control theory, a high-school-level model of how neurons work, and 3blue1brown's intro to neural nets [1]. However, I have a strong personal interest in developing a much deeper understanding of dopamine neurons and their role in Executive Function.

Can anyone recommend a good curriculum which can take a random web developer from "Knows what a myelinated axon, a sigmoid function, and a feedback loop are" to having a solid enough background to dive into the research on this?

[1] https://www.youtube.com/watch?v=aircAruvnKk

westoncb · on Feb 10, 2020

Kind of an odd title and book, but I think it may do a good job of what you're looking for:

"Principles of Neural Design": https://www.amazon.com/Principles-Neural-Design-MIT-Press/dp...

undergrowth54 · on Feb 10, 2020

That looks like exactly the sort of thing I'm after.

scribu · on Feb 9, 2020

Well, there's a major called Behavioral Neuroscience [1], which sounds like what you're after.

Or you could do a whole undergraduate program just on neuroanatomy. [2]

There's also this subfield called Cognitive Neuroscience. [3]

(I'm not an expert either.)

[1] https://psychology.nova.edu/undergraduate/behavioral-neurosc...

[2] https://neuro.ucr.edu/courses/nrsc200a

[3] https://en.wikipedia.org/wiki/Cognitive_neuroscience

shmageggy · on Feb 10, 2020

Others mentioned the neural side, but I think that's ancillary to the main idea of the paper, which is about RL. David Silver's lectures on YouTube are excellent: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html

undergrowth54 · on Feb 9, 2020

So far, this seems to address only learning in a context where the reward appears shortly after the behavior that caused it. That is valuable to understand, but it doesn't yet seem to explain how Executive Functions work.
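One standard way reinforcement learning handles rewards that arrive several steps after the behavior that caused them is TD(λ) with eligibility traces. The following Python sketch is textbook tabular TD(λ) on a hypothetical five-state chain with a single reward at the end; it is not the model from the paper, and `td_lambda_episode` is an illustrative name. After one episode, TD(0) (`lam=0`) has updated only the last state, whereas with `lam=0.9` the same prediction error also credits the earlier states, discounted by how long ago they were visited.

```python
def td_lambda_episode(values, rewards, alpha=0.1, gamma=1.0, lam=0.9):
    """One pass of tabular TD(lambda) with accumulating traces over a chain
    0 -> 1 -> ... -> n-1 -> terminal; rewards[s] is received on leaving s.
    Updates `values` in place and returns it."""
    traces = [0.0] * len(values)
    for s in range(len(values)):
        next_v = values[s + 1] if s + 1 < len(values) else 0.0  # terminal value is 0
        delta = rewards[s] + gamma * next_v - values[s]          # TD (prediction) error
        traces[s] += 1.0                                         # mark s as eligible
        for k in range(len(values)):
            values[k] += alpha * delta * traces[k]               # credit eligible states
            traces[k] *= gamma * lam                             # traces fade over time
    return values


chain_rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
print(td_lambda_episode([0.0] * 5, chain_rewards, lam=0.0))  # only the last state changes
print(td_lambda_episode([0.0] * 5, chain_rewards, lam=0.9))  # earlier states get partial credit
```

Traces bridge delays of a few steps cheaply, but they do not answer the harder questions below about how the brain recognizes reward in the first place.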

> What happens if an individual's brain “listens” selectively to optimistic versus pessimistic dopamine neurons? Does this give rise to impulsivity, or depression?

My intuition is that impulsivity would arise as a result of giving much greater weight to the signals of a very recently trained network than to a less recently trained network.

This all raises a few questions for me:

1) How does a brain recognize reward in order to fire the signal that trains a dopamine network? It seems straightforward for the taste of food, a hug from a fellow tribesman, or a bell from playing a video game for 5 hours straight (in a simulated Atari environment).

How does a brain recognize reward while it is (for example) writing a teacher-assigned essay for an unclear audience or a python program without Test-Driven Development?

What does the brain use as its leading KPIs?

2) How does a brain decide how much to listen to the different networks that predict rewards from different actions, so that it is "robust to changes in the environment or changes in the policy"? How does the brain adjust its attention in response to changing context?

pizza · on Feb 9, 2020

Mere speculation, but maybe there are multiplexed temporal-difference-ish networks that correlate different reward frequencies per basis (like the different X[k] in the Fourier transform of a signal x[n]).
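One concrete, loose reading of this idea is a bank of TD learners that differ only in their discount factor γ. Each learner's value at a cue is a discounted sum of future reward, so across γ the bank behaves more like a Laplace transform of the upcoming reward than a Fourier transform, and the time to reward can be read back out. The chain task, parameter values, and the name `learn_cue_value` are hypothetical; this Python sketch only illustrates the multi-timescale idea and makes no claim about how the brain does it.

```python
import math


def learn_cue_value(gamma, delay, episodes=500, alpha=0.5):
    """Tabular TD(0) on a deterministic chain: the cue is state 0, each step
    moves one state forward, and a reward of 1 arrives on leaving the last of
    `delay` states. Returns the learned value of the cue state."""
    v = [0.0] * delay
    for _ in range(episodes):
        for s in range(delay):
            last = s == delay - 1
            reward = 1.0 if last else 0.0
            next_v = 0.0 if last else v[s + 1]                # terminal value is 0
            v[s] += alpha * (reward + gamma * next_v - v[s])  # TD(0) update
    return v[0]


delay = 6
gammas = [0.5, 0.7, 0.9, 0.99]
# Each learner converges to gamma ** (delay - 1): one "basis" value per timescale.
cue_values = {g: learn_cue_value(g, delay) for g in gammas}
# Inverting any single learner recovers the number of steps until reward.
decoded_delay = {g: 1 + math.log(v) / math.log(g) for g, v in cue_values.items()}
print(cue_values)
print(decoded_delay)
```

A single γ already encodes the delay here because the task is deterministic; a bank of γ values becomes useful when reward timing is variable and one wants the whole profile rather than a single number.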

bradknowles · on Feb 9, 2020

Is anyone else having problems with just getting a blank page when trying to load that site?

Or is it just me on iOS?

riwsky · on Feb 9, 2020

I hit this, and the solution for me was to turn off my content blocker.

keyle · on Feb 9, 2020

"content" blocker :)

alexcnwy · on Feb 10, 2020

Awesome work!!