I am currently working on my master's thesis on the topic of applying this type of knowledge to build a reinforcement learning system with spiking neural networks. The role of dopamine is crucial in learning.
By combining spike-timing-dependent plasticity (STDP) with a reward signal (R-STDP), it is possible to address the spatial credit assignment problem (which synapse caused the reward?). However, the design of the reward/utility function is one core issue, and this is where the paper probably advances the field. I am currently looking into ways to balance the reward. Without balancing, each trial causes more long-term depression than long-term potentiation, so that after a while the network dies. A different reward function results in the opposite.
Another issue is the distal reward problem: which event corresponds to which activity?
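To make the R-STDP idea concrete, here is a minimal sketch for a single synapse, assuming an exponential STDP window and an exponentially decaying eligibility trace. The function names, parameters, and time constants (`stdp_window`, `run_rstdp`, `tau_e`, and so on) are illustrative assumptions, not taken from the paper or from any particular simulator. The trace remembers recent pre/post pairings, and a reward that arrives later converts whatever is left of the trace into a weight change, which is one common way to approach the distal reward problem.

```python
import math

def stdp_window(dt, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Exponential STDP kernel; dt = t_post - t_pre in ms (assumed shape)."""
    if dt > 0:   # pre fired before post: potentiation
        return a_plus * math.exp(-dt / tau)
    if dt < 0:   # post fired before pre: depression
        return -a_minus * math.exp(dt / tau)
    return 0.0

def run_rstdp(pairings, rewards, steps, w0=0.5, lr=0.1, tau_e=200.0, dt_ms=1.0):
    """Simulate one synapse with reward-modulated STDP.

    pairings: {time_step: t_post - t_pre} for spike pairs seen at that step
    rewards:  {time_step: reward value} delivered at that step
    """
    w, trace = w0, 0.0
    decay = math.exp(-dt_ms / tau_e)
    for t in range(steps):
        trace *= decay                        # eligibility fades over time
        if t in pairings:
            trace += stdp_window(pairings[t])  # tag the synapse, do not change w yet
        r = rewards.get(t, 0.0)
        w += lr * r * trace                   # reward gates the actual weight change
    return w

# Causal pairing at step 10, reward 100 steps later: weight goes up.
w_up = run_rstdp({10: 5.0}, {110: 1.0}, steps=200)
# Anti-causal pairing followed by the same reward: weight goes down.
w_down = run_rstdp({10: -5.0}, {110: 1.0}, steps=200)
# No reward at all: weight is unchanged.
w_none = run_rstdp({10: 5.0}, {}, steps=200)
print(w_up, w_down, w_none)
```

In this sketch the balance between depression and potentiation depends on `a_plus`, `a_minus`, and the sign and size of the reward, which is where a poorly balanced reward function can push all weights in one direction.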
I would be very glad to discuss related questions or exchange ideas with other experts or newcomers to this growing field. My mail is in my profile.
This paper also got me thinking if this mechanism might explain why we enjoy listening to music. Music follows and breaks rules, which our brain continuously tries to predict and most of the time it succeeds.
> This paper also got me thinking if this mechanism might explain why we enjoy listening to music. Music follows and breaks rules, which our brain continuously tries to predict and most of the time it succeeds.
The link between predictability and art has been raised various times in the literature, even before Schmidhuber, but he had particularly interesting ideas about it: http://people.idsia.ch/~juergen/beauty.html
> This paper also got me thinking if this mechanism might explain why we enjoy listening to music. Music follows and breaks rules, which our brain continuously tries to predict and most of the time it succeeds.
I think that this explains my style to some extent [1]. One of the main themes behind what I write is that it should sound 1) original (less familiar) and 2) there needs to be an interplay between musical closure and tension, especially with half tones.
Is there any research suggesting that spiking/dopamine-type learning is good at "animal level" behaviour, but that abstract and complex thinking is enabled by different mechanisms?
On the AI side of the fence the approach has been "let's see just how far Reinforcement Learning can take us, and then start making up stories (hypotheses) about what the secret ingredients are that are missing." On the neuroscience side of things, my sense is that this is not a question that can be empirically answered any time soon. This experiment was interesting because they knew what they were looking for going in. "What algo are these cells running?" is a hard question; "are these cells' firing activities consistent with this given algo?" is comparatively easy. Inference vs hypothesis testing.
Great blog post on great research. Worth reading in its entirety.
Summarizing at a very high level of abstraction: This work compares a mechanism used for learning probability distributions of expected rewards in deep reinforcement learning systems to the dopamine reward mechanism in mice brains.
This passage near the end, in particular, caught my eye:
> ...our final question was if we could decode the reward distribution from the firing rates of dopamine cells [in mice brains]. As shown in Figure 5, we found that it was indeed possible, using only the firing rates of dopamine cells, to reconstruct a reward distribution (blue trace) which was a very close match to the actual distribution of rewards (grey area) in the task that the mice were engaged in. This reconstruction relied on interpreting the firing rates of dopamine cells as the reward prediction errors of a distributional TD model, and performing inference to determine what distribution that model had learned about.
In other words, mice brains seem to be using the same mechanism, and it appears we can decode the probability distribution of expected rewards learned by those brains by measuring only the firing rate of dopamine cells.
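As a rough illustration of how a population of predictors can encode a whole reward distribution, here is a minimal sketch assuming each unit scales positive and negative prediction errors differently (the "optimistic" and "pessimistic" cells mentioned elsewhere in this thread). The function name, the `taus` asymmetry values, and the toy reward set are assumptions chosen for illustration; this is not the decoding procedure used in the paper, only the learning side under simplified assumptions.

```python
import random

def learn_value_predictors(rewards, taus, lr=0.02, epochs=2000, seed=0):
    """Train one value estimate per asymmetry level tau in (0, 1).

    A unit with tau > 0.5 weights positive errors more (optimistic),
    a unit with tau < 0.5 weights negative errors more (pessimistic).
    """
    rng = random.Random(seed)
    values = [0.0 for _ in taus]
    for _ in range(epochs * len(rewards)):
        r = rng.choice(rewards)               # sample a reward from the task
        for i, tau in enumerate(taus):
            delta = r - values[i]             # reward prediction error
            scale = tau if delta > 0 else (1.0 - tau)
            values[i] += lr * scale * delta   # asymmetric update
    return values

toy_rewards = [1.0, 2.0, 3.0, 10.0]           # hypothetical reward magnitudes
taus = [0.1, 0.5, 0.9]
values = learn_value_predictors(toy_rewards, taus)
for tau, v in zip(taus, values):
    print(f"tau={tau}: learned value ~ {v:.2f}")
```

The balanced unit (tau = 0.5) settles near the mean reward, while the pessimistic and optimistic units settle below and above it. Taken together, the spread of learned values carries information about the shape of the reward distribution, which is what makes decoding it from a population plausible.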
Can we use brain scanners and ML to A/B test online lectures to perfection?
For example, you can show the top 20 Calculus 2 courses to groups of 50 people each, all donning brain scanners, and create a “brain activation map” for each class from each professor. Among students of the top 10% of professors (as measured by exam results and brain activation), we can analyze the most engaging moments in each course, and hybridize them into a master course. Furthermore, we can analyze differential learning outcomes in males, females, students of different races, and K-clustered psychographic profiles (based on DMN activity and other neurological measures taken before the course).
If the learning outcomes are significantly different, then it may be more appropriate to create several different master courses for the people that showed different learning outcomes. A class can be recommended, Netflix style, based on your demographics, neural activation patterns, and learning velocity from past coursework.
Everyone would have the best Calculus 2 class, in perpetuity, after one cycle of experimentation and master class creation. The same can be done for all canonical coursework, from Kindergarten through the top undergrad majors to Med/Law/MBAs. The aforementioned learning outcome differentials would be lessened by the “no child left behind” effect of each kid getting a solid grasp of math and science early on with neurologically tailored coursework.
Neural marketing uses EEG headsets and eye tracking. I searched for EEG + focus detection, and found the 2018 study linked below. Excerpt: “the best obtained classification accuracies were 77% and 83%, respectively, using SVM binary classifiers.” They used ML for attentiveness classification. I’m sure greater accuracy can be achieved with time, but that’s a good start. It would be even better if we had wearable MRIs.
Maybe instead, we just freaking shouldn’t allow applications of this sort of research at all. This is stuff that would have been a scandal if the CIA did it. Just because it makes money doesn’t mean it’s okay.
Don’t get me wrong, I am not a fan of neural marketing. I just mentioned it in response to your comment, and only to indicate that the same scanning hardware can be used in the education space.
I can partly understand some of this based on EE101-level control theory, a high-school-level model of how neurons work, and 3blue1brown's intro to neural nets[1]. However, I have a strong personal interest in developing a much deeper understanding of dopamine neurons and their role in Executive Function.
Can anyone recommend a good curriculum which can take a random web developer from "Knows what a myelinated axon, a sigmoid function, and a feedback loop are" to having a solid enough background to dive into the research on this?
So far this seems to address learning only in a context where the reward appears shortly after the behavior which caused it. That is valuable to understand, but it does not yet explain how Executive Functions work.
> What happens if an individual's brain “listens” selectively to optimistic versus pessimistic dopamine neurons? Does this give rise to impulsivity, or depression?
My intuition is that impulsivity would arise as a result of giving much greater weight to the signals of a very recently-trained network than to a less-recently trained network.
This all raises a few questions for me:
1) How does a brain recognize reward in order to fire the signal which trains a dopamine network? It seems straightforward for the taste of food, a hug from a fellow-tribesman, or a bell from playing a video game for 5 hours straight (in a simulated Atari environment).
How does a brain recognize reward while it is (for example) writing a teacher-assigned essay for an unclear audience or a python program without Test-Driven Development?
What does the brain use as its leading-KPIs?
2) How does a brain select how much it listens to different networks which predict rewards from different actions, so that it is "robust to changes in the environment or changes in the policy"? How does the brain adjust its attention in response to changing context?
Mere speculation, but maybe there are multiplexed temporal-difference-ish networks that correlate different reward frequencies per basis (like the different x[k] for the Fourier transform of a signal x[n]).
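One loose way to picture such multiplexing is a bank of TD learners that each use a different discount factor, so that each one "sees" rewards at a different time horizon. This is only an interpretation of the speculation above, not something claimed by the paper; the function name `td0_chain_values`, the chain environment, and the chosen discount factors are hypothetical.

```python
def td0_chain_values(gamma, n_states=5, reward=1.0, lr=0.1, episodes=500):
    """Tabular TD(0) on a one-way chain with a single reward at the end."""
    values = [0.0] * (n_states + 1)       # last entry is terminal and stays 0
    for _ in range(episodes):
        for s in range(n_states):         # walk the chain left to right
            r = reward if s == n_states - 1 else 0.0
            target = r + gamma * values[s + 1]
            values[s] += lr * (target - values[s])
    return values[:n_states]

# A bank of learners, one per time horizon (assumed discount factors).
bank = {gamma: td0_chain_values(gamma) for gamma in (0.5, 0.9, 0.99)}
for gamma, vals in bank.items():
    print(gamma, [round(v, 3) for v in vals])
```

The short-horizon learner (gamma = 0.5) barely values the start of the chain, while the long-horizon learner (gamma = 0.99) values it almost as much as the final state. A downstream system could in principle weight these learners differently depending on context, which connects to the question of how much the brain "listens" to each network.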