Understanding Convolutions (2014) (colah.github.io)
129 points by headalgorithm on March 2, 2019 | hide | past | favorite | 21 comments


Hmm, so I'm reading The Hundred Page Machine Learning Book [0] which gives an overview of the field, and one of the sections is on convolutional neural networks, and it was very crisp and clear (like the whole book, really; highly recommended). I understand since it's a brief summary it's probably eliding stuff, but it's so much shorter and more understandable than this blog post that it makes me wonder if it's really eliding a lot of crucial stuff, or if this post is just... not clear.

In the book, it walks you through the convolution of an input matrix with a smaller filter matrix. You start by lining up the smaller matrix in the top left corner of the input matrix, and you multiply each pair of overlapping entries together and add them all up. That's the value of the top left element of the result (the result is a matrix). Then you slide the filter to the right, repeat the calculation, and that's the value one to the right in the result matrix. You repeat, panning and scanning across the input matrix, and in this fashion fill in the whole result matrix.
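That sliding-window procedure can be sketched in a few lines of NumPy (the `conv2d_valid` helper is my own name for it, not the book's code; as is typical in deep-learning texts, it slides the filter without flipping it, which is strictly cross-correlation):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image`, multiplying overlapping entries
    and summing them to produce each output element ("valid" mode,
    i.e. no padding)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=float)
kernel = np.array([[1, 0],
                   [0, 1]], dtype=float)
print(conv2d_valid(image, kernel))  # [[ 6.  8.] [12. 14.]]
```

The top-left output is 1*1 + 5*1 = 6, exactly the "line up, multiply, sum" step the book describes.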

I thought I had it, but this blog post just confused things. I do remember learning about convolutions back in undergrad and being confused then, too, so that sounds about right. When I read the ML handbook, I had a "wait, is that it?" thought and was confused why I was confused back in the day. But I guess that's just one particular way of doing it, or one use case or something.

[0] https://www.amazon.com/Hundred-Page-Machine-Learning-Book/dp...


Convolution is essentially polynomial multiplication. That's how I think about it.

You remember (a + b)(c + d) = ac + ad + bc + bd?

Or (2x^2 + 1)(4x^3 + x) can be calculated by doing conv([0,2,0,1], [4,0,1,0]) (the numbers represent the coefficients). The results are the coefficients of the resulting polynomial.

As the polynomials grow larger this gets harder and harder computationally. The convolution theorem lets you use the Fourier transform to convert the convolution into a pointwise multiplication, so that

fft(f * g) = fft(f) .* fft(g)

where * is convolution and .* is pointwise multiplication.
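Both halves of that comment can be checked directly in NumPy (a quick sketch, using the example polynomials above; note the sequences must be zero-padded to the full output length before taking the FFT):

```python
import numpy as np

# (2x^2 + 1)(4x^3 + x): coefficients from highest to lowest degree.
f = [0, 2, 0, 1]      # 2x^2 + 1
g = [4, 0, 1, 0]      # 4x^3 + x

direct = np.convolve(f, g)          # polynomial product coefficients

# Convolution theorem: pad to full length n, multiply pointwise in
# frequency space, then transform back.
n = len(f) + len(g) - 1
via_fft = np.fft.ifft(np.fft.fft(f, n) * np.fft.fft(g, n)).real

print(direct)                 # [0 8 0 6 0 1 0]  ->  8x^5 + 6x^3 + x
print(np.round(via_fft))      # same coefficients
```

The FFT route costs O(n log n) instead of O(n^2), which is why it wins for large polynomials.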


It’s because you looked at the wrong post. You want this one: http://colah.github.io/posts/2014-07-Conv-Nets-Modular


your understanding is not wrong, for the basic version of convolution typically used in neural networks. however, in practice the most optimized neural net libraries don't actually implement the operation in that way.

as mentioned at the end of the post, if you understand things in terms of functions rather than just multiplying with a sliding window, you can then make use of Fourier transforms, and this enables much faster algorithms to compute the same result.


If anyone's interested in learning more, the clearest explanation I have ever seen of convolutions is here:

http://www.dspguide.com/ch6.htm

Pedagogically speaking, that book is one of the best technical books I have ever read.


I agree. That book is excellent.


I would think that a tutorial on convolutions this extensive would mention the convolution theorem. Definitely the easiest way to visualize for most applications. Or is Fourier space not assumed to be known by the reader?


I had a similar thought the first time I read this a few years back. I mean, I get what the author is trying to say here (e.g. the ball dropping/rolling probability thing), but it’s just a little more convoluted (ouch) than it needs to be, IMO.


The introduction to convolution via signals and systems is definitely simpler. You can basically introduce the Dirac delta (impulse), introduce the idea of the impulse response, and then point out that every function is just a series of impulses scaled by the function value. Then convolution is just the sum of the impulse responses. (This description is probably leaving mathematicians quietly screaming, but it's just a short skip from that understanding to a more formal one.)
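That superposition-of-impulse-responses picture is easy to demonstrate numerically (a minimal sketch; the impulse response `h` and input `x` are made-up values):

```python
import numpy as np

h = np.array([1.0, 0.5, 0.25])   # impulse response of some LTI system
x = np.array([2.0, 0.0, 3.0])    # input: 2*delta[n] + 3*delta[n-2]

# Superposition: each sample x[k] contributes a copy of h, scaled
# by x[k] and shifted to position k.
out = np.zeros(len(x) + len(h) - 1)
for k, xk in enumerate(x):
    out[k:k + len(h)] += xk * h

print(out)                        # [2.   1.   3.5  1.5  0.75]
print(np.convolve(x, h))          # identical: convolution IS that sum
```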

I found the article a bit roundabout. Perhaps I'm not part of the intended audience; I understand convolutions, but have only limited familiarity with the internals of neural networks.


> Fourier space not assumed to be known by the reader

This sounds like a big assumption. I think this is a deep-learning oriented blog post, and not for people with a background in signals & systems (basically information theory). I don't understand mathematical convolutions super well (I know the vague derivation, the integral of f(τ)g(t−τ)dτ, and the metaphor of multiplication in frequency space), but I do get how convolutions work with neural networks, and maybe this is a failure on my part, but it doesn't feel like there's a super strong connection between the two. Is there something I'm missing out on?


I didn't understand this sentence and stopped reading:

"How likely is it that a ball will go a distance c if you drop it and then drop it again from above the point at which it landed?"


Okay, so I was confused by this too. I think it's especially deceptive because it implies the ball moves along a single axis, when there are actually two motions involved (falling, and then rolling over). I also tend to get very frustrated with articles that do a terrible job of explaining things, because it can feel like an impassable wall for the reader, where they just stop and don't continue.

But in this case, they made up for the (really, really) terrible wording with a (really, really) clear image showing exactly what they meant. It did take a minute of staring at the image to figure it out, but this is a blog post about math; that's what it's going to take regardless. It's unfortunate that they couldn't communicate their thoughts more clearly, but it's kind of brutal to just stop reading over one sentence when they put all that effort into making a diagram to explain it.


The example problem just seems ridiculously underspecified at first. It shouldn't be necessary to read between the lines in order to decipher what the author's mental model was.


“well, it’s like autocorrelation, but upside down and backwards!”
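That quip is literal: convolution is correlation with the kernel reversed ("backwards"), as a quick NumPy check shows (a toy sketch with made-up values):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
k = np.array([0.0, 1.0, 0.5])

# Convolution flips the kernel before sliding it; correlation does not.
conv = np.convolve(x, k)
corr = np.correlate(x, k, mode='full')

# Reversing the kernel turns one into the other.
print(np.allclose(corr, np.convolve(x, k[::-1])))  # True
```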


Convolutions show up in a lot of places. Multiplying polynomials is a convolution. This provides an interesting technique to create all the partial sums of a sequence: if you have a generating function for the sequence, divide it by (1-x). You'll get a new generating function where the nth coefficient is the sum of the first n coefficients of the original function. Remember 1/(1-x) = 1 + x + x^2 + ... ?
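On the coefficient side, dividing by (1-x) is just convolving with the all-ones sequence 1, 1, 1, ..., which is easy to check (a small sketch with an arbitrary example sequence):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# Multiplying by 1/(1-x) = 1 + x + x^2 + ... means convolving the
# coefficients with the all-ones sequence; truncate to the original
# length since we only have finitely many terms.
ones = np.ones(len(a), dtype=int)
partial_sums = np.convolve(a, ones)[:len(a)]

print(partial_sums)               # [ 1  3  6 10 15]
print(np.cumsum(a))               # the same partial sums
```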


I recently published a new crazy convolution formula at https://arxiv.org/abs/1902.08982 (see last formulæ of the paper). I just submitted it to a journal and am waiting for the answer. Discussion is welcome!


For easier reading and referencing you could state the total big-O cost of your algorithm right at the beginning (or even in the abstract), along with an indication of the range of input sizes over which it is likely to outperform Karatsuba multiplication before being outperformed by the FFT.


Could you provide a PoC convolutional neural network code using your method? So we can then compare compute/memory intensity?

p.s. I’m assuming of course that your method is functionally equivalent to the standard convolution. Is it?


Convoluted explanation only to cite the epic wikipedia gif

Fouri-ous there's no mention of Fourier https://en.wikipedia.org/wiki/Convolution_theorem


so a convolution is simply a method to calculate a total likelihood?

why do we say "convolution" instead of "total likelihood", which is so much closer to the goal that's actually being achieved?


Convolution is a tool with many applications, so naming it after just one of those applications would make it harder to see its usefulness in the others.



