Isn't it just for inference? Also, differentiating through an analog circuit looks... interesting. Keep the inputs constant, wiggle one weight a bit, record how the output changed, move on to the next weight, repeat. I wonder if there is something more efficient.
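For concreteness, the wiggle-one-weight procedure is just a finite-difference gradient estimate. A rough sketch in NumPy, where `forward` is a hypothetical stand-in for measuring the circuit's output (here a toy linear layer summed to a scalar) and `eps` is an arbitrary perturbation size:

```python
import numpy as np

def finite_difference_grad(forward, weights, x, eps=1e-6):
    """Estimate d(output)/d(weight) by perturbing one weight at a time."""
    base = forward(weights, x)              # one baseline measurement
    grad = np.zeros_like(weights)
    for idx in np.ndindex(weights.shape):   # one extra measurement per weight
        perturbed = weights.copy()
        perturbed[idx] += eps
        grad[idx] = (forward(perturbed, x) - base) / eps
    return grad

def forward(weights, x):
    # Hypothetical stand-in for the analog circuit: scalar output of a linear layer.
    return float(np.sum(weights @ x))

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)
grad = finite_difference_grad(forward, W, x)
```

The cost is one forward evaluation per weight plus one baseline, which is why it scales so badly for large networks.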
Definitely, if your analog substrate implements matrix-vector multiplications (one of the most common approaches in this area). In that case the differentiation algorithm is the usual backpropagation, which has rank-1 weight updates. With some architectures this can be implemented in O(1) time simply by running the circuit in a "reverse" configuration (inputs become outputs and vice versa). With ideal analog devices, this would be many orders of magnitude more efficient than a GPU.
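To make the rank-1 and "reverse" points concrete, here is a minimal NumPy sketch for a single matrix-vector layer. The squared-error loss, the shapes, and the variable names are my own assumptions for illustration, not a description of any particular analog hardware:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))    # layer weights (think: crossbar conductances, as an assumption)
x = rng.normal(size=4)         # input vector
target = rng.normal(size=3)    # illustrative training target

def loss(weights):
    # Assumed loss for this sketch: 0.5 * ||W x - target||^2
    return 0.5 * np.sum((weights @ x - target) ** 2)

y = W @ x                      # forward pass: one matrix-vector product
delta = y - target             # error signal dL/dy
grad_W = np.outer(delta, x)    # weight gradient: outer product, hence rank 1
grad_x = W.T @ delta           # backward pass: the same matrix used in transpose
```

The backward signal `grad_x` uses the same `W` with inputs and outputs swapped, which is the software analogue of running the array in reverse, and the whole weight gradient comes from two vectors instead of one measurement per weight.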
It is already common practice to deliberately inject noise into the network (dropout) at rates up to 50% in order to prevent overfitting.
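For anyone who hasn't seen it, dropout at a 50% rate amounts to something like the sketch below. The rescaling by `1 / (1 - rate)` is the common "inverted dropout" variant, which I'm assuming here; the function name and sizes are illustrative only:

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`, rescale the survivors."""
    keep = rng.random(activations.shape) >= rate   # True for units that survive
    return activations * keep / (1.0 - rate)       # keeps the expected value unchanged

rng = np.random.default_rng(0)
h = np.ones(10_000)                 # dummy activations
h_dropped = dropout(h, rate=0.5, rng=rng)
```

Roughly half the units are zeroed on every pass, yet training still works, which is the sense in which networks already tolerate heavy injected noise.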