
From the CNTK vs. TensorFlow page: https://docs.microsoft.com/en-us/cognitive-toolkit/reasons-t...

> TensorFlow shared the training script for Inception V3 and offered pre-trained models to download. However, it is difficult to retrain the model and achieve the same accuracy, because that requires additional understanding of details such as data pre-processing and augmentation. The best accuracy achieved by a third party (Keras in this case) is about 0.6% worse than what the original paper reported. Researchers in the CNTK team worked hard and were able to train a CNTK Inception V3 model with 5.972% top-5 error, even better than the original paper reported!

This suggests that an improvement in accuracy is possible by switching to CNTK, which is why I included an accuracy metric from both frameworks (and also as a sanity check, as you note).



No, if you re-read the first sentence there, it says that the different results are attributed to differences in "pre-processing and augmentation". The choice of NN framework is essentially irrelevant.


Note, though, that the preprocessing and augmentation are (at least in TF) done within the framework itself. I helped debug the pure-TensorFlow version of the Inception input pipeline, and getting it to match the earlier DistBelief version was agonizing -- it really exposes all of the differences (and bugs) in the image processing ops. And there can be subtle effects -- differences in which image resizing algorithm you use, for example.
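
As a rough illustration of how much the interpolation method alone can matter (a minimal sketch using the current tf.image.resize API, not taken from the Inception code itself):

    import numpy as np
    import tensorflow as tf

    # The same image downscaled with two different interpolation methods
    # yields different pixel values; discrepancies like this feed into the
    # subtle effects mentioned above.
    image = tf.constant(np.random.rand(1, 299, 299, 3).astype(np.float32))

    bilinear = tf.image.resize(image, [224, 224], method="bilinear")
    bicubic = tf.image.resize(image, [224, 224], method="bicubic")

    print(float(tf.reduce_max(tf.abs(bilinear - bicubic))))  # non-zero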

But it's worth noting that this code is all released:

https://github.com/tensorflow/models/blob/master/inception/i...

It may be hard to replicate that across all platforms, though -- as an example, the distortions include using four different image resizing algorithms.
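
Roughly, the pattern is that each preprocessing thread resizes its crops with a different method, so the model sees all four during training (a sketch of the idea, not the released code -- see the link above for the real pipeline):

    import tensorflow as tf

    # Hypothetical sketch: cycle through four interpolation methods based on
    # which input thread is doing the preprocessing.
    RESIZE_METHODS = ["bilinear", "nearest", "bicubic", "area"]

    def distorted_resize(image, height, width, thread_id):
        method = RESIZE_METHODS[thread_id % len(RESIZE_METHODS)]
        return tf.image.resize(image, [height, width], method=method)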

Some of it was true preprocessing, i.e., cleaning up the imagenet data. I wrote a bit about that here: https://da-data.blogspot.com/2016/02/cleaning-imagenet-datas...

(tl;dr - there are some invalid images and bboxes, etc., and some papers chose to deal with the "blacklisted" images differently.)
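
The kind of sanity check involved looks something like this (a hypothetical sketch; the actual rules and the blacklist handling are described in the blog post):

    def bbox_is_valid(xmin, ymin, xmax, ymax, img_width, img_height):
        # Drop annotations that fall outside the image or have
        # non-positive area -- the sort of invalid bbox the cleanup
        # pass has to deal with.
        return (0 <= xmin < xmax <= img_width and
                0 <= ymin < ymax <= img_height)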


Should be irrelevant... It's still worth testing, though, to see that the default implementations are actually doing what they say on the box...


That's not really related to the network. When you build a NN, you shouldn't expect different accuracy from different frameworks (given that you're using the same hyperparameters and training/evaluating on the same data).


I mean this in the nicest possible way (so please don't take offense), but I think you're missing what is being said there. The framework itself should have no impact, positive or negative, on model accuracy. That being said, it can be extremely challenging to reproduce results given the stochasticity of random batches and asynchronous updates. Furthermore, precisely specifying the methods of data augmentation can be tedious, so the protocol is often only partially detailed in published work, which further exacerbates the challenge of reproducing results.
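
To make that concrete, fully specifying the augmentation means pinning down every knob, something like this (hypothetical values, not from any particular paper):

    # A fully spelled-out augmentation protocol. Published work often reports
    # only a few of these, which is enough to change the final number.
    AUGMENTATION = {
        "crop_area_range": (0.08, 1.0),       # random crop, fraction of source area
        "aspect_ratio_range": (3 / 4, 4 / 3),
        "horizontal_flip_prob": 0.5,
        "brightness_max_delta": 32 / 255,
        "saturation_range": (0.5, 1.5),
        "hue_max_delta": 0.2,
        "contrast_range": (0.5, 1.5),
        "resize_method": "bilinear",
        "color_op_order": "brightness -> saturation -> hue -> contrast",
    }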


We could introduce a simple rule to get rid of this kind of problem: if it can't be reproduced with the data supplied, then it isn't true, so you can't publish.


Not to mention specifying the random seed and PRNG algorithm.
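
And even fixing those only gets you partway (a minimal sketch, assuming TF 2.x and a single-threaded input pipeline):

    import random
    import numpy as np
    import tensorflow as tf

    # Seed every PRNG that touches the pipeline. Multi-threaded input queues
    # and asynchronous weight updates can still reorder operations and change
    # the result from run to run.
    SEED = 42
    random.seed(SEED)
    np.random.seed(SEED)
    tf.random.set_seed(SEED)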



