
From the CNTK vs. TensorFlow page: https://docs.microsoft.com/en-us/cognitive-toolkit/reasons-t...

> TensorFlow shared the training script for Inception V3 and offered pre-trained models to download. However, it is difficult to retrain the model and achieve the same accuracy, because that requires additional understanding of details such as data pre-processing and augmentation. The best accuracy achieved by a third party (Keras in this case) is about 0.6% worse than what the original paper reported. Researchers in the CNTK team worked hard and were able to train a CNTK Inception V3 model with 5.972% top-5 error, even better than the original paper reported!

This suggests that an improvement in accuracy is possible by switching to CNTK, which is why I included an accuracy metric from both frameworks (and also as a sanity check, as you note).



No, if you re-read the first sentence there, it says that the different results are attributed to differences in "pre-processing and augmentation". The choice of NN framework is essentially irrelevant.


Note, though, that the preprocessing and augmentation are (at least in TF) done within the framework itself. I helped debug the pure-TensorFlow version of the Inception input pipeline, and getting it to match the earlier DistBelief version was agonizing -- it really exposes all of the differences (and bugs) in the image processing ops. And there can be subtle effects -- differences in which image resizing algorithm you use, for example.
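
As a rough illustration of how much the interpolation method alone can matter (a minimal sketch using the current tf.image.resize API, not taken from the Inception code itself):

    import numpy as np
    import tensorflow as tf

    # The same image downscaled with two different interpolation methods
    # yields different pixel values; discrepancies like this feed into the
    # subtle effects mentioned above.
    image = tf.constant(np.random.rand(1, 299, 299, 3).astype(np.float32))

    bilinear = tf.image.resize(image, [224, 224], method="bilinear")
    bicubic = tf.image.resize(image, [224, 224], method="bicubic")

    print(float(tf.reduce_max(tf.abs(bilinear - bicubic))))  # non-zero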

But it's worth noting that this code is all released:

https://github.com/tensorflow/models/blob/master/inception/i...

It may be hard to replicate that across all platforms, though -- as an example, the distortions include using four different image resizing algorithms.
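
Roughly, the pattern is that each preprocessing thread resizes its crops with a different method, so the model sees all four during training (a sketch of the idea, not the released code -- see the link above for the real pipeline):

    import tensorflow as tf

    # Hypothetical sketch: cycle through four interpolation methods based on
    # which input thread is doing the preprocessing.
    RESIZE_METHODS = ["bilinear", "nearest", "bicubic", "area"]

    def distorted_resize(image, height, width, thread_id):
        method = RESIZE_METHODS[thread_id % len(RESIZE_METHODS)]
        return tf.image.resize(image, [height, width], method=method)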

Some of it was true preprocessing, i.e., cleaning up the imagenet data. I wrote a bit about that here: https://da-data.blogspot.com/2016/02/cleaning-imagenet-datas...

(tl;dr - there are some invalid images and bboxes, etc., and some papers chose to deal with the "blacklisted" images differently.)
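
The kind of sanity check involved looks something like this (a hypothetical sketch; the actual rules and the blacklist handling are described in the blog post):

    def bbox_is_valid(xmin, ymin, xmax, ymax, img_width, img_height):
        # Drop annotations that fall outside the image or have
        # non-positive area -- the sort of invalid bbox the cleanup
        # pass has to deal with.
        return (0 <= xmin < xmax <= img_width and
                0 <= ymin < ymax <= img_height)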


Should be irrelevant... It's still worth testing, though, to see that the default implementations are actually doing what they say on the box...


That's not really related to the network. When you build a NN, you shouldn't expect different accuracy from different frameworks (given that you're using the same hyperparameters and training/evaluating on the same data).


I mean this in the nicest possible way (so please don't take offense), but I think you're missing what is being said there. The framework itself should have no impact, positive or negative, on model accuracy. That being said, it can be extremely challenging to reproduce results given the stochasticity of random batches and asynchronous updates. Furthermore, precisely specifying the methods of data augmentation can be tedious, so the protocol is often only partially detailed in published work, which further exacerbates the challenge of reproducing results.
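
To make that concrete, fully specifying the augmentation means pinning down every knob, something like this (hypothetical values, not from any particular paper):

    # A fully spelled-out augmentation protocol. Published work often reports
    # only a few of these, which is enough to change the final number.
    AUGMENTATION = {
        "crop_area_range": (0.08, 1.0),       # random crop, fraction of source area
        "aspect_ratio_range": (3 / 4, 4 / 3),
        "horizontal_flip_prob": 0.5,
        "brightness_max_delta": 32 / 255,
        "saturation_range": (0.5, 1.5),
        "hue_max_delta": 0.2,
        "contrast_range": (0.5, 1.5),
        "resize_method": "bilinear",
        "color_op_order": "brightness -> saturation -> hue -> contrast",
    }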


We could introduce a simple rule to get rid of this kind of problem: if it can't be reproduced with the data supplied, then it isn't true, so you can't publish.


Not to mention specifying the random seed and PRNG algorithm.
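
And even fixing those only gets you partway (a minimal sketch, assuming TF 2.x and a single-threaded input pipeline):

    import random
    import numpy as np
    import tensorflow as tf

    # Seed every PRNG that touches the pipeline. Multi-threaded input queues
    # and asynchronous weight updates can still reorder operations and change
    # the result from run to run.
    SEED = 42
    random.seed(SEED)
    np.random.seed(SEED)
    tf.random.set_seed(SEED)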



