This is the same issue as #117, except that now the problem arises only when the max-pooling layer uses the cuDNN engine. The reason is probably the same: cuDNN's max-pooling backward pass compares the layer's inputs and outputs.
How to reproduce: train @mavenlin's network for CIFAR-10 using cuDNN-enabled Caffe. The training gets stuck after a few thousand iterations. However, if the pool1 layer is switched to the Caffe engine, training proceeds just fine.
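For reference, the workaround is setting `engine: CAFFE` in the layer's `pooling_param`, which Caffe's `PoolingParameter` supports. A sketch of what the modified pool1 definition might look like (the `bottom`/`top` blob names and kernel parameters here are illustrative, not copied from @mavenlin's prototxt):

```
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"   # placeholder blob name
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
    engine: CAFFE   # force the native Caffe engine instead of cuDNN
  }
}
```

With `engine` left at its default, Caffe picks cuDNN when it is compiled in, which is why the hang only appears in cuDNN-enabled builds.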