The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide, to a greater or lesser extent, the different explanatory factors of variation behind the data. Although specific domain knowledge can help in designing representations, representations can also be learned using generic priors, and the quest for AI is motivating the design of more powerful representation-learning algorithms that implement such priors. This talk reviews recent work in unsupervised feature learning and deep learning, focusing on advances in understanding the probabilistic and geometric (manifold) aspects of regularized auto-encoders. This work motivates longer-term unanswered questions about the appropriate objectives for learning good representations, about how to compute representations (i.e., inference), and about the geometrical connections between representation learning, density estimation, and manifold learning. Finally, the talk will briefly discuss why training deep or recurrent networks is difficult, a question that matters for scaling them towards AI, and recent advances in this regard.