Identifying the (near) optimal program variants an optimizing and parallelizing compiler should generate is known to be difficult. Au- totuning is the best solution to navigate the often high-dimensional space of possible options. However, to be practical an autotuner should (a) have high convergence speed and (b) be robust in face of varying inputs. Current techniques for offline tuning, where convergence speed is less important, provide solutions only for known inputs, whereas online tuning can be input sensitive but currently lacks in convergence speed. In this paper, we present hierarchical online-autotuning, a novel technique to exploit struc- ture in the search space and the underlying tuning problem to increase convergence speed during online tuning. By modeling symmetries and redundancies in configurations and by exploiting domain knowledge to predict performance we reduce the search space size by orders of magnitudes. Combining our tuner with a polyhedral parallelizing compiler for GPUs, we show that the performance of a GEMM GPU kernel generated with default pa- rameters is increased by 6× and that the convergence speed of the tuning process is increased by a factor of up to 1.7 compared to OpenTuner. With hierarchical tuning we make the deployment of always-on online-autotuning practical.