Accurately estimating the retrieval effectiveness of different queries representing distinct information needs is a problem in Information Retrieval (IR) that has been studied for over 20 years. Recent work showed that the problem can be significantly harder when multiple queries representing the same information need are used in prediction. By generalizing the existing evaluation framework of Query Performance Prediction (QPP) we explore the causes of these differences in prediction quality in the two scenarios. Our empirical analysis demonstrates that for most predictors, this difference is solely an artifact of the underlying differences in the query effectiveness distributions. Our detailed analysis also demonstrates key performance distribution properties under which QPP is most and least reliable.