This paper describes two studies that use item response theory (IRT) and its differential item functioning (DIF) analysis to address two issues in identifying successful new concepts at different levels of newness: adopting effective items and inviting appropriate respondents. Study One shows that some items in a multi-item scale discriminate better than others between concepts of low and high viability, and that the selection of items needs to be tailored to whether major-innovation or minor-improvement concepts are being tested. Study Two shows that choosing an effective source of respondents matters for identifying popular movies. Although evaluations from ordinary moviegoers generally discriminate better among movies of differing popularity, those from professional critics are more effective for movies of unfamiliar genres. Together, the two studies demonstrate that applying IRT and DIF provides an effective two-step benchmarking procedure for picking winners among different new concepts.
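
To make the notions of item discrimination and DIF concrete, the following is a minimal sketch under assumptions that are not taken from the studies themselves. It assumes a two-parameter logistic (2PL) IRT model in which the latent trait `theta` stands for a concept's viability, `a` is an item's discrimination, and `b` is its threshold. All parameter values and the helper names `icc`, `separation`, and `dif_gap` are hypothetical and chosen only for illustration.

```python
import math


def icc(theta, a, b):
    """2PL item characteristic curve: probability of a favorable response
    to an item, given latent viability theta (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def separation(item, low=-1.0, high=1.0):
    """How far apart an item places a low- and a high-viability concept."""
    return icc(high, **item) - icc(low, **item)


def dif_gap(theta, reference, focal):
    """Difference in response probability between two groups at the same
    theta; a nonzero gap is the signature of DIF for that item."""
    return icc(theta, **reference) - icc(theta, **focal)


# Hypothetical item parameters, not estimates from either study.
sharp_item = {"a": 2.0, "b": 0.0}  # high discrimination
flat_item = {"a": 0.5, "b": 0.0}   # low discrimination

# The same item calibrated separately in two hypothetical groups,
# e.g. minor-improvement versus major-innovation concepts.
reference_group = {"a": 1.0, "b": 0.0}
focal_group = {"a": 1.0, "b": 1.0}

if __name__ == "__main__":
    print("separation, sharp item:", round(separation(sharp_item), 3))
    print("separation, flat item: ", round(separation(flat_item), 3))
    print("DIF gap at theta = 0:  ", round(dif_gap(0.0, reference_group, focal_group), 3))
```

In this sketch, an item with a larger `a` separates low- and high-viability concepts more sharply, which is the sense in which some items are more effective than others. A nonzero `dif_gap` at equal `theta` means the item behaves differently across groups, which is the kind of evidence that would motivate tailoring items or respondent sources to the type of concept.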