Multi-item rating scales are the accepted solution for achieving reliable and valid measures in the social sciences. Issues not fully resolved include the optimal number of response categories, the choice of semantic rating versus Likert form, and the appropriateness of mixing positively and negatively expressed items. While there is considerable empirical research on these issues, it addresses the scaling of respondents and has yet to produce consensus on the most appropriate practice. In marketing, multi-item scales are used not only to scale consumer respondents but also to scale marketing stimuli. Using generalisability theory criteria for data quality, this article examines these response format issues when the primary objective is to scale marketing stimuli rather than consumers. Website assessment data from a G-study, collected under different response formats, are used to compare the effects of those formats on the observed variance components and G-coefficients for websites. Conclusions are drawn about the most appropriate response format for marketing studies that scale marketing stimuli.