Tuesday, September 20, 2005

South African J. Psychology, 1979, 9, 104-107.

Is the dogmatism scale irreversible?

J.J. Ray

School of Sociology, University of New South Wales, Australia

A review of various attempts to produce balanced versions of the Rokeach Dogmatism Scale ('D' scale) shows that the highest positive-negative correlation obtained on a general population sample was -0.32. This was obtained with the Ray (1974a) 'BD' Scale Mark II. An attempt to improve on this figure was made by administering only the best eight items of this scale to a community sample of 87 Sydney people. The positive-negative correlation obtained was -0.37. It was pointed out that Rorer's (1965) work showed only that there was no such thing as a general tendency to acquiesce. It did not rule out acquiescence as a response to ambiguity (Peabody 1966). Balanced scales are therefore still necessary. It was concluded that the 'D' Scale may be inherently and unacceptably ambiguous.

J.J. Ray
School of Sociology, University of New South Wales, P.O. Box 1, Kensington, New South Wales, Australia.

Accepted August 1979

The title of this paper is a deliberate allusion to the influential article by Christie, Havel and Seidenberg (1956), which reviewed the history of attempts to balance the 'F' Scale up to that time and concluded that the 'F' Scale was in fact irreversible. Subsequent research has shown that conclusion to be far too pessimistic (Lee & Warr 1969; Ray 1972a), but it will nonetheless be the contention of the present paper that what turned out not to be true of the 'F' Scale may be much more true of the 'D' Scale.

The essence of the approach that led to a successful balancing of the 'F' Scale appears first to have been suggested by Byrne and Bounds (1964). It was simply that reversed items should be selected not on any a priori criterion but rather on the empirical criterion of whether they did in fact correlate in the expected way with positive items.

Although the literature is not as extensive as that concerned with the 'F' Scale, there is nonetheless also a history of attempts to balance the Rokeach (1960) 'D' Scale. Peabody (1961), Haiman and Duns (1964) and Stanley and Martin (1964) all used the a priori method of item selection that had failed with the 'F' Scale and found that it also failed with the 'D' Scale. Ray (1970), however, used empirical methods of item selection and produced a balanced scale which showed correlations between its two types of item of up to -0.71. This might seem the sort of result that one could hardly cavil at, but unfortunately this 'D' Scale version was constructed using second-year psychology students as subjects, and it was reported in the same article that when the scale was applied to a more representative sample in the community the correlation dropped to only -0.22. This was, of course, precisely the sort of result that had led the 'F' Scale to be viewed as "irreversible". Kirton (1977) has also confirmed that this balanced 'D' Scale does not work with a general population sample.

An obvious question, then, is: 'Does this balanced "D" Scale at least work with other student samples?' Ray (1974a) reported a replication study wherein the correlation between the two halves was -0.40. One interesting aspect of this result, however, is to be found in Table 1 of Ray & Martin (1974). It is there shown that when the sample used to get this overall result is broken up into first, second and third year psychology students, the correlations are respectively -0.34, -0.37 and -0.51. Evidently, then, the balancing is only really satisfactory with the more sophisticated respondents.

A further question that then obviously arises is what the correlation would be with students who had no background in academic psychology at all. Rigby (1978 pers. comm.) examined this using 155 students at the South Australian Institute of Technology. He found a correlation of -0.077. Even for most students, then, the Ray (1970) Balanced Dogmatism scale may not be satisfactory.

This does, however, leave one remaining claimant for the title of an adequate balanced 'D' Scale. In addition to the replication study mentioned above, Ray (1974a) also reported two other attempts to balance the 'D' Scale. The Mark II version showed correlations between its two halves of -0.32 on a general population sample and -0.34 on a first-year psychology student sample. The Mark III version showed a correlation of -0.27 on a general population sample. In a further replication not previously reported, the Mark II Scale was administered to 117 Teacher's College students. The pos.-neg. correlation there observed was -0.38. While this correlation of the Mark II Scale is not therefore high, it does at least seem to be fairly stable.

It was therefore decided to use the Mark II Scale as the starting-point for a fourth attempt to construct a balanced 'D' Scale by empirical methods.


It was felt that there might be in the Mark II 'BD' Scale at least some positive and negative items that did show a high correlation between themselves. It was decided therefore to select a small sub-set of the Mk II items which had shown the highest item-total correlation in previous item analyses. Ten such items were selected, five negative and five positive. Since the most general of the usual internal reliability coefficients (Cronbach's (1951) 'alpha') may be calculated as average inter-item correlation weighted by number of items (Ray 1972b), this procedure might seem bound to reduce the reliability of the scale by virtue of reducing the number of items. Since it was however hoped that the procedure would increase the pos.-neg. correlation, it seemed possible that a rise in the other element in the weighting (average inter-item correlation) might substantially offset this effect and leave a scale with still acceptable reliability. Selecting only 10 items, then, was designed to maximize the pos.-neg. correlation while still leaving some variety in item content.
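The trade-off described above can be illustrated with the standardized form of coefficient alpha, which depends only on the number of items and the average inter-item correlation. The sketch below assumes that this standardized form is the one meant here. The item counts and correlations are hypothetical values chosen for illustration, not figures from any of the scales discussed.

```python
def standardized_alpha(n_items: int, mean_r: float) -> float:
    """Standardized coefficient alpha from the item count and the
    average inter-item correlation (Spearman-Brown form)."""
    return n_items * mean_r / (1.0 + (n_items - 1) * mean_r)


# Hypothetical illustration only: a longer scale with weakly related items
# versus a shorter selection whose items correlate more strongly.
long_form = standardized_alpha(20, 0.10)
short_form = standardized_alpha(10, 0.20)

print(f"20 items, mean r = 0.10: alpha = {long_form:.2f}")
print(f"10 items, mean r = 0.20: alpha = {short_form:.2f}")
```

With these invented figures, halving the number of items costs nothing in reliability provided the average inter-item correlation doubles, which is the kind of offset the item selection was hoping for.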

The 10 selected items were administered together with items of other scales to a community sample of 87 Sydney people. This sample was gathered by asking students to give the questionnaire to people they knew under the constraint that the people selected were not to be students and were preferably to be in manual occupations. The sample so gathered turned out to show no significant differences in demographic characteristics from other samples gathered in the Sydney area by more usual random door-to-door methods. The present sample may then be regarded as an adequate quota sample of the Sydney area.


On item analysis, the ten-item scale showed two negative items which had virtually no correlation at all with the scale total. This again shows the difficulty of obtaining successful 'D' reversals. When these were deleted, the remaining eight-item scale (hereafter to be referred to as the Mark IV BD Scale) showed an internal reliability of 0.53 and a pos.-neg. correlation of -0.373. Again, then, a high pos.-neg. correlation could not be produced.
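For readers unfamiliar with the statistic, the pos.-neg. correlation is the Pearson correlation between each respondent's total on the positively worded items and the total on the negatively worded items, both taken before any reverse-scoring. A minimal sketch follows. The response matrix is invented for illustration and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pos_neg_correlation(responses, reversed_items):
    """Correlate raw totals on positive items with raw totals on negative items.

    responses: one list of raw item scores per respondent (no reversal applied).
    reversed_items: set of column indices holding the negatively worded items.
    """
    pos_totals = [sum(v for i, v in enumerate(row) if i not in reversed_items)
                  for row in responses]
    neg_totals = [sum(v for i, v in enumerate(row) if i in reversed_items)
                  for row in responses]
    return pearson_r(pos_totals, neg_totals)


# Invented data: 5 respondents, 4 items on a 1-5 scale, items 2 and 3 reversed.
responses = [
    [5, 4, 1, 2],
    [4, 5, 2, 1],
    [3, 3, 3, 3],
    [2, 1, 4, 5],
    [1, 2, 5, 4],
]
r_pn = pos_neg_correlation(responses, reversed_items={2, 3})
print(round(r_pn, 2))
```

These contrived respondents answer the two halves in perfectly opposite ways, so the sketch yields the ideal value of -1. Real data, as the results above show, fall far short of that.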

Interpretation of this result may be assisted if it is placed in the context of results obtained with the Ray (1972a) 'Balanced F Scale'. As reported in Ray (1972a) this scale showed an initial pos.-neg. correlation on a community sample of -0.71, a pos.-neg. correlation on students of -0.53 and a pos.-neg. correlation on a random doorstep sample of -0.56. In other results not previously reported, a random cluster sample of the Sydney area (N = 95) gave a pos.-neg. correlation of -0.65 and a sample of 47 University of Sydney first-year psychology students gave a pos.-neg. correlation of -0.70. A 14 item short form of the BF Scale was also included in a national mail sample by a commercial polling organization, giving a pos.-neg. correlation of -0.50 and a reliability of 0.80 with an N of 200. Given the present results, it must appear that the F Scale is 'much more reversible' than the 'D' Scale.

The 'expected value' of the pos.-neg. correlation (rPN) for an adequate balanced scale is not inherently obvious, but one approach would be to compare the obtained value with the correlation to be expected from a random split of the scale into two halves. How does the particular split giving rise to rPN compare with an average split into any two halves? An r for such an average split can in fact be rather easily obtained by 'de-correcting' the reliability coefficient 'alpha' with the Spearman-Brown formula applied in reverse, i.e. r = alpha/(2 - alpha).
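The reversed formula follows in a few steps from the usual Spearman-Brown step-up for a test of doubled length, if alpha is treated as the stepped-up reliability of an average split into two halves (an assumption of this sketch, though it appears to be the reading intended here):

```latex
\alpha = \frac{2r}{1 + r}
\;\Longrightarrow\; \alpha\,(1 + r) = 2r
\;\Longrightarrow\; \alpha = r\,(2 - \alpha)
\;\Longrightarrow\; r = \frac{\alpha}{2 - \alpha}
```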

Applying this formula to the alphas for the balanced 'F' and 'D' (Mk. II) Scales on their norming samples (in fact the same sample) gives rs of 0.76 and 0.71. This compares with observed values for rPN of 0.71 and 0.32 respectively. Clearly the correlation between positive and negative items of the 'BF' Scale is typical of the correlation between any set of items in that scale but this is not at all true of the Mark II 'BD' Scale.
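As a rough check on the arithmetic, the de-correction can be written as a one-line function together with its inverse, the ordinary Spearman-Brown step-up for a test of doubled length. The function names are illustrative only. Because the alphas for the norming sample are not quoted in this paper, the sketch back-calculates them from the two reported rs; those back-calculated figures are an assumption of the sketch, not reported results.

```python
def average_split_r(alpha: float) -> float:
    """Reverse Spearman-Brown: correlation between two average halves."""
    return alpha / (2.0 - alpha)


def step_up(r: float) -> float:
    """Forward Spearman-Brown for a test of doubled length."""
    return 2.0 * r / (1.0 + r)


# Average-split rs given in the text for the two scales on their norming sample.
reported_r = {"BF": 0.76, "BD Mk II": 0.71}

# Alphas implied by those rs (derived here, NOT quoted from the paper).
implied_alpha = {name: step_up(r) for name, r in reported_r.items()}

# De-correcting the implied alphas should return the reported rs.
round_trip = {name: average_split_r(a) for name, a in implied_alpha.items()}

for name in reported_r:
    print(name, round(implied_alpha[name], 2), round(round_trip[name], 2))
```

On this reading the two rs would correspond to alphas of roughly 0.86 and 0.83, so the two scales are about equally reliable as wholes even though their pos.-neg. correlations differ sharply.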

Applying the same formula to the present Mk IV 'BD' Scale gives a value of 0.36 for average r. While this is very close to the observed value (0.37) of rPN, the convergence occurs only because the level of alpha is on this occasion unacceptably low. In addition, the structure the items do have could reflect the fact that most of them would have some use in conventional discussions about the tenability of religion. They are listed below with reversed items marked 'R'.

1. Do you agree with the saying: 'Eat, drink and be merry for to-morrow we may die'? R

2. The 'one true faith' is a myth. R

3. Man on his own is a helpless and miserable creature.

4. Do you think it is possible to really live without believing in any great cause? R

5. Of all the different philosophies that exist in the world, there is probably only one which is correct.

6. When it comes to differences of opinion in religion, we must be careful not to compromise with those who believe differently from the way we do.

7. It is annoying to listen to a speaker or teacher who seems unable to make up his mind about what he really believes.

8. It is only when a person devotes himself to an ideal or cause that life becomes meaningful.

Another way in which a balanced scale can be evaluated is in terms of its reliability as a measure of acquiescence. If the two halves of opposite meaning are scored as if they were alike (i.e. without reversals), the total score becomes an index almost entirely of acquiescence with item meaning controlled for. The more meaningful the items (i.e. the greater the oppositeness in the way they are responded to) the less internally consistent the scale so scored should be. The Mk II 'BD' Scale so scored generally shows internal reliabilities ('alpha') of around 0.50, indicating that substantial acquiescence is generated by its items. Is this also true of the Mk IV version in the present study?

With 10 items scored without reversals, the coefficient 'alpha' reliability observed was 0.00. This indicates that the initial form of the Mk IV Scale was a successful selection of acquiescence-free items.
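The acquiescence check described above amounts to computing coefficient alpha twice on the same responses: once with the negative items reverse-scored (the ordinary content score) and once with all items scored in the same direction. The sketch below uses the usual variance form of Cronbach's alpha. The data, the 1-5 response format and the function names are all assumptions made for illustration.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Coefficient alpha from a respondents-by-items matrix of scores."""
    k = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def reverse_score(rows, reversed_items, scale_max=5):
    """Reverse-score the negatively worded items on a 1..scale_max format."""
    return [[(scale_max + 1 - v) if i in reversed_items else v
             for i, v in enumerate(row)]
            for row in rows]


# Invented raw responses: items 2 and 3 are the negatively worded ones.
raw = [
    [5, 5, 2, 2],
    [4, 4, 2, 2],
    [3, 3, 3, 3],
    [2, 2, 4, 4],
    [1, 1, 4, 4],
]

alpha_content = cronbach_alpha(reverse_score(raw, {2, 3}))  # usual scoring
alpha_acquiescence = cronbach_alpha(raw)                    # no reversals

print(round(alpha_content, 2), round(alpha_acquiescence, 2))
```

In these contrived data the respondents answer almost wholly on content, so the reverse-scored alpha is high and the unreversed alpha is actually negative. A substantial positive value for the unreversed alpha would instead indicate that the items were generating consistent yea-saying.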

In several ways, then, the characteristics of the Mk IV Scale were a vindication of the thinking that led to its creation. These characteristics were achieved, however, only by a severe sacrifice in reliability and by abandoning the original goal of perfect equality between the number of positive and negative items.


A tempting response to the above findings may well be 'So what?'. Has not Rorer (1965) shown that acquiescent set or style does not matter anyhow? If so, whether or not balancing is achieved is also immaterial. We could conclude as Kirton (1977) does that one might as well go on using the original 'D' Scale.

To do this would, however, be negligent indeed. Rorer showed that there was no such thing as acquiescent style by pointing out that different measures of acquiescence generally show very little correlation. What he overlooked, however, and what Peabody pointed out in his reply to Rorer (Peabody 1966), was the possibility that acquiescence might be a response. It might be elicited as a non-meaningful response to particular items with particular subjects. In Peabody's terms, it might be a characteristic response to ambiguous items. If the items are not as meaningful to the subject as they are to the scale constructor, the subject may simply say 'Yes' to anything.

Thus it must come as no surprise that the Wilson C-Scale, which normally shows pos.-neg. correlations of around -0.70 (Wilson 1974), might on some occasions show very different characteristics. Ray (1971, 1972c), for instance, shows that when administered to Army conscripts and students, the C-Scale shows on some occasions both much reduced reliability and a correlation between its two halves that may even be in the wrong direction altogether. That this is due to acquiescence is shown by the reliability of the C-Scale scored as a measure of acquiescence (i.e. without reversals). On the sample of 110 conscripts mentioned in Ray (1972c) this was 0.61. The pos.-neg. correlation on the same sample was -0.199 (see Ray 1979). Thus while acquiescence may not be general between scales, it may show up as a consistent and distorting tendency within a particular scale on a particular occasion. The only way this possibility can ever be examined is by having a balanced scale to start with.

In this light, the failure of so many attempts to balance the 'D' Scale may suggest that the 'D' Scale is ambiguous - not only for some samples on some occasions but for almost all samples on all occasions. Even empirical methods may be incapable of selecting reversed items of recognizably opposite meaning. Such items just may not exist in general. The only explanation for this would surely be that the positive items meant very little in the first place. What the original 'D' Scale has been measuring all along then is probably little more than the tendency to acquiesce when faced with an ambiguous task. This may be of some use and interest in itself but it is certainly quite different from what Rokeach thought he was measuring.

In general, then, the indicated conclusion from the results so far would appear to be that the 'D' Scale should be used in future only with extreme caution.

As some qualification to this rather severe conclusion, however, it could perhaps be argued that the validation available for the 'D' Scale as a measure of dogmatism is impressive and that although the characteristic pos.-neg. correlation of just above -0.30 for the Mk II 'BD' Scale is not high, it is nonetheless highly significant statistically. Some ability to elicit meaningful responses has been demonstrated for 'D' Scale items. If this argument seems impressive, it might be noted that any balanced scale, no matter how low the correlation between its halves, does control out the effect of acquiescence. The Mk II 'BD' Scale could at least perform this service for prospective 'D' Scale users. The low pos.-neg. correlation is, in other words, a reflection on the scale's validity rather than on its adequacy at controlling out the effects of acquiescence.


Byrne, D. & Bounds, C. The reversal of F Scale Items. Psychol. Rep. 1964, 14, 216.

Christie, R., Havel, J. & Seidenberg, B. Is the 'F' Scale irreversible? J. Abnorm. Psychol. 1956, 56, 141-148.

Cronbach, L.J. Coefficient alpha and the internal structure of tests. Psychometrika 1951, 16, 297-334.

Haiman, F.S. & Duns, D.F. Validations in communicative behaviour of attitude scale measures of dogmatism. J. Soc. Psychol. 1964, 64, 287-297.

Kirton, M.J. Ray's balanced dogmatism scale re-examined. Brit. J. Soc. & Clin. Psychol. 1977, 16, 97-98.

Lee, R.E. & Warr, P.B. The development and standardization of a balanced 'F' Scale. J. General Psychology 1969, 81, 109-129.

Peabody, D. Attitude content and agreement set in scales of authoritarianism, dogmatism, anti-semitism and economic conservatism. J. Abnorm. & Soc. Psychol. 1961, 63, 1-11.

Peabody, D. Authoritarianism scales and response bias. Psychol. Bull. 1966, 65, 11-23.

Ray, J.J. (1970) The development and validation of a balanced Dogmatism scale. Australian Journal of Psychology, 22, 253-260.

Ray, J.J. (1971) "A new measure of conservatism" -- Its limitations. British Journal of Social & Clinical Psychology, 10, 79-80.

Ray, J.J. (1972a) A new balanced F scale -- And its relation to social class. Australian Psychologist 7, 155-166.

Ray, J.J. A new reliability maximization procedure for Likert scales. Australian J. Psychology, 1972b, 7, 40-46.

Ray, J.J. (1972c) Are conservatism scales irreversible? British J. Social & Clinical Psychology 11, 346-352.

Ray, J.J. (1974a) Balanced Dogmatism scales. Australian Journal of Psychology 26, 9-14.

Ray, J.J. & Martin, J. (1974) How desirable is dogmatism? Australian & New Zealand J. Sociology, 10, 143-144.

Ray, J.J. & Pratt, G.J. (1979) Is the influence of acquiescence on "catchphrase" type attitude scale items not so mythical after all? Australian Journal of Psychology 31, 73-78.

Rokeach, M. The open and closed mind. New York: Basic books, 1960.

Rorer, L. The great response style myth. Psychol. Bull. 1965, 63, 129

Stanley, G. & Martin, J. How sincere is the dogmatist? Psychol. Review, 1964, 71, 331-333.

Wilson, G. Evaluation of the conservatism scale: A reply to Ray. New Zealand Psychologist, 1974, 3, 27.


Replication is one of the cornerstones of science. A new research result will normally require replication by later researchers before the truth and accuracy of the observation concerned is generally accepted. If a result is to be replicated, however, careful specification of the original research procedure is important.

In questionnaire research it has been my observation that the results are fairly robust as to questionnaire format. It is the content of the question that matters rather than how the question is presented (But see here and here). It is nonetheless obviously desirable for an attempted replication to follow the original procedure as closely as possible so I have given here samples of how I presented my questionnaires in most of the research I did. On all occasions, respondents were asked to circle a number to indicate their response.

