How accurate are Personality Tests?

Gary Blissett
May 14, 2024
10 min read

The following article was written by Sandra Nunes and is published below with her permission. If you would like to know more about Everything DiSC and how it can support the development of the people within your organisation, please do contact us.

"Having studied People Management and Psychology at Work, I have been working in Human Resources and Hotels for the past 23 years. I decided that it was time to give myself a challenge in March 2024 and enrolled in an Organisations and Business Psychology Masters with Wolverhampton University. My first Module was about Personality and Individual Differences, with an essay assignment: How accurate are Personality Tests? Having used the Everything DiSC profile for so many years, I did not doubt that this would be the tool I would like to showcase in my Essay."

Introduction

Personality assessment is the use of instrumentation to capture prominent motivational qualities of the individual (Piedmont, R., 2024, p. 5146). A broad range of personality tests assess a diverse array of constructs, or abstract qualities that we can’t touch or see. They measure an individual’s characteristics or traits that remain relatively stable over time, such as personality, abilities, coping, social factors, health and well-being (Maltby et al., 2022, pp. 646-647). Personality tests play a major role in fields like neuropsychology, medical psychology, industrial-organisational psychology, and forensic psychology (Wheeler, E. & Archer, R., 2023, pp. 756-760).

Ensuring the accuracy of personality tests is crucial for making informed decisions based on test scores. The accuracy of test scores is predominantly measured via validity: does the personality test accurately assess the intended psychological construct (Newton & Shaw, 2013, pp. 301-309)? This report will focus on construct validity. Accuracy is also measured via reliability: are the test results stable over time and across different conditions (DeVon et al., 2007, pp. 155–164)? This report will focus on the test-retest reliability method. Social desirability responding (SDR) refers to the individual's tendency to present themselves in a positive light when responding to self-report measures (Paulhus, D. L., 2017, pp. 1-5). This report will also analyse how SDR impacts the results of personality tests.

Test industry data show that the personality-testing industry is an expanding market, with about 2,500 personality tests administered a few million times every year. Furthermore, the reaction of individuals to personality tests in the workplace depends largely on the individual and their specific organisational context: some are critically reflective and others happily embrace it (Lundgren et al., 2019, pp. 176-177).

This report will analyse DiSC, a self-report tool used to assess individual personality types in the context of organisational life. Practical examples will be shared, drawn from several years of using DiSC in the workplace. We will analyse the DiSC personality test’s reliability, validity and susceptibility to social desirability responding.

DiSC Personality Test

The DiSC assessment is one of the most popular personality assessments. It is used by over one million individuals in businesses and organisations to help people improve leadership skills, communication and productivity in the workplace (Tunnel, J., 2022).

Wiley (2013) describes four primary personality profiles. Dominance (D): these individuals tend to be confident, assertive, and focused on achieving the bottom line. Influence (I): these people are more open, emphasise relationships, and are skilled at influencing others. Steadiness (S): these individuals are dependable, cooperative, and sincere in their interactions. Conscientiousness (C): these individuals prioritise quality, accuracy, expertise, and competency. For most individuals, one or two predominant personality styles stand out, which gives 12 common combinations or styles. In a two-letter style, the first letter indicates the most prominent style and the second letter the secondary style.

The father of DISC was the American psychologist William Moulton Marston (1920). Walter Clarke created the first DISC assessment in 1950 (Owen et al., 2020).

Participants are asked to respond to 24 to 30 statements on a five-point ordered response scale, indicating how much they agree with each statement. The responses are used to form scores on eight scales: D, Di, i, iS, S, SC, C and CD (Wiley, J., 2013). Furthermore, if the variance on a particular scale is above a predetermined cut-off, the participant is presented with additional items for that scale. This allows the assessment to gain certainty concerning the respondent’s true score.
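The adaptive step described above can be illustrated with a minimal Python sketch. It is one plausible reading of the rule, not Wiley's actual algorithm: the function name, the cut-off value of 1.5 and the sample responses are all hypothetical, and "variance on a scale" is assumed here to mean the variance of a respondent's answers to that scale's items.

```python
from statistics import pvariance

# Hypothetical cut-off; the real threshold used by the assessment is not
# given in the source.
VARIANCE_CUTOFF = 1.5


def needs_additional_items(responses, cutoff=VARIANCE_CUTOFF):
    """Return True when answers to one scale's items are too inconsistent.

    `responses` holds one respondent's answers (1 to 5) to the items of a
    single scale. A high variance suggests the score is uncertain, so more
    items would be presented for that scale.
    """
    if len(responses) < 2:
        raise ValueError("need at least two responses to compute a variance")
    return pvariance(responses) > cutoff


# Hypothetical answers on a five-point scale.
consistent_answers = [4, 5, 4, 4]
mixed_answers = [1, 5, 2, 5]

print(needs_additional_items(consistent_answers))
print(needs_additional_items(mixed_answers))
```

In this sketch a respondent who answers a scale's items consistently is not asked anything further, while a respondent whose answers swing between extremes would receive extra items for that scale.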

The DiSC test provides valuable insights for leadership development, fostering self-awareness and effective team interactions. DiSC can be very helpful, as it supports leaders in discovering how their strengths could become potential weaknesses in challenging situations (Tunnel, J., 2022). Consequently, stronger leadership will bring benefits to the organisation, including effective working relationships, a stronger workplace culture, and better communication and teamwork.

DiSC - Reliability, Validity and Social Desirability Responding

Reliability

From a personal perspective, I have completed the DiSC profile four times over three years, both professionally and recently for the purpose of this report. My predominant DiSC style remained fairly consistent, despite my answering the same questions slightly differently depending on my state of mind and tiredness on the day. Thus, we could establish that the test allows for a margin of error.

Test-retest reliability is estimated by administering the same test to the same group of respondents at different times. The correlation between the two scores, and often between individual questions, indicates the stability of the instrument (DeVon et al., 2007, pp. 155–164).
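As a rough illustration of the calculation described above, the following Python sketch computes the Pearson correlation between two administrations of the same scale. The scores are hypothetical and are not taken from any DiSC study; the function name is also an assumption made for this example.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_a = sum(first) / len(first)
    mean_b = sum(second) / len(second)
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    spread_a = sum((a - mean_a) ** 2 for a in first)
    spread_b = sum((b - mean_b) ** 2 for b in second)
    if spread_a == 0 or spread_b == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return covariance / sqrt(spread_a * spread_b)


# Hypothetical scores for the same five respondents on one scale,
# taken two weeks apart.
time_1 = [12, 18, 25, 31, 40]
time_2 = [14, 17, 27, 30, 38]

test_retest = pearson_r(time_1, time_2)
print(round(test_retest, 2))
```

A coefficient close to 1 means respondents kept roughly the same rank order and spacing across the two sittings, which is what a stable instrument should show.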

How reliable is DISC?

According to the research report from Wiley (2013), the largest provider of DiSC, the eight scales of the DiSC assessment were measured for test-retest reliability over two weeks, using a sample of 752 individuals. The results suggested that DiSC assessments are quite stable over time. Consequently, test takers and test administrators should expect no more than a small change when the assessment is taken at different times. Even when the test does not produce exactly the same results, the fundamental interpretation of the results will generally be the same (Price, L., 2015).

In summary, considering both my personal experience of completing the DiSC test and the research outcome, DiSC demonstrates a high level of stability. However, “the longer the time period between two testings, the lower we would expect test-retest reliability to be” (Wiley, J., 2013).

Validity

When I undertook the DiSC assessment personally, the results across tests were very consistent, delivering I (Influence) as my predominant style. The I style of DiSC describes an individual as outgoing, enthusiastic, and optimistic. At the opposite end of the scale, we find the C (Conscientiousness) style (analytical, reserved, private). The style on which I scored highest is also, according to my self-analysis, the one with which I most identify. The style on which I scored lowest, the C style at the opposite end of the scale, has the traits that are least related to my personality. Based on my experience, we can establish that the DiSC test has some level of validity.

“Validity is the extent to which a test measures what it claims to measure” (Newton & Shaw, 2013, pp. 301-309). Construct validity is fundamental to evaluating a psychometric test’s overall validity, and it relates to how well a test measures the concept it is designed to evaluate (Bhandari, P., 2022). Assessing construct validity is important when researching a construct that can’t be measured or observed directly, such as intelligence, self-confidence, self-esteem or persistence (Bhandari, P., 2022).

Interscale correlations provide valuable insights into the relationships between scales and contribute to the overall validity of measurement instruments. In a validity study, a researcher examines how similar one scale is to another in order to establish the correlation between the two (Lyons-Thomas, J., 2014, pp. 3352-3353).

How well is DiSC measuring what it says it’s measuring?

The Di scale of the DiSC instrument, for instance, measures a particular construct (i.e., the tendency to be bold, adventurous, and fast-paced). Individuals scoring high on the Di scale should score relatively low on a scale measuring cautiousness, which forms part of the SC scale. This means someone who is very bold will not be particularly cautious in nature (Wiley, J., 2013).

The researchers studied 752 students who completed the DiSC assessment. The correlations among all eight scales show moderate positive correlations between adjacent scales and strong negative correlations between opposite scales (Wiley, J., 2013).
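The adjacent-versus-opposite pattern can be checked mechanically once an interscale correlation matrix is available. The Python sketch below groups scale pairs by how many steps apart they sit around the circle of eight scales and averages the correlations in each group. The matrix used here is an idealised, hypothetical one in which correlation follows the cosine of the angle between scales; it is not Wiley's data, and real correlations would be weaker and noisier. The function names are also assumptions made for this example.

```python
from math import cos, pi

# The eight scales in their order around the circle.
SCALES = ["D", "Di", "i", "iS", "S", "SC", "C", "CD"]


def circular_distance(a, b, n=len(SCALES)):
    """Number of steps between two scale positions around the circle."""
    d = abs(a - b) % n
    return min(d, n - d)


def mean_correlation_by_distance(corr):
    """Average the off-diagonal correlations for each circular distance."""
    totals, counts = {}, {}
    n = len(corr)
    for a in range(n):
        for b in range(a + 1, n):
            d = circular_distance(a, b, n)
            totals[d] = totals.get(d, 0.0) + corr[a][b]
            counts[d] = counts.get(d, 0) + 1
    return {d: totals[d] / counts[d] for d in sorted(totals)}


# Hypothetical, idealised matrix: correlation falls off with the cosine of
# the angle between two scales (45 degrees per step around the circle).
ideal = [
    [cos(2 * pi * circular_distance(a, b) / len(SCALES)) for b in range(len(SCALES))]
    for a in range(len(SCALES))
]

summary = mean_correlation_by_distance(ideal)
for distance, mean_r in summary.items():
    print(distance, round(mean_r, 2))
```

Distance 1 corresponds to adjacent scales (for example D and Di) and distance 4 to opposite scales (for example Di and SC). Evidence of construct validity would show the averages falling steadily from positive at distance 1 to clearly negative at distance 4.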

In summary, considering both my personal DiSC results and the research outcome, the DiSC assessment demonstrates a good level of validity. The evidence supports the claim that the instrument measures what it claims to measure. Furthermore, we can identify a clear relationship between the construct at a theoretical level and the measure that has been developed (Maltby et al., 2022, p. 666).

Social Desirability Responding (SDR)

When responding to the personality test questions, I was tempted on several occasions to answer in a way that aligned with socially desirable behaviours. For statements like “I put people under pressure”, “Patience is one of my major strengths”, and “I tend to be very receptive to other people's ideas”, there was a level of internal conflict between the honest answer and the answer that would be perceived as more fitting for a leadership profile.

Social desirability responding is the tendency for people to present themselves in a generally favourable fashion (Holden, 2001, p. 441). SDR has recently been conceived of as both a response distortion and a tendency of the test respondent to select as self-descriptive the response options for items that are more desirable than warranted by his or her corresponding traits or behaviours (Paunonen & LeBel, 2012, pp. 158-159).

Self-report measures, and personality tests in general, have limitations regarding response styles, and arguably the most threatening is socially desirable responding. Consequently, it is important to distinguish the valid content of personality tests from the misrepresentation caused by SDR. Social desirability bias (SDB) is considered to be one of the most common and pervasive sources of bias affecting the validity of experimental and survey research findings in psychology and the social sciences (Peltier & Walsh, 1990).

A variety of approaches have been presented for assessing desirable responding in self-report measures (Jin et al., 2023, pp. 221-236). Different SDR measures have been developed, such as the Marlowe-Crowne Social Desirability Scale (MCSDS) in 1960, the Balanced Inventory of Desirable Responding (BIDR 6) in 1991, and many more. The key limitation of the SD scales is that it is impossible to differentiate between candid respondents who have the (virtuous) traits they claim to have and dishonest respondents who actively lie to present themselves in an overly positive fashion (Tourangeau & Yan, 2007, pp. 859-883). None of the SDR measurement approaches has so far been demonstrated to be effective.

In summary, the DiSC test, like all other self-report measures, may lead to biased results due to social desirability. Individuals may respond to questions in a manner they perceive to be socially acceptable. Nonetheless, having used the tool within my professional environment for several years, we have found its prediction of behavioural style to be relatively accurate.

Conclusion

The fact that psychological instruments are used to “measure abstract qualities that we can’t touch or see” (Maltby et al., 2022, pp. 646-647) presents some challenges in terms of accuracy. To demonstrate the accuracy of a personality instrument, we must determine how reliable and valid the tool is. Reliability and validity are seen as “matters of degree on continuous scales rather than reliable-unreliable and valid-not valid” (Wiley, J., 2013). For a personality test to be accurate, it must exhibit fairly good levels of validity and reliability.

In the DiSC research report, Wiley (2013) presents strong evidence to support the reliability and validity of the tool.

Considering this alongside my personal experience in the workplace, I have noticed that the tool can show small changes in the test results when taken at different times. However, the fundamental interpretation of the results is usually the same, evidencing a level of stability and thus a good level of reliability. My personal experience also agrees with the research in terms of the tool's validity. The research shows a clear pattern of correlations among the eight scales, with moderate positive correlations between adjacent scales and strong negative correlations between opposite scales, providing strong indications of the test’s construct validity.

In summary, it is possible to analyse and establish the level of accuracy of a personality test. However, it is equally important to evaluate whether the instrument is fit for purpose. For example, we could argue that DiSC may not be suitable for all workplaces, or for job roles that require flexibility and adaptability rather than strict adherence to specific behavioural styles. While this may be true, DiSC has also demonstrated that it can be a very useful vehicle for increasing leaders' self-awareness, enabling them to make conscious choices about their behavioural, and therefore leadership, style in a given moment, which can in turn identify development needs. The reflection and insight gained by individuals through the use of personality tests have proved to be positively associated with job contentment, enthusiasm, and improved communication and relationship building.

References

Backstrom, M., and Bjorklund, F. (2013). Social desirability in personality inventories: Symptoms, diagnosis and prescribed cure. Scandinavian Journal of Psychology, 54(2), pp. 152–159. doi.org/10.1111/sjop.12015.

Cherry, K. (2023). Validity in Psychological Tests: Why measures like validity and reliability are important. https://www.verywellmind.com/what-is-validity (Retrieved 15 April 2024).

DiSC, E. (n.d.). Profile Assessments. https://www.profileassessments.com/what-are-the-benefits-of-disc/ (Retrieved 15 April 2024).

Discprofile. (n.d.). https://www.discprofile.com/what-is-disc (Retrieved 16 April 2024).

Leary, M. R., and Hoyle, R. H. (Eds.) (2009). Handbook of Individual Differences in Social Behavior, p. 441. The Guilford Press.

Jin, K.-Y., Paulhus, D. L., and Shih, C.-L. (2023). A New Approach to Desirable Responding: Multidimensional Item Response Model of Overclaiming Data. Applied Psychological Measurement, 47(3), pp. 221–236. doi.org/10.1177/01466216231151704.

Rust, J., Kosinski, M., and Stillwell, D. (2021). Modern Psychometrics: The Science of Psychological Assessment, pp. 38–39. Routledge. https://web.p.ebscohost.com/ehost/detail/.

Kline, P. (2015). A Handbook of Test Construction (Psychology Revivals). doi.org/10.4324/9781315695990.

Lundgren, H., Poell, R. F., and Kroon, B. (2019). “This is not a test”: How do human resource development professionals use personality tests as tools of their professional practice? Human Resource Development Quarterly, 30(2), pp. 175–196. doi.org/10.1002/hrdq.21338.

Lyons-Thomas, J. (2014). Interscale Correlations. In: Michalos, A.C. (ed.) Encyclopaedia of Quality of Life and Well-Being Research. Springer, Dordrecht. doi.org/10.1007/978-94-007-0753-5_1519.

Maltby, J., Day, L., and Macaskill, A. (2022). Personality, individual differences and intelligence. 5th edn. Harlow: Pearson.

Newton, P. E., and Shaw, S. D. (2013). Standards for talking and thinking about validity. Psychological Methods, pp. 301–319. doi.org/10.1037/a0032969.

Osterlind, S. J. (2010). Modern measurement: theory, principles, and applications of mental appraisal. 2nd edn. Pearson.

Paulhus, D. L. (2017). Socially Desirable Responding on Self-Reports. doi.org/10.1007/978-3-319-28099-8_1349-1.

Paunonen, S. V., and LeBel, E. P. (2012). Socially desirable responding and its elusive effects on the validity of personality assessments. Journal of Personality and Social Psychology, 103(1), pp. 158–175. doi.org/10.1037/a0028165.

Peltier, B., and Walsh, J. (1990). An investigation of response bias in the Chapman Scale. Educational and Psychological Measurement. 18, pp. 133–145.

Piedmont, R. L. (2024). Personality Assessment. Encyclopaedia of Quality of Life and Well-Being Research. Springer, Cham. doi.org/10.1007/978-3-031-17299-1_2153.

Price, L. A. (2015). “DISC instrument validation study”. Peoplekeys – Unlocking Human Potentials. https://peoplekeys.com (Retrieved 17 April 2024).

Bhandari, P. (2022). Construct Validity | Definition, Types, & Examples. https://www.scribbr.com/methodology/construct-validity/ (Retrieved 17 April 2024).

Owen, J., Mahatmya, D., and Carter, R. (2020). Dominance, Influence, Steadiness, and Conscientiousness (DISC) Assessment Tool. doi.org/10.1007/978-3-319-24612-3_25.

Tourangeau, R., and Yan, T. (2007). Sensitive Questions in Surveys. Psychological Bulletin, 133(5), pp. 859–883. doi.org/10.1037/0033-2909.133.5.859.

Tunnel, J. (2022). Here’s How Your DiSC Personality Type Predicts Your Leadership Style. https://www.truity.com/blog/heres-how-your-disc-personality-type-predicts-your-leadership-style (Retrieved 17 April 2024).

Wheeler, E. M. A., and Archer, R. P. (2023). Personality assessment (H. S. Friedman & C. H. Markey (Eds.); pp. 756–760). Academic Press. doi.org/10.1016/B978-0-323-91497-0.00012-6.

Wiley, J. (2013). About Everything DISC: Research Report for Adaptive Testing Assessment. John Wiley & Sons.

Wiley, J. (n.d.). DiSC. www.everythingdisc.com/what-is-disc/ (Retrieved 23 April 2024).

Further information about the statistical reliability and validity can also be found in the following document - About Everything DiSC: Theory & Research