

The use of tests in recruitment, career counseling and personnel development has grown strongly and is expected to accelerate alongside the current disruption of work. It is important that test users, and those planning to use tests, can make informed decisions, for which common sense is to a large extent sufficient. This article is a short extract from the book Psychological Assessment Methods at Work (Niitamo, 2003).

Types of tests used at work

The tests commonly used in work settings can be divided into three broad classes: tests measuring cognitive or intellectual abilities, personality tests, and tests of behavioral styles. Cognitive ability tests measure the person's maximum performance, e.g., the ability to visually perceive and process spatial relations, a requirement for airplane pilots and air traffic controllers. Depending on the classification, some 8-20 distinct cognitive abilities are said to exist.

In contrast to maximum performance, personality tests concern the person's typical performance, e.g., the inclination to behave in an extraverted manner. Based on their different content areas, personality factors may be divided into traits, motives, ways of thinking and attitudes. At work they are most often measured through standardized self-report questionnaires. Among the numerous trait questionnaires, one of the best known is the NEO-PI (Costa & McCrae, 1985), measuring the Big Five traits. Among motive or need questionnaires, the PRF (Jackson, 1967) stands out as a widely popular choice. Among questionnaires measuring ways of thinking, or cognitive styles, the MBTI (Myers & Briggs, 1985) is very popular in worldwide use. Single attitudes such as optimism have often been appended as extensions to the three main classes.

Tests of behavioral styles, in turn, differ from personality tests, which predict general behavioral tendencies, in that they focus on behavior occurring in particular situations such as leadership, teamwork, conflict resolution, etc. Because they measure behavior in circumscribed situations, the styles they capture are more amenable to development than cognitive abilities or personality. Well-known exemplars include the MLQ (Bass & Avolio, 1990), measuring leadership styles, the BTRI (Belbin, 1993), a test of team roles, and the LSI (Kolb, 1984), a test of learning styles.

Predicting job performance

Meta-analyses begun in the 1980s have led to a scientifically robust picture of tests' ability to predict job performance. First, it is important to know that this predictive ability is relatively modest, yet at the same time significant, depending on the point of comparison. The relatively modest level is apparent in effect sizes remarkably smaller than those commonly attained in physical-science measurement. For example, the proportion of variance in job performance explained by scientifically valid tests lies between 8% and 20%.
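To make the arithmetic concrete, the 8-20% figure is simply the square of the test-criterion correlation (the validity coefficient). A minimal sketch, assuming validity coefficients in the commonly reported 0.30-0.45 range (the function name is invented for illustration):

```python
# The proportion of job-performance variance a test explains is the
# square of its validity coefficient (the test-criterion correlation).
def variance_explained(r: float) -> float:
    """Share of criterion variance explained by a predictor with validity r."""
    return r ** 2

for r in (0.30, 0.45):
    print(f"validity r = {r:.2f} -> {variance_explained(r):.0%} of variance explained")
```

Squaring 0.30 and 0.45 yields 9% and 20% respectively, which is where the often-quoted range comes from.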

The modest amount of variance explained means that 80-92% of job performance remains unpredicted. In other words, for the greater part, performance or success at work (or in life) remains (fortunately) unpredictable. On the other hand, the job interview excepted, no other method or prior information reaches even these figures according to the meta-analyses. Moreover, if a 30-60 minute testing session can yield even minor predictions of something as important as performance or success at work, tests definitely rise to a significant position.

According to the meta-analyses, ability tests and the structured interview occupy an almost shoulder-to-shoulder position as the two strongest predictors of job performance. Personality tests follow a step or two behind. Different combinations of prediction methods can to some extent push the variance predicted toward the 20% level.

There has been slight bafflement over the observation that so-called assessment centers, which combine all method categories, tend to fall short of the 20% prediction level. The explanation is that the meta-analyses performed on assessment centers have drawn either upon samples of already employed people (managers) or upon candidates in the last phase of recruitment. In both cases the variance of qualities has been severely restricted by the preceding screening stages. In other words, in such situations people are already "sufficiently" intelligent, educated, results-oriented, conscientious, etc., which makes capturing individual differences much more difficult than in meta-analyses drawing upon broader population samples.

Technical criteria

Behavioral processes have loose boundaries and are complex in character. It is definitely not enough to simply name a particular set of questions as a measure of a particular cognitive ability, personality factor or behavioral style. The measurability and predictive ability of such a set of questions has to be verified and documented in a scientifically agreed-upon manner. The written documentation must be accessible to those interested, either publicly or through reasonable effort. The essential technical, psychometric criteria include reliability, validity and reference norms.


Reliability concerns the test's ability to measure some quality consistently. Construction of a behavioral-process test is always a tedious undertaking, an early milestone of which is establishing the test's measurability. Reliability is mandatory because without it the test cannot ever predict anything. Reliability is indicated as internal consistency within the set of questions: their ability to form a sufficiently intertwined set of questions intended to measure some quality. Another way to assess reliability focuses on measurement stability over time, by testing whether measurements taken at different points in time yield sufficiently similar results.
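The internal-consistency idea can be sketched in code. The following is a minimal, illustrative implementation of Cronbach's alpha, the most common internal-consistency index; the function name and toy data are assumptions for illustration, not taken from any particular test:

```python
# Minimal Cronbach's alpha: internal consistency of a set of test items.
def cronbach_alpha(items):
    """items: one list of item scores per respondent (rows = respondents)."""
    k = len(items[0])                      # number of items
    def var(xs):                           # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = [var([row[i] for row in items]) for i in range(k)]
    total_var = var([sum(row) for row in items])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Two items that move in lockstep across three respondents -> alpha = 1.0
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
```

Values near 1 indicate a tightly intertwined item set; in applied practice an alpha of roughly 0.7-0.8 is often cited as a minimum.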


Validity concerns the test's ability to measure the intended quality and to predict behavior external to the test itself. The former may be verified in many ways, for example by showing that the test relates to neighboring qualities in ways predicted by theory. For example, one would expect a test of idea-oriented thinking to correlate with general creative thinking. Another typical procedure concerns theory-suggested group differences, e.g., a test of leadership motive should be expected to differentiate leaders from experts. The most important aspect of validity is the test's ability to predict behavior; that is, a leadership motive test should predict independently appraised leadership behavior.


The numerical scores given to test responses are not very interpretable in themselves. They only indicate rank order between individual test takers in a given testing project; they do not tell whether a given score represents a small or large quantity of that quality in any larger reference group. The explanation is that, in contrast to the measurement of temperature, the measurement of behavioral processes lacks an absolute zero point to anchor to. Therefore, measurement is carried out by comparing received scores to scores in some larger reference group. The so-called raw scores received from responses to test questions are standardized, in other words, related to some larger reference group such as working-age adults in a particular country.

The standard scores are usually expressed in the test's outcome profile, indicating the test taker's position in relation to some reference group: how large a proportion of people in the reference group receive an equal or higher (or lower) score. Tests should generally be normed to the populations and countries where the test is used.
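As a rough sketch of how such standardization works, a raw score can be converted to a z-score against the norm group's published mean and standard deviation and then read off as a percentile position. The norm parameters below are invented for illustration; real values come from the test manual, and the percentile conversion assumes roughly normal norms:

```python
from statistics import NormalDist

# Hypothetical norm-group parameters; real tests publish these in their manuals.
NORM_MEAN, NORM_SD = 50.0, 10.0

def standardize(raw_score: float) -> tuple[float, float]:
    """Return the z-score and the estimated share of the reference group
    scoring at or below raw_score (assuming approximately normal norms)."""
    z = (raw_score - NORM_MEAN) / NORM_SD
    return z, NormalDist().cdf(z)

z, pct = standardize(63.0)
print(f"raw 63 -> z = {z:.1f}; about {pct:.0%} of the reference group scores at or below it")
```

A raw score of 63 against these invented norms gives z = 1.3, placing the test taker at roughly the 90th percentile of the reference group.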

Content criteria

Technical features of tests can be examined in test publishers' publications and in independent peer-reviewed reports (e.g., Mental Measurements Yearbook, 1938-2017). Today a truly large number of the tests on offer fulfill the above mandatory technical criteria. Moreover, no dramatic differences are to be expected between competing test brands in the ability to predict job performance, nor are there such things as wonder tests. The choice among tests is therefore increasingly based on content criteria: the test's background theory, its areas of use, and the user experience of both test users and the clients receiving the service.

Background theory

Many new tests are published today with sky-high promises and lofty slogans, carrying the risk of falling into obsolescence after a short hype period. Some new tests claim to measure qualities so timely that they have no counterpart in any previous research. Other new tests appear as newly named versions of already established qualities, thus reinventing the wheel. It is always useful to examine the kind of theory, concept or big picture behind the test. While not closing one's eyes to genuinely new and interesting test concepts, it is usually safer to prefer tests that are based on some established scientific theory.

Areas of use

The tests' areas of use may be divided broadly into recruitment and development. Recruitment uses cognitive ability tests as well as personality tests, but the use of behavioral style tests is less frequent. In addition to predictive purposes, the use of personality tests is argued to be useful in illuminating the candidate as a person. For example, in addition to spatial information processing, it makes sense to predict the candidate pilot's collaboration style in and outside the cockpit.

In addition to the traditional performance-centered criteria of job success (salary, career progress, etc.), different organizational "citizenship" behaviors (OCB) are increasingly proposed as a valuable asset. Such "civic virtues" as collaboration and helping and supporting others are more strongly predicted by personality tests than are the traditional performance-centered success criteria.

Development activities rarely involve intellectual skills, because these are not seen as something that can be significantly developed through practice. In contrast, development programs draw heavily upon behavioral style tests as well as personality tests. The former set their development target on behaviors in particular situations, while the latter focus on general behavioral tendencies driven by personality, or on those personality factors that are directly amenable to change, such as ways of thinking and attitudes.

User and client UX

Much more important than technical specifications such as mobile usability is how understandable the test contents are to both test users and test-service clients. The test user repeatedly runs into situations where he or she has to deliver information to test takers and to the managers receiving the service. All this calls for understandable language and terminology in test content. Obviously the most important question for the user concerns the test's ability to differentiate work behaviors and to guide the test user in development efforts. Here again, terminological simplicity and commonsense language rise to an important position.

Clients of the testing service include recruitment candidates taking the test, the managers who receive those candidates, and test takers participating in the organization's development programs. Test takers in recruitment situations must feel they are treated in an appropriate and fair manner. Moreover, legislation in most countries requires that test content be justified as relevant for success in the target job.

Managers who receive the testing service should be able to understand the test content without difficulty while making their hiring decisions. In the organization's development programs, in turn, it is important that the test taker experiences the test as interesting, credible and, if possible, inspiring. Both in recruitment and in development it is useful to provide test takers with interpretation documents for understanding the content of the outcome profiles. Some tests offer machine-generated descriptive reports on individuals, creating an illusion of accuracy that can in no way be justified given the inherently modest predictive power of tests. The other risk is that such illusory accuracy removes the perceived need to interview people, which in turn can lead to serious misjudgments.


The predictive ability of tests, although relatively modest in level, has been scientifically established during the 100 years of research in the field (Schmidt, 2016). Test users are today offered a large number of tests that fulfill the required technical criteria, and it is no longer enough to merely provide proof of this predictive power. The focus has shifted to the content features of tests. Content issues concern so-called ecological validity, which refers to the test's usability in the contexts and situations where it is applied.

A new, systemic criterion for the appraisal of tests is emerging along with the disruption of work. Because the effect sizes and proportions of variance explained by tests measuring behavioral processes are relatively modest, tests do not by themselves amount to "king makers" for HR. The value in tests is realized to its full extent only when they function as integrated parts of the whole organization's competency concept, in the acquisition and development of competence.

Development processes are still implemented in separate silos across different parts of the organization. HR professionals are still overly concerned about being able to offer "something new" instead of focusing on process contents and their systematic implementation. Only with integrated processes extending beyond quarterly cycles can HR assume its important leadership role in the disrupting world of work.

Bass, B. M., & Avolio, B. J. (1990). Transformational leadership development: Manual for the multifactor leadership questionnaire. Palo Alto, CA: Consulting Psychologists Press.
Belbin, R.M. (1993). Team roles at work. London, UK: Butterworth-Heinemann.
Briggs-Myers, I., & Briggs, K.C. (1985). Myers-Briggs Type Indicator (MBTI). Palo Alto, CA: Consulting Psychologists Press.
Costa, P. T. & McCrae, R. R. (1985). The NEO personality inventory manual. Odessa, FL: Psychological Assessment Resources.
Jackson, D.N. (1967). Personality Research Form Manual. Port Huron: Sigma Assessment Systems.
Kolb, D.A. (1984). Experiential learning: Experience as a source of learning and development. Englewood Cliffs, NJ: Prentice-Hall.
Mental Measurements Yearbook (1938-2017). Lincoln, NE: Buros Center for Testing.
Niitamo, P. (2003). Henkilöarviomenetelmät työelämässä [Psychological Assessment Methods at Work; in Finnish]. Helsinki: Finnish Institute of Occupational Health.
Schmidt, F.L. (2016). The Validity and Utility of Selection Methods in Personnel Psychology: Practical and Theoretical Implications of 100 Years of Research Findings. https://www.researchgate.net/publication/309203898

