How many candidates request that golden interview hour of 11am? It’s not so early that either party feels sleepy or hungry, and not so late in the day that the fatigue of work has set in. How many applicants vie for a Tuesday or Wednesday slot? According to Glassdoor, the well-known job review site, Tuesday is the optimal day for an interview.

Whether recruiters admit it or not, certain times and days are simply better than others.

Hunger, sleep, work stress, and time of day affect the attention span of an interviewer.

According to research done by Monster Job Boards, interviewers take just 385 seconds (under six and a half minutes) to decide if the candidate is right for the role. The study surveyed 273 managers. Participants rated the importance of things like quality of small talk (60 percent agreed), strength of handshake (55 percent agreed), and ability to hold eye contact (82 percent agreed). While these are important, a recruiter has only a limited amount of time to reach a sound conclusion about a candidate.

We’ve worked hard to ensure results are processed and parameters are clear.

Retorio's AI measures are based on a validated scientific foundation and are reliable.


What is reliability?

Reliability describes the extent to which the results can be reproduced when the research is repeated under the same conditions. It’s verified by checking the consistency of results across time, across different users or observers, and across parts of the test itself.


Think of testing for reliability as akin to baking a cake. When a recipe calls for certain ingredients, a certain temperature, and a certain process, a delicious cake should be the result every time. Barring adjustments for altitude or a quirky oven, the cake can be easily “reproduced”. Quite a few AI video interview tools offer personality assessments. However, companies that don’t offer high reliability, or transparency about what their AI data points are looking for, may have missed the mark on ensuring objectivity.

Reliability of your AI recruitment depends on objectivity, which is defined by 3 factors.

1. Conduction

Historically, tests have been conducted by a person, such as a teacher or a professional proctor. Then computers took over that role in everything from driver’s license tests to learning disability assessments. It’s key that the proctor is independent and follows a standardized approach and process. Remember taking national exams as a kid, with your teacher reading out a set of instructions? It’s about standardizing as many variables as possible to produce an unbiased outcome. With Retorio's AI, candidates simply log in to an account with a straightforward layout and standard questions created by hiring managers. It’s a marriage of standardization and personalization in this fast-moving world of work.

2. Evaluation

This second part of creating objectivity is ensuring the evaluation criteria are the same for every candidate. People should be judged by the same metrics. The assessment is standardized, investigating the same variables for recruitment.

3. Interpretation

This third pillar of objectivity relies on results being interpreted independently. It’s arguably the most important. Humans are often subject to overt or inadvertent influences, which may affect results. Lack of sleep, hunger, a lapse in attention, or an ulterior motive may influence how a result is interpreted. Creating a mechanism to ensure objective analysis and interpretation of an assessment is key. Computers grade a range of assessments, from driving tests and student essays to legal papers. Depending on the program, this can be a more objective method than the standard human reviewer.


Prerequisites for reliability: objectivity and trait stability


Objectivity precedes reliability. In quantum physics, the mere act of observation can change the outcome of an experiment. One of the most famous examples is the double slit experiment. It demonstrates, with eerie strangeness, that the very act of observing a particle has a dramatic effect on its behaviour. Electrons that are observed act like particles, whereas unobserved electrons act like waves. Likewise, unless an assessment and its process are objective, the results will fall under the influence of bias. That’s why having an independent proctor, as in AI recruitment, is vital when conducting something as personal as a personality assessment.

Mitigating Factor 1 is that the software standardizes the whole process, not only the interpretation. To avoid bias, personality assessments should be conducted, evaluated, and interpreted by an unbiased tool, like a computer program based on artificial intelligence.

Because Retorio’s process is objective by design, it provides a strong base for reliability.


Trait stability

Another prerequisite is that the assessment actually measures the personality trait and not the current mood (validity). “How are you feeling?” is not an appropriate question for assessing personality, as it’s not likely to receive a consistent response. The variables being measured must be stable in themselves, otherwise the results will be useless. Personality versus mood is one example. Overall, personality is rather stable. It may change over a lifetime, but basic patterns of logic, feeling, and decision-making generally stay the same. What changes often and quickly is mood.

Mitigating Factor 2 is accounting for a person’s mood. An individual may feel excited in the morning, then tired and in need of withdrawing a few hours later. Mood also depends on rapidly changing variables like hours of sleep, hunger, or simply a bad day. Retorio’s AI for recruitment measures the stable dimensions of personality, which don’t tend to change often or rapidly.

At Retorio, we’ve transferred the standard Big 5 data points psychologists use to a digital format. Our AI-enabled personality assessment includes these thousands of data points. This is roughly similar to how traditional bookkeeping was digitised into Excel, or how taxes can now be filed through online software without a tax consultant.


Measuring the reliability variable

The Reliability Coefficient is the number quantifying the degree of consistency. With a number, we can examine whether an outcome is consistent. A low figure may be due to a poor testing environment, a small number of participants included, or test design errors. For AI video interviews in recruitment, calculating the reliability coefficient helps us gauge potential errors in testing.


Types of Reliability

There are different ways of assessing reliability:

Retest Reliability

A meta-analysis was conducted to measure the Big Five’s retest reliability. The study analyzed 682 test–retest correlations collected within an interval of up to two months from 74 samples (total N = 14,923) across different measures of the Big Five. The median aggregated dependability estimate for the five traits was ρtt = .816, where ρtt = 1 would indicate a perfect correlation between tests. Extraversion scales resulted in the most dependable scores, whereas agreeableness scales exhibited slightly larger measurement error. Meta-regression analyses indicated small moderation effects of the chosen retest interval for three traits, with shorter intervals resulting in higher retest correlations.

Internal vs. External Reliability

Internal reliability describes consistency within the test itself. In the cake analogy, it is how well the different steps of the recipe work together to produce the desired cake. External reliability measures whether a test can be generalized beyond the setting it’s being used in. For example, if a student is looking to improve their grade, individual tutoring is a method that’s applicable to both mathematics and geography, even when it’s conducted by a different tutor or done in a different setting. A test diagnosing anxiety should be able to detect symptoms of anxiety across varying age groups, socio-economic backgrounds, and personality types. The higher the score on both internal and external reliability, the more efficient the assessment.
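To make internal reliability more concrete, here is a minimal sketch using Cronbach's alpha, a common statistic for internal consistency. The article does not say which statistic Retorio uses, so the choice of alpha, the function name, and the questionnaire data below are all assumptions made purely for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents and columns of test items.

    Illustrative helper, not taken from the article.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from five respondents to four items on a 1-5 scale.
hypothetical_answers = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
]
print(f"Cronbach's alpha: {cronbach_alpha(hypothetical_answers):.3f}")
```

Values close to 1 suggest the items move together, which is what the "steps of the recipe" working in concert would look like in numbers.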

Test-Retest Reliability

Test-retest reliability is typically estimated using the ICC (intraclass correlation coefficient). In statistics, this descriptive statistic is used when quantitative measurements are based on units that are organized into groups, like the 5 dimensions of personality. The ICC quantifies how strongly units in the same group resemble each other. This is particularly appropriate for test-retest reliability.
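As a rough illustration of the idea, the sketch below computes the simplest form of the ICC, the one-way random-effects version often written ICC(1,1), for people who were each measured several times. Other ICC variants exist, and the article does not say which one Retorio applies, so the function name and the scores are hypothetical.

```python
def icc_oneway(scores):
    """One-way random-effects ICC(1,1).

    `scores` holds one row per person; each row contains that person's
    repeated measurements (for example, test and retest).
    """
    n_people = len(scores)
    n_sessions = len(scores[0])
    if n_people < 2 or n_sessions < 2:
        raise ValueError("need at least two people and two sessions")
    if any(len(row) != n_sessions for row in scores):
        raise ValueError("every person needs the same number of sessions")

    grand_mean = sum(sum(row) for row in scores) / (n_people * n_sessions)
    person_means = [sum(row) / n_sessions for row in scores]

    # Variation between people versus variation within each person.
    ss_between = n_sessions * sum((m - grand_mean) ** 2 for m in person_means)
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(scores, person_means) for x in row
    )
    ms_between = ss_between / (n_people - 1)
    ms_within = ss_within / (n_people * (n_sessions - 1))

    denominator = ms_between + (n_sessions - 1) * ms_within
    if denominator == 0:
        raise ValueError("scores do not vary; ICC is undefined")
    return (ms_between - ms_within) / denominator


# Hypothetical extraversion scores for five people at two sessions.
hypothetical_scores = [
    [62, 60],
    [45, 48],
    [78, 75],
    [51, 53],
    [30, 33],
]
print(f"ICC(1,1): {icc_oneway(hypothetical_scores):.3f}")
```

An ICC near 1 means each person's repeated scores resemble each other far more than they resemble other people's scores.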

Consider this example: a group of kindergartners is given a vocabulary test on August 26th and is retested on September 5th. Assuming the students’ abilities do not change significantly in between, both tests should yield similar results. To find the test-retest reliability coefficient, we compute the correlation between the test scores and the retest scores.
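The kindergarten example can be sketched in a few lines. The code below computes the Pearson correlation between two sets of scores; the scores are invented for illustration, and the function name is an assumption rather than anything from Retorio's system.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired test and retest scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least two scores")

    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n

    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((x - mean_first) * (y - mean_second) for x, y in zip(first, second))
    ss_first = sum((x - mean_first) ** 2 for x in first)
    ss_second = sum((y - mean_second) ** 2 for y in second)
    if ss_first == 0 or ss_second == 0:
        raise ValueError("scores must vary for a correlation to be defined")

    return cross / sqrt(ss_first * ss_second)


# Hypothetical vocabulary scores for six children on the two test dates.
august_scores = [12, 15, 9, 20, 17, 11]
september_scores = [13, 14, 10, 19, 18, 11]

reliability = pearson_r(august_scores, september_scores)
print(f"Test-retest reliability coefficient: {reliability:.3f}")
```

A coefficient close to 1 indicates that children who scored high on the first date also scored high on the second, which is what a reliable test should show when ability has not changed.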

The test-retest reliability is one of the most important indicators in an AI video interview. This is where other AI recruitment tools have failed. Results were not consistent and a strong correlation could not be found between testing dates.

We maintain high test-retest reliability. We’ve analyzed thousands of hours of video material, teaching our AI what to look for in each dimension of the Big 5. We’ve tested and retested results from over 2000 participants, achieving 92% consistency in results.

Any new evaluation results are double-checked externally and adjusted. With Retorio’s AI-based system, employee assessment results for recruitment are saved, making them easily accessible to both the employee and the human resources manager. This gives companies a special opportunity for their recruitment strategy.



Companies like BMW, Personio, and Lufthansa leverage Retorio's deep tech to support their own recruitment teams. Our video-based AI was featured in TechCrunch and Süddeutsche Zeitung.


Written by Elizabeth T.
