What Personality Tests Can Actually Tell You And What to Do With the Map
The useful findings are rarely the flattering ones.
I took six personality tests in one sitting, all for free. Then I had the results cross-analysed systematically (using the scientific hierarchy of each instrument) to identify what they agreed on, where they diverged, and what the agreements and the divergences each tell you.
I was expecting confirmation of things I already knew. What I found was more useful: confirmation of some things, correction of others, and a specific growth edge I had been quietly avoiding looking at directly.
This post is not primarily about my results. It is about how to use yours.
This post is a companion to the Inner World area of the Mind Pillar. It is not a blueprint or a standalone zoom post; it is a behind-the-scenes look at one specific self-knowledge tool: personality tests. The Inner World area is about developing accurate self-knowledge, and personality instruments are one of the few tools that have genuine empirical backing for doing that, provided you know which ones to trust and how to use the results. The Knowing Yourself post is the right starting point; this one extends the practical toolkit.
What to Trust And What to Use With Caveats
Before the results: not all personality instruments are equally reliable, and the wellness industry consistently overstates the validity of the popular ones. Here is the honest sorting.
Scientifically rigorous, take these seriously:
The IPIP-NEO 120 [1] is a 120-item derivative of the gold-standard NEO-PI-R, validated across samples exceeding 300,000 participants. [2] It measures the Big Five dimensions (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism) plus 30 narrower facets within those five. The facets are where the useful information lives.
The HEXACO-PI-R [3] (100 items) adds a sixth dimension the Big Five misses: Honesty-Humility. Cross-cultural lexical studies found this factor emerged independently across multiple languages, suggesting it reflects a genuine and distinct dimension of character. [4] It predicts ethical behaviour, susceptibility to manipulation, and certain “dark triad” tendencies better than any Big Five trait alone. If you take only one test in this list, take the HEXACO.
The short IPIP Big Five [5] (50 items) provides a useful baseline but, as I will show, it conflates things the longer test separates. The headline scores can mislead.
Reasonably validated, with explicit caveats:
The VIA Character Strengths [6] (240 items) identifies 24 character strengths under six virtue categories drawn from positive psychology. It has acceptable internal consistency and has been used in studies across 75+ countries. [7] Some claims in the wellness marketing of this instrument outrun the evidence. Use it as a useful lens alongside the empirical instruments, not as a replacement for them.
Engaging, low scientific validation, use as reflection prompts only:
The 16Personalities [8] MBTI-adjacent test has well-documented psychometric problems: the dichotomies do not reflect natural categories, test-retest reliability is poor (roughly half of people get a different type four weeks later), and predictive validity for real-world outcomes is weak. [9] It can prompt useful reflection. It is not a measurement instrument.
The Enneagram [10] has spiritual and philosophical origins rather than empirical ones. The nine-type structure has not been independently replicated by factor analysis. [11] Useful as a narrative frame. Not a finding.
Which personality test is most scientifically reliable?
The HEXACO-PI-R and the IPIP-NEO 120 are the most empirically robust free options. Both are peer-reviewed, cross-culturally validated, and available free online. The HEXACO adds a sixth dimension (Honesty-Humility) that the standard Big Five does not measure. The IPIP-NEO 120 gives you 30 facets within the five dimensions, which is where the useful nuance lives.
The Findings That Show Up Across Every Framework
When you run multiple instruments and look at what they independently agree on, the convergences are the most reliable signals. These are the things worth taking seriously, not because one test said them, but because instruments using completely different item sets and theoretical models all arrived at the same place.
The short test and the long test often tell different stories
The most instructive thing I found in my own results: the 50-item Big Five suggested high Openness. The 120-item IPIP-NEO placed overall Openness at the 14th percentile. When you look at the facets, the contradiction resolves immediately.
The short test had conflated “openness to intellectual ideas” with “openness to experience broadly.” The long test separated them. The accurate picture: strong, domain-specific intellectual curiosity, and very low openness to aesthetic novelty, imagination, or stimulation for its own sake.
Intellectual is not the same dimension as imaginative. This matters because many people who identify as “curious” or “creative” may be like this: intensely engaged with their specific domain and genuinely uninterested in novelty outside it. A short test will call this high Openness. A longer test will show you which kind.
This principle applies to Agreeableness too. The Big Five bundles warmth, altruism, sympathy, and non-aggression together. HEXACO separates them. If you score low on Big Five Agreeableness, you might assume you are difficult or unkind. The HEXACO often reveals the opposite: low warmth and altruism alongside high patience and non-aggression. The fuller picture is almost always more nuanced than the headline score.
The facets matter more than the domain
This is the single most useful methodological lesson from running multiple instruments. Anyone who has taken a Big Five and felt the result did not quite fit should take the 120-item version. The domain score is an average of things that move independently. Conscientiousness in my results sits at the 92nd percentile as a domain, but Orderliness within it is at the 21st percentile. Without the facet data, you would assume this is the personality of someone with colour-coded folders and a spotless inbox.
The facet data shows something different: very high follow-through and self-discipline alongside low preference for external organisation.
Discipline of character, not discipline of environment. Entirely different implications for how you work.
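To make the domain-versus-facet point concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a domain score is the plain mean of its facet scores; real instruments score and norm domains in their own ways. The facet names follow the IPIP-NEO Conscientiousness facets, but every number is a hypothetical placeholder, not my actual report, and the function name and the 25-point gap are arbitrary choices of mine.

```python
from statistics import mean


def divergent_facets(facets: dict[str, float], gap: float = 25.0) -> dict[str, float]:
    """Return facets whose score sits at least `gap` points from the domain mean.

    The value for each returned facet is its signed distance from that mean.
    Assumption: the domain is approximated as the simple mean of its facets.
    """
    domain = mean(facets.values())
    return {
        name: score - domain
        for name, score in facets.items()
        if abs(score - domain) >= gap
    }


# Hypothetical facet percentiles for one domain (illustrative only).
conscientiousness = {
    "self_efficacy": 96,
    "orderliness": 21,
    "dutifulness": 85,
    "achievement_striving": 97,
    "self_discipline": 90,
    "cautiousness": 80,
}

for facet, distance in divergent_facets(conscientiousness).items():
    print(f"{facet}: {distance:+.1f} points from the domain average")
```

With these placeholder numbers, only Orderliness is flagged: the headline average looks high while one facet sits far below it, which is exactly the pattern a domain score alone would hide.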
Instruments with no shared items can independently confirm the same finding, and when they do, that finding deserves weight
In my analysis, four of the six instruments, each built on a completely different theoretical framework, identified honesty and ethical directness as a dominant feature. VIA ranked Honesty as the first of 24 character strengths. HEXACO flagged high Sincerity and Greed Avoidance. The 16Personalities and Enneagram descriptions both independently emphasised authenticity. When instruments that share nothing (not items, not factor structures, not theoretical models) converge on the same conclusion, that convergence is meaningful.
Look for convergences in your own results. What shows up across multiple instruments, in different forms? That is the finding most worth examining.
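If you want to do the convergence check by hand, one rough way is to tag each report with a few plain-language themes and count how many instruments share each tag. The Python sketch below assumes you assign the theme labels yourself; the labels, the function name, and the threshold of three instruments are all hypothetical choices for illustration, and only the honesty theme comes from my own results.

```python
from collections import defaultdict


def convergent_themes(
    findings: dict[str, set[str]], min_instruments: int = 3
) -> dict[str, list[str]]:
    """Map each theme to the instruments that surfaced it, keeping only
    themes reported by at least `min_instruments` different instruments."""
    by_theme: dict[str, list[str]] = defaultdict(list)
    for instrument, themes in findings.items():
        for theme in themes:
            by_theme[theme].append(instrument)
    return {
        theme: sorted(instruments)
        for theme, instruments in by_theme.items()
        if len(instruments) >= min_instruments
    }


# Hand-assigned theme tags per instrument (hypothetical apart from "honesty").
findings = {
    "VIA": {"honesty", "perseverance"},
    "HEXACO": {"honesty", "patience"},
    "16Personalities": {"honesty", "independence"},
    "Enneagram": {"honesty", "independence"},
}

for theme, instruments in convergent_themes(findings).items():
    print(f"{theme}: {', '.join(instruments)}")
```

A simple count like this treats every instrument as equal. Given the sorting earlier in this post, a theme backed by HEXACO or the IPIP-NEO deserves more weight than one that only the reflection-prompt tests agree on.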
A Five-Part Framework for Using Your Results
The goal of a personality profile is not to know what you are. It is to act differently based on what you know. Here is the framework I used with my own results, structured so you can apply it to yours.
Step 1: What you can stop pretending
Look at your lowest scores across all tests. These are the dimensions where you are spending energy performing a version of yourself that does not match your actual wiring.
Ask: where am I expending effort trying to be someone I am not? What qualities am I apologising for not having?
The test results often give you permission to stop the performance.
My example: In my results, this was novelty and spontaneity. Excitement Seeking at the 1st percentile: 99 out of 100 people seek more stimulation than I do. I had been quietly treating this as a deficiency. The data reframed it: my drive is real and high, it just runs on mastery and depth, not variety. That is not a problem to fix. It is a fact to work with.
Step 2: What you can lean into without guilt
Look at your highest scores and your top VIA strengths. Where does your natural energy go without effort? What do you do so instinctively that you have stopped counting it as a strength? Often the most valuable capacities are the ones we take entirely for granted. They feel too obvious to mention, which means they rarely get deliberately deployed.
My example: For me this was the discipline cluster: self-efficacy and achievement striving both above the 95th percentile. I had been treating the ability to follow through as a personality baseline, something everyone has. The data clarified it is not. It is a structural feature of this particular profile and worth treating as a deployable asset, not a given.
Step 3: What environments and decisions to favour
Translate your profile from description to decision. For each significant trait, ask: what kind of environment rewards this, and what kind punishes it? Where would this person thrive, and where would the same profile generate friction? This step is about structural fit: work culture, social load, learning style, recovery needs. It is less about character and more about matching terrain to equipment.
My example: The bold introvert profile (very low sociability alongside very high social boldness) has a clear environmental implication. Solo or remote work is not a compromise, it is optimal. Forced social interaction is a genuine energy cost, not a character flaw. Knowing this changed some decisions.
Step 4: The genuine growth work
Look at your bottom VIA rankings and the Big Five facets where you score lowest on dimensions that affect your relationships and wellbeing. These are not failures. They are the capacities that require deliberate cultivation rather than instinctive expression. The useful question is: which of these actually matter for the life I want to live, and for the people I want to show up well for? Not everything at the bottom of the ranking needs attention. The ones that do tend to be obvious when you ask the question honestly.
My example: Kindness, Love, Social Intelligence. Combined with the loneliness research (introverts do not need less connection), these rankings had a specific implication I could not easily ignore. The growth work is warmth and attunement as deliberate practice, not as natural feeling.
Step 5: The one commitment
Go back through everything the tests surfaced across all four steps. Pick one thing: not the easiest thing and not the most dramatic thing, but the one that would make the most concrete difference to your daily life or to the people around you if you actually changed it.
Write it down as a specific behaviour, not an aspiration. “Be warmer” is not a commitment. “Check in with one person each week, without a specific reason” is.
Then set a date to retest. Personality traits are not fixed. They shift (slowly, with sustained effort) over years. The tests capture a current state, not a permanent one. Running the same instruments in 12 or 18 months and comparing the results is one of the more honest ways to check whether the practice is actually working or whether it has remained theoretical.
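For the retest comparison, a small script can do the side-by-side for you. This is a minimal Python sketch under stated assumptions: scores are percentiles you have typed in from two reports, the trait names and numbers are hypothetical, and the 10-point threshold is an arbitrary cut-off of mine, not a validated measure of reliable change.

```python
def compare_retest(
    before: dict[str, float], after: dict[str, float], threshold: float = 10.0
) -> dict[str, float]:
    """Return the signed percentile change for traits measured both times
    whose score moved by at least `threshold` points."""
    shared = before.keys() & after.keys()
    return {
        trait: after[trait] - before[trait]
        for trait in sorted(shared)
        if abs(after[trait] - before[trait]) >= threshold
    }


# Hypothetical percentile scores from two sittings of the same instrument.
first_sitting = {"warmth": 12, "orderliness": 21, "excitement_seeking": 1}
second_sitting = {"warmth": 27, "orderliness": 24, "excitement_seeking": 2}

for trait, change in compare_retest(first_sitting, second_sitting).items():
    print(f"{trait}: {change:+.0f} percentile points")
```

Small movements are as likely to be noise as change, which is why the sketch ignores anything under the threshold. Treat whatever it flags as a prompt to look closer, not as proof.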
The Stoic frame: Marcus Aurelius did not just reflect on his failures. He made specific resolutions in the Meditations: to speak less, to be more patient, to act from reason rather than irritation. Not aspirations. Specific corrections. Step 5 is the same move: from map to commitment, from insight to practice.
A Practical Note Before You Start
The raw results are worth reading carefully. Each instrument gives you a report of some kind (a percentile breakdown, a ranked list, a type description) and sitting with those reports, noticing what surprises you and what confirms what you already knew, is itself useful.
But the most interesting analysis happens across the instruments, not within any single one. Where do they agree? Where do they contradict? What does the contradiction reveal about the limits of the tool, or about something genuinely complex in the profile being measured? Running that cross-analysis by hand across six different reports, each using different terminology and different scales, takes real effort.
I used Claude Opus to do it. I pasted in all six sets of results with a structured prompt asking for convergent signals across frameworks, meaningful contradictions, scientific sorting of the findings, and a practical framework for what to actually do with the map. The output was considerably more useful than reading each report in isolation.
The exact prompt I used is in the SAM Prompt Library, launching later this year for paid subscribers. It’s a Notion database of research, analysis, assistance and self-reflection prompts built alongside this platform. This is the first entry. It is designed to work with any combination of the six tests above, and it takes about two minutes to run once you have your results to hand.
The Tests: Where to Take Them
Total: roughly 675 questions across 65-70 minutes. Do them in one sitting if you can, on a morning when you are not rushed.
Answer instinctively. Answer as you are, not as you want to be.
These tests sketch a map. They do not tell you who you are. They tell you where you have been operating from: the characteristic patterns, preferences, and orientations that have shaped your experience so far.
Marcus Aurelius never claimed to have arrived. He kept notes on where he was failing. Six tests in one sitting give you a starting sketch. He examined himself for decades. That is the difference between a tool and a practice.
The map is only useful if you navigate by it.
The 80/20
Take the HEXACO-PI-R and the IPIP-NEO 120. Everything else is optional.
When you have the results, apply the five-part framework: what to stop pretending, what to lean into without guilt, what environments to favour, where your genuine growth work is, and the one commitment you will make. You can use my AI prompt to get a thorough analysis of your results.
Spend twice as long on the last part as on all the others combined.
Thank You
Thank you for reading, sharing, and supporting this work. Whether you’ve been here since the beginning or just found Swiss Army Mum, I’m glad you’re here.
If this post sparked something, I’d be grateful if you forwarded it to someone who might find it useful, or hit the ♥️ or ↻ Restack button.
Medical note: This is educational, not personal medical advice. Your biology, history, and context matter. Work with a qualified healthcare professional.
References
Johnson, J. A. (2014). Measuring thirty facets of the Five Factor Model with a 120-item public domain inventory. Journal of Research in Personality, 51, 78–89. https://doi.org/10.1016/j.jrp.2014.05.003
Ashton, M. C., Lee, K., & de Vries, R. E. (2014). The HEXACO Honesty-Humility, Agreeableness, and Emotionality factors. Personality and Social Psychology Review, 18(2), 139–152. https://doi.org/10.1177/1088868314523838
McGrath, R. E., Brown, M., Westrich, B., & Han, H. (2022). Representative Sampling of the VIA Assessment Suite for Adults. Journal of Personality Assessment, 104(3), 380–394. https://doi.org/10.1080/00223891.2021.1955692
Pittenger, D. J. (2005). Cautionary comments regarding the Myers-Briggs Type Indicator. Consulting Psychology Journal: Practice and Research, 57(3), 210–221. https://doi.org/10.1037/1065-9293.57.3.210
Hook, J. N., Hall, T. W., Davis, D. E., Van Tongeren, D. R., & Conner, M. (2021). The Enneagram: A systematic review of the literature and directions for future research. Journal of Clinical Psychology, 77, 865–883. https://doi.org/10.1002/jclp.23097







