Personality tests abound on the Web: I have yet to find one that
would tell me something interesting, useful, insightful, or
non-trivial about myself; usually I end up answering most questions
more or less randomly (and I'm sure that running the test a second
time would give entirely different results). At least certain
tests have a “not sure” category for answers, which makes them a
little less haphazard. Anyway, I wonder who invents these quizzes:
manic taxonomists, I tend to think, who want to classify people
according to criteria which they think intellectually elegant (think
of the Keirsey or the enneagram tests: the
categories are certainly seductive in their taxonomical harmony, but
are they pertinent?), but use completely heuristic and unscientific
methods for devising their tools. I'm sure I could produce a lot of
fun but entirely meaningless personality tests that would classify you
as one of the four elements, or one of the sixty-four 易經
(Yi Jing) hexagrams, or one of the twenty-two major arcana of the
tarot, or one of the three cardinal causes (power, will and knowledge)
of my mumbo-jumbo philosophy, or
whatever (or perhaps as a combination of all this).
But how about a truly scientific test with a really pertinent
statistical basis? Can such a thing be devised and, if so, how? I
tend to believe that the actual questions are of little importance
(they might be as stupid as “What is your favorite color?”, and
perhaps something about swallows, too, provided they are numerous
enough for statistical significance); what really matters is how the
data are interpreted.
If, for example, one asks three hundred yes/no questions, then each set of answers is a point in the three-hundred-dimensional discrete hypercube (or in the full hypercube, if the questions admit a continuous answer scale from 0 to 1). If a sample population takes the test, one obtains a cluster of points in this hypercube. From that cluster, one then wishes to extract certain statistically meaningful variables, or subpopulations: how can this be done? Here is at least one way to do it (probably not a very smart one, but it shows the sort of thing I'm after). Take the line in the hypercube that minimizes the sum of the squared distances to the population points: in other words, the Gaussian (least-squares) one-dimensional best fit for the data. Projecting a respondent's quiz result onto this line gives a one-parameter classification which is, in a certain sense, the most significant one. Its main drawback is that it treats the Euclidean structure of the cube as natural, which it is not, so the result is heavily biased by the choice of questions; but you get the idea of the sort of thing that could be done. After that, if one wishes to extract a second and a third variable from the quiz, one could simply take the best-fitting plane, then the best-fitting three-dimensional subspace, and so on (these fits can be chosen nested, so each one adds exactly one new variable). Of course, interpreting these variables demands some competence in psychology, but at least their definition would rest on objective statistical data, not merely on the test author's intuition about how to classify the human mind.
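The least-squares line described above is what statisticians call the first principal component of the data (principal component analysis), and the best-fitting plane, three-dimensional subspace and so on are spanned by the following components. Here is a minimal sketch in plain Python, assuming each respondent's answers are encoded as a list of numbers between 0 (no) and 1 (yes); the function names `principal_axes` and `score` and the toy population are hypothetical, and power iteration is used only to avoid depending on a linear-algebra library (a real analysis would rather call an SVD routine).

```python
import math
import random

# Hypothetical helper names; a sketch, not a reference implementation.


def principal_axes(answers, k, iterations=200, seed=0):
    """Return (mean, axes) for a list of answer vectors.

    mean is the centroid of the population; axes holds k orthonormal
    directions such that mean + span(axes[:m]) is the least-squares
    best-fit m-dimensional affine subspace, for every m <= k.
    """
    n, d = len(answers), len(answers[0])
    mean = [sum(row[j] for row in answers) / n for j in range(d)]
    centered = [[x - m for x, m in zip(row, mean)] for row in answers]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def remove_previous(v, found):
        # Gram-Schmidt step: strip the components along axes already found.
        for a in found:
            c = dot(v, a)
            v = [x - c * y for x, y in zip(v, a)]
        return v

    rng = random.Random(seed)
    axes = []
    for _ in range(k):
        # Random unit starting vector, orthogonal to the previous axes.
        v = remove_previous([rng.gauss(0.0, 1.0) for _ in range(d)], axes)
        norm = math.sqrt(dot(v, v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            # Power iteration: multiply by the unnormalized covariance
            # matrix, i.e. compute sum_i c_i (c_i . v).
            w = [0.0] * d
            for c in centered:
                s = dot(c, v)
                for j in range(d):
                    w[j] += s * c[j]
            w = remove_previous(w, axes)
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # no variance left in the remaining directions
                break
            v = [x / norm for x in w]
        axes.append(v)
    return mean, axes


def score(answer, mean, axis):
    """Coordinate of one respondent's answers along one axis."""
    return sum((x - m) * a for x, m, a in zip(answer, mean, axis))


if __name__ == "__main__":
    # Hypothetical toy population: questions 0-3 probe one trait, 4-5
    # another, and each of the four yes/no combinations occurs three times.
    patterns = [[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 0, 0],
                [0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 0, 0]]
    population = [p for p in patterns for _ in range(3)]
    mean, axes = principal_axes(population, 2)
    for axis in axes:
        print([round(x, 3) for x in axis])
    print(score([1, 1, 1, 1, 0, 0], mean, axes[0]))
```

The sign of each axis is arbitrary, so a score only means something relative to the rest of the population. The toy population also shows the bias mentioned above: the trait probed by four questions comes out first simply because more questions are devoted to it, not because it matters more.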
Has such a test been devised already? If not, I might try creating one myself, if only as a proof of concept. (The main difficulty, though, would be to find a sample population willing to take the test even though they would get no results from it until all the data had been processed.)