David Madore's WebLog: A truly scientific personality test, anyone?

Personality tests abound on the Web, but I have yet to find one that tells me anything interesting, useful, insightful, or non-trivial about myself. Usually I end up answering most questions more or less at random (and I'm sure that taking the test a second time would give entirely different results); at least some tests offer a "not sure" answer, which makes them a little less haphazard. Anyway, I wonder who invents these quizzes. Manic taxonomists, I tend to think, who want to classify people according to criteria they find intellectually elegant (think of the Keirsey or the enneagram tests: the categories are certainly seductive in their taxonomic harmony, but are they pertinent?), yet who use completely heuristic and unscientific methods to devise their tools. I'm sure I could produce plenty of fun but entirely meaningless personality tests that would classify you as one of the four elements, or one of the sixty-four 易經 (Yi Jing) hexagrams, or one of the twenty-two major arcana of the tarot, or one of the three cardinal causes (power, will and knowledge) of my mumbo-jumbo philosophy, or whatever (or perhaps as some combination of all of these).

But how about a truly scientific test, with a genuinely pertinent statistical basis? Can such a thing be devised and, if so, how? I tend to believe that the actual questions are of little importance (they might be as stupid as "What is your favorite color?", and perhaps something about swallows too, provided they are numerous enough for statistical significance); what really matters is how the data are interpreted.

If, for example, one asks three hundred yes/no questions, then each set of answers is a point in the three-hundred-dimensional discrete hypercube {0,1}^300 (or in the full hypercube [0,1]^300 if the questions admit a continuous answer scale from 0 to 1). If a sample population takes the test, one obtains a cloud of points in this hypercube. Then, out of that cloud, one wishes to extract certain statistically meaningful variables, or subpopulations: how can this be done? Here is at least one way to do it (probably not a very smart one, but it shows the sort of thing I'm after). Take the line in the hypercube that minimizes the sum of the squared distances to the population points; in other words, the one-dimensional least-squares (Gaussian) best fit for the data, which passes through the mean of the points. Projecting a quiz result onto this line determines a one-parameter classification which is, in a certain sense, the most significant one: it captures the largest possible share of the population's variance. Its main drawback is that it takes the Euclidean structure on the cube as natural, which it is not, so it is heavily biased by the choice of questions; but you get the idea of the sort of thing that could be done. After that, if one wishes to extract a second and a third variable from the quiz, one could simply take the best-fit plane, then the best-fit three-dimensional subspace, and so on, reading off one new direction at each step (this is what statisticians call principal component analysis). Of course, interpreting these variables demands some competence in psychology, but at least their definition would be based on objective statistical data, not merely on the test author's intuition about how to classify the human mind.
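The best-fit procedure described above can be sketched in Python with NumPy. This is only a minimal illustration under stated assumptions, not a worked-out test: the answers are random synthetic data generated from a single hypothetical hidden trait, and the names `principal_axes`, `score`, `trait` and `loading` are made up for the example. The best-fit line and its successors are computed from a singular value decomposition of the mean-centered answer matrix, which is one standard way of carrying out principal component analysis.

```python
import numpy as np


def principal_axes(answers: np.ndarray, k: int):
    """Return the centroid, the first k principal axes and their singular values.

    `answers` has one row per respondent and one column per question,
    with entries 0/1 (or anywhere in [0, 1] for a continuous answer scale).
    """
    centroid = answers.mean(axis=0)
    centered = answers - centroid
    # Rows of vt are orthonormal directions sorted by decreasing singular value.
    # The first row spans the line through the centroid that minimizes the sum
    # of squared distances to the points; the best-fit k-dimensional affine
    # subspace is the centroid plus the span of the first k rows.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    return centroid, vt[:k], singular_values[:k]


def score(results: np.ndarray, centroid: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """Project quiz results (one row per respondent) onto the principal axes."""
    return (results - centroid) @ axes.T


# Synthetic data only (assumption): 1000 hypothetical respondents answer
# 300 yes/no questions, each answer driven by one hidden trait.
rng = np.random.default_rng(0)
n_people, n_questions = 1000, 300
trait = rng.normal(size=n_people)          # hypothetical hidden trait
loading = rng.normal(size=n_questions)     # how strongly each question reflects it
p_yes = 1.0 / (1.0 + np.exp(-np.outer(trait, loading)))
answers = (rng.random((n_people, n_questions)) < p_yes).astype(float)

centroid, axes, sv = principal_axes(answers, k=3)
scores = score(answers, centroid, axes)    # shape (1000, 3): three variables per person

# The first variable should track the hidden trait (up to an arbitrary sign).
print(abs(np.corrcoef(scores[:, 0], trait)[0, 1]))
```

The sign of each axis returned by the SVD is arbitrary, so only the absolute correlation with the hidden trait is meaningful here. And since the computation relies on the Euclidean metric of the cube, rescaling or duplicating questions would change the axes it finds.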

Has such a test been devised already? If not, I might try creating one myself, if only as a proof of concept. (The main difficulty, though, would be finding a sample population willing to take the test even though they would get no results from it until all the data had been processed.)