Alex's post is long. It's well written, and you should read it sometime, but I think I'll still be an enormous hypocrite and provide you with a summary of what he says anyway:
1. It should be possible to read a review and then answer the questions, "Did this guy like the game? Was it a waste of his time? Does he regret playing it? Would he recommend it to others?" If you cannot answer them, the review is worthless, rambling drivel which literally does not make any sense.
2. There is no such thing as a score-less review. To demonstrate, take any given group of reviews (ostensibly) without scores. Now label each review as "positive" or "negative". You should be able to do this easily because of point 1. When done, go back and replace each "positive" with 1, and each "negative" with 0. Even though the reviewer did not give a score, you have approximated the score that he would have given on a scale of 0 to 1. Now, repeat this procedure with 5 tags: strongly positive, mildly positive, neutral, mildly negative, and strongly negative. Replace them with the numbers 1-5, from strongly negative (1) to strongly positive (5). You have now approximated a score on a more familiar 1-5 scale.
3. It follows that a review, any review, even if it claims to not assign a score, must describe a score implicitly. That is, even if there isn't a score, you can read it and say, oh, this looks like a 6/10 (from point 2). If you can't say this, then the review is nonsense (from point 1).
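The labeling procedure in the summary above is mechanical enough to write down. Here is a minimal Python sketch of it; the label strings, the dictionaries, and the function name are all made up for illustration and are not anything Alex actually uses:

```python
# Hypothetical labels and mappings, only to illustrate the procedure above.
BINARY = {"negative": 0, "positive": 1}
FIVE_POINT = {
    "strongly negative": 1,
    "mildly negative": 2,
    "neutral": 3,
    "mildly positive": 4,
    "strongly positive": 5,
}

def implied_scores(labels, scale):
    """Replace each label with its number on the given scale."""
    return [scale[label] for label in labels]

# Three "score-less" reviews, labeled by hand after reading them.
print(implied_scores(["positive", "negative", "positive"], BINARY))  # [1, 0, 1]
print(implied_scores(["neutral", "strongly positive"], FIVE_POINT))  # [3, 5]
```

The only real work is the reading and labeling; the score falls out of a lookup.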
Alex also talks about perfect scores. He's wrong there: If you think 100/100 is a perfect score, I don't see why you can't or won't think 5/5 is a perfect score. It doesn't really matter to me; I don't have a problem with scores being out of 5 and not 100.
Another thing he complains about is close scores like, for example, 76/100 and 77/100. His position boils down to "I cannot imagine myself making very precise judgements about games, therefore it is impossible." It is a laughable position. Alex seems to have a habit of confusing the negligible with the actually non-existent: Just because it's hard to see that 76 to 77 difference doesn't mean it doesn't exist. I agree that if you are really using a score system with 100 values (or worse, decimals!), you should probably think again. But that doesn't mean it's inherently bad, and it doesn't mean there can't be some guy out there who really can review games so finely that he can discern a 1/100 difference in quality (although most who score out of 100 probably can't). This part is also tangential to my purposes.
So what are my purposes, then? Well, as I said, it's clear that I can't just not score my reviews. That would be sticking my head in the sand. However, I have one problem with scores: Suppose you have a scoring system out of 100. You have 4 categories: Graphics, Gameplay, Story, Replay value. Each one gets a score out of 25, then you sum them all for the final score. Reasonable enough, and many mainstream reviewers actually do this. (To make it Alex-friendly, you can make each category 0-1 and then sum them to 0-4.)
Anyhow, the problem: With this scheme, Dwarf Fortress gets 0+25+0+25=50. But Dwarf Fortress isn't a mediocre game! To fix it, you can make it so that gameplay and replay value are out of 45, and the others out of 5. Then DF gets 0+45+0+45=90. Cool, right? Yes, but now Limbo gets, oh, 5+30+5+0=40 if you are really generous. I mean, I didn't think Limbo was perfect[1]. But I certainly didn't think it was below-average crap that deserves a 40/100.
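If you want to check the arithmetic, here is a quick Python sketch. The category scores are the ones I quoted above; the dictionary keys and the `total` helper are names I made up for this example:

```python
# Category scores are the ones quoted in the post; the names are hypothetical.
def total(scores):
    """Sum per-category scores into a final score."""
    return sum(scores.values())

# Four categories out of 25 each.
df_equal = {"graphics": 0, "gameplay": 25, "story": 0, "replay": 25}

# Gameplay and replay value out of 45, graphics and story out of 5.
df_reweighted = {"graphics": 0, "gameplay": 45, "story": 0, "replay": 45}
limbo_reweighted = {"graphics": 5, "gameplay": 30, "story": 5, "replay": 0}

print(total(df_equal))          # 50
print(total(df_reweighted))     # 90
print(total(limbo_reweighted))  # 40
```

Whichever weights you pick, one of the two games ends up with a total that doesn't match how good it actually is.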
So, for some games, graphics matter and replay value doesn't. For others, the opposite. Rather than come up with a complicated weighting scheme to solve this, I tried to find a lazy shortcut. I think I succeeded.
If you give a game a 10/10, what does that mean? Essentially, it's the same as saying, "dude, this game is awesome, you'll love it". 0/10 would be saying "piece of shit, don't bother". Reviews are, at their basest, for answering the question, "should I play this game?" Yes, they serve as commentary and can be very valuable in that respect as well, but that question is what gave rise to "reviews" in the first place.
So how would I deal with, say, DF, if I were to give scores? Probably I'd give it a 9/10, and say something to the effect of "if you like roguelikes with ASCII graphics, then it's really a 10/10, and if you really care about the graphics it's 6/10 with tilesets and 3/10 without". Tastes vary. Review audiences are heterogeneous[2].
However, the review isn't necessarily going to be an absolute endorsement (or disapproval), either. It will probably say, "some such people will like this, some such people will not". Now, if you see a 5/10 game, what if you can't tell whether you're the guy who will like it despite its flaws, or the guy who will definitely hate it?
Sometimes, it's obvious from reading the review. Oftentimes it's not. And in that case, you'll guess. And with a 5/10 score, you will probably guess that you're equally likely to be in either camp... Wait, hold on. Isn't 1/2 the chance of success for an unbiased binary trial? Hmm, what if... What if review scores are probabilities? What if, when I give a game score X out of Y, that means I'm estimating X/Y of my audience will like it, and consequently[3], that there's an X/Y probability that you will like it?
Yeah, I'm kinda proud of myself for this. I think it's a great idea - I'm perfectly happy with a score system like this, both as reviewer and review reader[4]. So how would it look in practice?
Now, I don't want to make 1% resolution estimates; there aren't even 100 people reading my reviews. So I will use this scale:
- 1: a game only an indie dev could love - 10% chance you'll like it; 10% of my audience will like it.
- 2: mostly shit, but has noteworthy positive qualities - 30% chance you'll like it; 30% of my audience will like it.
- 3: absolutely mediocre - 50% chance you'll like it; 50% of my audience will like it.
- 4: recommended, but not for everyone - 70% chance you'll like it; 70% of my audience will like it.
- 5: if you don't like this, you don't have a soul - 90% chance you'll like it; 90% of my audience will like it.
In fact, if I happen to decide that "indie game bias" is relevant for a game, I can just bump it up one level. That seems reasonable. If the devs are, say, literally curing cancer and disease, I can totally see bumping a game 2 levels. I like that - I'm okay with Foldit being a 3/5 game, and I'm okay with treating it like a 5/5 game because of its mission.
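For the programmers in the audience, the scale and the bump can be sketched in a few lines of Python. The function names are hypothetical, and capping a bumped score at 5 is my assumption for the sketch, since the scale has nothing above 5:

```python
def like_percent(score):
    """Map a 1-5 score to the estimated percent chance you'll like the game."""
    if score not in range(1, 6):
        raise ValueError("score must be an integer from 1 to 5")
    # 1 -> 10, 2 -> 30, 3 -> 50, 4 -> 70, 5 -> 90
    return 20 * score - 10

def bump(score, levels=1):
    """Raise a score by some levels; the cap at 5 is an assumption."""
    return min(5, score + levels)

for s in range(1, 6):
    print(s, like_percent(s))

# A 3/5 game bumped two levels for its mission is treated as a 5/5.
print(bump(3, levels=2))  # 5
```

Note that the scale never reaches 0% or 100%: the lowest score still maps to 10 and the highest to 90.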
Furthermore, the above may be written in the context of video games, but there's nothing about this system specific to video games. There's no reason not to use it for movies, books, what have you.
Lastly, the nice thing is that, while I've never heard of a reviewer using this system explicitly, all the review scores out there are very compatible with it. Good games are likely to get high scores, and you are likely to enjoy good games. Ergo, high score means more likely to enjoy. You can assume these are just traditional scores, too, if the "math" is confusing, but if basic probability confuses you, what on earth are you doing on my blog?
1: If you look now, you will see my Limbo review does include a score. That was added after the fact, after this post was written.
2: I don't know if you can even target a homogeneous audience of non-trivial size, but I know I wouldn't want to even if I could.
3: It's just basic probability. If a people in a room like a game, and b people don't, then when you pick one of them at random, the chance that you get someone who does like it is p=a/(a+b). Since you are only thinking about this because you have no idea which group you belong to, we can assume you are equally likely to be any one of those people. So the chances of you liking the game are also p, which is equal to the fraction of people who like it.
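If you'd rather see that than take my word for it, here is a small Python sketch of the argument in footnote 3. The function names, the 7-and-3 room, and the trial count are assumptions picked for illustration:

```python
import random
from fractions import Fraction

def chance_of_liking(a, b):
    """Exact p = a / (a + b): a people like the game, b people don't."""
    return Fraction(a, a + b)

def simulate(a, b, trials=100_000, seed=0):
    """Pick a random person from the room many times; return the share who like it."""
    rng = random.Random(seed)
    room = [True] * a + [False] * b  # True means "likes the game"
    hits = sum(rng.choice(room) for _ in range(trials))
    return hits / trials

print(chance_of_liking(7, 3))  # 7/10
print(simulate(7, 3))          # should land close to 0.7
```

The simulated share drifts toward a/(a+b) as the number of trials grows, which is all the footnote is claiming.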
4: It also solves all sorts of problems we weren't even trying to solve: Among other things, it means that even if you buy a 9/10 game and hate it (or buy a 1/10 game and love it), that's fine, because it's a probabilistic prediction, and you are still better off trusting it (assuming the reviewer is trustworthy and reliable).
5: Two things you may notice: First, I'll never have to say you will definitely like a game, or definitely dislike it. Second, no matter how many times I'm wrong, I can always blame it on probability. Man, I'm so clever! Seriously, though: Sorry about this, but them's the breaks. I don't think a system that allows 0% or 100% probabilities would be productive, and I'm not sure if it would be mathematically sensible. Nor do I intend to find out.