Blog: Sorting Out My Corpus

So, last time I was on about how I chose Doctor Doom to be the focus for my PhD (what I have, by the way, now finished) and then worked out what "The Marvel Age" was in order to set some criteria for the CORPUS that I'd look at.

"Corpus" is an excellent word which basically means "a collection of books and stuff" where, in my case, the "stuff" bit was mostly comics, cartoons, radio shows and games. In order to put it together all I needed to do was find every single appearance by Doctor Doom in any of these texts between the dates I'd settled on. PEASY right?

IT WAS NOT PEASY. The first thing I had to do was search The Grand Comics Database for every comic Doom appeared in during this time. Well, actually, the FIRST thing was to look at the various different databases and realise that GCD (as the cool kids call it) was the best one, THEN download it onto my own server, THEN work out how all the tables fit together, and THEN search it for the aforesaid appearances.

Doing that showed me that the GCD is not necessarily entirely reliable or consistent. It's a wonderful resource put together by hundreds of different people over many years, which is all very excellent but unfortunately means the quality of the data varies HUGELY, as do the decision-making processes. For example, generally speaking the people entering the data DON'T add in information about every single advert or the contents of letters pages, but some DO, which meant that I had to go through checking every single result to make sure it actually WAS a story featuring Doctor Doom, rather than an advert or an image in a letters column. Such things ARE valid items about Doctor Doom (and the source of some FASCINATING ARTICLES hem hem) but weren't what I was after here.

Another problem was that not every appearance by Doctor Doom was recorded, at least not as Doctor Doom. One HUGE example of this was his appearances in Not Brand Echh, a "humour" series (i.e. ripping off "Mad Magazine") using Marvel characters published in the 1960s. Doom appeared in LOADS of issues of this, but was "hilariously" referred to as "Doctor Bloom" in most cases (I have no idea why) so did not come up in my searches, despite appearing on the COVER for some of these. I only realised this when I was reading through the corpus and noticed a single issue that HAD been logged as "Doctor Doom", so had to go back and check through THE LOT to find more. On the plus side, this led to a FASCINATING PRESENTATION about how it all worked!
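For the technically minded, here is a rough sketch of the kind of alias-aware search that would have caught "Doctor Bloom" first time round. It is NOT what I actually ran: the record layout, the sample rows and the list of aliases are all made up for illustration, and the real GCD tables look nothing like this tidy.

```python
# Hypothetical records: the field names and sample rows are invented for
# illustration and are not the real GCD schema.
records = [
    {"issue": "Example Comic #1", "characters": "Doctor Doom; Someone Else"},
    {"issue": "Example Parody #2", "characters": "Doctor Bloom; Someone Else"},
    {"issue": "Example Comic #3", "characters": "Nobody Relevant"},
]

# Assumed list of spellings and parody names to treat as the same character.
ALIASES = ("doctor doom", "dr. doom", "dr doom", "doctor bloom")


def mentions_doom(record, aliases=ALIASES):
    """Return True if any alias appears in the record's character list."""
    text = record["characters"].lower()
    return any(alias in text for alias in aliases)


# Keep only the issues whose character list matches at least one alias.
matches = [r["issue"] for r in records if mentions_doom(r)]
print(matches)
```

The catch, of course, is that you have to KNOW the alias exists before you can search for it, which is why reading through the corpus was the only way to find it.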

None of this was PEASY as I say, but it was a flipping DODDLE compared to what came next. As far as I know there isn't a database of cartoons, books, radio shows etc etc which allows you to search by character names (IMDb does some of this but by NO MEANS all, and it's expensive!) so I had to just GO LOOKING for them. This involved a WHOLE HEAP of Googling and LOTS of going back and forth over the course of the PhD as new things kept on popping up. Some of these turned up too late to be included - most HEINOUSLY I missed an actual NOVEL starring Doctor Doom called (inevitably) DOOMSDAY - but in the end I got myself a PhD Corpus of 266 texts, of which three were newspaper strips, six were radio shows or similar audio-only recordings, six were games, 15 were cartoons and all the rest comics. You can see a big list of them - along with all the other items I collected which didn't make it into the final corpus - over on the MARVEL AGE DOOM site.

This was a LOT of comics, games, books etc etc to get through, and I soon realised that it was WAY too much, so I used A Stratified Random Sampling methodology (which - FEAR NOT - I shall not go into here) to narrow this down to a representative sample of 69 (nice) texts. I then set out to EXAMINE them!
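If you DO want a flavour of the general idea, here is a minimal sketch of stratified random sampling with proportional allocation. It makes some big assumptions: it uses media type as the strata (the real strata may well have been something else), it rounds using the largest-remainder method, and the item names are placeholders. The only real numbers are the corpus counts above, with comics worked out as 266 - 3 - 6 - 6 - 15 = 236.

```python
import random


def allocate(strata_sizes, sample_size):
    """Proportional allocation with largest-remainder rounding."""
    total = sum(strata_sizes.values())
    exact = {k: sample_size * n / total for k, n in strata_sizes.items()}
    alloc = {k: int(v) for k, v in exact.items()}  # round each share down
    leftover = sample_size - sum(alloc.values())
    # Hand the remaining slots to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


def stratified_sample(strata, sample_size, seed=0):
    """Draw a random sample from each stratum, sized by allocate()."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    sizes = {k: len(v) for k, v in strata.items()}
    alloc = allocate(sizes, sample_size)
    return {k: rng.sample(items, alloc[k]) for k, items in strata.items()}


# Corpus counts from the post; using media type as the strata is an assumption.
sizes = {"newspaper strips": 3, "audio": 6, "games": 6, "cartoons": 15, "comics": 236}
# Placeholder item names standing in for the actual texts.
strata = {k: [f"{k} #{i}" for i in range(1, n + 1)] for k, n in sizes.items()}

sample = stratified_sample(strata, 69)
for media, items in sample.items():
    print(media, len(items))
```

The point of doing it this way is that each type of text gets a share of the sample roughly in proportion to its share of the whole corpus, so the tiny categories don't vanish and the comics don't swamp everything even more than they already do.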

How I did THAT is a whole other story which I shall save for a whole other time, as I think that's probably about ENOUGH for now. Basically it was DEAD CLEVER but also QUITE HARD and took FLIPPING AGES. Further details can be supplied on application!

posted 8/4/2022 by MJ Hibbett

How much time could you have saved if you weren't so CHILDISHLY INSISTENT on reaching the figure of 69??
posted 8/4/2022 by Gareth

I spent several HOURS trying to work out how to NOT have that, honest. Also, several MORE sniggering about it.
posted 8/4/2022 by MJ Hibbett

