Thursday, June 17, 2010

Spelling, Transcription, and Phonetics

So you've decided to research your family tree. Congratulations. You figure you're all set. You know the names of your grandparents, and even the maiden name of your great-grandmother. What could go wrong?

You type the name of your ancestor into the search box, hit enter, and as 15,000 hits come back you suddenly realize it might not be quite that simple. And this is the easy part. The hard part is finding your ancestor when the name has been misspelled.

So what are the problems you face searching for your surname? Well, consistency of spelling for one. If a name can be spelled multiple ways phonetically, then it will be, and the problem is exacerbated (that's my $10 word for the day) if you happen to have a foreign name. Another issue is simple misspellings because the immigration clerk was tired after a long day, the children were crying, the typewriters were pounding, and there was a thunderstorm banging away while the clerk was filling in the forms. Alternatively, the clerk was a terrible speller and only had the job because of family connections, or... I'm sure you get the spelling point. If you are looking at transcribed records that were originally handwritten, then you add a whole other layer where errors can happen, because the original clerk may have made a mistake, and then the transcriber writes what they think the original clerk wrote. Heaven help us if the transcription is done by one person and then typed into the database by another.

So what can you do to solve some of these issues? Soundex searches help, but Soundex has some real issues. For example, Lee and Leigh have different codes even though they are pronounced identically. For an excellent article on Soundex issues go here.
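To see why Lee and Leigh end up with different codes, here is a minimal sketch of the standard American Soundex rules in Python. The function name `soundex` is my own, and individual genealogy sites may use slightly different variants of the algorithm, so treat this as an illustration rather than what any particular database runs.

```python
def soundex(name: str) -> str:
    """Return the American Soundex code: first letter plus three digits."""
    # Build the letter -> digit table; vowels, H, W and Y have no digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again

    # Pad with zeros and cut to one letter plus three digits.
    return (result + "000")[:4]


print(soundex("Lee"))    # L000
print(soundex("Leigh"))  # L200, because the "g" picks up a digit
print(soundex("Halls"))  # H420
print(soundex("Hall"))   # H400, so dropping the "s" changes the code
print(soundex("Kalls"))  # K420, so a misread first letter is never matched
```

The last two lines show the other weakness: Soundex always keeps the first letter, so any error in that letter, or a dropped trailing consonant, puts the name under a different code.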

Just to give you an idea of what can happen to a name that seems simple, let's look at Halls. First we'll just do basic misspellings based on phonetics or dropping the "s":

HALLS
hawls hauls
hall hawl haul
alls awls
hallse
holls
holl
oll
all

Now we will move on to errors by clerks due to noise, or whatever:
HALLS
Haels
Hiles
Hills
Hales
Halles
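One way to catch near-misses like these is to compare spellings by similarity instead of by exact match. The sketch below uses `difflib.get_close_matches` from the Python standard library. The `index` list is a made-up stand-in for the surnames in a database, and the 0.7 cutoff is an assumption you would need to tune for your own name.

```python
import difflib

# Hypothetical surname index; a real database would have many more entries.
index = ["Hales", "Hills", "Halles", "Haels", "Smith", "Holmes"]


def near_matches(surname: str, names: list, cutoff: float = 0.7) -> list:
    """Return the names whose spelling is similar to the surname."""
    lowered = {n.lower(): n for n in names}
    hits = difflib.get_close_matches(
        surname.lower(), list(lowered), n=len(lowered), cutoff=cutoff
    )
    # Map the lowercase hits back to the spelling used in the index.
    return [lowered[h] for h in hits]


for name in near_matches("Halls", index):
    print(name)
```

A lower cutoff finds more variants but also lets in more unrelated names, which is the same trade-off you face with any fuzzy search.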

Now my favourite, transcription errors from hand written originals:
HALLS
kalls
kolls
kells
kallo
kollo
kello
hallo
hello
kales
keles
malls
mells
mallo
mello
ralls
rallo
rolls
rollo
rells
rello
malla
mella
rella
ralla
helle
hella
maels
meels
maela
meela
meals
meelo
maelo
mealo
kales
koles
kelas
halas
helas

I could go on, but I think you get the picture. In the second and third tables I have not bothered to remove the "s" from the last name, so you can effectively double the number of ways that the name Halls can be spelled incorrectly. In the examples above there are at least 100 ways to misspell or mistranscribe Halls, and the list is not exhaustive. I don't mean to get down on transcribers. They do a lot of hard work from sources that are very difficult to read for a number of reasons, but sometimes you have to wonder what drugs some transcribers were on.
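If you want a list of spellings to try in a search box, the handwriting mix-ups can be generated mechanically. This is a rough sketch: the `CONFUSABLE` table is my own reading of the examples above (H mistaken for K, M or R; a for e or o; a final s for o, a or e), not an established standard, and it does not cover transposed letters such as "maels".

```python
from itertools import product

# Hypothetical look-alike table, read off the transcription examples above.
CONFUSABLE = {
    "h": "hkmr",
    "a": "aeo",
    "s": "soae",
}


def handwriting_variants(name: str) -> set:
    """Return every spelling made by swapping in look-alike letters."""
    # Each position offers either its look-alikes or just the letter itself.
    choices = [CONFUSABLE.get(ch, ch) for ch in name.lower()]
    return {"".join(combo) for combo in product(*choices)}


# Cover the name both with and without the trailing "s".
variants = handwriting_variants("halls") | handwriting_variants("hall")
for spelling in sorted(variants):
    print(spelling)
```

Three small substitution rules already produce dozens of candidate spellings for a five-letter name, which is why the lists above grow so quickly.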

You also run into the occasional problem where the database you are searching has the first name as the last name, and vice-versa.

You'd think that, apart from the problems mentioned above, Halls wouldn't be a bad name to track: most names have similar spelling issues, and it is not terribly common. True, it is not common. But the default practice of a search engine, even Google, is to treat the "s" on the end of the last name as if it isn't there. Throw in every community center, church hall, residence hall, town hall, site that mentions a certain Christmas song, hall's with an apostrophe, and a certain brand of cough candy, and you come up with approximately 33.5 million hits on Google.

It could be worse. Smith gets 412 million hits on Google.
