Friday, November 7, 2008

Readings week 11

Web Search Engines Parts 1 and 2
Search engines process amazing amounts of information, but the author made it seem as though the components of a search engine are relatively simple. I felt like the hardest part of a search engine to maintain is the space required to process and store all that information. He talked about sorting by relevancy, and his example was the newspaper The Onion. I wondered how exactly a computer program can determine relevancy. If I search for "The Onion," the word "The" will be dropped from the search; but what about words like "an" or "some"? Couldn't these terms be used to determine context?
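
To make this concrete for myself, here is a tiny sketch of how stop-word dropping and relevancy counting might look. The stop-word list, sample documents, and scoring formula are my own simplified assumptions, not how the article's search engines actually work.

```python
# A rough sketch of stop-word removal and term-frequency scoring.
# The stop-word list, sample documents, and scoring formula are
# simplified assumptions, not a real engine's behavior.

STOP_WORDS = {"the", "a", "an", "some", "of", "and", "to", "in"}

def tokenize(text):
    """Lowercase, strip punctuation, and drop stop words."""
    words = (w.strip('.,!?"').lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]

def score(query, document):
    """Crude relevance: count query-term occurrences in the document."""
    doc_terms = tokenize(document)
    return sum(doc_terms.count(term) for term in tokenize(query))

docs = [
    "The Onion is a satirical newspaper.",
    "An onion is a vegetable with layers.",
]

for doc in docs:
    print(score("The Onion", doc), "-", doc)
```

Because "The" gets thrown away, the newspaper and the vegetable end up tied, which is exactly the kind of lost context I was wondering about.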

OAI Metadata Harvesting
The OAI project sounds really cool; I like the idea a lot. I do think it is a little ironic, though, that archivists have trouble maintaining the data about their own institutions and what exactly they are doing. I am excited to see where this project goes in the future. I wonder whether institutions will be able to do a better job of sharing information so that the metadata can be gathered across institutions and made searchable.
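
The harvesting piece is concrete enough that I tried to picture it as code. Below is a rough sketch of pulling Dublin Core records from a single repository over OAI-PMH; the endpoint URL is made up, and a real harvester would also have to handle resumption tokens, errors, and many repositories at once.

```python
# A sketch of harvesting Dublin Core metadata over OAI-PMH.
# The base URL is a placeholder, not a real repository; resumption
# tokens and error handling are left out to keep the idea visible.

import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://example.org/oai"  # hypothetical OAI-PMH endpoint

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records(base_url):
    """Issue a ListRecords request and print each record's Dublin Core titles."""
    url = base_url + "?verb=ListRecords&metadataPrefix=oai_dc"
    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)
    for record in tree.findall(".//oai:record", NS):
        titles = [t.text for t in record.findall(".//dc:title", NS)]
        print(titles)

if __name__ == "__main__":
    list_records(BASE_URL)
```

Once records from many repositories land in one index, that index is what people actually search, which is the cross-institution sharing I am curious about.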


The Deep Web
At first I basically understood the deep web, but I couldn't get past my thoughts of child pornographers denying search engine access to their sites. Now I feel as though I have a better understanding of the deep web. I don't understand how Bergman got the information on the data sizes of deep web sites, though. I think of something like Blackboard, where the content is changed and added to every day by hundreds and perhaps thousands of universities. Also, if the information is restricted, like on academic journal sites, how can you claim to be able to predict the size of the database without joining?
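
The only way I can picture an estimate like that without joining is to lean on the hit counts a site reports for sample queries. I have no idea whether that is what Bergman actually did, so the method and all the numbers below are just a made-up illustration.

```python
# A made-up illustration of estimating a hidden database's size from the
# hit counts it reports for sample queries. Not necessarily Bergman's
# method; the terms, hit counts, and frequencies are all invented.

def estimate_size(reported_hits, term_frequencies):
    """Back out a total record count from per-term hit counts and the
    fraction of documents each term appears in (taken from some reference
    corpus), then average the per-term estimates."""
    estimates = []
    for term, hits in reported_hits.items():
        freq = term_frequencies[term]
        if freq > 0:
            estimates.append(hits / freq)
    return sum(estimates) / len(estimates)

# e.g. the site says "about 12,000 results" for "cell", and "cell" appears
# in roughly 2% of documents in a reference corpus.
reported_hits = {"cell": 12000, "protein": 8000, "enzyme": 3000}
term_frequencies = {"cell": 0.02, "protein": 0.015, "enzyme": 0.006}

print(f"Estimated size: about {estimate_size(reported_hits, term_frequencies):,.0f} records")
```

Even then, a site like Blackboard that changes every day would make any snapshot estimate go stale quickly.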

1 comment:

Unknown said...

Good point about how this data was gathered ... I was wondering the very same thing.