Like the old adage “Crawl before you can walk and walk before you can run”
The two “Web Search Engines” articles by David Hawking reminded me of the old adage that you need to learn how to crawl before you can walk and, ultimately, run. Both articles explained why crawlers matter and how they gather information. “Successful” crawling leads to better query results. On an unrelated note, I was surprised to learn that the average query length is 2.3 words. I would think it would be difficult to find the most relevant results with a query that short and lacking in detail.
A lesson from childhood: “sharing is caring”
The Open Archives Initiative Protocol for Metadata Harvesting speaks to the power of sharing. With a mission that seeks to “develop and promote interoperability standards that aim to facilitate the efficient dissemination of content,” the OAI ultimately enriches the work of researchers as well as those seeking to learn. Constructing an environment that encourages the sharing of information should help to accelerate discovery and yield better results within and across many disciplines. Sharing important information could be seen as truly caring about the advancement of, and progress towards, solutions within a collaborative and scholarly community. Of course, there are many “rules” or guidelines that need to be agreed upon in order to streamline the searching process and minimize frustration. The more researchers see the value in contributing to a repository, the better and richer the research becomes.
Future’s so Bright…
“The Deep Web: Surfacing Hidden Value” by Michael Bergman was an incredibly interesting article to read. Given the explosive growth of the internet and of content on the web, I am not terribly surprised at the sheer volume of information that is available. What is surprising is the disparity between the information on the surface web and that on the deep web. I was particularly struck by the section about the rate of search failure being close to 85%. I am encouraged that BrightPlanet is working diligently to make the information of the deep web more accessible through the use of direct query technology. I cannot wait for the day when that technology is fused with the information available on the entire web, surface and deep, and searches move towards an 85% success rate.




