Search Engine Concepts and Design

Samir Amberkar
(originally published on 13-Apr-2010)

Abstract: The article describes the basic concepts behind a web search engine. It then looks at the basic framework of such a search engine. Though discussed for web search engines, the concepts are applicable to other search tools as well.

The aim of a search engine is to provide relevant results, in minimal time, for the words and phrases searched.

Relevant results
Relevance of results is a trial-and-error process and should improve over time, provided there is a team in place that analyses what is searched, what is typically expected, and what the search engine returns. For example, if a user searches for pizza, it is more likely that the user wants nearby pizza restaurants than a pizza recipe, the origins of pizza, or the meaning of the word pizza. A good search engine will provide the most relevant results, but at the same time it may also offer alternative results in a sidebar or by some other means.

Though relevance is a trial-and-error process and there may be no rules of thumb in place, certain assumptions can be made. For example, if a user searches for 3G mobile, even without putting quotes around the words, it makes sense to favour results in which the words 3G and mobile are next to each other, or as near as possible. Another good assumption is that documents in which a word is found at the beginning, or in capitals, are more relevant for that word. This is based on the observation that documents typically have a title (and perhaps an abstract) at the beginning describing the content of the page, which makes the document more relevant for those words.
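The two assumptions above (query words close together, and words near the start or in capitals) can be folded into a score. The following is a minimal sketch, not the author's method: the tokenizer, the `score` and `smallest_span` helpers, and the parameters `head_size`, `head_boost`, `caps_boost` and `proximity_weight` are all hypothetical, with arbitrary default values. Proximity is measured as the length of the shortest run of words that contains every query word.

```python
import re
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: runs of letters/digits, original case kept."""
    return re.findall(r"[A-Za-z0-9]+", text)


def smallest_span(tokens, terms):
    """Length in tokens of the shortest window containing every term, or None."""
    need = {t.lower() for t in terms}
    if not need:
        return None
    counts = Counter()
    have = 0          # how many distinct terms the current window contains
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        word = tok.lower()
        if word in need:
            counts[word] += 1
            if counts[word] == 1:
                have += 1
        # Shrink from the left while the window still holds every term.
        while have == len(need):
            span = right - left + 1
            if best is None or span < best:
                best = span
            left_word = tokens[left].lower()
            if left_word in need:
                counts[left_word] -= 1
                if counts[left_word] == 0:
                    have -= 1
            left += 1
    return best


def score(text, query, head_size=10, head_boost=2.0, caps_boost=1.5,
          proximity_weight=1.0):
    """Toy relevance score; all weights are illustrative assumptions."""
    tokens = tokenize(text)
    terms = {t.lower() for t in tokenize(query)}
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok.lower() not in terms:
            continue
        weight = 1.0
        if i < head_size:        # near the start, where titles/abstracts sit
            weight *= head_boost
        if tok.isupper():        # written in capitals
            weight *= caps_boost
        total += weight
    # Multi-word query: a bonus of proximity_weight when the terms are adjacent,
    # shrinking as the smallest covering window grows.
    if len(terms) > 1:
        span = smallest_span(tokens, terms)
        if span is not None:
            total += proximity_weight * len(terms) / span
    return total
```

With `head_size=0` to isolate the proximity effect, "we review the new 3G mobile handsets" scores higher than "we review 3G networks and then a mobile handset" for the query 3G mobile, because the two words are adjacent in the first text. Note that this simple capitals check also treats acronyms such as 3G as capitalised.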

Relevance also depends on the type of document (for example an HTML page, a Word or PDF document, or a presentation) and the type of website (for example whether it is a knowledge-related site, a commercial site, or a blog).
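One simple way to let document type and website type influence ranking is to scale a base score by per-type weights. The sketch below assumes multiplicative weights; the `Result` class, the weight tables and their values are hypothetical placeholders, since the article names the categories but gives no numbers.

```python
from dataclasses import dataclass

# Hypothetical multipliers: the article names these categories but gives no numbers.
DOC_TYPE_WEIGHT = {"html": 1.0, "word": 1.0, "pdf": 1.1, "presentation": 0.8}
SITE_TYPE_WEIGHT = {"knowledge": 1.2, "commercial": 1.0, "blog": 0.9}


@dataclass
class Result:
    url: str
    base_score: float   # e.g. from term matching
    doc_type: str       # "html", "word", "pdf", "presentation", ...
    site_type: str      # "knowledge", "commercial", "blog", ...


def adjusted_score(r):
    """Scale the base score by document-type and site-type weights (unknown types -> 1.0)."""
    return (r.base_score
            * DOC_TYPE_WEIGHT.get(r.doc_type, 1.0)
            * SITE_TYPE_WEIGHT.get(r.site_type, 1.0))


def rank(results):
    """Order results by adjusted score, highest first."""
    return sorted(results, key=adjusted_score, reverse=True)
```

Fixed tables are only a starting point; in practice the weights may also depend on the query (for pizza, a commercial site may deserve the boost rather than a knowledge site), which is where the trial-and-error analysis described earlier comes in.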

Minimal time

To minimise search time, rather than scanning the documents for the search word each time a query arrives, it makes sense to build an index, a kind of database that can be searched much faster. This is similar to arranging your music CD collection: you put it in the order most suitable to you, so that later, when you need a particular CD, you can find it easily. A good search engine does the same: it arranges its data in a particular way, taking care of relevance partially, or as much as possible, while doing so.
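A common form for such an index is an inverted index: for every word, the list of documents (and positions) where it occurs, built once when documents are collected. A query then becomes a few dictionary lookups and set intersections rather than a scan of every document. This is a minimal sketch under stated assumptions: the `InvertedIndex` class, the lowercase tokenizer and the AND semantics (every query word must appear) are illustrative choices, not the article's design. Word positions are stored so that proximity could later be judged from the index alone.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Hypothetical tokenizer: lowercase runs of letters/digits."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


class InvertedIndex:
    """Maps each word to the documents, and positions, where it occurs."""

    def __init__(self):
        # word -> {doc_id: [positions of the word in that document]}
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        """Index a document once, at collection time."""
        for pos, word in enumerate(tokenize(text)):
            self.postings[word].setdefault(doc_id, []).append(pos)

    def search(self, query):
        """Return ids of documents containing every query word (AND semantics)."""
        words = tokenize(query)
        if not words:
            return set()
        # .get avoids creating empty entries in the defaultdict for unknown words.
        result = set(self.postings.get(words[0], {}))
        for word in words[1:]:
            result &= set(self.postings.get(word, {}))
        return result
```

For example, after indexing "Cheap pizza delivery" as document 1 and "Pizza recipe and history" as document 2, searching for pizza recipe touches only two postings lists and returns document 2, without reading either document again.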

