Design
The following diagram shows a simple search engine design:
Fig 2.1
A search engine consists of three major components: the Search Index, the Index Generator, and the Searcher.
The Index Generator works offline and generates a transient index, which must later be synchronised with the permanent index. Its major components are the Crawler and the Parser. The Crawler fetches the documents: in a web search engine, for example, it goes around the Internet looking for documents and downloading them. The Crawler also needs to check for updates to documents that have already been downloaded or are already present in the Search Index. The Parser parses these documents and creates search index entries, possibly ordered by a partial relevancy estimate or by as complete a relevancy estimate as is feasible at that stage.
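The Crawler and Parser steps described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the crawler is replaced by a stand-in that reads from an in-memory mapping instead of the network, the raw term count is used as a crude "partial relevancy", and the names crawl, parse, and build_transient_index are hypothetical, not part of the design above.

```python
import re
from collections import Counter, defaultdict


def crawl(source):
    """Stand-in crawler (assumption): yields (doc_id, text) pairs from a mapping."""
    for doc_id, text in source.items():
        yield doc_id, text


def parse(text):
    """Split a document into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_transient_index(source):
    """Build term -> {doc_id: term_count} for the transient part of the index.

    The term count is only a crude, partial relevancy signal (assumption).
    """
    index = defaultdict(dict)
    for doc_id, text in crawl(source):
        for term, count in Counter(parse(text)).items():
            index[term][doc_id] = count
    return dict(index)


docs = {
    "d1": "Search engines index documents",
    "d2": "A crawler downloads documents and more documents",
}
transient = build_transient_index(docs)
print(transient["documents"])
```

A real crawler would also track when each document was last fetched so that it can detect updates; that bookkeeping is left out here.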
The search index entries are stored in the transient part of the Search Index, which is later synchronised with the permanent part. Synchronisation is a major process because it often leads to a rearrangement of the Search Index: the relevancy of a document keeps changing as new documents are added, earlier documents are modified, the relevancy logic itself changes, and so on. This is why the Search Index has at least two parts, as mentioned.
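A rough sketch of such a synchronisation step is given below. It assumes that both parts of the index map each term to a dictionary of document scores, and that a newer score for the same document simply overwrites the old one; the function name synchronise and this data layout are illustrative assumptions rather than a prescribed design.

```python
def synchronise(permanent, transient):
    """Merge transient postings into the permanent index, then empty the transient part.

    Both indexes map term -> {doc_id: score} (assumed layout). A score in the
    transient part overwrites the stored one, modelling a modified document.
    Returns term -> [doc_id, ...] in descending score order for every term
    touched by the merge, i.e. the rearranged ranking.
    """
    reranked = {}
    for term, postings in transient.items():
        merged = permanent.setdefault(term, {})
        merged.update(postings)
        # Ties are broken by doc_id so the ordering is deterministic.
        reranked[term] = sorted(merged, key=lambda d: (-merged[d], d))
    transient.clear()
    return reranked


permanent = {"index": {"d1": 3, "d2": 1}}
transient = {"index": {"d2": 5, "d3": 2}, "crawler": {"d3": 1}}
order = synchronise(permanent, transient)
print(order["index"])
```

In this example the updated score for d2 moves it ahead of d1, which is the kind of rearrangement the paragraph above refers to.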
The Searcher is in charge of user interaction and is relatively simple compared to the other two components. It may consist of an Interface part and a Logic part. The Interface part takes care of presentation (simple line-by-line results or categorised results, fonts, colours, etc.) and gets its result data from the Logic part. The Logic part searches the permanent part of the Search Index. Its job is simpler if relevancy is properly taken care of during indexing, although the Logic part can still fine-tune the relevancy at query time.
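The split between the Logic part and the Interface part might look like the sketch below. It assumes the permanent index maps each term to a dictionary of document scores and that a query is scored by summing the stored scores of its terms; search_logic and render_results are hypothetical names chosen for illustration.

```python
def search_logic(permanent, query):
    """Logic part: rank documents by the sum of stored scores for each query term."""
    totals = {}
    for term in query.lower().split():
        for doc_id, score in permanent.get(term, {}).items():
            totals[doc_id] = totals.get(doc_id, 0) + score
    # Highest total first; ties are broken by doc_id for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


def render_results(results):
    """Interface part: simple line-by-line presentation of ranked results."""
    return "\n".join(
        f"{rank}. {doc_id} (score {score})"
        for rank, (doc_id, score) in enumerate(results, start=1)
    )


permanent = {"search": {"d1": 2, "d2": 1}, "index": {"d1": 1, "d3": 4}}
results = search_logic(permanent, "search index")
print(render_results(results))
```

Because the scores are computed at indexing time, the Logic part only has to combine and sort them; any query-time fine-tuning would replace the simple sum used here.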