Doctor of Philosophy (Ph.D.)
William L Bynum
A new concept for information storage and retrieval is proposed that links chunks of information within and among documents based on semantic relationships and uses those connections to efficiently retrieve all the information that closely matches the user's request. The storage method is semantic hypertext, in which conventional hypertext links are enriched with semantic information that includes the strength and type of the relationship between the chunks of information being linked. A retrieval method was devised in which a set of cooperating software agents, called scouts, traverse the connections simultaneously, searching for the requested information. By communicating with each other and with a central controller to coordinate the search, the scouts are able to achieve high recall and high precision and to perform extremely efficiently.
An attempt to develop a document base connected by semantic hypertext is described. Because of the difficulties encountered in the attempt, it was concluded that there is no satisfactory method for automatic generation of semantic hypertext from real documents. The collection of semantically linked documents used in this research was generated synthetically.
A Java-based agent framework was used to develop three types of software scouts. In the simplest implementation, Scoutmaster, the paths of the scouts through the document base were specified by a central controller. The only task of each scout was to follow the links specified by the central controller. In the next level of autonomy, Broadcaster, the controller was used strictly as a conduit for scouts to exchange messages. The controller received information from the scouts and broadcast it to all of the other scouts to use in determining their actions. In the final implementation, Melee, the central controller was used only to inaugurate the scout searches. After initialization, the scouts broadcast their messages to all the other scouts.
Experiments were performed to test the ability of the scouts to find information in two synthetically created document sets. All scout types were able to find all of the specified information (i.e., high recall) while searching few documents that did not contain the information (i.e., high precision). Using groups of scouts, the best time to search document sets with up to 3000 documents and 2.5 million links was about thirty seconds.
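The ideas in the abstract can be illustrated with a minimal Java sketch. It is not the dissertation's framework: the class names (`ScoutSketch`, `Link`, `DocumentBase`, `Controller`), the link type labels, the per-document topic label, and the `minStrength` threshold are all assumptions made for illustration. Each link carries a type and a strength, as semantic hypertext is described above, and several scouts share what they have visited through a controller, loosely in the spirit of the Broadcaster variant. The scouts take turns in a round-robin loop, which stands in for the simultaneous traversal described in the abstract and keeps the example deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ScoutSketch {
    /** A semantic hypertext link: a target plus the type and strength of the relationship. */
    public static final class Link {
        final int target;
        final String type;      // hypothetical labels such as "elaborates"
        final double strength;  // assumed to lie in [0, 1]
        Link(int target, String type, double strength) {
            this.target = target; this.type = type; this.strength = strength;
        }
    }

    /** Hypothetical document base: each document has one topic label and outgoing links. */
    public static final class DocumentBase {
        final Map<Integer, String> topics = new HashMap<>();
        final Map<Integer, List<Link>> links = new HashMap<>();
        public void addDocument(int id, String topic) {
            topics.put(id, topic);
            links.put(id, new ArrayList<>());
        }
        public void addLink(int from, int to, String type, double strength) {
            links.get(from).add(new Link(to, type, strength));
        }
    }

    /** Shared state relayed among scouts: which documents are covered, which ones matched. */
    static final class Controller {
        final Set<Integer> visited = new HashSet<>();
        final Set<Integer> hits = new TreeSet<>();
        boolean claim(int doc) { return visited.add(doc); } // false if already covered
    }

    /** One scout starts at each entry of starts; only links with strength >= minStrength are followed. */
    public static Set<Integer> search(DocumentBase base, String topic,
                                      List<Integer> starts, double minStrength) {
        Controller controller = new Controller();
        List<Deque<Integer>> frontiers = new ArrayList<>();
        for (int start : starts) {
            Deque<Integer> frontier = new ArrayDeque<>();
            frontier.add(start);
            frontiers.add(frontier);
        }
        boolean active = true;
        while (active) {
            active = false;
            for (Deque<Integer> frontier : frontiers) {   // each scout takes one step per round
                Integer doc = frontier.poll();
                if (doc == null) continue;                 // this scout has nothing left to do
                active = true;
                if (!controller.claim(doc)) continue;      // another scout already covered it
                if (topic.equals(base.topics.get(doc))) controller.hits.add(doc);
                for (Link link : base.links.getOrDefault(doc, Collections.<Link>emptyList())) {
                    if (link.strength >= minStrength && !controller.visited.contains(link.target)) {
                        frontier.add(link.target);
                    }
                }
            }
        }
        return controller.hits;
    }

    /** Small made-up document base used only to exercise the sketch. */
    public static DocumentBase demoBase() {
        DocumentBase base = new DocumentBase();
        base.addDocument(0, "a"); base.addDocument(1, "b"); base.addDocument(2, "a");
        base.addDocument(3, "a"); base.addDocument(4, "c");
        base.addLink(0, 1, "elaborates", 0.9);
        base.addLink(1, 2, "elaborates", 0.8);
        base.addLink(0, 3, "mentions", 0.2);   // weak link, skipped at threshold 0.5
        base.addLink(3, 4, "elaborates", 0.9);
        return base;
    }

    public static void main(String[] args) {
        System.out.println(search(demoBase(), "a", Arrays.asList(0), 0.5));    // prints [0, 2]
        System.out.println(search(demoBase(), "a", Arrays.asList(0, 3), 0.5)); // prints [0, 2, 3]
    }
}
```

In this toy setup a single scout starting at document 0 misses document 3 because the only link to it is weak, while a second scout starting at document 3 finds it. Pruning by link strength is one assumed way that typed, weighted links could keep scouts away from loosely related documents; the abstract does not say how the actual scouts used strength or type.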
© The Author
Rehder, John J., "Semantic software scouts for information retrieval" (2000). Dissertations, Theses, and Masters Projects. William & Mary. Paper 1539623977.