One of the problems with searching a vast database, or the World Wide Web, is that you almost need to be an expert in the area to know what words to put in the “search” box. For example, if you wanted to find a movie and all you remembered was the approximate year it came out and a word or two from the tag line, you would be unlikely to locate it quickly and easily, even on a dedicated movie site such as IMDb.
Funded by an ARC Discovery Project, Dr Wei Wang and a team of PhD researchers in the School of Computer Science and Engineering have developed a prototype database searcher that focuses on links rather than formal language. “About 80% of database queries are very, very simple, but there are always 20% of queries that are complex and unexpected,” he says. “This research will help casual users to obtain valuable information easily from databases, and this ultimately leads to a wider and easier use of database technologies and an increase in productivity.
“It’s also possible that no single page supplies your need for information, so this method can combine that information – we search beyond the page boundary to describe what you are looking for.”
Wei has also been developing several methods that can quickly and easily search through large databases for similar objects, such as plagiarised paragraphs in university essays. Previously it was very time-consuming to do such searches. “Many systems that had to deal with the problem for large datasets resort to approximate or heuristic solutions, because fast and scalable exact solutions were not available,” he says. “Our research enables us to apply exact solutions to these problems, hence guarantees that there is no sacrifice of result quality. For example, in plagiarism detection, this means all suspicious cases will be found out and referred to the experts.”
The system assigns each object or article a mathematical signature, “and if two objects are very similar, their signatures are likely to be very similar as well,” he says. “The prototype we put on the web has been used by other universities in the US and Germany.”
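The signature idea can be illustrated with a minimal sketch. The article does not describe how the team's signatures are built, so everything here is an assumption made for illustration: each text's signature is taken to be its set of three-word sequences ("shingles"), similarity is measured with the Jaccard coefficient, and the names `shingles`, `jaccard` and `similar_pairs` are hypothetical. Comparing every pair exhaustively, as below, gives exact results but does not scale to large collections, and it is not presented as the method used in the prototype.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences in text (an assumed, simple 'signature')."""
    words = text.lower().split()
    if len(words) < k:
        # Very short texts get a single shingle made of all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar_pairs(docs, threshold=0.5):
    """Compare every pair of documents and return those at or above the threshold.

    docs maps a document name to its text. The result is a list of
    (name_a, name_b, similarity) tuples. This is an exhaustive comparison,
    so no similar pair is missed, at the cost of quadratic running time.
    """
    sigs = {name: shingles(text) for name, text in docs.items()}
    names = sorted(sigs)
    pairs = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            score = jaccard(sigs[x], sigs[y])
            if score >= threshold:
                pairs.append((x, y, score))
    return pairs


if __name__ == "__main__":
    essays = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "the quick brown fox jumps over the sleepy dog",
        "c": "completely different text about database search",
    }
    for name_a, name_b, score in similar_pairs(essays):
        print(name_a, name_b, round(score, 3))
```

In this toy example, essays "a" and "b" differ by one word and share most of their shingles, so they are flagged as a pair, while "c" shares none and is not. A plagiarism checker built along these lines would pass flagged pairs to a human expert rather than decide on its own.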
The system could be used by governments, or by institutions such as large banks, to search merged databases for similarities and find multiple records that might apply to the same person. It will also be able to search through images and videos.