December 18, 2013

Conclusions

We have presented a framework and an implemented system for browsing and
analyzing sets of documents which are annotated with category keyword labels.
The system might be used as a support tool for domain experts who need to analyze
and summarize large document sets. It may also be used in the regular query-and-browse
cycle of a document retrieval session, to support the browsing phase.
Currently, when users face a common response such as “1000 documents
match your query”, they have to guess in advance how to restrict their query.
In such cases the KDT system could help them work out the
content of these 1000 documents and narrow down the set of target
documents.

The KDT system is based on a compact model that relies on rather modest
assumptions. It requires that documents be annotated with category keywords
organized in a simple hierarchy. It also demonstrates the rich variety of KDD
operations that can be based on keyword co-occurrence distributions and their
comparison with the relative entropy distance measure. The simplicity of the
model makes it rather easy to implement, and the pre-computation of keyword
co-occurrence distributions makes online computations very efficient.

In future work we plan to extend the KDT framework to work also on
co-occurrence distributions of terms and groups of terms extracted directly
from the texts. This way we hope to combine these two levels of
representation, namely category labels and document terms, in analogy to the
way they are often combined in retrieval queries.
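
To make the idea of comparing keyword co-occurrence distributions with relative entropy more concrete, here is a minimal sketch in Python. It is not the KDT implementation: the function names, the toy document set, and the small additive smoothing constant (used so that zero counts do not make the measure infinite) are all assumptions chosen for illustration. Each document is represented simply as a set of category keywords.

```python
from collections import Counter
from math import log2


def cooccurrence_distribution(docs, anchor, candidates, smoothing=1e-6):
    """Distribution over `candidates` among documents labeled with `anchor`.

    `docs` is an iterable of keyword sets. The smoothing constant is an
    illustrative assumption, not something taken from the KDT paper.
    """
    counts = Counter()
    for labels in docs:
        if anchor in labels:
            for keyword in candidates:
                if keyword in labels:
                    counts[keyword] += 1
    total = sum(counts[k] + smoothing for k in candidates)
    return {k: (counts[k] + smoothing) / total for k in candidates}


def relative_entropy(p, q):
    """Relative entropy D(p || q) in bits; p and q share the same keys."""
    return sum(p[k] * log2(p[k] / q[k]) for k in p if p[k] > 0)


# Hypothetical annotated collection: one set of category keywords per document.
docs = [
    {"iran", "oil"},
    {"iran", "oil"},
    {"iran", "nuclear"},
    {"usa", "oil"},
    {"usa", "trade"},
    {"usa", "trade"},
]
topics = ["oil", "nuclear", "trade"]

p_iran = cooccurrence_distribution(docs, "iran", topics)
p_usa = cooccurrence_distribution(docs, "usa", topics)

# A larger value means the two topic profiles differ more.
print(relative_entropy(p_iran, p_usa))
```

Because the distributions depend only on the annotations, they could be computed once ahead of time and cached, which is in the spirit of the pre-computation mentioned above. Note that relative entropy is not symmetric, so the order of the two distributions matters.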
