Web-Page Summarization Using Clickthrough Data

  • Jian-Tao Sun
  • Dou Shen
  • Hua-Jun Zeng
  • Qiang Yang
  • Yuchang Lu
  • Zheng Chen

Published by Association for Computing Machinery, Inc.


Most previous Web-page summarization methods treat a Web page as plain text. However, such methods fail to uncover the full knowledge associated with a Web page that is needed to build a high-quality summary, because the Web contains many hidden relationships that these methods do not use. Uncovering this inherent knowledge is important for building good Web-page summarizers. In this paper, we extract extra knowledge from the clickthrough data of a Web search engine to improve Web-page summarization. We first analyze the feasibility of using clickthrough data in text summarization, and then propose two adapted summarization methods that take advantage of the relationships discovered from the clickthrough data. For pages not covered by the clickthrough data, we put forward a thematic lexicon approach to generate implicit knowledge for them. Our methods are evaluated on a relatively small dataset consisting of manually annotated pages as well as a large dataset crawled from the Open Directory Project website. The experimental results indicate that significant improvements can be achieved by our proposed summarizer compared with summarizers that do not use the clickthrough data.
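The abstract does not spell out how the two adapted summarizers use clickthrough data. Purely as an illustration of the general idea, here is a minimal sketch of a generic sentence-extraction scheme, not the authors' method. It assumes that the clickthrough data for a page is available as the list of query strings that led users to click it, and it assumes a simple linear mix (hypothetical parameter `alpha`) of page term frequency and query term frequency as term weights. The function and parameter names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize(sentences, queries, k=2, alpha=0.5):
    """Hypothetical extractive summarizer: pick the k sentences whose terms
    carry the most weight, where a term's weight mixes its relative frequency
    in the page with its relative frequency in clicked-through queries."""
    page_tf = Counter(t for s in sentences for t in tokenize(s))
    query_tf = Counter(t for q in queries for t in tokenize(q))
    page_total = sum(page_tf.values()) or 1    # avoid division by zero
    query_total = sum(query_tf.values()) or 1

    def weight(term):
        # Assumed linear interpolation; alpha=0 ignores clickthrough data.
        return ((1 - alpha) * page_tf[term] / page_total
                + alpha * query_tf[term] / query_total)

    scored = []
    for idx, sentence in enumerate(sentences):
        terms = tokenize(sentence)
        # Average term weight, so long sentences are not favored by length alone.
        score = sum(weight(t) for t in terms) / len(terms) if terms else 0.0
        scored.append((score, idx))

    # Highest score first; ties broken by position in the page.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    # Return the chosen sentences in their original page order.
    return [sentences[idx] for _, idx in sorted(top, key=lambda pair: pair[1])]


if __name__ == "__main__":
    page = [
        "Our shop sells running shoes.",
        "Contact us by phone.",
        "Running shoes for trail and road.",
    ]
    clicked_queries = ["running shoes", "trail running shoes"]
    print(summarize(page, clicked_queries, k=2))
```

Query terms that users actually typed before clicking raise the weight of sentences that match them, which is one simple way a summarizer can draw on knowledge that is not present in the page text alone. Pages with no clickthrough data fall back to page-only weights here; the paper's thematic lexicon approach addresses that case differently.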