The Improvements of Text Rank for Domain-Specific Key Phrase Extraction

Special Issues Editor (Nottingham Trent University, United Kingdom (Great Britain))

TextRank is a common method for extracting keyphrases, which are important for many Natural Language Processing tasks. Previous work improved TextRank for keyphrase extraction by using Term Frequency-Inverse Document Frequency (TF-IDF) to calculate node weights, but this approach performs poorly when extracting domain-specific keyphrases. In this paper, we introduce Average Term Frequency (ATF) and Document Frequency (DF) to calculate the node weight. We then combine these node weights with the semantic relationship between pairs of words in a new method, Hybrid TextRank, for extracting keyphrases from a specific domain. When these improved approaches are applied to keyphrase extraction in the Tibetan religion domain, experiments show that Hybrid TextRank outperforms the other methods, reaching precisions of 90%, 85%, and 80% for the top 10, 20, and 30 words, respectively.
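The abstract names the ingredients of Hybrid TextRank (ATF and DF node weights, plus semantic relationships between word pairs) but gives no formulas. The sketch below shows one way such a hybrid could be wired together; it is not the authors' implementation. Several choices are assumptions: DF(t) is taken as the number of domain documents containing t, ATF(t) as the average frequency of t over those documents, and the node weight as the hypothetical combination ATF(t) × log(1 + DF(t)). The node weights enter a PageRank-style update as a normalized teleport (bias) vector, and `semantic_sim` is a hypothetical caller-supplied similarity function that scales each co-occurrence edge (it defaults to 1.0, which reduces to plain co-occurrence counts).

```python
import math
from collections import Counter, defaultdict


def atf_df_weights(domain_docs):
    """Hypothetical ATF/DF node weights from a domain corpus (list of token lists).

    DF(t): number of documents that contain t.
    ATF(t): average frequency of t over the documents that contain it.
    Weight: ATF(t) * log(1 + DF(t)), an illustrative assumption.
    """
    df = Counter()
    total = Counter()
    for doc in domain_docs:
        counts = Counter(doc)
        total.update(counts)
        df.update(counts.keys())  # each document counts once per term
    return {t: (total[t] / df[t]) * math.log(1 + df[t]) for t in df}


def build_graph(tokens, window=2, semantic_sim=lambda a, b: 1.0):
    """Undirected co-occurrence graph; each co-occurrence adds semantic_sim(a, b)."""
    edges = defaultdict(float)
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + window]:  # window=2 links adjacent words only
            if a != b:
                edges[tuple(sorted((a, b)))] += semantic_sim(a, b)
    graph = defaultdict(dict)
    for (a, b), w in edges.items():
        graph[a][b] = w
        graph[b][a] = w
    return graph


def hybrid_textrank(tokens, node_weight, window=2, d=0.85, iters=100, tol=1e-6,
                    semantic_sim=lambda a, b: 1.0):
    """Weighted TextRank where node weights act as a teleport (bias) vector."""
    graph = build_graph(tokens, window, semantic_sim)
    nodes = list(graph)
    if not nodes:
        return {}
    # Normalize node weights into a bias vector; the tiny floor keeps
    # terms missing from the domain corpus from getting exactly zero.
    raw = {v: node_weight.get(v, 0.0) + 1e-9 for v in nodes}
    z = sum(raw.values())
    bias = {v: raw[v] / z for v in nodes}
    out_sum = {v: sum(graph[v].values()) for v in nodes}
    score = dict(bias)
    for _ in range(iters):
        new = {}
        for v in nodes:
            # Each neighbour u passes on score in proportion to edge weight u-v.
            incoming = sum(graph[u][v] / out_sum[u] * score[u] for u in graph[v])
            new[v] = (1 - d) * bias[v] + d * incoming
        delta = max(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score


if __name__ == "__main__":
    # Toy, made-up tokens purely for illustration.
    corpus = [["a", "b", "a"], ["b", "c"], ["a", "c", "d"]]
    weights = atf_df_weights(corpus)
    ranked = sorted(hybrid_textrank(["a", "b", "c", "a", "d"], weights).items(),
                    key=lambda kv: -kv[1])
    print(ranked)
```

Using the node weights as the teleport vector is one common way to bias PageRank toward prior term importance; multiplying the weights into the incoming sum is another. The sketch ranks single words, matching the top-N word precision the abstract reports; merging adjacent top-ranked words into multi-word keyphrases is left out.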

Journal: International Journal of Simulation: Systems, Science & Technology, IJSSST V17

Published: Jul 14, 2016

DOI: 10.5013/IJSSST.a.17.26.11