Title: Automatic Keyword Extraction from Historical Document Images
Authors: Terasawa, Kengo; Nagasaki, Takeshi; Kawashima, Toshio
Abstract: This paper presents a method for automatically extracting keywords from historical document images. The proposed method is language independent because it is purely appearance based: it requires neither lexical information nor any statistical language model. Moreover, since it does not need word segmentation, it can be applied to Eastern languages, which do not place clear spacing between words. The first half of the paper describes an algorithm that retrieves document image regions whose appearance is similar to a given query image. The algorithm was evaluated in terms of recall and precision and achieved an average precision of over 80-90%. The second half of the paper describes a keyword extraction method that works even when no query word is explicitly specified. Because efficient pruning techniques reduced the computational cost, the system could successfully extract keywords from relatively large documents.
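The abstract reports retrieval quality as average precision from a recall-precision evaluation. As a minimal sketch, assuming the standard non-interpolated definition (this record does not state the paper's exact averaging protocol), average precision for one query's ranked list of retrieved image regions can be computed as follows. The function `average_precision` and its parameters are hypothetical names chosen for illustration.

```python
def average_precision(ranked_relevance, num_relevant=None):
    """Non-interpolated average precision for one ranked result list.

    ranked_relevance: sequence of bools, True where the region retrieved
        at that rank is a correct match for the query (assumed input format).
    num_relevant: total number of relevant regions in the collection;
        defaults to the number of hits in the list (assumes all were retrieved).
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            # Precision measured at the rank where each relevant item appears.
            precision_sum += hits / rank
    total = num_relevant if num_relevant is not None else hits
    return precision_sum / total if total else 0.0


if __name__ == "__main__":
    # Relevant regions at ranks 1, 3 and 4: (1/1 + 2/3 + 3/4) / 3
    print(round(average_precision([True, False, True, True, False]), 3))  # 0.806
```

Averaging this value over a set of query images gives the mean average precision; passing `num_relevant` penalizes relevant regions that never appear in the ranked list.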
Publication type: Original Paper
Material type: Journal Article
Peer reviewed: Yes
Authorship: Joint (co-authored)
Journal / conference: 7th IAPR Workshop on Document Analysis Systems
Volume: LNCS 3872
Start page: 413
End page: 424
Date: February 13, 2006
Publisher: Springer
