Word Cloud of the Biography of Han Xin

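Over the past few days I have been rereading about the figures of the early Han dynasty, and Han Xin (韩信) strikes me as one of the most regrettable cases. I did some rough processing of Wang Xuemeng's translation of the "Biography of the Marquis of Huaiyin" (《淮阴侯列传》). I downloaded a portrait of Han Xin from the web. It was originally a pencil sketch and gave very poor results, so I converted the image to black and white and filled the blank areas with black, which made the output barely acceptable.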

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import jieba
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
from imageio import imread
import matplotlib.pyplot as plt

# Assumes the text file is saved as UTF-8; change the encoding if it is not.
with open('淮阴侯列传.txt', encoding='utf-8') as f:
    text = f.read()
# back_color = imread('hanxin.png')
back_color = imread('韩信.png')
# set.add() returns None, so build the stopword set before passing it in.
stopwords = set(STOPWORDS) | {'韩信'}
wc = WordCloud(background_color='white',
               max_words=100000,
               mask=back_color,  # width/height are ignored when a mask is given
               min_font_size=2,
               max_font_size=100,
               stopwords=stopwords,
               font_path='simfang.ttf',
               random_state=42)

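def process_words(text):
    """Segment text with jieba and drop every token listed in stopwords.txt."""
    jieba.add_word('韩信')  # keep the name as a single token
    # Assumes stopwords.txt is UTF-8 with one stopword per line.
    with open('stopwords.txt', encoding='utf-8') as f:
        stop_set = {line.strip() for line in f if line.strip()}
    words_list = []
    for word in jieba.cut(text, cut_all=False):
        token = word.strip()
        # Exact set membership; a substring test against the whole file
        # would also drop tokens that merely appear inside a longer stopword.
        if token and token not in stop_set:
            words_list.append(token)
    return ' '.join(words_list)


text = process_words(text)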
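wc.generate(text)
# Take the word colours from the mask image itself.
image_colors = ImageColorGenerator(back_color)
# First figure: default colours.
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
# Second figure: recoloured from the black-and-white portrait.
plt.figure()
plt.imshow(wc.recolor(color_func=image_colors))
plt.axis('off')
plt.show()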
Figure: Marquis of Huaiyin word cloud (color)

Figure: Marquis of Huaiyin word cloud (black and white)

plt.imshow(back_color)
plt.show()
Figure: Marquis of Huaiyin (the mask image)
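The portrait clean-up mentioned at the start (turning the sketch into black and white and filling the blank areas with black) can also be done in code rather than in an image editor. Below is a minimal sketch, not the procedure actually used for this post: `make_mask` and the threshold of 128 are hypothetical, chosen only for illustration. It relies on the fact that WordCloud treats pure white mask pixels as off-limits and draws words everywhere else.

```python
import numpy as np


def make_mask(image, threshold=128):
    """Binarise an image into a WordCloud-style mask.

    Pixels darker than `threshold` become 0 (words may be drawn there);
    all other pixels become 255 (white, i.e. masked out).
    `threshold` is an assumed value and needs tuning per image.
    """
    gray = np.asarray(image, dtype=float)
    if gray.ndim == 3:
        # Average the colour channels, ignoring any alpha channel.
        gray = gray[..., :3].mean(axis=2)
    return np.where(gray < threshold, 0, 255).astype(np.uint8)


# Tiny synthetic "image": dark pixels on the left, light pixels on the right.
demo = np.array([[0, 200],
                 [100, 255]])
print(make_mask(demo))
```

The result could then be passed as `mask=make_mask(back_color)`. A faint pencil sketch will probably need a different threshold, and its outline may still have to be filled in by hand.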

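# For comparison: a plain rectangular word cloud without the mask.
# Named plain_wc so it does not shadow the wordcloud package.
plain_wc = WordCloud(max_font_size=100,
                     font_path='simfang.ttf').generate(text)
plt.figure()
plt.imshow(plain_wc, interpolation='bilinear')
plt.axis('off')
plt.show()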
Figure: Word cloud of the Biography of the Marquis of Huaiyin

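The most prominent words in the text are 汉王 (King of Han), 天下 (the realm), and 军队 (army), which is what common sense would lead one to expect. Because the processing was fairly rough, filler words such as 不能 ("cannot") and 他们 ("they") also come out quite large.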
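One way to check which words dominate, and to see what an extended stopword list would do, is to count the segmented tokens directly. The following is only a sketch: `top_words` is a hypothetical helper, and the token list is a made-up toy sample rather than output from the actual text. In practice the tokens would come from `jieba.cut(text)`.

```python
from collections import Counter


def top_words(tokens, stopwords, n=10):
    """Return the n most frequent tokens that are not in stopwords."""
    kept = []
    for token in tokens:
        token = token.strip()
        # Skip whitespace-only tokens and anything on the stopword list.
        if token and token not in stopwords:
            kept.append(token)
    return Counter(kept).most_common(n)


# Toy sample for illustration only (not real counts from the biography).
tokens = ['汉王', '天下', '不能', '汉王', '他们', '军队', '天下', '汉王', ' ']
extra_stopwords = {'不能', '他们'}
print(top_words(tokens, extra_stopwords, n=3))
# → [('汉王', 3), ('天下', 2), ('军队', 1)]
```

Adding words such as 不能 and 他们 to stopwords.txt would have the same effect on the word cloud itself.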

