Intro
For about five years now, I’ve been saving my favorite song verses into a CSV file, along with a “score” of how much I like it. After some time, I decided to do a wordcloud in Mathematica but, after finding the WordCloud python package, I decided to translate it into Python and enhance it a bit. With over 885 verses, this script generates a wordcloud with all the verses and makes it easy to overlay it over a background with software such as Adobe Illustrator.
CodeDev
As previously mentioned, verses are stored in a CSV with each row consisting of a verse-score tuple. This score is somewhat arbitrary and unbounded, but it does provide a certain metric that will be used to define the sizes of the verses in the cloud.
Setup variables
First thing we’ll do is to import the required libraries:
# Import required libraries
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud
from matplotlib.colors import LinearSegmentedColormap
And our working directories:
# Setup paths
BASE_PATH = '/Users/github/lastfmViz/'
(DATA_PATH, STAT_PATH, IMG_PATH, FONT_PATH) = (
BASE_PATH + 'data/',
BASE_PATH + 'img/',
BASE_PATH + 'fonts/'
)
With those out of the way, we’ll define some image resolution and style parameters:
(WIDTH, HEIGHT, RESOLUTION) = (3840, 2160, 2000)
# Style parameters
# REL_SCL: How much priority the "score" has on the font size
# MIN_SIZE: Font size of the smallest verse
# MAX_WRD: Maxim number of words.
(REL_SCL, MIN_SIZE, MAX_WRD) = (.05, 8, 5000)
# Defining color dictionary from gray to black
# (to make it look like a "typewriter")
cdict = {
'red': [(0, .2, .2), (1, .0, .0)],
'green': [(0, .2, .2), (1, .0, .0)],
'blue': [(0, .2, .2), (1, .0, .0)]
}
cMap = LinearSegmentedColormap('csMap', cdict, N=256)
Load data
With these parameters defined, we load our CSV file into a pandas dataframe.
data = pd.read_csv(
stp.DATA_PATH + stp.USR + '_vrs.csv',
names=['Verse', 'Score']
)
ranks = {
str(data.iloc[i][0]): int(data.iloc[i][1]) for i in range(data.shape[0])
}
Generate wordcloud
To make it look better, I decided to use a typewriter-like font I downloaded from: 1001 freefonts. The lines that define and generate the wordcloud follow:
# Generate the wordcloud
wordcloudDef = WordCloud(
width=WIDTH, height=HEIGHT, max_words=MAX_WRD,
relative_scaling=REL_SCL, min_font_size=MIN_SIZE, prefer_horizontal=1,
background_color="rgba(0, 0, 0, 1)", mode="RGBA",
colormap=cMap,font_path=stp.FONT_PATH + 'mytype.ttf'
)
wordcloud = wordcloudDef.generate_from_frequencies(ranks)
Save image
And then we save it to disk using matplotlib without a frame or padding:
# Export the resulting image
ax1 = plt.axes(frameon=False)
plt.figure()
plt.imshow(wordcloud, interpolation='bilinear')
plt.tight_layout(pad=0)
plt.axis("off")
plt.savefig(
stp.IMG_PATH + 'VER_WDC.png',
dpi=RESOLUTION, orientation='portrait', transparent=True,
bbox_inches='tight', pad_inches=0
)
plt.close('all')
Overlaying
It is worth noting that these lines create a wordcloud without a background, but by using Inkscape with one of the watercolor paper backgrounds downloaded from inspiration hut, it is extremely easy to combine the layers to produce the final result:
Code repo
- Repository: Github repo
- Dependencies: matplotlib, WordCloud, pandas