Verses Wordcloud

For about five years now, I’ve been saving my favorite song verses in a CSV file, each with a “score” of how much I like it. After some time, I decided to make a wordcloud in Mathematica but, after finding the WordCloud Python package, I translated it into Python and enhanced it a bit. With over 885 verses in the file, this script generates a wordcloud of all of them and makes it easy to overlay the result on a background with software such as Adobe Illustrator.

Development

As previously mentioned, verses are stored in a CSV with each row consisting of a verse-score tuple. The score is somewhat arbitrary and unbounded, but it provides the weight that determines the relative size of each verse in the cloud.

The first thing we’ll do is import the required libraries:

# Import required libraries
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud
from matplotlib.colors import LinearSegmentedColormap

And our working directories:

# Setup paths
BASE_PATH = '/Users/github/lastfmViz/'
(DATA_PATH, IMG_PATH, FONT_PATH) = (
    BASE_PATH + 'data/',
    BASE_PATH + 'img/',
    BASE_PATH + 'fonts/'
  )

With those out of the way, we’ll define some image resolution and style parameters:

(WIDTH, HEIGHT, RESOLUTION) = (3840, 2160, 2000)
# Style parameters
#   REL_SCL: How much priority the "score" has on the font size
#   MIN_SIZE: Font size of the smallest verse
#   MAX_WRD: Maximum number of words.
(REL_SCL, MIN_SIZE, MAX_WRD) = (.05, 8, 5000)
# Defining color dictionary from gray to black
#   (to make it look like a "typewriter")
cdict = {
    'red':   [(0, .2, .2), (1,  .0, .0)],
    'green': [(0, .2, .2), (1,  .0, .0)],
    'blue':  [(0, .2, .2), (1,  .0, .0)]
  }
cMap = LinearSegmentedColormap('csMap', cdict, N=256)
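
To make the color dictionary less opaque: in matplotlib's segment data, each tuple is (x, y0, y1), where x is the position along the colormap and y0/y1 are the channel values approaching that position from the left and from the right. A quick way to check what the map produces is to sample it; the sketch below rebuilds the same dictionary and prints a few samples (the variable names `typewriter_cdict` and `typewriter_map` are just for illustration).

```python
from matplotlib.colors import LinearSegmentedColormap

# Same segment data as above: every channel goes from 0.2 at x=0 to 0.0 at x=1
typewriter_cdict = {
    'red':   [(0, .2, .2), (1, .0, .0)],
    'green': [(0, .2, .2), (1, .0, .0)],
    'blue':  [(0, .2, .2), (1, .0, .0)]
}
typewriter_map = LinearSegmentedColormap('csMap', typewriter_cdict, N=256)

# Calling the colormap with a float in [0, 1] returns an RGBA tuple
for x in (0.0, 0.5, 1.0):
    (r, g, b, a) = typewriter_map(x)
    print(f'x={x:.1f} -> r={r:.3f} g={g:.3f} b={b:.3f} a={a:.1f}')
```

Since the three channels are always equal, every sampled color is a shade of gray, starting at a dark gray and ending at pure black.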

With these parameters defined, we load our CSV file into a pandas dataframe.

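# USR (presumably the user name that prefixes the CSV file) is not defined
#   in this post and needs to be set beforehand
data = pd.read_csv(
    DATA_PATH + USR + '_vrs.csv',
    names=['Verse', 'Score']
  )
# Map each verse to its integer score
ranks = {
    str(verse): int(score) for (verse, score) in zip(data['Verse'], data['Score'])
  }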
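
To illustrate the shape of the data, here is a minimal sketch that runs the same loading step on an in-memory CSV. The two verses and their scores are made up for the example, and the names `sample_csv`, `sample` and `sample_ranks` are hypothetical; the only assumption about the real file is the one stated above, namely one verse-score pair per row with no header.

```python
import io
import pandas as pd

# Made-up stand-in for the verses file: "verse,score" per row, no header.
# Verses that contain commas need to be quoted.
sample_csv = io.StringIO(
    '"A made-up verse, with a comma",9\n'
    'Another made-up verse,4\n'
)
sample = pd.read_csv(sample_csv, names=['Verse', 'Score'])

# Each whole verse becomes one key, so the cloud treats it as a single "word"
sample_ranks = {
    str(verse): int(score)
    for (verse, score) in zip(sample['Verse'], sample['Score'])
}
print(sample_ranks)
```

One consequence of using a dictionary is that a verse that appears twice in the file keeps only the score of its last occurrence.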

To make it look better, I decided to use a typewriter-like font that I downloaded from 1001 Free Fonts. The lines that define and generate the wordcloud follow:

# Generate the wordcloud
wordcloudDef = WordCloud(
    width=WIDTH, height=HEIGHT, max_words=MAX_WRD,
    relative_scaling=REL_SCL, min_font_size=MIN_SIZE, prefer_horizontal=1,
    background_color="rgba(0, 0, 0, 1)", mode="RGBA",
    colormap=cMap, font_path=FONT_PATH + 'mytype.ttf'
  )
wordcloud = wordcloudDef.generate_from_frequencies(ranks)
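
The effect of `relative_scaling` is easier to see with numbers. The wordcloud documentation describes it as the importance of relative frequencies for font size: at 0 only the rank of a word matters, and at 1 a word that is twice as frequent is drawn twice as large. The sketch below is not the library's code; it is an assumed linear blend between those two extremes, written as a hypothetical helper `blend_font_size`, to show why a small value such as .05 makes sizes shrink slowly as scores drop.

```python
def blend_font_size(font_size, score, previous_score, relative_scaling):
    """Illustrative only: blend between 'keep the previous size' (scaling=0)
    and 'scale proportionally to the score ratio' (scaling=1)."""
    ratio = score / float(previous_score)
    factor = relative_scaling * ratio + (1 - relative_scaling)
    return int(round(factor * font_size))


# A verse with half the score of the previous one, previous size 100
for scaling in (0, .05, 1):
    print(scaling, blend_font_size(100, 5, 10, scaling))
```

Under this assumption, a low value keeps the verses close to each other in size, while a value near 1 would let a handful of top-scored verses dominate the image.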

And then we save it to disk using matplotlib without a frame or padding:

# Export the resulting image
plt.figure()
ax1 = plt.axes(frameon=False)
plt.imshow(wordcloud, interpolation='bilinear')
plt.tight_layout(pad=0)
plt.axis("off")
plt.savefig(
    IMG_PATH + 'VER_WDC.png',
    dpi=RESOLUTION, orientation='portrait', transparent=True,
    bbox_inches='tight', pad_inches=0
  )
plt.close('all')

Note that these lines create a wordcloud without a background. Using Adobe Illustrator and one of the watercolor paper backgrounds downloaded from Inspiration Hut, it is easy to combine the layers to produce the final result.

Documentation and Code