Today we will train our own neural network to write song lyrics. The training set will be the lyrics of the band Hands Up; nothing stops you from swapping in the lyrics of your own favorite bands. To retrieve the data from websites we use Python 3 with the BeautifulSoup module.
The task is to download the lyrics from a website and then train a neural network on them.
The work breaks down into two stages:
Stage 1: download the lyrics and save them in a convenient format.
Stage 2: train your own neural network.
To the trolls and to those who love searching for hidden meaning, I will say right away
Virtual environment for the project (virtualenv).
# create the virtual environment
virtualenv -p python3 my_song
# activate it
source my_song/bin/activate
# install the modules
pip install -r requirements.txt
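Before moving on, it can help to confirm that the environment really contains everything the later scripts import. The sketch below is not part of the original project: the module list is inferred from the imports used in this article (the actual requirements.txt may contain more), and `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Top-level modules imported by the scripts in this article (assumed list;
# the project's requirements.txt may pin additional packages).
REQUIRED = ["bs4", "pandas", "cyrtranslit", "textgenrnn"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All modules are available.")
```

Run it inside the activated environment; an empty result means the installation step succeeded for these modules.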
Downloading the lyrics from the website and saving them in *.csv format.
# -*- coding: utf-8 -*-
import urllib.request

from bs4 import BeautifulSoup
import pandas as pd

# search results page for the band
URL = 'http://txtmusic.ru/index.php?s=%D0%F3%EA%E8+%C2%E2%E5%F0%F5%21'

page = urllib.request.urlopen(URL)
soup = BeautifulSoup(page, 'html.parser')
li = soup.body.findAll('li')  # each <li> element holds a link to a song page
URLS = ['http://txtmusic.ru/' + l.a.get('href') for l in li]

list_of_names = []
list_of_text = []

for URL in URLS:
    page = urllib.request.urlopen(URL)
    soup = BeautifulSoup(page, 'html.parser')
    article = soup.body.findAll('article')  # the lyrics are inside the <article> tag
    text = str(article[0]).split('\n')[8]  # the lyrics sit on the ninth line of the markup
    text = text.split('<br/>')
    text = [t for t in text if t != '']
    text = " ".join(text)
    # take the part of the heading after " - "; get_text() drops the <h1> tags
    name = article[0].h1.get_text().split(" - ")[1].strip()
    list_of_text.append(text)
    list_of_names.append(name)

df = pd.DataFrame({'name': list_of_names, 'text': list_of_text})
df.to_csv('songs.csv')  # save the result to 'songs.csv'
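The query string in the search URL above is the band's name, percent-encoded in Windows-1251: it decodes to «Руки Вверх!». To swap in another band, the same encoding can be applied to a different name. The sketch below uses only the standard library; `decode_query` and `build_search_url` are hypothetical helper names, and it is an assumption that the site's search accepts any band name in the same `s=` parameter.

```python
from urllib.parse import quote_plus, unquote_plus

# Prefix taken from the search URL used in the scraping script.
SEARCH_URL = 'http://txtmusic.ru/index.php?s='


def decode_query(encoded):
    """Decode a percent-encoded, Windows-1251 search string."""
    return unquote_plus(encoded, encoding='cp1251')


def build_search_url(band_name):
    """Encode a band name the same way and append it to the search prefix."""
    return SEARCH_URL + quote_plus(band_name, encoding='cp1251')


if __name__ == "__main__":
    print(decode_query('%D0%F3%EA%E8+%C2%E2%E5%F0%F5%21'))
    print(build_search_url('Кино'))
```

The resulting string can be assigned to `URL` in the scraping script; whether the result pages for other bands have the same layout is something to check by hand.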
The next step is to transliterate the lyrics into the Latin alphabet (the model works better with Latin script than with Cyrillic).
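import cyrtranslit
import pandas as pd

df = pd.read_csv('songs.csv')
df = df[['name', 'text']]
# transliterate every song from Cyrillic to Latin
df.text = df.text.apply(lambda x: cyrtranslit.to_latin(x, 'ru'))
df.text.to_csv('trans.csv', index=False)  # write only the lyrics, without the row index

'''
cyrtranslit.to_latin('Моё судно на воздушной подушке полно угрей', 'ru')
'Moyo sudno na vozdushnoj podushke polno ugrej'
cyrtranslit.to_cyrillic('Moyo sudno na vozdushnoj podushke polno ugrej', 'ru')
'Моё судно на воздушной подушке полно угрей'
'''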
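To make the transliteration step less of a black box, here is a rough standard-library sketch of what a character-by-character Cyrillic-to-Latin conversion looks like. It is a simplified stand-in, not cyrtranslit's implementation: the table `RU_TO_LATIN` and the function `to_latin_sketch` are made up for illustration, and their output may differ from the library's on some letters.

```python
# Simplified Russian-to-Latin table for illustration only; this is NOT
# cyrtranslit's mapping, and the two may disagree on some letters.
RU_TO_LATIN = {
    'а': 'a', 'б': 'b', 'в': 'v', 'г': 'g', 'д': 'd', 'е': 'e', 'ё': 'yo',
    'ж': 'zh', 'з': 'z', 'и': 'i', 'й': 'j', 'к': 'k', 'л': 'l', 'м': 'm',
    'н': 'n', 'о': 'o', 'п': 'p', 'р': 'r', 'с': 's', 'т': 't', 'у': 'u',
    'ф': 'f', 'х': 'h', 'ц': 'c', 'ч': 'ch', 'ш': 'sh', 'щ': 'shch',
    'ъ': '', 'ы': 'y', 'ь': '', 'э': 'e', 'ю': 'yu', 'я': 'ya',
}


def to_latin_sketch(text):
    """Transliterate character by character, keeping unknown characters as-is."""
    out = []
    for ch in text:
        latin = RU_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)  # digits, punctuation, Latin letters
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return ''.join(out)


if __name__ == "__main__":
    print(to_latin_sketch('Привет, мир!'))
```

This table simply drops the hard and soft signs, so its output cannot be converted back to Cyrillic without loss; for real data the library call used in the script is the better choice.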
# train the model on the transliterated lyrics
from textgenrnn import textgenrnn

textgen = textgenrnn()
textgen.train_from_file('trans.csv', num_epochs=1)
# training creates the file textgenrnn_weights.hdf5
And that's all! textgenrnn (https://github.com/minimaxir/textgenrnn) turned out to be easy and convenient to use. The generated lyrics are not realistic yet; to improve them you will have to tune the model parameters yourself.
The advantage of textgenrnn is that you do not need to do any data preprocessing: just load a text dataset and sit down with a cup of coffee while you watch your model train.
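As a starting point for changing the model parameters, the sketch below collects a few training options in one place. The option names follow those listed in the textgenrnn README as far as I know; the values are untuned guesses, and `TRAIN_PARAMS` and `train_lyrics_model` are hypothetical names that do not come from the original project.

```python
# Assumed hyperparameters: the names follow the options described in the
# textgenrnn README, the values are illustrative and untuned.
TRAIN_PARAMS = {
    'new_model': True,     # train from scratch instead of fine-tuning the bundled weights
    'num_epochs': 10,
    'gen_epochs': 2,       # print sample output every 2 epochs
    'rnn_layers': 2,
    'rnn_size': 128,
    'max_length': 40,      # characters of context used to predict the next one
    'word_level': False,   # keep the character-level model
}


def train_lyrics_model(path='trans.csv', params=None):
    """Train textgenrnn on the transliterated lyrics with the given options."""
    # Imported here so the settings can be inspected without TensorFlow installed.
    from textgenrnn import textgenrnn

    textgen = textgenrnn()
    textgen.train_from_file(path, **(params or TRAIN_PARAMS))
    return textgen
```

Calling `train_lyrics_model()` starts training with these settings. More epochs and a larger network take longer to train, so it is worth changing one value at a time and comparing the generated samples.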
# load the trained weights and generate lyrics
from textgenrnn import textgenrnn

textgen_2 = textgenrnn('textgenrnn_weights.hdf5')
textgen_2.generate(3, temperature=1.0)
textgen_2.generate_to_file('lyrics.txt')
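The `temperature` argument in the `generate` call controls how adventurous the output is. The sketch below shows the general idea of temperature sampling with made-up probabilities; it is a generic illustration using only the standard library, not textgenrnn's own code, and `apply_temperature` and `sample_index` are hypothetical names.

```python
import math
import random


def apply_temperature(probs, temperature):
    """Rescale a probability distribution; lower temperature sharpens it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Divide log-probabilities by the temperature, then renormalise.
    scaled = [math.exp(math.log(p) / temperature) if p > 0 else 0.0 for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_index(probs, temperature=1.0, rng=random):
    """Draw one index according to the temperature-adjusted distribution."""
    adjusted = apply_temperature(probs, temperature)
    return rng.choices(range(len(adjusted)), weights=adjusted, k=1)[0]


if __name__ == "__main__":
    next_char_probs = [0.5, 0.3, 0.2]  # made-up probabilities for three characters
    for t in (0.2, 0.5, 1.0):
        print(t, [round(p, 3) for p in apply_temperature(next_char_probs, t)])
```

At a temperature of 1.0 the distribution is left unchanged; lower values concentrate the probability on the most likely character, which tends to give safer but more repetitive text.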
Now that you know how to make textgenrnn generate text, you can do a lot with this knowledge:
https://github.com/minimaxir/textgenrnn
https://towardsdatascience.com/ai-generates-taylor-swifts-song-lyrics-6fd92a03ef7e