Predict Yorùbá Hymn Lyrics with TensorFlow

Wuraola Oyewusi
Mar 11, 2020

Notebook: The notebook with the code can be found here.
I left the code outputs in place so that readers can check they are following the tutorial correctly.

The Dataset: It is a collection of 10 Yorùbá hymns in about 260 lines; the first 10 lines are the titles of the hymns. Yorùbá hymns are available online, but without the tone marks (diacritics), so I took the time to add them. Here is a link to the dataset.

I like hymns, especially Yorùbá ones. I like machine learning and deep learning algorithms, especially the ones that work well with sequences. I recently got better at doing Natural Language Processing with TensorFlow and, boom, there’s this article.

Hymns are songs and poems. Each line has an average of about 8 syllables.
Next-word prediction is a popular NLP task, so we’ll train a deep learning model on a Yorùbá hymns dataset and see how well it predicts coherent lyrics.
If you’re curious whether the model will learn Yorùbá words with their accents (diacritics): yes, it will.
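
To make the next-word framing concrete, here is a minimal sketch of one common way to turn hymn lines into training examples: every prefix of a line becomes an input, and the word that follows it becomes the label. This is an illustration, not necessarily what the notebook does. The helper names (tokenize, build_vocab, prefix_sequences, pad_left) are made up for this sketch, and the two sample lines are borrowed from the generated lyrics in this article as stand-ins for the dataset.

```python
import unicodedata


def tokenize(line):
    # NFC normalisation gives each accented letter one consistent encoding,
    # so the same word with the same tone marks always maps to the same token.
    return unicodedata.normalize("NFC", line).lower().split()


def build_vocab(lines):
    # Assign ids starting at 1; id 0 is reserved for padding.
    vocab = {}
    for line in lines:
        for word in tokenize(line):
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def prefix_sequences(lines, vocab):
    # "a b c d" yields [a, b], [a, b, c], [a, b, c, d].
    sequences = []
    for line in lines:
        ids = [vocab[word] for word in tokenize(line)]
        for end in range(2, len(ids) + 1):
            sequences.append(ids[:end])
    return sequences


def pad_left(sequences):
    # Left-pad with 0 so every row has the same length.
    max_len = max(len(seq) for seq in sequences)
    return [[0] * (max_len - len(seq)) + seq for seq in sequences]


# Stand-in data (assumption): two lines taken from the generated lyrics.
lines = ["olúwa gbà gbà mí", "ìfẹ́ ọkàn kò sì ní tán wa"]

vocab = build_vocab(lines)
padded = pad_left(prefix_sequences(lines, vocab))

# The last id in each row is the word to predict; the rest is the context.
inputs = [row[:-1] for row in padded]
labels = [row[-1] for row in padded]

print(len(vocab), "distinct words,", len(inputs), "training examples")
```

Because tokens are whole words, a word and its unmarked spelling are simply two different vocabulary entries, which is why a word-level model reproduces the tone marks it saw in training.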

Yorùbá hymn lyrics generated by a Bidirectional LSTM in TensorFlow:

‘olúwa gbà gbà mí ègbè nù kúrọ̀
olùgbàlà gbóhùn mi ko ṣì gbọ́ràn
Ọlọ́run ọ̀rọ̀ rẹ̀ mo figbàgbọ́ rísun
ìṣẹ́gun ni jà re wò re pòrurù
ìyanu mi ba ti jẹ ní gbèsè
gbórí ọ̀rọ̀ rẹ̀ mo figbàgbọ́ rísun
ayọ̀ ńbọ̀ fún mi titi náà ló
ìfẹ́ rẹ̀ ju t’ìyekan lọ sógo
ìfẹ́ ọkàn kò sì ní tán wa
olúwa mi sí ńké pé o ró
ọ̀rẹ́ ayé nkọ̀ wá sílẹ̀ ní
ọ̀rẹ́ òtítọ́ ayé nkọ̀ wá sílẹ̀ ní’
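
Lines like these are typically produced by giving the model a seed word and repeatedly appending its most likely next word. Below is a minimal sketch of that greedy loop, assuming seed-word generation is how the lines were made. The function generate_line and the toy probability table are hypothetical; the table stands in for a trained model, and its numbers are invented for illustration only.

```python
def generate_line(seed_word, next_word_probs, num_words):
    """Greedy decoding: keep appending the most probable next word."""
    words = [seed_word]
    for _ in range(num_words):
        probs = next_word_probs(words)
        if not probs:
            break  # the model has nothing to suggest, so stop early
        words.append(max(probs, key=probs.get))
    return " ".join(words)


# Toy stand-in for a trained model (assumption): probabilities are made up.
TOY_TABLE = {
    "ọ̀rọ̀": {"rẹ̀": 0.8, "mo": 0.2},
    "rẹ̀": {"mo": 0.7, "rísun": 0.3},
    "mo": {"figbàgbọ́": 0.9, "rísun": 0.1},
    "figbàgbọ́": {"rísun": 1.0},
}


def toy_probs(words):
    # A real model would look at the whole context; this toy uses the last word.
    return TOY_TABLE.get(words[-1], {})


line = generate_line("ọ̀rọ̀", toy_probs, num_words=4)
print(line)
```

With a real model, next_word_probs would pad the context to the training length and return the softmax output mapped back to words. Greedy decoding on a small dataset tends to repeat phrases, which may explain why some phrases recur in the lyrics above.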

The model was trained on a very small dataset, yet it performed well: most of the generated lyrics contain meaningful words in context. It could be improved by increasing the data size or trying out other algorithms, and maybe I will find time to train a character-based model on the same dataset and compare the performance.
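
For anyone who wants a baseline to experiment with, here is a minimal sketch of a word-level Bidirectional LSTM in Keras. The layer sizes, the function name build_model, and the vocab_size and seq_len values are assumptions for illustration, not the settings used in the notebook.

```python
import numpy as np
import tensorflow as tf


def build_model(vocab_size, seq_len, embed_dim=64, lstm_units=64):
    # vocab_size should be the number of distinct words + 1 (id 0 is padding).
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(seq_len,)),
        # Turn each word id into a dense vector.
        tf.keras.layers.Embedding(vocab_size, embed_dim),
        # Read the context both left-to-right and right-to-left.
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(lstm_units)),
        # One probability per vocabulary entry for the next word.
        tf.keras.layers.Dense(vocab_size, activation="softmax"),
    ])
    model.compile(
        loss="sparse_categorical_crossentropy",  # labels are integer word ids
        optimizer="adam",
        metrics=["accuracy"],
    )
    return model


# Placeholder sizes (assumption), checked on a dummy batch of two contexts.
model = build_model(vocab_size=500, seq_len=10)
dummy_batch = np.zeros((2, 10), dtype="int32")
probs = model.predict(dummy_batch, verbose=0)
print(probs.shape)  # (2, 500)
```

Swapping the Bidirectional LSTM for another recurrent layer, or changing embed_dim and lstm_units, is an easy way to try the "other algorithms" idea on the same data.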

A great web tool for adding tone marks to your Yorùbá text is available here.
There are quite a number of Yorùbá hymns here.
