Health-Related Headlines Datasets for Natural Language Processing (NLP)
I have always wanted to work with more datasets that are related to health and useful for Natural Language Processing. Then it occurred to me that I could scrape the web for headlines and teasers of news articles and titles of journals (maybe something like the popular Reuters news dataset, but with health-related categories). I have also always looked forward to giving back to the web for all the things I have learnt just by searching.
It definitely took a lot of hours to make the data tidy, but I ‘low key’ enjoy data wrangling.
I have tried to present the main dataset of 39,387 rows not just as a whole but also in chunks and in different file formats, so users can experiment according to their needs. I hope that people can dive in and do some topic modelling, sentiment analysis, data classification, sequence prediction, data preprocessing, trend finding, maybe some data visualization, and many other tasks that probably will not cross my mind.
I have attached samples of different ways to load the files into a notebook below.
The files are hosted on my GitHub account:
https://github.com/WuraolaOyewusi/Health-Related-Headlines-Datasets-for-Natural-Language-Processing
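As a rough illustration of loading files in more than one format, here is a minimal sketch that picks a parser from the file extension. It assumes the files are CSV or JSON; the function name `load_rows`, the file name `sample_headlines.csv`, and the columns `headline` and `category` are all made up for the demo and may not match the actual files in the repository.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_rows(path):
    """Return the file's records, choosing a parser from the file extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # DictReader turns each row into a dict keyed by the header line.
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    if suffix == ".json":
        # Returns whatever the file holds, typically a list of records.
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)
    raise ValueError(f"Unsupported file type: {suffix}")


# Demo on a tiny made-up file so the sketch runs without downloading anything.
# The file name, columns, and rows below are invented for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample_headlines.csv"
    sample.write_text(
        "headline,category\n"
        "New guidance on seasonal flu shots,vaccines\n"
        "Study links sleep and heart health,cardiology\n",
        encoding="utf-8",
    )
    rows = load_rows(sample)

print(len(rows), rows[0]["category"])
```

To try this on the real data, download a file from the repository above and pass its local path to `load_rows`. In a notebook, pandas users would probably reach for `pd.read_csv` or `pd.read_json` instead, which return a DataFrame directly.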