I’ve decided how to group the different data points based on the actors present at each protest, and now need to divide the datasets accordingly. Normally I’d do this with fancy indexing, but the actor columns I’m grouping on contain free-text strings, which don’t work nicely with simple fancy indexing. Because of this, I’ll loop through the datasets manually, checking each actor entry for substrings. This will take some time, but it should ultimately produce multiple curated data frames for analysis.
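A minimal sketch of the loop-based substring grouping described above, written in plain Python so it runs without extra libraries. The field name "actors", the group names, the keyword lists, and the sample rows are all hypothetical placeholders, not the actual columns or categories used here. Each row is checked against every group's keywords, so one protest can land in several groups, and rows that match nothing go to an "other" bucket.

```python
from collections import defaultdict


def group_by_actor(rows, groups, field="actors"):
    """Map each group name to the rows whose actor text contains at least
    one of that group's keywords (case-insensitive substring match).
    A row may appear in several groups; unmatched rows go to "other"."""
    grouped = defaultdict(list)
    for row in rows:
        # Missing or None actor entries are treated as empty strings.
        text = (row.get(field) or "").lower()
        matched = False
        for name, keywords in groups.items():
            if any(kw.lower() in text for kw in keywords):
                grouped[name].append(row)
                matched = True
        if not matched:
            grouped["other"].append(row)
    return dict(grouped)


# Hypothetical example rows and keyword lists, for illustration only.
rows = [
    {"id": 1, "actors": "university students; teachers union"},
    {"id": 2, "actors": "farmers"},
    {"id": 3, "actors": "Opposition party supporters"},
]
groups = {
    "students": ["student"],
    "labor": ["union", "workers"],
    "political": ["party", "opposition"],
}
result = group_by_actor(rows, groups)
for name, members in result.items():
    print(name, [r["id"] for r in members])
```

Plain substring matching can over-match (for example, "union" also matches "reunion"), so the keyword lists may need some manual checking. If the data sits in a pandas DataFrame, `Series.str.contains(keyword, case=False, na=False)` can build an equivalent boolean mask for each keyword without an explicit loop.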
APRIL 4TH: GROUPING PROTESTERS
It seems like sentiment analysis of the notes for each event is a dead end. While it’s technically possible, the notes contain few words that carry strong positive or negative connotations. On closer inspection, the notes column doesn’t actually contain articles about the protests, but rather facts about each event organized into a paragraph.
Because of this, I’ve decided to switch my focus to grouping the types of people and organizations associated with protests. Once I have these groups, there are many relationships I can look into, such as which groups participate in the most violent protests or which group amasses the largest total number of protesters.
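Once protests are split into groups, the two questions above reduce to simple per-group aggregation. The sketch below assumes the groups are stored as a dict mapping each group name to a list of event records, and that each record has hypothetical "participants" (an integer count) and "violent" (a boolean) fields. Real participant figures may need converting to integers first, which this sketch does not attempt. The sample data is made up for illustration.

```python
def summarize_groups(grouped):
    """For each group, count events and violent events, compute the share
    of violent events, and sum participants (missing counts add 0)."""
    summary = {}
    for name, rows in grouped.items():
        total = sum(row.get("participants") or 0 for row in rows)
        violent = sum(1 for row in rows if row.get("violent"))
        summary[name] = {
            "events": len(rows),
            "violent_events": violent,
            "violent_share": violent / len(rows) if rows else 0.0,
            "total_participants": total,
        }
    return summary


# Hypothetical grouped data, for illustration only.
grouped = {
    "students": [
        {"participants": 200, "violent": False},
        {"participants": 50, "violent": True},
    ],
    "labor": [{"participants": 1000, "violent": False}],
}
summary = summarize_groups(grouped)
largest = max(summary, key=lambda g: summary[g]["total_participants"])
most_violent = max(summary, key=lambda g: summary[g]["violent_share"])
print(largest, most_violent)
```

Ranking by the share of violent events rather than the raw count keeps large groups from looking more violent simply because they appear in more protests.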