I have a dataset structured as follows:
- ts: the date on which the news was published;
- body: the text of the news;
- stock: ticker of the stock to which the news refers (e.g. TWTR: Twitter);
- positive: integer >= 0. A count of the "positive" words, from a financial point of view, found in the news;
- negative: integer >= 0. A count of the "negative" words, from a financial point of view, found in the news.
In particular I have to carry out:
1) Exploratory data analysis
2) Data analysis techniques, namely:
◼ Association rules
◼ Clustering: perform multiple analysis sessions with one or more algorithms (e.g., KMeans, DBSCAN) and evaluate the results with the relevant quality indices (e.g., SSE).
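To make point 2 concrete, here is a minimal plain-Python sketch of how I currently understand the two techniques: the support and confidence of a single association rule, and the SSE (sum of squared distances from each point to its cluster centroid) of a given clustering. Everything in it is my own assumption for illustration: the function names `rule_metrics` and `sse`, the item encoding such as "stock=TWTR" and "tone=positive", and the toy rows, which are made up and not taken from my real data. In practice the cluster labels would come from an algorithm such as KMeans or DBSCAN rather than being written by hand.

```python
from collections import defaultdict


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Transactions containing the antecedent, and those containing both sides.
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


def sse(points, labels):
    """Sum of squared Euclidean distances of each point to its cluster centroid."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


# Made-up transactions: one set of items per news article (hypothetical encoding).
transactions = [
    {"stock=TWTR", "tone=positive"},
    {"stock=TWTR", "tone=positive"},
    {"stock=TWTR", "tone=negative"},
    {"stock=AAPL", "tone=negative"},
]
support, confidence = rule_metrics(transactions, {"stock=TWTR"}, {"tone=positive"})
print("support:", support, "confidence:", confidence)

# Made-up feature rows (positive, negative) and hand-written cluster labels.
points = [(5, 0), (6, 1), (4, 0), (0, 5), (1, 6), (0, 4)]
labels = [0, 0, 0, 1, 1, 1]
print("SSE with two clusters:", sse(points, labels))
print("SSE with one cluster:", sse(points, [0] * len(points)))
```

My understanding is that I would repeat the SSE computation for several clusterings (for example, different numbers of clusters) and compare the values, but I am not sure this is the right way to evaluate them.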
Do you have any suggestions on where to start and how I should proceed?
Thanks so much!!!