
# How to visualize data distribution of texts

Member Posts: 2 Newbie
Hi, I have a dataset of Twitter comments containing only the attributes "text" and "sentiment" (negative and positive). How can I visualize the data distribution with a scatter plot? I suppose I need some other attributes to build the scatter plot, but I don't know what to calculate. I calculated the polarity and subjectivity with TextBlob and VADER and used the polarity as the value for the x axis and the subjectivity for the y axis, but I don't know if this is correct.
Besides, VADER calculated a positive polarity (>0) for some negative comments and a negative polarity (<0) for some positive comments, so the data distribution looks non-linear, and I don't know whether that is correct.
My teacher wants me to plot the dataset so we can see whether the distribution is linear or not and understand which classification model fits this distribution.
I hope someone can help me. Thank you.

Member Posts: 8 Contributor II
Hey,

Yes, you can use the polarity and subjectivity scores as the X and Y axes, respectively, to plot the data distribution of your Twitter comments dataset. Keep in mind that different sentiment analysis libraries can give different scores for the same text, and that these scores do not always match the human-assigned label.

Here is a method for making the scatter plot step-by-step:

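1. Using TextBlob or VADER, as you indicated, compute the polarity and subjectivity scores for each comment in your dataset. Note that TextBlob reports both polarity and subjectivity, while VADER reports a compound polarity score but no subjectivity score.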

2. Prepare your data by collecting the polarity score, the subjectivity score, and the sentiment label (positive/negative) for each comment, for example as three columns of one table.

3. Choose a plotting library for the programming language you're using, such as matplotlib for Python.

4. Create a scatter plot with polarity on the X-axis and subjectivity on the Y-axis. To distinguish positive from negative comments, give the two classes different colors or markers.

5. Each point on the plot is one comment, and its position is determined by its polarity (X) and subjectivity (Y) scores.

6. Analyze the scatter plot to see how the data are distributed. Check whether the positive and negative comments could be separated by a straight line (linear) or not (non-linear), and look at how the sentiment labels are spread across the plot. This analysis can help you judge which classification models are appropriate for your dataset.
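The steps above can be sketched in Python roughly as follows. This is only a minimal sketch, assuming TextBlob for scoring and matplotlib for plotting; the function names (`score_with_textblob`, `group_by_label`, `plot_groups`), the label strings "positive" and "negative", and the output file name are my own placeholders, not something from your dataset.

```python
from collections import defaultdict


def score_with_textblob(texts):
    """Return a (polarity, subjectivity) pair for each text.

    Assumes TextBlob is installed (pip install textblob). It is imported
    here so the other helpers can be used with scores from any library.
    """
    from textblob import TextBlob

    scores = []
    for text in texts:
        sentiment = TextBlob(text).sentiment
        scores.append((sentiment.polarity, sentiment.subjectivity))
    return scores


def group_by_label(scores, labels):
    """Group (polarity, subjectivity) pairs by their sentiment label."""
    groups = defaultdict(list)
    for point, label in zip(scores, labels):
        groups[label].append(point)
    return dict(groups)


def plot_groups(groups, path="sentiment_scatter.png"):
    """Draw one scatter series per label and save the figure to `path`.

    Assumes matplotlib is installed; the label names in `colors` are
    placeholders and should match the labels in your own data.
    """
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    colors = {"positive": "tab:green", "negative": "tab:red"}
    fig, ax = plt.subplots()
    for label, points in groups.items():
        xs = [polarity for polarity, _ in points]
        ys = [subjectivity for _, subjectivity in points]
        ax.scatter(xs, ys, label=label, color=colors.get(label), alpha=0.6)
    ax.set_xlabel("Polarity")
    ax.set_ylabel("Subjectivity")
    ax.set_title("Comments by polarity and subjectivity")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
    return path
```

With your two columns loaded as lists `texts` and `labels`, calling `plot_groups(group_by_label(score_with_textblob(texts), labels))` should write the image file. If the two colors overlap heavily, these two features alone probably do not separate the classes well.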

Keep in mind that sentiment analysis scores don't always agree with how people actually perceive a comment, which is why some of your negative comments get a positive polarity and vice versa. In addition, polarity and subjectivity may not capture everything that influences the sentiment of a Twitter comment. So read the scatter plot carefully and consider whether other characteristics of the dataset also affect the result.
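To get a rough number for how often the scores disagree with your labels, you could count the mismatches. Here is a small sketch under my own assumptions: the function name `polarity_label_agreement` is hypothetical, the labels are the strings "positive" and "negative", and a polarity of exactly 0 is counted separately because it points to neither class.

```python
from collections import Counter


def polarity_label_agreement(polarities, labels, positive_label="positive"):
    """Count how often the sign of the polarity matches the sentiment label."""
    counts = Counter()
    for polarity, label in zip(polarities, labels):
        if polarity == 0:
            # A score of exactly zero favors neither class.
            counts["neutral_score"] += 1
        elif (polarity > 0) == (label == positive_label):
            counts["agree"] += 1
        else:
            counts["disagree"] += 1
    return counts


# Made-up example values, not taken from any real dataset.
example = polarity_label_agreement(
    [0.5, -0.3, 0.2, 0.0, -0.1],
    ["positive", "negative", "negative", "positive", "positive"],
)
print(dict(example))
```

A high share of "disagree" would suggest that the polarity score alone is a weak feature for your labels, whichever classifier you pick.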

If you need further assistance or have more specific questions, feel free to ask!

Kind Regards
Vivek Garg