Hi everyone,

I am rather new to data analysis and have a challenge to tackle.
Maybe you know how to handle it.
I have a dataset with many polynominal and text values. I want to find clusters in the data, but as far as I understand, an algorithm such as k-means needs some kind of distance between values to produce output.
Do you know how to cluster such text values?

Background: I am a student and have received an example dataset from a boat manufacturer with detected quality issues. The goal is to find links between issues that occurred BEFORE the current production phase and the issues in the current production phase (FOCUS, column G).
I want to check for these links mainly via the localization tags, and possibly also at the material or item number level.

Info on the dataset:
I added column G and row A myself to make the dataset and my goal easier to understand.

I would be grateful for any hint on how to get started.



    You do not want to treat this as a clustering problem, but as a prediction (supervised learning) problem.

    I would highly recommend running this through AutoModel first.
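
To illustrate the supervised framing outside of AutoModel, here is a minimal Python sketch. It assumes a made-up layout that is not taken from the actual dataset: one record per boat, a set of localization tags from issues found before the current phase (`earlier_tags`), and a flag for whether a FOCUS issue occurred (`focus_issue`). All names and values are hypothetical. The sketch turns the nominal tags into 0/1 columns, which is one common way to make text values usable by a learner, and then compares how often the FOCUS issue occurs per tag against the overall rate. It is only a crude baseline, not what AutoModel does internally.

```python
from collections import Counter

# Hypothetical records, one per boat. Field names and values are invented
# for illustration and do not come from the real dataset.
records = [
    {"earlier_tags": {"hull", "deck"}, "focus_issue": True},
    {"earlier_tags": {"hull"}, "focus_issue": True},
    {"earlier_tags": {"deck"}, "focus_issue": False},
    {"earlier_tags": {"cabin"}, "focus_issue": False},
    {"earlier_tags": {"hull", "cabin"}, "focus_issue": True},
    {"earlier_tags": set(), "focus_issue": False},
]


def one_hot(records):
    """Turn the nominal tags into 0/1 columns, one column per distinct tag."""
    columns = sorted({tag for r in records for tag in r["earlier_tags"]})
    rows = [[1 if tag in r["earlier_tags"] else 0 for tag in columns]
            for r in records]
    return columns, rows


def focus_rate_by_tag(records):
    """Share of records with a FOCUS issue among those carrying each tag."""
    seen = Counter()
    hits = Counter()
    for r in records:
        for tag in r["earlier_tags"]:
            seen[tag] += 1
            if r["focus_issue"]:
                hits[tag] += 1
    return {tag: hits[tag] / seen[tag] for tag in seen}


columns, rows = one_hot(records)
labels = [r["focus_issue"] for r in records]

# Overall share of records with a FOCUS issue, used as the comparison point.
baseline = sum(labels) / len(labels)
rates = focus_rate_by_tag(records)

print("columns:", columns)
print("baseline focus rate:", baseline)
for tag in columns:
    print(tag, "focus rate:", rates[tag])
```

A tag whose rate is well above the baseline is a candidate link worth a closer look. The `columns`/`rows` table together with `labels` is the kind of input a classifier would be trained on.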

