EXTRACTING DATA FROM UNSTRUCTURED TEXT FILE
Hi
I have a text file with unstructured data:
-------------------------------------------------------------------********-------------------------------------------------------
ADD UCELLMEAS:CELLID=29335, LOGICRNCID=900, INTERFREQINTERRATMEASIND=INTER_FREQ_AND_INTER_RAT, RPTIND=NO_REPORT, MAXNUMRPTCELLS=CURRENT_CELL_AND_2BEST_NEIGHBOUR, FACHMEASIND=INTER_FREQ_AND_INTER_RAT, FACHMEASOCCACYCLELENCOEF=6, RPTINDIND=REQUIRE, MAXNUMRPTCELLSIND=REQUIRE, INTRAFREQMEASIND=REQUIRE, DEFERMCREADIND=FALSE;
ADD UCELLMEAS:CELLID=29336, LOGICRNCID=900, INTERFREQINTERRATMEASIND=INTER_FREQ_AND_INTER_RAT, RPTIND=NO_REPORT, MAXNUMRPTCELLS=CURRENT_CELL_AND_2BEST_NEIGHBOUR, FACHMEASIND=INTER_FREQ_AND_INTER_RAT, FACHMEASOCCACYCLELENCOEF=6, RPTINDIND=REQUIRE, MAXNUMRPTCELLSIND=REQUIRE, INTRAFREQMEASIND=REQUIRE, DEFERMCREADIND=FALSE;
ADD UCHPWROFFSET:CELLID=21711, LOGICRNCID=900, AICHPOWEROFFSET=-6, PICHPOWEROFFSET=-7;
ADD UCHPWROFFSET:CELLID=34051, LOGICRNCID=900, AICHPOWEROFFSET=-6, PICHPOWEROFFSET=-7;
ADD UCHPWROFFSET:CELLID=34052, LOGICRNCID=900, AICHPOWEROFFSET=-6, PICHPOWEROFFSET=-7;
ADD UCHPWROFFSET:CELLID=34053, LOGICRNCID=900, AICHPOWEROFFSET=-6, PICHPOWEROFFSET=-7;
-------------------------------------------------------------------********-------------------------------------------------------
I am only interested in the rows that contain "UCHPWROFFSET" and would like to convert them to the tabular format below:
CELLID LOGICRNCID AICHPOWEROFFSET PICHPOWEROFFSET
21711 900 -6 -7
34051 900 -6 -7
34052 900 -6 -7
34053 900 -6 -7
Any idea how this can be done using operators within RapidMiner?
Floyd
Answers
I don't have time to write the exact expressions you need, but the basic approach is a lookbehind for each field, for example:
(?<=.*UCHPWROFFSET.*CELLID\=)[0-9]* (or something like that)
For ease of demonstration, here is a quick demo I knocked up. I doubt you'd want to use it in production, though, and you will need to tweak it a bit.
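As a rough illustration of the same filter-then-extract idea outside RapidMiner, here is a minimal Python sketch. It is not the RapidMiner process itself: it swaps the per-field lookbehind for a single capturing-group pattern over key=value pairs, since Python's re module only accepts fixed-width lookbehinds. The names SAMPLE, COLUMNS, PAIR and extract_rows are made up for this example, and the UCELLMEAS line in the sample is shortened from the one quoted in the question.

```python
import re

# Sample input taken from the question (the UCELLMEAS line is shortened here).
SAMPLE = """\
ADD UCELLMEAS:CELLID=29335, LOGICRNCID=900, RPTIND=NO_REPORT;
ADD UCHPWROFFSET:CELLID=21711, LOGICRNCID=900, AICHPOWEROFFSET=-6, PICHPOWEROFFSET=-7;
ADD UCHPWROFFSET:CELLID=34051, LOGICRNCID=900, AICHPOWEROFFSET=-6, PICHPOWEROFFSET=-7;
"""

# Output columns, in the order requested in the question.
COLUMNS = ["CELLID", "LOGICRNCID", "AICHPOWEROFFSET", "PICHPOWEROFFSET"]

# One NAME=VALUE pair; the value runs up to the next comma or semicolon.
PAIR = re.compile(r"(\w+)=([^,;]+)")


def extract_rows(text, command="UCHPWROFFSET"):
    """Return one list of column values per 'ADD <command>:' line."""
    prefix = "ADD " + command + ":"
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue  # skip other commands and separator lines
        # Parse everything after the prefix into a name -> value mapping.
        fields = dict(PAIR.findall(line[len(prefix):]))
        # Missing fields become empty strings rather than raising.
        rows.append([fields.get(col, "") for col in COLUMNS])
    return rows


if __name__ == "__main__":
    print(" ".join(COLUMNS))
    for row in extract_rows(SAMPLE):
        print(" ".join(row))
```

Values are kept as strings. If numeric columns are needed, they could be converted with int() once the rows have been extracted.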