"Problem with R-script results"
Hi everyone,
I have used the R extension before and always managed to get the data out using "outdata <-as.data.frame(Results)"
I'm currently working on a script that downloads Twitter followers using the Twitter API. Everything works fine and I managed to create the results data frame, but RapidMiner doesn't like it and spits out this error:
"In order to import an R Data Frame as example set the data frame must provide attribute names"
I'm confused because when I check my result variable with colnames(outdata), the column names are assigned, as you can see here:
colnames(Results)
[1] "id_str" "name" "screen_name"
[4] "url" "profile_image_url" "description"
[7] "location" "followers_count" "friends_count"
[10] "statuses_count" "created_at"
Has anyone else had this problem before?
Here's the process and the script:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data_user_specification" compatibility="5.3.015" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="45" y="30">
<list key="attribute_values">
<parameter key="target" value="&quot;WRADARltd&quot;"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="r:execute_script_r" compatibility="5.3.000" expanded="true" height="76" name="Execute Script (R)" width="90" x="179" y="30">
<parameter key="script" value="#Install libraries install.packages("httr"); install.packages("rjson"); install.packages("data.table"); library(httr); library(rjson); library(data.table); #Declare API keys and inputs consumer_key = "UansBjFHOm8fpkB4jx5mSiDTu"; consumer_secret = "dKTbQ0QKCqHaejnLyHOW0PHuOzpn5PIhjVmOkGMzcyLuaFrA7p"; target = indata$target; #Auth secret <- RCurl::base64(paste(consumer_key, consumer_secret, sep = ":")); req <- POST("https://api.twitter.com/oauth2/token", config(httpheader = c( "Authorization" = paste("Basic", secret), "Content-Type" = "application/x-www-form-urlencoded;charset=UTF-8" )), body = "grant_type=client_credentials", encode = "multipart" ); #Extract the access token token <- paste("Bearer", content(req)$access_token) #Request Followers count url <- paste("https://api.twitter.com/1.1/users/lookup.json?screen_name=",target,sep=""); req <- GET(url, config(httpheader = c("Authorization" = token))); json <- content(req, as = "text"); tmp <- fromJSON(json); num_followers <- tmp[[1]]$followers_count; #If more than 5000 followers split ID request otherwise retrieve all in one request if (num_followers<5000) { 	#Request <5000 IDs 	url <- paste("https://api.twitter.com/1.1/followers/ids.json?screen_name=",target,sep=""); 	req <- GET(url, config(httpheader = c("Authorization" = token))); 	json <- content(req, as = "text"); 	IDs <- fromJSON(json); 	IDlist <- data.frame(IDs$ids); 	 	} else { 	 	#Request batches of IDs 	batches <- ceiling(num_followers/5000); 	batch_size <- ceiling(num_followers/batches); 	curs <- "-1"; 	IDlist <- data.frame(); 	for (i in 1:batches) { 		url <- paste("https://api.twitter.com/1.1/followers/ids.json?cursor=",curs,"&screen_name=",target,"&count=",batch_size,sep=""); 		req <- GET(url, config(httpheader = c("Authorization" = token))); 		json <- content(req, as = "text"); 		IDs <- fromJSON(json); 		tmp <- data.frame(IDs$ids); 		IDlist <- rbind(IDlist, tmp); 		curs <- IDs$next_cursor_str; 		Sys.sleep(60); 		}; 	}; 
#Loop through IDs in batches to retrieve full profiles batches <- ceiling(num_followers/100); batch_size <- ceiling(num_followers/batches); BatchIDcounter <- 0; FollowersList <- data.frame(); for (i in 1:batches) { 	start <- BatchIDcounter; 	end <- BatchIDcounter+batch_size; 	if (end>length(IDlist[,1])) { 		end <-length(IDlist[,1]); 		batch_size <- (end-start); 	}; 	BatchIDs <- paste(IDlist[start:end,1],collapse=","); 	url <- paste("https://api.twitter.com/1.1/users/lookup.json?include_entities=false&user_id=",BatchIDs,sep=""); 	req <- GET(url, config(httpheader = c("Authorization" = token))); 	json <- content(req, as = "text"); 	profiles <- fromJSON(json); 	for (j in 1:batch_size) { 		tmp <- data.table(id_str="",name="",screen_name="",url="",profile_image_url="",description="",location="",followers_count="",friends_count="",statuses_count="",created_at=""); 		if (length(profiles[]$id_str)==0) {tmp$id_str=""} else{tmp$id_str=head(profiles[]$id_str[1])}; 		if (length(profiles[]$name)==0) {tmp$name=""} else{tmp$name=head(profiles[]$name[1])}; 		if (length(profiles[]$screen_name)==0) {tmp$screen_name=""} else{tmp$screen_name=head(profiles[]$screen_name[1])}; 		if (length(profiles[]$url)==0) {tmp$url=""} else{tmp$url=head(profiles[]$url[1])}; 		if (length(profiles[]$profile_image_url)==0) {tmp$profile_image_url=""} else{tmp$profile_image_url=head(profiles[]$profile_image_url[1])}; 		if (length(profiles[]$description)==0) {tmp$description=""} else{tmp$description=head(profiles[]$description[1])}; 		if (length(profiles[]$location)==0) {tmp$location=""} else{tmp$location=head(profiles[]$location[1])}; 		if (length(profiles[]$url)==0) {tmp$url=""} else{tmp$url=head(profiles[]$url[1])}; 		if (length(profiles[]$followers_count)==0) {tmp$followers_count=""} else{tmp$followers_count=head(profiles[]$followers_count[1])}; 		if (length(profiles[]$friends_count)==0) {tmp$friends_count=""} else{tmp$friends_count=head(profiles[]$friends_count[1])}; 		if 
(length(profiles[]$statuses_count)==0) {tmp$statuses_count=""} else{tmp$statuses_count=head(profiles[]$statuses_count[1])}; 		if (length(profiles[]$created_at)==0) {tmp$created_at=""} else{tmp$created_at=head(profiles[]$created_at[1])}; 		FollowersList <- rbind(FollowersList, tmp); 	}; 	BatchIDcounter <- end; 	Sys.sleep(5); }; #Output outdata <- as.data.frame(FollowersList); "/>
<enumeration key="inputs">
<parameter key="name_of_variable" value="indata"/>
</enumeration>
<list key="results">
<parameter key="outdata" value="Data Table"/>
</list>
</operator>
<operator activated="true" breakpoints="before" class="write_excel" compatibility="5.3.015" expanded="true" height="76" name="Write Excel" width="90" x="447" y="30">
<parameter key="excel_file" value="C:\Documents and Settings\menghii1\Desktop\first.xlsx"/>
<parameter key="file_format" value="xlsx"/>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Execute Script (R)" to_port="input 1"/>
<connect from_op="Execute Script (R)" from_port="output 1" to_op="Write Excel" to_port="input"/>
<connect from_op="Write Excel" from_port="through" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Answers
Here are the log details:
I see that the same problem appeared in another post http://rapid-i.com/rapidforum/index.php/topic,7128.msg24720.html#msg24720.
"tennenrishin" reported that this issue only happened when NA values were present; however, I tested that, and in my case it happens even when all the values are present.
I even tried to explicitly declare every column of the data frame when creating it, but no luck...
This was actually one of the main reasons why I stopped using RM and moved to doing most of my work directly in R... But that's a real pity, as there are many things that I like about RM.
Any possible solution?
I had the same problem. What I figured out is that when you pass data from R to RapidMiner, you must ensure that all character data is a factor. For example, if you have an input variable named "input" and you want to create a column "newcolumn" with the value "newvalue", you might type
input$newcolumn<-"newvalue"
But this expression will give you an error.
However, if you try
input$newcolumn<-as.factor("newvalue")
RapidMiner will recognize the new attribute.
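To apply the same idea to a whole result table rather than one column, something like the following might work. It is only a sketch of the suggestion above, not a confirmed fix for the script in the question: the small `Results` data frame is a made-up stand-in for the real follower data, and `to_factor_columns` is a hypothetical helper name.

```r
# Hypothetical stand-in for the data frame built by the script in the question
Results <- data.frame(
  id_str = c("1", "2"),
  name = c("Alice", "Bob"),
  followers_count = c(10L, 20L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert every character column to a factor
# and leave all other columns (numeric, integer, ...) unchanged
to_factor_columns <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) {
    df[is_chr] <- lapply(df[is_chr], as.factor)
  }
  df
}

# Assign the converted data frame to the variable returned to RapidMiner
outdata <- to_factor_columns(as.data.frame(Results))

# Show the column types to confirm the conversion
str(outdata)
```

Running sapply(outdata, class) before and after the conversion is a quick way to see which columns were still plain character vectors.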
https://marketplace.rapidminer.com/UpdateServer/faces/product_details.xhtml?productId=rmx_r_scripting
Thanks