create FP-Growth graph
TobiasNehrig
Member Posts: 41 Maven
Hi Experts,
I have a question about creating a graph from the results of the FP-Growth operator without using the Create Association Rules operator. Is there a way to visualize the FP-Growth results as a graph?
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="8.2.001" expanded="true" height="82" name="Crawler" width="90" x="45" y="34">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="8.2.001" expanded="true" height="82" name="Crawler Spon" width="90" x="45" y="34">
<process expanded="true">
<operator activated="true" class="web:crawl_web_modern" compatibility="7.3.000" expanded="true" height="68" name="Crawl Web" width="90" x="112" y="34">
<parameter key="url" value="http://www.spiegel.de"/>
<list key="crawling_rules">
<parameter key="store_with_matching_url" value=".+www.spiegel.+"/>
<parameter key="follow_link_with_matching_url" value=".+spiegel.+|.+de.+"/>
</list>
<parameter key="max_crawl_depth" value="10"/>
<parameter key="retrieve_as_html" value="true"/>
<parameter key="add_content_as_attribute" value="true"/>
<parameter key="max_pages" value="2000"/>
<parameter key="max_page_size" value="100000"/>
<parameter key="delay" value="100"/>
<parameter key="max_concurrent_connections" value="200"/>
<parameter key="max_connections_per_host" value="100"/>
<parameter key="user_agent" value="Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0"/>
</operator>
<operator activated="true" class="free_memory" compatibility="8.2.001" expanded="true" height="82" name="Free Memory (8)" width="90" x="246" y="34"/>
<operator activated="true" class="web:retrieve_webpages" compatibility="7.3.000" expanded="true" height="68" name="Get Pages" width="90" x="447" y="34">
<parameter key="link_attribute" value="Link"/>
<parameter key="page_attribute" value="link"/>
<parameter key="random_user_agent" value="true"/>
</operator>
<operator activated="true" class="free_memory" compatibility="8.2.001" expanded="true" height="82" name="Free Memory (7)" width="90" x="648" y="34"/>
<connect from_op="Crawl Web" from_port="example set" to_op="Free Memory (8)" to_port="through 1"/>
<connect from_op="Free Memory (8)" from_port="through 1" to_op="Get Pages" to_port="Example Set"/>
<connect from_op="Get Pages" from_port="Example Set" to_op="Free Memory (7)" to_port="through 1"/>
<connect from_op="Free Memory (7)" from_port="through 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data (3)" width="90" x="246" y="34">
<parameter key="create_word_vector" value="false"/>
<parameter key="keep_text" value="true"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="web:extract_html_text_content" compatibility="7.3.000" expanded="true" height="68" name="Extract Content" width="90" x="179" y="34">
<parameter key="ignore_non_html_tags" value="false"/>
</operator>
<operator activated="true" class="free_memory" compatibility="8.2.001" expanded="true" height="82" name="Free Memory (9)" width="90" x="380" y="34"/>
<connect from_port="document" to_op="Extract Content" to_port="document"/>
<connect from_op="Extract Content" from_port="document" to_op="Free Memory (9)" to_port="through 1"/>
<connect from_op="Free Memory (9)" from_port="through 1" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="store" compatibility="8.2.001" expanded="true" height="68" name="Store" width="90" x="514" y="34">
<parameter key="repository_entry" value="../spon-seiten roh"/>
</operator>
<connect from_op="Crawler Spon" from_port="out 1" to_op="Process Documents from Data (3)" to_port="example set"/>
<connect from_op="Process Documents from Data (3)" from_port="example set" to_op="Store" to_port="input"/>
<connect from_op="Store" from_port="through" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="subprocess" compatibility="8.2.001" expanded="true" height="124" name="Prepare Data" width="90" x="246" y="34">
<process expanded="true">
<operator activated="true" class="set_role" compatibility="8.2.001" expanded="true" height="82" name="Set Role (2)" width="90" x="45" y="34">
<parameter key="attribute_name" value="text"/>
<list key="set_additional_roles">
<parameter key="Title" value="label"/>
</list>
</operator>
<operator activated="true" class="generate_id" compatibility="8.2.001" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34"/>
<operator activated="true" class="order_attributes" compatibility="8.2.001" expanded="true" height="82" name="Reorder Attributes" width="90" x="380" y="34">
<parameter key="attribute_ordering" value="Title|text"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.2.001" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="text|Title|id"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="8.2.001" expanded="true" height="103" name="Filter Examples" width="90" x="715" y="34">
<list key="filters_list">
<parameter key="filters_entry_key" value="Title.is_not_missing."/>
</list>
<parameter key="filters_logic_and" value="false"/>
<parameter key="filters_check_metadata" value="false"/>
</operator>
<operator activated="true" class="multiply" compatibility="8.2.001" expanded="true" height="124" name="Multiply uncut" width="90" x="849" y="34"/>
<connect from_port="in 1" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Reorder Attributes" to_port="example set input"/>
<connect from_op="Reorder Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Multiply uncut" to_port="input"/>
<connect from_op="Multiply uncut" from_port="output 1" to_port="out 1"/>
<connect from_op="Multiply uncut" from_port="output 2" to_port="out 2"/>
<connect from_op="Multiply uncut" from_port="output 3" to_port="out 3"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
<portSpacing port="sink_out 4" spacing="0"/>
</process>
</operator>
<operator activated="true" class="subprocess" compatibility="8.2.001" expanded="true" height="82" name="fp Growth" width="90" x="514" y="34">
<process expanded="true">
<operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="313" y="34">
<parameter key="prune_method" value="percentual"/>
<parameter key="prune_below_percent" value="30.0"/>
<parameter key="prune_above_percent" value="100.0"/>
<parameter key="prune_below_absolute" value="20"/>
<parameter key="prune_above_absolute" value="2000"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize Non-letters (4)" width="90" x="45" y="34"/>
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize Linguistic (4)" width="90" x="179" y="34">
<parameter key="mode" value="linguistic sentences"/>
<parameter key="language" value="German"/>
</operator>
<operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="313" y="34">
<parameter key="min_chars" value="2"/>
</operator>
<operator activated="true" class="text:filter_stopwords_german" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (4)" width="90" x="447" y="34"/>
<operator activated="true" class="text:stem_snowball" compatibility="8.1.000" expanded="true" height="68" name="Stem (Snowball)" width="90" x="581" y="34">
<parameter key="language" value="German"/>
</operator>
<operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases (4)" width="90" x="715" y="34"/>
<operator activated="true" class="free_memory" compatibility="8.2.001" expanded="true" height="82" name="Free Memory (4)" width="90" x="849" y="34"/>
<connect from_port="document" to_op="Tokenize Non-letters (4)" to_port="document"/>
<connect from_op="Tokenize Non-letters (4)" from_port="document" to_op="Tokenize Linguistic (4)" to_port="document"/>
<connect from_op="Tokenize Linguistic (4)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
<connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Filter Stopwords (4)" to_port="document"/>
<connect from_op="Filter Stopwords (4)" from_port="document" to_op="Stem (Snowball)" to_port="document"/>
<connect from_op="Stem (Snowball)" from_port="document" to_op="Transform Cases (4)" to_port="document"/>
<connect from_op="Transform Cases (4)" from_port="document" to_op="Free Memory (4)" to_port="through 1"/>
<connect from_op="Free Memory (4)" from_port="through 1" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="subprocess" compatibility="8.2.001" expanded="true" height="82" name="Co-occurrence" width="90" x="514" y="34">
<process expanded="true">
<operator activated="true" class="text_to_nominal" compatibility="8.2.001" expanded="true" height="82" name="Text to Nominal" width="90" x="45" y="34"/>
<operator activated="true" class="numerical_to_binominal" compatibility="8.2.001" expanded="true" height="82" name="Numerical to Binominal" width="90" x="179" y="34"/>
<operator activated="true" class="concurrency:fp_growth" compatibility="8.2.001" expanded="true" height="82" name="FP-Growth" width="90" x="380" y="34">
<parameter key="positive_value" value="true"/>
<parameter key="min_support" value="0.5"/>
<parameter key="min_frequency" value="2"/>
<parameter key="find_min_number_of_itemsets" value="false"/>
<enumeration key="must_contain_list"/>
</operator>
<connect from_port="in 1" to_op="Text to Nominal" to_port="example set input"/>
<connect from_op="Text to Nominal" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
<connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
<connect from_op="FP-Growth" from_port="frequent sets" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<connect from_port="in 1" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="example set" to_op="Co-occurrence" to_port="in 1"/>
<connect from_op="Co-occurrence" from_port="out 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<connect from_op="Crawler" from_port="out 1" to_op="Prepare Data" to_port="in 1"/>
<connect from_op="Prepare Data" from_port="out 1" to_op="fp Growth" to_port="in 1"/>
<connect from_op="fp Growth" from_port="out 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
best regards
Tobias
Best Answer
TobiasNehrig Member Posts: 41 Maven
Hi,
I've found a solution for creating a co-occurrence graph, based on the approach of @bhupendra_patil. After writing the FP-Growth result to an XML file, I had to read the XML file twice and build a new ExampleSet.
<?xml version="1.0" encoding="UTF-8"?><process version="9.0.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.0.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="advanced_file_connectors:read_xml" compatibility="8.1.000" expanded="true" height="68" name="Read XML Name Items" width="90" x="45" y="85">
<parameter key="file" value="/home/knecht/output_fp-growth.ioo"/>
<parameter key="xpath_for_examples" value="//object-stream/FrequentItemSets/FrequentItemSets/default/frequentSets/com.rapidminer.operator.learner.associations.FrequentItemSet"/>
<enumeration key="xpaths_for_attributes">
<parameter key="xpath_for_attribute" value="items[1]/com.rapidminer.extension.concurrency.operator.learner.associations.fpgrowth.NominalItem[1]/name[1]/text()"/>
<parameter key="xpath_for_attribute" value="items[1]/com.rapidminer.extension.concurrency.operator.learner.associations.fpgrowth.NominalItem[1]/attribute::id"/>
</enumeration>
<list key="namespaces"/>
<parameter key="use_default_namespace" value="false"/>
<list key="annotations"/>
<parameter key="locale" value="German"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="word.true.text.attribute"/>
<parameter key="1" value="word_id.true.integer.attribute"/>
</list>
</operator>
<operator activated="true" class="advanced_file_connectors:read_xml" compatibility="9.0.000" expanded="true" height="68" name="Read XML Count Items" width="90" x="45" y="187">
<parameter key="file" value="/home/knecht/output_fp-growth.ioo"/>
<parameter key="xpath_for_examples" value="//object-stream/FrequentItemSets/FrequentItemSets/default/frequentSets/com.rapidminer.operator.learner.associations.FrequentItemSet[count(items/*)=2]"/>
<enumeration key="xpaths_for_attributes">
<parameter key="xpath_for_attribute" value="items[1]/com.rapidminer.extension.concurrency.operator.learner.associations.fpgrowth.NominalItem[1]/attribute::reference"/>
<parameter key="xpath_for_attribute" value="items[1]/com.rapidminer.extension.concurrency.operator.learner.associations.fpgrowth.NominalItem[2]/attribute::reference"/>
<parameter key="xpath_for_attribute" value="frequency[1]/text()"/>
</enumeration>
<list key="namespaces"/>
<parameter key="use_default_namespace" value="false"/>
<list key="annotations"/>
<parameter key="locale" value="German"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="word_id1.true.integer.attribute"/>
<parameter key="1" value="word_id2.true.integer.attribute"/>
<parameter key="2" value="frequency.true.real.attribute"/>
</list>
</operator>
<operator activated="true" class="multiply" compatibility="9.0.000" expanded="true" height="103" name="Multiply (2)" width="90" x="179" y="85"/>
<operator activated="true" class="concurrency:join" compatibility="9.0.000" expanded="true" height="82" name="Join" width="90" x="313" y="187">
<parameter key="join_type" value="right"/>
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="word_id" value="word_id1"/>
</list>
</operator>
<operator activated="true" class="rename" compatibility="9.0.000" expanded="true" height="82" name="Rename Word1" width="90" x="447" y="187">
<parameter key="old_name" value="word"/>
<parameter key="new_name" value="word1"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="concurrency:join" compatibility="9.0.000" expanded="true" height="82" name="Join (2)" width="90" x="581" y="85">
<parameter key="join_type" value="right"/>
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="word_id" value="word_id2"/>
</list>
</operator>
<operator activated="true" class="rename" compatibility="9.0.000" expanded="true" height="82" name="Rename Word2" width="90" x="715" y="85">
<parameter key="old_name" value="word"/>
<parameter key="new_name" value="word2"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="82" name="Graph" width="90" x="916" y="85">
<parameter key="script" value="library(dplyr)&#10;library(tidytext)&#10;library(widyr)&#10;library(ggplot2)&#10;library(igraph)&#10;library(ggraph)&#10;&#10;rm_main = function(data)&#10;{&#10;  table &lt;- data_frame(Item1 = data$word1, Item2 = data$word2, Frequency = data$frequency)&#10;  table &lt;- as.data.frame(table)&#10;  set.seed(2018)&#10;  cooccurre_graph &lt;- table %&gt;%&#10;    filter (Frequency==851)%&gt;%&#10;    #filter (Frequency&gt;=500)%&gt;%&#10;    graph_from_data_frame() %&gt;%&#10;    ggraph(layout =&quot;lgl&quot;)+&#10;    geom_edge_link()+&#10;    geom_node_point(size = 3) +&#10;    geom_node_text(aes(label=name), repel = TRUE, point.padding = unit(0.2, &quot;lines&quot;)) +&#10;    theme_void()&#10;  # width and height belong to the png device, not to plot()&#10;  png(&quot;//home//knecht//cooccurrence_graph.png&quot;, width = 1600, height = 900)&#10;  print(cooccurre_graph)&#10;  dev.off()&#10;  table &lt;- as.data.frame(table)&#10;  return(list(table, cooccurre_graph))&#10;}&#10;"/>
</operator>
<connect from_op="Read XML Name Items" from_port="output" to_op="Multiply (2)" to_port="input"/>
<connect from_op="Read XML Count Items" from_port="output" to_op="Join" to_port="right"/>
<connect from_op="Multiply (2)" from_port="output 1" to_op="Join (2)" to_port="left"/>
<connect from_op="Multiply (2)" from_port="output 2" to_op="Join" to_port="left"/>
<connect from_op="Join" from_port="join" to_op="Rename Word1" to_port="example set input"/>
<connect from_op="Rename Word1" from_port="example set output" to_op="Join (2)" to_port="right"/>
<connect from_op="Join (2)" from_port="join" to_op="Rename Word2" to_port="example set input"/>
<connect from_op="Rename Word2" from_port="example set output" to_op="Graph" to_port="input 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
Tobias
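The two Join operators in the process above amount to resolving both id columns of the pair table against the item lookup table. A minimal sketch of that logic in plain Python follows; it is not part of the RapidMiner process, the column names are taken from the process, and the sample words and frequencies are made up for illustration.

```python
# Item lookup, as produced by "Read XML Name Items" (sample values are made up).
items = [
    {"word": "merkel", "word_id": 1},
    {"word": "berlin", "word_id": 2},
    {"word": "spd", "word_id": 3},
]

# Two-item sets, as produced by "Read XML Count Items" (sample values are made up).
pairs = [
    {"word_id1": 1, "word_id2": 2, "frequency": 851.0},
    {"word_id1": 2, "word_id2": 3, "frequency": 640.0},
]


def build_edge_list(items, pairs):
    """Resolve both id columns to words, mirroring the two right joins."""
    name_by_id = {row["word_id"]: row["word"] for row in items}
    edges = []
    for pair in pairs:
        edges.append({
            # .get() returns None for an unknown id, so every pair row is kept,
            # which is what a right join on the pair table does.
            "word1": name_by_id.get(pair["word_id1"]),
            "word2": name_by_id.get(pair["word_id2"]),
            "frequency": pair["frequency"],
        })
    return edges


edges = build_edge_list(items, pairs)
for edge in edges:
    print(edge["word1"], edge["word2"], edge["frequency"])
```

Each resulting row is one edge of the co-occurrence graph, which is the shape the R script expects as `word1`, `word2` and `frequency`.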
Answers
Not that I know of, but I would be interested if any other community members know a way to do this!
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Hi @Telcontar120,
I found the post "Writing Association Rules to Exampleset or file" by @bhupendra_patil and tried to implement it in my process. But writing the FP-Growth result to an XML file nearly exhausts my RAM (32 GB) and creates an 8 GB file. The Read XML operator mentioned there then uses up the remaining RAM and the process terminates.
I'm curious as to which version of RM Studio you are using. 8.1 and below have the old versions of FP-Growth and frequent item sets. You might have to update to 8.2 to get a performance bump.
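One possible workaround for a file of that size, outside RapidMiner, is to stream the XML instead of loading it whole. The sketch below uses Python's `xml.etree.ElementTree.iterparse`; the tag names `sets`, `set` and `item` are hypothetical stand-ins, since the real file uses long, fully qualified Java class names, and this has not been tried on an actual FP-Growth export.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the serialized FP-Growth result.
SAMPLE = b"""<sets>
<set><items><item id="1"/></items><frequency>900</frequency></set>
<set><items><item id="2"/></items><frequency>870</frequency></set>
<set><items><item reference="1"/><item reference="2"/></items><frequency>851</frequency></set>
</sets>"""


def iter_pairs(source, set_tag="set"):
    """Yield (ref1, ref2, frequency) for two-item sets, one element at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != set_tag:
            continue
        refs = [item.get("reference") for item in elem.find("items")]
        if len(refs) == 2:
            yield refs[0], refs[1], int(elem.findtext("frequency"))
        elem.clear()  # drop the element's children once it has been handled


pairs = list(iter_pairs(io.BytesIO(SAMPLE)))
print(pairs)
```

Because each item set is discarded after it is read, memory use should depend mainly on the number of pairs kept rather than on the size of the file.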
Hi @JeffChowaniec,
I'm using RapidMiner 8.2.001
I tried running your process and found that the web crawl runs for 25+ minutes. I wasn't able to finish the process because I need my machine for other tasks. I have a 32 GB machine and I could see it getting taxed pretty hard at some points. Have you tried it with a data set that is a fraction of what you are trying to query? The idea is to make sure that even a small data set runs without taking up the available memory before we dedicate a 1 hr+ run time to this.
I haven't tried crawling fewer pages, because I crawled once and stored the result in the repository. That file is too large to upload here. Instead, here is a repository file taken after the Numerical to Binominal operator, as input data for FP-Growth.
Hi,
I think I've found the cause of my memory problem. I had to change the FP-Growth parameter max items per itemset from 0 to 2. Now I'm struggling to fill the ExampleSet from the XML file as described in "Writing Association Rules to Exampleset or file". In that example, the data import wizard automatically fills in the column current value in Step 4. That doesn't happen in my approach, and I don't know why.
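A rough count shows why capping the itemset size helps so much. The sketch below computes the upper bound on the number of non-empty itemsets over n frequent items, with and without a size limit. It assumes that the value 0 means "no limit", as the post suggests, and the function name is made up for illustration.

```python
from math import comb


def max_itemsets(n_items, max_size=None):
    """Upper bound on the number of non-empty itemsets over n_items items.

    max_size=None means no size limit.
    """
    limit = n_items if max_size is None else min(max_size, n_items)
    return sum(comb(n_items, k) for k in range(1, limit + 1))


for n in (10, 20, 30):
    print(n, max_itemsets(n), max_itemsets(n, max_size=2))
```

With 30 frequent items the unlimited bound is 2^30 - 1 = 1,073,741,823 itemsets, while a limit of 2 leaves at most 30 + 435 = 465. Only the two-item sets are needed for a co-occurrence graph anyway.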
Hi,
it's me again.
I'm trying to work out how to add the item names in @bhupendra_patil's approach Writing-Association-Rules-to-Exampleset-or-file. With the FP-Growth operator used in that approach, the process runs and I see all columns more or less filled, but if I use the new FP-Growth instead, the item names are not shown. Does anyone have an idea how to make this possible?
If I use this approach in my process, then I see all the numerical values but no item names.
best regards
Tobias