Exposing "complex" JSON via RapidMiner Server
Hello, World!
I am using RapidMiner Server to expose an API that provides zones and coordinates to a custom map. Each zone contains at least three coordinates; the custom map draws shapes from these coordinates and decorates them with other information. I currently have two endpoints that look like this (the rest of the data has been dropped for clarity):
Zones:
[
{ "id":1, "zone":"zone 1" },
{ "id":2, "zone":"zone 2" }
]
Coordinates:
[
{"id":1, "zone_id": 1, "x": 0, "y": 0},
{"id":2, "zone_id": 1, "x": 1, "y": 1},
{"id":3, "zone_id": 1, "x": 2, "y": 2},
{"id":4, "zone_id": 2, "x": 10, "y": 10},
{"id":5, "zone_id": 2, "x": 10, "y": 20},
{"id":6, "zone_id": 2, "x": 20, "y": 20},
{"id":7, "zone_id": 2, "x": 20, "y": 10}
]
How feasible is it to end up with this as the JSON output?
[
{"id":1,"zone":"zone 1","coords":[{"x":0,"y":0},{"x":1,"y":1},{"x":2,"y":2}]},
{"id":2,"zone":"zone 2","coords":[{"x":10,"y":10},{"x":10,"y":20},{"x":20,"y":20},{"x":20,"y":10}]}
]
The idea is simply to embed the coordinates inside each zone as an array and expose the result through a single endpoint. Currently I expose two separate API endpoints, so the designer has to send two requests to RapidMiner Server and then assemble the object with JavaScript on the client side, which is far from ideal.
I don't have XML to provide this time, but imagine two simple "Retrieve" operators connected to the results window.
I would appreciate your help, Thanks in advance!
Best Answer
kayman Member Posts: 662 Unicorn
Not the most elegant way, but concatenations, aggregations and a few replacements give you the right result.
Based on your example, the process below works (the key order is a bit different, but JSON doesn't care about that, and it is easy to fix if needed):
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="8.2.000" expanded="true" height="103" name="JSON 2 data" width="90" x="179" y="34">
<process expanded="true">
<operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="34">
<parameter key="text" value="[{ &quot;id&quot;:1, &quot;zone&quot;:&quot;zone 1&quot; },{ &quot;id&quot;:2, &quot;zone&quot;:&quot;zone 2&quot; }]"/>
</operator>
<operator activated="true" class="text:cut_document" compatibility="8.1.000" expanded="true" height="68" name="Cut Document (2)" width="90" x="179" y="34">
<parameter key="query_type" value="JsonPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries">
<parameter key="row" value="$."/>
</list>
<process expanded="true">
<connect from_port="segment" to_port="document 1"/>
<portSpacing port="source_segment" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:json_to_data" compatibility="8.1.000" expanded="true" height="82" name="JSON To Data (3)" width="90" x="313" y="34"/>
<operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="Create Document (2)" width="90" x="45" y="136">
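<parameter key="text" value="[ {&quot;id&quot;:1, &quot;zone_id&quot;: 1, &quot;x&quot;: 0, &quot;y&quot;: 0}, {&quot;id&quot;:2, &quot;zone_id&quot;: 1, &quot;x&quot;: 1, &quot;y&quot;: 1}, {&quot;id&quot;:3, &quot;zone_id&quot;: 1, &quot;x&quot;: 2, &quot;y&quot;: 2}, {&quot;id&quot;:4, &quot;zone_id&quot;: 2, &quot;x&quot;: 10, &quot;y&quot;: 10}, {&quot;id&quot;:5, &quot;zone_id&quot;: 2, &quot;x&quot;: 10, &quot;y&quot;: 20}, {&quot;id&quot;:6, &quot;zone_id&quot;: 2, &quot;x&quot;: 20, &quot;y&quot;: 20}, {&quot;id&quot;:7, &quot;zone_id&quot;: 2, &quot;x&quot;: 20, &quot;y&quot;: 10} ]"/>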
</operator>
<operator activated="true" class="text:replace_tokens" compatibility="8.1.000" expanded="true" height="68" name="Replace Tokens" width="90" x="179" y="136">
<list key="replace_dictionary">
<parameter key="\n" value=" "/>
</list>
</operator>
<operator activated="true" class="text:cut_document" compatibility="8.1.000" expanded="true" height="68" name="Cut Document" width="90" x="313" y="136">
<parameter key="query_type" value="JsonPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries">
<parameter key="row" value="$."/>
</list>
<process expanded="true">
<connect from_port="segment" to_port="document 1"/>
<portSpacing port="source_segment" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:json_to_data" compatibility="8.1.000" expanded="true" height="82" name="JSON To Data (2)" width="90" x="447" y="136"/>
<connect from_op="Create Document" from_port="output" to_op="Cut Document (2)" to_port="document"/>
<connect from_op="Cut Document (2)" from_port="documents" to_op="JSON To Data (3)" to_port="documents 1"/>
<connect from_op="JSON To Data (3)" from_port="example set" to_port="out 1"/>
<connect from_op="Create Document (2)" from_port="output" to_op="Replace Tokens" to_port="document"/>
<connect from_op="Replace Tokens" from_port="document" to_op="Cut Document" to_port="document"/>
<connect from_op="Cut Document" from_port="documents" to_op="JSON To Data (2)" to_port="documents 1"/>
<connect from_op="JSON To Data (2)" from_port="example set" to_port="out 2"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
</process>
</operator>
<operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join" width="90" x="313" y="34">
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="id" value="zone_id"/>
</list>
</operator>
<operator activated="true" class="subprocess" compatibility="8.2.000" expanded="true" height="82" name="Subprocess" width="90" x="447" y="34">
<process expanded="true">
<operator activated="true" class="numerical_to_polynominal" compatibility="8.2.000" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="45" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="x|y"/>
</operator>
<operator activated="true" class="replace" compatibility="8.2.000" expanded="true" height="82" name="Replace" width="90" x="179" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="x"/>
<parameter key="replace_what" value="^(.*)$"/>
<parameter key="replace_by" value="x:$1"/>
</operator>
<operator activated="true" class="replace" compatibility="8.2.000" expanded="true" height="82" name="Replace (2)" width="90" x="313" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="y"/>
<parameter key="replace_what" value="^(.*)$"/>
<parameter key="replace_by" value="y:$1"/>
</operator>
<operator activated="true" class="generate_concatenation" compatibility="8.2.000" expanded="true" height="82" name="Generate Concatenation" width="90" x="447" y="34">
<parameter key="first_attribute" value="x"/>
<parameter key="second_attribute" value="y"/>
<parameter key="separator" value=","/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="x|y"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="true" class="rename" compatibility="8.2.000" expanded="true" height="82" name="Rename" width="90" x="715" y="34">
<parameter key="old_name" value="x,y"/>
<parameter key="new_name" value="coords"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="aggregate" compatibility="8.2.000" expanded="true" height="82" name="Aggregate" width="90" x="849" y="34">
<list key="aggregation_attributes">
<parameter key="coords" value="concatenation"/>
</list>
<parameter key="group_by_attributes" value="zone|id"/>
<parameter key="ignore_missings" value="false"/>
</operator>
<operator activated="true" class="rename" compatibility="8.2.000" expanded="true" height="82" name="Rename (2)" width="90" x="976" y="34">
<parameter key="old_name" value="concat(coords)"/>
<parameter key="new_name" value="coords"/>
<list key="rename_additional_attributes"/>
</operator>
<connect from_port="in 1" to_op="Numerical to Polynominal" to_port="example set input"/>
<connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Replace" to_port="example set input"/>
<connect from_op="Replace" from_port="example set output" to_op="Replace (2)" to_port="example set input"/>
<connect from_op="Replace (2)" from_port="example set output" to_op="Generate Concatenation" to_port="example set input"/>
<connect from_op="Generate Concatenation" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
<connect from_op="Rename (2)" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:data_to_json" compatibility="8.1.000" expanded="true" height="82" name="Data To JSON" width="90" x="581" y="34"/>
<operator activated="true" class="loop_collection" compatibility="8.2.000" expanded="true" height="82" name="Loop Collection" width="90" x="715" y="34">
<process expanded="true">
<operator activated="true" class="text:replace_tokens" compatibility="8.1.000" expanded="true" height="68" name="Replace Tokens (2)" width="90" x="179" y="34">
<list key="replace_dictionary">
<parameter key="&quot;coords&quot;:&quot;(.*?)&quot;" value="&quot;coords&quot;:[{$1}]"/>
<parameter key="\|" value="},{"/>
<parameter key="x:" value="&quot;x&quot;:"/>
<parameter key="y:" value="&quot;y&quot;:"/>
</list>
</operator>
<connect from_port="single" to_op="Replace Tokens (2)" to_port="document"/>
<connect from_op="Replace Tokens (2)" from_port="document" to_port="output 1"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="JSON 2 data" from_port="out 1" to_op="Join" to_port="left"/>
<connect from_op="JSON 2 data" from_port="out 2" to_op="Join" to_port="right"/>
<connect from_op="Join" from_port="join" to_op="Subprocess" to_port="in 1"/>
<connect from_op="Subprocess" from_port="out 1" to_op="Data To JSON" to_port="example set 1"/>
<connect from_op="Data To JSON" from_port="documents" to_op="Loop Collection" to_port="collection"/>
<connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
How does it work:
First we import the JSON and convert it to proper data sets. Next we concatenate the x and y fields, adding the variable names to the data itself. Then we aggregate all of the coordinates per zone and produce a first JSON document. This does not contain the final structure yet, but it has enough 'marks' to construct the remaining JSON with some basic find/replace logic.
As stated, not really high level and far from optimal, but it does the trick.
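To make the steps easier to follow outside of RapidMiner, here is a rough Python sketch of the same idea. It is not the process itself, only an approximation of its logic: the function name `build_nested_json` is made up for illustration, the "|" separator is assumed from the `\|` replacement in the process, and the sketch emits a single JSON array rather than a collection of documents.

```python
import json
import re

zones = [{"id": 1, "zone": "zone 1"}, {"id": 2, "zone": "zone 2"}]
coords = [
    {"id": 1, "zone_id": 1, "x": 0, "y": 0},
    {"id": 2, "zone_id": 1, "x": 1, "y": 1},
    {"id": 3, "zone_id": 1, "x": 2, "y": 2},
    {"id": 4, "zone_id": 2, "x": 10, "y": 10},
    {"id": 5, "zone_id": 2, "x": 10, "y": 20},
    {"id": 6, "zone_id": 2, "x": 20, "y": 20},
    {"id": 7, "zone_id": 2, "x": 20, "y": 10},
]


def build_nested_json(zones, coords):
    """Hypothetical helper mimicking the join / concatenate / aggregate / replace steps."""
    zone_by_id = {z["id"]: z for z in zones}

    # Join + Replace + Generate Concatenation:
    # one marked string such as "x:0,y:0" per coordinate, grouped by zone.
    grouped = {}
    for c in coords:
        zone = zone_by_id[c["zone_id"]]
        key = (zone["id"], zone["zone"])
        grouped.setdefault(key, []).append(f"x:{c['x']},y:{c['y']}")

    # Aggregate (concatenation): one row per zone, coords joined with "|" (assumed separator).
    flat = [
        {"id": zone_id, "zone": name, "coords": "|".join(parts)}
        for (zone_id, name), parts in grouped.items()
    ]

    # Data To JSON: at this point "coords" is still a plain string full of marks.
    text = json.dumps(flat, separators=(",", ":"))

    # Replace Tokens (2): turn the marks into real JSON syntax.
    text = re.sub(r'"coords":"(.*?)"', r'"coords":[{\1}]', text)
    text = text.replace("|", "},{")
    text = text.replace("x:", '"x":').replace("y:", '"y":')
    return text


print(build_nested_json(zones, coords))
```

Like the process, this relies on the marks ("x:", "y:" and "|") never appearing in the real data, for example in a zone name, so treat it as a workaround rather than a general solution.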
Good luck with it!
Answers
Hello, World!
I have new data. This is the farthest I could go with my limited knowledge on how to operate with JSON. I still can't enclose the "Registration Subtypes" into an enclosing string or something that can help me achieve the desired results.
Is there any kind of trick I can use to actually form the desired JSON file? I'm still researching my options, as long as these don't include using an external tool to build the JSON.
All the best,
As we've discussed in other threads, RapidMiner's ability to deal with complex JSON formats is somewhat limited. I am not aware of any way for you to get RapidMiner to create the JSON file with the nested array structure that you want. But I would love to see if any of the other experts in the community have a solution for this!
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Hi Rodrigo,
can you share the json files or some process/script that generates the data?
It looks like the solution is a join on zone_id, but I am surely missing the point somewhere.
Regards,
Sebastian
Hi Rodrigo,
I battled this for about 45 minutes and could not get it. RapidMiner just does not have good JSON tools right now. Perhaps there are some good Python libraries that will do the trick for you?
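For reference, a minimal sketch of what the Python route could look like, using only the standard library. The function name `embed_coords` is hypothetical, the sample data is copied from the question, and how the two lists would get in and out of RapidMiner is left open here.

```python
import json
from collections import defaultdict

zones = [{"id": 1, "zone": "zone 1"}, {"id": 2, "zone": "zone 2"}]
coords = [
    {"id": 1, "zone_id": 1, "x": 0, "y": 0},
    {"id": 2, "zone_id": 1, "x": 1, "y": 1},
    {"id": 3, "zone_id": 1, "x": 2, "y": 2},
    {"id": 4, "zone_id": 2, "x": 10, "y": 10},
    {"id": 5, "zone_id": 2, "x": 10, "y": 20},
    {"id": 6, "zone_id": 2, "x": 20, "y": 20},
    {"id": 7, "zone_id": 2, "x": 20, "y": 10},
]


def embed_coords(zones, coords):
    """Hypothetical helper: attach each zone's coordinates as a nested list."""
    by_zone = defaultdict(list)
    for c in coords:
        # Keep only x and y; the coordinate id and zone_id are dropped,
        # as in the desired output from the question.
        by_zone[c["zone_id"]].append({"x": c["x"], "y": c["y"]})
    # A zone without coordinates simply gets an empty list.
    return [
        {"id": z["id"], "zone": z["zone"], "coords": by_zone[z["id"]]}
        for z in zones
    ]


print(json.dumps(embed_coords(zones, coords), separators=(",", ":")))
```

Grouping the coordinates first and then walking the zones keeps the zone order from the first endpoint and the coordinate order from the second.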
Scott
Hi @SGolbert, @sgenzer and @Telcontar120
Here is the result of my research:
@SGolbert: with joining, if you have an example set with X categories and another with Y elements per category, you end up with one flat example set of X·Y examples (rows). What I am looking for is something that lets me include an example set as a property inside another example set, plus the inverse operation; I would call these mapping and flattening. With that, you end up with X categories and an array (or hash) inside each category.
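To illustrate the difference outside of RapidMiner, here is a small Python sketch. The names `nest` and `flatten` are made up for this example and are not existing RapidMiner operators; the rows are what a join of the two sample endpoints would produce.

```python
# Joined, tabular form: one row per coordinate, zone columns repeated.
rows = [
    {"id": 1, "zone": "zone 1", "x": 0, "y": 0},
    {"id": 1, "zone": "zone 1", "x": 1, "y": 1},
    {"id": 1, "zone": "zone 1", "x": 2, "y": 2},
    {"id": 2, "zone": "zone 2", "x": 10, "y": 10},
    {"id": 2, "zone": "zone 2", "x": 10, "y": 20},
    {"id": 2, "zone": "zone 2", "x": 20, "y": 20},
    {"id": 2, "zone": "zone 2", "x": 20, "y": 10},
]


def nest(rows, parent_fields, child_name):
    """Hypothetical 'mapping' step: group rows by the parent fields and
    collect the remaining fields of each row into a nested list."""
    groups = {}
    for row in rows:
        key = tuple(row[f] for f in parent_fields)
        child = {k: v for k, v in row.items() if k not in parent_fields}
        groups.setdefault(key, []).append(child)
    return [
        dict(zip(parent_fields, key), **{child_name: children})
        for key, children in groups.items()
    ]


def flatten(nested, child_name):
    """Hypothetical inverse step: repeat the parent fields once per nested child."""
    flat = []
    for parent in nested:
        base = {k: v for k, v in parent.items() if k != child_name}
        for child in parent[child_name]:
            flat.append({**base, **child})
    return flat


nested = nest(rows, ["id", "zone"], "coords")
print(len(rows), "joined rows ->", len(nested), "nested objects")
```

In this sketch `flatten(nest(rows, ...))` gives back the original rows, which is the round trip I have in mind between the tabular and the structural representation.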
After reviewing tons of comments from the community (whoa, it is indeed massive!), googling around, and sitting down at the RapidMiner code (because I like my source like I like my mind: open! #noblackboxes to the core!) to see if that's feasible to be implemented, it seems that the tabular nature of a standard example set can't cope with the structural nature of JSON.
In the end, it is not (or at least it shouldn't be) a massive amount of work to create an include operator that does what I want. What I wonder is whether it is practical to implement the other pieces, such as flattening the object before training or applying a model, and whether the idea should be considered at all. Creating such an object might also help with implementing SOAP Web Services on RapidMiner Server, which is a desired feature in some large companies and something that could take advantage of the real-time scoring system, for example.
Thanks to everyone for their answers!
P.S.- @sgenzer, does it sound too mind-boggling to take this idea as a feature, a new plugin, or an idea for the RapidMiner Wisdom hackathon?
All the best,
@rfuentealba put it this way - if the product team hears me yell "APIs" and "JSON" one more time, they will strangle me.
more seriously, our friend @Telcontar120 has an idea open for voting that you are welcome to vote and comment on. https://community.rapidminer.com/t5/Product-Ideas/JSON-file-rotation/idi-p/49638
Scott
Hi @kayman,
This looks awesome! Thank you. I haven't tried it extensively for my use case, but I saw some things that might help me build the JSON extension I'm writing. Thanks again!
All the best,
Rodrigo.