"Parsing XML"
I've been experimenting with the REST API from LastFM. My query to the API asks for artists similar to Bono.
Here's the XML file that the query generates:
http://ws.audioscrobbler.com/2.0/?method=artist.getsimilar&artist=bono&api_key=b25b959554ed76058ac220b7b2e0a026
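For reference, here is a minimal Python sketch of how a query URL of this shape can be assembled so that the parameters are joined with plain `&` separators. The helper name `build_similar_artists_url` and the `YOUR_API_KEY` placeholder are made up for illustration; only the base URL and the parameter names `method`, `artist` and `api_key` come from the URL above.

```python
from urllib.parse import urlencode

# Base endpoint taken from the query URL in this post.
BASE_URL = "http://ws.audioscrobbler.com/2.0/"


def build_similar_artists_url(artist, api_key):
    """Return an artist.getsimilar request URL (hypothetical helper)."""
    params = {
        "method": "artist.getsimilar",
        "artist": artist,
        "api_key": api_key,  # placeholder value below, not a real key
    }
    # urlencode joins the pairs with "&" and escapes unsafe characters.
    return BASE_URL + "?" + urlencode(params)


url = build_similar_artists_url("bono", "YOUR_API_KEY")
print(url)
```

Building the query string this way avoids stray characters between parameters, such as a `;` after the `&`.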
I'm trying to parse the XML file and generate output that provides "artist" and "match" for each of the 100 entries in the XML file. The current output has 200 rows, each containing the URL I'm querying, the full contents of the page, and the names of the attributes I set up with XPath queries. The output I want to see is a different artist name and its associated match number on each row. Any advice on how to achieve this would be greatly appreciated.
Thanks,
Jamie
This is what I want to see in the Data View:
This is what I currently see in the Data View:
Here's my process:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
<process expanded="true" height="628" width="736">
<operator activated="true" class="web:process_web" compatibility="5.0.4" expanded="true" height="60" name="Process Documents from Web" width="90" x="45" y="30">
<parameter key="url" value="http://ws.audioscrobbler.com/2.0/?method=artist.getsimilar&amp;artist=bono&amp;api_key=b25b959554ed76058ac220b7b2e0a026"/>
<list key="crawling_rules">
<parameter key="0" value="http://ws.audioscrobbler.com/2.0/?method=artist.getsimilar&amp;artist=bono&amp;api_key=b25b959554ed76058ac220b7b2e0a026"/>
</list>
<parameter key="add_pages_as_attribute" value="true"/>
<parameter key="max_pages" value="1"/>
<process expanded="true" height="481" width="788">
<operator activated="true" class="text:cut_document" compatibility="5.0.7" expanded="true" height="60" name="Cut Document" width="90" x="70" y="46">
<parameter key="query_type" value="XPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries">
<parameter key="name" value="/h:lfm/h:similarartists/h:artist/h:name"/>
<parameter key="match" value="/h:lfm/h:similarartists/h:artist/h:match"/>
</list>
<list key="namespaces"/>
<list key="index_queries"/>
<process expanded="true" height="463" width="702">
<connect from_port="segment" to_port="document 1"/>
<portSpacing port="source_segment" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<connect from_port="document" to_op="Cut Document" to_port="document"/>
<connect from_op="Cut Document" from_port="documents" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="write_database" compatibility="5.0.8" expanded="true" height="60" name="Write Database" width="90" x="246" y="30">
<parameter key="connection" value="AWS RDS"/>
<parameter key="table_name" value="artists"/>
<parameter key="overwrite_mode" value="append"/>
</operator>
<connect from_op="Process Documents from Web" from_port="example set" to_op="Write Database" to_port="input"/>
<connect from_op="Write Database" from_port="through" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
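To make the goal concrete outside RapidMiner, here is a rough Python sketch of the difference between running the two XPath queries independently and iterating over each `artist` element. It is only an illustration, not a replacement for the process above. The sample XML is a hypothetical two-entry stand-in whose element names follow the XPath queries in the process (`lfm/similarartists/artist/name` and `.../match`); the artist names and match values are invented placeholders. One possible explanation for the 200 rows is that each query yields its own 100 nodes, but that is an assumption about how the operator behaves.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the API response; names and values are placeholders.
SAMPLE = """<lfm status="ok">
  <similarartists artist="Bono">
    <artist><name>Artist A</name><match>1</match></artist>
    <artist><name>Artist B</name><match>0.5</match></artist>
  </similarartists>
</lfm>"""


def rows_per_artist(xml_text):
    """Return one (name, match) row per <artist> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for artist in root.findall("./similarartists/artist"):
        # Both fields are read relative to the same <artist> node,
        # so they stay paired on a single row.
        rows.append((artist.findtext("name"), float(artist.findtext("match"))))
    return rows


def nodes_from_separate_queries(xml_text):
    """Run the two queries independently and pool the resulting nodes."""
    root = ET.fromstring(xml_text)
    names = root.findall("./similarartists/artist/name")
    matches = root.findall("./similarartists/artist/match")
    return names + matches  # twice as many nodes as there are artists


rows = rows_per_artist(SAMPLE)
for name, match in rows:
    print(name, match)
```

With 100 artists, the per-artist loop would give 100 rows, while pooling the two separate queries would give 200 unpaired nodes.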
Answers
This is really advanced parsing. Normally I would not post a complete process and would simply outline the way to go, but it's a great example of what one can do with the Text Processing and Web Extension in combination. So here's this very cool process:
Greetings,
Sebastian
Thanks so much for providing the complete process! This helps a lot.
Best,
Jamie