COVID-19 board
sgenzer
Hello all RapidMiner community members -
I have been hesitant to post a new discussion about COVID-19 as there is so much already out there, but I am sincerely concerned about the well-being of our RapidMiner family. I am also very interested to hear if anyone out there is (a) working on any data science COVID-19 projects, and/or (b) leading or participating in any service projects that are helping COVID-19 patients or research in your local community.
So please use this discussion board to share, discuss, and support one another. I sincerely hope you are well during this very difficult time, and my deepest sympathies to those who are either ill or directly affected by friends or family that are suffering.
Scott
Comments
https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/
I used to love the quote "don't waste the hours of daylight to what you can do at night". Years and more than 1,000 24h/36h shifts as an anesthesiologist and emergency physician later, I try to live an ascetic life, or a structured one if you will, where sleep, healthy food and minimal sensory input are essential whenever they are accessible. In reality the day starts between 5 and 6 am, when I train my 2 German shepherds, fully skilled personal protection working dogs that are trained to be teddy bears when they are not working or training (22 out of 24 hours). Not that I live in a dangerous region or am paranoid, but accomplishing difficult tasks with another intelligent creature is very satisfying. Probably a similar satisfaction to interacting with you all. Back to the current reality:
Maybe more reflections in the near future if I find some time.
Cheers Sven, Keep the spirit high and use your time to solve real world problems!
DocMusher : your post is impressive, thanks for sharing!
My wife and I both work for big companies with global production of goods for multiple customers. Both companies still handle the topic mostly manually, consolidating data from several stakeholders by hand in order to roughly identify potential countermeasures, align with governments or refocus strategies.
Why?
Data science needs well-designed models and structures. You can easily perform data analysis on SAP data, but what answers will you find, and what measures/actions will you derive from the analysis? Countermeasures/actions for this unforeseen situation still need human intelligence, creativity and flexibility, which should be available in good management (but are not necessarily available ;-))...
************
I started to collaborate with @mbs (see other post: https://community.rapidminer.com/discussion/56951/huge-field-trial-regarding-global-economy-ecology-and-society#latest).
We want to analyse global data in multiple areas in order to evaluate the positive and negative effects of COVID-19 on ecology, economy and society. As an outcome, my wish is to publish a paper with results that are beneficial for society (e.g. key messages that are easy to read and have an impact on daily life...)
We came to the following interim conclusions:
1. We have to wait until more data are available, at least 150 days.
--> However, some data are only available on a daily basis, so we are collecting those individual data points separately
2. We have to ask the right questions in order to find the right answers (sounds simple, but is essential).
--> What can be derived from the data, and what is logical?
3. It is difficult to get data from different sources.
--> I am willing to get access to Statista or Trading Economics, but it would also help to check other sources or find partners who can deliver information such as consumption data, surveys, newspaper articles etc. But I can't spend too much time on this diligence work...
4. And maybe the biggest challenge is to differentiate between "natural fluctuations of the economy", direct effects of COVID-19 and indirect effects, i.e. the classical question of causality...
--> Therefore I am thinking of a sub-model approach, i.e. dividing the overall topic into sub-systems with different topics, and also adding fixed relations (Y=a*X) and logical relations (if 'A' and 'B', then 'C')
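The two kinds of relations named in point 4 could be sketched roughly as follows. This is only an illustration in Python: the sub-systems, the coefficient and the rule are hypothetical and not derived from any data.

```python
from dataclasses import dataclass


@dataclass
class FixedRelation:
    """Fixed relation of the form Y = a * X between two sub-systems."""
    a: float

    def apply(self, x: float) -> float:
        return self.a * x


def logical_relation(a: bool, b: bool) -> bool:
    """Logical relation: if 'A' and 'B', then 'C'."""
    return a and b


# Hypothetical sub-model: a change in mobility drives a proportional
# change in fuel demand. The coefficient 0.8 is made up, not fitted.
fuel_from_mobility = FixedRelation(a=0.8)
fuel_change = fuel_from_mobility.apply(-50.0)  # mobility change in percent

# Hypothetical rule: lockdown AND no remote work possible -> production halt.
production_halt = logical_relation(a=True, b=True)

print(fuel_change, production_halt)
```

Keeping each relation as a small, separate object makes it possible to test every sub-system on its own before the sub-models are combined.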
Who else is willing to collaborate within this activity?
The minimum outcome is personal learning on how to deal with this kind of complex data analysis problem. The maximum outcome is a paper that might have an impact on our society.
Please send a PM and let's align on HOW to collaborate!
Jan
Thanks
Sven
However, ROI can be short-term and selective or long-term and holistic.
I am a scientist and I want long-term change. Based on this single disruptive impact of COVID, I want to find understanding, clear rationale and EXAMPLES of evidence for cause-effect relations that show how we as human beings should change our behavior in order to act sustainably in the future. Furthermore, I want to find ways to derive valuable information from data, the essential task of data science (knowing that there are plenty of theories already available).
For concrete action, there are plenty of tools, such as this article showing how to predict the spread of COVID-19 with a Kalman filter: https://towardsdatascience.com/using-kalman-filter-to-predict-corona-virus-spread-72d91b74cc8
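To give an idea of what a Kalman filter does, here is a minimal scalar sketch in Python. It is not the method from the linked article: it assumes a simple random-walk model for daily new cases, and the noise variances and the case numbers are made up for illustration.

```python
def kalman_1d(observations, process_var=1.0, obs_var=4.0):
    """Scalar Kalman filter with a random-walk state model.

    State:        x_t = x_{t-1} + w,  w ~ N(0, process_var)
    Observation:  z_t = x_t + v,      v ~ N(0, obs_var)
    Returns the filtered estimates and the one-step-ahead forecast.
    """
    if not observations:
        raise ValueError("need at least one observation")
    x = float(observations[0])  # initial state estimate
    p = obs_var                 # initial estimate variance
    estimates = []
    for z in observations:
        # Predict: a random walk keeps the mean, the uncertainty grows.
        p = p + process_var
        # Update: blend prediction and observation using the Kalman gain.
        k = p / (p + obs_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    # Under a random walk the forecast equals the last filtered estimate.
    return estimates, x


# Made-up daily new case counts, for illustration only.
daily_new_cases = [10, 12, 15, 14, 18, 21]
filtered, next_day = kalman_1d(daily_new_cases)
print([round(v, 2) for v in filtered], round(next_day, 2))
```

A random-walk model cannot follow exponential growth well, so a real forecast would need a state that also tracks the trend. The sketch only shows the predict/update cycle.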
Sven
Hello
Great discussion
I agree with you. Let's continue via private message.
Thank you
mbs
Very interesting and useful. If you need anything related to Data science, do let me know. I am more than happy to help with this project.
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
https://www.dailymail.co.uk/sciencetech/article-8125355/US-government-talks-Facebook-Google-track-coronavirus.html
https://www.newscientist.com/article/2238136-google-may-help-uk-officials-track-coronavirus-social-distancing/
Well, this is not big data, but giant data...
This is way above our level regarding power.
But I was wondering what simple folks like us can do with more publicly available data, combining those diverse and meaningful data into new insights/findings.
The data is updated daily. We can model the rate of transmission using the built-in neural net block and get an idea of what is going to happen soon.
Wish you all healthy days. DE.
Dortmund, Germany
Indeed nice work, would you mind sharing your process here?
If someone is interested in a general epidemic calculator, this one looked interesting, although I lack the time to review it in depth. I think @mschmitz has the background to assess its correctness from a math point of view.
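For readers who want to see the kind of math behind such calculators, here is a minimal sketch of the classic SIR compartment model in Python, integrated with simple Euler steps. It is an assumption that this resembles the linked calculator, which may well use a different model, and all parameter values are illustrative only.

```python
def simulate_sir(population, infected0, beta, gamma, days, steps_per_day=10):
    """Simulate the SIR model and return one (S, I, R) tuple per day.

    beta:  transmission rate per day (assumed constant)
    gamma: recovery rate per day (1 / average infectious period)
    """
    dt = 1.0 / steps_per_day
    s = float(population - infected0)
    i = float(infected0)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative parameters only: basic reproduction number = beta / gamma = 3.
history = simulate_sir(population=1000, infected0=1, beta=0.3, gamma=0.1, days=120)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print("peak of infections on day", peak_day)
```

The ratio beta / gamma is the basic reproduction number in this model. Lowering beta, for example through distancing measures, flattens and delays the peak.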
Cheers, stay healthy, use this period as an opportunity to value what is important.
Sven
<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="open_file" compatibility="9.6.000" expanded="true" height="68" name="Open File" width="90" x="45" y="34">
<parameter key="resource_type" value="URL"/>
<parameter key="filename" value="https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"/>
<parameter key="url" value="http://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"/>
</operator>
<operator activated="true" class="read_csv" compatibility="9.6.000" expanded="true" height="68" name="Read CSV" width="90" x="179" y="34">
<parameter key="column_separators" value=","/>
<parameter key="trim_lines" value="false"/>
<parameter key="use_quotes" value="true"/>
<parameter key="quotes_character" value="&quot;"/>
<parameter key="escape_character" value="\"/>
<parameter key="skip_comments" value="false"/>
<parameter key="comment_characters" value="#"/>
<parameter key="starting_row" value="1"/>
<parameter key="parse_numbers" value="true"/>
<parameter key="decimal_character" value="."/>
<parameter key="grouped_digits" value="false"/>
<parameter key="grouping_character" value=","/>
<parameter key="infinity_representation" value=""/>
<parameter key="date_format" value=""/>
<parameter key="first_row_as_names" value="true"/>
<list key="annotations"/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="read_all_values_as_polynominal" value="false"/>
<list key="data_set_meta_data_information"/>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
<operator activated="true" class="store" compatibility="9.6.000" expanded="true" height="68" name="Store" width="90" x="313" y="34">
<parameter key="repository_entry" value="../data/confirmed cases"/>
</operator>
<operator activated="true" class="open_file" compatibility="9.6.000" expanded="true" height="68" name="Open File (2)" width="90" x="45" y="136">
<parameter key="resource_type" value="URL"/>
<parameter key="filename" value="https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv"/>
<parameter key="url" value="http://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv"/>
</operator>
<operator activated="true" class="read_csv" compatibility="9.6.000" expanded="true" height="68" name="Read CSV (2)" width="90" x="179" y="136">
<parameter key="column_separators" value=","/>
<parameter key="trim_lines" value="false"/>
<parameter key="use_quotes" value="true"/>
<parameter key="quotes_character" value="&quot;"/>
<parameter key="escape_character" value="\"/>
<parameter key="skip_comments" value="false"/>
<parameter key="comment_characters" value="#"/>
<parameter key="starting_row" value="1"/>
<parameter key="parse_numbers" value="true"/>
<parameter key="decimal_character" value="."/>
<parameter key="grouped_digits" value="false"/>
<parameter key="grouping_character" value=","/>
<parameter key="infinity_representation" value=""/>
<parameter key="date_format" value=""/>
<parameter key="first_row_as_names" value="true"/>
<list key="annotations"/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="read_all_values_as_polynominal" value="false"/>
<list key="data_set_meta_data_information"/>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
<operator activated="true" class="store" compatibility="9.6.000" expanded="true" height="68" name="Store (2)" width="90" x="313" y="136">
<parameter key="repository_entry" value="../data/deaths"/>
</operator>
<connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
<connect from_op="Read CSV" from_port="output" to_op="Store" to_port="input"/>
<connect from_op="Store" from_port="through" to_port="result 1"/>
<connect from_op="Open File (2)" from_port="file" to_op="Read CSV (2)" to_port="file"/>
<connect from_op="Read CSV (2)" from_port="output" to_op="Store (2)" to_port="input"/>
<connect from_op="Store (2)" from_port="through" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="84"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Dortmund, Germany
I am just amazed and awed by this discussion. I think I speak for everyone at RapidMiner saying that you are all in our thoughts. Huge respect for @DocMusher for his amazing work and powerful message.
Scott
Thank you very much for your follow up. I am very new in the community and it is great that RM has a very powerful one.
I figured out a way, using the Twitter and Aylien connections (Aylien is not supported anymore, I guess) and the Aylien Extract Sentiment block, to quickly look up the sentiment (objective/subjective) of the retweets in the 2 hours after a tweet by the health minister.
I just want to add text processing: create a document from the most recent 100 tweets, updated every hour, and look at the vectors of some words about COVID-19 over this period.
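Outside RapidMiner, the word-vector step could look roughly like the following Python sketch. It assumes the tweets have already been fetched and are ordered oldest to newest. The function name, the stopword list and the sample tweets are hypothetical, and no Twitter or Aylien API is called.

```python
import re
from collections import Counter


def term_vector(tweets, max_tweets=100, stopwords=frozenset()):
    """Count term frequencies over the most recent `max_tweets` tweets.

    `tweets` is assumed to be a list of strings ordered oldest to newest.
    """
    recent = tweets[-max_tweets:]
    counts = Counter()
    for text in recent:
        # Lowercase, then keep words, hashtags, mentions and terms like "covid-19".
        tokens = re.findall(r"[a-z0-9#@'-]+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts


# Made-up tweets, for illustration only.
sample_tweets = [
    "Stay home, stay safe",
    "Covid-19 cases rising",
    "Stay safe everyone",
]
vector = term_vector(sample_tweets, stopwords=frozenset({"everyone"}))
print(vector.most_common(3))
```

Running this once per hour and comparing the counts of selected words between runs gives a simple view of how the vocabulary shifts over time.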
Any suggestions for improving?
Thanks all, stay healthy.
Thank you very much for the link, it is great for applying math models. I think I can inject them into the Generate Attributes block and create a forecast set for each country and/or state.
Here is my design view for ETL.
You can append to it and run it to play with the dataset in the visualization tool of the Results view. I filtered for China and confirmed cases, then selected the dates and date-related attributes and applied a transpose. Output as below:
Now we can easily visualize the rate, like:
From the first day of the confirmed cases till the last date. As we can observe, there is a good resistance at some point, correlated with the population of the state, where confirmed cases begin decelerating. We can add text data on the precautions taken by the related country's authorities and, plus or minus a couple of days, measure how well they worked.
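The filter, transpose and rate steps described above can be sketched in plain Python as follows. This is not the RapidMiner process itself: the column layout is assumed to match the wide CSV format read by the process (one column per date), and the numbers in the sample are made up.

```python
import csv
import io

# Made-up sample in the assumed wide layout: one row per region, one column per date.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
Region A,China,30.0,112.0,100,150,225
Region B,China,40.0,116.0,10,20,40
,Italy,43.0,12.0,0,0,0
"""

META_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}


def country_series(csv_text, country):
    """Filter one country, sum its regions and return [(date, total)] rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    date_cols = [c for c in reader.fieldnames if c not in META_COLUMNS]
    totals = {d: 0 for d in date_cols}
    for row in reader:
        if row["Country/Region"] == country:
            for d in date_cols:
                totals[d] += int(row[d])
    # One row per date is the transposed view of the wide table.
    return [(d, totals[d]) for d in date_cols]


def daily_growth(series):
    """Day-over-day growth rate of cumulative counts (None if previous is 0)."""
    rates = []
    for (_, prev), (date, cur) in zip(series, series[1:]):
        rates.append((date, (cur - prev) / prev if prev else None))
    return rates


china = country_series(SAMPLE, "China")
print(china)
print(daily_growth(china))
```

A growth rate that falls over several consecutive days corresponds to the deceleration visible in the chart.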
It is very premature at the moment. I have another design with a neural network block to estimate the next days' confirmed cases.
Any suggestions?
Stay healthy.
Thanks to @mschmitz
it is excellent! thanks for sharing.
Best,
Martin
Dortmund, Germany
Unfortunately no, I don't have RM Server. I didn’t know I could reach one.
Best,
Dortmund, Germany
https://www.kungfu.ai/tracking-coronavirus-disinformation-on-twitter/
Cheers
Sven
On a similar note, I was reflecting this morning that this COVID-19 pandemic will provide data for at least 20 years of PhD theses in every field imaginable...
https://www.google.com/covid19/mobility/
https://globalnews.ca/news/6775542/google-mobility-reports-a-slippery-slope-cyber-security-expert/
FYI