"Reading a csv file?"
andres222es
Member Posts: 1 Learner I
I am just starting with RapidMiner; it's my second day with the platform. I am managing a data file for a bus company here in Spain. I would like to have a starting point (a model) to obtain several basic stats: sales, transactions, sales by platform, unique users per year (identified by device ID), recurrence factor per month (total transactions per month / unique device IDs), median net ticket price, etc. I have 4 years' worth of data.
My data comes from CSV files, with ; as the separator and 50 columns (deviceID, operating system, return or one-way ticket, etc.).
Once I have those stats, I would like to use RapidMiner's modelling or predictive analysis to estimate, for example, the number of tickets sold next year, the total sales, and whether our unique users will increase.
I've tried using the CSV process and haven't got anything back. How do I post a minimal chunk of my data here so that someone can give me a starting point? Or, if anyone can suggest how to obtain a starting template for what I want, that would be helpful.
Another problem is that my data is logged in a 6 GB CSV file. I've managed to split it into ten chunks, but this is a bit annoying. I think RapidMiner can't handle such a file? When I try to open it, the program stops responding and I have to close it.
Any help would be greatly appreciated.
Answers
Did you look at the Read CSV operator tutorial process? Simply reading your file in shouldn't be difficult as long as it isn't too big.
Personally when I have a project like this (with a large raw data file) I think it is easier to start by taking just the first 100 rows or so (manually copy them from the original file) plus the header. Then you can set up your entire data import and ETL process using that small file and make sure you are getting all the output you want. Then you can run the whole thing on your larger files.
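If copying the rows by hand is awkward because the 6 GB file won't open in an editor, a few lines of Python outside RapidMiner can pull out the sample. This is only a minimal sketch: the function name `write_sample` and the file names are made up for illustration, and it assumes one record per line (no quoted fields containing line breaks) and UTF-8 encoding.

```python
from itertools import islice


def write_sample(src_path, dst_path, n_rows=100, encoding="utf-8"):
    """Copy the header line plus the first n_rows data lines to dst_path.

    Assumes one record per physical line; a quoted field containing a
    line break would be cut in the wrong place.
    """
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        # islice reads lazily, so only the first n_rows + 1 lines are
        # consumed and the rest of the large file is never loaded.
        lines = list(islice(src, n_rows + 1))
        dst.writelines(lines)
    # Number of data rows copied (the header is not counted).
    return max(len(lines) - 1, 0)
```

Something like `write_sample("tickets_full.csv", "tickets_sample.csv")` would then give a small file that you can point Read CSV at while you build the process.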
Everything that you have described is easy to do in RapidMiner (outside of the memory constraints already noted). Summarizing information for different buses and routes by date will be handled by the Aggregate operators.
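To make the kind of summary concrete, here is a rough Python sketch of the monthly recurrence factor from the question (total transactions per month / unique device IDs), computed by streaming through the chunk files row by row. It is not RapidMiner code and not how the Aggregate operator works internally. The column names `date` and `deviceID`, the `YYYY-MM-DD` date format, and the assumption that every chunk has its own header row are all guesses about the data.

```python
import csv
from collections import defaultdict


def monthly_recurrence(paths, date_col="date", device_col="deviceID",
                       delimiter=";", encoding="utf-8"):
    """Return per-month transaction counts, unique devices and their ratio.

    Column names and the date format are assumptions; adjust them to
    match the real file.
    """
    transactions = defaultdict(int)   # month -> number of rows
    devices = defaultdict(set)        # month -> set of device IDs seen

    for path in paths:
        with open(path, newline="", encoding=encoding) as f:
            # DictReader streams one row at a time, so memory use depends
            # on the number of distinct device IDs, not on file size.
            for row in csv.DictReader(f, delimiter=delimiter):
                month = row[date_col][:7]  # "YYYY-MM" prefix of the date
                transactions[month] += 1
                devices[month].add(row[device_col])

    return {
        month: {
            "transactions": transactions[month],
            "unique_devices": len(devices[month]),
            "recurrence": transactions[month] / len(devices[month]),
        }
        for month in sorted(transactions)
    }
```

Called with the list of chunk files, this returns one entry per month, which is roughly what grouping by month and counting rows and distinct device IDs would give you.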
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts