
How to Extract Numbers from Text with Text Mining

danong Member Posts: 3 Contributor I
edited June 2019 in Help
Hi,

I have tokenized the text and filtered out some words, which left only numbers and English words.

My problem now is that I want to extract the numbers and the English words separately and put them in different results.

How can I achieve that?

By the way, I'm using the text mining tool here; the file is in .txt format and is semi-structured.

Thanks for helping.
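For reference, the split described in this question (tokenize once, then send numeric tokens one way and word tokens the other) can be sketched in plain Python. This is not a RapidMiner process; the regular expression and the helper name `split_words_and_numbers` are illustrative assumptions, and the sample sentence is the one given later in the thread.

```python
import re

# Assumption: a "word" is a run of ASCII letters and a "number" is a run of digits.
# Punctuation and whitespace are discarded by the tokenizer.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+")


def split_words_and_numbers(text):
    """Hypothetical helper: tokenize once, then route each token to one of two lists."""
    words, numbers = [], []
    for token in TOKEN_RE.findall(text):
        if token.isdigit():
            numbers.append(token)        # numeric branch
        else:
            words.append(token.lower())  # word branch, lower-cased
    return words, numbers


sample = ("Bobbie goes to school today in the morning at 8 oclock "
          "with his 30 packs of noodles.")
words, numbers = split_words_and_numbers(sample)
print(numbers)  # ['8', '30']
print(words)
```

Running both branches off a single tokenization pass mirrors the single-load approach suggested later in the thread; loading the file twice gives the same result at the cost of a second read.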

Answers

  • IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    Sorry, I did not get your point. Can you give us an example, ideally of the data before the desired transformation and of what you would like to achieve?

    Cheers,
    Ingo
  • danong Member Posts: 3 Contributor I
    Hi, thanks for the reply.

    I have actually solved the problem.

    Okay, I will rephrase my problem here:

    I had a text file, for example: "Bobbie goes to school today in the morning at 8 oclock with his 30 packs of noodles."

    I would like to extract the English words (bobbie, goes, to, etc.) and, separately, the numbers (8, 30).

    But I found that the filter only allows one thing at a time, either English words or numbers; it does not allow filtering for both.

    I could not find another way, so in the end I loaded the file twice, did the filtering separately, and that solved it.

    Thanks.
  • LiZeyuan Member Posts: 1 Learner II
    Hey, mate,

    I am a beginner with RapidMiner.
    I am facing a similar issue: I want to extract the numbers from the text, e.g.:
    "the task finished at the year 2018"
    I just need the numeric information "2018". How can I filter out the words when tokenizing?

    Thanks, much appreciated.

  • IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,
    There was a similar discussion on the community recently: https://community.rapidminer.com/discussion/55230/how-to-extract-year-from-a-string
    Maybe this can give you some hints.
    Cheers,
    Ingo
  • kayman Member Posts: 662 Unicorn
    You could also have done it with a single load and the Multiply operator. Use one output port to filter for "number-style strings" and the other to do the opposite.

    Same outcome, of course, but the data is loaded only once.
  • Ahmedte1234 Member Posts: 3 Learner II
    How can I post a question in this forum? I really need help.
  • varunm1 Member Posts: 1,207 Unicorn
    Hello @Ahmedte1234,

    There is a big "Ask Question" button at the top right of this community window. If you click it, you can read some quick tips on posting a question. You need to provide a title for the question and a detailed description of your process and issue.

    Once you click it, a screen with three steps appears. Read those steps and provide a detailed explanation of your issue.

    Regards,
    Varun
    https://www.varunmandalapu.com/

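For the follow-up question about keeping only "2018" from "the task finished at the year 2018", a regular expression that matches digits directly avoids filtering out the words one by one. The sketch below is plain Python, not a RapidMiner operator; the helper names `extract_numbers` and `extract_years` and the 1900 to 2099 year range are assumptions to adjust for your data.

```python
import re

# Any run of digits, e.g. "8", "30", "2018".
NUMBER_RE = re.compile(r"\d+")

# Assumption: a "year" is a standalone four-digit number from 1900 to 2099.
# The \b anchors stop matches inside longer digit runs such as "12018".
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_numbers(text):
    """Hypothetical helper: return every digit run in order of appearance."""
    return NUMBER_RE.findall(text)


def extract_years(text):
    """Hypothetical helper: return only the digit runs that look like years."""
    return YEAR_RE.findall(text)


print(extract_numbers("the task finished at the year 2018"))  # ['2018']
print(extract_years("at 8 oclock in 2018, not 12018"))  # ['2018']
```

In RapidMiner, a token-filtering step with a regular-expression condition using the same pattern should behave similarly; check the operator documentation for the exact parameter names.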
