LogFileSource
Hi guys,
After I managed to import around 200,000 examples into RapidMiner, I noticed that the software transforms the logfile date into a numeric value. As I wanted to include descriptive statistics in the overall analysis, I searched for an operator that rebuilds the timestamp.
When I realized that there are only operators which transform date formats into numeric or nominal values, I got really upset. Does anybody know how to transform numeric date values into a readable timestamp? Please help me, as I don't want to use other open source logfile tools to analyse the files descriptively. I was hoping to create some basic operator chains for this task and reuse them in the future.
Thank you!
Answers
You can transform numerical attributes into date attributes in two steps using the Numerical2Polynominal and Nominal2Date operators. Here is a small example:
If you want to restrict these transformations to some individual attributes, you can achieve this using the AttributeSubsetPreprocessing operator. Extended example:
Best regards,
Ralf
Thank you for the fast reply. I tried to fix the problem, but it always stops with an error message saying that the value is unparseable.
I think it would be useful to post some details.
The log file entries look like this:
[01/Nov/2008:00:03:26 +0100]
and the configuration file looks this way :
<!-- Date and Time-->
<field class="org.polliwog.fields.DateTimeField"
openQuote="["
closeQuote="]">
<param id="format"
value="dd/MMM/yyyy:HH:mm:ss Z" />
</field>
After the LogFileSource operator is finished one date looks like this:
20424903
and it still looks the same after converting it with the Numerical2Polynominal operator.
I tried this configuration for the Nominal2Date operator:
attr. name : "time" <-- correct one ;D
date_type : "date_time"
date_format : dd'/'MM'/'yyyy':'hh':'mm':'ss' 'Z
time_zone : SYSTEM
locale : German <-- I am staying in Sweden right now, but the analysis is on logfiles of a mobile service used in the Alps
keep_old_attr. : unchecked
date_format : dd/MMM/yyyy:hh:mm:ss Z
or
date_format : dd/MMM/yyyy:HH:mm:ss Z
Best regards,
Ralf
Message: Cannot parse the data in line 1 for attribute time with the date format dd/MMM/yyyy:hh:mm:ss Z: Unparseable date: "20424903"
The operator chain looks like this:
Root
->operator chain
->LogFileSource
->AttributeSubsetPreprocessing
->Numerical2Polynominal
->Nominal2Date
...... a lot of other adjustments and filters ;D
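The error message above can be reproduced outside RapidMiner. The following is a minimal Python sketch, not a RapidMiner process: it uses the `strptime` equivalent of the pattern `dd/MMM/yyyy:HH:mm:ss Z`, and the helper name `try_parse` is made up for illustration. It shows that the pattern fits the raw log entry but cannot fit the number that LogFileSource hands on.

```python
from datetime import datetime

# strptime counterpart of the Java-style pattern dd/MMM/yyyy:HH:mm:ss Z
# (%b expects English month abbreviations in the default C locale)
LOG_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

def try_parse(text):
    """Return the parsed datetime, or None if text does not match LOG_FORMAT."""
    try:
        return datetime.strptime(text, LOG_FORMAT)
    except ValueError:
        return None

raw = try_parse("01/Nov/2008:00:03:26 +0100")  # the original log entry
converted = try_parse("20424903")              # the value after LogFileSource

print(raw)        # a timezone-aware datetime for 1 November 2008, 00:03:26
print(converted)  # None: a bare number contains no "/" or ":" to match
```

So the pattern itself is fine for the log file; it is simply being applied to a value that is no longer in that layout.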
The LogFileSource operator obviously already transforms the date string into a number, i.e. the number of milliseconds, seconds, or some other unit counted from a reference date. Hence the date_format parameter of the Nominal2Date operator should be:
date_format: S
or
date_format: s
or similar. For more information on the date and time format strings, please select the Nominal2Date operator in the process view of RapidMiner and press F1 for the help text.
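One way to narrow down the unit is to try each interpretation on a known value and see which one gives a plausible date. The sketch below is plain Python rather than a RapidMiner process, and it assumes the reference date is the Unix epoch (1970-01-01 00:00 UTC), which is an assumption and not something confirmed by the LogFileSource documentation. The function name `candidates` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed reference date: the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def candidates(value):
    """Interpret value as a count of different units since EPOCH."""
    return {
        "milliseconds": EPOCH + timedelta(milliseconds=value),
        "seconds": EPOCH + timedelta(seconds=value),
        "minutes": EPOCH + timedelta(minutes=value),
    }

# 20424903 is the value reported earlier in this thread.
for unit, moment in candidates(20424903).items():
    print(f"{unit:>12}: {moment.isoformat()}")
```

For the sample value, the milliseconds and seconds readings both fall in 1970, while the minutes reading gives 2008-10-31 23:03 UTC, which is 00:03 on 1 November 2008 at +0100.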
Best regards,
Ralf
Well, after I tried putting in just "s", "S", "m" ... it started to produce valid dates, but not the ones in the log file ... it's exhausting :-\
I will try other inputs. I already read through that info box but anyway, thank you for your help!
If you have any further information or idea, just come up with it.
Perhaps you could put the date format of mine into one of your test log files and delete the other entries to find a solution ?
But I think you have other things to do ... let's see if I find a solution during the weekend. If so, I will of course put the information into the forum
to be continued ...
Does anybody know exactly how the LogFileSource operator transforms the timestamp of a log file?
Is there any source code available to get an idea of what date_format (Nominal2Date operator value) should be chosen in order to rebuild the timestamp?
Thank you for any answers to these questions!
I am not sure, but I think I already solved the problem last Friday without realizing it.
As you can read in one of my last posts, I already tried the input "m" for date_format,
which should be the correct value for my log file timestamp format.
I think I was confused because the first entries of the transformed timestamps weren't the exact timestamps saved in the logfiles.
Well, this is only true for the first 35 rows of one sample logfile with around 600 rows. But I think a deviation of around 0.5% through
the transformation is still a good value to work with.
Ralf, thank you again for this very useful last hint of yours. The LogFileSource operator has transformed the timestamp
into a specific value:
the number of minutes that have to be added to 1970-01-01 00:00 in order to get, more or less, the actual timestamp.
I hope this information will help others to rebuild their timestamps!
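For anyone who wants to check this on the sample value from this thread, here is a small Python sketch (not a RapidMiner process). The function name `minutes_to_log_timestamp` and its `utc_offset_hours` parameter are made up for illustration, and it assumes the count starts at 1970-01-01 00:00 UTC.

```python
from datetime import datetime, timedelta, timezone

def minutes_to_log_timestamp(minutes, utc_offset_hours=1):
    """Rebuild a log-style timestamp from minutes since 1970-01-01 00:00 UTC.

    utc_offset_hours is the offset to display the result in; +1 matches the
    "+0100" in the sample log entry.
    """
    zone = timezone(timedelta(hours=utc_offset_hours))
    moment = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(minutes=minutes)
    return moment.astimezone(zone).strftime("%d/%b/%Y:%H:%M:%S %z")

print(minutes_to_log_timestamp(20424903))  # 01/Nov/2008:00:03:00 +0100
```

The original entry was 01/Nov/2008:00:03:26 +0100, so the date, hour and minute come back, but the seconds are lost because the value only has minute resolution.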
Best regards!
I don't want to open a new thread, so I will just ask another question here:
I want to create some basic operator chains to analyse log files in a descriptive way.
Has anybody already had a similar idea, using e.g. the OLAP Aggregation operator?
It would allow analysing log files by hits per day/month/year, along with many other kinds of analysis.
Has anybody already created some sample operator chains?
I would be grateful for any help in this area, as I am still a beginner with RapidMiner.
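To make the "hits per day/month/year" idea concrete, here is a rough Python sketch of the grouping logic. It is not a RapidMiner operator chain and says nothing about how the OLAP Aggregation operator is configured. The function name `hits_per` is made up, and all sample entries except the first are invented for illustration.

```python
from collections import Counter
from datetime import datetime

# strptime counterpart of dd/MMM/yyyy:HH:mm:ss Z
LOG_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

def hits_per(timestamps, key_format):
    """Count log entries per period; key_format picks the period (day, month, year)."""
    counts = Counter()
    for text in timestamps:
        moment = datetime.strptime(text.strip("[]"), LOG_FORMAT)
        counts[moment.strftime(key_format)] += 1
    return counts

# Invented sample data; only the first entry comes from this thread.
sample = [
    "[01/Nov/2008:00:03:26 +0100]",
    "[01/Nov/2008:14:10:02 +0100]",
    "[02/Nov/2008:09:45:51 +0100]",
    "[03/Dec/2008:18:20:00 +0100]",
]

print(hits_per(sample, "%Y-%m-%d"))  # hits per day
print(hits_per(sample, "%Y-%m"))     # hits per month
print(hits_per(sample, "%Y"))        # hits per year
```

The same pattern applies inside RapidMiner: first derive a day, month or year attribute from the date, then count examples grouped by that attribute.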
Hi community,
one more thing: does anybody know which unit of time the LogFileSource operator uses
for its default session length?
Thank you!
I guess the answer to your last question depends on what your server returns. I doubt the operator does more than read the parameter from the log file. The most common unit will be milliseconds, I think.
Greetings,
Sebastian
Well, the problem is that I don't know anything about the server configuration for this mobile service. The thing is that the LogFileSource operator works with a default "session_timeout" value of 400000.
I agree with you that it should be milliseconds, which gives a value of around 6.7 minutes. But what is the basis for this value? Is it some kind of average, taken from session lengths observed over a couple of years?
Well, I will just put in some other values for testing purposes. Let's see if there are other results in the end ...
Regards!
I doubt there was that much thought behind the default of this parameter. I guess it was more like: "Uhmm, we have milliseconds, let's take an hour, that makes in milliseconds, uhmmm, ok, now round it to 400000, looks nicer now."
And the session length does not depend on the server anyway. It more or less depends on the user and will differ from user to user. But since you don't know what a user does in one session and when it stops, you have to draw a line. So the default says: if the user didn't do anything for longer than the timeout, the next access is part of a new session.
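The "draw a line" rule can be sketched in a few lines of Python. This is a generic timeout-based session split, not the actual LogFileSource implementation. The unit (milliseconds), the function name `split_sessions` and the choice that a gap exactly equal to the timeout still belongs to the same session are all assumptions made for illustration.

```python
def split_sessions(access_times_ms, timeout_ms=400_000):
    """Group access times into sessions; a gap longer than timeout_ms starts a new one."""
    sessions = []
    for t in sorted(access_times_ms):
        if sessions and t - sessions[-1][-1] <= timeout_ms:
            sessions[-1].append(t)   # close enough to the previous access
        else:
            sessions.append([t])     # first access, or the gap exceeded the timeout
    return sessions

# 400000 ms expressed in minutes: roughly 6.67
print(400_000 / 60_000)

# Invented access times for one user, in milliseconds.
accesses = [0, 60_000, 300_000, 900_000, 950_000]
print(split_sessions(accesses))  # [[0, 60000, 300000], [900000, 950000]]
```

Trying a few different values for `timeout_ms` on real data shows how sensitive the number of sessions is to where the line is drawn.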
Greetings,
Sebastian