Sessionization of Conversions (or any single metric)

Sessionization can be broken out into two separate processes pre and post-sessionization; with pre = you are collecting session information upfront and attaching a sessionID to every record using a web-analytics shema to you.  This discussion will concentrate on post-Sessionization = your website does not collect session information, so you want to derive sessions after the fact.  It is not easy especially if you are trying to turn that into clickstream analysis. Vertica gives a good summary definition:

“Sessionization is a common analytic operation in clickstream analysis. Given an input clickstream table, where each row records a webpage click made by a particular user (or IP address), the sessionization operation identifies user’s web browsing sessions from the recorded clicks, by grouping the clicks from each user based on the time-intervals between the clicks. Conceptually, if two clicks from the same user are made too far apart in time (as defined by a time-out threshold), they will be treated as coming from two browsing sessions.”

Post-sessionization to create clickstream (page-pathing) data is hard – here is a good short video on it:
http://www.asterdata.com/resources/sql-mapreduce-casestudy-sessionization/index.php

However, post-sessionization for a single activity/metric, does not have to be hard.  For example, you can sessionize your most important metrics – such as clicks on affiliate-outClicks or advertisements, or whatever your conversion is.   This makes particular sense for affiliate-outClicks b/c you generally only need to set that cookie once per user, per affiliate.  Thereby, your core metric is CTR = clicks/visits, is actually best defined as (sessions w/ clicks)/visits.   In MySQL you can accomplish this like so:

set @userIdentifier = ”;
set @num = 1;
set @ClickDate = ’0000-00-00 00:00:00′;
set @extraVariable = ”;

SELECT
D.userIdentifier,
COUNT(D.clickId) as clicks,
sum(case when row_number = 1 or row_number = 0 then 1
else 0 end) as SessionClicks
FROM
(
select
ro.userIdentifier
,ro.clickDate
,ro.clickID
,@num := if ((@userIdentifier = ro.userIdentifier) and (TIMESTAMPDIFF(MINUTE,@clickDate, ro.created)) <= 30 and (@extraVariable = ro.extraVariable), @num + 1, 1) as row_number
,@userIdentifier := ro.userIdentifier as dummyuser
,@clickDate := ro.created as dummycreated
,@extraVariable := ro.site as dummysite
from
conversionsTable as ro
where
#and ro.userIdentifier is not null
Order By userIdentifier, created
) as D
group by 1,2
;

I need to read these again:

http://developer.teradata.com/extensibility/articles/sessionization-map-reduce-support-in-teradata

http://www.vertica.com/2010/10/04/sessionize-with-style-part-1/

 

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>