<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: The Great Web Data Capture Debate: Web Logs or JavaScript Tags?</title>
	<atom:link href="http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html/feed" rel="self" type="application/rss+xml" />
	<link>http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html</link>
	<description>Pluralitas non est ponenda sine necessitate.</description>
	<pubDate>Fri, 05 Dec 2008 18:00:59 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.5</generator>
		<item>
		<title>By: Edward Vielmetti</title>
		<link>http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-476074</link>
		<dc:creator>Edward Vielmetti</dc:creator>
		<pubDate>Thu, 20 Nov 2008 05:48:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-476074</guid>
		<description>Avinash, here from your reconciliation checklist.

http://www.kaushik.net/avinash/2008/11/ultimate-web-analytics-data-reconciliation-checklist.html

where you mention one source for problems is people running with "javascript turned off (2-3% typical)".  I think it's actually a bit more subtle than that.

With ad blockers it's fairly easy to configure which parts of a complex page load, and to turn off not just the display ads, but also to refuse to load the Google Analytics tracking javascript.  So your "turned off" reader might just be running silent, with most of the javascript working but the trackers gone.

Since all of that code runs in the user's browser, it's not beyond comprehension that someone suitably motivated could throw junk into the data stream as well, but I haven't seen code to do that yet.</description>
		<content:encoded><![CDATA[<p>Avinash, here from your reconciliation checklist.</p>
<p><a href="http://www.kaushik.net/avinash/2008/11/ultimate-web-analytics-data-reconciliation-checklist.html" rel="nofollow">http://www.kaushik.net/avinash/2008/11/ultimate-web-analytics-data-reconciliation-checklist.html</a></p>
<p>where you mention one source for problems is people running with &#8220;javascript turned off (2-3% typical)&#8221;.  I think it&#8217;s actually a bit more subtle than that.</p>
<p>With ad blockers it&#8217;s fairly easy to configure which parts of a complex page load, and to turn off not just the display ads, but also to refuse to load the Google Analytics tracking javascript.  So your &#8220;turned off&#8221; reader might just be running silent, with most of the javascript working but the trackers gone.</p>
<p>Since all of that code runs in the user&#8217;s browser, it&#8217;s not beyond comprehension that someone suitably motivated could throw junk into the data stream as well, but I haven&#8217;t seen code to do that yet.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: rakhi</title>
		<link>http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-400113</link>
		<dc:creator>rakhi</dc:creator>
		<pubDate>Tue, 08 Jan 2008 19:21:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-400113</guid>
		<description>I still did not get what is meant by 'search for omniture' on walmart.com</description>
		<content:encoded><![CDATA[<p>I still did not get what is meant by &#8216;search for omniture&#8217; on walmart.com</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Doug Watt</title>
		<link>http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-93470</link>
		<dc:creator>Doug Watt</dc:creator>
		<pubDate>Mon, 30 Apr 2007 21:35:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-93470</guid>
		<description>I don’t see why you think tagging separates the Data Serving and Data Capture groups when the site has to be tagged, unless the tagging you are talking about is really basic.  The huge advantage of packet sniffing (I prefer to call it passive data capture because it is a lot more than just capturing packets) is that it really does separate analytical data collection from the web site groups and the data can be used to feed into many different applications.  Passive data capture sees the datalink layer so it can give accurate timing on a page load even if the graphics are served by Akamai because there is a final acknowledgment to the page load.  It cleans, filters and sessionizes the data in real-time to give one clean log to load.  We have customers that exceed 100 million pageviews per day after we have filtered out all of the locally served graphics, stylesheets, robots, local and remote test tools, etc.  Many of the graphics are served by Akamai, but users know that and don’t need that in the analytics.  The argument about local caching is mainly theoretical.  The pages they really care about are never cached because they are the secure buying transactions and they have such a large statistical basis it doesn’t matter much anyway.  We are finding customers where the marketing guys love what they are getting from a tag solution but it’s only basic stuff and they want more.  The IT guys don’t want to do custom tagging.  The solution is passive data capture that can feed any log file analysis package or emulate a tag server to feed packages expecting data from tags.  See www.metronomelabs.com.</description>
		<content:encoded><![CDATA[<p>I don’t see why you think tagging separates the Data Serving and Data Capture groups when the site has to be tagged, unless the tagging you are talking about is really basic.  The huge advantage of packet sniffing (I prefer to call it passive data capture because it is a lot more than just capturing packets) is that it really does separate analytical data collection from the web site groups and the data can be used to feed into many different applications.  Passive data capture sees the datalink layer so it can give accurate timing on a page load even if the graphics are served by Akamai because there is a final acknowledgment to the page load.  It cleans, filters and sessionizes the data in real-time to give one clean log to load.  We have customers that exceed 100 million pageviews per day after we have filtered out all of the locally served graphics, stylesheets, robots, local and remote test tools, etc.  Many of the graphics are served by Akamai, but users know that and don’t need that in the analytics.  The argument about local caching is mainly theoretical.  The pages they really care about are never cached because they are the secure buying transactions and they have such a large statistical basis it doesn’t matter much anyway.  We are finding customers where the marketing guys love what they are getting from a tag solution but it’s only basic stuff and they want more.  The IT guys don’t want to do custom tagging.  The solution is passive data capture that can feed any log file analysis package or emulate a tag server to feed packages expecting data from tags.  See <a href="http://www.metronomelabs.com" rel="nofollow">http://www.metronomelabs.com</a>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Avinash Kaushik</title>
		<link>http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-22713</link>
		<dc:creator>Avinash Kaushik</dc:creator>
		<pubDate>Thu, 28 Dec 2006 21:57:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-22713</guid>
		<description>GMAC88: The statement was correct; perhaps I was tardy in providing full context. My reference to the Wal-Mart database solution was about their multi-petabyte back end that runs all of their logistics, ERP and business intelligence systems (nothing to do with the web or web analytics).

It has been fairly well established that one of Wal-Mart's core strategic advantages is its IT system and the core essence of that IT system sits on a custom built database (not Oracle or Sybase or DB2 etc). I was referring to that but clearly from your comment I should have been more expansive.

Thanks for the comment.

Avinash
PS: I did follow your instructions and did a search for Omniture on walmart.com, but this is what I get:

&lt;img src="http://www.kaushik.net/avinash/wp-content/uploads/2006/12/walmart-omniutre.png"&gt;

I am kidding of course, I know what you meant when you said "search for Omniture"! :)</description>
		<content:encoded><![CDATA[<p>GMAC88: The statement was correct; perhaps I was tardy in providing full context. My reference to the Wal-Mart database solution was about their multi-petabyte back end that runs all of their logistics, ERP and business intelligence systems (nothing to do with the web or web analytics).</p>
<p>It has been fairly well established that one of Wal-Mart&#8217;s core strategic advantages is its IT system and the core essence of that IT system sits on a custom built database (not Oracle or Sybase or DB2 etc). I was referring to that but clearly from your comment I should have been more expansive.</p>
<p>Thanks for the comment.</p>
<p>Avinash<br />
PS: I did follow your instructions and did a search for Omniture on walmart.com, but this is what I get:</p>
<p><img src="http://www.kaushik.net/avinash/wp-content/uploads/2006/12/walmart-omniutre.png"/></p>
<p>I am kidding of course, I know what you meant when you said &#8220;search for Omniture&#8221;! :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: gmac88</title>
		<link>http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-22707</link>
		<dc:creator>gmac88</dc:creator>
		<pubDate>Thu, 28 Dec 2006 21:04:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-22707</guid>
		<description>Hi there,

Your statement that Wal-Mart built their own data collection solution, or the allusion that they use an in-house solution, is incorrect.  They do have their own data warehouse where they also do combined primary and secondary analysis for brick and mortar and tertiary analysis for websites, but they do use a 3rd party vendor to do the primary website data collection, processing, and analysis.

Your Statement: "Often this is a easy choice to make of any company that considers its core competency to be to focus on its business and not developing web analytics solutions (though admittedly if you are Wal-Mart you can absolutely do that - for example they have invented their own database solution since nothing in the world can meet their size and scale)."

Wal-Mart uses Omniture to track and collect data for all their properties including their Windows Media Player music site.  Do a search for "Omniture" on Wal-Mart.com and you will see their tags.</description>
		<content:encoded><![CDATA[<p>Hi there,</p>
<p>Your statement that Wal-Mart built their own data collection solution, or the allusion that they use an in-house solution, is incorrect.  They do have their own data warehouse where they also do combined primary and secondary analysis for brick and mortar and tertiary analysis for websites, but they do use a 3rd party vendor to do the primary website data collection, processing, and analysis.</p>
<p>Your Statement: &#8220;Often this is a easy choice to make of any company that considers its core competency to be to focus on its business and not developing web analytics solutions (though admittedly if you are Wal-Mart you can absolutely do that - for example they have invented their own database solution since nothing in the world can meet their size and scale).&#8221;</p>
<p>Wal-Mart uses Omniture to track and collect data for all their properties including their Windows Media Player music site.  Do a search for &#8220;Omniture&#8221; on Wal-Mart.com and you will see their tags.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sébastien Brodeur</title>
		<link>http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-22600</link>
		<dc:creator>Sébastien Brodeur</dc:creator>
		<pubDate>Thu, 28 Dec 2006 14:03:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-22600</guid>
		<description>Steve,

I agree, filtering is needed when working with log files.  But we already remove bots, CSS, XML and images from the logs, and it's still 800 MB of data each day.  I'm not saying that javascript is better than log data mining, I'm just saying that sometimes log data mining is not worth the trouble. It all depends on the goal of the site.

One thing log data mining will never be replaced by javascript tags for is analyzing load on the server.  But even this can be replaced by a product like Coradiant (www.coradiant.com)</description>
		<content:encoded><![CDATA[<p>Steve,</p>
<p>I agree, filtering is needed when working with log files.  But we already remove bots, CSS, XML and images from the logs, and it&#8217;s still 800 MB of data each day.  I&#8217;m not saying that javascript is better than log data mining, I&#8217;m just saying that sometimes log data mining is not worth the trouble. It all depends on the goal of the site.</p>
<p>One thing log data mining will never be replaced by javascript tags for is analyzing load on the server.  But even this can be replaced by a product like Coradiant (www.coradiant.com)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Theresa Locklear</title>
		<link>http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-22088</link>
		<dc:creator>Theresa Locklear</dc:creator>
		<pubDate>Tue, 26 Dec 2006 15:48:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-22088</guid>
		<description>Great post. On combo-solutions: There are several WA tools available that allow for an integration of both page tags (javascript) AND log file analysis. This encompasses the best of both worlds as you can set up rules to determine what should be interpreted as page views, clicks, exits, impressions, file downloads, form entries (with specific info grabbed from the UI), etc... Robot filters can be used, historic log data can be integrated, cache busting can be used, web 2.0 tags can be implemented, and pages with inaccurate tags (for let's say page name) can still be evaluated. 

These solutions are usually not any more expensive to personnel time or $$$ than the drop and go page tag solutions out there.  

-Theresa</description>
		<content:encoded><![CDATA[<p>Great post. On combo-solutions: There are several WA tools available that allow for an integration of both page tags (javascript) AND log file analysis. This encompasses the best of both worlds as you can set up rules to determine what should be interpreted as page views, clicks, exits, impressions, file downloads, form entries (with specific info grabbed from the UI), etc&#8230; Robot filters can be used, historic log data can be integrated, cache busting can be used, web 2.0 tags can be implemented, and pages with inaccurate tags (for let&#8217;s say page name) can still be evaluated. </p>
<p>These solutions are usually not any more expensive to personnel time or $$$ than the drop and go page tag solutions out there.  </p>
<p>-Theresa</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: steve</title>
		<link>http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-21045</link>
		<dc:creator>steve</dc:creator>
		<pubDate>Sat, 23 Dec 2006 05:14:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-21045</guid>
		<description>Sébastien;

who said it isn't 1Gb a day? :-)

Can I humbly suggest you're approaching the problem in the wrong way, urm, twice. :-)
What percentage of that 1Gb is relevant to clickstream? What percentage is robots and other gumpf.
Do upfront filtering of both and you have probably reduced 1Gb to 50Mb. And if you can't clickstream that....
Practically every dedicated loganalysis tool I've ever seen and/or used does a *really* poor job of prefiltering efficiently.
There are stock tools that do it far better. egrep, gawk and perl, for example, are far more efficient at pre-filtering. Use multi-cpu machines such that all are working on part of the problem. The pre-filter stages will also help with IO latency. Especially if you leave the logs compressed. IO latency is a killer for big log analysis.


eg. I do all my analysis on a HT 3Ghz dual Xeon IBMx235 server. I can process a full years worth of logs in about an hour or so. Multiple staged filtering, all 4 effective CPU's working their butts off, disks ticking along nicely.
Worst case, do the filtering on multiple machines and stage it that way. Easy!
Hardware is cheap - your time is $$$$$.

Another trick - get an OS that can run in memory file systems. eg /dev/shm under linux or /tmp on Solaris. Stage your files into that. Thus you end up processing the files in memory. IO latency goes to zippo. Especially, if you build your filter chain/pipeline, such that the next log is there ready and waiting before anything needs it.


Which brings me to the other issue: Re-analysis is not as immediate, usually, as day to day analysis. So it doesn't matter if it runs overnight. Or longer.
I don't reanalyse 4+ years of data every day. Maybe 4-8 times a year. I did easily double that in two weeks at the start of 2006. It varies.


If your structure/layout changes, is clickstream even worth redoing? It's no longer relevant any more. The value is not so much in the clickstream analysis anymore but elsewhere.
So that part, IMHO, becomes moot anyway.

Oh I agree the world is not black and white. It's in colour! ;-)


But the key message I, probably badly, tried to put across:
Don't ignore any source of data. Cross check, verify, recheck those assumptions. It's too easy to get lulled into a false sense of security in this game.


Javascript Tagging, and here's where I do agree with Avinash, is probably more than good enough for most people who genuinely care about how their site is used. But when *I* check my personal site (vs work's), it's next to useless. Misses huge swathes of folk. And that percentage changes too. 

I'd have to ask him, it should be fairly obvious, but I'd betcha I'm invisible to any non-log analysis Avinash runs here. Why? I run the noscript &#38; adblock plugins in Firefox. Got fed up with all the javascript rubbish that too many sites throw my way. And if a site totally breaks without javascript. Shrug, plenty of others that don't. Not my loss. Very few indeed are the web sites that have content compelling enough for me to care.

Evil Grin: I know one site that switched on javascript tagging of the ... conversion step you'd call it. To better track it and such. *BIG* site. 1Gb a day? Try per half hour if not more. ;-)
Well they hadn't really looked at how their users actually used that conversion step and immediately alienated a huge chunk of their customers. Big Mistake. They reverted that change *very* quickly.
This was about 4-8 months ago from memory.


Todd,
makes some excellent points. At work I actually use about 6 additional logging subsystems as adjuncts to the base Apache logging. It varies. Some capture App specific stuff, others search engine (internal) and so on.
As they are all different, they assist with cross checks - bug hunting and so on. Impact is insignificant.
Pure Gold tho for getting different views on the system as a whole. And that's the key here. It's not a webserver, or even a webserver farm. It's a system! that just so happens to serve webpages as its principal function.

Cheers!
- Steve</description>
		<content:encoded><![CDATA[<p>Sébastien;</p>
<p>who said it isn&#8217;t 1Gb a day? :-)</p>
<p>Can I humbly suggest you&#8217;re approaching the problem in the wrong way, urm, twice. :-)<br />
What percentage of that 1Gb is relevant to clickstream? What percentage is robots and other gumpf.<br />
Do upfront filtering of both and you have probably reduced 1Gb to 50Mb. And if you can&#8217;t clickstream that&#8230;.<br />
Practically every dedicated loganalysis tool I&#8217;ve ever seen and/or used does a *really* poor job of prefiltering efficiently.<br />
There are stock tools that do it far better. egrep, gawk and perl, for example, are far more efficient at pre-filtering. Use multi-cpu machines such that all are working on part of the problem. The pre-filter stages will also help with IO latency. Especially if you leave the logs compressed. IO latency is a killer for big log analysis.</p>
<p>eg. I do all my analysis on a HT 3Ghz dual Xeon IBMx235 server. I can process a full years worth of logs in about an hour or so. Multiple staged filtering, all 4 effective CPU&#8217;s working their butts off, disks ticking along nicely.<br />
Worst case, do the filtering on multiple machines and stage it that way. Easy!<br />
Hardware is cheap - your time is $$$$$.</p>
<p>Another trick - get an OS that can run in memory file systems. eg /dev/shm under linux or /tmp on Solaris. Stage your files into that. Thus you end up processing the files in memory. IO latency goes to zippo. Especially, if you build your filter chain/pipeline, such that the next log is there ready and waiting before anything needs it.</p>
<p>Which brings me to the other issue: Re-analysis is not as immediate, usually, as day to day analysis. So it doesn&#8217;t matter if it runs overnight. Or longer.<br />
I don&#8217;t reanalyse 4+ years of data every day. Maybe 4-8 times a year. I did easily double that in two weeks at the start of 2006. It varies.</p>
<p>If your structure/layout changes, is clickstream even worth redoing? It&#8217;s no longer relevant any more. The value is not so much in the clickstream analysis anymore but elsewhere.<br />
So that part, IMHO, becomes moot anyway.</p>
<p>Oh I agree the world is not black and white. It&#8217;s in colour! ;-)</p>
<p>But the key message I, probably badly, tried to put across:<br />
Don&#8217;t ignore any source of data. Cross check, verify, recheck those assumptions. It&#8217;s too easy to get lulled into a false sense of security in this game.</p>
<p>Javascript Tagging, and here&#8217;s where I do agree with Avinash, is probably more than good enough for most people who genuinely care about how their site is used. But when *I* check my personal site (vs work&#8217;s), it&#8217;s next to useless. Misses huge swathes of folk. And that percentage changes too. </p>
<p>I&#8217;d have to ask him, it should be fairly obvious, but I&#8217;d betcha I&#8217;m invisible to any non-log analysis Avinash runs here. Why? I run the noscript &amp; adblock plugins in Firefox. Got fed up with all the javascript rubbish that too many sites throw my way. And if a site totally breaks without javascript. Shrug, plenty of others that don&#8217;t. Not my loss. Very few indeed are the web sites that have content compelling enough for me to care.</p>
<p>Evil Grin: I know one site that switched on javascript tagging of the &#8230; conversion step you&#8217;d call it. To better track it and such. *BIG* site. 1Gb a day? Try per half hour if not more. ;-)<br />
Well they hadn&#8217;t really looked at how their users actually used that conversion step and immediately alienated a huge chunk of their customers. Big Mistake. They reverted that change *very* quickly.<br />
This was about 4-8 months ago from memory.</p>
<p>Todd,<br />
makes some excellent points. At work I actually use about 6 additional logging subsystems as adjuncts to the base Apache logging. It varies. Some capture App specific stuff, others search engine (internal) and so on.<br />
As they are all different, they assist with cross checks - bug hunting and so on. Impact is insignificant.<br />
Pure Gold tho for getting different views on the system as a whole. And that&#8217;s the key here. It&#8217;s not a webserver, or even a webserver farm. It&#8217;s a system! that just so happens to serve webpages as its principal function.</p>
<p>Cheers!<br />
- Steve</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Todd Chenard</title>
		<link>http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-20905</link>
		<dc:creator>Todd Chenard</dc:creator>
		<pubDate>Fri, 22 Dec 2006 17:50:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-20905</guid>
		<description>I wholeheartedly agree with JavaScript tagging for core tracking.  But there are limitations with JS tagging in the volume of data you can collect (e.g. the IE URL 2083 char limit) and in many cases the type of reporting you can create out of your web analytics solution.  Another alternative besides standard Apache/IIS web logs is custom application logging.  The big drawback here is that you need IT resources to build the custom application logging and load it into a data warehouse.  Application logging allows our company to capture 20-30 different variables per business event.  Our JS tagging can't handle the combination of that many variables and their sizes.  Application logging has also allowed us to analyze search engine crawlers and other unwanted automated activity with wonderful variable depth.  We do have unique identifiers in place that allow us to tie our JS tagging solution back to our application logging at the individual business event level.  Our JS tagging solution is effectively a subset of our application logging in regards to custom variables, but the incremental benefit of JS tagging is that it provides our UV, referrer, and other standard web metric counts (all the goodies you get out of the box with a web analytics provider).  One more thing, I can't tell you how many times we have cross-validated between both tracking systems due to 'bugs' in the data capture in one or both systems.</description>
		<content:encoded><![CDATA[<p>I wholeheartedly agree with JavaScript tagging for core tracking.  But there are limitations with JS tagging in the volume of data you can collect (e.g. the IE URL 2083 char limit) and in many cases the type of reporting you can create out of your web analytics solution.  Another alternative besides standard Apache/IIS web logs is custom application logging.  The big drawback here is that you need IT resources to build the custom application logging and load it into a data warehouse.  Application logging allows our company to capture 20-30 different variables per business event.  Our JS tagging can&#8217;t handle the combination of that many variables and their sizes.  Application logging has also allowed us to analyze search engine crawlers and other unwanted automated activity with wonderful variable depth.  We do have unique identifiers in place that allow us to tie our JS tagging solution back to our application logging at the individual business event level.  Our JS tagging solution is effectively a subset of our application logging in regards to custom variables, but the incremental benefit of JS tagging is that it provides our UV, referrer, and other standard web metric counts (all the goodies you get out of the box with a web analytics provider).  One more thing, I can&#8217;t tell you how many times we have cross-validated between both tracking systems due to &#8216;bugs&#8217; in the data capture in one or both systems.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Avinash Kaushik</title>
		<link>http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-20902</link>
		<dc:creator>Avinash Kaushik</dc:creator>
		<pubDate>Fri, 22 Dec 2006 17:39:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-20902</guid>
		<description>&lt;strong&gt;SEO Portal:&lt;/strong&gt; It is important to stress that your unique needs might mean a different solution, that is perfectly ok. 

As regards combo solutions, if there are resources and tools that make it easy for you to do it then that's absolutely great. Usually though with finite resources (people and $$$) and a need for speed it might make sense to simplify and make a single choice.

You are right on downloads (Dr. Turner also rightly added robots and vendor independence as benefits for logs), they will only be in logs though there is very finite data you can get about download from a log file.  Many companies are starting to use Akamai Download Manager type apps, in which case there is deep and rich information about the downloads that one can get from there. IMHO the information you'll get there is actually business actionable (repeated attempts, tie to a user, aborts etc etc) which we simply can't get from a log file. 

Not every company will / can use a Download Manager in which case if downloads reporting is important log files are a good place to be.

&lt;strong&gt;Dr. Turner: &lt;/strong&gt;Absolutely agree with the sentiment you have expressed in your comment. That's it. No more comment from me! :)

Thanks,

Avinash.</description>
		<content:encoded><![CDATA[<p><strong>SEO Portal:</strong> It is important to stress that your unique needs might mean a different solution, that is perfectly ok. </p>
<p>As regards combo solutions, if there are resources and tools that make it easy for you to do it then that&#8217;s absolutely great. Usually though with finite resources (people and $$$) and a need for speed it might make sense to simplify and make a single choice.</p>
<p>You are right on downloads (Dr. Turner also rightly added robots and vendor independence as benefits for logs), they will only be in logs though there is very finite data you can get about download from a log file.  Many companies are starting to use Akamai Download Manager type apps, in which case there is deep and rich information about the downloads that one can get from there. IMHO the information you&#8217;ll get there is actually business actionable (repeated attempts, tie to a user, aborts etc etc) which we simply can&#8217;t get from a log file. </p>
<p>Not every company will / can use a Download Manager in which case if downloads reporting is important log files are a good place to be.</p>
<p><strong>Dr. Turner: </strong>Absolutely agree with the sentiment you have expressed in your comment. That&#8217;s it. No more comment from me! :)</p>
<p>Thanks,</p>
<p>Avinash.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sébastien Brodeur</title>
		<link>http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-20882</link>
		<dc:creator>Sébastien Brodeur</dc:creator>
		<pubDate>Fri, 22 Dec 2006 14:26:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-20882</guid>
		<description>Steve,

When the size of your log file is 1 gig per day, tell me how you will be able to calculate (in a finite time) clickstream data going back to 2002?

Also, my web site structure happens to change over time, making clickstream data analysis difficult.

I still believe the world is made of shades of gray, not black or white.</description>
		<content:encoded><![CDATA[<p>Steve,</p>
<p>When the size of your log file is 1 gig per day, tell me how you will be able to calculate (in a finite time) clickstream data going back to 2002?</p>
<p>Also, my web site structure happens to change over time, making clickstream data analysis difficult.</p>
<p>I still believe the world is made of shades of gray, not black or white.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Friday&#8217;s Internet Marketing News Roundup &#124; Marketing Pilgrim</title>
		<link>http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-20881</link>
		<dc:creator>Friday&#8217;s Internet Marketing News Roundup &#124; Marketing Pilgrim</dc:creator>
		<pubDate>Fri, 22 Dec 2006 14:24:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-20881</guid>
		<description>[...] This will likely be the last news post until after Christmas. Here’s what’s caught my attention today.

1. Avinash Kaushik discusses the merits of javascript analytics over web log files. [...]</description>
		<content:encoded><![CDATA[<p>[...] This will likely be the last news post until after Christmas. Here’s what’s caught my attention today.</p>
<p>1. Avinash Kaushik discusses the merits of javascript analytics over web log files. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: SEO Portal</title>
		<link>http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-20859</link>
		<dc:creator>SEO Portal</dc:creator>
		<pubDate>Fri, 22 Dec 2006 13:16:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-20859</guid>
		<description>But what about the combination of the two?
How do you track your downloads without using logfiles?</description>
		<content:encoded><![CDATA[<p>But what about the combination of the two?<br />
How do you track your downloads without using logfiles?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Julien Coquet</title>
		<link>http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-20840</link>
		<dc:creator>Julien Coquet</dc:creator>
		<pubDate>Fri, 22 Dec 2006 11:22:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-20840</guid>
		<description>Hi Avinash, great read as usual :D

Regarding the comments you made on the pros and cons of Javascript and the implied tagging implementation requirements and constraints, I'd like to point to an article by Bruce Tate, CTO of GoodWell, which he recently posted on the IBM community site.

In a nutshell, he describes how it is increasingly difficult to ignore or rule out Javascript in this Web 2.0 age.

Basically, if you're going to track user behavior using Javascript, you might as well leverage that implementation to track your overall Web traffic ;)

&lt;a href="http://www-128.ibm.com/developerworks/java/library/j-cb12196/?ca=dgr-lnxw07Javascript-Respect" rel="nofollow"&gt;Here is the link to the article&lt;/a&gt;

Cheers,

Julien</description>
		<content:encoded><![CDATA[<p>Hi Avinash, great read as usual :D</p>
<p>Regarding the comments you made on the pros and cons of Javascript and the implied tagging implementation requirements and constraints, I&#8217;d like to point to an article by Bruce Tate, CTO of GoodWell, which he recently posted on the IBM community site.</p>
<p>In a nutshell, he describes how it is increasingly difficult to ignore or rule out Javascript in this Web 2.0 age.</p>
<p>Basically, if you&#8217;re going to track user behavior using Javascript, you might as well leverage that implementation to track your overall Web traffic ;)</p>
<p><a href="http://www-128.ibm.com/developerworks/java/library/j-cb12196/?ca=dgr-lnxw07Javascript-Respect" rel="nofollow">Here is the link to the article</a></p>
<p>Cheers,</p>
<p>Julien</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Stephen Turner</title>
		<link>http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-20829</link>
		<dc:creator>Stephen Turner</dc:creator>
		<pubDate>Fri, 22 Dec 2006 10:40:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.kaushik.net/avinash/2006/12/the-great-web-data-capture-debate-web-logs-or-javascript-tags.html#comment-20829</guid>
		<description>Great post, Avinash. I've always believed that the choice is much less important than people think. There are advantages of Javascript tags (e.g., avoids caching problems, easier to collect additional variables) and advantages of logfiles (e.g., contains search engine robot activity, absence of vendor lock-in), and we can argue (or worry) about them all day. But in the end, a good analyst can get excellent actionable data from either technology. When you start segmenting your data, which is really at the heart of web analytics, then you're comparing the relative numbers from two groups of visitors, and whether you're 10% above or 10% below some unknowable "true" figure seems much less important.</description>
		<content:encoded><![CDATA[<p>Great post, Avinash. I&#8217;ve always believed that the choice is much less important than people think. There are advantages of Javascript tags (e.g., avoids caching problems, easier to collect additional variables) and advantages of logfiles (e.g., contains search engine robot activity, absence of vendor lock-in), and we can argue (or worry) about them all day. But in the end, a good analyst can get excellent actionable data from either technology. When you start segmenting your data, which is really at the heart of web analytics, then you&#8217;re comparing the relative numbers from two groups of visitors, and whether you&#8217;re 10% above or 10% below some unknowable &#8220;true&#8221; figure seems much less important.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
