Sunday, January 03, 2010

Google AdSense Fraud

As I mentioned in my last post, we stopped advertising through Google’s Content Network very soon after we started experimenting with AdSense, because we thought we had detected significant fraud. We had not spent much money by this time, so the amounts involved were relatively small (less than £100), but the fraction was high. We considered at least 1/3 of our clicks to be fraudulent: deliberately, criminally fraudulent.

Proud of our forensic IT skills, we rushed to Do-No-Evil Google to report our discovery. Our report contained the bare minimum of facts; we were so convinced it screamed fraud we did not bother to go into much more detail than the websites and the Click Through Rates (CTRs). After a week we had a polite and considered response that attempted to persuade us that this was not fraud. Clearly more evidence was required, so we put together a verbose description of the main points that alerted us, and sent it off in expectation of an apology if not a seat on the board for our fraud-busting smarts.

Sadly they refused / failed to be convinced and eventually we had to agree to disagree. Obviously we will continue to use Google for advertising – what choice is there for an online business? But we firmly believe they need to put their house in order. If 1/3 of their revenue is fraudulent they will lose consumer confidence and possibly face sanctions for complicity.

Below I have included the email chain. The name of the operative has been removed as the problem is systemic, not personal. The key point, and one I should have put front-and-center (in CAPS perhaps?), is under Red Flag 2 – Set 2: namely, that a badly named, unlinked, parked-domain website achieved ten times the CTR of Google’s own homepage.

I would very much like to hear from anyone with similar experiences, or from anyone (at Google or otherwise) who disagrees with any of the points we made. Our transaction volume is small and our market is niche, so I am aware we need more data to draw a statistically significant conclusion.

Initial Inquiry

[From Rupert to Google on 5/11/09]

Nearly *one third* of all content-based CPC placements were obvious click fraud.

Many of the sites have exactly the same content and have a CTR of 100% (or greater in one case!).

I find it incredible that the market leader cannot spot something like this automatically. The fraudsters are not even trying to disguise it.

[Information about the site URLs and the fraud period]

Google’s Response

Hello Rupert,

We have received your request for an invalid clicks investigation. Thank you for your patience while we reviewed your account. I apologize for our delayed response. I understand you are concerned about the quality of clicks you have accrued from certain sites in our content network.

We reviewed your account and can confirm clicks from these sites. However, we found that these clicks are valid, and there is no activity that suggests you have been charged for invalid clicks. The clicks charged fit a pattern of normal user behaviour. As part of our review, the team looked through dozens of data points--including IP addresses, IP blocks, geographic concentrations, network activity, browser patterns, click timings, and any proprietary signals. However, none of those suggest an automated attack, nor collusion from unethical users. The clicks accrued reflect normal user traffic.

Many of the sites that you listed are parked domain sites. A parked domain site is an undeveloped web page belonging to a domain name registrar or domain name holder. Our AdSense for domains programme places targeted AdWords ads on parked domain sites that are part of the Google Network.

Users are brought to parked domain sites when they enter the URL of an undeveloped web page in a browser's address bar.

We've found that AdWords ads displayed on parked domain sites receive clicks from well-qualified leads within the advertisers' markets. In general, we've noticed that the return on investment gained on these pages is equal to or better than that gained on other pages in the search and content networks. However, if you aren't satisfied with the value of the traffic, you can prevent your ads from showing on parked domain sites by using the Site and Category Exclusion tool. Learn how at https://adwords.google.com/support/bin/answer.py?answer=86695&hl=en_GB.

I hope that this information helps address your concern. Please let me assure you that your security is a top priority for Google, and we will continue to monitor all clicks on your ads to prevent abuse. Let us know if you have further questions or if we can be of any more assistance. For more information about steps we take to combat invalid click activity, please visit https://adwords.google.com/support/bin/answer.py?answer=6114&hl=en_US.

Sincerely,

<Google Employee>

The Ad Traffic Quality Team

Rupert’s More Detailed Description

Hi <Google Employee>,

Thank you for getting back to me. I'm afraid I am still doubtful of the validity of these sites. Allow me to illustrate my concerns with some examples:

Red Flag 1

There are 10 sites in the list with a 100% CTR (radiolluvia.com even has 200%):

Domain            Clicks  Impressions  CTR
radiolluvia.com   2       1            200.00%
net-ebooks.com    1       1            100.00%
umtsfree.net      1       1            100.00%
littleabout.com   1       1            100.00%
mtncareer.com     1       1            100.00%
jonefm.com        1       1            100.00%
radiobendele.com  1       1            100.00%
iphalloween.info  1       1            100.00%
pdfee.com         1       1            100.00%
rf-online.com     1       1            100.00%
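For anyone who wants to repeat this check on their own placement report, here is a minimal Python sketch. The click and impression counts are the ones in the table above; the function names and the 100% threshold are my own illustrative choices, not anything taken from Google's tools.

```python
# Clicks and impressions per domain, copied from the table above.
placements = {
    "radiolluvia.com": (2, 1),
    "net-ebooks.com": (1, 1),
    "umtsfree.net": (1, 1),
    "littleabout.com": (1, 1),
    "mtncareer.com": (1, 1),
    "jonefm.com": (1, 1),
    "radiobendele.com": (1, 1),
    "iphalloween.info": (1, 1),
    "pdfee.com": (1, 1),
    "rf-online.com": (1, 1),
}


def ctr(clicks, impressions):
    """Click-through rate as a percentage."""
    return 100.0 * clicks / impressions


# Hypothetical threshold: flag any placement whose CTR is 100% or more.
at_or_above_full = sorted(
    domain
    for domain, (clicks, impressions) in placements.items()
    if impressions > 0 and ctr(clicks, impressions) >= 100.0
)

# More clicks than impressions should not be possible for honest traffic.
impossible = [
    domain
    for domain, (clicks, impressions) in placements.items()
    if clicks > impressions
]

print(len(at_or_above_full), "placements at or above 100% CTR")
print("More clicks than impressions:", impossible)
```

With a single impression, a CTR can only be 0% or 100%, so one row on its own proves little; it is ten such rows in one short report that made us suspicious.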

None of them contain any relevant content (which would be fine - I understand AdSense can never be a science), but most of them consist only of AdSense links. Are you really suggesting that users happen upon a parked domain by typing the URL above into the browser and happen to be in the market for radio planning software (which is what we make)?

We don’t make a mass-market product so I would never expect a high CTR on content networks even when targeting radio engineers – only a minority of them are even actively seeking our type of tool.

Red Flag 2

There are at least three sets of pages that contain practically the same content and come from the same IP subrange.

Set 1: gsmsandwich.com, jonefm.com, keonong.com, mtncareer.com, rf-indo.com, smsgupsup.com. They all look like this:

[Screenshot: Set 1]

Set 2: umtsfree.net, xlgprs.net, xlgprs.com, ir-hot.com. They all look like this:

[Screenshot: Set 2]

I find it unlikely that in the space of one week, four people were using these sites as some sort of search or index portal (have they not heard of Google? :-)) and either searched or browsed their way to our advert and found it relevant enough to click on. This was after only 254 impressions. Contrast this with Google’s own sponsored search results, which yielded only three clicks for 2,051 impressions during the same period. If you really believe these statistics are true, surely you should buy this company immediately because they are 10 times better at advert placement than you are. Perhaps you should consider a smiling coed on the homepage?
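As a quick check on the "10 times" figure, here is the arithmetic as a short Python sketch. It assumes the four people mentioned above each account for exactly one click on the 254 impressions, and it treats the three sponsored search hits as three clicks; the variable names are purely illustrative.

```python
def ctr(clicks, impressions):
    """Click-through rate as a fraction between 0 and 1."""
    return clicks / impressions


# Assumption: four people means four clicks on the Set 2 sites.
parked_ctr = ctr(4, 254)
# Sponsored search results over the same period.
search_ctr = ctr(3, 2051)

ratio = parked_ctr / search_ctr

print(f"Set 2 parked domains: {parked_ctr:.2%}")  # 1.57%
print(f"Sponsored search:     {search_ctr:.2%}")  # 0.15%
print(f"Ratio:                {ratio:.1f}x")       # 10.8x
```

Both samples are small, so the exact ratio should not be taken too seriously, but the order of magnitude is what matters here.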

Set 3: gamezerm.com and radiobendele.com both have the same IP and look the same:

[Screenshot: Set 3]

Can it be a coincidence that they have high CTR (50% and 100% respectively)?

In general, all of the sites in these sets have dubious registration details, often using the same registration anonymity service.

Red Flag 3

The click-through path is curious for the links from these sites. For instance, the umtsfree.net site links go through five 302 redirects before landing at the intended target. Here is the chain of URLs:

http://umtsfree.net/forward302.aspx?epc=eWpDPeDkCn7%2fPAWjJDHHizukuSQO4Z0sDU8KfdC9FQ%2b5yWjWwcxv5hXcA5nQpS0OqiEn07sYHTHFe%2fX9vFjDwUcSW01%2bS4WPEL0m7%2fX3z100tRxVe1Mg2zzaXK862vPp7hIJBvAoVV9DPRmnuG%2fkV0w5tbowrxB4AbcTtxa0Bsr%2fCztN7vTUOE0hGYneCC9V5jEY3PRhY5SAeWBCuCp7NzUBODKuSrYrmWbY4g3PHs9mBH08pqUSaY75VuOBggtVC6D5WjIQEZuFNJS10GQ6Bu%2f0JpRTb3xpAWZf4bPOguFyT3zwx6udcQe031GVCTob%2bAk5n3HzuAg2AOTMKncWxG%2bPl6vLUW4DWYQil2ZmY2ILRGYWgOHHAfIlNM1AHowYkUvb%2bBrYHbEQgD6PTID5%2fuaj3OsxHAwLVlhrL3uuu1S3zI4g9mOUab2fnM8yr%2brQmzu2a6UmflkA8s6PaAElxBNg9kZnshsusDIleugD02G6c%2bCTybRaQ0D1IYTOcfyeJLFDejgK2GqObGWs9Nm6J32886U0STHIAz75%2f%2b2snbAtAJQT48cwhAH%2fNQ%2fpaJiHkSON1cIxd7oFroekPJ8iyDhbYZ3VP1TJ0Z7HoHj6HKjeDaemj6LFb1Le0uwGKeLe%2bKc6LxdhYBtjD%2bXnGi0LkIajkmbqWe73rvyLNoRhtd1SsqgDT7wxMUBmPIaisppXnfp%2b8%2ftjiL8R9LU6QFh%2f8%2b4aTQGsrMOTyj55aZ4hx8l3UJck6utVoeVax7%2bOQACwyLYoyyvI4ml9DVsz%2f9Mh4WnmfFdVgPEqIxmJ%2bwnNIlKzX31GePRuLmgLHdeItkfnMPnUWIA8FB485RmGfrEwbF07d7v5JLcYuL4V62CKyW8dl3m89EvsbxcbOTOQN85CQsa5fKdgUZaZ2j1kFgi7Oj9J3MJ10oCQ5OjkZnoBWXZPFsfGxQTxg%2fGxh9k18jep%2b5sesYVB1jRYuej3ptGZBGivoDvEkFR%2fpxCbqB494irMYSWLmmx8c%2frOVZYeIe3XV9P7cBJ4da%2bcrLgJneN2nhKCOX0BDZsw%2bR1L93vZ6LgjgvFgolOFTVpkeF12ecpQWJg5jzm0AnoUhGdj%2fXzFJoJbgaxLnvBsFGql9%2f%2bYyJqMr8URZuovttYKmDemHYS0

http://rc12.overture.com/d/sr/?xargs=15KPjg141SnJamwr%2DocLXBROWAylwaxca58cluD5l4GtZf5iMxXOV4aaTCm8dxTOVxv1PdzPSW%5FqYSL%5FT5kPOJGweKQVWJGuXpjdLJxYw6Nq2jUNEbsYRzy%2DLvmIZGOX0E2laEOd%2D5mO7acZdRD05mjddAwByR%2D%5Flqw8yzxu4IQevVig0sskqFc5Z17tQp9bnAXOx7TLome97vhXfFfZwQ%2D%2DxDke%2DgSygTLyyj4WYa9VeHJi58obDIYo0L3ZbKzoLLOKeswIYJfRXG%2DYe62VuOrU6t8txuN2zT3r4MzgFZJP%5F%2DIlWJ3Ulvvv%2DbgfDfP4074wP1CfzqVHz3dxM5PXU3E5OufGXnbWw99E%5FOfpRQIMSv2xOO

http://clickserve.uk.dartsearch.net/link/click?lid=43000000042332928&ds_s_kwgid=58000000000470369&ds_e_adid=8770229031&ds_e_matchtype=standard&ds_e_kwdid=86608522531&ds_e_kwgid=5777302919&ds_url_v=2

http://ad-emea.doubleclick.net/clk;160208746;22377034;b;u=ds&sv1=42332928&sv2=2009111264&sv3=84754;%3fhttp://www.marshallward.co.uk/?aff=yahoo?&affsrc=acquisition&cm_mmc=yahoo-_-Generic-_-Generic-Catalogue-Keywords---Broad-_-catalogs

http://fls.doubleclick.net/act;sit=530730;spot=1529997;~dc_rdr=?http%3A//www.marshallward.co.uk/%3Faff%3Dyahoo%3F%26affsrc%3Dacquisition%26cm_mmc%3Dyahoo-_-Generic-_-Generic-Catalogue-Keywords---Broad-_-catalogs

http://www.marshallward.co.uk/?aff=yahoo?&affsrc=acquisition&cm_mmc=yahoo-_-Generic-_-Generic-Catalogue-Keywords---Broad-_-catalogs

This would allow the site operator to monitor clicks, which is a legitimate thing to do. It is also something you would need if you wanted to monitor and reward agents clicking the links for you. I appreciate that this in itself is not a “smoking gun”.
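For readers who want to trace a chain like this themselves, here is a minimal Python sketch that follows redirects one hop at a time and records each URL. It is not the tool we used at the time. The names `trace_redirects` and `http_location` are hypothetical, and the recorded hops at the bottom are the URLs listed above with the parts after the first path segment and the query strings dropped for brevity.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def http_location(url):
    """Fetch url without following redirects; return its Location header or None."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        response = conn.getresponse()
        if response.status in REDIRECT_CODES:
            return response.getheader("Location")
        return None
    finally:
        conn.close()


def trace_redirects(start_url, get_location=http_location, max_hops=10):
    """Return every URL visited, starting with start_url."""
    chain = [start_url]
    while len(chain) <= max_hops:
        location = get_location(chain[-1])
        if location is None:
            break
        # Location may be relative, so resolve it against the current URL.
        chain.append(urljoin(chain[-1], location))
    return chain


# Replay the chain recorded above (shortened URLs) instead of
# hitting the network, since these pages may no longer exist.
recorded_hops = [
    "http://umtsfree.net/forward302.aspx",
    "http://rc12.overture.com/d/sr/",
    "http://clickserve.uk.dartsearch.net/link/click",
    "http://ad-emea.doubleclick.net/clk",
    "http://fls.doubleclick.net/act",
    "http://www.marshallward.co.uk/",
]
lookup = dict(zip(recorded_hops, recorded_hops[1:]))

chain = trace_redirects(recorded_hops[0], get_location=lookup.get)
print(len(chain) - 1, "redirects")
for url in chain:
    print(urlsplit(url).hostname)
```

Calling `trace_redirects(url)` with the default `get_location` performs real HTTP requests, so only point it at links you are happy to be seen clicking.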

Conclusion

I suggest that at least the sites in these sets are fraudulent. The fact that they evaded the data mining checks you mention suggests to me that the clicks were generated by a network of geographically distributed agents. These might be humans, encouraged by a share of the AdSense revenue, or they might be an autonomous bot-net with a smart pattern of click behaviour (chaotic perhaps). Frankly the click-pattern doesn’t seem that smart to me, so I think there is another type of fraudster out there that is much smarter and rarely hits the same advertiser twice. This would prevent detection by the advertiser, and would only be detectable by Google themselves.

Let me be clear that I have no reason to doubt the validity of sponsored search or Gmail content adverts and I continue to use these networks, but I have serious doubts about the public AdSense network. I also do not dispute that it is still good value for money (even if 1/3 of it is click-fraud), but nobody likes to be ripped off and it is in both our interests to fix it.

What do we expect? We are not looking for a refund on these clicks (frankly I have wasted more money typing this email). Instead we would like Google to consider our arguments and, if they agree, to improve their detection process. Excluding these sites is a poor option as I will have to spend time every day weeding out spammers from our content network placements. I appreciate that there will always be a number of frauds that are impossible to detect algorithmically, and that you are locked into an arms race with the fraudsters, but there seems to be more you could be doing to improve automatic detection.

Best regards,

Rupert Rawnsley.

Google’s Response

Hello Rupert,

Thank you for your reply and for providing us with the additional information. We appreciate your patience as we work to resolve this issue.

I can confirm the information you have mentioned. Many of the sites in question seem to have the same templates and show only AdSense ads.

However, the clicks that you accrued from these sites are valid. As mentioned in our previous email, these sites are a part of the domain park network. The sites in question do not have any specific content, but are simply "parked" for interested users to purchase the site from the domain hosting company. Also, domain parked sites can be former functioning websites whose domain name contracts have expired. Since these sites are largely created for temporary purposes, the template used may be the same across several websites. This is the reason you may find the same images or the same layout across several of these sites.

Once an interested user purchases the site or renews the domain name registration, the site is automatically removed from the domain parked network by the hosting company. Parked domain sites offer users ads that are relevant to the text they entered. In addition, some parked domain sites include a search box, which allows users to further refine their search. We've found that AdWords ads displayed on parked domain sites receive clicks from well-qualified leads within the advertisers' markets.

In general, we've noticed that the return on investment gained on these pages is equal to or better than that gained on other pages in the search and content networks.

We are sorry to learn that you are disappointed with the quality of clicks you accrued from these sites. Please be assured that the clicks are valid.

We strongly suggest you consider using the site-and-category exclusion tool to prevent your ads from showing on the domain park network.

Thank you for your patience and understanding.

Sincerely,

<Google Employee>

The Ad Traffic Quality Team

1 comment:

Blog Admin said...


I am quite interesting in google adsense topic, I hope you will elaborate more on it in future posts.
I am new to adsense and blogger.
Thanks for sharing..