Sunday, January 03, 2010

Content Delivery Networks

My company has a very specific requirement: we need to get our application to any desktop in the world in less than three minutes. There are business drivers for this that I shall not go into; basically it is so that potential customers don’t get bored waiting for our application to install and run. We are currently failing to do that for all users, and we suspect we are losing customers because we fall at the first fence.

The Problem is Discovered

Our installer is about 50MB, which is not huge, but we have been seeing an enormous variation in deployment times to various parts of the world. Currently we use a UK-based hosting service with high symmetric bandwidth, but routine log analysis revealed that install times for some users exceeded 10 minutes, and many did not complete at all. A quick web search revealed that this is a well-known problem, so well known in fact that there are many commercial solutions that come under the generic title of Content Delivery Networks (CDNs). The big players are companies like Akamai and Limelight, but I am allergic to companies that won't tell you the price, and I suspect our needs are too modest to be worth their while addressing. There is, however, a new class of companies like GoGrid emerging, and established hosting players like Amazon (with CloudFront) and Rackspace (using Limelight's CDN network) are offering CDNs too. The new kid on the block is Microsoft, who beta-launched their Azure CDN just as my investigations began.

CDN, like all hosting, is a highly commoditized product. There are certainly modest differences in things like upload flexibility (Azure stinks), clever torrent links (Amazon S3 rocks), and general UI friendliness, but there were no showstoppers. The only really important metrics are speed, reliability, and cost. Cost was easy: any provider that didn't make its pricing clear on its website within the first two minutes was discarded (are you starting to understand our business drivers now?), and the remaining companies were all so cheap that it wasn't worth worrying about. This is because we are talking about a very small amount of data (50 MB × 100 installs per month = 5 GB) and the pricing is never more than about 25 cents per GB. These businesses are built for large streaming media and Flash files, not for tiny desktop installers like ours.
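To put a number on "so cheap it wasn't worth worrying about", the back-of-the-envelope sum above can be written out. The 25-cents-per-GB figure is the upper bound quoted in this post, not any one provider's price list:

```python
MB_PER_INSTALL = 50
INSTALLS_PER_MONTH = 100
PRICE_PER_GB_USD = 0.25  # upper bound quoted above, not a specific provider's rate

# Monthly transfer volume and worst-case bill
gb_per_month = MB_PER_INSTALL * INSTALLS_PER_MONTH / 1000  # 5.0 GB
monthly_cost = gb_per_month * PRICE_PER_GB_USD
print(f"{gb_per_month:.1f} GB/month -> ${monthly_cost:.2f}/month")
```

At roughly a dollar a month, provider pricing simply drops out of the decision.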

Reliability next: we are not particularly concerned about reliability given that we are statistically unlikely to lose enough business in the difference between four nines and five nines to make it worth basing a decision on. Everybody can do four nines.

So that left speed, which comes in two flavors: latency and bandwidth. Latency is critical for that snappy website that puts your shop window in front of the customer in less than a few seconds (which is sometimes all you have). Incidentally, I didn't come across any CDN providers that also offer web hosting, particularly ASP.NET hosting, but you have to imagine that is coming from Azure. In our case bandwidth was going to dominate, so that is what we needed to know about.

During my research, I came across Ryan Kearney’s comparison of CDN providers. He gives a great round-up of the price and features of many of the providers, as well as some latency statistics for a handful of international locations. He was kind enough to host a file for my test rig on his Rackspace account, which was much appreciated.

So there are plenty of CDN providers, but very little information available to allow you to compare them. For instance, India and China are two very important markets for us, but what is the bandwidth to them from each of the providers? Clearly we needed to do some measurements.

The Game is Afoot

How do you measure the bandwidth of a host to every country in the world? Well, there are many companies that offer website monitoring and will alert you if your website goes down; some of these have international monitoring capabilities, and some have page download time statistics. However, to get an accurate picture of download speeds you need a fairly sizable file, so that the bandwidth lag dominates other factors such as DNS resolution and server latency. Only one web monitoring service actually downloaded the whole file, allowing us to make an accurate estimate of bandwidth. They are WebSitePulse, and I could not have done this analysis without them. They have the most monitoring stations in the world, the most detailed statistics, and a 30-day free trial, which I used for this investigation. I highly recommend them to anyone looking for sophisticated, international website monitoring.

We created a test file called Test1MB.zip, which was a zip file that was truncated to exactly 1MB. A zip file is largely incompressible and the extension stops most servers from trying (actually few offer HTTP compression, which is a serious omission but beyond the scope of this post). This was mounted on multiple hosts and WebSitePulse was configured to download the files periodically. The WebSitePulse trial limits you to 20 monitor stations at a time (and excludes Auckland and Melbourne), and I didn’t have access to all of the hosts from the beginning, so the statistics are not done to laboratory standards. However, the statistical picture that emerges is reliable enough to allow business decisions to be made.
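If you want to reproduce a single measurement yourself without a monitoring service, the core of the technique is easy to sketch in Python. The URL below is a placeholder, and one sample from one location is of course no substitute for a distributed monitor like WebSitePulse:

```python
import time
import urllib.request

def bandwidth_mbps(num_bytes, seconds):
    """Convert a timed transfer into megabits per second."""
    return (num_bytes * 8) / (seconds * 1_000_000)

def measure(url):
    """Download the whole file, as WebSitePulse does, and time it."""
    start = time.monotonic()
    with urllib.request.urlopen(url) as response:
        body = response.read()  # read to the end so bandwidth, not latency, dominates
    elapsed = time.monotonic() - start
    return elapsed, bandwidth_mbps(len(body), elapsed)

# Hypothetical test file; substitute a URL on the host you are evaluating:
# elapsed, mbps = measure("http://example.com/Test1MB.zip")
```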

The Runners and Riders

| Host | CDN Capable | Notes |
|---|---|---|
| RapidSwitch | No | Our current host and representative of good-quality hosting in the UK. |
| Azure CDN | Yes | Still in beta, and we literally started using it the day it opened, so there were teething problems. |
| Rackspace | Yes | Huge player in hosting and cloud computing. |
| Amazon CloudFront | Yes | CDN at the front, Amazon's S3 at the back. Nominally still in beta, but frankly once you charge for something you must be judged as a commercial product. |
| Amazon S3 | No | Our S3 hosting is in the US, so this is the standard candle for US-based cloud hosting. |
| GoGrid CDN | Yes | A high number of international points-of-presence, and more on the way. |

Very few of the CDN companies offer free trials for some reason, but I think all are pay-as-you-go, which costs pennies for what we want. It took a bit of back-and-forth to get my GoGrid account set up, but their Twitter guy was great at fixing the problem once I made him aware of it. This means there are slightly fewer results for GoGrid. The whole trial ran for the best part of a month with roughly 15-minute poll intervals for every host. I had to change things around a bit as I went along to stay within the T&Cs of the WebSitePulse trial – you get $1000 of credit to play with in total.

The following locations were monitored: Amsterdam, Bangalore, Beijing (2 monitors), Boston, Brisbane, Buenos Aires, Chicago, Dusseldorf, Guangzhou, Hong Kong, Houston, London, Los Angeles, Miami, Montreal, Mumbai, Munich, New York, Paris, San Francisco, Sao Paulo, Seattle, Shanghai, Singapore, Stockholm, Sydney (2 monitors), Tokyo, Toronto, Trumbull, Vancouver, Washington

The Results

The summary of the results is shown below:

| Host | Uptime | Average 1MB DL Time (s) |
|---|---|---|
| GoGrid | 100.00% | 2.03 |
| Rackspace CDN | 100.00% | 2.70 |
| Amazon CloudFront | 100.00% | 4.46 |
| Azure CDN | 99.52% | 4.67 |
| Amazon S3 | 100.00% | 5.04 |
| RapidSwitch | 99.98% | 7.43 |

Here are the detailed results for all of the monitoring stations and hosts sorted into average download time order:

| Location | GoGrid | Rackspace | Amazon CloudFront | Azure CDN | Amazon S3 | RapidSwitch | Average |
|---|---|---|---|---|---|---|---|
| New York | 0.12 | 0.19 | 0.24 | 1.00 | 0.50 | 1.39 | 0.57 |
| Boston | 0.17 | 0.24 | 0.42 | 1.06 | 0.54 | 1.31 | 0.62 |
| Trumbull | 0.16 | 0.39 | 0.55 | 1.38 | 0.50 | 1.36 | 0.72 |
| Washington | 0.21 | 0.36 | 0.98 | 1.04 | 0.34 | 1.80 | 0.79 |
| Houston | 0.24 | 0.34 | 0.30 | 0.73 | 1.20 | 2.20 | 0.84 |
| Paris | 0.23 | 0.31 | 0.40 | 2.39 | 1.78 | 0.24 | 0.89 |
| Dusseldorf | 0.20 | 0.29 | 0.18 | 3.08 | 1.92 | 0.27 | 0.99 |
| Amsterdam | 0.16 | 0.15 | 0.47 | 2.43 | 2.70 | 0.24 | 1.03 |
| Chicago | 0.05 | 0.19 | 1.95 | 1.02 | 1.60 | 1.48 | 1.05 |
| San Francisco | 0.30 | 0.30 | 0.22 | 1.58 | 1.83 | 2.31 | 1.09 |
| London | 0.15 | 0.37 | 0.40 | 3.68 | 1.89 | 0.18 | 1.11 |
| Vancouver | 0.15 | 0.41 | 0.23 | 1.57 | 1.64 | 2.69 | 1.12 |
| Toronto | 0.40 | 0.91 | 0.48 | 2.58 | 1.99 | 1.67 | 1.34 |
| Seattle | 0.16 | 0.31 | 0.21 | 2.15 | 1.64 | 3.66 | 1.36 |
| Munich | 0.63 | 0.29 | 0.31 | 4.17 | 3.06 | 0.67 | 1.52 |
| Miami | 0.35 | 4.06 | 0.70 | 1.72 | 0.83 | 2.25 | 1.65 |
| Stockholm | 0.71 | 0.20 | 0.82 | 4.48 | 4.82 | 0.67 | 1.95 |
| Los Angeles | 0.29 | 0.39 | 0.34 | 2.99 | 3.81 | 6.47 | 2.38 |
| Sao Paulo | 2.53 | 2.77 | 2.70 | 2.79 | 3.49 | 3.50 | 2.96 |
| Brisbane | 0.45 | 0.42 | 3.40 | 4.26 | 5.96 | 6.00 | 3.42 |
| Tokyo | 2.00 | 1.01 | 1.40 | 3.04 | 4.35 | 8.84 | 3.44 |
| Sydney | 1.17 | 1.25 | 3.19 | 4.15 | 5.81 | 5.78 | 3.56 |
| Bangalore | 3.32 | 0.76 | 1.75 | 4.61 | 8.33 | 2.66 | 3.57 |
| Sydney 2 | 0.18 | 4.63 | 3.74 | 5.07 | 7.41 | 5.29 | 4.39 |
| Montreal | 1.01 | 1.57 | 1.46 | 12.44 | 2.14 | 8.40 | 4.51 |
| Mumbai | 4.23 | 2.96 | 2.00 | 4.87 | 10.40 | 3.70 | 4.69 |
| Buenos Aires | 5.80 | 7.03 | 6.41 | 7.60 | 6.25 | 5.33 | 6.40 |
| Singapore | 3.52 | 1.27 | 3.24 | 9.66 | 8.67 | 13.62 | 6.66 |
| Hong Kong | 1.24 | 1.76 | 1.24 | 7.65 | 9.28 | 26.92 | 8.02 |
| Beijing 2 | 5.72 | 8.23 | 11.62 | 7.95 | 11.20 | 17.53 | 10.38 |
| Guangzhou | 5.43 | 10.99 | 19.48 | 8.23 | 10.30 | 10.91 | 10.89 |
| Beijing | 8.50 | 14.33 | 23.92 | 13.37 | 16.29 | 71.55 | 24.66 |
| Shanghai | 17.32 | 20.56 | 52.27 | 19.37 | 23.83 | 24.16 | 26.25 |
| Average | 2.03 | 2.70 | 4.46 | 4.67 | 5.04 | 7.43 | 4.39 |

[chart: average 1MB download time by host]

Here are the raw stats if you would like to do any further analysis of your own.

Conclusions

Clearly GoGrid and Rackspace are the best of the hosts tested. GoGrid has the best average performance and is unbeaten at almost all of the monitoring stations.

Asia is very badly served by all the hosts tested. Obviously there are dedicated hosting services for Asia, but the whole point of a CDN is that it is global. I expect partnerships are being drafted as I type.

Amazon CloudFront only barely outperforms plain S3 on average, but its per-city download times are much better in some cases.

Montreal did much worse than I expected given that Canada is so well connected to the US.

Amazon CloudFront and Azure CDN perform about equally well, although Azure's uptime looks bad. In fact Azure's uptime was only poor for the first few days after the beta opened; it was very good thereafter, so the figure is probably not a fair measure.

Did We Win?

Our original aim was to move 50MB in less than three minutes. Therefore our target time for 1MB is 180 / 50 = 3.6 seconds. Even with the fastest CDN host, we are still failing to meet this target for several cities. For Shanghai, we are a factor of five off. And of course this is before we get from where the monitoring stations are (which is probably a well connected hub) out to users at the network edge.
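That check scales linearly, which is the assumption behind using a 1MB test file in the first place: bandwidth dominates, so the 50MB installer should take roughly fifty times the measured figure. Here is the sum as a sketch, using GoGrid's measured 1MB times for two of the cities above:

```python
TARGET_SECONDS = 180.0  # the three-minute goal
INSTALLER_MB = 50.0     # our installer size

def projected_install_seconds(one_mb_seconds, installer_mb=INSTALLER_MB):
    # Linear scaling: assumes bandwidth dominates per-request overhead,
    # which is why a sizable test file was used in the first place.
    return one_mb_seconds * installer_mb

# GoGrid's measured 1MB times from the results table
for city, seconds in [("New York", 0.12), ("Shanghai", 17.32)]:
    total = projected_install_seconds(seconds)
    verdict = "meets target" if total <= TARGET_SECONDS else "misses target"
    print(f"{city}: {total:.0f}s ({verdict})")
```

New York comes in around six seconds; Shanghai projects to well over fourteen minutes, which is the factor-of-five miss described above.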

So big-iron can help us make significant improvements for very little effort and cost, but the war goes on. I might tell you how we finally win in a future post. Hint: we make the installer smaller.

Google AdSense Fraud

As I mentioned in my last post, we stopped advertising through Google's Content Network very soon after we started experimenting with AdSense, when we thought we had detected significant fraud. We had not spent much money by this time, so the amounts involved were relatively small (less than £100), but the fraction was high. We considered at least a third of our clicks to be fraudulent: deliberately, criminally fraudulent.

Proud of our forensic IT skills, we rushed to Do-No-Evil Google to report our discovery. Our report contained the bare minimum of facts; we were so convinced it screamed fraud we did not bother to go into much more detail than the websites and the Click Through Rates (CTRs). After a week we had a polite and considered response that attempted to persuade us that this was not fraud. Clearly more evidence was required, so we put together a verbose description of the main points that alerted us, and sent it off in expectation of an apology if not a seat on the board for our fraud-busting smarts.

Sadly, they refused (or failed) to be convinced, and eventually we had to agree to disagree. Obviously we will continue to use Google for advertising – what choice is there for an online business? But we firmly believe they need to put their house in order. If a third of their revenue is fraudulent, they will lose consumer confidence and possibly face sanctions for complicity.

Below I have included the email chain. The name of the operative has been removed as the problem is systemic not personal. The key point, and one I should have put front-and-center (in CAPS perhaps?), is under Red Flag 2 – Set 2. Namely that a badly named, unlinked, and parked-domain website achieved ten times the CTR of Google’s own homepage.

I would very much like to hear from anyone with similar experiences, or from anyone (at Google or otherwise) who disagrees with any of the points we made. Our transaction volume is small and our market is niche, so I am aware we need more data to make a statistically significant conclusion.

Initial Inquiry

[From Rupert to Google on 5/11/09]

Nearly *one third* of all content based CPC placements were obvious click fraud.

Many of the sites have exactly the same content and have a CTR of 100% (or greater in one case!).

I find it incredible that the market leader cannot spot something like this automatically. The fraudsters are not even trying to disguise it.

[Information about the site URLs and the fraud period]

Google’s Response

Hello Rupert,

We have received your request for an invalid clicks investigation. Thank you for your patience while we reviewed your account. I apologize for our delayed response. I understand you are concerned about the quality of clicks you have accrued from certain sites in our content network.

We reviewed your account and can confirm clicks from these sites. However, we found that these clicks are valid, and there is no activity that suggests you have been charged for invalid clicks. The clicks charged fit a pattern of normal user behaviour. As part of our review, the team looked through dozens of data points--including IP addresses, IP blocks, geographic concentrations, network activity, browser patterns, click timings, and any proprietary signals. However, none of those suggest an automated attack, nor collusion from unethical users. The clicks accrued reflect normal user traffic.

Many of the sites that you listed are parked domain sites. A parked domain site is an undeveloped web page belonging to a domain name registrar or domain name holder. Our AdSense for domains programme places targeted AdWords ads on parked domain sites that are part of the Google Network.

Users are brought to parked domain sites when they enter the URL of an undeveloped web page in a browser's address bar.

We've found that AdWords ads displayed on parked domain sites receive clicks from well-qualified leads within the advertisers' markets. In general, we've noticed that the return on investment gained on these pages is equal to or better than that gained on other pages in the search and content networks. However, if you aren't satisfied with the value of the traffic, you can prevent your ads from showing on parked domain sites by using the Site and Category Exclusion tool. Learn how at https://adwords.google.com/support/bin/answer.py?answer=86695&hl=en_GB.

I hope that this information helps address your concern. Please let me assure you that your security is a top priority for Google, and we will continue to monitor all clicks on your ads to prevent abuse. Let us know if you have further questions or if we can be of any more assistance. For more information about steps we take to combat invalid click activity, please visit https://adwords.google.com/support/bin/answer.py?answer=6114&hl=en_US.

Sincerely,

<Google Employee>

The Ad Traffic Quality Team

Rupert’s More Detailed Description

Hi <Google Employee>,

Thank you for getting back to me. I'm afraid I am still doubtful of the validity of these sites. Allow me to illustrate my concerns with some examples:

Red Flag 1

There are 10 sites in the list with a 100% CTR (radiolluvia.com even has 200%):

| Domain | Clicks | Impressions | CTR |
|---|---|---|---|
| radiolluvia.com | 2 | 1 | 200.00% |
| net-ebooks.com | 1 | 1 | 100.00% |
| umtsfree.net | 1 | 1 | 100.00% |
| littleabout.com | 1 | 1 | 100.00% |
| mtncareer.com | 1 | 1 | 100.00% |
| jonefm.com | 1 | 1 | 100.00% |
| radiobendele.com | 1 | 1 | 100.00% |
| iphalloween.info | 1 | 1 | 100.00% |
| pdfee.com | 1 | 1 | 100.00% |
| rf-online.com | 1 | 1 | 100.00% |

None of them contain any relevant content (which would be fine - I understand AdSense can never be a science), but most of them consist only of AdSense links. Are you really suggesting that users happen upon a parked domain by typing the URL above into the browser and happen to be in the market for radio planning software (which is what we make)?

We don’t make a mass-market product so I would never expect a high CTR on content networks even when targeting radio engineers – only a minority of them are even actively seeking our type of tool.

Red Flag 2

There are at least three sets of pages that contain practically the same content and come from the same IP subrange.

Set 1: gsmsandwich.com, jonefm.com, keonong.com, mtncareer.com, rf-indo.com, smsgupsup.com. They all look like this:

[screenshot: the shared template used by the Set 1 sites]

Set 2: umtsfree.net, xlgprs.net, xlgprs.com, ir-hot.com. They all look like this:

[screenshot: the shared template used by the Set 2 sites]

I find it unlikely that in the space of one week, four people were using these sites as some sort of search or index portal (have they not heard of Google? :)) and either searched or browsed their way to our advert and found it relevant enough to click on. This was after only 254 impressions. Contrast this with Google's own sponsored search results, which yielded only three clicks for 2,051 impressions during the same period. If you really believe these statistics, surely you should buy this company immediately, because they are ten times better at advert placement than you are. Perhaps you should consider a smiling coed on the homepage?

Set 3: gamezerm.com, and radiobendele.com both have the same IP and look the same:

[screenshot: the shared template used by gamezerm.com and radiobendele.com]

Can it be a coincidence that they have high CTR (50% and 100% respectively)?

In general, all of the sites in these sets have dubious registration details, often using the same registration anonymity service.

Red Flag 3

The click-through path is curious for the links from these sites. For instance, the umtsfree.net site links go through five 302 redirects before landing at the intended target. Here is the chain of URLs:

http://umtsfree.net/forward302.aspx?epc=eWpDPeDkCn7%2fPAWjJDHHizukuSQO4Z0sDU8KfdC9FQ%2b5yWjWwcxv5hXcA5nQpS0OqiEn07sYHTHFe%2fX9vFjDwUcSW01%2bS4WPEL0m7%2fX3z100tRxVe1Mg2zzaXK862vPp7hIJBvAoVV9DPRmnuG%2fkV0w5tbowrxB4AbcTtxa0Bsr%2fCztN7vTUOE0hGYneCC9V5jEY3PRhY5SAeWBCuCp7NzUBODKuSrYrmWbY4g3PHs9mBH08pqUSaY75VuOBggtVC6D5WjIQEZuFNJS10GQ6Bu%2f0JpRTb3xpAWZf4bPOguFyT3zwx6udcQe031GVCTob%2bAk5n3HzuAg2AOTMKncWxG%2bPl6vLUW4DWYQil2ZmY2ILRGYWgOHHAfIlNM1AHowYkUvb%2bBrYHbEQgD6PTID5%2fuaj3OsxHAwLVlhrL3uuu1S3zI4g9mOUab2fnM8yr%2brQmzu2a6UmflkA8s6PaAElxBNg9kZnshsusDIleugD02G6c%2bCTybRaQ0D1IYTOcfyeJLFDejgK2GqObGWs9Nm6J32886U0STHIAz75%2f%2b2snbAtAJQT48cwhAH%2fNQ%2fpaJiHkSON1cIxd7oFroekPJ8iyDhbYZ3VP1TJ0Z7HoHj6HKjeDaemj6LFb1Le0uwGKeLe%2bKc6LxdhYBtjD%2bXnGi0LkIajkmbqWe73rvyLNoRhtd1SsqgDT7wxMUBmPIaisppXnfp%2b8%2ftjiL8R9LU6QFh%2f8%2b4aTQGsrMOTyj55aZ4hx8l3UJck6utVoeVax7%2bOQACwyLYoyyvI4ml9DVsz%2f9Mh4WnmfFdVgPEqIxmJ%2bwnNIlKzX31GePRuLmgLHdeItkfnMPnUWIA8FB485RmGfrEwbF07d7v5JLcYuL4V62CKyW8dl3m89EvsbxcbOTOQN85CQsa5fKdgUZaZ2j1kFgi7Oj9J3MJ10oCQ5OjkZnoBWXZPFsfGxQTxg%2fGxh9k18jep%2b5sesYVB1jRYuej3ptGZBGivoDvEkFR%2fpxCbqB494irMYSWLmmx8c%2frOVZYeIe3XV9P7cBJ4da%2bcrLgJneN2nhKCOX0BDZsw%2bR1L93vZ6LgjgvFgolOFTVpkeF12ecpQWJg5jzm0AnoUhGdj%2fXzFJoJbgaxLnvBsFGql9%2f%2bYyJqMr8URZuovttYKmDemHYS0

http://rc12.overture.com/d/sr/?xargs=15KPjg141SnJamwr%2DocLXBROWAylwaxca58cluD5l4GtZf5iMxXOV4aaTCm8dxTOVxv1PdzPSW%5FqYSL%5FT5kPOJGweKQVWJGuXpjdLJxYw6Nq2jUNEbsYRzy%2DLvmIZGOX0E2laEOd%2D5mO7acZdRD05mjddAwByR%2D%5Flqw8yzxu4IQevVig0sskqFc5Z17tQp9bnAXOx7TLome97vhXfFfZwQ%2D%2DxDke%2DgSygTLyyj4WYa9VeHJi58obDIYo0L3ZbKzoLLOKeswIYJfRXG%2DYe62VuOrU6t8txuN2zT3r4MzgFZJP%5F%2DIlWJ3Ulvvv%2DbgfDfP4074wP1CfzqVHz3dxM5PXU3E5OufGXnbWw99E%5FOfpRQIMSv2xOO

http://clickserve.uk.dartsearch.net/link/click?lid=43000000042332928&ds_s_kwgid=58000000000470369&ds_e_adid=8770229031&ds_e_matchtype=standard&ds_e_kwdid=86608522531&ds_e_kwgid=5777302919&ds_url_v=2

http://ad-emea.doubleclick.net/clk;160208746;22377034;b;u=ds&sv1=42332928&sv2=2009111264&sv3=84754;%3fhttp://www.marshallward.co.uk/?aff=yahoo?&affsrc=acquisition&cm_mmc=yahoo-_-Generic-_-Generic-Catalogue-Keywords---Broad-_-catalogs

http://fls.doubleclick.net/act;sit=530730;spot=1529997;~dc_rdr=?http%3A//www.marshallward.co.uk/%3Faff%3Dyahoo%3F%26affsrc%3Dacquisition%26cm_mmc%3Dyahoo-_-Generic-_-Generic-Catalogue-Keywords---Broad-_-catalogs

http://www.marshallward.co.uk/?aff=yahoo?&affsrc=acquisition&cm_mmc=yahoo-_-Generic-_-Generic-Catalogue-Keywords---Broad-_-catalogs

This would allow the site operator to monitor clicks, which is a legitimate thing to do. It is also something you would need if you wanted to monitor and reward agents clicking the links for you. I appreciate that this in itself is not a “smoking gun”.
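If you want to inspect a chain like this yourself, you can walk it one hop at a time by suppressing automatic redirect handling. This is a generic sketch of the idea in Python, not the tool I actually used:

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Suppress automatic redirect handling so every hop is visible."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def trace_redirects(url, max_hops=10):
    """Return the list of URLs visited while following 3xx responses."""
    opener = urllib.request.build_opener(NoRedirect)
    chain = [url]
    for _ in range(max_hops):
        try:
            resp = opener.open(chain[-1])
        except urllib.error.HTTPError as err:
            resp = err  # a suppressed 3xx surfaces as an HTTPError
        if resp.getcode() not in (301, 302, 303, 307, 308):
            break
        # Location may be relative; resolve it against the current URL
        chain.append(urljoin(chain[-1], resp.headers["Location"]))
    return chain

# e.g. trace_redirects("http://example.com/forward302.aspx?...") would
# print each intermediate ad-network URL in turn.
```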

Conclusion

I suggest that at least the sites in these sets are fraudulent. The fact that they evaded the data-mining checks you mention suggests to me that they were generated by a network of geographically distributed agents. These might be humans encouraged by a share of the AdSense revenue, or they might be an autonomous bot-net with a smart pattern of click behaviour (chaotic, perhaps). Frankly, the click pattern doesn't seem that smart to me, so I think there is another type of fraudster out there that is much smarter and rarely hits the same advertiser twice; this would prevent detection by the advertiser and leave the fraud detectable only by Google themselves.

Let me be clear that I have no reason to doubt the validity of sponsored search or Gmail content adverts and I continue to use these networks, but I have serious doubts about the public AdSense network. I also do not argue that it is still good value for money (even if 1/3 of it is click-fraud), but nobody likes to be ripped off and it is in both our interests to fix it.

What do we expect? We are not looking for a refund on these clicks (frankly, I have wasted more money typing this email); instead we would like Google to consider my arguments and, if they agree, to improve their detection process. Excluding these sites is a poor option, as I would have to spend time every day weeding spammers out of our content network placements. I appreciate that there will always be some frauds that are impossible to detect algorithmically, and that you are locked in an arms race with the fraudsters, but there seems to be more you could be doing to improve automatic detection.

Best regards,

Rupert Rawnsley.

Google’s Response

Hello Rupert,

Thank you for your reply and for providing us with the additional information. We appreciate your patience as we work to resolve this issue.

I can confirm the information you have mentioned. Many of the sites in question seem to have the same templates and show only AdSense ads.

However, the clicks that you accrued from these sites are valid. As mentioned in our previous email, these sites are a part of the domain park network. The sites in question do not have any specific content, but are simply "parked" for interested users to purchase the site from the domain hosting company. Also, domain parked sites can be former functioning websites whose domain name contracts have expired. Since these sites are largely created for temporary purposes, the template used may be the same across several websites. This is the reason you may find the same images or the same layout across several of these sites.

Once an interested user purchases the site or renews the domain name registration, the site is automatically removed from the domain parked network by the hosting company. Parked domain sites offer users ads that are relevant to the text they entered. In addition, some parked domain sites include a search box, which allows users to further refine their search. We've found that AdWords ads displayed on parked domain sites receive clicks from well-qualified leads within the advertisers' markets.

In general, we've noticed that the return on investment gained on these pages is equal to or better than that gained on other pages in the search and content networks.

We are sorry to learn that you are disappointed with the quality of clicks you accrued from these sites. Please be assured that the clicks are valid.

We strongly suggest you consider using the site-and-category exclusion tool to prevent your ads from showing on the domain park network.

Thank you for your patience and understanding.

Sincerely,

<Google Employee>

The Ad Traffic Quality Team

Wednesday, December 16, 2009

Three Months With Google AdWords

We recently changed our business plan and embraced online selling. We were obviously aware it existed before, we just didn't have anything that you could effectively sell in that way. Online selling means online advertising, and we all know that means Google. This gave us our first look at AdWords - Google's advertising OS - and we were impressed, but we soon discovered some unexpected things that you might find interesting.

The Anatomy of Google AdWords

AdWords has two distinct channels: search and content network. Search is the "Sponsored Links" section that you see at the side and top of search results. Content network is everywhere else that Google puts adverts, which is principally the AdSense network and GMail. By default, adverts are delivered across all channels and there are numerous analyses for seeing how they perform. We have been advertising with a very modest budget to a niche market for three months, but in that time we have had some eye-opening experiences.

AdWords is Expensive

You don't pay when Google shows an advert, only when someone clicks on it (a click-through). However, a click-through from the search page costs us £2.50 and from the content network it is about 80p. I assumed it would be much less than this, but then I have led a sheltered life. I am not saying it is not worth the money; it just costs more than I expected.

You Are Google's Bitch

As an online business, we rely on people being able to find us online (duh!). Either they are looking for us explicitly or for the type of thing we sell. 70% of people will do this using Google's organic search, so if Google decide your site is suspect or undesirable in some way and de-list you, your market just got a whole lot smaller. There is no discussion process, no independent authority to whom you can appeal, no cost to Google, and no way back. Sites can be demoted or completely blocked erroneously, without warning or explanation, for something as simple as putting light text on a light background (because it looks like you are trying to fool the GoogleBot into promoting your search rank). This is a dangerous monopoly that needs to be addressed. I had never really thought about it before, but it is not my central thesis today.

AdSense is Suspect

In the first week, most of our spending was going into the AdSense network, but when we looked at the target sites we were disappointed by the placement choices. More seriously, we found strong evidence of click fraud. I don't want to divert from my central thesis at this point (it's coming, I promise), so this will be the subject of my next post. I only raise it now to say that we suspended AdSense adverts, so all "content network" placements were on GMail only.

Search is Irrelevant

Here is the really interesting thing: In the three months we have been advertising we have had 20,000 impressions on the search pages but 1,000,000 impressions on GMail. Just 2% of the eye-time that we bought from Google came from search.

Assuming our figures are typical (and this is by no means certain given the niche nature of our business), it raises the question: why does Google stay in the search game? The most obvious answer is that people think, as we did, that search engines are the primary online advertising platform, so they take their marketing dollars to Google, only to have Google spend them elsewhere. Google must therefore maintain its dominance in search, or companies will move their budgets elsewhere on the incorrect assumption that fewer people would see their adverts.

Google is Vulnerable

It also implies Google is vulnerable. Ignoring AdSense for a moment (which is risky, but work with me here), GMail is not the biggest WebMail provider by a long chalk. Plucking the first numbers I could find off the web I see that GMail has about 90m users compared to Microsoft and Yahoo's 250m apiece (see why Yahoo is still a valuable asset now?).

It appears to me that in taking Google on at search, Microsoft and Yahoo are charging at the cape rather than the bull. Show people that you are the advertising channels of choice and hit Google where it hurts.

Thursday, December 03, 2009

Bran Ferren - Leadership



Bran Ferren is an Imagineering alumnus and co-founder of the coolest company in the world (with the simplest website).

At two hours, this lecture is long, but it flies by (and I won't normally watch anything on the Internet longer than 4 minutes). Bran is so lucid and inspiring that his enthusiasm is infectious.

I was reminded of some old Feynman lectures I have seen, also to MIT students as I recall. Feynman's lecture was outside (it was the 60's) and the questions were about anti-gravity (it was the 60's), Ferren's lecture was inside and mentioned business and the military-industrial complex without being heckled (it wasn't the 60's). Whether this is progress or not is left as an exercise for the reader.

I can't remember who turned me on to this, so apologies for omitting credit. Probably BoingBoing.

WARNING: Not safe for people with pogonophobia!

Thursday, October 15, 2009

ClickOnce Deployment Problems

We have recently moved to ClickOnce for all our application deployment needs, and it is fantastic in so many ways, but there are still some scenarios where it fails. I will collect these together in one post and keep it up-to-date as new information becomes available. If anyone else has any ClickOnce war-stories, please add them to the comments.

Useful Links



MSDN Server and Client Configuration Issues in ClickOnce Deployments

ClickOnce and Setup & Deployment Projects MSDN forum.

RobinDotNet's Blog has some useful tips, including .NET 4.0 logging. Robin is an MVP and ClickOnce forum moderator.

Proxy Servers



UPDATE: Thanks to Robin for this link to a fix for the proxy issue.

If your customers are behind an authenticated proxy server, ClickOnce may fail to work. This is because the authentication settings are usually stored in Internet Explorer, but the ClickOnce deployment libraries (System.Deployment.Application) do not use them by default. The bug is documented here, but opinion varies about whether it should be fixed or not: Microsoft says it shouldn't, everybody else says it should.
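A commonly cited workaround for this class of failure is to let the .NET networking stack pass the user's default Windows credentials to the proxy, via the defaultProxy element in the client's machine.config (or the application's .exe.config). The fragment below is a sketch based on the System.Net configuration schema; the exact attributes vary between framework versions, and it only helps when the proxy accepts integrated Windows authentication, so check it against the fix linked above before rolling it out:

```xml
<configuration>
  <system.net>
    <!-- Pass the logged-on user's credentials to the proxy.
         Only effective where the proxy supports integrated Windows auth. -->
    <defaultProxy enabled="true" useDefaultCredentials="true">
      <proxy usesystemdefault="true" />
    </defaultProxy>
  </system.net>
</configuration>
```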

When it goes wrong a typical error report looks like this:


PLATFORM VERSION INFO
Windows : 5.2.3790.131072 (Win32NT)
Common Language Runtime : 2.0.50727.3053
System.Deployment.dll : 2.0.50727.3053 (netfxsp.050727-3000)
mscorwks.dll : 2.0.50727.3053 (netfxsp.050727-3000)
dfdll.dll : 2.0.50727.3053 (netfxsp.050727-3000)
dfshim.dll : 2.0.50727.3053 (netfxsp.050727-3000)

SOURCES
Deployment url : http://overtureonline.com/Applications/Keima/Overture/Overture.application

ERROR SUMMARY
Below is a summary of the errors, details of these errors are listed later in the log.
* Activation of http://OvertureOnline.com/Applications/Keima/Overture/Overture.application resulted in exception. Following failure messages were detected:
+ Downloading http://overtureonline.com/Applications/Keima/Overture/Overture.application did not succeed.
+ The remote server returned an error: (407) Proxy Authentication Required.

COMPONENT STORE TRANSACTION FAILURE SUMMARY
No transaction error was detected.

WARNINGS
There were no warnings during this operation.

OPERATION PROGRESS STATUS
* [25/09/2009 15:59:15] : Activation of http://OvertureOnline.com/Applications/Keima/Overture/Overture.application has started.

ERROR DETAILS
Following errors were detected during this operation.
* [25/09/2009 15:59:16] System.Deployment.Application.DeploymentDownloadException (Unknown subtype)
- Downloading http://overtureonline.com/Applications/Keima/Overture/Overture.application did not succeed.
- Source: System.Deployment
- Stack trace:
at System.Deployment.Application.SystemNetDownloader.DownloadSingleFile(DownloadQueueItem next)
at System.Deployment.Application.SystemNetDownloader.DownloadAllFiles()
at System.Deployment.Application.FileDownloader.Download(SubscriptionState subState)
at System.Deployment.Application.DownloadManager.DownloadManifestAsRawFile(Uri& sourceUri, String targetPath, IDownloadNotification notification, DownloadOptions options, ServerInformation& serverInformation)
at System.Deployment.Application.DownloadManager.DownloadDeploymentManifestDirectBypass(SubscriptionStore subStore, Uri& sourceUri, TempFile& tempFile, SubscriptionState& subState, IDownloadNotification notification, DownloadOptions options, ServerInformation& serverInformation)
at System.Deployment.Application.DownloadManager.DownloadDeploymentManifestBypass(SubscriptionStore subStore, Uri& sourceUri, TempFile& tempFile, SubscriptionState& subState, IDownloadNotification notification, DownloadOptions options)
at System.Deployment.Application.ApplicationActivator.PerformDeploymentActivation(Uri activationUri, Boolean isShortcut, String textualSubId, String deploymentProviderUrlFromExtension, BrowserSettings browserSettings, String& errorPageUrl)
at System.Deployment.Application.ApplicationActivator.ActivateDeploymentWorker(Object state)
--- Inner Exception ---
System.Net.WebException
- The remote server returned an error: (407) Proxy Authentication Required.
- Source: System
- Stack trace:
at System.Net.HttpWebRequest.GetResponse()
at System.Deployment.Application.SystemNetDownloader.DownloadSingleFile(DownloadQueueItem next)

COMPONENT STORE TRANSACTION DETAILS
No transaction information is available.


A smart customer of ours actually found an answer: she ran an NTLM proxy, but we shouldn't expect all our customers to be computer geniuses.

The recommended solution is for the customer to edit their machine.config file - I can't see how that could possibly go badly ;-)
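For the record, the edit usually suggested is a small defaultProxy section in machine.config, so that System.Net passes the user's default credentials to the proxy. This is a sketch only — verify the element names against your framework version before asking a customer to touch this file:

```xml
<!-- Sketch of the commonly suggested machine.config edit; goes inside
     the existing <configuration> element. Check against your .NET
     version before recommending it to customers. -->
<system.net>
  <defaultProxy useDefaultCredentials="true">
    <proxy usesystemdefault="true" />
  </defaultProxy>
</system.net>
```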

Internet Explorer Security


If the Internet Explorer security option "Run components not signed with Authenticode" is set to "Disable", then your .application file will not work. The reason would be clearer if the wording were "Run components not signed with Authenticode certificates that I trust": even if your installer is signed, the publisher is not necessarily trusted on the customer's machine.

The error looks like this:


PLATFORM VERSION INFO
Windows : 6.0.6001.65536 (Win32NT)
Common Language Runtime : 4.0.20506.1
System.Deployment.dll : 4.0.20506.1 (Beta1.020506-0100)
clr.dll : 4.0.20506.1 (Beta1.020506-0100)
dfdll.dll : 4.0.20506.1 (Beta1.020506-0100)
dfshim.dll : 4.0.20428.1 (Beta1.020428-0100)

SOURCES
Deployment url : http://cadenza.keima.co.uk/Applications/Keima/Test/ConsoleApplicationClickOnce/ConsoleApplicationClickOnce.application
Server : Microsoft-IIS/7.0
X-Powered-By : ASP.NET

ERROR SUMMARY
Below is a summary of the errors, details of these errors are listed later in the log.
* Activation of http://cadenza.keima.co.uk/Applications/Keima/Test/ConsoleApplicationClickOnce/ConsoleApplicationClickOnce.application resulted in exception. Following failure messages were detected:
+ Your Web browser settings do not allow you to run unsigned applications.

COMPONENT STORE TRANSACTION FAILURE SUMMARY
No transaction error was detected.

WARNINGS
There were no warnings during this operation.

OPERATION PROGRESS STATUS
* [10/15/2009 3:21:06 PM] : Activation of http://cadenza.keima.co.uk/Applications/Keima/Test/ConsoleApplicationClickOnce/ConsoleApplicationClickOnce.application has started.

ERROR DETAILS
Following errors were detected during this operation.
* [10/15/2009 3:21:06 PM] System.Deployment.Application.InvalidDeploymentException (Manifest)
- Your Web browser settings do not allow you to run unsigned applications.
- Source: System.Deployment
- Stack trace:
at System.Deployment.Application.ApplicationActivator.BrowserSettings.Validate(String manifestPath)
at System.Deployment.Application.ApplicationActivator.PerformDeploymentActivation(Uri activationUri, Boolean isShortcut, String textualSubId, String deploymentProviderUrlFromExtension, BrowserSettings browserSettings, String& errorPageUrl)
at System.Deployment.Application.ApplicationActivator.ActivateDeploymentWorker(Object state)

COMPONENT STORE TRANSACTION DETAILS
No transaction information is available.


I don't know of any tidy workaround for this. If the client installs your certificate prior to product installation then it will work, but then it's no longer ClickOnce, is it? One might argue that they asked for this by changing the setting, but it is more likely that their IT department changed it and they know nothing about it. The irony is that if you run the bootstrap setup.exe, as you must from browsers like Chrome, then it works fine, but this means you are forcing clients to run a full-trust executable instead of a potentially low-trust .application file.

UPDATE: Robin points out in the comments below that this is an IE6 only issue.

Missing Local Files


A ClickOnce install puts files in a special folder called Apps (%localappdata%\Apps on Vista at least). I'm not sure of the exact purpose of all of the folders and files, but I do know that deleting some of them can cause web-launch failures. For instance, I installed a simple application, deleted the folder that held the application binaries, and then tried to relaunch the application. This failed with the following message:


PLATFORM VERSION INFO
Windows : 6.0.6001.65536 (Win32NT)
Common Language Runtime : 4.0.20506.1
System.Deployment.dll : 4.0.20506.1 (Beta1.020506-0100)
clr.dll : 4.0.20506.1 (Beta1.020506-0100)
dfdll.dll : 4.0.20506.1 (Beta1.020506-0100)
dfshim.dll : 4.0.20428.1 (Beta1.020428-0100)

SOURCES
Deployment url : http://cadenza.keima.co.uk/Applications/Keima/Test/ConsoleApplicationClickOnce/ConsoleApplicationClickOnce.application
Server : Microsoft-IIS/7.0
X-Powered-By : ASP.NET

IDENTITIES
Deployment Identity : ConsoleApplicationClickOnce.application, Version=1.0.0.1, Culture=neutral, PublicKeyToken=39bbeb210852ebe6, processorArchitecture=msil

APPLICATION SUMMARY
* Online only application.

ERROR SUMMARY
Below is a summary of the errors, details of these errors are listed later in the log.
* Activation of http://cadenza.keima.co.uk/Applications/Keima/Test/ConsoleApplicationClickOnce/ConsoleApplicationClickOnce.application resulted in exception. Following failure messages were detected:
+ The directory name is invalid. (Exception from HRESULT: 0x8007010B)

COMPONENT STORE TRANSACTION FAILURE SUMMARY
No transaction error was detected.

WARNINGS
There were no warnings during this operation.

OPERATION PROGRESS STATUS
* [10/15/2009 3:32:28 PM] : Activation of http://cadenza.keima.co.uk/Applications/Keima/Test/ConsoleApplicationClickOnce/ConsoleApplicationClickOnce.application has started.
* [10/15/2009 3:32:28 PM] : Processing of deployment manifest has successfully completed.

ERROR DETAILS
Following errors were detected during this operation.
* [10/15/2009 3:32:28 PM] System.Runtime.InteropServices.COMException
- The directory name is invalid. (Exception from HRESULT: 0x8007010B)
- Source: System.Deployment
- Stack trace:
at System.Deployment.Application.NativeMethods.CorLaunchApplication(UInt32 hostType, String applicationFullName, Int32 manifestPathsCount, String[] manifestPaths, Int32 activationDataCount, String[] activationData, PROCESS_INFORMATION processInformation)
at System.Deployment.Application.ComponentStore.ActivateApplication(DefinitionAppId appId, String activationParameter, Boolean useActivationParameter)
at System.Deployment.Application.SubscriptionStore.ActivateApplication(DefinitionAppId appId, String activationParameter, Boolean useActivationParameter)
at System.Deployment.Application.ApplicationActivator.Activate(DefinitionAppId appId, AssemblyManifest appManifest, String activationParameter, Boolean useActivationParameter)
at System.Deployment.Application.ApplicationActivator.PerformDeploymentActivation(Uri activationUri, Boolean isShortcut, String textualSubId, String deploymentProviderUrlFromExtension, BrowserSettings browserSettings, String& errorPageUrl)
at System.Deployment.Application.ApplicationActivator.ActivateDeploymentWorker(Object state)

COMPONENT STORE TRANSACTION DETAILS
* Transaction at [10/15/2009 3:32:28 PM]
+ System.Deployment.Internal.Isolation.StoreOperationSetDeploymentMetadata
- Status: Set
- HRESULT: 0x0
+ System.Deployment.Internal.Isolation.StoreTransactionOperationType (27)
- HRESULT: 0x0


Thanks to Russell Christopher for blogging the only solution I have found that works: delete everything in the Apps folder. This should be safe, given that ClickOnce applications repair themselves if completely removed; apart from Start Menu shortcuts, they are completely defined by the files on disk (I think).
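On my machines the cache sits under a versioned store folder, so the nuke-it-all fix amounts to something like the following. A sketch, not gospel: close any running ClickOnce applications first, and be aware this removes every ClickOnce app for the user (all of which should reinstall themselves on next launch):

```bat
rem Clear the per-user ClickOnce application cache ("2.0" is the store
rem version folder on .NET 2.0-4.0; %LOCALAPPDATA% exists on Vista and
rem later - on XP it is "%USERPROFILE%\Local Settings\Apps").
rmdir /s /q "%LOCALAPPDATA%\Apps\2.0"
```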

Obviously this is the client's fault for messing with the files in the first place, but I was surprised that it didn't repair itself.

Plain Text / XML .application File


Pathology: When the .application file is clicked, the browser shows the plain text XML file instead of launching the ClickOnce installer.

I only get this problem with IE6, and in my case it is a MIME issue. My server labels .application files with the generic application/octet-stream MIME type, and IE6 does a poor job of deducing the correct type by inspecting the content.

It appears that IE7 and above have more sophisticated ways of resolving the .application MIME type than simply trusting what the server reports, which is often difficult or impossible to change for exotic types like application/x-ms-application if you don't have root access to the server. Failing that, you can serve up the EXE bootstrapper instead of the .application file to anyone on IE6.
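If you can at least drop a web.config next to the files (IIS 7, as in the logs above), you don't need access to the server-wide configuration to register the type. A sketch — check whether your host already defines these entries first, as duplicates cause a configuration error:

```xml
<!-- Sketch for IIS 7: register the ClickOnce MIME types per-site in
     web.config so the server stops reporting application/octet-stream. -->
<system.webServer>
  <staticContent>
    <mimeMap fileExtension=".application" mimeType="application/x-ms-application" />
    <mimeMap fileExtension=".manifest" mimeType="application/x-ms-manifest" />
    <mimeMap fileExtension=".deploy" mimeType="application/octet-stream" />
  </staticContent>
</system.webServer>
```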

Clue 1: Probably a server MIME encoding issue.
Clue 2: More evidence of MIME encoding.

The ClickOnce Graveyard


Here lie all the problems I am currently experiencing for which I do not have a definitive solution. As they are resolved, I will move them up into the body of the main article.

Setup EXE Bootstrapper Fails to Launch


Pathology: On a virgin XP installation, the EXE bootstrapper fails to launch.

Clues: Saving the bootstrapper and launching it manually throws a dialog with the following example text:

Unable to satisfy all prerequisites for [Application]. Setup cannot continue until all system components have been successfully installed.

Prerequisite check for system component .NET Framework 3.5 SP1 failed with the following error message:
"Installation of the .NET Framework 3.5 SP1 requires Windows XP SP2, Windows 2003 SP1, Windows Vista, or later. Contact your application vendor."

See the setup log file located at
[Log file location]

This doesn't happen on XP SP2 or above, but I don't know how to make the service pack level a prerequisite. More to the point, I don't know how to show a friendly dialog when that prerequisite is not met.

Thursday, October 01, 2009

Sinister discovery of the day

Whilst researching international post-codes (don't ask), I happened upon a Wikipedia page about Chinese post-codes (not a euphemism). Note the codes (000000–009999) reserved for Taiwan.

Friday, September 11, 2009

Apology for Alan Turing

The British Government apologised for its persecution of Alan Turing following an online petition. I'm delighted by this, both for highlighting the abominable treatment of gay people and for bringing my hero Turing to people's attention again.

As a signatory of the online petition, I received a copy of the apology in full in an email (which I thought was a nice touch). Here it is reproduced in full:

"Thank you for signing this petition. The Prime Minister has written a response. Please read below.

Prime Minister: 2009 has been a year of deep reflection – a chance for Britain, as a nation, to commemorate the profound debts we owe to those who came before. A unique combination of anniversaries and events have stirred in us that sense of pride and gratitude which characterise the British experience. Earlier this year I stood with Presidents Sarkozy and Obama to honour the service and the sacrifice of the heroes who stormed the beaches of Normandy 65 years ago. And just last week, we marked the 70 years which have passed since the British government declared its willingness to take up arms against Fascism and declared the outbreak of World War Two. So I am both pleased and proud that, thanks to a coalition of computer scientists, historians and LGBT activists, we have this year a chance to mark and celebrate another contribution to Britain’s fight against the darkness of dictatorship; that of code-breaker Alan Turing.

Turing was a quite brilliant mathematician, most famous for his work on breaking the German Enigma codes. It is no exaggeration to say that, without his outstanding contribution, the history of World War Two could well have been very different. He truly was one of those individuals we can point to whose unique contribution helped to turn the tide of war. The debt of gratitude he is owed makes it all the more horrifying, therefore, that he was treated so inhumanely. In 1952, he was convicted of ‘gross indecency’ – in effect, tried for being gay. His sentence – and he was faced with the miserable choice of this or prison - was chemical castration by a series of injections of female hormones. He took his own life just two years later.

Thousands of people have come together to demand justice for Alan Turing and recognition of the appalling way he was treated. While Turing was dealt with under the law of the time and we can't put the clock back, his treatment was of course utterly unfair and I am pleased to have the chance to say how deeply sorry I and we all are for what happened to him. Alan and the many thousands of other gay men who were convicted as he was convicted under homophobic laws were treated terribly. Over the years millions more lived in fear of conviction.

I am proud that those days are gone and that in the last 12 years this government has done so much to make life fairer and more equal for our LGBT community. This recognition of Alan’s status as one of Britain’s most famous victims of homophobia is another step towards equality and long overdue.

But even more than that, Alan deserves recognition for his contribution to humankind. For those of us born after 1945, into a Europe which is united, democratic and at peace, it is hard to imagine that our continent was once the theatre of mankind’s darkest hour. It is difficult to believe that in living memory, people could become so consumed by hate – by anti-Semitism, by homophobia, by xenophobia and other murderous prejudices – that the gas chambers and crematoria became a piece of the European landscape as surely as the galleries and universities and concert halls which had marked out the European civilisation for hundreds of years. It is thanks to men and women who were totally committed to fighting fascism, people like Alan Turing, that the horrors of the Holocaust and of total war are part of Europe’s history and not Europe’s present.

So on behalf of the British government, and all those who live freely thanks to Alan’s work I am very proud to say: we’re sorry, you deserved so much better.

Gordon Brown

If you would like to help preserve Alan Turing's memory for future generations, please donate here: http://www.bletchleypark.org.uk/

Petition information - http://petitions.number10.gov.uk/turing/"


I won't link to the Downing Street site version because the comments might as well have come from the Daily Mail's letters page.

Friday, August 28, 2009

BBC News: File-sharers' TV tastes revealed

The BBC have a story based on some download statistics for TV and film that makes interesting reading. If TV network executives read this and think "threat" rather than "opportunity", they are missing a trick. What if they uploaded the torrent themselves, simultaneously with transmission, adverts included? Nobody is going to bother with an illegal version if this one is available immediately and is of high quality. Only a hardcore pirate would bother with an ad-free version, because it would lag slightly behind the official release in time to release, and crucially its number of seeds, and hence its download speed, would be significantly lower.

Yes, people could zap through the ads, but that is no different from TiVo on broadcast, and only a schmuck watches things live these days anyway. If only 2% of people sat through the ads, that is still 1 million people you don't have today, in a demographic you would normally struggle to reach.

There is a secret power in "global" as well. Advertisers talk about "playground repeats", which is when slogans and jingles are repeated in the school yard. If your adverts start to permeate the global culture a synergy emerges where the phrases and concepts become a lingua franca in online conversations.

Infrastructure requirements: a laptop, a big disk, and a decent uplink; not bad if you want to reach more than 50 million people worldwide. If it really takes off you could even do away with that expensive array of radio towers.

A halfway solution is no good. Custom players, region and DRM locking, proprietary standards, etc... will always be a niche play against the power of free. Open it up and let the community build the software and hardware to support it. Stick to what you know: making TV shows.

I don't pretend that this is a good solution for films, but it is perfect for TV.

Friday, August 21, 2009

ASP.NET File Permissions

Short Story:

Use Process Monitor to help resolve ASP.NET file permission problems.

Long Story:

I have some files on one machine that I would like to make available through one of our company websites. We use IIS for our hosting, so I created a file share on the machine holding the files and linked to it using a Virtual Directory. Resources accessed via IIS are read as the IUSR account (unless other authentication options are enabled), so I added read permissions for that user to the files' security settings and to the sharing permissions (I always forget to do both), but it did not work. There may be some fancy ASP logging that tells you why, but I have always struggled to get the ASP logs to do anything very useful.

Several smart people on the Internet recommended Process Monitor from Microsoft (previously from Sysinternals). Everything the Sysinternals guys make is made of pure awesome, and this is no exception. It lets you search for the ACCESS DENIED operation (a CreateFile function call), which shows all of the relevant details, including the user credentials used. In my case it was IUSR, or actually /IUSR_, which is some sort of domain equivalent. Exactly the user I had set the permissions for, so why wasn't it working?

Running Process Monitor on the file-hosting machine instead revealed the problem: the file access was impersonating a different user, namely /$. It seems this is the default behaviour across machine boundaries unless you are explicitly impersonating a designated user, in which case those credentials are passed intact.

The fix I chose may not be the recommended solution, but it worked for me: explicitly impersonate IUSR to force the credentials to be passed across the machine boundary. To do this you need the IUSR password, which can be obtained by following this helpful advice. Whatever you do, don't change the IUSR password, or you may unwittingly open a portal to a new dimension of pain.
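In web.config the explicit impersonation looks something like this. The user name here is a placeholder (the real IUSR account name varies by machine), and since the password sits in plain text you should consider encrypting the section with aspnet_regiis -pe:

```xml
<!-- Sketch: run the application's requests as a fixed identity so the
     same credentials are passed to the remote file share. User name is
     a placeholder; fill in your machine's IUSR account and password. -->
<system.web>
  <identity impersonate="true" userName="IUSR_MACHINE" password="..." />
</system.web>
```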

Monday, July 20, 2009

Richard Feynman: The Character of Physical Law

Feynman's incredible lecture series is now freely available on video here: http://research.microsoft.com/apps/tools/tuva/index.html.

It is wonderfully complete, including the VT header, which I think should begin all videos.

It's all made possible by someone called "Gates".