Tuesday, November 29, 2011

Android Permissions - Protection Levels

Android applications declare the permissions they are likely to require in their manifest (a short file that describes the contents of the 'package'). This allows the system to sandbox them from critical resources and gives the user some indication of what havoc they might wreak. That's the theory at least, but the first time I installed an application and read the permissions page I had no idea what they were on about! Clearly this system needs to be changed, but that is not what I want to talk about today.

As an application writer I need to know the protection level of these permissions, i.e. which of them are normal (can cause the user no real harm), dangerous (might require a greater level of trust, such as the ability to read SMS messages), signature (only granted to applications signed with the same certificate as the package that declares the permission, which for the platform permissions means the people who built the OS), or signatureOrSystem (like signature, but also granted to applications that have been pre-installed in a system folder). I was surprised to find no easy reference for this in the documentation, but I did find the relevant information in the source.

You can of course probe the android package itself for this information, which is useful if you don't have access to the source for the particular version of Android you are running. Here is some code that does just that:

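// Run inside an Activity (or any Context). Needs android.content.pm.PackageInfo,
// android.content.pm.PackageManager, android.content.pm.PermissionInfo and android.util.Log.
try {
  // Get the permissions for the core android package
  PackageInfo packageInfo = getPackageManager().getPackageInfo("android", PackageManager.GET_PERMISSIONS);
  if (packageInfo.permissions != null) {
    // For each defined permission
    for (PermissionInfo permission : packageInfo.permissions) {
      // Map the protection level constant to a readable name
      String protectionLevel;
      switch (permission.protectionLevel) {
      case PermissionInfo.PROTECTION_NORMAL: protectionLevel = "normal"; break;
      case PermissionInfo.PROTECTION_DANGEROUS: protectionLevel = "dangerous"; break;
      case PermissionInfo.PROTECTION_SIGNATURE: protectionLevel = "signature"; break;
      case PermissionInfo.PROTECTION_SIGNATURE_OR_SYSTEM: protectionLevel = "signatureOrSystem"; break;
      default: protectionLevel = "<unknown>"; break;
      }
      // Dump permission info
      Log.i("PermissionCheck", permission.name + " " + protectionLevel);
    }
  }
} catch (PackageManager.NameNotFoundException e) {
  // getPackageInfo declares this checked exception, so it must be handled
  Log.e("PermissionCheck", "Package 'android' not found", e);
}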
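
One caveat if you run this on a later Android release: as far as I know, the protectionLevel value there can carry extra flag bits on top of the base level, in which case an exact switch on the raw value falls through to "<unknown>". Below is a minimal sketch of masking the value first. It is plain Java so it can be tried off-device; the constant values are copied in on the assumption that they mirror those in PermissionInfo (including PROTECTION_MASK_BASE), and the class and method names are hypothetical.

```java
// Plain-Java sketch. The constants are assumed to mirror android.content.pm.PermissionInfo.
class ProtectionLevels {
    static final int PROTECTION_NORMAL = 0;
    static final int PROTECTION_DANGEROUS = 1;
    static final int PROTECTION_SIGNATURE = 2;
    static final int PROTECTION_SIGNATURE_OR_SYSTEM = 3;
    static final int PROTECTION_MASK_BASE = 0xf; // low bits hold the base level

    // Strip any flag bits, then map the base level to a readable name.
    static String describe(int protectionLevel) {
        switch (protectionLevel & PROTECTION_MASK_BASE) {
            case PROTECTION_NORMAL: return "normal";
            case PROTECTION_DANGEROUS: return "dangerous";
            case PROTECTION_SIGNATURE: return "signature";
            case PROTECTION_SIGNATURE_OR_SYSTEM: return "signatureOrSystem";
            default: return "<unknown>";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(PROTECTION_DANGEROUS));
        // 0x10 stands in for an arbitrary flag bit set above the base level
        System.out.println(describe(PROTECTION_SIGNATURE | 0x10));
    }
}
```

On a device you would pass permission.protectionLevel to describe() in place of the switch.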

...and here are the results in case you need to know them at a glance...

Permission | Protection Level
android.intent.category.MASTER_CLEAR.permission.C2D_MESSAGE | signature
android.permission.ACCESS_CACHE_FILESYSTEM | signatureOrSystem
android.permission.ACCESS_CHECKIN_PROPERTIES | signatureOrSystem
android.permission.ACCESS_COARSE_LOCATION | dangerous
android.permission.ACCESS_FINE_LOCATION | dangerous
android.permission.ACCESS_LOCATION_EXTRA_COMMANDS | normal
android.permission.ACCESS_MOCK_LOCATION | dangerous
android.permission.ACCESS_NETWORK_STATE | normal
android.permission.ACCESS_SURFACE_FLINGER | signature
android.permission.ACCESS_WIFI_STATE | normal
android.permission.ACCOUNT_MANAGER | signature
android.permission.ASEC_ACCESS | signature
android.permission.ASEC_CREATE | signature
android.permission.ASEC_DESTROY | signature
android.permission.ASEC_MOUNT_UNMOUNT | signature
android.permission.ASEC_RENAME | signature
android.permission.AUTHENTICATE_ACCOUNTS | dangerous
android.permission.BACKUP | signatureOrSystem
android.permission.BATTERY_STATS | normal
android.permission.BIND_APPWIDGET | signatureOrSystem
android.permission.BIND_DEVICE_ADMIN | signature
android.permission.BIND_INPUT_METHOD | signature
android.permission.BIND_WALLPAPER | signatureOrSystem
android.permission.BLUETOOTH | dangerous
android.permission.BLUETOOTH_ADMIN | dangerous
android.permission.BRICK | signature
android.permission.BROADCAST_PACKAGE_REMOVED | signature
android.permission.BROADCAST_SMS | signature
android.permission.BROADCAST_STICKY | normal
android.permission.BROADCAST_WAP_PUSH | signature
android.permission.CALL_PHONE | dangerous
android.permission.CALL_PRIVILEGED | signatureOrSystem
android.permission.CAMERA | dangerous
android.permission.CHANGE_BACKGROUND_DATA_SETTING | signature
android.permission.CHANGE_COMPONENT_ENABLED_STATE | signature
android.permission.CHANGE_CONFIGURATION | dangerous
android.permission.CHANGE_NETWORK_STATE | dangerous
android.permission.CHANGE_WIFI_MULTICAST_STATE | dangerous
android.permission.CHANGE_WIFI_STATE | dangerous
android.permission.CLEAR_APP_CACHE | dangerous
android.permission.CLEAR_APP_USER_DATA | signature
android.permission.CONTROL_LOCATION_UPDATES | signatureOrSystem
android.permission.COPY_PROTECTED_DATA | signature
android.permission.DELETE_CACHE_FILES | signatureOrSystem
android.permission.DELETE_PACKAGES | signatureOrSystem
android.permission.DEVICE_POWER | signature
android.permission.DIAGNOSTIC | signature
android.permission.DISABLE_KEYGUARD | normal
android.permission.DUMP | dangerous
android.permission.EXPAND_STATUS_BAR | normal
android.permission.FACTORY_TEST | signature
android.permission.FLASHLIGHT | normal
android.permission.FORCE_BACK | signature
android.permission.FORCE_STOP_PACKAGES | signature
android.permission.GET_ACCOUNTS | normal
android.permission.GET_PACKAGE_SIZE | normal
android.permission.GET_TASKS | dangerous
android.permission.GLOBAL_SEARCH | signatureOrSystem
android.permission.GLOBAL_SEARCH_CONTROL | signature
android.permission.HARDWARE_TEST | signature
android.permission.INJECT_EVENTS | signature
android.permission.INSTALL_LOCATION_PROVIDER | signatureOrSystem
android.permission.INSTALL_PACKAGES | signatureOrSystem
android.permission.INTERNAL_SYSTEM_WINDOW | signature
android.permission.INTERNET | dangerous
android.permission.KILL_BACKGROUND_PROCESSES | normal
android.permission.MANAGE_ACCOUNTS | dangerous
android.permission.MANAGE_APP_TOKENS | signature
android.permission.MASTER_CLEAR | signatureOrSystem
android.permission.MODIFY_AUDIO_SETTINGS | dangerous
android.permission.MODIFY_PHONE_STATE | dangerous
android.permission.MOUNT_FORMAT_FILESYSTEMS | dangerous
android.permission.MOUNT_UNMOUNT_FILESYSTEMS | dangerous
android.permission.MOVE_PACKAGE | signatureOrSystem
android.permission.PACKAGE_USAGE_STATS | signature
android.permission.PERFORM_CDMA_PROVISIONING | signatureOrSystem
android.permission.PERSISTENT_ACTIVITY | dangerous
android.permission.PROCESS_OUTGOING_CALLS | dangerous
android.permission.READ_CALENDAR | dangerous
android.permission.READ_CONTACTS | dangerous
android.permission.READ_FRAME_BUFFER | signature
android.permission.READ_INPUT_STATE | signature
android.permission.READ_LOGS | dangerous
android.permission.READ_OWNER_DATA | dangerous
android.permission.READ_PHONE_STATE | dangerous
android.permission.READ_SMS | dangerous
android.permission.READ_SYNC_SETTINGS | normal
android.permission.READ_SYNC_STATS | normal
android.permission.READ_USER_DICTIONARY | dangerous
android.permission.REBOOT | signatureOrSystem
android.permission.RECEIVE_BOOT_COMPLETED | normal
android.permission.RECEIVE_MMS | dangerous
android.permission.RECEIVE_SMS | dangerous
android.permission.RECEIVE_WAP_PUSH | dangerous
android.permission.RECORD_AUDIO | dangerous
android.permission.REORDER_TASKS | dangerous
android.permission.RESTART_PACKAGES | normal
android.permission.SEND_SMS | dangerous
android.permission.SET_ACTIVITY_WATCHER | signature
android.permission.SET_ALWAYS_FINISH | dangerous
android.permission.SET_ANIMATION_SCALE | dangerous
android.permission.SET_DEBUG_APP | dangerous
android.permission.SET_ORIENTATION | signature
android.permission.SET_PREFERRED_APPLICATIONS | signature
android.permission.SET_PROCESS_LIMIT | dangerous
android.permission.SET_TIME | signatureOrSystem
android.permission.SET_TIME_ZONE | dangerous
android.permission.SET_WALLPAPER | normal
android.permission.SET_WALLPAPER_COMPONENT | signatureOrSystem
android.permission.SET_WALLPAPER_HINTS | normal
android.permission.SHUTDOWN | signature
android.permission.SIGNAL_PERSISTENT_PROCESSES | dangerous
android.permission.STATUS_BAR | signatureOrSystem
android.permission.STOP_APP_SWITCHES | signature
android.permission.SUBSCRIBED_FEEDS_READ | normal
android.permission.SUBSCRIBED_FEEDS_WRITE | dangerous
android.permission.SYSTEM_ALERT_WINDOW | dangerous
android.permission.UPDATE_DEVICE_STATS | signature
android.permission.USE_CREDENTIALS | dangerous
android.permission.VIBRATE | normal
android.permission.WAKE_LOCK | dangerous
android.permission.WRITE_APN_SETTINGS | dangerous
android.permission.WRITE_CALENDAR | dangerous
android.permission.WRITE_CONTACTS | dangerous
android.permission.WRITE_EXTERNAL_STORAGE | dangerous
android.permission.WRITE_GSERVICES | signatureOrSystem
android.permission.WRITE_OWNER_DATA | dangerous
android.permission.WRITE_SECURE_SETTINGS | signatureOrSystem
android.permission.WRITE_SETTINGS | dangerous
android.permission.WRITE_SMS | dangerous
android.permission.WRITE_SYNC_SETTINGS | dangerous
android.permission.WRITE_USER_DICTIONARY | normal
com.android.browser.permission.READ_HISTORY_BOOKMARKS | dangerous
com.android.browser.permission.WRITE_HISTORY_BOOKMARKS | dangerous

Thursday, June 30, 2011

Cloud Production

I've always been fascinated by 3D printers and recently ordered one from the crowd-funded Huxley project.

This project has raised far more than initially expected, and has forced the company that runs it to increase its production capabilities by an order of magnitude. Now because these printers can print parts for themselves, eMaker are reaching out to other 3D printer owners to help them cope with the demand.

Imagine a future where you order something online and rather than coming from a local warehouse it is manufactured in a local facility that can make anything. As long as the quality is acceptable you don't have to know or care where it came from. The benefits of this from a production and supply chain point of view are enormous and efficiencies in the supply chain would mean cheaper goods for consumers. Couple this with the environmental benefits (less transportation and waste) and you have a game-changing technology: Cloud Production.

Friday, May 27, 2011

Passpack - Online password management

Did I bore you about Passpack yet? If not read on...



Passpack is a website that manages your passwords and other login details. It is simple to use and allows you to share passwords with colleagues and family members.

For instance, if you add a link to the login page for the site, it will auto-complete the login fields (using a bookmarklet).

I can also recommend the automatic password generation, which helps you avoid the principal danger of password reuse.

It has a neat system whereby the passwords are decrypted locally in the browser using your security pass-phrase. This means that even the Passpack folks can't see your passwords. Of course this means you should keep your pass-phrase written down somewhere - I recommend keeping it with your will :-)

For extra simplicity, you can login with your ID from Google, Facebook, Twitter, or an OpenID provider. This doesn't help with the pass-phrase, but it stops you needing two passwords: one to login and one to decrypt your data.

Until true SSO is a reality, this makes identity management much simpler.

Thursday, May 05, 2011

SafeShare.TV

I often want to share YouTube videos with my kids, but they are surrounded by links to other videos, which can often be unsuitable. SafeShare.TV displays only the video you want to show and nothing else. Use their link generator or just add v/<video id> to the end of their URL.

Here is an example: http://www.safeshare.tv/v/YXM3wrIhcwY

Wednesday, April 20, 2011

MSTest and 64bit

This post is about running MSTest for applications that target mixed platforms.

If you are lucky enough to be able to write your applications in pure .NET, then you may never encounter 32bit/64bit platform issues. However, if any dependent library or plug-in is compiled for a specific architecture, then your whole application must be run in that mode. This is why the default Windows Internet Explorer is still 32bit despite the 64bit version shipping since Vista: it has to be the same architecture as any legacy plug-ins. By contrast, Notepad doesn't have any plug-ins, so it can get away with being 64bit only.

My company's applications rely on many native libraries, which are obviously compiled for specific architectures (x86 and x64). Deploying an application for multiple target processors is a complex subject in itself that can be solved with a range of strategies from dynamic library linking to processor-specific installers. However you deploy, your application will behave differently in these two modes, so both must be tested.

For better or for worse, we use MSTest to control application quality. Since the release of Visual Studio 2010 this has been able to run in 64-bit mode as well as 32-bit mode, but there are certain subtleties that complicate the practical aspects of administering your tests.

To understand the problem, consider the way MSTest works. Testing is done using two programs: MSTest.exe and QTAgent32.exe. MSTest is told what assemblies to load and it scans those assemblies (using reflection) to find any classes and methods annotated as tests using the various test attributes. To do this it must be able to load the assembly and all its dependent assemblies, and because MSTest is a 32bit process, none of these assemblies can be exclusively 64bit. Once loaded, MSTest instructs QTAgent32 to run these tests, which means QTAgent32 must load the assemblies itself and execute the test methods, but because it is also a 32bit process it cannot load 64bit assemblies either.

In Visual Studio 2010 a new version of QTAgent32 was added called QTAgent.exe, which can run 64bit assemblies. This means that even though MSTest is still 32bit, QTAgent can execute in full 64bit mode so that pure .NET assemblies can now be tested in 32bit and 64bit mode. However, it still doesn't easily allow applications with mixed-mode assemblies to be tested in 64bit mode because they cannot be loaded by MSTest in the first place.

One interesting solution to this is to force MSTest.exe to be a 64bit application. This implies that MSTest is actually pure .NET code anyway, but has been forced to run in x86 mode. If you are going down this road, note that MSTest relies on various registry entries (HKLM\SOFTWARE\Microsoft\VisualStudio\10.0\EnterpriseTools\QualityTools\TestType and HKLM\SOFTWARE\Microsoft\VisualStudio\10.0\Licenses) to decide which extensions it can handle and what features are licensed for use, and that these are installed by default to the WOW6432Node registry "shadow" branch. To run in 64bit mode you must copy some of the registry entries over as well as editing the binaries themselves.

There is an alternative approach that doesn't involve editing executable files and local machine registry settings (which can be a pain across a large development team). Our application builds for two distinct platform targets x86 and x64 (note however that most assemblies are compiled as "Any CPU" except the ones that contain native code) and test projects are only built in the x86 solution configuration. This ensures that the code in their bin folder is 32bit compatible, and they can therefore be loaded into MSTest. Also, the tests are configured to run against the binaries in the actual application deployment folder rather than running in their binary folder using the new root folder feature of Visual Studio 2010:



In this case we have used an environment variable that signifies where to find the deployed binaries. Whether you do this or not, it seems to always want a full path for one reason or another, which can make supporting multiple development environments a challenge.

We make one of these test configurations for each target platform, remembering to change the Hosts section that controls 32bit and 64bit execution. Then we can run both configurations from the command line like this:

mstest /testsettings:WorkStation32.testrunconfig /testmetadata:SOLUTION.vsmdi
mstest /testsettings:WorkStation64.testrunconfig /testmetadata:SOLUTION.vsmdi

The trick here is that in both cases MSTest will load the 32bit binaries to decide what tests to run, but the different configuration files will control if QTAgent32 or QTAgent is used. Note that this cannot work with the /noisolation switch, because MSTest cannot host the 64bit binaries.

The disadvantage of running your unit tests against the deployment folder is that your tests are less "clean" and test failures could take longer to diagnose. The advantage is that the tests are being run on code as it will appear in the wild, which can include complex deployment features such as assembly obfuscation.

This system will work on development desktops and build servers. It may give the Visual Studio IDE pause for thought occasionally, but it is fundamentally compatible, which is one of the only real advantages of MSTest in the first place.

Friday, February 18, 2011

IBM Watson plays Jeopardy

This really changes everything: http://www.viddler.com/explore/engadget/videos/2393/

The most amazing computer demonstration I have ever seen.

Wednesday, May 12, 2010

Microsoft's Click-to-Run and Office Automation

Today's lesson: When Office is installed using Click-to-Run, it doesn't support automation.

We use Excel automation via C# in our application, and when testing against the new versions of Office we hit a bump in the road. Office Home and Business 2010 typically installs via Click-to-Run, which is designed to have a small footprint, and as such does not register itself for programmatic automation.

So when you hear this sound: "Retrieving the COM class factory for component with CLSID {00024500-0000-0000-C000-000000000046} failed due to the following error: 80040154" even though Excel is apparently installed, you probably have this issue.

More here about Excel automation and the expected registry keys.

It is suggested that there will be an alternative MSI-based installer that presumably will not have this problem.

Wednesday, February 24, 2010

Vertical Alignment in CSS

I know I should use CSS, but sometimes I fall off the wagon and use tables instead. This usually happens when I want to vertically align content. Here are some very clear hints and tips about vertical alignment that may help me kick the habit: http://phrogz.net/CSS/vertical-align/index.html

Sunday, January 03, 2010

Content Delivery Networks

My company has a very specific requirement: we need to get our application to any desktop in the world in less than three minutes. There are business drivers for this that I shall not go into; basically it is so that potential customers don’t get bored waiting for our application to install and run. We are currently failing to do that for all users, and we suspect we are losing customers because we fall at the first fence.

The Problem is Discovered

Our installer is about 50MB, which is not huge, but we have been seeing an enormous variation in deployment times to various parts of the world. Currently we use a UK-based hosting service with high symmetric bandwidth, but routine log analysis revealed that the install times for some users exceeded 10 minutes, and many did not complete. A quick web search revealed that this is a well known problem, so well known in fact that there are many commercial solutions that come under the generic title of Content Delivery Networks (CDNs). The big players are companies like Akamai and Limelight, but I am allergic to companies that won’t tell you the price, and I suspect our needs are too modest to be worth their while addressing. There is however a new class of companies like GoGrid emerging and there are established hosting players like Amazon (with CloudFront) and Rackspace (using Limelight’s CDN) who are offering CDNs. The new-kid-on-the-block is Microsoft, who beta-launched the Azure CDN solution just as my investigations began.

CDN, like all hosting, is a highly commodified product. There are certainly modest differences in terms of things like upload flexibility (Azure stinks), clever torrent links (Amazon S3 rocks), and general UI friendliness, but there were no showstoppers. The only really important metrics are speed, reliability, and cost. Cost was easy: everyone who didn’t make it clear on their website in the first two minutes was discarded (are you starting to understand our business drivers now?), and the remaining companies were all so cheap that it wasn’t worth worrying about. This is because we are talking about a very small amount of data (50MB x 100 installs per month = 5GB) and the pricing is never more than about 25 cents per GB. These businesses are built for large streaming media and Flash media files, not for tiny desktop installers like ours.
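
To put a number on "so cheap", here is a rough sketch of the arithmetic using the figures above: a 50MB installer, 100 installs per month, and a ceiling of about 25 cents per GB. The class and method names are made up for illustration, and 1GB is taken as 1000MB to match the 5GB figure.

```java
// Back-of-envelope CDN cost estimate; names are hypothetical, figures come from the post.
class CdnCostEstimate {
    // Monthly transfer in GB, treating 1 GB as 1000 MB (matches 50MB x 100 = 5GB).
    static double monthlyGigabytes(double installerMb, int installsPerMonth) {
        return installerMb * installsPerMonth / 1000.0;
    }

    // Monthly cost in dollars at a flat per-GB price.
    static double monthlyCostDollars(double gigabytes, double dollarsPerGb) {
        return gigabytes * dollarsPerGb;
    }

    public static void main(String[] args) {
        double gb = monthlyGigabytes(50, 100);
        double cost = monthlyCostDollars(gb, 0.25);
        System.out.printf("%.1f GB per month, at most about $%.2f%n", gb, cost);
    }
}
```

At that price the whole month's traffic costs little more than a dollar, which is why cost dropped out of the comparison.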

Reliability next: we are not particularly concerned about reliability given that we are statistically unlikely to lose enough business in the difference between four nines and five nines to make it worth basing a decision on. Everybody can do four nines.

So that left speed, which comes in two flavors: latency and bandwidth. Latency is critical for that snappy website that puts your shop window in front of the customer in less than a few seconds (which is sometimes all you have). Incidentally, I didn’t come across any CDN webhosts, particularly ones that support ASP.NET, but you have to imagine it is coming from Azure. In our case, bandwidth was going to dominate so that is what we needed to know about.

During my research, I came across Ryan Kearney’s comparison of CDN providers. He gives a great round-up of the price and features of many of the providers, as well as some latency statistics for a handful of international locations. He was kind enough to host a file for my test rig on his Rackspace account, which was much appreciated.

So there are plenty of CDN providers, but very little information available to allow you to compare them. For instance, India and China are two very important markets for us, but what is the bandwidth to them from each of the providers? Clearly we needed to do some measurements.

The Game is Afoot

How do you measure the bandwidth of a host to every country in the world? Well, there are many companies that offer website monitoring and will alert you if your website goes down, some of these have international monitoring capabilities, and some of them have page download time statistics. However, to get an accurate picture of download speeds you need a fairly sizable file so that the bandwidth lag dominates other factors such as DNS resolution or server latency. Only one web monitoring service actually downloaded the whole file, allowing us to make an accurate estimate of bandwidth. They are WebSitePulse, and I could not have done this analysis without them. They have the most monitoring stations in the world, the most detailed statistics, and a 30 day free trial, which I used for this investigation. I highly recommend them to anyone looking for sophisticated, international, web site monitoring.

We created a test file called Test1MB.zip, which was a zip file that was truncated to exactly 1MB. A zip file is largely incompressible and the extension stops most servers from trying (actually few offer HTTP compression, which is a serious omission but beyond the scope of this post). This was mounted on multiple hosts and WebSitePulse was configured to download the files periodically. The WebSitePulse trial limits you to 20 monitor stations at a time (and excludes Auckland and Melbourne), and I didn’t have access to all of the hosts from the beginning, so the statistics are not done to laboratory standards. However, the statistical picture that emerges is reliable enough to allow business decisions to be made.
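
For anyone who wants to reproduce a single measurement by hand, the idea can be sketched as follows: read the whole file, time it, and divide. This is only an illustration of the principle, not how WebSitePulse does it. The class and method names are hypothetical, the in-memory stream stands in for a real download, and "1MB" is assumed to mean 1024 x 1024 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of a whole-file download timer; names are hypothetical.
class DownloadTimer {
    static final int BYTES_PER_MB = 1024 * 1024; // assumption: binary megabyte

    // Reads the stream to the end and returns the number of bytes received.
    static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read;
        }
        return total;
    }

    // Converts a byte count and an elapsed time into MB per second.
    static double megabytesPerSecond(long bytes, double seconds) {
        return (bytes / (double) BYTES_PER_MB) / seconds;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real file: 1 MB of in-memory data.
        InputStream in = new ByteArrayInputStream(new byte[BYTES_PER_MB]);
        long start = System.nanoTime();
        long bytes = drain(in);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("Read %d bytes in %.6f s%n", bytes, seconds);
        // A station that needs 2 s for the 1 MB file is seeing 0.5 MB/s.
        System.out.println(megabytesPerSecond(BYTES_PER_MB, 2.0) + " MB/s");
    }
}
```

To time a real host you could pass in the stream from new java.net.URL(address).openStream() instead, bearing in mind that a single sample also includes DNS and connection set-up time.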

The Runners and Riders

Host | CDN Capable | Notes
RapidSwitch | No | Our current host and representative of good quality hosting in the UK.
Azure CDN | Yes | Still in beta, and we literally started using it the day it opened, so there were teething problems.
Rackspace | Yes | Huge player in hosting and cloud computing.
Amazon CloudFront | Yes | CDN at the front, Amazon’s S3 at the back. Nominally still in beta, but frankly charging for something means you must be judged as a commercial product.
Amazon S3 | No | Our S3 hosting is in the US, so this is the standard candle for US-based cloud hosting.
GoGrid CDN | Yes | A high number of international points-of-presence, and more on the way.

Very few of the CDN companies offer free trials for some reason, but I think all are pay-as-you-go, which costs pennies for what we want. It took a bit of back-and-forth to get my GoGrid account set up, but their Twitter guy was great at fixing the problem once I made him aware of it. This meant that there are slightly fewer results for GoGrid. The whole trial ran for the best part of a month with roughly 15 minute poll times for every host. I had to change things around a bit as I went along to stay within the T&Cs of the WebSitePulse trial: you get $1000 to play with in total.

The following locations were monitored: Amsterdam, Bangalore, Beijing (2 monitors), Boston, Brisbane, Buenos Aires, Chicago, Dusseldorf, Guangzhou, Hong Kong, Houston, London, Los Angeles, Miami, Montreal, Mumbai, Munich, New York, Paris, San Francisco, Sao Paulo, Seattle, Shanghai, Singapore, Stockholm, Sydney (2 monitors), Tokyo, Toronto, Trumbull, Vancouver, Washington

The Results

The summary of the results is shown below:

Host | Uptime | Average 1MB DL Time (s)
GoGrid | 100.00% | 2.03
Rackspace CDN | 100.00% | 2.70
Amazon CloudFront | 100.00% | 4.46
Azure CDN | 99.52% | 4.67
Amazon S3 | 100.00% | 5.04
RapidSwitch | 99.98% | 7.43

Here are the detailed results for all of the monitoring stations and hosts sorted into average download time order:

Location | GoGrid | Rackspace | Amazon CloudFront | Azure CDN | Amazon S3 | RapidSwitch | Average
New York | 0.12 | 0.19 | 0.24 | 1.00 | 0.50 | 1.39 | 0.57
Boston | 0.17 | 0.24 | 0.42 | 1.06 | 0.54 | 1.31 | 0.62
Trumbull | 0.16 | 0.39 | 0.55 | 1.38 | 0.50 | 1.36 | 0.72
Washington | 0.21 | 0.36 | 0.98 | 1.04 | 0.34 | 1.80 | 0.79
Houston | 0.24 | 0.34 | 0.30 | 0.73 | 1.20 | 2.20 | 0.84
Paris | 0.23 | 0.31 | 0.40 | 2.39 | 1.78 | 0.24 | 0.89
Dusseldorf | 0.20 | 0.29 | 0.18 | 3.08 | 1.92 | 0.27 | 0.99
Amsterdam | 0.16 | 0.15 | 0.47 | 2.43 | 2.70 | 0.24 | 1.03
Chicago | 0.05 | 0.19 | 1.95 | 1.02 | 1.60 | 1.48 | 1.05
San Francisco | 0.30 | 0.30 | 0.22 | 1.58 | 1.83 | 2.31 | 1.09
London | 0.15 | 0.37 | 0.40 | 3.68 | 1.89 | 0.18 | 1.11
Vancouver | 0.15 | 0.41 | 0.23 | 1.57 | 1.64 | 2.69 | 1.12
Toronto | 0.40 | 0.91 | 0.48 | 2.58 | 1.99 | 1.67 | 1.34
Seattle | 0.16 | 0.31 | 0.21 | 2.15 | 1.64 | 3.66 | 1.36
Munich | 0.63 | 0.29 | 0.31 | 4.17 | 3.06 | 0.67 | 1.52
Miami | 0.35 | 4.06 | 0.70 | 1.72 | 0.83 | 2.25 | 1.65
Stockholm | 0.71 | 0.20 | 0.82 | 4.48 | 4.82 | 0.67 | 1.95
Los Angeles | 0.29 | 0.39 | 0.34 | 2.99 | 3.81 | 6.47 | 2.38
Sao Paulo | 2.53 | 2.77 | 2.70 | 2.79 | 3.49 | 3.50 | 2.96
Brisbane | 0.45 | 0.42 | 3.40 | 4.26 | 5.96 | 6.00 | 3.42
Tokyo | 2.00 | 1.01 | 1.40 | 3.04 | 4.35 | 8.84 | 3.44
Sydney | 1.17 | 1.25 | 3.19 | 4.15 | 5.81 | 5.78 | 3.56
Bangalore | 3.32 | 0.76 | 1.75 | 4.61 | 8.33 | 2.66 | 3.57
Sydney 2 | 0.18 | 4.63 | 3.74 | 5.07 | 7.41 | 5.29 | 4.39
Montreal | 1.01 | 1.57 | 1.46 | 12.44 | 2.14 | 8.40 | 4.51
Mumbai | 4.23 | 2.96 | 2.00 | 4.87 | 10.40 | 3.70 | 4.69
Buenos Aires | 5.80 | 7.03 | 6.41 | 7.60 | 6.25 | 5.33 | 6.40
Singapore | 3.52 | 1.27 | 3.24 | 9.66 | 8.67 | 13.62 | 6.66
Hong Kong | 1.24 | 1.76 | 1.24 | 7.65 | 9.28 | 26.92 | 8.02
Beijing 2 | 5.72 | 8.23 | 11.62 | 7.95 | 11.20 | 17.53 | 10.38
Guangzhou | 5.43 | 10.99 | 19.48 | 8.23 | 10.30 | 10.91 | 10.89
Beijing | 8.50 | 14.33 | 23.92 | 13.37 | 16.29 | 71.55 | 24.66
Shanghai | 17.32 | 20.56 | 52.27 | 19.37 | 23.83 | 24.16 | 26.25
Average | 2.03 | 2.70 | 4.46 | 4.67 | 5.04 | 7.43 | 4.39


Here are the raw stats if you would like to do any further analysis of your own.

Conclusions

Clearly GoGrid and Rackspace are the best providers of the hosts tested. GoGrid has the best average performance and is the fastest host at most of the monitoring stations.

Asia is very badly served by all the hosts tested. Obviously there are dedicated hosting services for Asia, but the whole point of a CDN is that it is global. I expect partnerships are being drafted as I type.

CloudFront barely outperforms Amazon S3 on average (4.46s against 5.04s), but its download times for individual cities are much better in some cases.

Montreal did much worse than I expected given that Canada is so well connected to the US.

The Amazon and Azure CDNs perform about equally well, although the uptime of Azure looks bad. Actually the Azure uptime was only really bad for the first few days; after that it was very good, so it is probably not a fair measure.

Did We Win?

Our original aim was to move 50MB in less than three minutes. Therefore our target time for 1MB is 180 / 50 = 3.6 seconds. Even with the fastest CDN host, we are still failing to meet this target for several cities. For Shanghai, we are a factor of five off. And of course this is before we get from where the monitoring stations are (which is probably a well connected hub) out to users at the network edge.
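
The same check can be written as a short sketch. The per-MB times below are a handful of the GoGrid figures from the detailed results; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the target check; names are hypothetical, figures are from the GoGrid column.
class InstallTarget {
    // Seconds allowed per MB if the whole installer must arrive within the budget.
    static double targetSecondsPerMb(double installerMb, double budgetSeconds) {
        return budgetSeconds / installerMb;
    }

    // Names of the stations whose measured time per MB exceeds the target.
    static List<String> stationsOverTarget(Map<String, Double> secondsPerMb, double target) {
        List<String> over = new ArrayList<>();
        for (Map.Entry<String, Double> entry : secondsPerMb.entrySet()) {
            if (entry.getValue() > target) {
                over.add(entry.getKey());
            }
        }
        return over;
    }

    public static void main(String[] args) {
        double target = targetSecondsPerMb(50, 180); // 50MB in three minutes

        Map<String, Double> goGrid = new LinkedHashMap<>();
        goGrid.put("London", 0.15);
        goGrid.put("Tokyo", 2.00);
        goGrid.put("Singapore", 3.52);
        goGrid.put("Mumbai", 4.23);
        goGrid.put("Shanghai", 17.32);

        System.out.printf("Target: %.1f s per MB%n", target);
        System.out.println("Over target: " + stationsOverTarget(goGrid, target));
        System.out.printf("Shanghai misses by a factor of %.1f%n", goGrid.get("Shanghai") / target);
    }
}
```

Of this sample, Mumbai and Shanghai miss the target; running it over the full GoGrid column would list every station that does.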

So big-iron can help us make significant improvements for very little effort and cost, but the war goes on. I might tell you how we finally win in a future post. Hint: we make the installer smaller.

Google AdSense Fraud

As I mentioned in my last post, we stopped advertising through Google’s Content Network very soon after we started experimenting with AdSense when we thought we had detected significant fraud. Having not spent much money by this time, the amounts involved were relatively small (less than £100), but the fraction was high. We considered at least 1/3 of our clicks to be fraudulent: deliberately, criminally fraudulent.

Proud of our forensic IT skills, we rushed to Do-No-Evil Google to report our discovery. Our report contained the bare minimum of facts; we were so convinced it screamed fraud we did not bother to go into much more detail than the websites and the Click Through Rates (CTRs). After a week we had a polite and considered response that attempted to persuade us that this was not fraud. Clearly more evidence was required, so we put together a verbose description of the main points that alerted us, and sent it off in expectation of an apology if not a seat on the board for our fraud-busting smarts.

Sadly they refused / failed to be convinced and eventually we had to agree to disagree. Obviously we will continue to use Google for advertising – what choice is there for an online business? But we firmly believe they need to put their house in order. If 1/3 of their revenue is fraudulent they will lose consumer confidence and possibly face sanctions for complicity.

Below I have included the email chain. The name of the operative has been removed as the problem is systemic not personal. The key point, and one I should have put front-and-center (in CAPS perhaps?), is under Red Flag 2 – Set 2. Namely that a badly named, unlinked, and parked-domain website achieved ten times the CTR of Google’s own homepage.

I would very much like to hear from anyone with similar experiences, or from anyone (at Google or otherwise) who disagrees with any of the points we made. Our transaction volume is small and our market is niche, so I am aware we need more data to make a statistically significant conclusion.

Initial Inquiry

[From Rupert to Google on 5/11/09]

Nearly *one third* of all content based CPC placements were obvious click fraud.

Many of the sites have exactly the same content and have a CTR of 100% (or greater in one case!).

I find it incredible that the market leader cannot spot something like this automatically. The fraudsters are not even trying to disguise it.

[Information about the site URLs and the fraud period]

Google’s Response

Hello Rupert,

We have received your request for an invalid clicks investigation. Thank you for your patience while we reviewed your account. I apologize for our delayed response. I understand you are concerned about the quality of clicks you have accrued from certain sites in our content network.

We reviewed your account and can confirm clicks from these sites. However, we found that these clicks are valid, and there is no activity that suggests you have been charged for invalid clicks. The clicks charged fit a pattern of normal user behaviour. As part of our review, the team looked through dozens of data points--including IP addresses, IP blocks, geographic concentrations, network activity, browser patterns, click timings, and any proprietary signals. However, none of those suggest an automated attack, nor collusion from unethical users. The clicks accrued reflect normal user traffic.

Many of the sites that you listed are parked domain sites. A parked domain site is an undeveloped web page belonging to a domain name registrar or domain name holder. Our AdSense for domains programme places targeted AdWords ads on parked domain sites that are part of the Google Network.

Users are brought to parked domain sites when they enter the URL of an undeveloped web page in a browser's address bar.

We've found that AdWords ads displayed on parked domain sites receive clicks from well-qualified leads within the advertisers' markets. In general, we've noticed that the return on investment gained on these pages is equal to or better than that gained on other pages in the search and content networks. However, if you aren't satisfied with the value of the traffic, you can prevent your ads from showing on parked domain sites by using the Site and Category Exclusion tool. Learn how at https://adwords.google.com/support/bin/answer.py?answer=86695&hl=en_GB.

I hope that this information helps address your concern. Please let me assure you that your security is a top priority for Google, and we will continue to monitor all clicks on your ads to prevent abuse. Let us know if you have further questions or if we can be of any more assistance. For more information about steps we take to combat invalid click activity, please visit https://adwords.google.com/support/bin/answer.py?answer=6114&hl=en_US.

Sincerely,

<Google Employee>

The Ad Traffic Quality Team

Rupert’s More Detailed Description

Hi <Google Employee>,

Thank you for getting back to me. I'm afraid I am still doubtful of the validity of these sites. Allow me to illustrate my concerns with some examples:

Red Flag 1

There are 10 sites in the list with a 100% CTR (radiolluvia.com even has 200%):

Domain              Clicks  Impressions  CTR
radiolluvia.com     2       1            200.00%
net-ebooks.com      1       1            100.00%
umtsfree.net        1       1            100.00%
littleabout.com     1       1            100.00%
mtncareer.com       1       1            100.00%
jonefm.com          1       1            100.00%
radiobendele.com    1       1            100.00%
iphalloween.info    1       1            100.00%
pdfee.com           1       1            100.00%
rf-online.com       1       1            100.00%

None of them contain any relevant content (which would be fine - I understand AdSense can never be a science), but most of them consist only of AdSense links. Are you really suggesting that users happen upon a parked domain by typing the URL above into the browser and happen to be in the market for radio planning software (which is what we make)?

We don’t make a mass-market product so I would never expect a high CTR on content networks even when targeting radio engineers – only a minority of them are even actively seeking our type of tool.

Red Flag 2

There are at least three sets of pages that contain practically the same content and come from the same IP subrange.

Set 1: gsmsandwich.com, jonefm.com, keonong.com, mtncareer.com, rf-indo.com, smsgupsup.com. They all look like this:

[Image: Set 1]

Set 2: umtsfree.net, xlgprs.net, xlgprs.com, ir-hot.com. They all look like this:

[Image: Set 2]

I find it unlikely that in the space of one week, four people were using these sites as some sort of search or index portal (have they not heard of Google? :)) and either searched to or browsed to our advert and found it relevant enough to click on. This was after only 254 impressions. Contrast this with Google’s own sponsored search results, which yielded only three hits for 2,051 impressions during the same period. If you really believe these statistics are true, surely you should buy this company immediately because they are 10 times better at advert placement than you are. Perhaps you should consider a smiling coed on the homepage?

Set 3: gamezerm.com and radiobendele.com both have the same IP and look the same:

[Image: Set 3]

Can it be a coincidence that they have high CTR (50% and 100% respectively)?

In general, all of the sites in these sets have dubious registration details, often using the same registration anonymity service.

Red Flag 3

The click-through path is curious for the links from these sites. For instance, the umtsfree.net site links go through five 302 redirects before landing at the intended target. Here is the chain of URLs:

http://umtsfree.net/forward302.aspx?epc=eWpDPeDkCn7%2fPAWjJDHHizukuSQO4Z0sDU8KfdC9FQ%2b5yWjWwcxv5hXcA5nQpS0OqiEn07sYHTHFe%2fX9vFjDwUcSW01%2bS4WPEL0m7%2fX3z100tRxVe1Mg2zzaXK862vPp7hIJBvAoVV9DPRmnuG%2fkV0w5tbowrxB4AbcTtxa0Bsr%2fCztN7vTUOE0hGYneCC9V5jEY3PRhY5SAeWBCuCp7NzUBODKuSrYrmWbY4g3PHs9mBH08pqUSaY75VuOBggtVC6D5WjIQEZuFNJS10GQ6Bu%2f0JpRTb3xpAWZf4bPOguFyT3zwx6udcQe031GVCTob%2bAk5n3HzuAg2AOTMKncWxG%2bPl6vLUW4DWYQil2ZmY2ILRGYWgOHHAfIlNM1AHowYkUvb%2bBrYHbEQgD6PTID5%2fuaj3OsxHAwLVlhrL3uuu1S3zI4g9mOUab2fnM8yr%2brQmzu2a6UmflkA8s6PaAElxBNg9kZnshsusDIleugD02G6c%2bCTybRaQ0D1IYTOcfyeJLFDejgK2GqObGWs9Nm6J32886U0STHIAz75%2f%2b2snbAtAJQT48cwhAH%2fNQ%2fpaJiHkSON1cIxd7oFroekPJ8iyDhbYZ3VP1TJ0Z7HoHj6HKjeDaemj6LFb1Le0uwGKeLe%2bKc6LxdhYBtjD%2bXnGi0LkIajkmbqWe73rvyLNoRhtd1SsqgDT7wxMUBmPIaisppXnfp%2b8%2ftjiL8R9LU6QFh%2f8%2b4aTQGsrMOTyj55aZ4hx8l3UJck6utVoeVax7%2bOQACwyLYoyyvI4ml9DVsz%2f9Mh4WnmfFdVgPEqIxmJ%2bwnNIlKzX31GePRuLmgLHdeItkfnMPnUWIA8FB485RmGfrEwbF07d7v5JLcYuL4V62CKyW8dl3m89EvsbxcbOTOQN85CQsa5fKdgUZaZ2j1kFgi7Oj9J3MJ10oCQ5OjkZnoBWXZPFsfGxQTxg%2fGxh9k18jep%2b5sesYVB1jRYuej3ptGZBGivoDvEkFR%2fpxCbqB494irMYSWLmmx8c%2frOVZYeIe3XV9P7cBJ4da%2bcrLgJneN2nhKCOX0BDZsw%2bR1L93vZ6LgjgvFgolOFTVpkeF12ecpQWJg5jzm0AnoUhGdj%2fXzFJoJbgaxLnvBsFGql9%2f%2bYyJqMr8URZuovttYKmDemHYS0

http://rc12.overture.com/d/sr/?xargs=15KPjg141SnJamwr%2DocLXBROWAylwaxca58cluD5l4GtZf5iMxXOV4aaTCm8dxTOVxv1PdzPSW%5FqYSL%5FT5kPOJGweKQVWJGuXpjdLJxYw6Nq2jUNEbsYRzy%2DLvmIZGOX0E2laEOd%2D5mO7acZdRD05mjddAwByR%2D%5Flqw8yzxu4IQevVig0sskqFc5Z17tQp9bnAXOx7TLome97vhXfFfZwQ%2D%2DxDke%2DgSygTLyyj4WYa9VeHJi58obDIYo0L3ZbKzoLLOKeswIYJfRXG%2DYe62VuOrU6t8txuN2zT3r4MzgFZJP%5F%2DIlWJ3Ulvvv%2DbgfDfP4074wP1CfzqVHz3dxM5PXU3E5OufGXnbWw99E%5FOfpRQIMSv2xOO

http://clickserve.uk.dartsearch.net/link/click?lid=43000000042332928&ds_s_kwgid=58000000000470369&ds_e_adid=8770229031&ds_e_matchtype=standard&ds_e_kwdid=86608522531&ds_e_kwgid=5777302919&ds_url_v=2

http://ad-emea.doubleclick.net/clk;160208746;22377034;b;u=ds&sv1=42332928&sv2=2009111264&sv3=84754;%3fhttp://www.marshallward.co.uk/?aff=yahoo?&affsrc=acquisition&cm_mmc=yahoo-_-Generic-_-Generic-Catalogue-Keywords---Broad-_-catalogs

http://fls.doubleclick.net/act;sit=530730;spot=1529997;~dc_rdr=?http%3A//www.marshallward.co.uk/%3Faff%3Dyahoo%3F%26affsrc%3Dacquisition%26cm_mmc%3Dyahoo-_-Generic-_-Generic-Catalogue-Keywords---Broad-_-catalogs

http://www.marshallward.co.uk/?aff=yahoo?&affsrc=acquisition&cm_mmc=yahoo-_-Generic-_-Generic-Catalogue-Keywords---Broad-_-catalogs

This would allow the site operator to monitor clicks, which is a legitimate thing to do. It is also something you would need if you wanted to monitor and reward agents clicking the links for you. I appreciate that this in itself is not a “smoking gun”.
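A chain like this can be reproduced by requesting each URL with automatic redirect-following switched off and reading the Location header of the response. The following is a minimal sketch using the standard `HttpURLConnection` class. The `RedirectTracer` class, the `HopResolver` interface and the 10-hop limit are hypothetical names and choices for illustration, and it assumes the servers answer HEAD requests the same way they answer a browser, which is not guaranteed.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

final class RedirectTracer {
    /** Looks up where a URL redirects to; returns null when it does not redirect. */
    interface HopResolver {
        String next(String url) throws IOException;
    }

    /** Resolver that asks the real server without following the redirect itself. */
    static final HopResolver HTTP = url -> {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setInstanceFollowRedirects(false);
        conn.setRequestMethod("HEAD"); // Assumption: HEAD is treated like GET
        try {
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            if (status < 300 || status >= 400 || location == null) {
                return null; // Not a redirect: this is the landing page
            }
            // Location may be relative, so resolve it against the current URL
            return new URL(new URL(url), location).toString();
        } finally {
            conn.disconnect();
        }
    };

    /** Returns the start URL followed by every redirect target, up to maxHops. */
    static List<String> trace(String start, int maxHops, HopResolver resolver)
            throws IOException {
        List<String> chain = new ArrayList<>();
        chain.add(start);
        String current = start;
        for (int hop = 0; hop < maxHops; hop++) {
            String next = resolver.next(current);
            if (next == null) {
                break;
            }
            chain.add(next);
            current = next;
        }
        return chain;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RedirectTracer.java <url>");
            return;
        }
        // Print each URL in the chain, one per line
        for (String url : trace(args[0], 10, HTTP)) {
            System.out.println(url);
        }
    }
}
```

Keeping the lookup behind a small interface means the loop can be exercised with a canned map of URLs instead of live servers. The hop limit guards against redirect loops.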

Conclusion

I suggest that at least the sites in these sets are fraudulent. The fact that their clicks evaded the data mining checks you mention suggests to me that they were generated by a network of geographically distributed agents. These might be humans, encouraged by a share of the AdSense revenue, or they might be an autonomous bot-net with a smart pattern of click behaviour (chaotic perhaps). Frankly the click-pattern doesn’t seem that smart to me, so I think there is another type of fraudster out there that is much smarter and rarely hits the same advertiser twice. This would prevent detection by the advertiser, and would only be detectable by Google themselves.

Let me be clear that I have no reason to doubt the validity of sponsored search or Gmail content adverts and I continue to use these networks, but I have serious doubts about the public AdSense network. I also do not dispute that it is still good value for money (even if 1/3 of it is click-fraud), but nobody likes to be ripped off and it is in both our interests to fix it.

What do we expect? We are not looking for a refund on these clicks (frankly I have wasted more money typing this email). Instead we would like Google to consider my arguments and, if they agree, to improve their detection process. Excluding these sites is a poor option as I will have to spend time every day weeding out spammers from our content network placements. I appreciate that there will always be a number of frauds that are impossible to detect algorithmically, and that you are locked into an arms race with the fraudsters, but there seems to be more you could be doing to improve automatic detection.

Best regards,

Rupert Rawnsley.

Google’s Response

Hello Rupert,

Thank you for your reply and for providing us with the additional information. We appreciate your patience as we work to resolve this issue.

I can confirm the information you have mentioned. Many of the sites in question seem to have the same templates and show only AdSense ads.

However, the clicks that you accrued from these sites are valid. As mentioned in our previous email, these sites are a part of the domain park network. The sites in question do not have any specific content, but are simply "parked" for interested users to purchase the site from the domain hosting company. Also, domain parked sites can be former functioning websites whose domain name contracts have expired. Since these sites are largely created for temporary purposes, the template used may be the same across several websites. This is the reason you may find the same images or the same layout across several of these sites.

Once an interested user purchases the site or renews the domain name registration, the site is automatically removed from the domain parked network by the hosting company. Parked domain sites offer users ads that are relevant to the text they entered. In addition, some parked domain sites include a search box, which allows users to further refine their search. We've found that AdWords ads displayed on parked domain sites receive clicks from well-qualified leads within the advertisers' markets.

In general, we've noticed that the return on investment gained on these pages is equal to or better than that gained on other pages in the search and content networks.

We are sorry to learn that you are disappointed with the quality of clicks you accrued from these sites. Please be assured that the clicks are valid.

We strongly suggest you consider using the site-and-category exclusion tool to prevent your ads from showing on the domain park network.

Thank you for your patience and understanding.

Sincerely,

<Google Employee>

The Ad Traffic Quality Team