Detecting survey fraud: the best way to catch ‘bad actors’

So the data from your latest new product concept test is in, and the level of interest is rather higher than you expected. What’s going on? Is the product really that good? Can you trust the data? You may have asked yourself, “How many of these responses are made up?”

Unfortunately, the answer is that a lot of survey data is indeed made up, or of low quality. And unless you have appropriate fraud detection and data checks in place, those bad responses are skewing your data and potentially misleading you.

Why would people lie in a survey to get 50 cents?

Good question. The answer is: because they can. There is a very low chance of getting caught, and no punishment for those who are.

How widespread is the problem?

In short, very. We at Faster Horses recently conducted self-funded ‘research on research’ to understand the extent of the problem. As previously reported, we found that just one in three survey responses was completely free from suspicion of cheating, laziness or fraud.

In this study, we included 14 markers or red flags to measure different types of poor-quality data. These included methods commonly used in the research industry, such as attention checks or ‘red herring’ questions (“Please select the word ‘RED’ below” or “What is 2+2?”), and checks for “straight-liners” – people who give the same answer to a series of attitudinal statements.

But these industry staples barely scratch the surface of fraud and poor quality data. Just 1% of respondents fail red herrings, and only 4% are straight-liners.
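
For those who like to see the mechanics, here is a minimal sketch of these two staple checks in Python. The field names and answer codes are invented for the example, not taken from any real survey file.

```python
# A minimal sketch of the two staple checks, assuming one dict per
# respondent; field names and answer codes are illustrative only.

def fails_red_herring(response: dict) -> bool:
    """True if the respondent missed the 'Please select the word RED' trap."""
    return response.get("attention_check") != "RED"

def is_straight_liner(response: dict, grid_keys: list) -> bool:
    """True if the respondent gave the identical answer to every
    statement in an attitudinal grid."""
    answers = [response.get(k) for k in grid_keys]
    return len(set(answers)) == 1

respondent = {"attention_check": "RED", "att_1": 5, "att_2": 5, "att_3": 5}
print(fails_red_herring(respondent))                               # False
print(is_straight_liner(respondent, ["att_1", "att_2", "att_3"]))  # True
```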

Digging deeper

The next step is to conduct laziness checks: people claiming to be very familiar with brands that are made up (typically picking up 4-5% of respondents); and inconsistencies in attitudes – people saying early in the survey that they support same-sex marriage, then later that they oppose it (picking up between 11% and 18% of respondents, depending on the question).
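
A sketch of what these laziness checks might look like in code. The fake brand names, question keys and answer coding below are assumptions made for the example:

```python
# Hypothetical sketch of the two laziness checks; brand names, question
# keys and the 5-point familiarity coding are invented for illustration.

FAKE_BRANDS = ["Brandex", "Zorvia"]   # invented brands seeded into the brand list

def claims_fake_brand_familiarity(response: dict) -> bool:
    """True if the respondent claims to be very familiar (4+ on a
    5-point scale) with a brand that does not exist."""
    return any(response.get("familiar_" + b, 0) >= 4 for b in FAKE_BRANDS)

def attitude_inconsistent(response: dict) -> bool:
    """True if the respondent both supports and opposes the same issue
    when it is asked twice, at different points in the survey."""
    return (response.get("q12_supports_same_sex_marriage") == 1
            and response.get("q48_opposes_same_sex_marriage") == 1)
```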

The gold standard

The most reliable ways to detect cheats and fraudsters in survey research are (1) to examine verbatim responses to open-ended questions; and (2) to measure the time spent reading a product concept.

Not all surveys include a product concept evaluation – hence our preference for scrutinising verbatims.

But if you don’t include open-ended questions in your surveys and go through every single response, you are inviting fraud. Simple as that.

We routinely detect between 15% and 20% of respondents providing poor-quality open-ended responses – from invalid strings (“kjhkjhljkh”) to copy-and-paste of the question itself, profanity, or nonsense answers (“Donald Duck” as the answer to ‘Who is the Prime Minister?’).
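
Here is a rough illustration of how such verbatim checks can be automated. The heuristics and word list are deliberately simplistic stand-ins; a production check would use a proper profanity list and a stronger gibberish model:

```python
import re

# Illustrative verbatim-quality heuristics; thresholds and the word list
# are assumptions, not a production implementation.

PROFANITY = {"damn", "crap"}  # stand-in for a real profanity word list

def bad_verbatim(answer: str, question: str) -> bool:
    text = answer.strip().lower()
    if len(text) < 3:
        return True                       # effectively blank
    if text == question.strip().lower():
        return True                       # question pasted back as the answer
    if not re.search(r"[aeiouy]", text):
        return True                       # keyboard mash like "kjhkjhljkh"
    if PROFANITY & set(text.split()):
        return True                       # profanity
    return False

print(bad_verbatim("kjhkjhljkh", "What do you like about this concept?"))  # True
```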

A typical product concept in a survey runs to approximately 200 words, which takes an average reader about 25 seconds to process. Measuring how long respondents spend reading a concept gives you a good idea of how much attention they are paying to the survey. For example, it’s a fair bet that those spending 5 seconds or less on a 200-word product description are not taking the task seriously. This timing check alone allows us to detect about 25% of respondents who are not paying attention.
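
In code, the speeding check might look like this, using the rule-of-thumb figures above (about 25 seconds for 200 words, with a 5-second floor). The exact cut-off is a judgment call:

```python
# A minimal sketch of the concept-speeding check, built from the figures
# above: ~25 seconds expected for a 200-word concept, flagged below 1/5
# of the expected time (i.e. a 5-second floor at 200 words).

EXPECTED_SECONDS_PER_WORD = 25 / 200   # the 200-word / 25-second rule of thumb

def speeding_on_concept(seconds_on_page: float, word_count: int,
                        floor_fraction: float = 0.2) -> bool:
    """True if time on the concept page falls below a fraction of the
    expected reading time (0.2 * 25s = 5s for a 200-word concept)."""
    expected = word_count * EXPECTED_SECONDS_PER_WORD
    return seconds_on_page <= expected * floor_fraction

print(speeding_on_concept(4.0, 200))   # True: 4s is under the 5s floor
print(speeding_on_concept(22.0, 200))  # False
```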

Total number of ‘fraud flags’

In our ‘research on research’ study, 33% of respondents raised no flags. A further 46% raised 1 or 2 flags – likely honest mistakes or a brief lapse of concentration – while 20% raised 3 or more flags (and 5% failed every test we included).

There is no right or wrong answer for how many ‘red flags’ need to be raised before you decide to exclude a respondent as fraudulent or poor quality; that is a decision for each researcher and client to make together. The point is that it is critically important to have the checks in place, and to monitor the level of fraud and poor-quality responses in your survey data.
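
Putting it together, counting flags is straightforward to script; the exclusion threshold itself remains a policy decision, as noted above. A hedged sketch, with a threshold of 3 mirroring the buckets reported earlier:

```python
from typing import Callable, Dict, List

# Each check takes a respondent record and returns True when that flag is
# raised. The threshold of 3 echoes the buckets above but is a policy
# choice, not a fixed rule.

def count_flags(response: Dict, checks: List[Callable[[Dict], bool]]) -> int:
    """Total number of red flags a respondent raises across all checks."""
    return sum(1 for check in checks if check(response))

def classify(flag_count: int, exclusion_threshold: int = 3) -> str:
    if flag_count == 0:
        return "clean"
    if flag_count < exclusion_threshold:
        return "monitor"   # likely honest mistakes or a lapse in concentration
    return "exclude"       # repeated failures across independent checks

print(classify(0))  # clean
print(classify(2))  # monitor
print(classify(4))  # exclude
```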

What impact is this having on my survey data?

Glad you asked.

That is the topic of our next article, coming out in the next few days. So stay tuned.

But you should reasonably expect to see inflated brand awareness levels, particularly for smaller brands (although you may see lower awareness for big brands); and much higher interest levels / purchase intent for product concepts.

Follow Faster Horses to make sure you don’t miss our detailed analysis of the impact that poor-quality and fraudulent respondents are having on your data.