Election Poll Methods

To check the representativeness of our data, we compare the actual vote distribution from the previous election with the distribution in our sample, and notice that the sample is skewed.

In our sample we have an over-representation of PFP voters and non-voters, and an under-representation of UP and URSM voters. To account for this, we calculate scaling factors for each party, based on the ratio of the two distributions.

For example, for National Alliance:

*For new parties it is not possible to calculate a scaling factor.
**For USparty (n = 6) and ECE (n = 3), sample size is low, making their scaling factors unreliable.

The closer the ratio is to 1, the more accurately the group is represented in our sample. Note that, because it is a ratio, factors of 2 and 0.5 (or 3 and 0.33) are equally far from 1.
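The ratio and its symmetry around 1 can be illustrated with a minimal Python sketch. It assumes the factor is the population share divided by the sample share (the direction of the ratio is not stated above), and it measures distance from 1 on a log scale. The function names and the shares are hypothetical and are not poll results.

```python
import math


def scaling_factor(population_share: float, sample_share: float) -> float:
    """Population share divided by sample share (assumed direction of the ratio)."""
    if sample_share <= 0:
        raise ValueError("sample share must be positive")
    return population_share / sample_share


def distance_from_one(factor: float) -> float:
    """Absolute log of the factor, so that 2 and 0.5 are equally far from 1."""
    return abs(math.log(factor))


# Hypothetical shares, for illustration only (not taken from the poll).
factor = scaling_factor(population_share=0.30, sample_share=0.15)
same_distance = math.isclose(distance_from_one(2.0), distance_from_one(0.5))
print(factor, same_distance)
```

Under this assumption, a factor above 1 means the group is under-represented in the sample, and a factor below 1 means it is over-represented.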

The number of observations per group also affects accuracy. We prefer at least 30 observations per party, a common rule of thumb for the sampling distribution to be approximately normal, which makes the estimates for that group more reliable.

Why do scaling factors matter?

If a party is under- or over-represented in our sample, the preferred votes for those parties will likely be under- or over-represented as well. As such, the raw numbers of these preferred votes will not necessarily reflect actual population views.
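As a rough illustration of how scaling factors change the picture, the sketch below weights each respondent's preferred vote by the factor of the party they voted for previously. The data, the function name, and the choice to give groups without a factor (such as new parties) a weight of 1 are all assumptions made for this example.

```python
from collections import Counter


def weighted_shares(responses, factors):
    """Return the weighted share of each preferred vote.

    responses: list of (previous_vote, preferred_vote) pairs.
    factors: scaling factor per previous-vote group; groups without a
             factor get a weight of 1.0 (an assumption for this sketch).
    """
    totals = Counter()
    for previous_vote, preferred_vote in responses:
        totals[preferred_vote] += factors.get(previous_vote, 1.0)
    grand_total = sum(totals.values())
    return {party: weight / grand_total for party, weight in totals.items()}


# Hypothetical respondents: party "A" is over-represented, "B" under-represented.
responses = [("A", "A"), ("A", "A"), ("A", "B"), ("B", "B")]
factors = {"A": 0.5, "B": 2.0}

raw = weighted_shares(responses, {})        # every weight is 1.0
adjusted = weighted_shares(responses, factors)
print(raw)
print(adjusted)
```

In this toy data the raw shares are equal, while the weighted shares shift toward party "B", because respondents from the under-represented group count for more.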

Possible reasons for skewness in our sample

Reviewing our data, we see that 80% of our total responses came in on the same day as the announcement in The Daily Herald. People learned about the poll mostly through WhatsApp (65%), The Daily Herald (18%), and Facebook (11%). The poll was also announced on radio programs and on other platforms.

We can therefore conclude that the news about the poll spread fast and mostly through personal networks. Since personal networks do not necessarily overlap, the poll may not have reached all groups equally.

Other explanations could be differences in interest and motivation to take the poll, or in comfort level with taking an online survey. The poll could be taken on a smartphone, tablet, or desktop, meaning that those who do not have a (properly functioning) smartphone would need additional effort to take it on another device.

Additional representativeness checks

The number of registered voters for the upcoming election is 22,747. Our sample size after data cleaning, n = 591, is substantial compared to this (about 2.6% of registered voters). Note that because not all respondents answered every question, percentages and counts may differ slightly from item to item.

The gender distribution in our sample is representative:

Data cleaning

Ideally, we would include constraints to limit each individual to a single response. Due to time and resource constraints, this was not possible. We therefore checked for valid contributions after collecting the data, by reviewing a combination of IP addresses, timestamps, and scores.
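One way such a review could be supported is sketched below in Python. This is not the procedure used for the poll: the rule (same IP address and identical scores within a short time window), the five-minute window, the function name, and the sample entries are all hypothetical. Flagged entries would still need manual review, since several people can legitimately share one IP address.

```python
from datetime import datetime, timedelta


def flag_suspect_entries(entries, window=timedelta(minutes=5)):
    """Return indices of entries that repeat an earlier entry.

    entries: list of (ip, timestamp, scores) tuples.
    An entry is flagged when an earlier entry with the same IP address and
    identical scores was submitted at most `window` before it
    (a hypothetical rule, chosen for illustration).
    """
    flagged = set()
    last_seen = {}  # (ip, scores) -> timestamp of the most recent such entry
    ordered = sorted(enumerate(entries), key=lambda item: item[1][1])
    for index, (ip, timestamp, scores) in ordered:
        key = (ip, tuple(scores))
        previous = last_seen.get(key)
        if previous is not None and timestamp - previous <= window:
            flagged.add(index)
        last_seen[key] = timestamp
    return flagged


# Hypothetical entries, for illustration only.
entries = [
    ("10.0.0.1", datetime(2024, 1, 1, 12, 0), [1, 2, 3]),
    ("10.0.0.1", datetime(2024, 1, 1, 12, 2), [1, 2, 3]),  # repeat within 5 minutes
    ("10.0.0.1", datetime(2024, 1, 1, 18, 0), [1, 2, 3]),  # same answers, hours later
    ("10.0.0.2", datetime(2024, 1, 1, 12, 1), [1, 2, 3]),  # different IP address
]
suspects = flag_suspect_entries(entries)
print(sorted(suspects))
```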

Since knowledge of how to take the poll more than once was not restricted to any group, we also assume that any remaining duplicate entries even out over large numbers.

We want to stress the value of accurate information. The purpose of the poll is to provide insights into real-life phenomena. The more accurate the insights, the more value they create for all. When we see a doctor about our health, we value their honest diagnosis. The same applies to the analysis of the poll.

Margins of error

If we were to replicate our poll, we would obtain different outcomes. Therefore, for some of the percentages we find in our sample, we calculate a margin of error at a 90% confidence level (that is, a 10% significance level). This gives a 90% confidence interval around each percentage.

What this means in practice is that, if we were to retake our poll many times and calculate the interval each time, about 90% of those intervals would contain the true population value. The larger our sample size, the smaller our margin of error.
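The relationship between sample size and margin of error can be sketched with the standard normal-approximation formula for a proportion. This is a minimal Python example that assumes simple random sampling, which an opt-in online poll does not strictly satisfy. The function name and the proportion of 0.5 are hypothetical; n = 591 is the sample size reported above.

```python
import math
from statistics import NormalDist


def margin_of_error(p: float, n: int, confidence: float = 0.90) -> float:
    """Normal-approximation margin of error for a proportion p with sample size n."""
    # Two-sided critical value, about 1.645 for a 90% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)


# p = 0.5 is a hypothetical, worst-case proportion (it gives the widest interval).
moe = margin_of_error(p=0.5, n=591)
lower, upper = 0.5 - moe, 0.5 + moe
print(round(moe, 3), round(lower, 3), round(upper, 3))
```

With these inputs the margin comes out at roughly 3.4 percentage points. Because the margin shrinks with the square root of n, quadrupling the sample size halves it.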