The methodology followed in this study rests on the idea that incorporating Twitter features into poll-based regression models yields significantly better predictions than conventional poll-based regression. Intuitively, one may think that the Twitter-based features in such a model act, in some form, to “correct” the polls. But is outperforming poll-based regression good enough? What if there is a polling outlier on the very last day before the election? Would our Twitter features be able to correct it and, if so, to what extent? To avoid such phenomena, would it be preferable to disregard the polls conducted during the final period before Election Day and rely strictly on our Twitter features for that period instead?
To answer these questions, we decided to test two different models:
Poll enhancer: We followed the polls up to the very last day before the election (6th of May), extracting daily features from them to serve as our target. In this approach, our final election prediction is heavily influenced by the polls, especially those conducted during the final days before Election Day.
Reduced poll-influence: To reduce the influence of opinion polls on our final prediction, and to test the impact of the Twitter features, we removed the poll-based features from the last week before Election Day that the Poll enhancer uses. Thus, using polls published up to one week before Election Day together with Twitter data tracked between 21st of March and 6th of May, we trained a model that shifts away from the latest polls and makes a more independent election prediction.
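The practical difference between the two models lies in which days' poll data are allowed into training. The following is a minimal sketch of that cutoff, under heavily simplifying assumptions: a single hypothetical daily Twitter feature, a one-variable least-squares fit of poll share on that feature (the study's actual feature set and regression are not described here), synthetic numbers rather than study data, and the year 2015 inferred for the dates. The helper names `fit_line` and `train_and_predict` are hypothetical.

```python
from datetime import date, timedelta

# Assumptions for illustration: the year (2015) and Election Day (7 May) are
# inferred from "the last day before the elections (6th of May)".
ELECTION_DAY = date(2015, 5, 7)
TWITTER_START = date(2015, 3, 21)  # start of Twitter tracking, per the text
LAST_DAY = date(2015, 5, 6)        # last day with data, per the text


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def train_and_predict(twitter, polls, poll_cutoff):
    """Hypothetical helper: fit poll share on the Twitter feature using only
    days with a poll on or before poll_cutoff, then predict the share for
    LAST_DAY from its Twitter feature alone."""
    train_days = [d for d in sorted(polls) if d <= poll_cutoff and d in twitter]
    intercept, slope = fit_line(
        [twitter[d] for d in train_days], [polls[d] for d in train_days]
    )
    return intercept + slope * twitter[LAST_DAY]


# Synthetic data (not study data): a daily Twitter feature and a poll share
# that follows it exactly, except for an outlier poll on the last day.
n_days = (LAST_DAY - TWITTER_START).days + 1
days = [TWITTER_START + timedelta(days=i) for i in range(n_days)]
twitter = {d: 20.0 + (i % 5) for i, d in enumerate(days)}
polls = {d: 10.0 + 0.5 * twitter[d] for d in days}
polls[LAST_DAY] += 5.0  # polling outlier on 6 May

# Poll enhancer: polls up to the very last day enter training.
enhancer_pred = train_and_predict(twitter, polls, poll_cutoff=LAST_DAY)
# Reduced poll-influence: polls from the final week before Election Day are dropped.
reduced_pred = train_and_predict(
    twitter, polls, poll_cutoff=ELECTION_DAY - timedelta(days=7)
)

print(f"Poll enhancer:          {enhancer_pred:.2f}")
print(f"Reduced poll-influence: {reduced_pred:.2f}")
```

On this synthetic data, the outlier poll on 6 May pulls the Poll enhancer's prediction above the value implied by the Twitter trend, whereas the Reduced poll-influence prediction stays on that trend. This is the kind of behaviour the second model is meant to probe, although real data would rarely separate the two so cleanly.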
The results presented on the home page are those produced by the Poll enhancer, while those presented here come from the Reduced poll-influence model. Interestingly, the outputs of the two models differ considerably. The election results will show which of the two gave a better approximation of the final outcome. However, to understand how much the various parameters contribute on different days, we plan to investigate both models further as part of our post-election analysis.
Lib Dem: 8.34