Monitoring the 2026 parliamentary pre-election campaign: additional insights from Facebook
Published Saturday 11 April 2026 at 20:22
Executive summary
This report, conducted under the Bulgarian–Romanian Digital Media Observatory (BROD), covers the second weekly monitoring cycle of public Facebook content during the Bulgarian pre-election campaign. Its purpose is descriptive and exploratory: to characterise the volume, distribution, actor composition, and narrative patterns of publicly accessible Facebook content in a defined pre-election window, using computational methods applied to a structured corpus. The report makes no causal claims about influence, no sentiment judgements, and no claims of representativeness beyond the observation window.
A keyword search for „избори“ ("elections" in Bulgarian) via Meta Content Library returned more than 7,000 posts from public pages, groups, and profiles across the seven-day window (March 30 – April 5, 2026). Fewer than a third of these remained in the downloadable subset, which includes posts from pages with 15K+ likes or followers, as well as posts from profiles that are verified or have 25K+ followers. The query options considered were:
| Option | Content type | Find text in images | Estimated total results |
|---|---|---|---|
| Option 1 | links and shares & photos | yes | 7,300 |
| Option 2 | links and shares & photos | no | 6,500 |
| Option 3 | links and shares | (not specified) | 3,900 |

Option 1 was selected; its downloadable subset held an estimated 1,300 results.
All posts were publicly accessible at the time of collection; no private or group-restricted content was included.
The narrative monitoring framework was drawn from an existing media and social media listening document provided by the research team. It comprises 11 narrative clusters, 23 sub-clusters, and 181 keyword and phrase search terms. Keyword matching is purely lexical; no supervised classifier was applied.
The analysis was organised into the following completed stages:
Stage 1 – Data cleaning and classification.
Stage 2 – Engagement statistics and network analysis: descriptive statistics (non-parametric), reaction-type breakdown, temporal patterns, and top-actor ranking by engagement. Two network graphs were constructed: an owner-to-surface bipartite network (51 nodes, 54 edges), and a co-text coordination network (100 nodes, 141 edges, 35 Louvain communities).

Stage 3 – Per-actor analysis across the 23 actors that have a measurable presence in the dataset (a subset of the full actor register): mention counts derived from text regex on both the owner axis and the mention axis, per-actor engagement metrics, and reaction-type profiles.
Stage 4 – Narrative keyword analysis: lexical matching against the 181-term framework, producing match counts by narrative cluster and by actor.
Stage 5 – Bulgarian NLP pipeline using stanza 1.11.1 (Bulgarian model): full corpus lemmatisation, part-of-speech tagging, and named entity recognition across 140,608 tokens and 9,444 unique lemmas after stopword filtering. Outputs include corpus-wide lemma frequency tables, per-actor lemma profiles on both axes, log-likelihood distinctive vocabulary per actor, collocate analysis in a ±5-token window for 18 narrative keyword stems, KWIC concordance (472 lines), hashtag frequency and co-occurrence analysis, and URL and domain extraction from all four URL-bearing columns (514 URL instances across 453 posts, 137 unique domains).
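The log-likelihood step in Stage 5 can be illustrated with a minimal sketch. It assumes the common two-corpus formulation (an actor's sub-corpus compared against the rest of the corpus); the report does not state which variant was used, and the function name and the example counts are hypothetical.

```python
import math

def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) keyness of one lemma.

    a: lemma count in the actor sub-corpus
    b: lemma count in the reference (rest of the corpus)
    c: total tokens in the actor sub-corpus
    d: total tokens in the reference
    """
    # Expected counts if the lemma were spread proportionally to corpus size.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    # A zero observed count contributes nothing (0 * ln 0 is taken as 0).
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

# Hypothetical counts: a lemma seen 30 times in 1,000 actor tokens
# versus 10 times in 1,000 reference tokens.
print(round(log_likelihood(30, 10, 1000, 1000), 2))  # 10.46
```

A lemma with identical relative frequency in both sub-corpora scores zero; larger values indicate vocabulary that is more distinctive of one side, without saying anything about stance.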
Key methodological decisions
Several decisions warrant explicit statement.
No sentiment claims are made anywhere in the analysis.
Exact-text duplicates were retained rather than deduplicated, because coordinated identical posting is treated as a substantive phenomenon rather than noise. Posts attributed to ИТН, for example, are 97 % reshares with 58 exact-text duplicates; removing them would suppress the coordination signal entirely.
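As an illustration of how exact-text duplicates can be counted while every post stays in the corpus, the following sketch groups posts by their verbatim text. The function name and sample strings are hypothetical, and the report does not specify whether its duplicate figure counts groups or posts, so both are returned.

```python
from collections import Counter

def duplicate_summary(texts):
    """Count exact-text duplicates without removing any post.

    Returns (groups, posts): the number of distinct texts that occur
    more than once, and the number of posts carrying such a text.
    """
    counts = Counter(texts)  # verbatim comparison, no normalisation
    groups = sum(1 for n in counts.values() if n > 1)
    posts = sum(n for n in counts.values() if n > 1)
    return groups, posts

# Hypothetical post texts: "A" appears three times, "C" twice.
sample = ["A", "A", "B", "A", "C", "C"]
print(duplicate_summary(sample))  # (2, 5)
```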
Named entity recognition from the stanza Bulgarian model was retracted after post-hoc verification revealed systematic unreliability for entity counting in this corpus. Entity frequencies reported in the analysis derive from direct regex-based counts, not from the NER output. This substitution is conservative: regex counts are transparent and reproducible, though they do not resolve ambiguous entity references.
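A regex-based count of this kind might look like the sketch below. The alias list is an assumption made for illustration (only the acronym ИТН appears in this report), as are the function name and the sample posts; the actual patterns used in the analysis are not given here.

```python
import re

# Illustrative alias patterns; assumed, not taken from the analysis.
# The acronym is matched case-sensitively to avoid unrelated lowercase strings.
ACTOR_PATTERNS = {
    "ИТН": re.compile(r"\bИТН\b|\bИма такъв народ\b"),
}

def count_mentions(posts):
    """Return, per actor, the number of posts with a match and total matches."""
    totals = {actor: {"posts": 0, "occurrences": 0} for actor in ACTOR_PATTERNS}
    for text in posts:
        for actor, pattern in ACTOR_PATTERNS.items():
            hits = pattern.findall(text)
            if hits:
                totals[actor]["posts"] += 1
                totals[actor]["occurrences"] += len(hits)
    return totals

# Hypothetical post texts.
sample = [
    "ИТН обяви листите си. ИТН ще участва.",
    "Партия Има такъв народ внесе документи.",
    "Изборите са на дневен ред.",
]
print(count_mentions(sample))
```

Because the patterns are explicit, every count can be traced back to the strings that produced it, which is the transparency the paragraph above refers to; ambiguous references still match or fail to match purely on surface form.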
Keyword matching for narrative classification establishes that a narrative term is present, not that a post endorses, promotes, or opposes the associated frame. Directionality was partially resolved for a subset of terms through Stage 5 collocate analysis. Collocates of "суверенитет" (защита, защитавам, европейски, лидер), for instance, indicate predominantly pro-EU defensive usage, which partially resolves the directionality problem for narrative cluster N1. Where collocate evidence is insufficient, directionality remains unresolved and is reported as such.
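The collocate step referred to above can be sketched as a symmetric window count over a lemmatised token list. This is a minimal illustration, assuming prefix matching on the keyword stem and a window of five tokens on each side; the function name and the sample lemma sequence are hypothetical.

```python
from collections import Counter

def collocates(lemmas, stem, window=5):
    """Count lemmas within +/- `window` tokens of any lemma starting with `stem`."""
    counts = Counter()
    for i, lemma in enumerate(lemmas):
        if lemma.startswith(stem):
            lo = max(0, i - window)
            hi = min(len(lemmas), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the node word itself
                    counts[lemmas[j]] += 1
    return counts

# Hypothetical lemmatised post.
sample = ["ние", "защитавам", "европейски", "суверенитет", "на", "България"]
print(collocates(sample, "суверенитет").most_common(3))
```

Collocate counts of this kind show which words tend to surround a keyword; reading them as evidence of direction remains an interpretive step.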
The mention axis and the owner axis are treated as analytically distinct throughout. A post may be owned by one actor while mentioning several others; conflating these axes inflates mention counts and misattributes engagement.
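To make the distinction concrete, the sketch below tallies the two axes in separate tables. The record layout (`owner`, `mentions`, `reactions`), the actor labels, and the numbers are all hypothetical.

```python
from collections import Counter

def tally_axes(posts):
    """Tally post counts and reactions separately for owners and mentions."""
    owner = {"posts": Counter(), "reactions": Counter()}
    mention = {"posts": Counter(), "reactions": Counter()}
    for post in posts:
        owner["posts"][post["owner"]] += 1
        owner["reactions"][post["owner"]] += post["reactions"]
        # Each mentioned actor is counted once per post.
        for actor in set(post["mentions"]):
            mention["posts"][actor] += 1
            mention["reactions"][actor] += post["reactions"]
    return owner, mention

# Hypothetical records.
sample = [
    {"owner": "Actor A", "mentions": ["Actor B", "Actor C"], "reactions": 100},
    {"owner": "Actor B", "mentions": ["Actor B"], "reactions": 10},
    {"owner": "Actor A", "mentions": [], "reactions": 5},
]
owner, mention = tally_axes(sample)
print(sum(owner["reactions"].values()))    # 115
print(sum(mention["reactions"].values()))  # 210
```

In this toy example the mention-axis reactions sum to more than the corpus total, because a post mentioning two actors is counted under both. Adding the two tables together would therefore overstate engagement.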
Limitations
The observation window of one week is short relative to the full campaign cycle. Patterns identified here may not hold across the broader pre-election period, and no such generalisation is attempted.
The corpus is bounded by what Meta Content Library returned for the specified query parameters and by the downloadable-subset restrictions, which can be quite limiting and can hinder insight into content in a low-resource language. Retrieval completeness cannot be independently verified; systematic omissions, if present, are not detectable from the dataset alone.
Keyword matching carries well-understood precision and recall trade-offs. The 181-term narrative framework was designed for broad coverage; false positives are expected at some rate and cannot be quantified without manual annotation, which has not been performed.
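The following sketch shows why purely lexical matching behaves this way. It assumes case-insensitive substring matching; the second cluster and its term are placeholders, and the sample posts are invented for illustration.

```python
from collections import Counter

# "суверенитет" under N1 comes from this report; the second entry is a placeholder.
FRAMEWORK = {
    "N1": ["суверенитет"],
    "N_example": ["купен вот"],
}

def match_clusters(text):
    """Return the clusters with at least one term present in the text."""
    folded = text.casefold()
    return {
        cluster
        for cluster, terms in FRAMEWORK.items()
        if any(term in folded for term in terms)
    }

def cluster_counts(posts):
    """Count matching posts per narrative cluster."""
    counts = Counter()
    for text in posts:
        counts.update(match_clusters(text))
    return counts

# Hypothetical posts: both contain the N1 term but frame it differently.
sample = [
    "Ще защитим европейския суверенитет.",
    "Суверенитетът не е застрашен, твърдят експерти.",
    "Днес е слънчево.",
]
print(cluster_counts(sample)["N1"])  # 2
```

Both of the first two posts are counted under N1 although they frame the term differently, so a match records presence only; whether a given match is a false positive for the narrative can only be settled by reading the post.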
The 200 posts matched via OCR image-text extraction constitute a lower-reliability subset. Error rates for Bulgarian Cyrillic text in naturalistic image conditions are not characterised for this specific pipeline; findings derived primarily from this subset should be treated with corresponding caution.
No human annotation or manual content coding has been conducted, and computational outputs have not been validated against a gold-standard labelled sample. The current analysis is better described as a structured quantitative description than as a full mixed-methods content analysis.
Finally, engagement metrics reflect platform-mediated amplification dynamics and not audience attitudes or reach in any direct sense. Reaction counts measure a behavioural signal whose interpretation requires caution, particularly given the extreme concentration of attention (Gini = 0.856) across a small number of posts and actors.
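For readers unfamiliar with the concentration measure cited above, a Gini coefficient over engagement counts can be computed as in this minimal sketch. It assumes the standard sorted-rank estimator; the report does not state which estimator was used or whether the unit was posts or actors, and the input values are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, ranks i = 1..n ascending.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

print(gini([1, 1, 1, 1]))   # 0.0
print(gini([0, 0, 0, 10]))  # 0.75
```

With this estimator the maximum for n items is (n - 1) / n, reached when one item holds all the engagement. A value such as 0.856 therefore indicates that most reactions sit on a small share of the items measured.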