6 Generative LLMs for Text Classification and Ideological Scaling
Using OLLAMA from R
6.1 Setup
Before running this notebook, make sure OLLAMA is installed and running on your local machine.
Create an OLLAMA account, then download and install the software from https://ollama.com

Ollama is an LLM server that lets you run large language models on your own machine or in Ollama's cloud. It provides a simple API for interacting with the models and is designed to be easy to use from R (with the rollama package) or from Python.
To communicate with OLLAMA from R we use the rollama package, which provides a convenient interface for sending queries to the OLLAMA server and receiving its responses.

Load libraries:
library(httr2)
library(jsonlite)
library(glue)
library(tidyverse)
library(ggplot2)
library(rollama)
library(yardstick)
library(ggrepel)
6.2 Connect to OLLAMA
OLLAMA exposes a simple REST API on http://localhost:11434. Rather than calling it by hand, we interact with it through rollama, which wraps these API calls.
Let’s test the connection:
ping_ollama()
Next, we pull the model we want to use. Here we use gemma4:31b-cloud, a large model hosted on the OLLAMA cloud. If you want to run the notebook on your own machine, you can pull a smaller model such as qwen3.5:4b instead. The results may be less accurate, and depending on your hardware the notebook may take longer to run.
Ollama provides free cloud credits to run hosted models, but larger models or higher query volumes require a paid subscription. Alternatively, models can be run entirely locally at no cost, though this demands substantial hardware. Small models may run on a laptop, but their performance is often insufficient for serious NLP tasks. For this tutorial, we use a cloud-hosted model within Ollama’s free-tier limits. You can monitor your daily and weekly usage on the OLLAMA dashboard at https://ollama.com/settings.
Your university may also provide a High Performance Computing (HPC) cluster on which you can run the notebook with larger models. This option is worth considering if the free plan does not provide enough resources for your needs or if you are working with sensitive data.
For now, the free plan should be enough.
pull_model("gemma4:31b-cloud")
We can ask the model a simple question to test that it is working:
start_time <- Sys.time()
query("What is the definition of policy dimension? Answer with 4 sentences.", model = "gemma4:31b-cloud", think = FALSE, model_params = list(temperature = 0))
A policy dimension refers to a specific axis or thematic area used to categorize and analyze different aspects of a policy framework. It allows policymakers to break down complex issues into manageable components, such as economic, social, or environmental considerations. By defining these dimensions, organizations can ensure that all critical perspectives are addressed and balanced during the decision-making process. This structural approach helps in measuring the impact and effectiveness of a policy across various distinct sectors.
end_time <- Sys.time()
elapsed <- end_time - start_time
Time required:
print(elapsed)
Time difference of 1.754067 secs
Setting temperature = 0 makes the model nearly deterministic: it always picks the most probable token, which removes sampling randomness. This is ideal for classification tasks, where consistency and replicability matter more than creative variation. Some variation can still remain, so it is good practice to produce multiple classifications and compute measures of inter-coder reliability.
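Cohen's kappa is one such measure. It captures the agreement between two sets of labels after correcting for the agreement expected by chance. The sketch below uses plain base R. cohen_kappa() is a hypothetical helper, and run1/run2 are invented labels that stand in for two runs of the same prompt. yardstick's kap() metric computes the same statistic on a data frame.

```r
# Cohen's kappa: agreement between two label vectors, corrected for chance.
# run1 and run2 are invented labels standing in for two runs of the same prompt.
cohen_kappa <- function(a, b) {
  lev <- union(a, b)
  tab <- table(factor(a, levels = lev), factor(b, levels = lev))
  n <- sum(tab)
  p_obs <- sum(diag(tab)) / n                      # observed agreement
  p_exp <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
  (p_obs - p_exp) / (1 - p_exp)
}

run1 <- c("populist", "populist", "non-populist", "non-populist", "populist", "non-populist")
run2 <- c("populist", "non-populist", "non-populist", "non-populist", "populist", "non-populist")

kappa_runs <- cohen_kappa(run1, run2)
print(kappa_runs)  # 0.6666667: 5 of 6 labels agree, 50% agreement expected by chance
```

A kappa of 1 means perfect agreement. Values near 0 mean the runs agree no more than chance would predict.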
Generative models predict the next token based on the tokens that came before it. In the visual example provided by Alammar and Grootendorst (2024), the model needs to predict the word that comes after “I am driving a”.

The temperature controls the randomness in selecting the next word. A low temperature tends to always pick the most likely words. A higher temperature allows the model to pick words that would otherwise be less likely to occur.
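The effect of temperature can be sketched with a softmax over the model's scores (logits) for the candidate next words. The logits below are invented for illustration. A real model scores its whole vocabulary, but the mechanism is the same: dividing the logits by a small temperature sharpens the distribution, and a large temperature flattens it.

```r
# Invented logits for candidate words following "I am driving a"
logits <- c(car = 4.0, truck = 2.5, bus = 2.0, banana = -1.0)

# Softmax with temperature: divide the logits by the temperature, then normalize
softmax_with_temperature <- function(logits, temperature) {
  z <- logits / temperature
  z <- z - max(z)  # subtract the maximum for numerical stability
  exp(z) / sum(exp(z))
}

p_low  <- softmax_with_temperature(logits, temperature = 0.5)
p_high <- softmax_with_temperature(logits, temperature = 2)
round(rbind(low = p_low, high = p_high), 3)

# As the temperature approaches 0, sampling reduces to always picking the top word
names(which.max(logits))  # "car"
```

At the low temperature, almost all of the probability mass goes to “car”. At the high temperature, unlikely words such as “banana” get a noticeably larger share.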

What does the think option do?
start_time <- Sys.time()
query("What is the definition of policy dimension? Answer with 4 sentences.", model = "gemma4:31b-cloud", think = TRUE, model_params = list(temperature = 0))
A policy dimension is a specific conceptual axis or perspective used to categorize and analyze various aspects of a public policy issue. It allows policymakers to break down complex problems into manageable components, such as economic, social, or environmental factors. By identifying these dimensions, analysts can better evaluate the trade-offs and potential impacts of different policy alternatives. Ultimately, utilizing policy dimensions helps create a structured framework for informed decision-making and the assessment of a strategy's effectiveness.
end_time <- Sys.time()
elapsed <- end_time - start_time
Time required:
print(elapsed)
Time difference of 5.031382 secs
The think = TRUE option activates extended thinking in Gemma 4 and in other models that support it. The model generates an internal chain-of-thought reasoning trace before producing its final answer. In other words, it first “thinks aloud”, working through the problem step by step, and only then outputs the visible response. This is why the model took roughly three times as long to complete the task (about 5 seconds instead of under 2).
6.3 Generative Models to perform text classification
Generative LLMs are becoming popular for text classification tasks, as research shows they can outperform expert coders (Törnberg 2024). The idea is to leverage the model’s ability to understand and generate human-like text to classify documents or sentences based on their content. This is done through prompting: we give the model a specific instruction or question that guides it to produce the desired output.
See Törnberg (2024) for a more detailed discussion of best practices when using LLMs for text classification.
6.3.1 Zero-shot classification
In zero-shot prompting we give the model a task description and the text to score, but no labelled examples. The model draws entirely on what it learned during pre-training.
In the example that follows, we ask the model to classify sentences as “populist” or “non-populist” using only a short role description and the two category labels. The prompt contains no definition of populism and no examples. We then evaluate the model’s performance against a hand-coded sample of sentences.
We load a sample of 100 sentences from the Italian Manifesto Project dataset that have been hand-coded for populism. We will use these hand-coded labels to evaluate the performance of our zero-shot classification.
# Load the data
load("data/it_manifestos_subset.RData")
head(data)
 sentence_id2
1 10022
2 603
3 6369
4 895
5 9810
6 691
text
1 È proprio in nome dell’affarismo in sanità , sostenuto dal più rilevante sistema lobbystico mondiale , che si negano risorse necessarie per la cronicità e disabilità , per la prevenzione , la riabilitazione , il territorio e l’assistenza domiciliare integrata e si preferisce riversare ogni costo sulla famiglia e sulle donne in particolare ( le prime vittime di questo sistema iniquo e ingiusto ) che figlie , mamme e lavoratrici si industriano a svolgere , alla bisogna e nell'abbandono totale , anche il lavoro di assistente domiciliare , infermiere , psicologo ecc .
2 È fuori di dubbio che l’università con il “ sistema del 3 più 2 si è licealizzata , chiudendosi in una netta divisione dei saperi che l’ha condotta a specialisti e microspecialismi , buoni solo per garantire cattedre e rendite assicurate al sistema di gestione attuale .
3 Negli ultimi 10 anni quasi 170 miliardi di incentivi per il mercato delle rinnovabili è finito nelle tasche di aziende cinesi .
4 Le condizioni imposte in cambio dell’accesso a pacchetti di aiuto da parte della BCE hanno inoltre contribuito a precipitare milioni di persone nella spirale drammatica della povertà e dell’esclusione sociale .
5 Si tratta di un ricatto per imporre politiche economiche fallite e fallimentari che arricchiscono lobby e corporazioni finanziarie a danno di diritti e Costituzioni nazionali .
6 Siamo l’unico paese in Europa in cui la “ libertà di coscienza ” , in particolare quella ipocrita dei legislatori , è diventata lo strumento di alcuni per impedire la libertà di altri .
text_eng
1 It is precisely in the name of habit in healthcare, supported by the most relevant world lobby system, that necessary resources are denying for chronicity and disability, for prevention, rehabilitation, territory and integrated home care and it is preferred to pour all costs on the family and women in particular (the first victims of this unfair and unjust system) that daughters, mothers and workers are industrically industrically held, Even the work of home assistant, nurse, psychologist etc.
2 It is out of doubt that the university with the "System of 3 plus 2 has high school, closing itself in a clear division of knowledge that led it to specialists and microspeialisms, good only to guarantee chairs and income insured to the current management system.
3 In the last 10 years almost 170 billion incentives for the renewable market has ended up in the pockets of Chinese companies.
4 The conditions imposed in exchange for access to help from the ECB have also contributed to falling millions of people in the dramatic spiral of poverty and social exclusion.
5 It is a blackmail to impose bankrupt and bankruptcy economic policies that enrich the lobbies and financial corporations to the detriment of national rights and constitutions.
6 We are the only country in Europe where the "freedom of consciousness", in particular the hypocritical one of legislators, has become the tool of some to prevent the freedom of others.
With the data loaded, we can now build our query for zero-shot classification. We will define a system prompt to set the context for the model, a task prompt to specify the classification task, and a template to format the user query. The system prompt will explain the role of the model as an expert in analyzing populist content in political texts, while the task prompt will list the categories for classification. The template will structure how the input text and the prompt are combined when sent to the model.
# Build the query. zero-shot classification
# Define system prompt explaining the role/context
system_prompt <- paste(
"You are an expert in analyzing populist content in political texts.",
"Answer with just the correct category without adding any explanation.",
sep = "\n"
)
# Define the main task prompt: the list of admissible categories
task_prompt <- "Categories: populist, non-populist"
# Define a template to format the user query;
# {prefix}, {text} and {prompt} are filled in by make_query()
query_template <- "{prefix}{text}\n{prompt}"
prefix_template <- "Text to classify: "
# Create the query object with make_query()
query_obj <- make_query(
text = data$text,
prompt = task_prompt,
template = query_template,
system = system_prompt,
prefix = prefix_template
)
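The sketch below shows what a single user query looks like once the placeholders are filled, using plain string substitution. fill_template() is a hypothetical helper written for illustration. It mimics the glue-style placeholders used by make_query() but is not rollama's implementation, and the system prompt is not part of this string.

```r
# Hypothetical helper (not part of rollama): replaces each {key} in the template
# with the corresponding value, treating both as literal strings
fill_template <- function(template, values) {
  out <- template
  for (key in names(values)) {
    out <- gsub(paste0("{", key, "}"), values[[key]], out, fixed = TRUE)
  }
  out
}

query_template <- "{prefix}{text}\n{prompt}"
filled <- fill_template(query_template, list(
  prefix = "Text to classify: ",
  text   = "Le elite corrotte ignorano il popolo",
  prompt = "Categories: populist, non-populist"
))
cat(filled)
```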
Now we can run the query on the sentences in our dataset. This will send each sentence to the model along with the task description, and the model will return a classification of “populist” or “non-populist” for each sentence. We will store these classifications in a new variable called gemma4_zs in our data frame.
# Run the query on the 100 sentences
start_time <- Sys.time()
data$gemma4_zs <- query(query_obj, model = "gemma4:31b-cloud", screen = TRUE, output = "text", think = FALSE, verbose = FALSE, model_params = list(
temperature = 0))
populist
non-populist
populist
populist
populist
populist
populist
populist
non-populist
populist
populist
populist
populist
non-populist
non-populist
non-populist
populist
populist
populist
non-populist
non-populist
populist
non-populist
non-populist
populist
populist
populist
populist
populist
populist
populist
non-populist
non-populist
non-populist
non-populist
populist
populist
populist
populist
non-populist
non-populist
populist
populist
non-populist
populist
populist
non-populist
populist
populist
populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
table(data$gemma4_zs)
non-populist populist
67 33
end_time <- Sys.time()
elapsed <- end_time - start_time
print(elapsed)
Time difference of 4.167255 mins
We now have a new variable gemma4_zs in our data frame with the model’s classification of each sentence as “populist” or “non-populist”. We can evaluate the model’s performance by comparing these classifications to the hand-coded labels in pop_handcoding. We will use the yardstick package to compute common classification metrics like accuracy, precision, recall, and F1-score.
#Load the data with the hand-coded labels
load("data/handocing.RData")
data2 <- left_join(data, handcoding, by = "sentence_id2")
data2$pop_handcoding <- ifelse(data2$pop_handcoding == 0, "non-populist",
ifelse(data2$pop_handcoding == 1, "populist", NA))
# Convert to factor
data2$pop_handcoding <- factor(data2$pop_handcoding, levels = c("non-populist", "populist"))
data2$gemma4_zs <- factor(data2$gemma4_zs, levels = c("non-populist", "populist"))
# Check the result
head(data2)
 sentence_id2
1 10022
2 603
3 6369
4 895
5 9810
6 691
text
1 È proprio in nome dell’affarismo in sanità , sostenuto dal più rilevante sistema lobbystico mondiale , che si negano risorse necessarie per la cronicità e disabilità , per la prevenzione , la riabilitazione , il territorio e l’assistenza domiciliare integrata e si preferisce riversare ogni costo sulla famiglia e sulle donne in particolare ( le prime vittime di questo sistema iniquo e ingiusto ) che figlie , mamme e lavoratrici si industriano a svolgere , alla bisogna e nell'abbandono totale , anche il lavoro di assistente domiciliare , infermiere , psicologo ecc .
2 È fuori di dubbio che l’università con il “ sistema del 3 più 2 si è licealizzata , chiudendosi in una netta divisione dei saperi che l’ha condotta a specialisti e microspecialismi , buoni solo per garantire cattedre e rendite assicurate al sistema di gestione attuale .
3 Negli ultimi 10 anni quasi 170 miliardi di incentivi per il mercato delle rinnovabili è finito nelle tasche di aziende cinesi .
4 Le condizioni imposte in cambio dell’accesso a pacchetti di aiuto da parte della BCE hanno inoltre contribuito a precipitare milioni di persone nella spirale drammatica della povertà e dell’esclusione sociale .
5 Si tratta di un ricatto per imporre politiche economiche fallite e fallimentari che arricchiscono lobby e corporazioni finanziarie a danno di diritti e Costituzioni nazionali .
6 Siamo l’unico paese in Europa in cui la “ libertà di coscienza ” , in particolare quella ipocrita dei legislatori , è diventata lo strumento di alcuni per impedire la libertà di altri .
text_eng
1 It is precisely in the name of habit in healthcare, supported by the most relevant world lobby system, that necessary resources are denying for chronicity and disability, for prevention, rehabilitation, territory and integrated home care and it is preferred to pour all costs on the family and women in particular (the first victims of this unfair and unjust system) that daughters, mothers and workers are industrically industrically held, Even the work of home assistant, nurse, psychologist etc.
2 It is out of doubt that the university with the "System of 3 plus 2 has high school, closing itself in a clear division of knowledge that led it to specialists and microspeialisms, good only to guarantee chairs and income insured to the current management system.
3 In the last 10 years almost 170 billion incentives for the renewable market has ended up in the pockets of Chinese companies.
4 The conditions imposed in exchange for access to help from the ECB have also contributed to falling millions of people in the dramatic spiral of poverty and social exclusion.
5 It is a blackmail to impose bankrupt and bankruptcy economic policies that enrich the lobbies and financial corporations to the detriment of national rights and constitutions.
6 We are the only country in Europe where the "freedom of consciousness", in particular the hypocritical one of legislators, has become the tool of some to prevent the freedom of others.
gemma4_zs pop_handcoding
1 populist populist
2 non-populist populist
3 populist non-populist
4 populist populist
5 populist populist
6 populist non-populist
We can now compute the classification metrics:
Accuracy: the overall proportion of correct predictions, i.e., how often the model’s predictions match the true labels.
Precision: the proportion of positive predictions that are actually correct. Out of all instances the model labeled as positive (e.g., “populist”), how many truly are positive?
Recall: the proportion of actual positive cases that the model correctly identified. Out of all actual positive instances, how many did the model detect?
F1-score: the harmonic mean of precision and recall, which combines the two into a single metric. It is useful when you want a trade-off between false positives and false negatives.
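These definitions can be written in terms of confusion-matrix counts: true positives (TP), false positives (FP) and false negatives (FN). The base-R sketch below computes the four metrics for a chosen positive class on a small invented example. classification_metrics() is a hypothetical helper. The sketch also shows that precision, recall and F1 change depending on which class counts as positive, which is what yardstick's event_level argument controls.

```r
# Invented gold labels and predictions for eight sentences
truth    <- c("populist", "populist", "populist", "non-populist",
              "non-populist", "non-populist", "non-populist", "non-populist")
estimate <- c("populist", "populist", "non-populist", "populist",
              "non-populist", "non-populist", "non-populist", "non-populist")

classification_metrics <- function(truth, estimate, positive) {
  tp <- sum(estimate == positive & truth == positive)  # correctly flagged
  fp <- sum(estimate == positive & truth != positive)  # false alarms
  fn <- sum(estimate != positive & truth == positive)  # missed cases
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  c(accuracy  = mean(truth == estimate),
    precision = precision,
    recall    = recall,
    f1        = 2 * precision * recall / (precision + recall))
}

m_pop    <- classification_metrics(truth, estimate, positive = "populist")
m_nonpop <- classification_metrics(truth, estimate, positive = "non-populist")
round(rbind(populist = m_pop, non_populist = m_nonpop), 3)
```

Accuracy is the same in both rows, but precision, recall and F1 differ. In yardstick, positive = "populist" corresponds to event_level = "second", because "populist" is the second factor level.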
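# Compute all metrics and bind results in one tibble.
# Note: by default yardstick treats the first factor level (here "non-populist")
# as the positive class, so precision, recall and F1 refer to non-populist sentences.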
metrics_summary_gemma4_zs <- tibble(
metric = c("Accuracy", "Precision", "Recall", "F1-score"),
gemma4_zs = c(
accuracy(data2, truth = pop_handcoding, estimate = gemma4_zs) %>% pull(.estimate),
precision(data2, truth = pop_handcoding, estimate = gemma4_zs) %>% pull(.estimate),
recall(data2, truth = pop_handcoding, estimate = gemma4_zs) %>% pull(.estimate),
f_meas(data2, truth = pop_handcoding, estimate = gemma4_zs) %>% pull(.estimate)
)
)
print(metrics_summary_gemma4_zs)
# A tibble: 4 × 2
metric gemma4_zs
<chr> <dbl>
1 Accuracy 0.81
2 Precision 0.881
3 Recall 0.843
4 F1-score 0.861
6.3.2 Few-shot classification
In few-shot prompting we provide the model with a few labelled examples of the task alongside the task description. This helps the model understand the specific context and criteria for classification, often improving performance. This is especially useful when the task is complex or when the model may not have seen many similar examples during pre-training.
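The sketch below shows one way labelled examples can be laid out as a conversation before the sentence to classify. Each example becomes a user turn followed by an assistant turn that holds the correct label. The role/content structure follows Ollama's chat API. build_few_shot_messages() is a hypothetical helper written for illustration, and rollama's make_query() may assemble the messages differently.

```r
# Two labelled examples taken from the list used below
examples <- data.frame(
  text   = c("Il Parlamento sta discutendo il bilancio dello Stato",
             "I politici tradiscono sempre il popolo per i loro interessi personali"),
  answer = c("non-populist", "populist"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: system message, then example Q/A pairs, then the new text
build_few_shot_messages <- function(system, examples, new_text, prompt,
                                    prefix = "Text to classify: ") {
  user_turn <- function(text) {
    list(role = "user", content = paste0(prefix, text, "\n", prompt))
  }
  msgs <- list(list(role = "system", content = system))
  # Each example becomes a user question followed by the correct assistant answer
  for (i in seq_len(nrow(examples))) {
    msgs <- c(msgs, list(user_turn(examples$text[i]),
                         list(role = "assistant", content = examples$answer[i])))
  }
  # The sentence we actually want classified comes last
  c(msgs, list(user_turn(new_text)))
}

msgs <- build_few_shot_messages(
  system   = "You are an expert in analyzing populist content in political texts.",
  examples = examples,
  new_text = "Le elezioni si terranno il prossimo mese",
  prompt   = "Categories: populist, non-populist"
)
vapply(msgs, function(m) m$role, character(1))
# "system" "user" "assistant" "user" "assistant" "user"
```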
examples_fs <- tibble::tribble(
~text, ~answer,
# Non-populist political sentences
"Il governo ha presentato una nuova riforma fiscale", "non-populist",
"Il Parlamento sta discutendo il bilancio dello Stato", "non-populist",
"Il ministro ha spiegato le nuove politiche ambientali", "non-populist",
"Le elezioni si terranno il prossimo mese", "non-populist",
"Il partito ha proposto un piano per migliorare l'istruzione", "non-populist",
# Populist - Anti-elitism
"Le élite corrotte vogliono solo mantenere il potere e ignorano il popolo", "populist",
"I politici tradiscono sempre il popolo per i loro interessi personali", "populist",
"Il sistema è marcio perché dominato da una classe dirigente distante", "populist",
"Basta con i politici bugiardi che rubano i nostri soldi", "populist",
"La casta politica non rappresenta più la volontà della gente comune", "populist",
# Populist - People-centrism
"Solo il popolo può decidere il futuro di questa nazione", "populist",
"Dobbiamo ascoltare la voce del popolo e mettere fine alle ingiustizie", "populist",
"La politica deve tornare ad essere al servizio della gente comune", "populist",
"Il vero potere appartiene al popolo, non alle élite", "populist",
"Difendiamo i diritti di chi lavora e soffre ogni giorno", "populist"
)
# Build the query with the examples
query_fs <- make_query(
text = data2$text,
prompt = task_prompt,
template = query_template,
system = system_prompt,
prefix = prefix_template,
examples = examples_fs
)
Now we can run the few-shot query on the sentences in our dataset, just as we did with the zero-shot query. This lets us compare the model's performance with and without examples.
# run the model
start_time <- Sys.time()
data2$gemma4_fs <- query(query_fs, model = "gemma4:31b-cloud", screen = TRUE, output = "text", think = FALSE, verbose = FALSE, model_params = list(
temperature = 0))
populist
populist
populist
populist
populist
populist
populist
populist
populist
populist
populist
populist
populist
populist
non-populist
populist
populist
populist
populist
populist
populist
populist
populist
populist
populist
populist
populist
populist
populist
populist
populist
non-populist
non-populist
populist
populist
populist
populist
populist
populist
non-populist
populist
populist
populist
populist
populist
populist
populist
populist
populist
populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
populist
non-populist
non-populist
non-populist
non-populist
non-populist
non-populist
end_time <- Sys.time()
elapsed <- end_time - start_time
print(elapsed)
Time difference of 3.863027 mins
We can now evaluate the performance of the few-shot classification in the same way we did for the zero-shot classification, by comparing the model’s predictions to the hand-coded labels and computing the same metrics.
data2$gemma4_fs <- factor(data2$gemma4_fs, levels = c("non-populist", "populist"))
# Compute all metrics and bind results in one tibble
metrics_summary_gemma4_fs <- tibble(
metric = c("Accuracy", "Precision", "Recall", "F1-score"),
gemma4_fs = c(
accuracy(data2, truth = pop_handcoding, estimate = gemma4_fs) %>% pull(.estimate),
precision(data2, truth = pop_handcoding, estimate = gemma4_fs) %>% pull(.estimate),
recall(data2, truth = pop_handcoding, estimate = gemma4_fs) %>% pull(.estimate),
f_meas(data2, truth = pop_handcoding, estimate = gemma4_fs) %>% pull(.estimate)
)
)
# Join the two summaries by metric; cbind() would paste the columns side by side
# and turn by = "metric" into an extra column
metrics_comparison <- left_join(metrics_summary_gemma4_zs, metrics_summary_gemma4_fs, by = "metric")
as.data.frame(metrics_comparison)
 metric gemma4_zs gemma4_fs
1 Accuracy 0.8100000 0.8100000
2 Precision 0.8805970 1.0000000
3 Recall 0.8428571 0.7285714
4 F1-score 0.8613139 0.8429752
In our case, accuracy is unchanged but the other metrics shift. yardstick treats the first factor level as the positive class by default. Because the levels are c("non-populist", "populist"), the precision, recall and F1-score in the table refer to the non-populist category. Pass event_level = "second" to compute them for populist instead.
Accuracy (zs: 0.81 / fs: 0.81) Out of all sentences, how many did the model get right overall? Both prompts classify roughly 4 out of 5 sentences correctly, so few-shot prompting gives no accuracy gain.
Precision, non-populist (zs: 0.881 / fs: 1.00) Of all the sentences the model called non-populist, how many actually were? The few-shot model is perfect here: every sentence it labels non-populist is truly non-populist, so it never misses a populist sentence in this sample. The zero-shot model occasionally labels a populist sentence as non-populist.
Recall, non-populist (zs: 0.843 / fs: 0.729) Of all the sentences that truly are non-populist, how many did the model recognize? Zero-shot recognizes more of them, while few-shot labels more non-populist sentences as populist. This is the classic precision–recall trade-off: the few-shot model is more liberal in calling something populist (49 populist labels against 33 for zero-shot).
F1-score, non-populist (zs: 0.861 / fs: 0.843) The drop in recall outweighs the gain in precision, so the few-shot model ends up with a slightly lower F1. Zero-shot has a marginally better balance overall.
Which prompt you prefer depends on your use case. If false positives are costly (e.g., you don’t want to wrongly label a politician as populist), prefer zero-shot, which raises fewer false alarms. If missing populist sentences is costly (e.g., you want comprehensive detection), prefer few-shot, which in this sample caught every sentence hand-coded as populist.
6.4 Not only classification but also zero-shot position estimation
Instead of asking the model to classify sentences into discrete categories, we can ask it to assign a position on a continuous scale. This is closer to what Wordscores, Wordfish and LSS do: they produce a score that can be interpreted as a position on an ideological spectrum. This section is inspired by the work of Le Mens and Gallego (2025), who suggested using generative models to scale individual sentences and then averaging the resulting scores. This approach contrasts with the one proposed by Benoit et al. (2026), who used entire documents (electoral manifestos) as the unit of analysis.
For the sake of brevity we will work with a single speech excerpt. In practice you would loop over all speeches (or chunks) in your corpus.
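Once every sentence has a score, the sentence-then-average approach described above aggregates them to the speech level. The minimal base-R sketch below assumes a hypothetical data frame with one row per sentence, a speech identifier (doc_id) and a numeric score, with NA for sentences without ideological content. The values are invented.

```r
# Hypothetical sentence-level scores: one row per sentence, NA = no ideological content
scores <- data.frame(
  doc_id = c("speech_A", "speech_A", "speech_A", "speech_B", "speech_B"),
  score  = c(85, NA, 65, 40, 30)
)

# Speech position: mean of the non-missing sentence scores
doc_positions <- tapply(scores$score, scores$doc_id, mean, na.rm = TRUE)

# Share of sentences in each speech that received a score
doc_coverage <- tapply(!is.na(scores$score), scores$doc_id, mean)

data.frame(position = as.vector(doc_positions),
           coverage = as.vector(doc_coverage),
           row.names = names(doc_positions))
```

Reporting coverage alongside the mean is a simple way to flag speeches whose position rests on only a handful of scored sentences.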
load("data/investitutre_sentences_16_19.RData")
# inspect the data
glimpse(investitutre_sentences_16_19)
Rows: 17,246
Columns: 14
$ session_legislature <chr> "16", "16", "16", "16", "16", "16", "16", "16", "…
$ session_anno <chr> "2008", "2008", "2008", "2008", "2008", "2008", "…
$ session_mese <chr> "05", "05", "05", "05", "05", "05", "05", "05", "…
$ session_giorno <chr> "13", "13", "13", "13", "13", "13", "13", "13", "…
$ session_number <chr> "4", "4", "4", "4", "4", "4", "4", "4", "4", "4",…
$ speaker_name_cognome <chr> "BERLUSCONI Silvio", "BERLUSCONI Silvio", "BERLUS…
$ speaker_id <chr> "35330", "35330", "35330", "35330", "35330", "353…
$ gruppo_id <chr> "477", "477", "477", "477", "477", "477", "477", …
$ sigla <chr> "PDL", "PDL", "PDL", "PDL", "PDL", "PDL", "PDL", …
$ investituture <chr> "BERLUSCONI IV", "BERLUSCONI IV", "BERLUSCONI IV"…
$ doc_id2 <chr> "16_text2", "16_text2", "16_text2", "16_text2", "…
$ sentence_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15…
$ sentence <chr> ".", "Signor Presidente, onorevoli colleghi, il l…
$ doc_id3 <chr> "16_text2_1", "16_text2_2", "16_text2_3", "16_tex…
# we subset for the first 30 sentences
sentences_subset <- investitutre_sentences_16_19 |> slice(1:30)
We build a query that asks the model to assign each sentence a score between 0 and 100, where 0 means “Extremely left”, 50 means “Centrist” and 100 means “Extremely right”. We also instruct the model to return “NA” if the sentence has no ideological content. The prompt is designed to be clear and specific about the task and the expected output format.
scaling_queries <- make_query(
text = sentences_subset$sentence,
prompt = "Where does the author of these sentence stand on the 'left' to 'right' wing scale? Provide
your response as a score between 0 and 100 where 0 means 'Extremely left', 50 means
'Centrist', and 100 means 'Extremely right'. If the text does not have ideological content,
set the score to 'NA'. You will only respond with the key Score. Do not provide
explanations.",
# Close the quotation mark (suffix) right after the sentence, before the instructions
template = "{prefix}{text}{suffix}\n{prompt}",
system = "You are a researcher analyzing sentences taken from the Italian parliament. Your task is to infer the left–right political position expressed in each sentence, if any, and return only the numerical score as instructed in the prompt.",
prefix = "Sentence to scale: \"",
suffix = "\""
)
We can now run the query to get the ideological score for each sentence. This adds a new variable, gemma4_scaling, to our data frame. It contains the model’s raw answer for each sentence (e.g., “Score: 85”).
# run the model
start_time <- Sys.time()
sentences_subset$gemma4_scaling <- query(scaling_queries, model = "gemma4:31b-cloud", screen = TRUE, output = "text", think = FALSE, model_params = list(
temperature = 0))
Score: NA
Score: NA
Score: NA
Score: NA
Score: NA
Score: 85
Score: 65
Score: NA
Score: 50
Score: 50
Score: NA
Score: 50
Score: NA
Score: NA
Score: NA
Score: NA
Score: NA
Score: 60
Score: NA
Score: 50
Score: NA
Score: NA
Score: NA
Score: NA
Score: NA
Score: 70
Score: 40
Score: 75
Score: 35
Score: NA
end_time <- Sys.time()
elapsed <- end_time - start_time
The time required to scale 30 sentences:
print(elapsed)
Time difference of 1.464752 mins
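The raw answers are strings such as “Score: 85”, so they must be converted to numbers before they can be averaged or plotted. Below is a minimal base-R sketch. parse_score() is a hypothetical helper, and the “Score: 120” entry is an invented out-of-range answer added to show the range check.

```r
# Raw answers in the format returned above; "Score: 120" is an invented
# out-of-range answer added to show the range check
raw_scores <- c("Score: NA", "Score: 85", "Score: 65", "Score: 50", "Score: 120")

parse_score <- function(x) {
  out <- rep(NA_real_, length(x))
  has_num <- grepl("[0-9]", x)
  # Extract the first number (integer or decimal) from each answer that has one
  out[has_num] <- as.numeric(regmatches(x, regexpr("[0-9]+(\\.[0-9]+)?", x)))
  # Answers above the 0-100 scale requested in the prompt are treated as missing
  # (no minus sign is captured, so values are never negative)
  out[!is.na(out) & out > 100] <- NA_real_
  out
}

parse_score(raw_scores)  # NA 85 65 50 NA
```

Applied to the model output, this would be `sentences_subset$gemma4_score <- parse_score(sentences_subset$gemma4_scaling)`, where gemma4_score is a hypothetical column name.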
For efficiency, we load scores that have already been computed for all sentences in the corpus of investiture speeches. We will explore these scores in the next section.
scaling_results_gemma <- read.csv("data/scaling_results_gemma.csv")
Since we are going to explore the results with some text analysis techniques, we keep only sentences with more than 20 characters and replace apostrophes with blank spaces.
Before moving on to the analysis of positions, let’s explore the sentences that have been classified as having some ideological content (i.e., those with a non-NA score).
# keep only sentences with more than 20 characters
scaling_results_gemma <- scaling_results_gemma %>%
filter(nchar(sentence) > 20)
#create a variable called ideological_content that is 1 if the gemma4_scaling_mean is not NA and 0 if it is NA
scaling_results_gemma <- scaling_results_gemma %>%
mutate(ideological_content = ifelse(is.na(gemma4_scaling_mean), 0, 1))
# replace apostrophes (straight and curly) with a blank space
scaling_results_gemma$sentence <- gsub("[‘’‚‛'`]", " ", scaling_results_gemma$sentence)
Let's create a corpus, tokenize it, build a document-feature matrix (DFM), and run a keyness analysis to see which words are most associated with sentences that have ideological content versus those that do not.
library(quanteda)
library(quanteda.textplots)
library(quanteda.textstats)
corpus_sentences <- corpus(scaling_results_gemma, text_field = "sentence")
toks_sentences <- corpus_sentences %>%
tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
tokens_tolower() %>%
tokens_remove(stopwords("it"))
dfm_sentences <- dfm(toks_sentences)
dfm_sentences <- dfm_trim(dfm_sentences, min_termfreq = 5)
Now we can compute keyness, comparing sentences with ideological content (ideological_content == 1) against those without (ideological_content == 0).
tstat_key <- textstat_keyness(dfm_sentences,
target = dfm_sentences$ideological_content == 1)
textplot_keyness(tstat_key)
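By default, textstat_keyness() uses a chi-squared statistic (measure = "chi2"). For a single feature, the idea is to compare a 2x2 table of counts (the feature versus all other tokens, in the target versus the reference group) and to sign the statistic positive when the feature is over-represented in the target. A base-R sketch of that idea with invented counts; quanteda's internal implementation may differ in details such as the continuity correction, so the values are not guaranteed to match textstat_keyness() exactly:

```r
# Signed chi-squared keyness for one feature, computed from a 2x2 table.
# feat_target: feature count in the target;    other_target: all other tokens in the target
# feat_ref:    feature count in the reference; other_ref:    all other tokens in the reference
keyness_chi2 <- function(feat_target, other_target, feat_ref, other_ref) {
  tab <- matrix(c(feat_target, other_target,
                  feat_ref,    other_ref), nrow = 2, byrow = TRUE)
  # Pearson chi-squared with Yates' continuity correction
  stat <- suppressWarnings(chisq.test(tab, correct = TRUE)$statistic)
  # expected count of the feature in the target if usage were independent of group
  total <- feat_target + other_target + feat_ref + other_ref
  expected <- (feat_target + other_target) * (feat_target + feat_ref) / total
  # positive if over-represented in the target, negative otherwise
  unname(if (feat_target > expected) stat else -stat)
}

# invented counts: 30 of 1,000 target tokens vs. 10 of 1,000 reference tokens
keyness_chi2(30, 970, 10, 990)
```

Swapping target and reference leaves the size of the statistic unchanged and only flips its sign, which is why the keyness plot shows the reference group's distinctive words as negative bars.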
What if we also compound multi-word expressions such as "energia rinnovabile" (renewable energy) or "diritti umani" (human rights) before computing keyness? This is a common step in text analysis, used to capture meaningful phrases rather than only individual words.
The following code identifies adjacent words that co-occur more often than expected by chance and compounds them into multi-word expressions, so that, for example, "welfare" and "state" are represented as a single token, "welfare_state".
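# Keep tokens that start with a letter. Because the tokens were lowercased above and
# case_insensitive = TRUE, this keeps almost all word tokens, not only capitalized names.
# padding = TRUE leaves an empty placeholder where a token is removed, so words that
# were not adjacent in the text are not counted as collocations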
toks_news_cap <- tokens_select(toks_sentences,
pattern = "^[A-Z]",
valuetype = "regex",
case_insensitive = TRUE,
padding = TRUE)
# Detect statistically significant collocations (two-word sequences by default)
# min_count = 5 filters out rare pairs; tolower = FALSE keeps tokens as they are (already lowercase here)
tstat_col_cap <- textstat_collocations(toks_news_cap, min_count = 5, tolower = FALSE)
head(tstat_col_cap, 20) # Inspect top 20 collocations
   collocation count count_nested length lambda z
1 presidente consiglio 811 0 2 6.983133 59.11139
2 quest aula 268 0 2 9.396579 53.22117
3 legge elettorale 142 0 2 6.353248 51.44303
4 forze politiche 151 0 2 6.430611 51.35779
5 movimento stelle 231 0 2 9.749437 49.19474
6 partito democratico 184 0 2 8.213027 49.08399
7 deve essere 157 0 2 4.460386 45.03832
8 d italia 157 0 2 4.481152 44.63661
9 unione europea 136 0 2 8.397464 44.42124
10 onorevoli colleghi 150 0 2 8.249420 44.34107
11 consiglio ministri 122 0 2 5.168973 43.91524
12 punto vista 116 0 2 7.980259 41.77426
13 buon lavoro 182 0 2 6.970294 41.57712
14 signor presidente 1020 0 2 8.738634 40.74463
15 reddito cittadinanza 84 0 2 8.777483 40.30995
16 pubblica amministrazione 116 0 2 8.703731 39.79031
17 miliardi euro 64 0 2 6.750251 38.61149
18 può essere 120 0 2 3.947632 37.24027
19 campagna elettorale 107 0 2 8.296421 36.70154
20 popolo italiano 58 0 2 6.132018 36.13790
# Compound token pairs whose collocation z-score > 3 (statistically robust multi-word expressions),
# e.g., "partito democratico" and "presidente consiglio" become "partito_democratico" and "presidente_consiglio"
toks_comp <- tokens_compound(toks_sentences, pattern = tstat_col_cap[tstat_col_cap$z > 3,],
case_insensitive = TRUE)
# Build a Document-Feature Matrix from the compounded tokens
dfm_sentences_comp <- dfm(toks_comp)
We compute keyness by comparing the content of sentences classified as containing ideological content against those that do not.
tstat_key <- textstat_keyness(dfm_sentences_comp,
target = dfm_sentences_comp$ideological_content == 1)
textplot_keyness(tstat_key)
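To see what tokens_compound() does in isolation, here is a toy example on an invented, already-lowercased token sequence. Wrapping the expressions in phrase() tells quanteda to match them as sequences of tokens, and by default the matched tokens are joined with an underscore. In the tutorial the collocations table itself is passed as the pattern; the explicit phrase() vector below is only a simpler stand-in:

```r
library(quanteda)

# Invented, already-lowercased text (stopwords such as "del" already removed)
toks_toy <- tokens("presidente consiglio partito democratico movimento stelle")

# Join each listed multi-word expression into a single token
toks_toy_comp <- tokens_compound(
  toks_toy,
  pattern = phrase(c("presidente consiglio", "partito democratico"))
)

# Tokens of the single toy document
as.list(toks_toy_comp)[[1]]
```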
Now let's look at the most distinctive tokens of sentences that received a low score (i.e., more left-leaning) versus those that received a high score (i.e., more right-leaning).
# subset scaling_results_gemma keeping only sentences with ideological content (ideological_content == 1)
scaling_results_ideo <- scaling_results_gemma[scaling_results_gemma$ideological_content == 1,]
# create a variable called score_category that is "left" if the gemma4_scaling_mean is less than 50 and "right" if it is greater than or equal to 50
scaling_results_ideo <- scaling_results_ideo %>%
mutate(score_category = ifelse(gemma4_scaling_mean < 50, "left", "right"))
# create a corpus and dfm from scaling_results_ideo
corpus_ideo <- corpus(scaling_results_ideo, text_field = "sentence")
# tokenize and pre-process
toks_ideo <- corpus_ideo %>%
tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
tokens_tolower() %>%
tokens_remove(stopwords("it"))
# identify multi-word expressions and compound the tokens
toks_cap <- tokens_select(toks_ideo,
pattern = "^[A-Z]",
valuetype = "regex",
case_insensitive = TRUE,
padding = TRUE)
tstat_col_cap <- textstat_collocations(toks_cap, min_count = 5, tolower = FALSE)
head(tstat_col_cap, 20)
   collocation count count_nested length lambda z
1 partito democratico 160 0 2 8.122451 43.99094
2 movimento stelle 185 0 2 9.789130 41.58810
3 unione europea 118 0 2 8.234082 40.21411
4 presidente consiglio 300 0 2 7.084601 39.80313
5 d italia 124 0 2 4.678138 39.20533
6 forze politiche 77 0 2 5.938466 37.23872
7 reddito cittadinanza 78 0 2 8.388954 37.15952
8 quest aula 121 0 2 9.882676 36.53356
9 deve essere 103 0 2 4.415381 36.31089
10 legge elettorale 67 0 2 6.065777 36.07292
11 forza italia 87 0 2 4.475411 32.76635
12 può essere 85 0 2 4.117261 32.05873
13 pubblica amministrazione 76 0 2 8.469206 31.97461
14 punto vista 63 0 2 8.295298 31.11489
15 popolo italiano 44 0 2 6.082558 30.62655
16 presidente berlusconi 83 0 2 4.593535 30.54094
17 miliardi euro 38 0 2 6.269638 29.24520
18 ancora volta 45 0 2 5.519631 29.11560
19 altri paesi 37 0 2 6.158266 29.04940
20 campagna elettorale 58 0 2 8.326621 29.02910
toks_comp <- tokens_compound(toks_ideo, pattern = tstat_col_cap[tstat_col_cap$z > 3,],
case_insensitive = TRUE)
#Create the DFM
dfm_ideo_comp <- dfm(toks_comp)
dfm_ideo_comp <- dfm_trim(dfm_ideo_comp, min_termfreq = 5)
Now that we have our DFM, we compute keyness by comparing the tokens in sentences that received a score of 50 or higher ("right") with those in sentences scored below 50 ("left"), on a scale from 0 (Extreme Left) to 100 (Extreme Right).
# keyness: sentences scored 50 or higher ("right", target) vs. below 50 ("left", reference)
tstat_key_right <- textstat_keyness(dfm_ideo_comp, target = dfm_ideo_comp$score_category == "right")
textplot_keyness(tstat_key_right)
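In the code above, a sentence scored exactly 50 is counted as "right". Since 50 is the centre of the scale, one might prefer to keep such sentences in a separate category and exclude them from the left-right comparison. A minimal sketch of that alternative; categorise_score() is a hypothetical helper, not part of the tutorial's pipeline:

```r
# Hypothetical helper: three categories instead of a left/right split at 50
categorise_score <- function(score, centre = 50) {
  ifelse(score < centre, "left",
         ifelse(score > centre, "right", "centre"))  # NA scores stay NA
}

categorise_score(c(20, 50, 85, NA))
```

The new variable would need to be added to scaling_results_ideo before corpus_ideo is built, so that it is carried into the DFM as a document variable; the centre group could then be dropped with dfm_subset() before calling textstat_keyness().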
Now let's compute a mean ideological score for each speech, by grouping by doc_id2 and averaging gemma4_scaling_mean within each group. Because we start from scaling_results_ideo, only sentences with ideological content contribute to the mean.
ideo_scores <- scaling_results_ideo %>%
group_by(doc_id2) %>%
summarise(mean_score = mean(gemma4_scaling_mean, na.rm = TRUE))
load("data/investitutre_speeches_16_19.RData")
#merge ideo_scores with investitutre_speeches_16_19 by doc_id2
speeches <- investitutre_speeches_16_19 %>%
left_join(ideo_scores, by = "doc_id2")
head(speeches$mean_score)
[1] 56.65000 51.94444 67.90323 45.83333 56.81818 36.66667
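A speech-level mean can rest on very few scored sentences, so it may help to keep track of how many sentences contribute to each mean and how variable they are. A sketch with dplyr, using invented toy scores in place of scaling_results_ideo; the columns n_scored and se_score are additions that the tutorial does not compute:

```r
library(dplyr)

# Invented sentence-level scores for two hypothetical speeches
toy_scores <- data.frame(
  doc_id2 = c("a", "a", "a", "b", "b"),
  gemma4_scaling_mean = c(40, 60, 50, 80, NA)
)

speech_summary <- toy_scores %>%
  group_by(doc_id2) %>%
  summarise(
    n_scored   = sum(!is.na(gemma4_scaling_mean)),                 # sentences with a score
    mean_score = mean(gemma4_scaling_mean, na.rm = TRUE),
    # standard error of the mean; NA when only one sentence is scored
    se_score   = sd(gemma4_scaling_mean, na.rm = TRUE) / sqrt(n_scored)
  )

speech_summary
```

Speeches with a small n_scored or a large se_score deserve more caution when they are compared in the plots below.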
Now we subset the speeches from the Conte I investiture debate (investituture == "CONTE I") and plot ideological positions by Parliamentary Party Group (PPG).
speeches_conte <- speeches %>%
filter(investituture == "CONTE I")
glimpse(speeches_conte)
Rows: 60
Columns: 14
$ session_legislature <int> 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 1…
$ session_anno <int> 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2…
$ session_mese <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6…
$ session_giorno <int> 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6…
$ session_number <int> 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 1…
$ speaker_name_cognome <chr> "CONTE Giuseppe", "VITIELLO Catello", "MURONI Ros…
$ speaker_id <chr> "307926", "307196", "307296", "305567", "302155",…
$ gruppo_id <int> NA, 3033, 3070, 3054, 3055, 3053, 3052, 3051, 303…
$ sigla <chr> NA, "MISTO", "LEU-ART 1-SI", "PD", "LEGA", "FI", …
$ investituture <chr> "CONTE I", "CONTE I", "CONTE I", "CONTE I", "CONT…
$ doc_id2 <chr> "18_text70", "18_text71", "18_text72", "18_text73…
$ speech_text <chr> "Signor Presidente, onorevoli senatrici e onorevo…
$ text_spacy <chr> ". signore presidente , onorevole senatore e onor…
$ mean_score <dbl> 46.29283, 53.75000, 31.84211, 20.64103, 65.74074,…
# plot each speech's mean_score by party (sigla), labelling points with the speaker's name (speaker_name_cognome)
ggplot(speeches_conte, aes(x = mean_score, y = sigla, label = speaker_name_cognome)) +
geom_point() +
geom_text_repel() +
labs(x = "Mean Ideological Score", y = "Party", title = "Distribution of Mean Ideological Scores by Party (Conte I)") +
theme_minimal()
Now we subset the speeches from the Meloni investiture debate (investituture == "MELONI") and plot ideological positions by Parliamentary Party Group (PPG).
speeches_meloni <- speeches %>%
filter(investituture == "MELONI")
glimpse(speeches_meloni)
Rows: 51
Columns: 14
$ session_legislature <int> 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 1…
$ session_anno <int> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2…
$ session_mese <int> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1…
$ session_giorno <int> 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 2…
$ session_number <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4…
$ speaker_name_cognome <chr> "MELONI Giorgia", "RICCIARDI Riccardo", "MORRONE …
$ speaker_id <chr> "302103", "307250", "307300", "307374", "301439",…
$ gruppo_id <int> 4133, 4131, 4132, 4136, 4111, 4111, 4133, 4135, 4…
$ sigla <chr> "FDI", "M5S", "LEGA", "PD-IDP", "MISTO", "MISTO",…
$ investituture <chr> "MELONI", "MELONI", "MELONI", "MELONI", "MELONI",…
$ doc_id2 <chr> "19_text5", "19_text6", "19_text7", "19_text8", "…
$ speech_text <chr> "Grazie, Presidente. Signor Presidente, onorevoli…
$ text_spacy <chr> ". grazie , presidente . signore presidente , ono…
$ mean_score <dbl> 57.50000, 36.11111, 64.46078, 30.55556, 58.75000,…
# plot each speech's mean_score by party (sigla), labelling points with the speaker's name (speaker_name_cognome)
ggplot(speeches_meloni, aes(x = mean_score, y = sigla, label = speaker_name_cognome)) +
geom_point() +
geom_text_repel() +
labs(x = "Mean Ideological Score", y = "Party", title = "Distribution of Mean Ideological Scores by Party (MELONI)") +
theme_minimal()
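The two plots above repeat the same code. A sketch of a hypothetical helper, plot_party_scores(), that wraps it, orders parties by their average score, labels speeches without a parliamentary group as "No group" (such as Giuseppe Conte's speech in the Conte I debate, where sigla is NA), and fixes the x-axis to the full 0-100 scale so that the two investiture debates can be compared visually:

```r
library(dplyr)
library(ggplot2)
library(ggrepel)

# Hypothetical helper: one point per speech, parties ordered by their mean score
plot_party_scores <- function(df, title) {
  df <- df %>%
    filter(!is.na(mean_score)) %>%
    mutate(
      sigla = coalesce(sigla, "No group"),                      # speeches without a PPG
      sigla = reorder(factor(sigla), mean_score, FUN = mean)    # order parties by mean score
    )
  ggplot(df, aes(x = mean_score, y = sigla, label = speaker_name_cognome)) +
    geom_point() +
    geom_text_repel(size = 3) +
    scale_x_continuous(limits = c(0, 100)) +
    labs(x = "Mean Ideological Score (0 = Extreme Left, 100 = Extreme Right)",
         y = "Party", title = title) +
    theme_minimal()
}

# Usage with the data frames created above:
# plot_party_scores(speeches_conte, "Mean Ideological Scores by Party (Conte I)")
# plot_party_scores(speeches_meloni, "Mean Ideological Scores by Party (MELONI)")
```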
6.5 Annotate Images with LLMs?
Textual data is only one type of data that is produced by political actors. What about images, videos, or audio? Can we apply similar methods to analyze the content of these media? This is an exciting area of research that is rapidly evolving with the development of multimodal models that can process and generate text, images, and audio. In the next module we will explore some of these models and their applications in political analysis.
Let's have a look at this electoral poster (in Italian, a "manifesto").

Let’s ask Gemma4 to describe the image for us.
image_classification_1 <- query("Describe the image.", model = "gemma4:31b-cloud", think = FALSE, options = list(temperature = 0), images = "figures/AV-SI_Manifesto.jpg")
A vertical political poster featuring a photograph of a flooded street in a city. In the foreground, the silhouette of a person holding a large red umbrella is seen from the back, facing away from the camera. Several cars are partially submerged in the brownish floodwaters that cover the road. In the background, there is a traditional European-style building with a tower under a grey, overcast sky.
Large text at the top reads "**NON CHIAMATELO MALTEMPO**" (Don't call it bad weather), with the first part in white and the second word in yellow. A hashtag at the very top reads "#AlleanzaVerdiSinistra".
The bottom third of the poster has a yellow and red background. On the left, the word "**Facciamolo.**" (Let's do it) is written in bold white letters. On the right, there is a circular logo containing the words "Alleanza Verdi Sinistra" and smaller logos for "S! Sinistra Italiana" and "Europa Verde". At the very bottom, the website "www.verdisinistra.it" is written in white.
Watermarks reading "Archivio degli Archivio degli Archivio degli Spot politici" are layered diagonally across the central image.
Now let’s ask the model to infer the implicit message in the image.
image_classification_2 <- query("What is the implicit message in the image?", model = "gemma4:31b-cloud", think = FALSE, options = list(temperature = 0), images = "figures/AV-SI_Manifesto.jpg")
The implicit message of this image is a political call to action regarding **climate change**.
The message is conveyed through several layers:
**1. Linguistic Meaning**
The Italian text *"Non chiamatelo maltempo"* translates to **"Don't call it bad weather."** By telling the viewer not to use the term "bad weather" (which implies a random, temporary atmospheric event), the ad implies that these events are actually the result of systemic climate change. It suggests that these floods are no longer "accidents" of nature but predictable consequences of environmental neglect.
**2. Visual Imagery**
The image shows a city street severely flooded, with cars partially submerged. This provides a visceral example of the "bad weather" being discussed. The presence of a person with a red umbrella creates a focal point, emphasizing human vulnerability in the face of these disasters.
**3. Political Context**
The image is branded by **Alleanza Verdi Sinistra** (Green-Left Alliance), a political coalition focused on environmentalism and social justice. The phrase *"Facciamolo"* ("Let's do it") at the bottom serves as a call to action, urging the viewer to support their policies to prevent such disasters in the future.
**Summary**
The implicit message is: **What we perceive as "bad weather" is actually a climate crisis. We can no longer treat these disasters as random occurrences; we must take political action to address the root cause.**
Finally, let’s ask the model to classify the image on a left-right scale.
image_classification_3 <- query("You are an expert in analyzing political content. Where would you place the implicit message of the image on a scale that goes from 0 (Extreme Left) to 100 (Extreme Right)? Answer with the score assigned and a justification of 3 sentences. Use the format: Score: [number] Justification: [justification]", model = "gemma4:31b-cloud", think = FALSE, options = list(temperature = 0), images = "figures/AV-SI_Manifesto.jpg")
Score: 15
Justification: The image is an official campaign poster for "Alleanza Verdi Sinistra" (Green Left Alliance), a political coalition explicitly positioned on the left of the Italian political spectrum. The message "Non chiamatelo maltempo" (Don't call it bad weather) in the context of flooding suggests a critique of climate change, advocating for environmentalist policies typically associated with left-wing platforms. The inclusion of the "Europa Verde" and "Sinistra Italiana" logos further confirms its alignment with eco-socialist and progressive ideologies.
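Because the prompt asks for a fixed format ("Score: [number] Justification: [justification]"), responses like the one above can be split into a numeric score and a justification. A base-R sketch, assuming the model respects the format; parse_score_justification() is a hypothetical helper:

```r
# Hypothetical helper: split "Score: <n> Justification: <text>" responses into two columns.
# Responses that do not follow the format get NA in both columns.
parse_score_justification <- function(x) {
  x <- ifelse(is.na(x), "", x)
  # (?s) lets the justification span several lines
  pattern <- "(?s)Score:\\s*([0-9]+(?:\\.[0-9]+)?)\\s*Justification:\\s*(.*)"
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))
  data.frame(
    score = vapply(m, function(v) if (length(v) == 3) as.numeric(v[2]) else NA_real_,
                   numeric(1)),
    justification = vapply(m, function(v) if (length(v) == 3) trimws(v[3]) else NA_character_,
                           character(1)),
    stringsAsFactors = FALSE
  )
}

parse_score_justification(c("Score: 15\nJustification: Left-leaning poster.", "I cannot tell."))
```

Text that precedes "Score:" (as in some of the responses below) is ignored, because the pattern only looks for the first "Score:" in each response.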
Do you see any potential issues with this approach?
The model knows that this is a poster from the "Alleanza Verdi Sinistra" (Green-Left Alliance), and it mentions this in its justification. Would it place the image in the same position if the poster carried no party cues, or if it had been issued by a different party?
In this version the party cues have been removed.

image_classification_nocue <- query("You are an expert in analyzing political content. Where would you place the implicit message of the image on a scale that goes from 0 (Extreme Left) to 100 (Extreme Right)? Answer with the score assigned and a justification of 3 sentences. Use the format: Score: [number] Justification: [justification]", model = "gemma4:31b-cloud", think = FALSE, options = list(temperature = 0), images = "figures/AV-SI_Manifesto_clean.png")
The provided image contains an Italian political slogan "Non chiamatelo maltempo. Facciamolo" (Don't call it bad weather. Let's make it happen) overlaid on a scene of severe flooding. However, the image lacks a specific party logo, candidate name, or explicit policy platform, as the identifying circle is blank. Because the message focuses on a provocative rhetorical stance regarding a natural disaster without providing ideological markers, it is impossible to accurately place it on a left-right political spectrum.
Score: 50
Justification: The image lacks the necessary ideological markers, such as a party logo or specific policy proposal, to be categorized on a political scale. The text is a rhetorical provocation, but without context, it does not align with a specific wing of the political spectrum. Consequently, it remains neutral/undetermined in the absence of further identifiers.
In this version, the party cues have been replaced by those of the Lega (League).

image_classification_lega <- query("You are an expert in analyzing political content. Where would you place the implicit message of the image on a scale that goes from 0 (Extreme Left) to 100 (Extreme Right)? Answer with the score assigned and a justification of 3 sentences. Use the format: Score: [number] Justification: [justification]", model = "gemma4:31b-cloud", think = FALSE, options = list(temperature = 0), images = "figures/AV-SI_Manifesto_lega.png")
Score: 85
Justification: The image is an explicit campaign advertisement for Matteo Salvini and the Lega, a party characterized by right-wing populism, nationalism, and hardline stances on immigration. The content uses a disaster scenario to implicitly criticize current governance and position the right-wing alternative as the solution. By associating the party's branding with the need for decisive action ("Facciamolo"), it aligns with a right-wing political framework of national sovereignty and strong leadership.
While this is a very preliminary exploration, it raises important questions about the validity of LLM-based image classification for political analysis. The imagery and slogan are the same in all three versions, yet the score moves from 15 (Alleanza Verdi Sinistra branding) to 50 (no party cues) to 85 (Lega branding). The model's output appears to be driven largely by party cues rather than by the message itself, which can lead to biased or inconsistent classifications. This highlights the need for careful prompt design and validation when using LLMs for multimodal analysis.
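One way to make this check more systematic is to send the same prompt to every version of the image several times and compare the resulting scores. A sketch, assuming responses follow the "Score: [number]" format; compare_image_variants() is hypothetical, and the function that actually queries the model is passed in as an argument (score_fn), so the logic can be tried without a running Ollama server:

```r
# Hypothetical helper: query each image n_runs times and summarise the scores per image.
# score_fn(prompt, image) must return the model's answer as a single string.
compare_image_variants <- function(images, prompt, score_fn, n_runs = 3) {
  runs <- expand.grid(image = images, run = seq_len(n_runs), stringsAsFactors = FALSE)
  runs$response <- vapply(runs$image, function(img) score_fn(prompt, img),
                          character(1), USE.NAMES = FALSE)
  # extract the number after "Score:"; anything else becomes NA
  m <- regmatches(runs$response,
                  regexec("Score:\\s*([0-9]+(?:\\.[0-9]+)?)", runs$response, perl = TRUE))
  runs$score <- vapply(m, function(v) if (length(v) == 2) as.numeric(v[2]) else NA_real_,
                       numeric(1))
  per_image <- split(runs$score, runs$image)
  safe <- function(f) function(s) if (all(is.na(s))) NA_real_ else f(s, na.rm = TRUE)
  data.frame(
    image      = names(per_image),
    mean_score = vapply(per_image, safe(mean), numeric(1)),
    min_score  = vapply(per_image, safe(min), numeric(1)),
    max_score  = vapply(per_image, safe(max), numeric(1)),
    n_missing  = vapply(per_image, function(s) sum(is.na(s)), integer(1)),
    row.names = NULL, stringsAsFactors = FALSE
  )
}
```

With rollama, score_fn could wrap the same call used above, e.g. function(prompt, image) query(prompt, model = "gemma4:31b-cloud", think = FALSE, options = list(temperature = 0), images = image, output = "text", screen = FALSE); whether query() returns a single string in that form should be checked against the installed rollama version. A wide gap between min_score and max_score for the same image would signal run-to-run instability, while a gap between images would signal sensitivity to the party cues.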
6.6 Reflection
Strengths of the LLM approach
- Requires no labelled training data and no seed words
- Naturally handles context, negation, and irony — things bag-of-words cannot
- The prompt is human-readable and easy to audit
- Produces rich output (e.g., confidence, reasoning) alongside the score in a single call
Weaknesses
- Not 100% reproducible: proprietary models change; even local models may be updated
- Inconsistent: scores can vary across runs; setting temperature = 0 reduces this variability, but even then some randomness in the output can remain
- Slow and resource-intensive: one call per speech or sentence vs. a single matrix operation
- Black box: we do not know why the model assigns a particular score. We can ask the model to explain its reasoning, which is not possible with frequency-based methods, but such explanations are not guaranteed to reflect how the score was actually produced
- Language and domain gaps: small open-source models may misread formal Italian parliamentary language
When would you choose an LLM approach?
- Pilot annotation: quickly generate hypotheses before committing to a full scaling pipeline
- Different languages or domains where frequency-based methods lack enough vocabulary
- Rich annotation needs: when you want position and reasoning in one pass
- No reference texts available: Wordscores requires reference documents; LLMs do not
- Sensitive data: models run locally through OLLAMA keep data on your machine; OLLAMA's cloud-hosted models do not
The emerging consensus in the field: LLMs are a powerful complement to established methods, not a replacement. Validate any LLM output through manual inspection and comparison with other sources.
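As one concrete form of such validation, the LLM's ideological-content labels for a random sample of sentences can be compared with labels assigned by a human coder. A sketch with yardstick (loaded at the start of the chapter), using invented labels for six sentences; in practice the validation sample should be much larger:

```r
library(yardstick)

# Invented labels: human coder (truth) vs. LLM (estimate) for six sentences
lev <- c("ideological", "none")
validation <- data.frame(
  truth    = factor(c("ideological", "ideological", "none", "none", "ideological", "none"),
                    levels = lev),
  estimate = factor(c("ideological", "none", "none", "none", "ideological", "ideological"),
                    levels = lev)
)

# Share of sentences on which the LLM agrees with the human coder
acc <- accuracy(validation, truth = truth, estimate = estimate)
# Cohen's kappa: agreement corrected for chance
k <- kap(validation, truth = truth, estimate = estimate)

acc
k
```

Here the two coders agree on 4 of 6 sentences, but half of that agreement would be expected by chance given the label frequencies, so kappa is considerably lower than raw accuracy.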