6 Generative LLMs for Text Classification and Ideological Scaling

Using OLLAMA from R

Author

Paride Carrara

Published

June 8, 2026

6.1 Setup

Before running this notebook, make sure OLLAMA is installed and running on your local machine.

Create an Ollama account, then download and install the software from https://ollama.com

Ollama is a local LLM server that allows you to run large language models on your machine or in their cloud. It provides a simple API to interact with the models and is designed to be easy to use from R (with the rollama package) or Python.

In order to communicate with OLLAMA from R we will use the rollama package, which provides a convenient interface to send queries to the OLLAMA server and receive responses.

Load libraries:

library(httr2)
library(jsonlite)
library(glue)
library(tidyverse)
library(ggplot2)
library(rollama)
library(yardstick)
library(ggrepel)

6.2 Connect to OLLAMA

OLLAMA exposes a simple REST API on http://localhost:11434. We call it from R through rollama.

Let’s test the connection:

ping_ollama()
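To make the REST API mentioned above more concrete, here is a minimal sketch of the kind of request that sits underneath a query, built by hand with httr2. It assumes the standard Ollama /api/generate endpoint on the default local port; the prompt is an invented placeholder, and the request is only constructed and inspected, not sent.

```r
library(httr2)

# Build (but do not send) a request to Ollama's /api/generate endpoint.
# The prompt below is a placeholder chosen for illustration.
req <- request("http://localhost:11434") |>
  req_url_path_append("api", "generate") |>
  req_body_json(list(
    model   = "gemma4:31b-cloud",
    prompt  = "Say hello in one word.",
    stream  = FALSE,                      # ask for a single JSON reply
    options = list(temperature = 0)       # sampling options go under "options"
  ))

# Inspect the request object without contacting the server
print(req)

# To actually send it (requires a running Ollama server):
# resp <- req_perform(req)
# resp_body_json(resp)$response
```

In the rest of the notebook rollama takes care of these details, so you rarely need to write the request yourself.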

We pull the model we want to use. In this case we will use gemma4:31b-cloud, a large model that is available on the OLLAMA cloud. You can also pull a smaller model like qwen3.5:4b if you want to run the notebook on your machine, but the results may be less accurate and it would take more time to run depending on your hardware.

Ollama provides free cloud credits to run hosted models, but larger models or higher query volumes require a paid subscription. Alternatively, models can be run entirely locally at no cost, though this demands substantial hardware: small models may run on a laptop, but their performance is often insufficient for serious NLP tasks. For this tutorial, we use a cloud-hosted model within Ollama’s free-tier limits. You can monitor your daily and weekly usage at https://ollama.com/settings.

Your university may also provide a High Performance Computing cluster on which you can run the notebook with larger models. Consider this option if the free plan does not provide enough resources for your needs or if you are working with sensitive data.

For now we should be fine with the free plan.

You can check how much usage you have left on the OLLAMA dashboard: https://ollama.com/settings

pull_model("gemma4:31b-cloud")

We can ask the model a simple question to test that it is working:

start_time <- Sys.time()
query("What is the definition of policy dimension? Answer with 4 sentences.", model = "gemma4:31b-cloud", think = FALSE, model_params = list(temperature = 0))

A policy dimension refers to a specific axis or thematic area used to categorize and analyze different aspects of a policy framework. It allows policymakers to break down complex issues into manageable components, such as economic, social, or environmental considerations. By defining these dimensions, organizations can ensure that all critical perspectives are addressed and balanced during the decision-making process. This structural approach helps in measuring the impact and effectiveness of a policy across various distinct sectors.

end_time <- Sys.time()
elapsed <- end_time - start_time

Time required:

print(elapsed)

Time difference of 1.754067 secs

The option temperature = 0 makes the model almost fully deterministic: it always picks the most probable token, eliminating randomness in sampling. This is ideal for classification tasks, where consistency and replicability matter more than creative variation. Nevertheless, some variation across runs always remains, so it is good practice to produce multiple classifications of the same texts and compute measures of inter-coder reliability.
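As a rough illustration of the reliability check suggested above, the sketch below compares two runs of the same classifier with raw agreement and Cohen's kappa in base R. The two label vectors are invented for the example; in practice they would be the outputs of two separate calls on the same sentences.

```r
# Hypothetical labels from two runs on the same six sentences
run_a <- c("populist", "non-populist", "populist",
           "non-populist", "populist", "non-populist")
run_b <- c("populist", "non-populist", "non-populist",
           "non-populist", "populist", "non-populist")

label_levels <- c("non-populist", "populist")

# Cross-tabulate the two runs (rows: run A, columns: run B)
agreement_table <- table(factor(run_a, levels = label_levels),
                         factor(run_b, levels = label_levels))

# Observed agreement: share of sentences with identical labels
p_observed <- sum(diag(agreement_table)) / sum(agreement_table)

# Agreement expected by chance, from the marginal label frequencies
p_expected <- sum(rowSums(agreement_table) * colSums(agreement_table)) /
  sum(agreement_table)^2

# Cohen's kappa: agreement corrected for chance
kappa <- (p_observed - p_expected) / (1 - p_expected)

print(c(agreement = p_observed, kappa = kappa))
```

With more than two runs, a multi-rater measure such as Krippendorff's alpha would be the more natural choice.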

Important

Generative models predict the next token based on what came before. In the visual examples provided by Alammar and Grootendorst (2024), the model needs to predict the word that comes after “I am driving a”.

The temperature controls the randomness in selecting the next word. A low temperature will tend to always pick the most likely words. A higher temperature will allow the model to pick words that would otherwise be less likely to occur.
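The effect of temperature can be sketched with a small numerical example. The scores (logits) below are invented for three candidate words after “I am driving a”, and the function is a generic temperature-scaled softmax, not the internals of any particular model.

```r
# Hypothetical next-token scores (logits) for the prompt "I am driving a"
logits <- c(car = 4.0, truck = 3.0, bike = 1.0)

# Temperature-scaled softmax: divide the logits by the temperature,
# then normalise so the probabilities sum to 1
softmax_with_temperature <- function(logits, temperature) {
  scaled <- logits / temperature
  scaled <- scaled - max(scaled)  # subtract the max for numerical stability
  exp(scaled) / sum(exp(scaled))
}

probs_low     <- softmax_with_temperature(logits, temperature = 0.2)
probs_default <- softmax_with_temperature(logits, temperature = 1)
probs_high    <- softmax_with_temperature(logits, temperature = 5)

print(round(rbind(low = probs_low, default = probs_default, high = probs_high), 3))
```

At a low temperature almost all probability mass sits on the top-scoring word, while a high temperature flattens the distribution. A temperature of exactly 0 cannot be plugged into this formula; it is usually treated as always choosing the highest-scoring token.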

What does the think option do?

start_time <- Sys.time()
query("What is the definition of policy dimension? Answer with 4 sentences.", model = "gemma4:31b-cloud", think = TRUE, model_params = list(temperature = 0))

A policy dimension is a specific conceptual axis or perspective used to categorize and analyze various aspects of a public policy issue. It allows policymakers to break down complex problems into manageable components, such as economic, social, or environmental factors. By identifying these dimensions, analysts can better evaluate the trade-offs and potential impacts of different policy alternatives. Ultimately, utilizing policy dimensions helps create a structured framework for informed decision-making and the assessment of a strategy's effectiveness.

end_time <- Sys.time()
elapsed <- end_time - start_time

Time required:

print(elapsed)

Time difference of 5.031382 secs

The think = TRUE option activates extended thinking in Gemma 4 (and in other models that support it), which means the model generates an internal chain-of-thought reasoning trace before producing its final answer. Under the hood, the model first “thinks aloud”, working through the problem step by step, and only then outputs the visible response. That is why the model took almost three times as long to complete the task (about 5.0 seconds versus 1.8 seconds).
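Since the same start/end timing pattern recurs throughout the notebook, it can be wrapped in a small helper. The function name time_call is hypothetical, and the workload is a stand-in so the sketch runs without an Ollama server.

```r
# Hypothetical helper: evaluate an expression and report how long it took
time_call <- function(expr) {
  start <- Sys.time()
  value <- expr  # lazy evaluation: expr is only run here, after the clock starts
  elapsed <- difftime(Sys.time(), start, units = "secs")
  list(value = value, elapsed = elapsed)
}

# Stand-in workload; replace the braces with a model call when a server is available
res <- time_call({
  Sys.sleep(0.1)
  "done"
})

print(res$elapsed)
```

Wrapping the two calls above in this way would make the with-thinking and without-thinking timings directly comparable.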

6.3 Generative Models to perform text classification

Generative LLMs are becoming popular for text classification, as research shows that they can outperform expert coders (Törnberg 2024). The idea is to leverage the model’s ability to understand and generate human-like text to classify documents or sentences based on their content. This can be done through prompting, where we provide the model with a specific instruction or question that guides it to produce the desired output.

You can refer to Törnberg (2024) for a more detailed discussion of best practices when using LLMs to perform text classification.

6.3.1 Zero-shot classification

In zero-shot prompting we give the model a task description and the text to score, but no labelled examples. The model draws entirely on what it learned during pre-training.

In the example that follows we ask the model to classify sentences as “populist” or “non-populist”. In keeping with the zero-shot setup, the prompt lists only the two categories, without a definition of populism or labelled examples. We then evaluate the model’s performance against a hand-coded sample of sentences.

We load a sample of 100 sentences from the Italian Manifesto Project dataset, which have been hand-coded for populism. We will use these hand-coded labels to evaluate the performance of our zero-shot classification.

# Load the data

load("data/it_manifestos_subset.RData")
head(data)

  sentence_id2
1        10022
2          603
3         6369
4          895
5         9810
6          691
  text
1 È proprio in nome dell’affarismo in sanità , sostenuto dal più rilevante sistema lobbystico mondiale , che si negano risorse necessarie per la cronicità e disabilità , per la prevenzione , la riabilitazione , il territorio e l’assistenza domiciliare integrata e si preferisce riversare ogni costo sulla famiglia e sulle donne in particolare ( le prime vittime di questo sistema iniquo e ingiusto ) che figlie , mamme e lavoratrici si industriano a svolgere , alla bisogna e nell'abbandono totale , anche il lavoro di assistente domiciliare , infermiere , psicologo ecc .
2 È fuori di dubbio che l’università con il “ sistema del 3 più 2 si è licealizzata , chiudendosi in una netta divisione dei saperi che l’ha condotta a specialisti e microspecialismi , buoni solo per garantire cattedre e rendite assicurate al sistema di gestione attuale .
3 Negli ultimi 10 anni quasi 170 miliardi di incentivi per il mercato delle rinnovabili è finito nelle tasche di aziende cinesi .
4 Le condizioni imposte in cambio dell’accesso a pacchetti di aiuto da parte della BCE hanno inoltre contribuito a precipitare milioni di persone nella spirale drammatica della povertà e dell’esclusione sociale .
5 Si tratta di un ricatto per imporre politiche economiche fallite e fallimentari che arricchiscono lobby e corporazioni finanziarie a danno di diritti e Costituzioni nazionali .
6 Siamo l’unico paese in Europa in cui la “ libertà di coscienza ” , in particolare quella ipocrita dei legislatori , è diventata lo strumento di alcuni per impedire la libertà di altri .
  text_eng
1 It is precisely in the name of habit in healthcare, supported by the most relevant world lobby system, that necessary resources are denying for chronicity and disability, for prevention, rehabilitation, territory and integrated home care and it is preferred to pour all costs on the family and women in particular (the first victims of this unfair and unjust system) that daughters, mothers and workers are industrically industrically held, Even the work of home assistant, nurse, psychologist etc.
2 It is out of doubt that the university with the "System of 3 plus 2 has high school, closing itself in a clear division of knowledge that led it to specialists and microspeialisms, good only to guarantee chairs and income insured to the current management system.
3 In the last 10 years almost 170 billion incentives for the renewable market has ended up in the pockets of Chinese companies.
4 The conditions imposed in exchange for access to help from the ECB have also contributed to falling millions of people in the dramatic spiral of poverty and social exclusion.
5 It is a blackmail to impose bankrupt and bankruptcy economic policies that enrich the lobbies and financial corporations to the detriment of national rights and constitutions.
6 We are the only country in Europe where the "freedom of consciousness", in particular the hypocritical one of legislators, has become the tool of some to prevent the freedom of others.

With the data loaded, we can now build our query for zero-shot classification. We will define a system prompt to set the context for the model, a task prompt to specify the classification task, and a template to format the user query. The system prompt will explain the role of the model as an expert in analyzing populist content in political texts, while the task prompt will list the categories for classification. The template will structure how the input text and the prompt are combined when sent to the model.

# Build the query: zero-shot classification

# Define system prompt explaining the role/context
system_prompt <- paste(
  "You are an expert in analyzing populist content in political texts.",
  "Answer with just the correct category without adding any explanation.",
  sep = "\n"
)

# Define the main task prompt
task_prompt <- paste(
  "Categories: populist, non-populist",
  sep = "\n"
)

# Define a template to format the user query
query_template <- "{prefix}{text}\n{prompt}"

# Prefix placed before each sentence
prefix_template <- "Text to classify: "

# Create the query object with make_query()
query_obj <- make_query(
  text = data$text,
  prompt = task_prompt,
  template = query_template,
  system = system_prompt,
  prefix = prefix_template
)
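To see what a single user message looks like once the template is filled in, the sketch below substitutes the placeholders by hand in base R. The helper render_template is hypothetical and only mimics simple placeholder substitution; it is not the function make_query() uses internally, and the example sentence is invented.

```r
# Hypothetical helper: replace each {name} placeholder with the matching value
render_template <- function(template, ...) {
  values <- list(...)
  for (name in names(values)) {
    template <- gsub(paste0("{", name, "}"), values[[name]], template, fixed = TRUE)
  }
  template
}

rendered <- render_template(
  "{prefix}{text}\n{prompt}",
  prefix = "Text to classify: ",
  text   = "Example sentence.",              # invented placeholder sentence
  prompt = "Categories: populist, non-populist"
)

cat(rendered)
```

The system prompt is sent separately from this user message, so it does not appear in the rendered string.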

Now we can run the query on the sentences in our dataset. This will send each sentence to the model along with the task description, and the model will return a classification of “populist” or “non-populist” for each sentence. We will store these classifications in a new variable called gemma4_zs in our data frame.

# Run the query on the 100 sentences
start_time <- Sys.time()

data$gemma4_zs <- query(
  query_obj,
  model = "gemma4:31b-cloud",
  screen = TRUE,
  output = "text",
  think = FALSE,
  rollama_verbose = FALSE,
  model_params = list(temperature = 0)
)

populist

non-populist

populist

populist

populist

populist

populist

populist

non-populist

populist

populist

populist

populist

non-populist

non-populist

non-populist

populist

populist

populist

non-populist

non-populist

populist

non-populist

non-populist

populist

populist

populist

populist

populist

populist

populist

non-populist

non-populist

non-populist

non-populist

populist

populist

populist

populist

non-populist

non-populist

populist

populist

non-populist

populist

populist

non-populist

populist

populist

populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

table(data$gemma4_zs)


non-populist     populist 
          67           33

end_time <- Sys.time()

elapsed <- end_time - start_time
print(elapsed)

Time difference of 4.167255 mins
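Generative models do not always return exactly the requested label: stray whitespace, capitalisation or a trailing full stop are common. The sketch below shows one way to tidy the raw answers before they are compared with hand-coded labels. The helper normalize_label and the raw strings are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: tidy raw model answers and keep only the allowed labels
normalize_label <- function(x, allowed = c("non-populist", "populist")) {
  cleaned <- tolower(trimws(x))                                 # case and outer whitespace
  cleaned <- gsub("^[[:punct:]]+|[[:punct:]]+$", "", cleaned)   # leading/trailing punctuation
  ifelse(cleaned %in% allowed, cleaned, NA_character_)          # anything else becomes NA
}

# Invented examples of what a model might return
raw_answers <- c("populist", " Non-populist.", "POPULIST\n", "I think this is populist")

cleaned_answers <- normalize_label(raw_answers)
print(cleaned_answers)

# Count the answers that could not be mapped to a label
print(sum(is.na(cleaned_answers)))
```

Answers that end up as NA are worth inspecting by hand rather than silently dropping, since they often point to a prompt that the model did not follow.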

We now have a new variable gemma4_zs in our data frame with the model’s classification of each sentence as “populist” or “non-populist”. We can evaluate the model’s performance by comparing these classifications to the hand-coded labels in pop_handcoding. We will use the yardstick package to compute common classification metrics like accuracy, precision, recall, and F1-score.

# Load the data with the hand-coded labels
load("data/handocing.RData")

data2 <- left_join(data, handcoding, by = "sentence_id2")

# Recode the 0/1 hand-coded labels as a factor (any other value becomes NA)
data2$pop_handcoding <- factor(data2$pop_handcoding, levels = c(0, 1),
                               labels = c("non-populist", "populist"))

# Convert the model output to a factor with the same levels
data2$gemma4_zs <- factor(data2$gemma4_zs, levels = c("non-populist", "populist"))

# Check the result (the long text columns are left out of the printout)
head(data2[, c("sentence_id2", "gemma4_zs", "pop_handcoding")])

  sentence_id2    gemma4_zs pop_handcoding
1        10022     populist       populist
2          603 non-populist       populist
3         6369     populist   non-populist
4          895     populist       populist
5         9810     populist       populist
6          691     populist   non-populist

We can now compute the classification metrics:

Accuracy: Overall proportion of correct predictions. It measures overall how often the model’s predictions match the true labels.
Precision: The proportion of positive predictions made by the model that are actually correct. Out of all instances the model labeled as positive (e.g., “populist”), how many truly are positive.
Recall: The proportion of actual positive cases that the model correctly identified. Out of all actual positive instances, how many did the model correctly detect.
F1-Score: The harmonic mean of precision and recall. Balances precision and recall into a single metric. Useful when you want a trade-off between false positives and false negatives.
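
To make the definitions concrete, the four metrics can be computed by hand from the cells of a confusion matrix. This is a minimal base R sketch; the counts below are illustrative and are not read from data2.

```r
# Illustrative confusion-matrix counts (assumed for the example)
tp <- 59  # predicted positive, truly positive
fp <- 8   # predicted positive, truly negative
fn <- 11  # predicted negative, truly positive
tn <- 22  # predicted negative, truly negative

acc  <- (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that are correct
prec <- tp / (tp + fp)                    # share of positive predictions that are correct
rec  <- tp / (tp + fn)                    # share of actual positives that were found
f1   <- 2 * prec * rec / (prec + rec)     # harmonic mean of precision and recall

round(c(accuracy = acc, precision = prec, recall = rec, f1 = f1), 3)
```

Which class counts as "positive" is a choice the analyst makes; swapping it changes precision, recall, and F1, but not accuracy.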

# Compute all metrics and bind results in one tibble
metrics_summary_gemma4_zs <- tibble(
  metric = c("Accuracy", "Precision", "Recall", "F1-score"),
  gemma4_zs = c(
    accuracy(data2, truth = pop_handcoding, estimate = gemma4_zs) %>% pull(.estimate),
    precision(data2, truth = pop_handcoding, estimate = gemma4_zs) %>% pull(.estimate),
    recall(data2, truth = pop_handcoding, estimate = gemma4_zs) %>% pull(.estimate),
    f_meas(data2, truth = pop_handcoding, estimate = gemma4_zs) %>% pull(.estimate)
  )
)

print(metrics_summary_gemma4_zs)

# A tibble: 4 × 2
  metric    gemma4_zs
  <chr>         <dbl>
1 Accuracy      0.81 
2 Precision     0.881
3 Recall        0.843
4 F1-score      0.861

6.3.2 Few-shot classification

In few-shot prompting we provide the model with a few labelled examples of the task alongside the task description. This helps the model understand the specific context and criteria for classification, often improving performance. This is especially useful when the task is complex or when the model may not have seen many similar examples during pre-training.

examples_fs  <- tibble::tribble(
  ~text, ~answer,
  
  # Non-populist political sentences
  "Il governo ha presentato una nuova riforma fiscale", "non-populist",
  "Il Parlamento sta discutendo il bilancio dello Stato", "non-populist",
  "Il ministro ha spiegato le nuove politiche ambientali", "non-populist",
  "Le elezioni si terranno il prossimo mese", "non-populist",
  "Il partito ha proposto un piano per migliorare l'istruzione", "non-populist",
  
  # Populist - Anti-elitism
  "Le élite corrotte vogliono solo mantenere il potere e ignorano il popolo", "populist",
  "I politici tradiscono sempre il popolo per i loro interessi personali", "populist",
  "Il sistema è marcio perché dominato da una classe dirigente distante", "populist",
  "Basta con i politici bugiardi che rubano i nostri soldi", "populist",
  "La casta politica non rappresenta più la volontà della gente comune", "populist",
  
  # Populist - People-centrism
  "Solo il popolo può decidere il futuro di questa nazione", "populist",
  "Dobbiamo ascoltare la voce del popolo e mettere fine alle ingiustizie", "populist",
  "La politica deve tornare ad essere al servizio della gente comune", "populist",
  "Il vero potere appartiene al popolo, non alle élite", "populist",
  "Difendiamo i diritti di chi lavora e soffre ogni giorno", "populist"
)

# Build the query with the examples 
query_fs <- make_query(
  text = data2$text,
  prompt = task_prompt,
  template = query_template,
  system = system_prompt,
  prefix = prefix_template,
  examples = examples_fs
)

Now we can run the few-shot query on the sentences in our dataset, just like we did with the zero-shot query. This will allow us to compare the performance of the model with and without examples.

# run the model 
start_time <- Sys.time()

data2$gemma4_fs <- query(
  query_fs,
  model = "gemma4:31b-cloud",
  screen = TRUE,
  output = "text",
  think = FALSE,
  rollama_verbose = FALSE,
  model_params = list(temperature = 0)
)

populist

populist

populist

populist

populist

populist

populist

populist

populist

populist

populist

populist

populist

populist

non-populist

populist

populist

populist

populist

populist

populist

populist

populist

populist

populist

populist

populist

populist

populist

populist

populist

non-populist

non-populist

populist

populist

populist

populist

populist

populist

non-populist

populist

populist

populist

populist

populist

populist

populist

populist

populist

populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

populist

non-populist

non-populist

non-populist

non-populist

non-populist

non-populist

end_time <- Sys.time()

elapsed <- end_time - start_time
print(elapsed)

Time difference of 3.863027 mins

We can now evaluate the performance of the few-shot classification in the same way we did for the zero-shot classification, by comparing the model’s predictions to the hand-coded labels and computing the same metrics.

data2$gemma4_fs <- factor(data2$gemma4_fs, levels = c("non-populist", "populist"))


# Compute all metrics and bind results in one tibble
metrics_summary_gemma4_fs <- tibble(
  metric = c("Accuracy", "Precision", "Recall", "F1-score"),
  gemma4_fs = c(
    accuracy(data2, truth = pop_handcoding, estimate = gemma4_fs) %>% pull(.estimate),
    precision(data2, truth = pop_handcoding, estimate = gemma4_fs) %>% pull(.estimate),
    recall(data2, truth = pop_handcoding, estimate = gemma4_fs) %>% pull(.estimate),
    f_meas(data2, truth = pop_handcoding, estimate = gemma4_fs) %>% pull(.estimate)
  )
)

left_join(metrics_summary_gemma4_zs, metrics_summary_gemma4_fs, by = "metric")

# A tibble: 4 × 3
  metric    gemma4_zs gemma4_fs
  <chr>         <dbl>     <dbl>
1 Accuracy      0.81      0.81 
2 Precision     0.881     1    
3 Recall        0.843     0.729
4 F1-score      0.861     0.843

Overall accuracy is unchanged, but the two prompts make different kinds of errors. One caveat when reading the table: yardstick treats the first factor level as the event of interest by default, and the levels here are ordered c("non-populist", "populist"). Precision, recall, and F1 as computed above therefore describe the non-populist class; passing event_level = "second" to each metric would score the populist class instead.

Accuracy (zs: 0.81 / fs: 0.81) Out of all sentences, how many did the model get right overall? Both prompts classify roughly 4 out of 5 sentences correctly, so few-shot prompting gives no accuracy gain.

Precision (zs: 0.881 / fs: 1.00) Of all the sentences the model labelled non-populist, how many actually were? The few-shot model is perfect here: every sentence it calls non-populist is hand-coded as non-populist, which also means it misses no hand-coded populist sentence. The zero-shot model occasionally labels a populist sentence as non-populist.

Recall (zs: 0.843 / fs: 0.729) Of all the sentences that truly are non-populist, how many did the model label as such? Zero-shot recovers more of them; few-shot flags more non-populist sentences as populist. This is the classic precision–recall tradeoff: with the examples provided, the model is more liberal in calling something populist.

F1-score (zs: 0.861 / fs: 0.843) Because recall drops more than precision gains, the few-shot model ends up with a slightly lower F1. Zero-shot has a marginally better balance overall.

Which prompt you prefer depends on your use case. If false positives are costly (e.g., you don’t want to wrongly label a politician as populist), prefer zero-shot, which raises fewer false populist flags here. If missing populist sentences is costly (e.g., you want comprehensive detection), prefer few-shot.
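
yardstick’s binary metrics take an event_level argument, which defaults to the first factor level. The sketch below uses made-up labels (not the tutorial data) to show how that choice changes precision and recall for the same predictions; the object names are hypothetical.

```r
library(yardstick)

# Made-up labels for illustration only; same level order as in the tutorial
lv <- c("non-populist", "populist")
toy <- data.frame(
  truth = factor(c("populist", "populist", "populist",
                   "non-populist", "non-populist", "non-populist",
                   "non-populist", "non-populist"), levels = lv),
  estimate = factor(c("populist", "populist", "non-populist",
                      "populist", "non-populist", "non-populist",
                      "non-populist", "non-populist"), levels = lv)
)

# Default: the first level ("non-populist") is treated as the event
p_first <- precision(toy, truth = truth, estimate = estimate)$.estimate
r_first <- recall(toy, truth = truth, estimate = estimate)$.estimate

# event_level = "second": "populist" is treated as the event
p_second <- precision(toy, truth = truth, estimate = estimate,
                      event_level = "second")$.estimate
r_second <- recall(toy, truth = truth, estimate = estimate,
                   event_level = "second")$.estimate

c(p_first = p_first, r_first = r_first, p_second = p_second, r_second = r_second)
```

Accuracy is unaffected by this choice, which is why it is worth reporting alongside the class-specific metrics.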

6.4 Not only classification but also zero-shot position estimation

Instead of asking the model to classify sentences into discrete categories, we can ask it to assign a position on a continuous scale. This is closer to what Wordscores, Wordfish, and LSS do: they produce a score that can be interpreted as a position on an ideological spectrum. This section is inspired by the work of Le Mens and Gallego (2025), who suggested using generative models to scale individual sentences and then take the average. This approach contrasts with the one proposed by Benoit et al. (2026), who used entire documents (electoral manifestos) as the unit of analysis.

For the sake of brevity we will work with a single speech excerpt. In practice you would loop over all speeches (or chunks) in your corpus.

load("data/investitutre_sentences_16_19.RData")
# inspect the data
glimpse(investitutre_sentences_16_19)

Rows: 17,246
Columns: 14
$ session_legislature  <chr> "16", "16", "16", "16", "16", "16", "16", "16", "…
$ session_anno         <chr> "2008", "2008", "2008", "2008", "2008", "2008", "…
$ session_mese         <chr> "05", "05", "05", "05", "05", "05", "05", "05", "…
$ session_giorno       <chr> "13", "13", "13", "13", "13", "13", "13", "13", "…
$ session_number       <chr> "4", "4", "4", "4", "4", "4", "4", "4", "4", "4",…
$ speaker_name_cognome <chr> "BERLUSCONI Silvio", "BERLUSCONI Silvio", "BERLUS…
$ speaker_id           <chr> "35330", "35330", "35330", "35330", "35330", "353…
$ gruppo_id            <chr> "477", "477", "477", "477", "477", "477", "477", …
$ sigla                <chr> "PDL", "PDL", "PDL", "PDL", "PDL", "PDL", "PDL", …
$ investituture        <chr> "BERLUSCONI IV", "BERLUSCONI IV", "BERLUSCONI IV"…
$ doc_id2              <chr> "16_text2", "16_text2", "16_text2", "16_text2", "…
$ sentence_id          <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15…
$ sentence             <chr> ".", "Signor Presidente, onorevoli colleghi, il l…
$ doc_id3              <chr> "16_text2_1", "16_text2_2", "16_text2_3", "16_tex…

# we subset for the first 30 sentences
sentences_subset <- investitutre_sentences_16_19 |> slice(1:30)

We build the query to ask the model to assign a score between 0 and 100 to each sentence, where 0 means “Extremely left”, 50 means “Centrist”, and 100 means “Extremely right”. We also instruct the model to return “NA” if the sentence does not have ideological content. The prompt is designed to be clear and specific about the task and the expected output format.

scaling_queries <- make_query(
  text = sentences_subset$sentence,
  prompt = "Where does the author of this sentence stand on the 'left' to 'right' wing scale? Provide
your response as a score between 0 and 100 where 0 means 'Extremely left', 50 means
'Centrist', and 100 means 'Extremely right'. If the text does not have ideological content,
set the score to 'NA'. You will only respond with the key Score. Do not provide
explanations.",
  # {suffix} closes the quotation right after the sentence, before the prompt
  template = "{prefix}{text}{suffix}\n{prompt}",
  system = "You are a researcher analyzing sentences taken from the Italian parliament. Your task is to infer the left–right political position expressed in each sentence, if any, and return only the numerical score as instructed in the prompt.",
  prefix = "Sentence to scale: \"",
  suffix = "\""
)

We can now run the query to get the ideological scores for each sentence. This will give us a new variable gemma4_scaling in our data frame with the model’s assigned score for each sentence.

# run the model 
start_time <- Sys.time()

sentences_subset$gemma4_scaling <- query(
  scaling_queries,
  model = "gemma4:31b-cloud",
  screen = TRUE,
  output = "text",
  think = FALSE,
  model_params = list(temperature = 0)
)

Score: NA

Score: NA

Score: NA

Score: NA

Score: NA

Score: 85

Score: 65

Score: NA

Score: 50

Score: 50

Score: NA

Score: 50

Score: NA

Score: NA

Score: NA

Score: NA

Score: NA

Score: 60

Score: NA

Score: 50

Score: NA

Score: NA

Score: NA

Score: NA

Score: NA

Score: 70

Score: 40

Score: 75

Score: 35

Score: NA

end_time <- Sys.time()

elapsed <- end_time - start_time

The time required to scale 30 sentences:

print(elapsed)

Time difference of 1.464752 mins
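
The model returns strings such as "Score: 85" or "Score: NA", so they need to be converted to numbers before any averaging. A minimal base R sketch, assuming the responses follow the format shown above; parse_score is a hypothetical helper, not a rollama function.

```r
# Hypothetical helper: turn "Score: 85" / "Score: NA" strings into numeric values.
# Anything that is not a number between 0 and 100 becomes NA.
parse_score <- function(x) {
  # Drop the leading "Score:" label (case-insensitive) and surrounding spaces
  cleaned <- trimws(sub("^[[:space:]]*score:[[:space:]]*", "", x, ignore.case = TRUE))
  # Non-numeric leftovers (e.g. "NA" or free text) become NA
  num <- suppressWarnings(as.numeric(cleaned))
  # Guard against values outside the scale requested in the prompt
  num[!is.na(num) & (num < 0 | num > 100)] <- NA_real_
  num
}

raw <- c("Score: 85", "Score: NA", "Score: 50", "score: 120", "no idea")
parsed <- parse_score(raw)
parsed
```

Checking how many responses fail to parse is a cheap way to spot cases where the model ignored the output format.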

For efficiency we load estimations that have been already computed for all sentences in the corpus of the investiture speeches. We will explore these scores in the next section.

scaling_results_gemma <- read.csv("data/scaling_results_gemma.csv")

Since we are going to explore the results with some text analysis techniques, we keep only sentences with more than 20 characters and replace apostrophes with a blank space.

Before moving on to the analysis of positions, let’s explore the sentences that the model classified as having some ideological content (i.e., those with a non-NA score).

# Keep only sentences longer than 20 characters
scaling_results_gemma <- scaling_results_gemma %>%
  filter(nchar(sentence) > 20)

# Create ideological_content: 1 if gemma4_scaling_mean is not NA, 0 if it is NA
scaling_results_gemma <- scaling_results_gemma %>%
  mutate(ideological_content = ifelse(is.na(gemma4_scaling_mean), 0, 1))

# Replace apostrophes with a blank space
scaling_results_gemma$sentence <- gsub("[‘’‚‛'`]", " ", scaling_results_gemma$sentence)

Let’s create a corpus, tokenize it, build a dfm, and use keyness analysis to see which words are most associated with sentences that have ideological content versus those that do not.

library(quanteda)
library(quanteda.textplots)
library(quanteda.textstats)
corpus_sentences <- corpus(scaling_results_gemma, text_field = "sentence")


toks_sentences <- corpus_sentences %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  tokens_tolower() %>%
  tokens_remove(stopwords("it"))


dfm_sentences <- dfm(toks_sentences)
dfm_sentences <- dfm_trim(dfm_sentences, min_termfreq = 5)

Now we can compute keyness, comparing sentences with ideological content (ideological_content == 1) against those without (ideological_content == 0).

tstat_key <- textstat_keyness(dfm_sentences, 
                              target = dfm_sentences$ideological_content == 1)
textplot_keyness(tstat_key)
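
By default textstat_keyness() uses a chi-squared measure, signed by whether a word is over- or under-represented in the target group. The base R sketch below illustrates the idea for a single word with hypothetical counts; quanteda may apply additional corrections, so the value is not meant to match its output exactly.

```r
# Hypothetical counts for one word in the target and reference groups
word_target  <- 30    # occurrences of the word in target sentences
other_target <- 970   # all other tokens in target sentences
word_ref     <- 10    # occurrences of the word in reference sentences
other_ref    <- 1990  # all other tokens in reference sentences

m <- matrix(c(word_target, word_ref, other_target, other_ref), nrow = 2,
            dimnames = list(group = c("target", "reference"),
                            token = c("word", "other")))

# Plain Pearson chi-squared on the 2x2 table (no continuity correction)
chi2 <- unname(chisq.test(m, correct = FALSE)$statistic)

# Expected count of the word in the target group under independence
expected_target <- sum(m[, "word"]) * sum(m["target", ]) / sum(m)

# Positive sign: the word is more frequent in the target than expected
signed_chi2 <- if (m["target", "word"] > expected_target) chi2 else -chi2
signed_chi2
```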

What if we also compound multi-word expressions like “energia rinnovabile” or “diritti umani” before computing keyness? This is a common step in text analysis to capture meaningful phrases rather than just individual words.

The following code identifies words that co-occur and joins them into multi-word expressions, so that the two words “welfare” and “state” are represented as the single token “welfare_state”.

# Keep tokens that start with a letter. The tokens were already lowercased above,
# so with case_insensitive = TRUE the pattern "^[A-Z]" matches any token starting
# with a-z, not only capitalized ones.
# padding = TRUE leaves an empty placeholder where a token is dropped, so only
# tokens that were adjacent in toks_sentences can form a collocation
toks_news_cap <- tokens_select(toks_sentences,
                               pattern = "^[A-Z]",
                               valuetype = "regex",
                               case_insensitive = TRUE,
                               padding = TRUE)

# Detect collocations among the remaining tokens
# min_count = 5 filters out rare pairs; tolower = FALSE leaves the tokens as they are
tstat_col_cap <- textstat_collocations(toks_news_cap, min_count = 5, tolower = FALSE)
head(tstat_col_cap, 20)  # Inspect top 20 collocations

                collocation count count_nested length   lambda        z
1      presidente consiglio   811            0      2 6.983133 59.11139
2                quest aula   268            0      2 9.396579 53.22117
3          legge elettorale   142            0      2 6.353248 51.44303
4           forze politiche   151            0      2 6.430611 51.35779
5          movimento stelle   231            0      2 9.749437 49.19474
6       partito democratico   184            0      2 8.213027 49.08399
7               deve essere   157            0      2 4.460386 45.03832
8                  d italia   157            0      2 4.481152 44.63661
9            unione europea   136            0      2 8.397464 44.42124
10       onorevoli colleghi   150            0      2 8.249420 44.34107
11       consiglio ministri   122            0      2 5.168973 43.91524
12              punto vista   116            0      2 7.980259 41.77426
13              buon lavoro   182            0      2 6.970294 41.57712
14        signor presidente  1020            0      2 8.738634 40.74463
15     reddito cittadinanza    84            0      2 8.777483 40.30995
16 pubblica amministrazione   116            0      2 8.703731 39.79031
17            miliardi euro    64            0      2 6.750251 38.61149
18               può essere   120            0      2 3.947632 37.24027
19      campagna elettorale   107            0      2 8.296421 36.70154
20          popolo italiano    58            0      2 6.132018 36.13790

# Compound collocations with a z-score above 3 into single tokens
# e.g., "partito democratico" becomes "partito_democratico"
toks_comp <- tokens_compound(toks_sentences, pattern = tstat_col_cap[tstat_col_cap$z > 3,], 
                             case_insensitive = TRUE)

# Build a Document-Feature Matrix from the compounded tokens
dfm_sentences_comp <- dfm(toks_comp)

We compute keyness by comparing the content of sentences classified as containing ideological content against those that do not.

tstat_key <- textstat_keyness(dfm_sentences_comp, 
                              target = dfm_sentences_comp$ideological_content == 1)
textplot_keyness(tstat_key)

Now let’s see the most distinctive tokens of sentences that received a low score (i.e., those that are more left-leaning) versus those that received a high score (i.e., those that are more right-leaning).

# subset scaling_results_gemma keeping only sentences with ideological content (ideological_content == 1)
scaling_results_ideo <- scaling_results_gemma[scaling_results_gemma$ideological_content == 1,]

# create a variable called score_category that is "left" if the gemma4_scaling_mean is less than 50 and "right" if it is greater than or equal to 50
scaling_results_ideo <- scaling_results_ideo %>%
  mutate(score_category = ifelse(gemma4_scaling_mean < 50, "left", "right"))

# create a corpus and dfm from scaling_results_ideo
corpus_ideo <- corpus(scaling_results_ideo, text_field = "sentence")

# Tokenize and pre-process
toks_ideo <- corpus_ideo %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  tokens_tolower() %>%
  tokens_remove(stopwords("it"))

# Identify multi-word expressions and compound the tokens
toks_cap <- tokens_select(toks_ideo,
                          pattern = "^[A-Z]",
                          valuetype = "regex",
                          case_insensitive = TRUE,
                          padding = TRUE)

tstat_col_cap <- textstat_collocations(toks_cap, min_count = 5, tolower = FALSE)
head(tstat_col_cap, 20)

                collocation count count_nested length   lambda        z
1       partito democratico   160            0      2 8.122451 43.99094
2          movimento stelle   185            0      2 9.789130 41.58810
3            unione europea   118            0      2 8.234082 40.21411
4      presidente consiglio   300            0      2 7.084601 39.80313
5                  d italia   124            0      2 4.678138 39.20533
6           forze politiche    77            0      2 5.938466 37.23872
7      reddito cittadinanza    78            0      2 8.388954 37.15952
8                quest aula   121            0      2 9.882676 36.53356
9               deve essere   103            0      2 4.415381 36.31089
10         legge elettorale    67            0      2 6.065777 36.07292
11             forza italia    87            0      2 4.475411 32.76635
12               può essere    85            0      2 4.117261 32.05873
13 pubblica amministrazione    76            0      2 8.469206 31.97461
14              punto vista    63            0      2 8.295298 31.11489
15          popolo italiano    44            0      2 6.082558 30.62655
16    presidente berlusconi    83            0      2 4.593535 30.54094
17            miliardi euro    38            0      2 6.269638 29.24520
18             ancora volta    45            0      2 5.519631 29.11560
19              altri paesi    37            0      2 6.158266 29.04940
20      campagna elettorale    58            0      2 8.326621 29.02910

toks_comp <- tokens_compound(toks_ideo, pattern = tstat_col_cap[tstat_col_cap$z > 3,], 
                             case_insensitive = TRUE)

#Create the DFM
dfm_ideo_comp <- dfm(toks_comp)
dfm_ideo_comp <- dfm_trim(dfm_ideo_comp, min_termfreq = 5)

Now that we have our DFM, we compute keyness by comparing the tokens that appear in sentences scored 50 or higher with those in sentences scored below 50, on the scale from 0 (Extreme Left) to 100 (Extreme Right).

# Compute keyness with "right" sentences as the target and "left" sentences as the reference
tstat_key_right <- textstat_keyness(dfm_ideo_comp, target = dfm_ideo_comp$score_category == "right")
textplot_keyness(tstat_key_right)

Now let’s compute a mean ideological score for each speech. We can do this by grouping by doc_id2 and calculating the mean of gemma4_scaling_mean for each group.

ideo_scores <- scaling_results_ideo %>%
  group_by(doc_id2) %>%
  summarise(mean_score = mean(gemma4_scaling_mean, na.rm = TRUE))
load("data/investitutre_speeches_16_19.RData")
#merge ideo_scores with investitutre_speeches_16_19 by doc_id2
speeches <- investitutre_speeches_16_19 %>%
  left_join(ideo_scores, by = "doc_id2")
head(speeches$mean_score)

[1] 56.65000 51.94444 67.90323 45.83333 56.81818 36.66667
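
A speech-level mean is less stable when only a few of its sentences received a score. The sketch below uses made-up data and hypothetical object names to show how the same summarise() call can also record how many sentences contribute to each mean.

```r
library(dplyr)

# Made-up sentence-level scores for two speeches (illustration only)
toy_scores <- data.frame(
  doc_id2 = c("speech_a", "speech_a", "speech_a", "speech_b", "speech_b"),
  gemma4_scaling_mean = c(40, 60, NA, 80, NA)
)

toy_summary <- toy_scores %>%
  group_by(doc_id2) %>%
  summarise(
    mean_score  = mean(gemma4_scaling_mean, na.rm = TRUE),  # mean over scored sentences
    n_scored    = sum(!is.na(gemma4_scaling_mean)),         # sentences with a score
    n_sentences = n()                                       # all sentences in the speech
  )

toy_summary
```

Filtering or flagging speeches with a small n_scored before plotting is one way to avoid over-interpreting positions based on a handful of sentences.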

Now we subset the speeches from the Conte I investiture debate and plot the ideological position by Parliamentary Party Group (PPG).

speeches_conte <- speeches %>%
  filter(investituture == "CONTE I")
glimpse(speeches_conte)

Rows: 60
Columns: 14
$ session_legislature  <int> 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 1…
$ session_anno         <int> 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2…
$ session_mese         <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6…
$ session_giorno       <int> 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6…
$ session_number       <int> 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 1…
$ speaker_name_cognome <chr> "CONTE Giuseppe", "VITIELLO Catello", "MURONI Ros…
$ speaker_id           <chr> "307926", "307196", "307296", "305567", "302155",…
$ gruppo_id            <int> NA, 3033, 3070, 3054, 3055, 3053, 3052, 3051, 303…
$ sigla                <chr> NA, "MISTO", "LEU-ART 1-SI", "PD", "LEGA", "FI", …
$ investituture        <chr> "CONTE I", "CONTE I", "CONTE I", "CONTE I", "CONT…
$ doc_id2              <chr> "18_text70", "18_text71", "18_text72", "18_text73…
$ speech_text          <chr> "Signor Presidente, onorevoli senatrici e onorevo…
$ text_spacy           <chr> ". signore presidente , onorevole senatore e onor…
$ mean_score           <dbl> 46.29283, 53.75000, 31.84211, 20.64103, 65.74074,…

# Plot mean_score by party (sigla), labelling each point with the speaker's name
ggplot(speeches_conte, aes(x = mean_score, y = sigla, label = speaker_name_cognome)) +
  geom_point() +
  geom_text_repel() +
  labs(x = "Mean Ideological Score", y = "Party", title = "Distribution of Mean Ideological Scores by Party (Conte I)") +
  theme_minimal()

Now we subset the speeches from the Meloni investiture debate and plot the ideological position by Parliamentary Party Group (PPG).

speeches_meloni <- speeches %>%
  filter(investituture == "MELONI")
glimpse(speeches_meloni)

Rows: 51
Columns: 14
$ session_legislature  <int> 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 1…
$ session_anno         <int> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2…
$ session_mese         <int> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1…
$ session_giorno       <int> 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 2…
$ session_number       <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4…
$ speaker_name_cognome <chr> "MELONI Giorgia", "RICCIARDI Riccardo", "MORRONE …
$ speaker_id           <chr> "302103", "307250", "307300", "307374", "301439",…
$ gruppo_id            <int> 4133, 4131, 4132, 4136, 4111, 4111, 4133, 4135, 4…
$ sigla                <chr> "FDI", "M5S", "LEGA", "PD-IDP", "MISTO", "MISTO",…
$ investituture        <chr> "MELONI", "MELONI", "MELONI", "MELONI", "MELONI",…
$ doc_id2              <chr> "19_text5", "19_text6", "19_text7", "19_text8", "…
$ speech_text          <chr> "Grazie, Presidente. Signor Presidente, onorevoli…
$ text_spacy           <chr> ". grazie , presidente . signore presidente , ono…
$ mean_score           <dbl> 57.50000, 36.11111, 64.46078, 30.55556, 58.75000,…

# Plot mean_score by party (sigla), labelling each point with the speaker's name
ggplot(speeches_meloni, aes(x = mean_score, y = sigla, label = speaker_name_cognome)) +
  geom_point() +
  geom_text_repel() +
  labs(x = "Mean Ideological Score", y = "Party", title = "Distribution of Mean Ideological Scores by Party (MELONI)") +
  theme_minimal()

6.5 Annotate Images with LLMs?

Textual data is only one type of data that is produced by political actors. What about images, videos, or audio? Can we apply similar methods to analyze the content of these media? This is an exciting area of research that is rapidly evolving with the development of multimodal models that can process and generate text, images, and audio. In the next module we will explore some of these models and their applications in political analysis.

Let’s have a look at this electoral manifesto

Let’s ask Gemma4 to describe the image for us.

image_classification_1 <- query("Describe the image.", model = "gemma4:31b-cloud", think = FALSE, options = list(temperature = 0), images = "figures/AV-SI_Manifesto.jpg")

A vertical political poster featuring a photograph of a flooded street in a city. In the foreground, the silhouette of a person holding a large red umbrella is seen from the back, facing away from the camera. Several cars are partially submerged in the brownish floodwaters that cover the road. In the background, there is a traditional European-style building with a tower under a grey, overcast sky.

Large text at the top reads "**NON CHIAMATELO MALTEMPO**" (Don't call it bad weather), with the first part in white and the second word in yellow. A hashtag at the very top reads "#AlleanzaVerdiSinistra".

The bottom third of the poster has a yellow and red background. On the left, the word "**Facciamolo.**" (Let's do it) is written in bold white letters. On the right, there is a circular logo containing the words "Alleanza Verdi Sinistra" and smaller logos for "S! Sinistra Italiana" and "Europa Verde". At the very bottom, the website "www.verdisinistra.it" is written in white. 

Watermarks reading "Archivio degli Archivio degli Archivio degli Spot politici" are layered diagonally across the central image.

Now let’s ask the model to infer the implicit message in the image.

image_classification_2 <- query("What is the implicit message in the image?", model = "gemma4:31b-cloud", think = FALSE, options = list(temperature = 0), images = "figures/AV-SI_Manifesto.jpg")

The implicit message of this image is a political call to action regarding **climate change**.

The message is conveyed through several layers:

**1. Linguistic Meaning**
The Italian text *"Non chiamatelo maltempo"* translates to **"Don't call it bad weather."** By telling the viewer not to use the term "bad weather" (which implies a random, temporary atmospheric event), the ad implies that these events are actually the result of systemic climate change. It suggests that these floods are no longer "accidents" of nature but predictable consequences of environmental neglect.

**2. Visual Imagery**
The image shows a city street severely flooded, with cars partially submerged. This provides a visceral example of the "bad weather" being discussed. The presence of a person with a red umbrella creates a focal point, emphasizing human vulnerability in the face of these disasters.

**3. Political Context**
The image is branded by **Alleanza Verdi Sinistra** (Green-Left Alliance), a political coalition focused on environmentalism and social justice. The phrase *"Facciamolo"* ("Let's do it") at the bottom serves as a call to action, urging the viewer to support their policies to prevent such disasters in the future.

**Summary**
The implicit message is: **What we perceive as "bad weather" is actually a climate crisis. We can no longer treat these disasters as random occurrences; we must take political action to address the root cause.**

Finally, let’s ask the model to classify the image on a left-right scale.

image_classification_3 <- query("You are an expert in analyzing political content. Where would you place the implicit message of the image on a scale that goes from 0 (Extreme Left) to 100 (Extreme Right)? Answer with the score assigned and a justification of 3 sentences. Use the format: Score: [number] Justification: [justification]", model = "gemma4:31b-cloud", think = FALSE, options = list(temperature = 0), images = "figures/AV-SI_Manifesto.jpg")

Score: 15

Justification: The image is an official campaign poster for "Alleanza Verdi Sinistra" (Green Left Alliance), a political coalition explicitly positioned on the left of the Italian political spectrum. The message "Non chiamatelo maltempo" (Don't call it bad weather) in the context of flooding suggests a critique of climate change, advocating for environmentalist policies typically associated with left-wing platforms. The inclusion of the "Europa Verde" and "Sinistra Italiana" logos further confirms its alignment with eco-socialist and progressive ideologies.

Do you see any potential issues with this approach?

The model knows that this is a manifesto from the “Alleanza Verdi Sinistra”, and it mentions this in its justification. Would it place the document in the same position if the manifesto were issued without party cues, or by a different party?

In this version the party cues have been removed.

image_classification_nocue <- query("You are an expert in analyzing political content. Where would you place the implicit message of the image on a scale that goes from 0 (Extreme Left) to 100 (Extreme Right)? Answer with the score assigned and a justification of 3 sentences. Use the format: Score: [number] Justification: [justification]", model = "gemma4:31b-cloud", think = FALSE, options = list(temperature = 0), images = "figures/AV-SI_Manifesto_clean.png")

The provided image contains an Italian political slogan "Non chiamatelo maltempo. Facciamolo" (Don't call it bad weather. Let's make it happen) overlaid on a scene of severe flooding. However, the image lacks a specific party logo, candidate name, or explicit policy platform, as the identifying circle is blank. Because the message focuses on a provocative rhetorical stance regarding a natural disaster without providing ideological markers, it is impossible to accurately place it on a left-right political spectrum.

Score: 50
Justification: The image lacks the necessary ideological markers, such as a party logo or specific policy proposal, to be categorized on a political scale. The text is a rhetorical provocation, but without context, it does not align with a specific wing of the political spectrum. Consequently, it remains neutral/undetermined in the absence of further identifiers.

In this version the party cues have been replaced with those of the Lega (League) party.

image_classification_lega <- query("You are an expert in analyzing political content. Where would you place the implicit message of the image on a scale that goes from 0 (Extreme Left) to 100 (Extreme Right)? Answer with the score assigned and a justification of 3 sentences. Use the format: Score: [number] Justification: [justification]", model = "gemma4:31b-cloud", think = FALSE, options = list(temperature = 0), images = "figures/AV-SI_Manifesto_lega.png")

Score: 85

Justification: The image is an explicit campaign advertisement for Matteo Salvini and the Lega, a party characterized by right-wing populism, nationalism, and hardline stances on immigration. The content uses a disaster scenario to implicitly criticize current governance and position the right-wing alternative as the solution. By associating the party's branding with the need for decisive action ("Facciamolo"), it aligns with a right-wing political framework of national sovereignty and strong leadership.

While this is a very preliminary exploration, it raises important questions about the validity of LLM-based image classification for political analysis. With the same slogan and the same photograph, the score moved from 15 (original party cues) to 50 (no cues) to 85 (Lega cues), which suggests that the model is largely scoring the party label rather than the implicit message of the image. Classifications driven by party cues or other contextual information can be biased or inconsistent, so multimodal analysis with LLMs needs careful prompt design and validation.
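To compare the three cue conditions numerically, the score has to be pulled out of each reply. The sketch below does this with a regular expression in base R. `extract_score()` is a hypothetical helper, not part of rollama, and the reply strings are abbreviated copies of the outputs shown above. In practice you would first need the replies as plain text, for example by calling `query()` with `output = "text"` (an assumption about your rollama version).

```r
# Hypothetical helper: return the first "Score: <number>" found in a reply,
# or NA when the model did not follow the requested format.
extract_score <- function(reply) {
  hit <- regmatches(reply, regexpr("Score:\\s*[0-9]+(\\.[0-9]+)?", reply, perl = TRUE))
  if (length(hit) == 0) return(NA_real_)
  as.numeric(sub("^Score:\\s*", "", hit, perl = TRUE))
}

# Abbreviated copies of the three replies shown above (justifications shortened).
replies <- c(
  original = "Score: 15\n\nJustification: The image is an official campaign poster ...",
  no_cues  = "The provided image contains an Italian political slogan ...\n\nScore: 50\nJustification: The image lacks ...",
  lega     = "Score: 85\n\nJustification: The image is an explicit campaign advertisement ..."
)

scores <- vapply(replies, extract_score, numeric(1))
scores

# Spread between the lowest and highest score for the same underlying image.
cue_effect <- max(scores) - min(scores)
cue_effect
```

Note that the reply without party cues starts with a preamble before the `Score:` line, so a parser that only reads the first line of the reply would miss it.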

6.6 Reflection

Strengths of the LLM approach

Requires no labelled training data and no seed words
Naturally handles context, negation, and irony, which bag-of-words methods cannot
The prompt is human-readable and easy to audit
Produces rich output (confidence, reasoning) at no extra cost

Weaknesses

Not fully reproducible: proprietary models change over time, and even local models may be updated
Inconsistent: scores can vary across runs; setting temperature = 0 reduces this variation but does not fully remove it
Slow and resource-intensive: one call per speech or sentence vs. a single matrix operation
Black box: we do not know why the model assigns a particular score. We can ask it to explain its reasoning, which is not possible with frequency-based methods, but that explanation is itself model output and needs checking
Language and domain gaps: small open-source models may misread formal Italian parliamentary language
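The run-to-run inconsistency listed above can be measured rather than assumed: ask the same question several times and summarise the spread of the scores. The sketch below is one way to do this in base R. `summarise_runs()` and `make_mock_ask()` are hypothetical helpers, and the mock returns invented numbers so that the example runs without an OLLAMA server. In real use, `ask` would wrap a `query()` call and return the parsed score.

```r
# Hypothetical helper: call `ask()` n times and summarise the returned scores.
# `ask` is any zero-argument function that returns one numeric score.
summarise_runs <- function(ask, n = 5) {
  scores <- vapply(seq_len(n), function(i) ask(), numeric(1))
  c(mean = mean(scores), sd = sd(scores), min = min(scores), max = max(scores))
}

# Mock stand-in for a model call: cycles through fixed, made-up scores.
make_mock_ask <- function(values) {
  i <- 0
  function() {
    i <<- i %% length(values) + 1
    values[i]
  }
}

mock_ask <- make_mock_ask(c(15, 20, 15, 10, 15))  # invented values, not model output
run_summary <- summarise_runs(mock_ask, n = 5)
run_summary
```

Reporting the mean together with the spread makes the variability visible instead of hiding it behind a single run.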

When would you choose an LLM approach?

Pilot annotation: quickly generate hypotheses before committing to a full scaling pipeline
Different languages or domains where frequency-based methods lack enough vocabulary
Rich annotation needs: when you want position and reasoning in one pass
No reference texts available: Wordscores requires reference documents; LLMs do not
Sensitive data: models run locally through OLLAMA keep the data on your machine; cloud-hosted models do not

The emerging consensus in the field: LLMs are a powerful complement to established methods, not a replacement. Validate any LLM output through manual inspection and comparison with other sources.
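A minimal sketch of such a validation step, in base R, compares LLM labels with a small hand-coded sample. The labels below are invented toy data for illustration only, and the object names are hypothetical. The yardstick package loaded in the Setup offers equivalent metrics (for example `accuracy()` and `conf_mat()`) if you prefer a tidy workflow.

```r
# Toy data, invented for illustration: six texts coded by hand and by the LLM.
label_levels <- c("left", "centre", "right")
human <- factor(c("left", "left", "right", "right", "centre", "left"), levels = label_levels)
llm   <- factor(c("left", "centre", "right", "right", "centre", "left"), levels = label_levels)

# Confusion matrix: rows are the human codes, columns are the LLM codes.
confusion <- table(human = human, llm = llm)
confusion

# Share of texts on which the LLM agrees with the human coder.
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy

# Positions of the texts to read manually because the two codes disagree.
disagreements <- which(human != llm)
disagreements
```

Reading the disagreeing texts is often more informative than the accuracy figure alone, because it shows which kinds of text the model misreads.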

6.7 Bibliography

Alammar, Jay, and Maarten Grootendorst. 2024. Hands-on Large Language Models: Language Understanding and Generation. 1st edition. Beijing Boston Farnham: O’Reilly.

Benoit, Kenneth, Scott De Marchi, Conor Laver, Michael Laver, and Jinshuai Ma. 2026. “Using Large Language Models to Analyze Political Texts Through Natural Language Understanding.” American Journal of Political Science. https://doi.org/10.1111/ajps.70050.

Le Mens, Gaël, and Aina Gallego. 2025. “Positioning Political Texts with Large Language Models by Asking and Averaging.” Political Analysis, January, 1–9. https://doi.org/10.1017/pan.2024.29.

Törnberg, Petter. 2024. “Large Language Models Outperform Expert Coders and Supervised Classifiers at Annotating Political Social Media Messages.” Social Science Computer Review, September, 08944393241286471. https://doi.org/10.1177/08944393241286471.