Retrieving and Generating Data using LLMs
Python code and slides for accessing LLMs through an API. Visit the GitHub repository.
This open‑source notebook collection and slides demonstrate two complementary LLM paradigms, retrieval and generation, for turning raw text into structured, research‑ready data.
Retrieval notebooks show how to mine large document corpora to extract causal edges, stance labels, demographic attributes and other key fields (e.g., the pipeline powering www.causal.claims).
Generation notebooks start from minimal seed prompts and leverage the model’s prior to build production networks, innovation profiles and context‑aware keyword dictionaries (see aipnet.io and www.academicexpression.online).
Across both strands you will find hands‑on modules for prompt engineering, JSON‑schema enforcement, cost‑efficient batch calling, embedding‑based code mapping (HS6 / JEL) and validation routines such as modal voting and cosine‑similarity sanity checks. By the end, users can scale or adapt each workflow, whether analysing messy policy PDFs or constructing supply‑chain graphs, while keeping costs predictable and outputs auditable.
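The two validation routines named above can be illustrated with a minimal sketch. It is not taken from the notebooks: it assumes "modal voting" means keeping the most frequent label across repeated model calls, and that the cosine check compares two embedding vectors against a cutoff. The function names (modal_vote, cosine_similarity, passes_cosine_check) and the thresholds 0.5 and 0.7 are hypothetical choices for illustration.

```python
import math
from collections import Counter


def modal_vote(labels, min_share=0.5):
    """Return (label, share) for the most frequent label.

    Assumption: `labels` holds the answers from repeated calls on the
    same input. The label is None when its share does not exceed
    `min_share`, so the item can be flagged for manual review.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    label, count = Counter(labels).most_common(1)[0]
    share = count / len(labels)
    return (label if share > min_share else None), share


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def passes_cosine_check(text_vec, code_vec, threshold=0.7):
    """Sanity check: accept a mapping only if the embeddings are close.

    Assumption: `text_vec` embeds the free-text description and
    `code_vec` embeds the candidate code description (e.g. an HS6 or
    JEL entry). The threshold is illustrative, not a recommended value.
    """
    return cosine_similarity(text_vec, code_vec) >= threshold
```

In use, modal_vote would be applied to the labels returned by several calls on the same document, and passes_cosine_check to each proposed text-to-code match; items that fail either check can be logged, which keeps the outputs auditable.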