Practical Prompting with LLMs
Setup: Hugging Face Access Token
This notebook can use Hugging Face Inference Providers for live text-generation examples. A free Hugging Face account may include limited monthly inference credits, and additional usage may require paid credit or an approved provider account. Check the current Hugging Face pricing before running repeated examples: https://huggingface.co/docs/inference-providers/pricing
Data warning: Do not submit personal data, confidential research data, student or patient data, unpublished intellectual property, or data covered by ethics or data-sharing restrictions to hosted inference services unless you have explicit approval.
Steps:
Sign up at https://huggingface.co.
Go to Settings -> Access Tokens: https://huggingface.co/settings/tokens.
Create a read or fine-grained token that can be used for inference calls.
Copy the token. It starts with hf_.
Set the environment variable:
Option A - Create a .env file in this directory:
HF_TOKEN=hf_xxxxxxxxxxxxx
Option B - Set in your terminal:
export HF_TOKEN=hf_xxxxxxxxxxxxx
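Before making any API calls, it can help to confirm that the token is visible to Python without printing the secret itself. The sketch below is only an illustration: `describe_token` is a hypothetical helper, not part of this notebook or of any Hugging Face library, and the `hf_` prefix check is taken from the steps above.

```python
import os


def describe_token(env=None):
    """Report whether HF_TOKEN looks usable, without revealing its value."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        return "HF_TOKEN is not set"
    if not token.startswith("hf_"):
        return "HF_TOKEN is set but does not start with 'hf_'"
    # Only the length is reported, never the token itself.
    return f"HF_TOKEN is set ({len(token)} characters)"


print(describe_token())
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try without touching your real environment.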
Setup
# If needed, set your token for this notebook session.
# It is safer to use an environment variable or .env file than to keep a real token in the notebook.
# import os
# os.environ["HF_TOKEN"] = "hf_xxxxxxxxxxxxx"
Local setup
This notebook is excluded from the website build and is not executed during normal site generation. To run it locally from the course repository, install the optional LLM dependency group first:
poetry install --with llm
If you are running the notebook in a separate Jupyter or Colab environment, install the notebook packages there instead:
%pip install transformers huggingface-hub python-dotenv
# If needed, uncomment and run this in a notebook environment.
# %pip install transformers huggingface-hub python-dotenv
import os
import json
# Try to load environment variables if they are set with a .env file
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass
class SimpleLLM:
    """Simple LLM wrapper supporting both API and local models."""

    def __init__(self, model="mistralai/Mistral-7B-Instruct-v0.2", use_api=True):
        """
        Initialize LLM.

        Args:
            model: Hugging Face model ID for hosted inference. If use_api=False,
                this defaults to a small local model instead.
            use_api: If True, use Hugging Face Inference Providers. If False,
                use a local transformers pipeline.
        """
        self.use_api = use_api
        if self.use_api:
            self._init_api(model)
        else:
            local_model = "distilgpt2" if model == "mistralai/Mistral-7B-Instruct-v0.2" else model
            self._init_local(local_model)

    def _init_api(self, model):
        """Initialize Hugging Face hosted inference."""
        from huggingface_hub import InferenceClient
        self.api_key = os.getenv("HF_TOKEN") or os.getenv("HUGGINGFACE_API_KEY")
        if not self.api_key:
            raise ValueError(
                "HF_TOKEN not found. Create a Hugging Face access token at "
                "https://huggingface.co/settings/tokens and set it as HF_TOKEN, "
                "or initialize SimpleLLM(use_api=False) for local-only examples."
            )
        self.model = model
        self.client = InferenceClient(model=model, token=self.api_key)
        print(f"Using {model} via Hugging Face Inference Providers")

    def _init_local(self, model):
        """Initialize local transformers model."""
        from transformers import pipeline
        import warnings
        warnings.filterwarnings('ignore')
        print(f"Loading {model} locally (first time may take a moment)...")
        self.pipeline = pipeline(
            "text-generation",
            model=model,
            device=-1  # Use CPU (no explicit torch calls needed)
        )
        print(f"Model loaded: {model}")

    def generate(self, prompt, temperature=0.7, max_tokens=200):
        """Generate text."""
        if self.use_api:
            return self._generate_api(prompt, temperature, max_tokens)
        else:
            return self._generate_local(prompt, temperature, max_tokens)

    def _generate_api(self, prompt, temperature, max_tokens):
        """Generate using API."""
        # Temperature 0 means greedy decoding: disable sampling and send no temperature.
        generation_temperature = temperature if temperature > 0 else None
        try:
            result = self.client.text_generation(
                prompt,
                max_new_tokens=max_tokens,
                temperature=generation_temperature,
                do_sample=temperature > 0,
                return_full_text=False,
            )
        except Exception as exc:
            raise RuntimeError(
                "Hugging Face inference request failed. Check your token, "
                "credits, provider availability, and model access."
            ) from exc
        return str(result).strip()

    def _generate_local(self, prompt, temperature, max_tokens):
        """Generate using local model."""
        result = self.pipeline(
            prompt,
            max_new_tokens=max_tokens,
            temperature=temperature if temperature > 0 else 1.0,
            do_sample=temperature > 0,
            # Pad with the model's own EOS token (50256 for GPT-2 models)
            pad_token_id=self.pipeline.tokenizer.eos_token_id,
            truncation=True
        )
        # Extract just the generated text (not the prompt)
        generated = result[0]['generated_text'][len(prompt):].strip()
        return generated
# Initialize LLM using hosted inference by default.
# For local-only examples, use: llm = SimpleLLM(use_api=False)
llm = SimpleLLM()
# Test it works
test = llm.generate("Hello, I am", temperature=0.7, max_tokens=20)
print(f"Test output: {test}")
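Part 1: Temperature - Control Creativity
Temperature controls randomness in LLM outputs by rescaling the model's next-token probabilities before sampling:
Low (0.0-0.3): Consistent and repeatable; the model almost always picks its most likely token
Medium (0.7-1.0): Balanced, natural variation
High (1.5+): Creative and diverse, but more likely to drift or become incoherent
Low temperature makes answers repeatable, not correct: a model that is wrong at temperature 0.0 will be wrong every time.
Rule of Thumb: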
📊 Factual tasks (data extraction, Q&A): Use temp = 0.0-0.3
✍️ Creative tasks (writing, brainstorming): Use temp = 0.7-1.2
🎨 Experimental (poetry, wild ideas): Use temp = 1.5+
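To see why these ranges behave so differently, here is a minimal sketch of temperature scaling applied to a toy distribution. The three scores in `toy_logits` are made up for illustration and do not come from any real model; the function is the standard softmax with the scores divided by the temperature first.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; treat 0 as greedy decoding")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for three candidate next tokens.
toy_logits = [2.0, 1.0, 0.1]

for t in (0.3, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

At low temperature almost all of the probability mass sits on the top-scoring token, while higher temperatures flatten the distribution so the less likely tokens get sampled more often.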
# Factual question - we want consistency
factual_prompt = "The capital of France is"
print("📚 FACTUAL TASK: Capital of France\n")
for temp in [0.0, 0.7, 1.5]:
    print(f"Temperature {temp}:")
    for i in range(3):
        response = llm.generate(factual_prompt, temperature=temp, max_tokens=20)
        print(f" {i+1}. {factual_prompt} {response}")
    print()
# Creative task - we want variety
creative_prompt = "Once upon a time in a distant galaxy,"
print("🎨 CREATIVE TASK: Sci-fi story opening\n")
for temp in [0.0, 0.7, 1.5]:
    print(f"Temperature {temp}:")
    for i in range(3):
        response = llm.generate(creative_prompt, temperature=temp, max_tokens=40)
        print(f" {i+1}. {response}")
    print()
Key Takeaway
Notice how:
Temp 0.0: All 3 runs are identical or nearly identical (greedy decoding; hosted endpoints can still show small differences)
Temp 0.7: Some variation, but coherent
Temp 1.5: Very diverse outputs, sometimes incoherent
For factual tasks → use temp 0.0
For creative tasks → use higher temperature!
Part 2: JSON Output - Structured Data Extraction
One of the most practical use cases: extracting structured data from text.
The Technique:
✅ Use low temperature (0.0-0.3) for consistency
✅ Explicitly ask for JSON in your prompt
✅ Show the exact schema you want
✅ Say “Output ONLY valid JSON”
Example Use Cases:
Extract product info from descriptions
Parse contact details from emails
Structure feedback into categories
Convert natural language to database entries
# Extract structured data from unstructured text
review_text = """John Smith bought iPhone 15 Pro for $999.
Happy with camera and battery, but price is steep. Rating: 4.5 stars."""
prompt = f"""Extract information from this review and output as JSON.
Review: {review_text}
Output ONLY valid JSON with this structure:
{{
"customer": "...",
"product": "...",
"price": 0,
"pros": [...],
"cons": [...],
"rating": 0.0
}}
JSON:"""
# Use low temperature for consistent, structured output
response = llm.generate(prompt, temperature=0.1, max_tokens=150)
print("Raw response:")
print(response)
print("\n" + "="*50 + "\n")
# Try to parse it
try:
    # Extract JSON if there's extra text
    json_start = response.find('{')
    json_end = response.rfind('}') + 1
    if json_start >= 0 and json_end > json_start:
        json_str = response[json_start:json_end]
        data = json.loads(json_str)
        print("✅ Valid JSON! Parsed data:")
        print(json.dumps(data, indent=2))
    else:
        raise ValueError("No JSON found")
except Exception as e:
    print(f"❌ JSON parsing failed: {e}")
    print("\nTip: Smaller models (like GPT-2) struggle with JSON.")
    print("For production, use API with better models or fine-tune.")
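Slicing from the first `{` to the last `}` breaks when the model adds more braces after the JSON object. One possible alternative, sketched below, is to let the standard library's `json.JSONDecoder.raw_decode` stop at the end of the first valid object and then check that the expected keys are present. The helper names `extract_first_json` and `missing_keys`, and the `sample` string, are hypothetical and only serve as an illustration.

```python
import json

# Keys taken from the schema shown in the prompt above.
EXPECTED_KEYS = {"customer", "product", "price", "pros", "cons", "rating"}


def extract_first_json(text):
    """Return the first decodable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON from this brace; try the next one
        start = text.find("{", start + 1)
    return None


def missing_keys(data, expected=EXPECTED_KEYS):
    """List the expected keys that are absent from the parsed object."""
    return sorted(expected - set(data))


# Hypothetical model output with chatter before and after the JSON.
sample = (
    'Sure! {"customer": "John Smith", "product": "iPhone 15 Pro", '
    '"price": 999, "pros": ["camera", "battery"], "cons": ["price"], '
    '"rating": 4.5} Note: {unrelated}'
)

parsed = extract_first_json(sample)
print(parsed)
print("Missing keys:", missing_keys(parsed))
```

Checking for missing keys is a lightweight stand-in for real schema validation; it catches dropped fields but not wrong types.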
Try It Yourself!
Extract meeting details from an email:
email_text = """Hi team,
Let's meet Tuesday at 2pm to discuss Q4 budget.
Please invite Sarah from Finance and Mike from Operations.
Best,
Alice Johnson (alice.j@company.com)"""
# TODO: Write a prompt to extract: sender, email, time, attendees, topic
your_prompt = f"""Extract meeting details as JSON.
Email: {email_text}
Output JSON:
{{
"sender": "...",
"email": "...",
"meeting_time": "...",
"attendees": [...],
"topic": "..."
}}
JSON:"""
response = llm.generate(your_prompt, temperature=0.1, max_tokens=150)
print(response)
Part 3: Few-Shot Learning - Teach by Example
Instead of explaining what you want, show examples.
The Pattern:
Examples:
Input: [example 1]
Output: [result 1]
Input: [example 2]
Output: [result 2]
Now you:
Input: [actual task]
Output:
Why It Works:
LLMs are excellent pattern matchers
Examples are clearer than descriptions
Works for complex tasks (sentiment, classification, formatting)
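The pattern above is regular enough to build from data rather than by hand. The following is a small sketch, assuming examples are stored as `(input, output)` pairs; `build_few_shot_prompt` is a hypothetical helper, not something defined elsewhere in this notebook.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and the real input into one prompt."""
    lines = [instruction, "", "Examples:"]
    for example_input, example_output in examples:
        lines.append(f'Input: "{example_input}"')
        lines.append(f"Output: {example_output}")
    # End on an open "Output:" so the model continues with the answer.
    lines += ["", "Now you:", f'Input: "{query}"', "Output:"]
    return "\n".join(lines)


demo_examples = [
    ("The app crashes every time I try to login!", "URGENT"),
    ("Thanks for the quick support, issue resolved!", "HAPPY"),
]

demo_prompt = build_few_shot_prompt(
    "Classify customer feedback into categories.",
    demo_examples,
    "I have a question about billing.",
)
print(demo_prompt)
```

Keeping the examples in a list makes it easy to add, remove, or reorder them while the format stays consistent.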
# Few-shot learning for custom sentiment categories
prompt = """Classify customer feedback into categories.
Examples:
Input: "The app crashes every time I try to login!"
Output: URGENT
Input: "Thanks for the quick support, issue resolved!"
Output: HAPPY
Input: "I have a question about billing."
Output: NEUTRAL
Input: "I've been waiting 3 days for a response..."
Output: FRUSTRATED
Now classify this:
Input: "Your product is amazing! Just recommended it to 5 friends."
Output:"""
response = llm.generate(prompt, temperature=0.0, max_tokens=10)
print(f"Classification: {response}")
# Few-shot for text transformation
prompt = """Convert casual text to formal business language.
Examples:
Input: "Hey, can you send that report asap? thx!"
Output: "Could you please send the report at your earliest convenience? Thank you."
Input: "Meeting's cancelled, something came up"
Output: "The meeting has been cancelled due to unforeseen circumstances."
Input: "Sounds good, let's do it!"
Output: "I agree with this proposal and suggest we proceed."
Now convert:
Input: "Nope, that won't work for us, way too expensive"
Output:"""
response = llm.generate(prompt, temperature=0.3, max_tokens=50)
print(f"Formal version: {response}")
Pro Tips for Few-Shot:
2-5 examples is usually enough
Diverse examples cover edge cases
Consistent format makes patterns clear
Use low temperature (0.0-0.3) for classification tasks
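Even at low temperature, a model may add punctuation, change case, or keep generating extra "Input:" lines after the label. A cautious approach is to map the raw response back onto the set of labels used in the examples. The sketch below assumes the four labels from the feedback classifier earlier in this part; `normalize_label` is a hypothetical helper.

```python
import re

# Labels used in the few-shot examples for customer feedback.
ALLOWED_LABELS = ("URGENT", "HAPPY", "NEUTRAL", "FRUSTRATED")


def normalize_label(raw_response, allowed=ALLOWED_LABELS):
    """Return the first allowed label on the first non-empty line, or None."""
    non_empty = [line.strip() for line in raw_response.splitlines() if line.strip()]
    if not non_empty:
        return None
    for word in re.findall(r"[A-Za-z]+", non_empty[0].upper()):
        if word in allowed:
            return word
    return None


print(normalize_label(' HAPPY\nInput: "Another message"'))
print(normalize_label("I am not sure"))
```

Returning `None` for anything outside the label set lets downstream code decide whether to retry or flag the item for review.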
Part 4: Chain-of-Thought - Better Reasoning
For complex problems, ask the LLM to “think out loud” before answering.
The Magic Phrase:
"Let's think step by step:"
Why It Works:
Encourages the model to break the problem into smaller steps
Reduces logical errors
Makes the reasoning visible, so you can check each step
Often improves accuracy on math, logic, and multi-step tasks, especially with larger instruction-tuned models
# Math problem
problem = "A store has 23 apples. They sell 17 and receive a delivery of 45. How many apples do they have now?"
# WITHOUT chain-of-thought
basic_prompt = f"{problem}\n\nAnswer:"
response = llm.generate(basic_prompt, temperature=0.0, max_tokens=30)
print("❌ WITHOUT chain-of-thought:")
print(response)
print()
# WITH chain-of-thought
cot_prompt = f"{problem}\n\nLet's think step by step:"
response = llm.generate(cot_prompt, temperature=0.0, max_tokens=150)
print("✅ WITH chain-of-thought:")
print(response)
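A chain-of-thought response mixes reasoning with the final answer, so it is worth pulling the answer out and checking it against a value computed directly. The sketch below uses a made-up `sample_response` rather than real model output, and `extract_final_number` is a hypothetical helper that prefers an explicit "Answer:" phrase and otherwise falls back to the last number in the text.

```python
import re


def extract_final_number(response):
    """Prefer an explicit 'Answer: N'; otherwise use the last number in the text."""
    match = re.search(
        r"answer\s*(?:is|:)?\s*(-?\d+(?:\.\d+)?)", response, re.IGNORECASE
    )
    if match:
        return float(match.group(1))
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    return float(numbers[-1]) if numbers else None


# Ground truth for the apple problem, computed directly.
expected = 23 - 17 + 45

# Hypothetical chain-of-thought output, for illustration only.
sample_response = (
    "Start with 23 apples. After selling 17, 6 remain. "
    "Adding the delivery of 45 gives 51. Answer: 51"
)

answer = extract_final_number(sample_response)
print("Extracted:", answer, "Correct:", answer == expected)
```

The last-number fallback is a heuristic and can pick up the wrong value, so asking the model to finish with a fixed "Answer:" line makes extraction more dependable.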
# Logical reasoning problem
problem = """Sarah is taller than Mike.
Mike is taller than Alex.
Alex is shorter than Jordan.
Jordan is shorter than Sarah.
Who is the tallest?"""
cot_prompt = f"{problem}\n\nLet's think through this step by step:"
response = llm.generate(cot_prompt, temperature=0.0, max_tokens=200)
print(response)
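For small logic puzzles like this one, the model's conclusion can be checked by brute force. The sketch below enumerates every height ordering of the four people and keeps the ones that satisfy all four statements. It assumes that no two people are the same height; `possible_tallest` is a hypothetical helper.

```python
from itertools import permutations

people = ["Sarah", "Mike", "Alex", "Jordan"]

# Each pair (a, b) encodes "a is taller than b", taken from the puzzle text.
taller_than = [
    ("Sarah", "Mike"),
    ("Mike", "Alex"),
    ("Jordan", "Alex"),   # "Alex is shorter than Jordan"
    ("Sarah", "Jordan"),  # "Jordan is shorter than Sarah"
]


def possible_tallest(names, constraints):
    """Return everyone who is tallest in at least one consistent ordering."""
    candidates = set()
    for order in permutations(names):  # order[0] is the tallest
        rank = {name: position for position, name in enumerate(order)}
        if all(rank[a] < rank[b] for a, b in constraints):
            candidates.add(order[0])
    return candidates


print(possible_tallest(people, taller_than))
```

If the returned set contains exactly one name, the puzzle has a unique answer that the model's reasoning can be compared against.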
Combining All Techniques
Real-world example: Analyze support tickets using JSON + Few-Shot + Chain-of-Thought
ticket = """Customer called about slow website. Dashboard takes 30 seconds to load.
Started after yesterday's update. Premium customer needs fix ASAP."""
prompt = f"""Analyze support tickets and extract structured information.
Example:
Ticket: "User can't login. Getting error 500. This is blocking their work."
Analysis:
Step 1: Issue type -> Technical (error 500)
Step 2: Urgency -> High (blocking work)
Step 3: Category -> Authentication
Result: {{"type": "Technical Error", "urgency": "HIGH", "category": "Authentication", "blocking": true}}
Now analyze:
Ticket: {ticket}
Let's think step by step:"""
response = llm.generate(prompt, temperature=0.1, max_tokens=250)
print(response)
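Because the example in the prompt ends with a `Result:` line holding the JSON, the combined response can be post-processed by looking for that line. The sketch below assumes the model follows the example's format; `parse_result_line` is a hypothetical helper and `sample_response` is made-up text, not real model output.

```python
import json


def parse_result_line(response):
    """Return the JSON object from the first 'Result:' line, or None."""
    prefix = "result:"
    for line in response.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith(prefix):
            payload = stripped[len(prefix):].strip()
            try:
                return json.loads(payload)
            except json.JSONDecodeError:
                return None  # the line was there but was not valid JSON
    return None


# Hypothetical response that follows the format of the example in the prompt.
sample_response = """Step 1: Issue type -> Performance (slow dashboard)
Step 2: Urgency -> High (premium customer, ASAP)
Step 3: Category -> Website
Result: {"type": "Performance", "urgency": "HIGH", "category": "Website", "blocking": false}"""

result = parse_result_line(sample_response)
print(result)
```

Keeping the reasoning steps and the JSON on separate, clearly labelled lines is what makes this kind of parsing possible.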
Practice Exercise
Your challenge: Build a movie review analyzer that:
Extracts rating, sentiment, pros/cons
Uses few-shot learning for custom categories
Uses appropriate temperature
Outputs as JSON
Try it below!
# Practice exercise - build your own!
review = """Just watched the new sci-fi movie. The special effects were mind-blowing
and the plot kept me guessing, but some of the dialogue felt a bit cheesy.
Overall, definitely worth watching. I'd give it 8/10."""
# TODO: Write a prompt that combines:
# - Few-shot examples of movie review analysis
# - Chain-of-thought reasoning
# - JSON output format
# - Appropriate temperature
your_prompt = """
# Your prompt here!
# Hint: Show 1-2 examples, ask for step-by-step analysis, request JSON output
"""
# Uncomment to test:
# response = llm.generate(your_prompt, temperature=0.1, max_tokens=300)
# print(response)