Resources#

Below are useful resources, further reading, and links referenced throughout this course.

Course Libraries and Documentation#

Further Exploration#

Tokenization#

Embeddings#

Hugging Face Models Referenced#

Model

Course use

Licence listed by provider

mistralai/Mistral-7B-Instruct-v0.2

Hosted text-generation examples

Apache-2.0

openai-community/gpt2

GPT-2 tokenizer comparison

MIT

sentence-transformers/all-MiniLM-L6-v2

Embedding examples

Apache-2.0

google-bert/bert-base-uncased

Tokenizer comparison

Apache-2.0

google-t5/t5-small

Tokenizer comparison

Apache-2.0

deepseek-ai/DeepSeek-R1-Distill-Llama-8B

Tokenizer comparison

MIT listed by provider; model card notes this is a distillation from a Llama-derived base model

distilbert/distilgpt2

Local text-generation fallback

Apache-2.0

Python Package Licences#

Package

Course use

Licence

tiktoken

OpenAI-compatible token counting

MIT

transformers

Hugging Face tokenizers and local model loading

Apache-2.0

sentence-transformers

Embedding model wrapper

Apache-2.0

huggingface_hub

Hosted inference client

Apache-2.0

python-dotenv

Loading local .env files

BSD-3-Clause

scikit-learn

Cosine similarity utilities

BSD-3-Clause

plotly

Interactive plots

MIT

Data and Hosted Inference#

The practical prompting notebook can send prompt text to Hugging Face Inference Providers when hosted inference is enabled. Do not submit personal data, confidential research data, student or patient data, unpublished intellectual property, or data covered by ethics or data-sharing restrictions to hosted inference services unless you have explicit approval.

Hugging Face pricing, available providers, token scopes, and model availability can change. Check the current Hugging Face documentation before running live inference examples repeatedly or in teaching sessions.