Temperature as creativity in LLMs
A higher temperature setting flattens the probability distribution. The financial advisor persona exhibits far less range than the creative writer!
I’ve toyed with the temperature settings in GPT-4, but I was not aware of the mathematical idea underneath an LLM’s creativity until I read Logan Thorneloe’s explainer on the softmax function. I’m also fond of his infectious attitude:
The thing I love most about machine learning is that the math makes sense. On the surface, it seems really complex but I’m a firm believer that anyone can understand machine learning if they want to. Machine learning employs math in beautiful and simple ways to make numbers work the way we want them to.
Because the math is intuitive, I want to illustrate the softmax function in simple code. The illustrative prompt is “We live in Los Angeles, tomorrow we will travel to the ..”, such that we expect only a one-word response. My naïve example assumes that training has already created this vector of logits for us:
# We live in Los Angeles, tomorrow we will travel to the ..
values <- c("beach"     = 9.1,
            "mountains" = 7.2,
            "lake"      = 5.8,
            "mall"      = 3.4,
            "park"      = 2.3,
            "city"      = 1.5)
Again, the complete code chunk is here. A lower temperature (aka, less creative) favors the more probable destinations, namely beach and mountains. A higher temperature (aka, more creative) flattens the resulting probability distribution. Here’s that vector under four different temperature settings:
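The flattening is easy to see numerically. Here is a minimal sketch in Python (rather than the post’s R) of softmax with a temperature divisor; the logits mirror the vector above, and the four temperature values are my own illustrative choices:

```python
import math

# Logits from the travel-destination example (same numbers as the R vector)
values = {"beach": 9.1, "mountains": 7.2, "lake": 5.8,
          "mall": 3.4, "park": 2.3, "city": 1.5}

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature flattens the distribution."""
    scaled = {k: v / temperature for k, v in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in scaled.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

for t in (0.5, 1.0, 2.0, 5.0):
    probs = softmax(values, t)
    print(t, {k: round(p, 3) for k, p in probs.items()})
```

At the lowest temperature nearly all the mass sits on “beach”; by the highest, the six destinations are much closer to uniform.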
Let’s try temperature variations in the API call
That’s all fine, but how does temperature manifest in a prompt’s response? For a taste of the variation, I used the OpenAI API to call GPT-3.5 from Python. First, I define a system_message, which I think is equivalent to the GUI’s Custom Instructions (emphasis mine).
"You are a financial advisor specializing in portfolio allocation. Your goal is to provide personalized investment recommendations based on an individual's risk tolerance, financial goals, and market conditions."
Please note the prompt (user_message) is lame for this job; it’s just a tiny, unrealistic experiment:
“Given an investor's risk tolerance of High and a target retirement age of 65, suggest an optimal portfolio allocation strategy.”
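The experiment can be sketched as a loop over temperature settings. This is my own reconstruction using the openai Python SDK’s v1 chat-completions interface; the model name, system_message, and user_message come from the post, while build_request is a hypothetical helper I added:

```python
import os

# System and user messages quoted from the post's experiment
SYSTEM_MESSAGE = (
    "You are a financial advisor specializing in portfolio allocation. "
    "Your goal is to provide personalized investment recommendations based on "
    "an individual's risk tolerance, financial goals, and market conditions."
)
USER_MESSAGE = (
    "Given an investor's risk tolerance of High and a target retirement age "
    "of 65, suggest an optimal portfolio allocation strategy."
)

def build_request(temperature):
    """Assemble the chat-completion arguments for one temperature setting."""
    return {
        "model": "gpt-3.5-turbo",
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user", "content": USER_MESSAGE},
        ],
    }

# The actual calls need the openai package and an API key in the environment
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI
    client = OpenAI()
    for t in (0.1, 0.8, 1.0, 1.5):
        reply = client.chat.completions.create(**build_request(t))
        print(f"--- temperature {t} ---")
        print(reply.choices[0].message.content)
```

The only thing that changes between calls is the temperature argument, so any drift in the replies is attributable to that one knob.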
Obviously, that’s not enough context! My arbitrary temperature settings are {0.1, 0.8, 1.0, and 1.5}. I ran a few repetitions, but they were similar to the variation published here. Interestingly:
At low temps, a ~70% allocation to stocks and ~20% was the common reply.
At higher temps, real estate gets included.
At a temp of 1.5 (the highest I used), equities change to “aggressive growth stocks” and real estate becomes REITs.
The question of asset allocation in a vacuum appears to be intrinsically uncreative, so I switched the system_message to this persona (emphasis mine) …
“You are a creative fiction writer who uses vivid imagery”
… and prompted
“Finish this sentence: She opened her cryptocurrency wallet and discovered ...”
There is a range of replies, from the creative to the hallucinatory. My interpretation is that the system_message (Custom Instructions) is a highly influential anchor: at low temps the replies were all creative but often similar across requests. At temp = 1.5, I observed poetry (or at least the appearance of poetry, to me) blurring into nonsense.