Model Settings
The advanced settings (sliders) for the models.
You can get to the Model Settings by going to the little gear at the top right of chat, then "Character Settings", then "Advanced." Each model has its own default settings, set to whatever the dev has determined works best, but these can be changed at will.
This setting changes the length of the response by simply cutting it off once it reaches a certain number of tokens. Generally, 4 characters is roughly 1 token. Most models default to 500. You can change this without negative effect, so play with it to get your desired length.
As a note, Thinking models need around 1200-1500 output tokens, since the 'thinking' counts toward the max.
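The 4-characters-per-token rule above lets you convert a token budget into a rough text length. A quick sketch (`estimate_tokens` is a hypothetical helper using that rule of thumb, not a real tokenizer):

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using the ~4 characters per token rule of thumb."""
    return max(1, round(len(text) / 4))

# By the same rule, a 500-token budget covers roughly 2000 characters of output.
budget_chars = 500 * 4
```

Real tokenizers vary by model, so treat this only as a ballpark when picking a length.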
Temperature controls the "creativity" of the model's responses. A higher temperature makes the model select less predictable tokens, making the text more lively and unexpected. (You can read about tokens in the Tips section.) However, if Temperature is set too high, the response will turn into gibberish. The default Temperature on each model is chosen to keep the balance between creativity and coherence optimal, but you can always raise it if the replies feel bland, or lower it if they stop making sense. It's a sensitive setting, so if you want to adjust it, do it in very small steps.
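Under the hood, temperature rescales the model's raw token scores before they become probabilities. A minimal sketch of the standard approach (temperature-scaled softmax; this is the textbook mechanism, not this platform's actual code):

```python
import math

def token_probs(logits, temperature):
    """Divide raw scores by temperature, then softmax into probabilities."""
    scaled = [score / temperature for score in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cool = token_probs(logits, 0.5)  # sharper: the top token dominates
hot = token_probs(logits, 1.5)   # flatter: surprising tokens become more likely
```

Lower temperature concentrates probability on the already-likely tokens; higher temperature flattens the distribution, which is why very high values slide into gibberish.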
Top P is the percentage of tokens the LLM is allowed to use, ranked by probability. 1 is the whole pool of tokens; 0.95 keeps the most probable tokens that together make up 95% of the probability, cutting off the unlikely tail; and so on. If you must move this, do not move it more than a little bit at a time.
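The cutoff works by walking down the tokens from most to least probable and stopping once their combined probability reaches P. A sketch of that standard "nucleus sampling" filter (an illustration, not this platform's backend):

```python
def top_p_pool(probs, p):
    """Keep the smallest set of most-probable tokens whose combined mass >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for tok in order:
        kept.append(tok)
        mass += probs[tok]
        if mass >= p - 1e-9:  # small tolerance for float rounding
            break
    return kept

probs = [0.5, 0.3, 0.15, 0.05]       # already normalized; token ids 0..3
print(top_p_pool(probs, 1.0))   # whole pool: [0, 1, 2, 3]
print(top_p_pool(probs, 0.95))  # least probable tail cut off: [0, 1, 2]
```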
Top K is a hard-set number of top tokens that can be used. At 0 it's disabled, so the whole pool is available. At 1, there is literally one single word the bot is forced to choose every time.
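In contrast to Top P's probability cutoff, this is a simple count cutoff. A sketch of the usual top-k filter (again an illustration of the standard technique):

```python
def top_k_pool(probs, k):
    """Keep only the k most probable tokens; k == 0 disables the filter."""
    if k == 0:
        return list(range(len(probs)))  # whole pool
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]

probs = [0.1, 0.6, 0.3]
print(top_k_pool(probs, 0))  # disabled: [0, 1, 2]
print(top_k_pool(probs, 1))  # forced single choice: [1]
```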
Frequency Penalty softly penalizes words depending on how often they appear in the text. Every time a word repeats, it gets a small penalty added to lower its chance of appearing again, discouraging repetition; if it repeats again, the same penalty is stacked on top of the previous one. If set too high, it may make the response incoherent, so apply with caution.
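That stacking behavior matches the common formulation, where the penalty is multiplied by how many times each token has already been generated. A sketch of that standard approach (whether this platform's backend does exactly this is an assumption):

```python
from collections import Counter

def frequency_penalize(logits, generated, penalty):
    """Subtract penalty * (times already generated) from each token's score."""
    counts = Counter(generated)
    return [score - penalty * counts[tok] for tok, score in enumerate(logits)]

logits = [2.0, 2.0, 2.0]
# Token 0 has appeared twice, token 1 once, token 2 never:
print(frequency_penalize(logits, [0, 0, 1], 0.5))  # [1.0, 1.5, 2.0]
```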
Unlike the Frequency penalty, the Presence penalty penalizes every word that has appeared in the text even once, which makes the AI try to select words you haven't seen in the response yet. Too high a value will result in gibberish.
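The difference from the Frequency penalty is that this one is flat: a token is penalized the same whether it appeared once or ten times. A sketch of that standard formulation (an illustration, not necessarily this platform's exact backend):

```python
def presence_penalize(logits, generated, penalty):
    """Subtract a flat penalty from any token that has appeared at least once."""
    seen = set(generated)
    return [score - penalty if tok in seen else score
            for tok, score in enumerate(logits)]

logits = [2.0, 2.0, 2.0]
# Tokens 0 and 1 have appeared (however many times), token 2 has not:
print(presence_penalize(logits, [0, 0, 1], 0.5))  # [1.5, 1.5, 2.0]
```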
The Repetition penalty, just like Frequency, penalizes words that keep repeating in the text. But each time a word repeats, the Repetition penalty hits it harder and with more force, so the results come faster than with Frequency, and your chat may devolve into incoherence very quickly if you set this too high.
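A widely used formulation (the multiplicative penalty popularized by the CTRL paper; whether this platform's backend uses exactly this is an assumption) divides a repeated token's score instead of subtracting from it, which is part of why it bites harder than the additive Frequency penalty:

```python
def repetition_penalize(logits, generated, penalty):
    """Divide positive scores (or multiply negative ones) for seen tokens."""
    seen = set(generated)
    out = []
    for tok, score in enumerate(logits):
        if tok in seen:
            out.append(score / penalty if score > 0 else score * penalty)
        else:
            out.append(score)
    return out

logits = [2.0, -1.0, 2.0]
print(repetition_penalize(logits, [0, 1], 1.25))  # [1.6, -1.25, 2.0]
```

A penalty of 1.0 is neutral; values only slightly above 1.0 already shift scores noticeably, which is why small adjustments go a long way.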