Model Settings
The advanced settings (sliders) for the models.
You can get to the Model Settings by going to the little gear at the top right of chat, then "Character Settings", then "Advanced." Each model has its own default settings, set to whatever the dev has determined works best, but these can be changed at will.
This setting changes the length of the response by simply cutting it off once it reaches a certain number of tokens. Generally, 4 characters is roughly 1 token. Most models default to 500. You can change this without negative effect, so play with it to get your desired length.
As a note, Thinking models need around 1200-1500 output tokens, since the 'thinking' counts toward the max.
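The 4-characters-per-token rule above lets you convert a token budget into a rough text length. A quick sketch (`estimate_tokens` is a hypothetical helper using that rule of thumb, not a real tokenizer):

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using the ~4 characters per token rule of thumb."""
    return max(1, round(len(text) / 4))

# By the same rule, a 500-token budget covers roughly 2000 characters of output.
budget_chars = 500 * 4
```

Real tokenizers vary by model, so treat this only as a ballpark when picking a length.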
Temperature controls the "creativity" of the model's responses. A higher temperature makes the model select less predictable tokens, making the text more lively and unexpected. (You can read about tokens in the Tips section.) However, if Temperature is set too high, the response will turn into gibberish. The default Temperature on each model is chosen to keep the balance between creativity and coherence optimal, but you can always raise it if the replies feel bland, or lower it if they stop making sense. It's a sensitive setting, so if you want to adjust it, do it in very small steps.
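Under the hood, temperature rescales the model's raw token scores before they become probabilities. A minimal sketch of the standard approach (temperature-scaled softmax; this is the textbook mechanism, not this platform's actual code):

```python
import math

def token_probs(logits, temperature):
    """Divide raw scores by temperature, then softmax into probabilities."""
    scaled = [score / temperature for score in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cool = token_probs(logits, 0.5)  # sharper: the top token dominates
hot = token_probs(logits, 1.5)   # flatter: surprising tokens become more likely
```

Lower temperature concentrates probability on the already-likely tokens; higher temperature flattens the distribution, which is why very high values slide into gibberish.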
Top P is the percentage of tokens the LLM is allowed to use, ranked by probability. 1 is the whole pool of tokens; 0.95 keeps the most probable tokens that together make up 95% of the probability, cutting off the unlikely tail; and so on. If you must move this, do not move it more than a little bit at a time.
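The cutoff works by walking down the tokens from most to least probable and stopping once their combined probability reaches P. A sketch of that standard "nucleus sampling" filter (an illustration, not this platform's backend):

```python
def top_p_pool(probs, p):
    """Keep the smallest set of most-probable tokens whose combined mass >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for tok in order:
        kept.append(tok)
        mass += probs[tok]
        if mass >= p - 1e-9:  # small tolerance for float rounding
            break
    return kept

probs = [0.5, 0.3, 0.15, 0.05]       # already normalized; token ids 0..3
print(top_p_pool(probs, 1.0))   # whole pool: [0, 1, 2, 3]
print(top_p_pool(probs, 0.95))  # least probable tail cut off: [0, 1, 2]
```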
Top K is a hard-set number of top tokens that can be used. At 0 it's disabled, so the whole pool is available. At 1, there is literally one single word the bot is forced to choose every time.
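In contrast to Top P's probability cutoff, this is a simple count cutoff. A sketch of the usual top-k filter (again an illustration of the standard technique):

```python
def top_k_pool(probs, k):
    """Keep only the k most probable tokens; k == 0 disables the filter."""
    if k == 0:
        return list(range(len(probs)))  # whole pool
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]

probs = [0.1, 0.6, 0.3]
print(top_k_pool(probs, 0))  # disabled: [0, 1, 2]
print(top_k_pool(probs, 1))  # forced single choice: [1]
```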
Frequency Penalty softly penalizes words depending on how often they appear in the text. Every time a word repeats, it gets a small penalty added to lower its chance of appearing again, discouraging repetition; if it repeats again, the same penalty is stacked on top of the previous one. If set too high, it may make the response incoherent, so apply with caution.
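That stacking behavior matches the common formulation, where the penalty is multiplied by how many times each token has already been generated. A sketch of that standard approach (whether this platform's backend does exactly this is an assumption):

```python
from collections import Counter

def frequency_penalize(logits, generated, penalty):
    """Subtract penalty * (times already generated) from each token's score."""
    counts = Counter(generated)
    return [score - penalty * counts[tok] for tok, score in enumerate(logits)]

logits = [2.0, 2.0, 2.0]
# Token 0 has appeared twice, token 1 once, token 2 never:
print(frequency_penalize(logits, [0, 0, 1], 0.5))  # [1.0, 1.5, 2.0]
```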
Unlike the Frequency penalty, the Presence penalty penalizes every word that has appeared in the text even once, which makes the AI try to select words you haven't seen in the response yet. Too high a value will result in gibberish.
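The difference from the Frequency penalty is that this one is flat: a token is penalized the same whether it appeared once or ten times. A sketch of that standard formulation (an illustration, not necessarily this platform's exact backend):

```python
def presence_penalize(logits, generated, penalty):
    """Subtract a flat penalty from any token that has appeared at least once."""
    seen = set(generated)
    return [score - penalty if tok in seen else score
            for tok, score in enumerate(logits)]

logits = [2.0, 2.0, 2.0]
# Tokens 0 and 1 have appeared (however many times), token 2 has not:
print(presence_penalize(logits, [0, 0, 1], 0.5))  # [1.5, 1.5, 2.0]
```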
The Repetition penalty, just like Frequency, penalizes words that keep repeating in the text. But each time a word repeats, the Repetition penalty hits it harder and with more force, so the results come faster than with Frequency, and your chat may devolve into incoherence very quickly if you set this too high.
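A widely used formulation (the multiplicative penalty popularized by the CTRL paper; whether this platform's backend uses exactly this is an assumption) divides a repeated token's score instead of subtracting from it, which is part of why it bites harder than the additive Frequency penalty:

```python
def repetition_penalize(logits, generated, penalty):
    """Divide positive scores (or multiply negative ones) for seen tokens."""
    seen = set(generated)
    out = []
    for tok, score in enumerate(logits):
        if tok in seen:
            out.append(score / penalty if score > 0 else score * penalty)
        else:
            out.append(score)
    return out

logits = [2.0, -1.0, 2.0]
print(repetition_penalize(logits, [0, 1], 1.25))  # [1.6, -1.25, 2.0]
```

A penalty of 1.0 is neutral; values only slightly above 1.0 already shift scores noticeably, which is why small adjustments go a long way.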