Step 4: (Optional) Advanced Model Settings

Model Temperature: Controls the randomness of the output. Lower values (e.g., 0.2) result in more deterministic responses, while higher values (e.g., 0.8) produce more creative and varied outputs.
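
As an illustration of why lower temperatures give more deterministic output, the sketch below applies temperature to a softmax over token scores. It is a generic illustration, not this product's implementation: the function name softmax_with_temperature and the example scores are assumptions made for the example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]

# Hypothetical scores for three candidate tokens (illustrative values only).
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.2)   # sharper distribution
high = softmax_with_temperature(logits, 0.8)  # flatter distribution

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At 0.2 nearly all of the probability lands on the highest-scoring token, while at 0.8 the other candidates keep a meaningful share, which is where the extra variety comes from.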

Top P (Nucleus Sampling): Sets a cumulative probability threshold for token selection. At each step, the model samples only from the smallest set of most likely tokens whose combined probability reaches the threshold (e.g., 0.9). This cuts off the unlikely tail of candidates, keeping the output focused while still allowing some variability.
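
The selection step can be sketched as follows. This is a minimal illustration of nucleus filtering in general, not this product's code; the function name top_p_filter and the example probabilities are assumptions.

```python
def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1 before sampling.
    return {token: p / cumulative for token, p in kept.items()}

# Hypothetical next-token probabilities (illustrative values only).
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}

nucleus = top_p_filter(probs, 0.9)
print(sorted(nucleus))
```

With a threshold of 0.9, the three most likely tokens cover 0.95 of the probability mass, so the least likely candidate is dropped before sampling.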

Frequency Penalty: Penalizes tokens that have already appeared in the output, typically in proportion to how many times they have appeared. The more often a word or phrase has been used, the less likely the model is to use it again, which reduces repetition.

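Presence Penalty: Encourages the model to introduce new topics or words by penalizing any token that has already appeared. The penalty is typically flat: it applies once a token has been used at all, regardless of how many times. This helps diversify the generated content and keeps the model from circling back to the same subjects.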
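
The difference between the two penalties is easier to see side by side. The sketch below uses a common formulation in which both penalties are subtracted from a token's score before sampling; whether this product applies exactly this formula is an assumption, and apply_penalties and the example values are hypothetical.

```python
from collections import Counter

def apply_penalties(logits, generated_tokens, frequency_penalty, presence_penalty):
    """Lower the scores of tokens that already appear in the generated text."""
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, score in logits.items():
        count = counts.get(token, 0)
        # Frequency penalty grows with every repeat of the token.
        score -= frequency_penalty * count
        # Presence penalty is applied once if the token has appeared at all.
        if count > 0:
            score -= presence_penalty
        adjusted[token] = score
    return adjusted

# Hypothetical scores and output so far (illustrative values only).
logits = {"cat": 2.0, "dog": 2.0, "bird": 2.0}
generated = ["cat", "cat", "cat", "dog"]

adjusted = apply_penalties(logits, generated, frequency_penalty=0.5, presence_penalty=0.4)
print({token: round(score, 2) for token, score in adjusted.items()})
```

Here "cat" is penalized most because it has appeared three times, "dog" takes a smaller hit for appearing once, and "bird" is untouched because it has not been used yet.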

Max Tokens: Specifies the maximum number of tokens (words or word pieces) the model can generate in a single response. This is a hard cap on length rather than a target: a response that reaches the limit is cut off, possibly mid-sentence, so set the value high enough for the answers you expect.
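
The sketch below shows how a token limit acts as a cutoff in a simplified generation loop. It is an illustration only: generate, the scripted token source, the "<end>" marker, and the "stop" and "length" labels are all assumptions for the example, not part of this product.

```python
def generate(next_token, max_tokens, stop_token="<end>"):
    """Collect tokens until the model stops on its own or the limit is reached."""
    output = []
    while len(output) < max_tokens:
        token = next_token(output)
        if token == stop_token:
            return output, "stop"  # the response finished naturally
        output.append(token)
    return output, "length"  # the response was cut off by the limit

# A scripted stand-in for a model, returning one token per call.
script = ["Settings", "control", "how", "the", "model", "writes", "<end>"]

def scripted_model(output_so_far):
    return script[len(output_so_far)]

short, short_reason = generate(scripted_model, max_tokens=3)
full, full_reason = generate(scripted_model, max_tokens=10)
print(short, short_reason)
print(full, full_reason)
```

With a limit of 3 the response stops after three tokens even though the sentence is unfinished; with a limit of 10 the response ends on its own before the limit is reached.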
