Please note: this article covers the Azure AI configuration options from Atlas version 6.0 onwards.
*See all Atlas AI & IKS articles here*
For a general overview of the Atlas AI Assistant please see the following article: Atlas AI Assistant
The AI Assistant is the front-end interface for the Atlas Intelligent Knowledge Studio (IKS).
Introduction
The Atlas AI Assistant is underpinned by technical infrastructure in the Azure back-end. This infrastructure can be configured to suit your needs: adjusting the AI back-end settings can positively impact the precision and quality of the model's responses, or the Azure consumption costs incurred when using the Atlas AI Assistant.
The AI Assistant infrastructure includes hidden settings, accessible only from Azure and only to those with the appropriate Azure permissions and roles. We do not recommend changing these settings without technical assistance from ClearPeople, or outside of discussions and recommendations with AI specialists or experienced AI technology owners and managers.
Please note this is advanced configuration:
- An understanding of the Microsoft AI infrastructure and GPT models is needed. This will be extremely helpful, if not essential, for understanding some of the metrics, values, terminology and phrases used, such as throughput, throttling, tokens and prompts. It is one thing to understand the technology, but another to comprehend how all of these factors impact the AI: response quality, cost and usability.
- The AI Assistant, IKS and the external OpenAI GPT models are updated regularly, as technology in this area is progressing and changing rapidly, so this article is subject to change.
- There is no single best or correct answer for these settings; they are flexible and depend on your individual use cases, needs and requirements. A setting may work well in some situations but not in others. We can advise on best practices, but ultimately any change needs to be made with agreement, testing, documentation and approval.
- ClearPeople's Azure Calculator can be used for additional guidance when assessing models, prompt costs and tokens.
- An important practice is estimating the price of indexing a Fact and the price of a Chunk. Altering these settings may have cost implications: the more users and use cases, the higher the cost could become if it is not properly planned and budgeted for.
CAVEAT: Some of these settings should only be considered and altered by an AI professional or someone who is experienced in implementing AI systems. If you aren't sure, don't alter anything.
Where to find the settings
IT admins with appropriate permissions can:
- Go to Microsoft Azure - https://portal.azure.com/
- Go to your Atlas Resource Group - rg-atlas-xxxxx-xxxx
- Select the App Configuration, where the Atlas AI configuration is stored - appcs-atlas-xxxx-xxxx
- From within the App Configuration, select 'Configuration Explorer' from the left-hand menu
Access may be limited here, with values not being shown.
There are two ways to access this page and see its contents:
1. Go to Access Settings and enable Access Keys.
Enabling the Access Keys will show them as in the screenshot below. Leaving them enabled is not recommended for security reasons, so as soon as access is no longer needed and the configuration tasks are complete, disable the access settings again.
2. Go to Access Control (IAM) > Role Assignments tab > use the 'Add' option to add yourself and relevant users to the 'App Configuration Data Owner' role to gain permanent access.
This role assignment can be traced by the audit functions in Azure.
Following either of the above options, the Configuration Explorer can now be seen. To edit a setting, select the appropriate Key and click 'Advanced Edit' at the top of the page.
The Advanced Edit option then presents a pop-out window on the right of the screen. Please note the only thing which should be changed is the 'Value' between the quotes.
Once applied, the change is saved and automatically picked up by the system; you just need to refresh. You do not need to restart the service.
If you want to force a refresh of the server-side settings cache, select the Key at the bottom of the list called System:RefreshSentinel and replace the last character with the next letter along in the alphabet. For example, in the screenshot below, the C at the end of the GUID would be replaced with a D.
The System:RefreshSentinel value is a random GUID used as a timestamp to ensure that when settings are changed, the next iteration is picked up. This happens automatically, as the GUID is refreshed by the Atlas code base once a setting has been altered, but it can also be changed manually to force a refresh if needed, as sketched below.
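For illustration, the sentinel can also be bumped programmatically. Below is a minimal sketch using the azure-appconfiguration Python SDK; it assumes Access Keys are enabled and that a connection string has been copied from the Access settings page (the connection string shown is a placeholder).

```python
# pip install azure-appconfiguration
import uuid

from azure.appconfiguration import AzureAppConfigurationClient

# Placeholder: copy the real connection string from the App Configuration's
# 'Access settings' page, and disable Access Keys again once finished.
connection_string = "<appcs-atlas-connection-string>"
client = AzureAppConfigurationClient.from_connection_string(connection_string)

# Read the current sentinel, then write a fresh GUID so the server-side
# settings cache picks up any values changed in this session.
sentinel = client.get_configuration_setting(key="System:RefreshSentinel")
print("Current sentinel:", sentinel.value)

sentinel.value = str(uuid.uuid4())
client.set_configuration_setting(sentinel)
print("New sentinel:", sentinel.value)
```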
Keys (Available Settings and Values)
All App Configuration keys related to the Atlas AI Assistant start with 'AI:'. You can filter the Configuration Explorer by Key 'Starts with' AI, as shown below.
Generic Keys for AI set-up (not AI Model Specific)
There are ten values which are generic to the Atlas AI set-up:
AI:AzureOpenAiOptions:OpenAiApiVersion
- This is for the Atlas code to identify the API version. This will be maintained and updated by ClearPeople as part of the Atlas release cycle.
- This should not be altered unless explicitly instructed by ClearPeople.
AI:MemoriesSearchService:IndexName
- Please ignore. This is only a backend variable and should not be altered.
AI:SystemPrompt
- Please ignore. This value backs the front-end UI: the system prompt is configured in the front end under AI Assistant Settings, but it is stored here as a value in the backend for the system.
AI:SystemPromptOptions:StrictGrounding
- Only applicable from 6.1; the strict grounding prompt is not available to edit in 6.0. In 6.1 you should edit this from within AI Assistant Settings, and the value you input is then stored here in the backend settings.
- It is not advisable to alter this prompt, as it may induce the AI to behave in a way that has either not been tested or is not supported.
- This is a global setting. When Strict Grounding is switched on as part of the settings for a Knowledge Collection (which is the default), the AI uses the prompt below OOTB. If you needed to edit the prompt for all KCs with 'Strict Grounding' switched on, you would update this value.
- The value is shown beneath, but may be updated in later versions of the Atlas AI:
You are an AI assistant integrated with a Retrieval-Augmented Generation (RAG) system. Your primary function is to provide accurate and reliable answers strictly grounded in the retrieved documents provided to you. The following principles must guide all your responses: 1. **Strict Grounding**: Only use the content from the retrieved documents to construct your responses. Do not incorporate external knowledge, assumptions, or fabricated details. This behavior applies to all interactions. Do not deviate from these guidelines unless explicitly instructed otherwise.
AI:SystemPromptOptions:NonStrictGrounding
- Only applicable from 6.1; the non-strict grounding prompt is not available to edit in 6.0. In 6.1 you should edit this from within AI Assistant Settings, and the value you input is then stored here in the backend settings.
- It is not advisable to alter this prompt, as it may induce the AI to behave in a way that has either not been tested or is not supported.
- This is a global setting. When Strict Grounding is switched off (non-strict grounding) as part of the settings for a Knowledge Collection, the AI uses the prompt below OOTB. If you needed to edit the prompt for KCs with 'Strict Grounding' switched off (which is not the default), you would update this value.
- The value is shown beneath, but may be updated in later versions of the Atlas AI:
You are an AI assistant integrated with a Retrieval-Augmented Generation (RAG) system. Your primary function is to provide accurate and reliable answers grounded in the retrieved documents provided to you. If the retrieved documents do not contain sufficient information, you may use your general knowledge to respond. The following principles must guide all your responses: 1. **Prioritize Retrieved Information**: Always prioritize using content from the retrieved documents to construct your responses. Use your general knowledge only when the retrieved documents do not provide enough information to answer the query. This behavior applies to all interactions. Always strive to provide the most accurate, user-focused responses possible.
AI:SystemPromptOptions:PostUserQueryPrompt
- Only applicable from 6.1; this prompt is not available to edit in 6.0. In 6.1 you should edit this from within AI Assistant Settings, and the value you input is then stored here in the backend settings.
- This is essentially a reminder to the model of the citation formatting. It is named the 'post user query prompt' because the AI applies it after the user's query as it compiles the answer; it is the last prompt the AI uses. A sketch of parsing the resulting citation markers follows the prompt value below.
- It is not advisable to alter this prompt, as it may induce the AI to behave in a way that has either not been tested or is not supported.
Use the provided documents only. Place inline citations at the end of each paragraph in the format '@@[Title](FullUrl##PartitionId)@@'.
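As an illustration of the citation format above, the sketch below extracts the Title, FullUrl and PartitionId from a response. The sample answer text is invented for the example.

```python
import re

# Matches inline citations of the form '@@[Title](FullUrl##PartitionId)@@'
CITATION = re.compile(r"@@\[(?P<title>[^\]]+)\]\((?P<url>.+?)##(?P<partition>[^)]+)\)@@")

# Invented sample answer, for illustration only
answer = (
    "Expenses must be submitted within 30 days. "
    "@@[Expenses Policy](https://contoso.sharepoint.com/sites/hr/Policy.docx##4)@@"
)

for match in CITATION.finditer(answer):
    print(match.group("title"), "->", match.group("url"), "| chunk:", match.group("partition"))
```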
AI:MemoriesSearchService:IKSKnowledgeCollectionIndexName
- Please ignore. This is only a backend variable and should not be altered.
- In the new UI for 6.1, this value is read-only.
AI:MaxSupportedLibrariesInTotal
- This is the maximum number of supported libraries per Knowledge Collection. It defaults to 10, which we feel is enough for 95% of client AI engagements; however, you can increase this number if needed, in coordination with advice provided by ClearPeople.
- We have raised this value to 1000 in our demo tenant, but this is a very large volume used for testing (we have 23+ KCs), on the basis that we have explored the appropriate options related to the KCs and the quantity of documents and information needed.
- We advise that you scale up slowly, assessing against use cases and the appropriate content.
- If you increase this value too quickly, it will be more difficult to assess any issues related to your content.
AI:IncludeSummariesWhenIndexingMemories
- This is a True or False setting, and is False by default. It is not recommended to alter this value unless you know exactly why you want to enable it. Switching it on will likely decrease the accuracy of answers, as Summaries miss granular details of the content.
- This is a setting for advanced scenarios. This field can be updated manually if you are having issues with the synchronization process.
- What does this field do? Put simply, this setting enables the indexing and summarization of each 'chunk' within the documentation indexed by the AI model, using the GPT-4o model. This gives your facts and your index more semantic consistency, and could improve the responses provided to prompts with less context or vaguer questions. But please note: this could become expensive. If the cost of indexing without Summaries is, say, £5, with Summaries it could be, for example, £200 (a back-of-envelope estimate is sketched after this list).
- If this setting is switched on and multiple documents of the same type are being indexed, summaries are created against each chunk of each document, which may weaken the semantic relationship between the chunks of the documents. That is, where different documents are created separately, different chunks of those documents exist and each gets its own summary. When comparing against and across data, the indexed summaries are leveraged instead of the document content itself, and each summary could vary slightly due to the nature of OpenAI.
- When this setting is set to True, there is an increased risk of issues with Synchronization, as it is a much heavier process involving two different models which both need to respond. It is therefore more likely you will hit the 60-minute limit when processing a big document; a document of 1,000 pages will likely fail if summarization is switched on.
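To make the cost implication concrete, here is a back-of-envelope sketch. Every figure in it is an illustrative assumption, not ClearPeople or Microsoft pricing; substitute your own chunk counts and the current Azure OpenAI rates (or use ClearPeople's Azure Calculator mentioned above).

```python
# Rough estimate of the one-off cost of enabling chunk summaries.
documents = 500             # documents in the Knowledge Collection (assumption)
chunks_per_document = 40    # chunks produced per document (assumption)
tokens_in_per_chunk = 800   # chunk text sent to the summarization model (assumption)
tokens_out_per_chunk = 150  # summary tokens returned per chunk (assumption)

# Illustrative GPT-4o-mini style rates per 1M tokens; check current Azure pricing
price_in_per_million = 0.15
price_out_per_million = 0.60

chunks = documents * chunks_per_document
cost = (chunks * tokens_in_per_chunk / 1_000_000) * price_in_per_million \
     + (chunks * tokens_out_per_chunk / 1_000_000) * price_out_per_million
print(f"{chunks} chunks, estimated one-off summarization cost: ${cost:.2f}")
```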
AI:AiAuditingWorkspace:Url
- This is the URL for the site that houses the Atlas AI Auditing. It should not be altered without clearance and approval from ClearPeople first, as changing it may break the auditing function.
- In 6.1 this value is read-only.
Model Specific Keys for AI set-up (Specific to each AI model)
The remaining values point to specific AI Model configuration, and will start with: AI:DeploymentModelConfigurations:
It will then be followed by a number, either 0 or 1. Any additional models that you choose to deploy and set up will receive the numbers 2, 3, and so on, in sequential order.
AI:DeploymentModelConfigurations:0:
The final segment of the key name is the specific setting:
AI:DeploymentModelConfigurations:0:MaxCharactersAIAssistantUserInput
Please be aware that this is technical configuration: the choices made here will impact the AI models in a variety of ways depending on which setting is altered, and can therefore affect multiple Knowledge Collections (KCs). We advise that these settings are only considered once all other options have been discussed, all alternatives have been tested and validated, and changes are made with documented/written approval and understanding.
Please see the below for an explanation on each setting.
Deployment Model Name
- This is the technical name of the selected model. The OOTB models have an appropriate naming convention which we advise is not altered; however, if you create any new or custom models, you can alter the value here.
Disabled
- This is a True/False value. All models should be set to False by default so that they are available when selecting a model within a KC. However, if you want to disable a model so it does not show to IKS Administrators or KC Creators, without deleting the actual model from Azure, set this to True.
Display Name
- This is the user-friendly name of the selected model which displays as a label on the front end when creating or altering a KC.
- The OOTB models have an appropriate naming convention so it's understandable to users, however if you create any new or custom models, you can alter the value accordingly.
- Best practices should be followed here: our OOTB names use multilingual keys, so any new name set up for our OOTB models will not be multilingual.
ID
- This is the model GUID set up as part of Azure OpenAI. The OOTB GUIDs should not be altered - this is a technical setting.
- Any new models created will need a new random GUID. The only condition is that it is unique (GUIDs should always be globally unique). If you are using numerous models and need a naming convention, use prefixes and ensure the remaining characters are unique, as sketched below.
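A new unique ID can be generated as below. The 'atlas-custom' prefix is only a hypothetical naming convention for readability; the hard requirement is uniqueness.

```python
import uuid

# Any new model entry needs an ID that is globally unique. The
# 'atlas-custom' prefix is a hypothetical convention, not an Atlas requirement.
model_id = f"atlas-custom-{uuid.uuid4()}"
print(model_id)  # e.g. atlas-custom-1f0e2f7a-...
```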
Is Default For Memory Index Summaries
- This is a True/False field. If you generate Summaries in the indexing infrastructure, the Summaries must default to exactly one GPT model. It is this model that indexes the summaries, so there is a cost implication if a more expensive model is set to True instead.
- OOTB, our GPT-4o-mini is set to True, as it is the most cost-efficient model available as of 6.0 and 6.1; however, this may change in the future.
- When new GPT models are released as part of future versions of Atlas and you need to alter the default model for Memory Index Summaries, you will need to complete this yourself. However, altering this value to another model is not yet advised unless there is a particular need and requirement which has been discussed and validated with ClearPeople.
- New models might not be a better option. As of writing, we have found the newest model, GPT-5, to be around 30% more erroneous when responding, even though it is cheaper (OpenAI is rushing new models out to meet global competition).
- GPT-4o (standard, not mini) can be used for advanced scenarios, but we strongly advise that you understand what you are doing and why, as there are large cost implications when altering this value from 4o-mini to 4o: we have found 4o to be 10x more expensive. 4o-mini is the optimum value and is generally a faster model than 4o. It is slightly less accurate, but that is acceptable for Summaries, as they are only summaries, not facts or chunks.
Max Characters AI Assistant User Input
- This is set to 16,000 by default. This should be enough characters to ask questions (roughly 6 pages of words) and we do not expect the number to need to be increased. However, if you have a specific use case where you might, for example, be pasting 8-page documents into the AI Assistant, you may need to increase this value to 22,000.
- If you have a lower budget and need users to stick to shorter questions, you can reduce this number to limit the number of characters that can be input into the AI Assistant.
Max Input Tokens Custom Limit
- This is set to 128,000 by default, which is the maximum number of tokens available, so this limit cannot currently be increased in any way. The custom limit is for when you want a custom cap lower than 128,000; reducing this value would only be done for budget reasons. A cap of, for example, 20,000 tokens per user question makes it more likely that the AI response will hit the limit and start missing context or key facts (as it has less token input to work with).
- 128,000 tokens is roughly 500,000 characters. This is the total of everything: system prompt, documents retrieved, user question, responses compiled and chat history. A token-counting sketch is shown below.
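For a feel of how tokens map to characters, the sketch below counts tokens with the tiktoken library; 'o200k_base' is the encoding used by the GPT-4o family (see the Model Encoding key further down). The sample question is invented for illustration.

```python
# pip install tiktoken
import tiktoken

# 'o200k_base' is the tokenizer encoding used by GPT-4o and GPT-4o-mini
enc = tiktoken.get_encoding("o200k_base")

question = "What is our travel and expenses policy for contractors?"
tokens = enc.encode(question)
print(f"{len(question)} characters -> {len(tokens)} tokens")
```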
Max Input Tokens Supported By Azure Open AI Chat Model
- This is set to 128,000 by default and is not a setting that should be altered. This value should match the maximum number of tokens that can be input into the AI Chat Model. We hold this as a value in the back-end because we cannot fetch it dynamically from the model itself.
- If another model becomes available with a limit higher than 128,000, you should ensure that this value matches; ClearPeople will be able to advise when this might be needed as part of our services.
Max Monthly Request Per User
- This is the maximum number of requests a single user can make. It is the number that appears within the AI Assistant UI (shown in the screenshot below). It decreases every time you select a new GPT model to interact with (selecting a different KC with a different model counts as 1x request, even when you haven't yet asked anything) and by 1 every time you ask a question. In the screenshot below, 3 previous requests have been made. Even asking the AI Assistant 'Hi' will use 1x request.
- We are not able to express this value in tokens, because that would be too complex to work out, to predict, and for a user to understand.
- This will reset on the first of every calendar month.
- It is 200 by default. If you expect more than 200 requests per month from the average user, please increase this figure. We have our internal environment set at 500 for testing. If you would like more control over the maximum number of monthly requests a user can make, you can reduce this value, but please be aware it is a global setting for every user. The ability to set different Max Monthly Requests for different users may come in the future.
- This setting can be driven by your budgetary and governance considerations. It is fine to have it higher for testing and validation, and to reduce it once end-users are leveraging this technology, on the understanding that they are aware of what this number means.
Max Output Tokens Supported By Azure Open AI Model
- This represents the maximum number of tokens the model can return in a single response, i.e. the size of the compiled answer (the input of chat history, facts, question and system prompts is governed by the input limits above).
- Defaulted to 4,096 tokens per request for GPT-4o and 16,384 for GPT-4o-mini.
- This is per individual request, and we do not expect this limit to be reached unless someone is purposefully seeking to hit the token limit with very large and complicated scenarios. If multiple users asked the same large question at the same time, throttling may occur.
- Model token limits can change, and new models may use a different maximum token quantity. Please get in touch if you want to alter this setting for your models, so we can provide advice.
Max Requests To Include In Chat History
- This is the limit on the number of messages in the Chat History which are kept in the chat and presented back to the user who asked them. When you open the Atlas AI Assistant, by default it will present your recent chat history. You can clear the history manually from within the AI Assistant chat itself.
- This is 20 by default, meaning your model will include up to 20 previous questions in an individual KC chat. If you ask a 21st question in the same KC chat history, the very first question you asked will be ignored.
- You may be able to see more than 20 questions in one chat history in one session, but if you go back to the same chat at a later date it will be limited to 20.
- This is very important, as your question will take into account the recent chat history in the chat window, and the response will alter accordingly. The recommended best practice for chat history is to clear your history on every new topic.
- This value is the number of messages included in the chat history and disregards the length of the questions themselves. If you wanted to alter the length of the messages rather than the quantity, please see AI:MaxTokensAllowedInChatHistory below.
Max Tokens Allowed in Chat History
- This follows the same logic as 'Max Requests To Include In Chat History' above, but is based on Tokens. You can limit the chat history either by number of tokens or by number of requests (which disregards token count).
- In simple terms, Tokens measure and manage the efficiency of the model when processing characters (the characters which make up the question asked to the AI Assistant). A token is approximately 4 characters of text, ranging between 2 and 6 depending on the model. The bigger the model, the more tokens in its 'dictionary' and the more processing per response, but also the fewer tokens used when the AI responds.
- This is a more technical and granular method of limiting the amount of tokens used per chat.
- There is a maximum number of input tokens per minute (depending on the subscription quota for your deployment model), which typically starts at 150k for GPT-4o Standard deployments. We have set the limit here in the Atlas AI Assistant to 120k by default, because the per-minute usage limit on models can cause throttling (this avoids pushing through 2 questions which together exceed the quota in the same minute).
- The model output is the maximum size of response the model can give you: it can respond with up to 4,096 tokens, which is roughly 16,000 characters, but this number will vary depending on which model you use.
- We expect these throughput quotas to increase over time as Microsoft's limits for AI infrastructure increase, and higher quotas should become available depending on your Azure contract. A sketch of token-based history trimming is shown below.
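To illustrate the kind of trimming this setting implies, below is a minimal sketch of limiting chat history by a token budget. The helper names, message structure and budget are our own illustrative assumptions, not the Atlas implementation; count_tokens could use tiktoken as shown earlier.

```python
from typing import Callable

def trim_history(messages: list[str],
                 budget: int,
                 count_tokens: Callable[[str], int]) -> list[str]:
    """Keep the most recent messages whose combined token count fits the budget."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):   # walk newest first
        cost = count_tokens(message)
        if used + cost > budget:
            break                        # older messages are dropped
        kept.append(message)
        used += cost
    return list(reversed(kept))          # restore chronological order

# Example with a crude ~4-characters-per-token approximation
history = [
    "First question about the policy...",
    "Second question...",
    "A very long third question " * 50,
]
print(trim_history(history, budget=340, count_tokens=lambda s: max(1, len(s) // 4)))
```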
Max Throughput Tokens
- This throughput is the maximum number of tokens that can be processed concurrently by a model within 60 seconds (1 minute), across the entire user base. This impacts the concurrency of the system for multiple users.
- By default it is set to 150,000 tokens per model, which is the maximum currently allowed for the OOTB OpenAI models (see screenshot below). This is the recommended starting point for a PoC/Pilot. The real throughput requirement needs to be assessed, depending on the number of users and the amount of content.
- You can request higher throughput from Microsoft if you expect thousands of users to be utilizing AI models throughout the day or at a specific time of day.
Max Tokens Allowed In Memories
- This is the maximum number of tokens retrieved from indexed documents and data.
- By default it is set to 16,000. This can be increased or reduced as necessary to improve performance and accuracy, depending on your use case.
Model Encoding
- Please ignore. This should not be changed.
- The encoding is used to calculate the number of tokens the model consumes. There is a specific encoding related to each type of model, so altering the encoding of a model may break it.
Temperature
- Temperature controls how random or deterministic the AI model’s responses are.
- By default this is set to 0.7.
- Please don't touch this if you do not have a real need to alter it, or do not have trusted advice from an AI consultant/engineer. The reason we suggest not altering this is that it is a per-model setting and will impact multiple KCs.
- This parameter must be between 0 and 1. Lower values make the model's responses more deterministic and repeatable, favouring factual precision; higher values make them more varied and creative.
- This is important, as a poorly chosen value could give you responses that are either too rigid or insufficiently accurate for your use cases. A balance between response quality and consistency needs to be considered here to find a sweet spot.
- Please see the section on Temperature and Top P below.
Tooltip Key
- The tooltip is the information made available to the KC Creator/Editor when selecting a Model.
- In 6.0 and 6.1, the default models (GPT-4o and GPT-4o mini) have values such as GPT-4o-default-description, which fetches the description from the model itself.
- However, as you can see from the third 'custom' model set up in this tenant, the tooltip has been written directly into this value.
Top P
- This is 0.95 by default and is similar to Temperature: both parameters tweak the variability and randomness of the AI model's response. If responses need to be less consistent/more creative, this will tweak the randomness of the response the model provides. However, this is a very technical setting and should not be altered unless you have trusted advice from an experienced AI consultant/engineer, especially for the Atlas OOTB GPT models. Please see the section below for more information.
Top P & Temperature
Only one of the two should be altered, not both. An explanation from ChatGPT of what Temperature and 'Top P' mean can be found below:
"When generating responses with AI models, two parameters influence the creativity and precision of the answers: Temperature and Top P. Both affect the randomness and variability of the model’s output, but they operate slightly differently.
Temperature
- What is it? Temperature controls how random or deterministic the AI model’s responses are.
- How does it work?
- Low Temperature (e.g., 0.1–0.3): Results in conservative and predictable answers, suitable for precise, factual, or technical responses.
- Medium Temperature (e.g., 0.4–0.7): Balances creativity and consistency, appropriate for general use.
- High Temperature (e.g., 0.8–1.0): Produces more diverse and creative answers but with higher risk of less accuracy.
Top P (also known as Nucleus Sampling)
- What is it? Top P controls randomness by selecting from the smallest possible set of likely next words whose cumulative probability exceeds a certain threshold (the "P" value).
- How does it work?
- Low Top P (e.g., 0.1–0.3): Limits responses strictly to the most probable tokens, ensuring high predictability and factual correctness.
- Medium Top P (e.g., 0.4–0.7): Balances creativity with reliability.
- High Top P (e.g., 0.8–1.0): Allows a wider range of possible answers, increasing diversity but potentially reducing precision.
When to use each?
- Use low Temperature or Top P for tasks that require factual accuracy, such as technical support, legal or regulatory compliance, or documentation.
- Use high Temperature or Top P when seeking creative ideas, brainstorming, or when generating more conversational and varied responses.
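For reference, both parameters are passed per request to the underlying chat model. The sketch below uses the openai Python SDK against Azure OpenAI; the endpoint, key, API version and deployment name are placeholders, and the values shown are the Atlas defaults (remember: alter one of temperature or top_p, not both).

```python
# pip install openai
from openai import AzureOpenAI

# Placeholders: use the values from your own Azure OpenAI resource
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # your deployment name
    messages=[{"role": "user", "content": "Summarize our leave policy."}],
    temperature=0.7,      # Atlas default
    top_p=0.95,           # Atlas default; adjust one of the two, not both
    max_tokens=1000,
)
print(response.choices[0].message.content)
```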
Please note this is a technical topic and not everyone at ClearPeople is a technical AI expert, so if you have any questions please reach out to your Atlas Representative, who will put you in touch with our Technical Team if necessary so they can provide more insight and accurate answers.