Introduction
The Atlas AI Assistant relies on technical infrastructure in the Azure back end which can be configured to suit your needs. Adjusting these back-end settings can improve the precision and quality of the model's responses, or reduce the Azure consumption costs incurred when using the Atlas AI Assistant.
The AI Assistant infrastructure exposes hidden settings, accessible only from Azure and only to those with the appropriate Azure permissions and roles. We do not recommend changing these settings without technical assistance from ClearPeople, or outside of discussions and recommendations with AI specialists or experienced technology owners.
Please be aware that this article does not cover the front-end UI 'basic' settings for the AI Assistant, which can be controlled by Atlas AI Admin users with appropriate permissions. For those settings, please see the separate article.
Please note this is advanced configuration:
- An understanding of Microsoft's AI infrastructure and GPT models will be helpful, if not necessary, for understanding some of the metrics, values, terminology and phrases used, such as throughput, throttling, tokens and prompts.
- The AI Assistant is updated regularly, as technology in this area of AI is progressing and changing rapidly, so this article is liable to change often.
- ClearPeople's Azure Calculator can be used for additional guidance when assessing models, prompt costs and tokens.
- An important practice will be estimating the price of indexing a Fact, and the price of a Chunk
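As a rough guide, chunk and Fact indexing costs can be estimated from character counts. The sketch below is a back-of-envelope Python estimate only; the 4-characters-per-token ratio and the per-1K-token price are illustrative assumptions, not Atlas or Azure values (use ClearPeople's Azure Calculator for real figures).

```python
# Back-of-envelope indexing cost estimate. The chars-per-token ratio and
# per-1K-token price are illustrative assumptions, not Atlas/Azure values.

CHARS_PER_TOKEN = 4            # rough GPT-family average (2-6 per model)
PRICE_PER_1K_TOKENS = 0.0004   # hypothetical price in £ per 1,000 tokens

def estimate_chunk_cost(chunk_text: str) -> float:
    """Estimate the indexing cost of a single chunk, in £."""
    tokens = len(chunk_text) / CHARS_PER_TOKEN
    return tokens / 1000 * PRICE_PER_1K_TOKENS

def estimate_fact_cost(chunks: list[str]) -> float:
    """A Fact is made up of chunks; sum the per-chunk estimates."""
    return sum(estimate_chunk_cost(c) for c in chunks)
```

For example, a 4,000-character chunk is roughly 1,000 tokens, so at the assumed rate it would cost about £0.0004 to index.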
**Please note this article applies to Atlas v5.3 and 5.4 only. Atlas 6.x will have a new article when available**
Where to find the settings
IT admins with appropriate permissions can:
- Go to Microsoft Azure - https://portal.azure.com/
- Go to your Atlas Resource Group - rg-atlas-xxxxx-xxxx
- Select the App Configuration, where the Atlas AI configuration is stored - appcs-atlas-xxxx-xxxx
- From within the App Configuration, from the left go to 'Configuration Explorer'
Access may be limited here, with values not being shown.
There are two ways to access this page and see its contents:
1. Go to Access Settings and enable the Access Keys. Enabling the Access Keys will show them as shown below; however, leaving them enabled is not recommended for security reasons, so as soon as access is opened and the configuration tasks are complete, disable the access settings again.
2. Navigate to Access Control (IAM) > Role Assignments tab, then use the 'Add' option to add yourself and the relevant users to the 'App Configuration Data Owner' role to gain permanent access.
This role assignment can be traced by the audit functions in Azure.
Now, following one of the above options, the Configuration Explorer can be seen. To edit a setting, select the appropriate Key and click 'Advanced Edit' at the top of the page.
The Advanced Edit option then presents a pop-out window on the right of the screen. Please note the only thing which should be changed is the 'Value' between the quotes.
Once applied, the change will be saved and automatically picked up by the system - you just need to refresh. You do not need to restart the service.
If you want to force a refresh of the settings cache behind the scenes on the server side, select the Key at the bottom called System:RefreshSentinel and replace its last character with the next letter along in the alphabet. For example, if the GUID below ends in a C, the C should be replaced with a D.
This System:RefreshSentinel is a random GUID used as a timestamp to ensure that when settings are changed the next iteration is picked up. This happens automatically, as the GUID is refreshed by the Atlas code base once a setting has been altered, but it can be altered manually to force a refresh if needed.
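The manual "next character" trick can be sketched in Python. This is a hypothetical helper, not Atlas code; it treats the last character of the sentinel as a hex digit and wraps around at the end of the range so the value always changes:

```python
def bump_sentinel(sentinel: str) -> str:
    """Replace the last character of the sentinel GUID with the next
    character along, so App Configuration sees the value as changed.
    Wraps 9 -> a and f -> 0 to stay within hex digits."""
    last = sentinel[-1]
    nxt = {"9": "a", "f": "0", "F": "0"}.get(last, chr(ord(last) + 1))
    return sentinel[:-1] + nxt
```

So a sentinel ending in C becomes one ending in D, matching the manual procedure described above.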
Keys
All App Configuration keys related to the Atlas AI Assistant start with 'AI:'. In the future ClearPeople will be writing more App Configuration Keys not related to AI. You can filter the Configuration Explorer to be Key 'Starts with' AI, as shown below.
AI:AzureOpenAiOptions:OpenAiApiVersion
- This is for the Atlas code to identify the API version. This will be maintained and updated by ClearPeople as part of the Atlas release cycle.
- You should not change this manually unless given instruction from ClearPeople.
AI:GPTDeploymentName
- This is simply the name used to identify the model deployment
- This should not be changed
AI:IincludeSummariesWhenIndexingMemories
- This is a setting for advanced scenarios. This field can be updated manually if you are having issues with the synchronization process. This is set to False by default.
- What does this field do? Put simply, this setting enables the summarization and indexing of each 'chunk' within the documents indexed by the AI model, using the GPT-4o model. This gives more semantic consistency to your facts and your index, and could improve the responses provided when prompted with less context or vaguer questions. Please note, however, that this is very expensive: if the cost of indexing without summaries is, say, £5, with summaries it could be, for example, £200.
- If this setting is switched on and multiple documents of the same type are being indexed, a summary will be created for each chunk of each document, which may weaken the semantic relationship between the chunks of those documents. In other words, because different documents are created separately, each document's chunks will each get their own summary. When comparing against and across data, the indexed summaries are leveraged instead of the document content itself, and each summary may vary slightly due to the non-deterministic nature of OpenAI models, so summaries may differ between documents.
- When this setting is set to True, there is an increased risk of synchronization issues, as this is a much heavier process involving two different models which each need to respond. It is therefore more likely that you will reach the 60-minute limit for processing a large document; for example, a document of 1,000 pages will likely fail if summarization is switched on.
AI:KnowledgeSync:ListsToSync
- This is the back end of the UI setting which contains every knowledge library marked as in scope for indexing. There is no need to change it here; this is simply where the setting is held in the back end.
- If this setting needs to be changed, please alter the field from the Atlas Settings > AI Assistant > Library URLs
AI:MaxMonthlyRequestsPerUser
- This is the back end of the UI setting which contains the maximum number of requests per user
- There is no need to change it here; this is simply where the setting is held in the back end. If this setting needs to be changed, please alter the field from Atlas Settings > AI Assistant > Request limits per user.
AI:MaxRequestsToIncludeInChatHistory
- This is the limit on the number of messages in the Chat History which are kept in the chat and presented back to the user who asked them. When you open the Atlas AI Assistant, by default it will present your recent chat history. You can clear the history manually from within the AI Assistant chat itself.
- This is very important, as your question will take into account the recent chat history in the chat window and the response will alter accordingly. The recommended best practice for chat history is to clear your history on every new topic.
- This value is the number of messages included in the chat history and disregards the length of the messages themselves. If you want to limit the length of the messages rather than their quantity, please see AI:MaxTokensAllowedInChatHistory below.
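The behaviour of a count-based history limit can be illustrated with a minimal Python sketch. This is a hypothetical helper, not the Atlas implementation: it keeps only the most recent N messages, regardless of how long each message is.

```python
def trim_history(messages: list[str], max_requests: int) -> list[str]:
    """Keep only the most recent messages, regardless of their length,
    mirroring how a count-based chat history limit behaves."""
    return messages[-max_requests:] if max_requests > 0 else []
```

For example, with a limit of 2, a history of four messages is trimmed to the last two, even if those two are very long and the dropped ones were short.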
AI:MaxSizeInMBToStoreFilesInABatch
- Deprecated in 5.3 GA versions and onwards.
AI:MaxTokensAllowedInChatHistory
- This is related to Tokens. In simple terms, tokens measure and manage the efficiency of the model when processing characters (the characters which make up the question asked to the AI Assistant). A token is approximately 4 characters of text - between 2 and 6 depending on the model. The bigger the model, the more tokens in its 'dictionary' and the more processing per response, but also the fewer tokens used when the AI responds.
- The limit on the tokens from the Chat History presented back to the user is controlled here.
- There is a maximum number of input tokens per minute (depending on your subscription quota for your deployment model), which typically starts at 150k for GPT-4o Standard deployments. The limit in the Atlas AI Assistant is therefore set to 120k by default, as the per-minute model usage limit can trigger throttling (this avoids pushing two questions totalling 120k through in the same minute).
- The model output is the maximum size of response the model can give: it can respond with up to 4,096 tokens, which is roughly 16,000 characters, although this number will vary depending on which model you use.
- We expect these throughput quotas to be increased over time as Microsoft's limits for AI infrastructure increase, and higher quotas will be available depending on your Azure contract.
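For illustration, a token-based history limit can be pictured as dropping the oldest messages until the estimated total fits the budget. The Python sketch below is an assumption-laden illustration, not Atlas code: both the helper and the characters-per-token approximation are made up for the example.

```python
def trim_history_by_tokens(messages: list[str],
                           max_tokens: int = 120_000,
                           chars_per_token: int = 4) -> list[str]:
    """Drop the oldest messages until the estimated token total fits
    the budget. Token counts are approximated as characters / 4."""
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):               # walk newest-first
        cost = max(1, len(msg) // chars_per_token)
        if total + cost > max_tokens:
            break                                # budget exhausted
        kept.append(msg)
        total += cost
    return list(reversed(kept))                  # restore chronological order
```

Unlike a count-based limit, one very long message here can crowd out many short earlier ones, which is why the two limits exist side by side.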
AI:MaxTokensModelOutput
- The maximum size of response the model can give, measured in tokens
- By default this is set to 4,096 tokens, the per-model limit, roughly equivalent to 16,000 characters. Reducing the number of tokens will constrain the model's responses to be smaller, with a hard limit set by this setting.
AI:MaxTokensSupportedByAzureOpenAiChatModel
- This represents the maximum number of tokens which can be pushed to the model in a single request, combining chat history, facts, the question and the system prompts.
- Defaulted to 120,000 tokens per request.
- This is per individual request and we do not expect this limit to be reached unless you are purposefully seeking to hit the token limit by asking about very large and complicated scenarios. If multiple users asked the same large question, throttling may occur.
- Model token limits can change and new models may use a different maximum token quantity
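A simple way to reason about this per-request limit is to sum the token counts of each component of the request. The Python check below is purely illustrative, and the component figures in the assertions are invented for the example:

```python
def fits_request_budget(chat_history: int, facts: int, question: int,
                        system_prompt: int, limit: int = 120_000) -> bool:
    """Check whether the combined token counts of all request components
    (chat history + facts + question + system prompts) fit the limit."""
    return chat_history + facts + question + system_prompt <= limit
```

A request only fails this check when the pieces together exceed the budget, which in practice takes a deliberately enormous chat history or fact set.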
AI:MemoriesSearchService:IndexName
- Please ignore. This is only a backend variable
AI:MinMemoryRelevanceScore
- This parameter must be between 0 and 1 and determines, from the Search, how the model interacts with the index and which items will be picked up as Facts based on relevancy. It can be seen as the minimum relevance required to consider a search response as a fact.
- This is important: set too low, this value could pull in weakly relevant facts, making it a very costly way to answer simple AI questions; set too high, it could leave you with poor responses or no responses at all. A balance between response quality and tokens used needs to be found here, as the relevance score affects both the quality of responses and Azure consumption costs.
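The minimum-relevance setting can be pictured as a simple threshold over scored search results. This Python sketch is illustrative only; the 0.7 threshold is an example value, not the Atlas default.

```python
def select_facts(results: list[tuple[str, float]],
                 min_score: float = 0.7) -> list[str]:
    """Keep only search results whose relevance score meets the minimum.
    Scores, like the setting itself, are assumed to be between 0 and 1."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0 and 1")
    return [text for text, score in results if score >= min_score]
```

Raising the threshold shrinks the fact list (fewer tokens, cheaper requests, but possibly no facts at all); lowering it grows the list and the token bill.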
AI:SystemPrompt
- Please ignore. This is the backend of our front-end UI
AI:TopMemoriesReturnedByAzureSearch
- These are the top memories (search results) picked up by the model - 50 by default, up to a maximum of 100 to avoid performance issues.
- In other words, this is the maximum number of chunks the model will use. The top 50 results are retrieved as the first step, then the model checks their relevancy, and only memories above the relevance score are returned.
- So even if all the results are relevant, you can never have more than 50 facts included.
- This is important when calculating costs, as you can assume the maximum fact cost of a query will be (number of facts x 1,100 tokens maximum per fact).
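That worst-case arithmetic can be written out as a small Python helper. The per-1K-token price used here is a placeholder for illustration, not a real Azure rate, and the defaults simply echo the figures above (50 facts at up to 1,100 tokens each).

```python
def max_query_cost(num_facts: int = 50,
                   tokens_per_fact: int = 1_100,
                   price_per_1k_tokens: float = 0.0025) -> float:
    """Worst-case fact cost of one query: every returned fact at its
    maximum token size, priced per 1,000 tokens (placeholder rate)."""
    return num_facts * tokens_per_fact / 1000 * price_per_1k_tokens
```

With the defaults, the ceiling is 50 x 1,100 = 55,000 tokens per query, so the cost scales linearly with whatever per-token price your Azure contract sets.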
System:RefreshSentinel
- This System:RefreshSentinel is a random GUID used as a timestamp to ensure that when settings are changed the next iteration is picked up. This happens automatically by code, but it can be altered manually if needed. The GUID is refreshed automatically from the Atlas code base once a setting has been altered.
- As discussed earlier in this article, if you want to force a refresh of the settings cache behind the scenes on the server side, select the Key at the bottom called System:RefreshSentinel and replace its last character with the next letter along in the alphabet. For example, if the GUID below ends in a C, the C should be replaced with a D.