We have noticed that when some customers create the new Deployment Models used for Atlas AI, the initial Deployment Quota for gpt-4o and gpt-4o-mini is very low, at 8K.
We have also seen customers whose Deployment Quota for gpt-4o and gpt-4o-mini was set to the recommended Atlas TPM (Tokens Per Minute) of 150K and 300K respectively have their total quota reduced to 8K. This doesn't show up on the above image, but if we click on Quota in the left panel, we will see the Quota showing values like 150K of 8K TPM. This issue has been reported in a recent Microsoft Q&A thread: Quota for Azure OpenAI Service Model Changed to 8k Without Notice - Microsoft Q&A
If you are facing the issue described above, the solution is to request a Quota Increase from Microsoft for each of the Deployment Models you require. Unlike other Azure Quotas, this request is managed directly from the Azure AI Foundry Portal in your Atlas Azure OpenAI Resource.
Request Quota Increase
- Access your Atlas Resource Group and locate the OpenAI Resource that you have already created. Instructions on how to create it can be found in a separate article. Click on the Explore Azure AI Foundry Portal button.
NOTE:
Resource Group Name (if Atlas Naming convention is followed): rg-atlas-xxxx-yyyy where xxxx and yyyy are your chosen Client Acronym and Environment Name respectively.
OpenAI Resource Name (if Atlas Naming convention is followed): cog-atlas-oai-xxxx-yyyy where xxxx and yyyy are your chosen Client Acronym and Environment Name respectively.
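As a quick illustration of the naming convention above, the following minimal Python sketch builds both names from a Client Acronym and an Environment Name. The function name `atlas_resource_names` and the sample values `acme` and `prod` are hypothetical and only used for illustration.

```python
def atlas_resource_names(client_acronym: str, environment: str) -> dict[str, str]:
    """Build the Resource Group and OpenAI Resource names from the Atlas naming convention."""
    # xxxx = Client Acronym, yyyy = Environment Name (see the NOTE above).
    suffix = f"{client_acronym.strip()}-{environment.strip()}"
    return {
        "resource_group": f"rg-atlas-{suffix}",
        "openai_resource": f"cog-atlas-oai-{suffix}",
    }


# Hypothetical example values, not taken from a real environment.
names = atlas_resource_names("acme", "prod")
print(names["resource_group"])   # rg-atlas-acme-prod
print(names["openai_resource"])  # cog-atlas-oai-acme-prod
```

If your organization did not follow the Atlas naming convention, use the actual names shown in the Azure Portal instead.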
- From the Azure AI Foundry Portal, you can access Deployments and see the list of Deployments you have created. From the listing you may be able to identify the assigned capacity for each.
However, to confirm the assigned quota for each Deployment Model, click on Quota. You can expand the hierarchy until you can see each Deployment Model and its Quota Allocation. You will see that the Quota restrictions are normally applied to gpt-4o and gpt-4o-mini (the image below shows 8K). The Text-Embedding models may not have these restrictions, but if they do, you will need to request a Quota increase for them as well.
NOTE: The image below shows the Text-Embedding Model as Text-Embedding-Ada-002, which is an Atlas-compatible model. A newer Text-Embedding Model compatible with Atlas AI is Text-Embedding-3-Large.
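To make the reduced-quota symptom easier to recognize, here is a minimal Python sketch that reads a value such as "150K of 8K TPM" and reports whether the allocated amount exceeds the quota limit. It assumes the Quota page text follows exactly that "allocated of limit TPM" pattern; the function names `parse_tokens` and `is_over_allocated` are hypothetical.

```python
import re

# Multipliers for the K / M suffixes used in the Quota values.
_UNITS = {"": 1, "K": 1_000, "M": 1_000_000}


def parse_tokens(value: str) -> int:
    """Convert a value such as '150K' or '2M' into a number of tokens."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KM]?)\s*", value, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Unrecognized token value: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])


def is_over_allocated(quota_text: str) -> bool:
    """Return True when the allocated TPM is higher than the quota limit."""
    # Assumed display format: '<allocated> of <limit> TPM', e.g. '150K of 8K TPM'.
    match = re.fullmatch(r"\s*(\S+)\s+of\s+(\S+)\s+TPM\s*", quota_text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Unrecognized quota text: {quota_text!r}")
    allocated, limit = (parse_tokens(part) for part in match.groups())
    return allocated > limit


print(is_over_allocated("150K of 8K TPM"))  # True: a Quota Increase request is needed
```

A result of True corresponds to the situation described in this article, where the deployment was set to the recommended TPM but the total quota was later reduced.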
- To increase the assigned Quota, click the Request Quota button (above image) and fill in the Request Form that appears. The first set of form fields is self-explanatory and includes the full name of the person requesting the Quota, and the company's name and address.
IMPORTANT: A form request will have to be submitted for each Deployment Model that requires additional Quota.
You will then have a few other questions to answer regarding the Quota Request.
Subscription Id: The GUID of the Subscription where your Atlas OpenAI Resource is located. This is usually the same subscription where your Atlas Resource is located.
Justification: A short text explaining why you are requesting the Quota increase and what your usage needs are.
Quota Request Type: Here you will select the Quota Request Type for each model. More information on each type can be found here: Understanding Azure OpenAI Service deployment types - Azure AI services | Microsoft Learn.
Once you select a request type the form will extend with a few other questions. For Atlas Deployment Models you will normally be working with Global Standard and Standard.
Whether you select Global Standard or Standard for the Deployment Model, the following question will be asked: "What Region is the Deployment Model deployed". Select the Azure Region where you created your Deployment Model.
REMEMBER: This will also be dependent on the Atlas AI Compatible Models Region Availability. Please refer to the Knowledge Base Article: Compatible regions for the Azure Open AI service (Atlas 6.0)
You will then be given a list of Global Standard Models from which to select.
REMEMBER: You will be selecting from the Atlas Compatible Models (gpt-4o, gpt-4o-mini).
NOTE: You can't request Quota for both Models in a single Quota Request Form, so you will have to submit one Form Request for each.
Finally, you will be asked to enter your estimate of the Quota needed. The Quota value to enter may vary depending on the Model requested. Please read the description below, and for more information review the following link: Azure OpenAI Service quotas and limits - Azure AI services | Microsoft Learn.
NOTE: For Atlas AI, the default Quota values to request are:
gpt-4o : 450 K
gpt-4o-mini: 2 M (2000 K)
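The following minimal Python sketch shows one way to work out which models still need a Quota Request, using the default values from the NOTE above (in thousands). The current quota values are sample data, and the names `ATLAS_DEFAULT_QUOTA_K` and `models_needing_request` are hypothetical.

```python
# Default Quota values to request for Atlas AI, in thousands (K), from the NOTE above.
ATLAS_DEFAULT_QUOTA_K = {
    "gpt-4o": 450,
    "gpt-4o-mini": 2000,
}


def models_needing_request(current_quota_k: dict[str, int]) -> list[str]:
    """Return the models whose current quota is below the Atlas default."""
    return [
        model
        for model, target in ATLAS_DEFAULT_QUOTA_K.items()
        if current_quota_k.get(model, 0) < target
    ]


# Sample data: both models were reduced to 8K, as described in this article.
current = {"gpt-4o": 8, "gpt-4o-mini": 8}

# One separate Form Request is needed per model.
for model in models_needing_request(current):
    print(f"Submit a request for {model}: {ATLAS_DEFAULT_QUOTA_K[model]}K")
```

Each model returned by the sketch corresponds to one separate Quota Request Form, since a single form cannot cover more than one model.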
Once all the questions have been answered click on Submit.
If Standard Quota is selected, there is an additional question to answer regarding what to do if there is no Quota available in the selected Region:
Decline my Quota increase request: Use this if the quota increase is not strictly necessary, or if a deployment in a different region may have an adverse effect on your organization (e.g. data residency and compliance issues).
Grant me Quota in an alternate region: Assigning Quota in a different region may cause latency and performance issues as well as data residency and compliance issues. If these are not critical for you, select this option.
- Once the Form Request has been submitted, you will receive a few emails while each request is being processed by Microsoft.
REMEMBER: You will be required to submit a separate Quota Request Form for each Model you have for Atlas AI. Currently the time taken by Microsoft to approve or deny a Quota request is approximately 24 hours. (This may increase or decrease depending on the volume of quota requests being submitted.)
Microsoft Communications During the Request Process
While you wait for your Quota Request to be processed, you will receive a series of emails from Microsoft. It is therefore very important that the Company Email field in the Form contains an email address that is being monitored.
IMPORTANT: Please regularly check Junk Email Folder and confirm that emails aren't being blocked by any SPAM Filters in your Organization.
- You will first receive an email from Microsoft Azure OpenAI confirming that your Quota Increase Request has been received. This should arrive a few minutes after submitting the request.
- You will then receive an email to verify the Email Address used in the Request. This can take anywhere from a few minutes to a few hours to arrive. This email has 2 links:
One to indicate that no request was made with this email address.
One to verify the Email Address.
If you click on the Verify email address link, you will be redirected to the following page in your browser.
- Approximately 24 hours after submitting the request (it can take longer), you will receive an email with the Quota Increase response from Microsoft. If your Quota Increase has been approved, you will receive the following email.
- Once all your Quota Increases have been approved, you can verify this by going to your Atlas OpenAI Resource and opening the Azure AI Foundry Portal. Click on Quota in the left navigation panel and verify the Quota assigned.
NOTE: You may need to assign usage quota to each Deployment Model required by Atlas AI.