The AI cannot read every type of file: it can only extract and index data from file types it supports. This article explains which file types are supported in each Atlas version.
From Atlas 6.0 we support the following file extensions for indexing:
- DOCX, PPTX, PDF*, XLSX**, TXT***
- SharePoint webpages - ASPX
*We can only index the text in a PDF. A scanned document is effectively an image, so its content cannot be indexed.
**XLSX indexing depends on the content and structure of the workbook. A large number of rows and columns can cause an "Out of memory" exception, and the file will then fail to index. What counts as "large" varies with the size of the file and the complexity of the data, but it may mean thousands of rows or columns. If your AI is likely to handle large or complex .xlsx files, we advise testing by loading and indexing some of them to make sure they work as expected.
***TXT files can contain plain text, but they can also contain URLs, which the system will try to index. The target website must allow this, so test the specific URLs you want to index.
Note: the file types above are an explicit list. The older DOC/XLS/PPT file extensions are not supported, so we cannot say that all Microsoft Office files are supported. Only the newer formats with the DOCX/XLSX/PPTX file extensions are supported.
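If you want to check a batch of files before uploading them, a small script can sort them by extension against the Atlas 6.0 list above. The sketch below is a hypothetical helper, not part of Atlas: the function name `classify` and the set names are illustrative only. It looks at the file extension alone, so it cannot tell a scanned PDF from a text PDF, predict whether a large XLSX will run out of memory, or confirm that the URLs inside a TXT file can be indexed.

```python
from pathlib import Path

# Extensions listed in this article as indexable from Atlas 6.0
# (hypothetical helper; matching below is case-insensitive).
SUPPORTED_6_0 = {".docx", ".pptx", ".pdf", ".xlsx", ".txt", ".aspx"}

# Legacy Office extensions that the article says are NOT supported.
LEGACY_OFFICE = {".doc", ".xls", ".ppt"}


def classify(filename: str) -> str:
    """Return 'supported', 'legacy-office' or 'unsupported' based on the extension only."""
    ext = Path(filename).suffix.lower()  # e.g. "Report.DOCX" -> ".docx"
    if ext in SUPPORTED_6_0:
        return "supported"
    if ext in LEGACY_OFFICE:
        return "legacy-office"  # convert to DOCX/XLSX/PPTX before indexing
    return "unsupported"


if __name__ == "__main__":
    for name in ["Report.DOCX", "budget.xls", "notes.txt", "diagram.vsdx"]:
        print(f"{name}: {classify(name)}")
```

Files flagged as `legacy-office` can usually be re-saved in the newer format from the matching Office application; anything marked `supported` should still be test-indexed if it falls under one of the footnoted caveats.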
In later Atlas versions, all file extensions from 6.0 listed above are supported, with the addition of:
- SharePoint list items (no file extension as they are just list items)
- English video transcripts (from .mp4 or .webm files where English transcription was enabled for the meeting)