Document Processing in Microsoft 365

February 23, 2023

Today, on average, every person in the world creates at least 1.7 MB of data every second. At this rate, over 200 zettabytes (200 trillion gigabytes) of data is expected to be stored in cloud infrastructure by 2025. At that scale, it’s imperative to give users an easy way to categorize and tag content. One way to do this is with document processing software. Such software can read the content of documents, extract specific keywords or entities, and store them as metadata with the document.

Today’s Microsoft 365 offers two tools for processing documents: Microsoft Syntex and AI Builder. Although they have some similarities, there are some distinct differences between them. This document provides an overview of each of these tools and discusses the differences between them.

Microsoft Syntex

Microsoft Syntex is a suite of tools for performing various analysis and manipulation tasks on content stored in SharePoint Online. These tools are:

  • Content Assembly: Automatically generate business documents by filling in placeholders with content stored in SharePoint or manually selected by a user.
  • Image Tagging: Tag images with descriptive keywords using AI.
  • Taxonomy Tagging: Tag documents with taxonomy terms using AI.
  • Optical Character Recognition: Extract printed and handwritten text from images and store it in a description field for the image.
  • Document Processing: Extract information and classify documents using Prebuilt, Custom, and Unstructured models.

In addition, Syntex includes a few tools to help manage SharePoint Online, including:

  • Annotations: Annotate documents and images without modifying the original files.
  • Content query: Perform specific metadata-based queries on SharePoint document libraries. 
  • Solution accelerators: Streamline the SharePoint site creation process using prebuilt templates.
  • Content compliance: Ensure compliance using retention labels and sensitivity labels to manage your documents.
  • Content processing: Perform automated tasks, such as sending notifications, when content is created or modified in SharePoint Online.
  • Term Store features: Perform advanced tasks with the SharePoint term store, such as working with Simple Knowledge Organization System (SKOS)-based formats, pushing enterprise content types to a hub site (which adds them to the associated sites and any newly created lists or libraries), and generating detailed insight reports on term store usage.

More on the various Syntex tools can be found in “Overview of Microsoft Syntex” on Microsoft Learn. Document Processing is the main tool of interest for this article.

Syntex Document Processing

The Syntex Document Processing tool is made up of a collection of six AI model types that are used to extract keywords and metadata. These models are grouped into Prebuilt and Custom models, as shown in Table 1.

Table 1: Prebuilt and Custom Model types available with Microsoft Syntex


Prebuilt models are ready to be used as-is. There’s no training necessary. Configuration of these prebuilt models is limited to selecting which pieces of information should be extracted and stored as metadata fields, and in which sites and libraries these models should be turned on for automatic processing, as shown in Figure 1. Once configured, the user doesn’t need to do anything else. 

A receipt with specific invoice extractor fields (ReceiptDate, ReceiptTax, and ReceiptTotal) highlighted to be extracted. Fields not selected for extraction include AmountDue, CustomerAddress, CustomerAddressRecipient, CustomerID, CustomerName, and seven more that aren’t shown on this screen.
Figure 1: Selecting fields to be extracted from prebuilt models

Custom models, on the other hand, need to be built and trained. This process begins with choosing a set of documents for training purposes. The set should include a variety of positive examples from which the fields can be extracted. Also, at least one negative sample needs to be provided so the model knows how to differentiate good from bad content. With the sample data in hand, the user needs to then identify what entities should be extracted by marking them on the training documents. Once the manual tagging is done, the model needs to be trained. This involves letting it process several documents. Once complete, the model provides a level of confidence in identifying each of the entities within the sample data, as shown in Figure 2. If the confidence is high enough to satisfy the user’s needs, the model can be configured to run on the SharePoint libraries and made ready for use. However, if the confidence is too low, additional sample data and training may be necessary.

The Syntex model training screen showing the classification accuracy and entity extraction confidence of 94 in the Classify files and run training Key actions section. Other actions include Add example file, Create and train extractors, and Apply model to libraries. Another section, labeled Entity Extractors, shows previous training results for Donation Amount, with an accuracy of 91, and Second Amount, with an accuracy of 83.
Figure 2: Custom model showing the confidence of each extracted entity
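The decision at the end of the training loop (accept the model or add more samples) can be sketched in a few lines of Python. This is only an illustration, not part of Syntex: the `ExtractorResult` type, the function name, and the threshold of 90 are hypothetical, while the extractor names and accuracy values are the ones shown in Figure 2.

```python
from dataclasses import dataclass


@dataclass
class ExtractorResult:
    """One entity extractor and its accuracy (percent) after a training run."""
    name: str
    accuracy: int


def extractors_needing_more_training(results, threshold):
    """Return the names of extractors whose accuracy is below the threshold."""
    return [r.name for r in results if r.accuracy < threshold]


# Accuracy values taken from Figure 2; the threshold of 90 is an assumption.
results = [
    ExtractorResult("Donation Amount", 91),
    ExtractorResult("Second Amount", 83),
]
weak = extractors_needing_more_training(results, threshold=90)
print(weak)  # ['Second Amount']
```

With a threshold of 90, only Second Amount would call for additional sample documents and another training pass. What counts as "high enough" depends on the business scenario.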

As soon as files are added to a library, they’re picked up by the AI model and, within a short time, the metadata is extracted.

AI Builder

AI Builder is one of the building blocks of the Microsoft Power Platform, a suite of tools and services designed to help automate tasks and build custom applications. AI Builder is specifically focused on bringing artificial intelligence (AI) capabilities to business users and citizen developers, allowing them to create AI models and incorporate them into their applications and workflows without needing extensive coding or AI expertise. AI Builder is typically used in conjunction with other Power Platform services, such as Power Apps and Power Automate, to create end-to-end solutions that leverage AI to enhance business processes. This makes it easier for organizations to harness the power of AI in a user-friendly and accessible way, even if they don't have a team of data scientists and AI experts on hand.

Like Syntex, AI Builder provides a range of prebuilt AI models for various tasks, along with the ability to create custom AI models tailored to an organization’s specific business needs. These models are grouped into Prebuilt and Custom models, as shown in Table 2.

Table 2: Prebuilt and Custom Model types available with AI Builder

The process for developing custom models is similar to the way custom models are built in Syntex. Microsoft continues to evolve and release prebuilt models to further simplify the use of AI in applications built by citizen developers.

Syntex vs. AI Builder

As mentioned above, both Microsoft Syntex and AI Builder offer options for document processing. Both tools offer similar prebuilt models for receipts and invoices, as well as the ability to create custom models. There are some differences, as discussed below.

Maker Portal

The Microsoft Syntex experience lives primarily within SharePoint Online. Models are created and managed in specific sites called Content Centers or within specific libraries of existing SharePoint sites. When custom models are created, however, the user is redirected to the AI Builder maker portal to build them.

The AI Builder models, on the other hand, are exclusively built, trained, and managed within the AI Builder maker portal.

Scope

Microsoft Syntex is an add-on for SharePoint Online. It’s intended to process documents that have been stored in the document libraries. Depending on how the models are configured, a single model can act on content stored in SharePoint for an entire tenant or be targeted to specific sites and libraries. 

AI Builder, on the other hand, is data-source agnostic. As long as the document content is provided to the model in a format that it can understand (binary stream), the processing will occur. This includes SharePoint, but can also be OneDrive, other non-Microsoft systems, or even a screen capture.

Extracted Entities

During Syntex model publishing, new content types are created or existing ones reused. Furthermore, individual metadata fields of specific types are added to the content types to store the extracted information. When Syntex detects entities within a document, they get stored as individual metadata fields with the document.

As AI Builder isn’t aware of where the content is coming from, it only does the processing and returns the entities extracted. With the entities, AI Builder models also provide confidence levels for each piece of data that was extracted. 
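Because each extracted value comes back with its own confidence level, the calling process can decide what to do with low-confidence results. The sketch below is a minimal illustration of that idea in Python. The dictionary shape, the function name, the sample values, and the 0.8 cutoff are all assumptions made for this example and do not reflect the actual AI Builder response format. Only the field names are borrowed from Figure 1.

```python
def split_by_confidence(entities, min_confidence=0.8):
    """Separate extracted values into accepted ones and ones needing review.

    `entities` maps a field name to a (value, confidence) pair, where
    confidence is a number between 0 and 1 (hypothetical shape).
    """
    accepted = {}
    needs_review = {}
    for field, (value, confidence) in entities.items():
        target = accepted if confidence >= min_confidence else needs_review
        target[field] = value
    return accepted, needs_review


# Illustrative values only; not output from a real model.
entities = {
    "ReceiptDate": ("2023-02-23", 0.97),
    "ReceiptTax": ("1.30", 0.62),
    "ReceiptTotal": ("11.30", 0.91),
}
accepted, needs_review = split_by_confidence(entities)
print(accepted)      # {'ReceiptDate': '2023-02-23', 'ReceiptTotal': '11.30'}
print(needs_review)  # {'ReceiptTax': '1.30'}
```

In practice, the accepted values might be written to a data store while the rest are routed to a person for verification.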

Integration

With Syntex, once the document has been processed and the extracted entities are stored in the metadata fields for the document, the process ends. With AI Builder, the extraction may take place as part of a larger process, allowing the extracted entities and document to be used throughout the greater process.

Licensing

Microsoft Syntex uses Azure pay-as-you-go billing for prebuilt and unstructured document processing, and usage is measured in transactions. Each page in a document is considered a transaction. Processing occurs on document upload and on subsequent updates, and it is counted for each model applied. There’s no charge for model training, but costs are incurred for processing whether or not there’s a positive classification or any entities extracted. At the time of this writing, the cost is $0.01/transaction for prebuilt document processing and $0.05/transaction for unstructured document processing.
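To make the transaction counting concrete, here is a small Python sketch of the arithmetic. It assumes the $0.01 rate applies to prebuilt processing and the $0.05 rate to unstructured processing, and that transactions are simply pages × models applied × processing events (the upload plus each later update). The function and parameter names are made up for this example.

```python
from decimal import Decimal

# Rates quoted in this article; which rate maps to which model type is an assumption.
PREBUILT_RATE = Decimal("0.01")      # USD per transaction
UNSTRUCTURED_RATE = Decimal("0.05")  # USD per transaction


def syntex_cost(pages, models_applied, processing_events, rate):
    """Return (transactions, cost in USD) for one document.

    Each page counts as one transaction, counted once per model applied
    and once per processing event (upload or subsequent update).
    """
    transactions = pages * models_applied * processing_events
    return transactions, transactions * rate


# A 10-page document, 2 prebuilt models, uploaded once and updated twice.
transactions, cost = syntex_cost(
    pages=10, models_applied=2, processing_events=3, rate=PREBUILT_RATE
)
print(transactions, cost)  # 60 0.60
```

The example shows how quickly updates and multiple models multiply the transaction count for a single file.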

AI Builder consumption requires pre-purchasing blocks of credits. For each Power Apps and Power Automate premium license, a pooled set of credits is allocated monthly to the tenant. Currently, these allotments are as listed in Table 3. 

Table 3: Monthly AI Builder credit allocation for premium licenses

 

Up to one million credits can be accrued monthly from licensing. Additional credits can be purchased at $500/1M credits.

The current costs are 32 credits/page (~ $0.016/page) for prebuilt invoice or receipt processing models and 100 credits/page (~ $0.05/page) for custom document processing.
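The per-page dollar figures follow directly from the add-on price of $500 per one million credits. The short Python sketch below reproduces that arithmetic and also estimates how many pages one million credits would cover. The function names are illustrative only, and the figures are the ones quoted in this article.

```python
from fractions import Fraction

CREDIT_PACK_PRICE_USD = 500   # price of one add-on pack, per this article
CREDIT_PACK_SIZE = 1_000_000  # credits per pack


def cost_per_page(credits_per_page):
    """Exact USD cost of processing one page at the add-on credit price."""
    return Fraction(credits_per_page * CREDIT_PACK_PRICE_USD, CREDIT_PACK_SIZE)


def pages_covered(credits, credits_per_page):
    """Number of whole pages a given number of credits can process."""
    return credits // credits_per_page


print(float(cost_per_page(32)))        # 0.016 (prebuilt invoice/receipt)
print(float(cost_per_page(100)))       # 0.05 (custom document processing)
print(pages_covered(1_000_000, 32))    # 31250
print(pages_covered(1_000_000, 100))   # 10000
```

Credits included with premium licenses are pooled rather than purchased, so the dollar figures here apply to add-on credits only.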

For more information on the AI Builder licensing, see https://go.microsoft.com/fwlink/?linkid=2085130

Bottom Line

Both Microsoft Syntex and AI Builder are great options for processing documents, and both help streamline how you store and manage content. Depending on the specific business scenario, organizations may choose Syntex, AI Builder, or even both to address this challenge. Table 4 summarizes the key differences between the two options available in Microsoft 365.

Table 4: Summary of differences between Microsoft Syntex and AI Builder

Maker Portal
  • Microsoft Syntex: SharePoint for prebuilt models; AI Builder maker portal for custom models
  • AI Builder: AI Builder maker portal
Scope
  • Microsoft Syntex: SharePoint document libraries
  • AI Builder: Any source
Extracted Entities
  • Microsoft Syntex: Stored with the processed document in individual metadata fields
  • AI Builder: Can be stored anywhere
Integration
  • Microsoft Syntex: No direct integration; documents can be post-processed
  • AI Builder: Used in Power Apps and Power Automate to integrate with hundreds of systems
Licensing
  • Microsoft Syntex: Prebuilt Syntex models are billed using Azure pay-as-you-go subscriptions; custom models require AI Builder credits
  • AI Builder: AI Builder credits required for processing

Regardless of the option chosen, using AI to process documents can greatly help businesses overcome the challenge of managing the overall information lifecycle.

 

Haniel Croitoru


Haniel Croitoru is an enterprise architect and Microsoft MVP with over 20 years of experience in Microsoft 365. Since 2003, Haniel has been focusing on delivering solutions to meet short-term and long-term business goals using Microsoft 365 workloads such as SharePoint, Teams, Power Platform and more. 

In addition to Haniel’s professional tenure, he has always been a big proponent of sharing knowledge and giving back to the community through presenting at numerous conferences and networking meetings on the topics of utilizing Microsoft 365 for business process optimization, effective collaboration and communication, and managing projects using the platform. 

Prior to entering the Microsoft 365 arena, Haniel spent several years in the medical imaging industry where he helped launch an orthopedic software division and published four patents and numerous articles. 

Haniel holds a Master of Science in Computer Science with a specialty in Computer-Assisted Orthopedic Surgery from Queen’s University and a Master’s Certificate in Project Management from the York Schulich School of Business. He is a PMI-certified Project Management Professional (PMP) and Agile Certified Practitioner (PMI-ACP).