Content-based workflow automation for a German engineering company

Automated document processing meeting the highest demands

Karakun's in-depth and extensive expertise helped us realise our project quickly and efficiently.

Throughout the entire project, the Karakun team acted as a strong and reliable partner who also worked closely with our ERP and M365 consultant.

We are so convinced of their capabilities that we are already exchanging ideas with Karakun for upcoming projects.

Björn Börner

Supply Chain Manager

Electronic documents such as orders, delivery slips or invoices are the predominant format for exchanging data in B2B. However, these documents are not ideal for machine processing, making automation significantly more difficult. Nevertheless, aiming for automation here offers enormous cost savings thanks to faster, more efficient and less error-prone processes.

Our customer CVS engineering GmbH from Rheinfelden (Germany) also recognized this potential. CVS produces compressors and vacuum pumps for vehicle applications (mainly in trucks and trains) and sells them worldwide. The company receives about 10,000 order confirmations and invoices per year from its suppliers, and the numbers are rising. Until recently, employees manually extracted the relevant data from these documents, transferred it to the company’s ERP system and compared it with the corresponding order. CVS wanted to replace this time-consuming, error-prone manual process with an automated workflow.

A suitable software solution should automatically read the relevant data from each incoming document and compare it with the ERP data, alert employees to any deviations (e.g. concerning quantity, price or delivery date) or data errors, and allow them to check and correct the data in the ERP. Incoming invoices and order confirmations should then be archived fully automatically in the DMS (SharePoint), with a link to their DMS location stored in the respective ERP records.
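The comparison step described above can be illustrated with a minimal Python sketch. It is not HIBU or ERP code: the `OrderLine` type, its field names and the `find_deviations` helper are hypothetical and only show how extracted values might be checked against an order record for quantity, price and delivery date.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class OrderLine:
    """One order position; the field names are illustrative assumptions."""
    quantity: int
    unit_price: Decimal
    delivery_date: date


def find_deviations(extracted: OrderLine, erp: OrderLine) -> list[str]:
    """Return one readable message per field that differs from the ERP record."""
    deviations = []
    for field in ("quantity", "unit_price", "delivery_date"):
        doc_value = getattr(extracted, field)
        erp_value = getattr(erp, field)
        if doc_value != erp_value:
            deviations.append(f"{field}: document has {doc_value}, ERP has {erp_value}")
    return deviations


# Made-up example values: the supplier confirms a higher price and a later date.
erp_line = OrderLine(10, Decimal("125.00"), date(2024, 3, 1))
confirmed = OrderLine(10, Decimal("129.50"), date(2024, 3, 8))
for message in find_deviations(confirmed, erp_line):
    print(message)
```

An empty list means the document matches the order. Any entry would be surfaced to an employee for checking and correction.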

CVS tested various standard solutions that specialize in invoice processing. However, these applications did not deliver a satisfying result for their particular use case. Thus, CVS searched for a partner to implement their specific requirements and to seamlessly integrate the customized solution into their system landscape. CVS found this partner in Karakun and its HIBU platform.

Content-based automation

To implement a high-quality data extraction functionality, we first needed to know the types of document layouts that the solution would have to process (general layout, arrangement and designation of individual information elements, etc.). During our analysis, we identified a considerable variation and many unusual, partly inconsistent layout decisions in the documents, especially in the tabular document sections. We also noted substantial differences in the designation and placement of specific information elements (e.g. shipping costs, packaging costs and delivery note numbers).

Based on our findings, we adapted the HIBU platform to the task. As sufficient training data for a suitable machine learning (ML) approach was not available, and procuring or creating it would have exceeded the project budget, we started with HIBU’s rule-based extraction components and extended them to meet the needs at hand. As a side effect, however, the rule-based approach generates sufficient training data through its productive use and occasional manual checks, which will allow us to train HIBU’s ML components in the future.

Agile approach

Adopting an MVP (Minimum Viable Product) approach, we initially configured these extended extraction components only for the layouts of the 15 most frequent suppliers. Automated tests ensure that the extraction works error-free for these suppliers. For all other suppliers, we configured a generic extraction logic that primarily minimizes false positives and doesn’t fill individual target data fields in case of doubt. In addition, we set up a processing pipeline together with the customer: This pipeline continuously monitors two dedicated email inboxes (Outlook 365) for incoming emails, automatically transfers the relevant PDF attachments of these emails to the extraction component, and, after extraction, performs further analyses and validation of the extracted data. The resulting data is compared with the underlying order record and then transferred to the customer’s ERP system for further processing. The original documents and relevant metadata are also automatically archived in the connected DMS.
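The idea behind the generic extraction logic, leaving a target field empty rather than risking a false positive, can be sketched as follows. This is an illustration under assumptions, not the rules actually configured in HIBU: the `fill_field` and `extract_generic` helpers, the field names and the "all candidates must agree" criterion are hypothetical.

```python
from typing import Optional


def fill_field(candidates: list[str]) -> Optional[str]:
    """Return a value only if all non-empty candidates agree; otherwise leave it empty."""
    distinct = {c.strip() for c in candidates if c.strip()}
    if len(distinct) == 1:
        return distinct.pop()
    return None  # no candidate or conflicting candidates: do not guess


def extract_generic(candidates_by_field: dict[str, list[str]]) -> dict[str, Optional[str]]:
    """Apply the conservative rule to every target data field."""
    return {name: fill_field(values) for name, values in candidates_by_field.items()}


# Made-up candidates as an extraction step might find them in one document.
result = extract_generic({
    "order_number": ["4500012345", "4500012345"],
    "delivery_date": ["01.03.2024", "08.03.2024"],  # conflicting values
    "shipping_costs": [],                            # nothing found
})
print(result)
```

Only `order_number` is filled here. The other two fields stay empty and would be left for manual completion, which trades coverage for fewer wrong values.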

Early benefits

Soon after going live, CVS observed significant measurable benefits, primarily time savings. Encouraged by this, the customer asked us to configure the extraction for many additional suppliers so that their documents, too, would yield better results than with the generic logic. Since only deviations from this generic logic had to be configured, only a few changes to the target data fields were necessary.

Throughout the project, Karakun solved many challenges. Examples include incomplete, inconsistent, or broken table layouts or data that looks like an order item but contains different information. Some of these challenges are even a problem for humans when processing documents manually, whilst others are only challenging for machine processing.

Automation leads to time savings and reduced workloads

Today, CVS’ overall workflow for processing these documents is highly automated. The solution uses various status codes to inform the ERP if unexpected results occur. The ERP displays these messages and the results of its plausibility checks to CVS employees in the user interface. They can then take manual action immediately. If needed, employees can also open the relevant PDFs with a single mouse click.
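How a processing result might be condensed into a status code for the ERP can be sketched as below. The actual codes used in the CVS solution are not described here, so the `Status` values, the `derive_status` function and the order of the checks are purely hypothetical.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical status codes; the real codes are not part of this case study."""
    OK = "ok"
    INCOMPLETE = "incomplete"
    DEVIATION = "deviation"
    NO_MATCHING_ORDER = "no_matching_order"


def derive_status(order_found: bool, missing_fields: list[str], deviations: list[str]) -> Status:
    """Pick the most severe applicable status for one processed document."""
    if not order_found:
        return Status.NO_MATCHING_ORDER
    if missing_fields:
        return Status.INCOMPLETE
    if deviations:
        return Status.DEVIATION
    return Status.OK


print(derive_status(True, [], ["unit_price"]).value)
```

A single code per document keeps the interface to the ERP small. The details an employee needs for a correction can be shown alongside it in the user interface.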

Experience shows that this custom solution processes a large share of the incoming order confirmations and invoices fully automatically. For the documents that do require manual intervention, only very few fields need to be added or corrected. Employees benefit from the solution as some of their daily tasks become much easier, faster and less monotonous, whilst the company already benefits from massive savings and can free its knowledge workers for more challenging tasks.

The use case of our customer CVS can be transferred to a range of other use cases across industries in which document-based processes are to be automated. There is no limit on the number of documents that can be processed. Even with particularly challenging layouts (especially with information presented in tabular form), we can use the HIBU platform to reliably extract the required information and place it in the right context. With HIBU and our combined expertise in language intelligence and software development, we offer our customers enormous automation and savings potential.