This article is about parsing Excel files. Learn how you can do it in BizTalk Server and in Azure. I show how to do this in an Azure function.
Excel file use case
This story begins with a customer request to send us an order as an Excel file. Furthermore, could we process it automatically into our ERP? Certainly, any BizTalk server developer will tell you that you can convert the Excel file to XML in a custom pipeline component. Let’s first consider how we would do it in BizTalk Server and then finally move on to how we could do it using Azure components.
BizTalk Excel Solutions
Firstly you can buy a component that will do it. In 2008 this author writes about disassembling Excel files using FarPoint For Spread BizTalk 2006. A few of my past solutions use this product and the development experience was good. Sadly you cannot purchase this product anymore. Another product that promises to do the same thing is the Excel BizTalk Adapter by cData. I have not used this but it looks promising.
Secondly you can roll your own solution. A quick search reveals many examples. All solutions create a custom pipeline component that decodes the excel file into XML.
This author choose a different path in one solution many years ago. Firstly a custom pipeline component converts a XLS or XLSX file to a CSV file. I use a Excel SDK in the pipeline component to convert the Excel files to CSV files. Secondly a standard flat file disassembler Pipeline Component converts the CSV to an XML file. Finally BizTalk server does what it wants with the XML file. This handles XLS files from Excel pre 2003 and XSLX files post Excel 2003.
The purpose of this blog is not to show that code but to decode an Excel file into a CSV file in an Azure function. The pipeline component uses version 2.0 of the ExcelDataReader SDK .
An Azure Function Excel Processing Solution
More importantly, this time I thought about a solution that would be easy to migrate to Azure in the future. Thus instead of using a pipeline component to decode the Excel file I built the same thing in an Azure function. With this in mind the Excel file is posted to the function and returned as a CSV file. The resultant CSV can be further processed in BizTalk Server as above. Alternatively in the future a Logic App and an Integration account could use the same BizTalk flat file schema to further process the same CSV.
The Azure function solution is a bit different but uses the same idea. It uses the most up to date version of ExcelDataReader SDK on GitHub. What follows is a prototype which contains the basic code to make this work.
Firstly initialise a DataSet to hold the excel data. Secondly, the Excel worksheet tab and filename are appended as query strings when the file is posted e.g. https:// &workSheetTab=Sheet1&filename=MyExcelFile.xlsx. Thirdly we need to register 1252 encoding page.
Fourthly read the file as XLS or XLSX file. This code snippet is copied directly from the ExcelDataReader ReadMe example.
Fifthly search for the given workSheetTab. Sixthly convert workSheetTab in the result DataSet using the convertToCSV method. Finally return the Excel file as a CSV file.
The convertCSV method was also stolen from somewhere but I have lost the reference.
As a result of this we have a way of converting a Excel file to a CSV but this prototype has some caveats before you can use it.
- Error handling is required.
- The Excel file must be always have the same format to be useful as an order file.
As a consequence of point 2 all Excel files have been banned from our exchange and work on this project has been suspended. If there is no confidence that the customer can always give the excel file in a fixed format then it is not possible to parse it in a consistent manner. This would be source of irritation for our support and development team. Notwithstanding, this is true for any file format e.g. XML, JSON etc. It is the age old problem of rubbish in and rubbish out.
Finally we have shown an example of an Azure function mimicking a BizTalk pipeline component to convert a Excel file to a CSV file. This prototype code can be found here. Unless my project is revived I will not be embellishing this code any further and you are welcome to use it.