TaXoBeetle: Extracting and tagging XBRL-GL instances

XBRL (acronym for Extensible Business Reporting Language) is an increasingly popular standard used to represent financial information in XML-based files.

XBRL data are coded according to taxonomies: XML schemas that describe the kinds of elements that can be included in a financial report using this standard. Oversimplifying, a taxonomy is the "chart of accounts"; the numbers (and other information) included in the financial report are coded according to this schema and stored in an XBRL instance.

The case for TaXoBeetle

The first issue that a Company usually meets when it starts approaching XBRL is how to get its financial reports translated into XBRL format. The trivial approach is to write a software program that reads the financial statements from the Company's legacy system, translates the internal account codes into XBRL elements (according to the chosen taxonomy), and produces the XML file with the desired instance.
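The trivial approach can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the account codes, the element names (which do not come from any real taxonomy) and the simplified `report` wrapper, which is not a valid XBRL instance.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from internal account codes to taxonomy element names.
# Neither the codes nor the element names come from a real taxonomy.
ACCOUNT_TO_ELEMENT = {
    "1000": "CashAndCashEquivalents",
    "2000": "TradePayables",
}


def build_instance(balances):
    """Build a simplified, XBRL-like XML document from {account_code: amount}."""
    root = ET.Element("report")
    for code, amount in sorted(balances.items()):
        # A KeyError here means an internal account has no mapping yet.
        element_name = ACCOUNT_TO_ELEMENT[code]
        fact = ET.SubElement(root, element_name)
        fact.text = f"{amount:.2f}"
    return ET.tostring(root, encoding="unicode")


xml_text = build_instance({"1000": 1500.0, "2000": 320.5})
print(xml_text)
```

Note that the output carries only the translated element and its figure: the internal account code and the underlying postings do not appear in it.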

There are some weak points in this approach, among them:

As a result, an auditor who wants to assure the correctness of the resulting financial statement, and check that the reported figures are exactly what was posted in the legacy systems, faces a hard job.

A better approach is available using the GL taxonomy: a special taxonomy under development by XBRL International. GL allows the representation of original accounting transactions, ledgers and trial balances, carrying along the related XBRL information. Exporting data according to the XBRL-GL taxonomy offers two advantages:

In this way, an auditor could access the details of transactions as they were posted in the legacy system, and perform a comprehensive assurance.

A step further

Those advantages alone make, in my opinion, a strong case for the adoption of GL, but I found another requirement.

The information systems adopted in many businesses are usually made up of several components, often strongly integrated - as in ERP systems - but sometimes linked only by interfaces that carry transactions back and forth, and that may not be rock solid.

In this scenario, a CFO who has to sign a financial statement produced by a complex integrated system has very few instruments to effectively check the correctness of the whole process; this situation - especially if you have to comply with regulations like the Sarbanes-Oxley Act - is not a happy one.

My idea is to provide a tool that extracts the transactions from every software component in my Company and lets me reconcile matched accounts in different systems - at a transaction level - storing the detail of the reconciliation process in a way that is readable by an external auditor.

XBRL-GL is perfect for this. What I need is:

  1. a tool to extract transactions or balances from different legacy databases, with a well documented extraction logic;
  2. a "language" allowing me to tag the extracted transactions that were matched during the reconciliation process;
  3. a tool helping me in the automatic and manual reconciliation process.

The TaXoBeetle project covers the first two points; the third will hopefully be covered by a small and dynamic software vendor.
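To make the idea of reconciliation at a transaction level more concrete, here is a minimal Python sketch of an automatic matching pass between two systems. It is only an illustration under stated assumptions: the record keys (`id`, `account`, `amount`), the matching rule (same account and same amount, with amounts held as integer cents) and the sample data are all hypothetical, and a real reconciliation tool would need richer rules.

```python
from collections import defaultdict


def reconcile(system_a, system_b):
    """Pair transactions from two systems that share account and amount.

    Each transaction is a dict with the hypothetical keys id, account, amount.
    Returns (matches, unmatched_a, unmatched_b).
    """
    # Index the second system by the matching key.
    pending = defaultdict(list)
    for txn in system_b:
        pending[(txn["account"], txn["amount"])].append(txn)

    matches, unmatched_a = [], []
    for txn in system_a:
        candidates = pending[(txn["account"], txn["amount"])]
        if candidates:
            # Consume the first candidate so it cannot be matched twice.
            matches.append((txn["id"], candidates.pop(0)["id"]))
        else:
            unmatched_a.append(txn["id"])

    # Whatever is still pending was never matched.
    unmatched_b = [t["id"] for group in pending.values() for t in group]
    return matches, unmatched_a, unmatched_b


erp = [
    {"id": "A1", "account": "1000", "amount": 15000},
    {"id": "A2", "account": "1000", "amount": 2500},
]
billing = [
    {"id": "B7", "account": "1000", "amount": 15000},
    {"id": "B9", "account": "1000", "amount": 9900},
]
matches, unmatched_a, unmatched_b = reconcile(erp, billing)
print(matches, unmatched_a, unmatched_b)
```

The unmatched lists are the items that would be left for the manual part of the process.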

Data Extraction

My idea was simple: if I want to produce a text file (in the end, an XBRL instance is a text file) by extracting data from a legacy database, an easy approach is to produce a sample of the desired output, insert in it the information about where to take the actual data from, and use this as a template that will drive the extraction process.

Two excellent tools can help me in this direction: Velocity (a project of the Apache Software Foundation) and Velosurf (another Sourceforge project). With this strong foundation the job was easy, and I only need to complete it with some utilities that perform important tasks in this specific context.
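The template-driven idea can be illustrated without the real tools. The sketch below uses only the Python standard library (`string.Template` and an in-memory SQLite database) as a stand-in; it does not use or reproduce the Velocity or Velosurf APIs, and the table name `postings`, its columns and the output element names are assumptions made for the example.

```python
import sqlite3
from string import Template
from xml.sax.saxutils import escape

# A sample of the desired output for one entry, with placeholders that say
# where the extracted values go. Element names are illustrative only.
ENTRY_TEMPLATE = Template(
    "  <entry>\n"
    "    <account>$account</account>\n"
    "    <amount>$amount</amount>\n"
    "    <description>$description</description>\n"
    "  </entry>"
)


def extract(conn):
    """Fill the entry template once per row of the hypothetical postings table."""
    rows = conn.execute(
        "SELECT account, amount, description FROM postings ORDER BY id"
    )
    entries = [
        ENTRY_TEMPLATE.substitute(
            account=escape(account),
            amount=f"{amount:.2f}",
            description=escape(description),  # keep the output well-formed
        )
        for account, amount, description in rows
    ]
    return "<entries>\n" + "\n".join(entries) + "\n</entries>"


# In-memory database standing in for a legacy system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE postings ("
    "id INTEGER PRIMARY KEY, account TEXT, amount REAL, description TEXT)"
)
conn.executemany(
    "INSERT INTO postings (account, amount, description) VALUES (?, ?, ?)",
    [("1000", 150.0, "Invoice 42"), ("2000", -150.0, "Fees & charges")],
)
output = extract(conn)
print(output)
```

Because the template is itself a sample of the output, the extraction logic stays readable by someone who knows the target format but not the program.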

Project Status and site organization

The project is currently (October 15, 2005) in a pre-development phase, so don't expect to find something downloadable!

However, there is a first prototype, successfully extracting valid XBRL-GL instances.


Documents strictly related to the TaXoBeetle project will be available in the "Docs" section of the Sourceforge project site; other "conceptual" documents will be available on my website, in the XBRL section.

I have chosen to develop this project with English as the primary language, but I am planning to release some documentation in Italian too. English is not my mother tongue, so I apologize in advance for the grammar.

BTTL language

Since XBRL-GL is the underlying standard, the natural consequence was to use the same standard for the markup language that I am using to match transactions together during the reconciliation process.

The GL taxonomy is extensible, so designing a new "module" for this taxonomy will be enough. A preliminary UML model for the new module can be viewed for more details; when it is complete, I will open it to comments from the XBRL community - and obviously from the Sourceforge users.
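As a rough picture of what tagging matched transactions could look like, the Python sketch below adds a match identifier to each matched entry of an XML document. It is not the BTTL module: the namespace `urn:example:matching`, the `matchId` element and the simplified `entries`/`entry` structure are hypothetical names chosen for the example, not part of XBRL-GL.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the matching markup; not a published taxonomy.
MATCH_NS = "urn:example:matching"
ET.register_namespace("m", MATCH_NS)


def tag_matches(entries_xml, matches):
    """Add a shared match identifier to every entry in each matched group."""
    root = ET.fromstring(entries_xml)
    by_id = {entry.get("id"): entry for entry in root.iter("entry")}
    for number, group in enumerate(matches, start=1):
        for entry_id in group:
            tag = ET.SubElement(by_id[entry_id], f"{{{MATCH_NS}}}matchId")
            tag.text = f"M{number}"
    return ET.tostring(root, encoding="unicode")


source = '<entries><entry id="A1"/><entry id="B7"/><entry id="A2"/></entries>'
tagged = tag_matches(source, [("A1", "B7")])
print(tagged)
```

Entries that share a match identifier were reconciled together; entries without one are still open.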

Similar projects

I am not aware of other projects with the same goal. There are several commercial solutions for the extraction of XBRL data from other sources; everyone will pursue their own strategy in choosing between commercial and open source solutions. In my opinion, open source is an excellent choice for experimenting with a new technology.

A list of XBRL-enabling applications can be found on the official XBRL site.

The theme of assurance linked with XBRL-based financial reporting is developed with the XARL (Assurance Reporting for XBRL) standard, from the University of Waterloo, Canada.

Another tool that can be successfully used for data transformation (although restricted to the XML domain) is ABRA, also hosted by Sourceforge.

I have developed TaXoBeetle as an open source project for several reasons:


            Massimo Coletti
