Friday, 10 February 2017

Has the dream come true? - show me how I can see 10 years of XBRL data

This is the latest installment in what is turning out to be a series - catch the start here - has the dream come true?

After my last post, I thought I really ought to show you all this data. Provide the evidence, as it were. We've just added a few new bits and pieces to XBRL Sheet, so I thought it would also be a good way to bring you up to date with what we've been doing over at XBRL to XL.

So it looks like this.

And you can see how I created this in next to no time (and how you can do the same) here.

And if you just can't wait and want to get hold of the data in a hurry for Netflix (or any company), you can download the XBRL Sheet from here. Just pick up a token from the website and fire away.

Friday, 3 February 2017

Has the dream come true - Is there enough XBRL?

If you missed the introduction, then you can go back to this post.

I think I'm gonna end up tackling the "enough" question a lot. Today, as we stand at the beginning of the 2017 10-K reporting season, let's for now just narrow it down to the question of history.

Throughout this examination of the validity of XBRL today, I want to keep it as real world as possible, so a lot of my work in the coming weeks is gonna be working with a subset of companies from that most popular of general US indices - the S&P500. So I'm gonna draw from approx. 250* constituents to see if we can get closer to an answer.

*No I haven't just halved the index! I didn't want to work with financial companies as that just makes my life harder and I wanted to just look at companies that were going to lodge 10-K's in the current reporting window. Conveniently that left me with 256.

So before any reporting for 2017 began, we had (Source: XBRL to XL Database):

Which means, not surprisingly given how long XBRL has been a requirement, pretty much the entire population already has a five year history (not 100%, as not all constituents were listed when XBRL filing started). That's a good start, but even more excitingly, nearly 60% of this index population is suddenly gonna have 10 years of data, as the majority of their data points will stretch back 10 years for the first time by the end of this reporting cycle. Companies are required to disclose 3 years of income and cash flow data in each filing, so 8 filings will take you back 10 years for period items (9 years for balance sheet items).
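That overlap arithmetic is easy to sanity-check. A minimal sketch (my own illustration, nothing from the data set itself):

```python
def years_covered(num_filings, years_per_filing=3):
    """Years of income/cash flow history reachable from a run of annual
    10-Ks: each filing restates the prior two years, so consecutive
    filings overlap and every filing after the first adds one new year."""
    if num_filings == 0:
        return 0
    return years_per_filing + (num_filings - 1)

# 8 filings x 3-year statements reach back 10 years
assert years_covered(8) == 10
```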

In fact 4 companies (Chevron Corp, Fluor Corp, Newmont Mining Corp, United Technologies Corp) had already filed 10-Ks 8 times, so this year they will have a complete 10 year history.

10 years has always been regarded by analysts as the kinda historical period you can properly get your teeth into. The theory being that it equates roughly with an entire macro-economic cycle - from boom to bust.

Tuesday, 31 January 2017

And so has the dream come true?

Seeing as I thought I had some stuff to say and I hadn't blogged for far too long, I thought I'd look back at my early posts. When I started back in 2011, 2,000 companies had made their first fledgling filings. Now 11,000 plus companies have made over 170,000 filings (Source: XBRL to XL database). Now that we are swimming in a sea of XBRL, are we living in a world where analysis is easy, accurate and free (well, cheap at least!)?

When XBRL started, there was a tremendous euphoria that we were on the cusp of analytical nirvana. The first mandatory filing season quickly burst that bubble. XBRL, it appeared (or certainly the FASB/SEC implementation), wasn't all it was cracked up to be. Those problems are well documented, but all the same I thought it would be worth listing them here to see what, if anything, has changed. Has nirvana sailed back into sight or is it still a case of mañana?

So why couldn't I use XBRL?

There wasn't enough of it. If I'd only got a few 10-K and 10-Q filings, how could I do any meaningful historical analysis?

There was too much of it! Over 16,000 tags, but I'm only interested in two or three hundred data points at the most!

It was complicated. A lot of the interesting numbers that were being structured for the first time were held in weird structures - multi-dimensioned hypercubes. And what the hell is a tuple?

I couldn't access it. Yes I could go to the SEC website and download it for free but then what? It certainly wasn't playing nice with Excel which was somewhat of a surprise for structured data.

This structured data didn't have any structure. Why could I get a tagged figure for Revenues for Google but not for Microsoft? How does that work? And what was I meant to do with extensions, which seemed to allow filers to completely bypass any semblance of standardisation that might exist?

So apart from too little data, too much data, obscure complicated structures, poor accessibility, evanescent structures, extensions, what else was wrong?

...Oh did I mention the errors? Filings in the early years were plagued with errors. The top three spots on the podium of errors were taken, in reverse order, by missing required values (like, err, earnings per share), negative values (due to yet another layer of complication) and, right at the top, invalid axis combinations?? Exactly! (Filers were tying themselves up in tuples trying to dimension this stuff.) The most oft-cited error - the misused extension - didn't even get a look-in. Pushed way off the rostrum by more basic booby traps.

So given what I've just said, the next question might seem a little rhetorical - could I trust the source? There ought to have been a simple answer to this question. Something along the lines of yes. I was no longer having to deal with a middleman (a vendor) to get my structured data; it was being signed off by the company and of course audited. Except it wasn't - the audited accounts were a separate filing, the HTML 10-K. The XML XBRL was in a separate file which wasn't audited and for which the Directors had no liability.

I've mentioned the missing data values but I haven't mentioned the missing data. Not everything was being xbrl'd. It only applied to the financial statements and accompanying notes. Not, for example, the MD&A (Management Discussion and Analysis), into which companies often like to lob a lot of interesting and important values. And yes, the 10-K got the treatment, but not the preceding earnings announcement.

So apart from too little data, too much data, obscure complicated structures, poor accessibility, evanescent structures, extensions, errors, unaccountability, and missing tranches of data, was there anything else? XBRL was only for US companies, and so in an increasingly globalised world, in which the conventional orthodoxy of free trade prevailed and your competitor could be on the other side of the world, even if you managed to straighten out your data you might not have any companies to compare all this comparative data with. Is it possible that Donald Trump could be the saviour of XBRL?

Realising that Donald Trump is not a satisfactory answer to my question (or indeed any question for that matter), I will address each of these issues in turn in my next few posts to see whether this dream is still giving analysts nightmares. For now I've run out of blog.

Monday, 23 February 2015

The SEC Structured Data Sets technically speaking

In my last post The SEC Structured Data Sets, I talked a little about this new SEC initiative to make XBRL more accessible. This time I'm gonna major on how it works.

At this point you may want to refer to the SEC technical document Annual and Quarterly Financial Statements, the Financial Statement Data Sets page (where the files reside) and if you want to see what the data contained in these files actually looks like, you can download one of our example spreadsheets here. The web queries in this sheet access XBRL Flat, our name for this data set. The sheet itself contains links to videos & info on how it all works. I will talk more about our item for item implementation of this data set in my next post.

On the Financial Statement Data Sets page, you will see there are currently 24 files. After we pass the last business day of this quarter (March 31 2015), there'll be 25. Don't try to open the latest files in Excel - they're too big - but you could download one of the early ones to take a peek at the data layout of the files contained in these zips.

As the comprehensive technical document explains, there are 4 files. The one that counts is num.txt. This has the values. In theory this file by itself has enough in it to do your analysis - values matched with dates and, most importantly, tags for each filing. The files are not cumulative, so you need to access each one to be sure of finding your filing. This is the point, in other words, where you need to load all these files into a database. If you load it all, it's gonna be big (over 10 gig for starters).
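To give a feel for the loading step, here's a minimal sketch in Python using SQLite. It assumes only what the documentation says - tab-separated files with a header row - and builds the table from that header rather than hard-coding the field order:

```python
import csv
import sqlite3

def load_num(lines, conn, table="num"):
    """Load one quarter's num.txt (tab-separated, header row first)
    into a SQLite table. Columns are created from the header row, so
    nothing here relies on a hard-coded field order."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    cols = ", ".join(f'"{h}" TEXT' for h in header)
    marks = ", ".join("?" * len(header))
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
    conn.executemany(f"INSERT INTO {table} VALUES ({marks})", reader)
    conn.commit()

# Usage sketch:
#   with open("num.txt", newline="", encoding="utf-8") as f:
#       load_num(f, sqlite3.connect("secdata.db"))
```

The same function works for sub.txt, pre.txt and tag.txt by changing the table name, since all four files share the same tab-separated layout.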

The filings are keyed on the Accession Number (adsh), which is what EDGAR uses, so if you want to find the values for a particular company, you need to look up the adsh. This is where the sub.txt file comes in, which contains the company names & CIKs, so you need to load this into another table. Of course you could just find the adsh by going to EDGAR or our website - if you select a filing, the adsh corresponds to what follows aNo= in the address bar, but the adsh adds some annoying dashes! (In our implementation, we use a more comprehensive and timely database for these lookups.)

You could stop there - for example, all the values we download in XBRL Flat come from just these two files. But if you want to see what the company has called these data items and if, like us, you are a stickler for as-reported data, then pre.txt contains the layout along with the labels. The final file, tag.txt, contains important information on the tags, but you may consider it not important enough.

So what to watch out for? Duplicate values! Surely impossible, but no - they've been seen and verified in the original filings. And the fields aren't quite in the order shown in the documentation, so use the header records. Also you may want to exclude any records where the coreg field is populated, as more than likely you ain't looking at a value for the entire consolidated entity in these cases (I anticipate this will become more prevalent and relevant when they release values for the notes).
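A rough sketch of screening for those last two traps once the rows are loaded. The field name coreg follows the SEC documentation; the policy here - keep the first of any exact duplicates, drop co-registrant rows - is just one reasonable choice:

```python
def clean_rows(rows):
    """Drop rows where coreg is populated (a co-registrant's value,
    not the consolidated entity's) and collapse exact duplicate rows.
    Each row is a dict keyed by the header fields."""
    seen = set()
    out = []
    for row in rows:
        if row.get("coreg"):          # co-registrant value: skip
            continue
        key = tuple(sorted(row.items()))
        if key in seen:               # exact duplicate: keep first only
            continue
        seen.add(key)
        out.append(row)
    return out
```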

Two small bits of standardisation have occurred in these files that are not explicitly documented.

The financial statement headings do not have standard names and tags in the US-GAAP implementation of XBRL i.e. what's in the original filings (Yes I know - ridiculous!) but they do now in this SEC data set; these names and codes (or shall we call them tags?!) are actually all listed in the technical document.

The only slight problem is that a filing can have two financial statements which bear the same code (e.g. BS) - one for the consolidated group and one for the parent. Can you tell the difference? Er, no. Of course the parent one should have drastically fewer items. But if I'm gonna read this with a computer, I have to go through the hassle of counting items or something, and that is not necessarily an exhaustive solution. There is also a code called UN (Unclassifiable Statement), which suggests that the SEC classifications may not themselves be exhaustive! I don't actually know why I'm going on about this, as we solve this problem (differently) in our full database.

Secondly, the dates attached to each value have been standardised to month ends. This is good and bad. Good, as it makes searching and aggregating easier - unless you were specifically looking for values at Apple's year end (27th Sept 2014), when the values are held as 30th Sept. Bad, as those few companies that don't adhere to standard month-end periods (last Friday of the month etc.) can have, say, 52 week or 53 week years and you wouldn't know it from the num.txt file. This could lead to, say, revenues being over- or understated by 2% on an annual basis, more so if quarterly. To pick this up, you need to keep an eye on the period date in the sub.txt file (don't use the fye field, as it's filled in inconsistently - 0927 for Apple but 1130 for Adobe).
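A minimal sketch of that check, assuming you can pull successive actual period-end dates (e.g. from sub.txt): a 52- or 53-week fiscal year shows up as a 364- or 371-day gap between year ends, rather than a calendar year's 365 or 366.

```python
from datetime import date

def week_based_year(prev_end, this_end):
    """True if the gap between successive fiscal year ends looks like a
    52- or 53-week year (364 or 371 days) rather than a calendar year -
    the case the standardised month-end dates in num.txt would hide."""
    return (this_end - prev_end).days in (364, 371)

# Apple's FY2014 (ended 27th Sept 2014, after FY2013 ended 28th Sept
# 2013) is a 52-week, 364-day year:
assert week_based_year(date(2013, 9, 28), date(2014, 9, 27))
```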

Probably worth re-iterating that no additional standardisation of data values has occurred in this data set. For more details on what needs to be done, see my Missing Totals post.

Next time I will explain how we have replicated this database and what you can do with it.

Wednesday, 18 February 2015

The SEC Structured Data Sets - uh?

This SEC Announcement may have slipped your notice, as it was slid out at the back end of last year, when certainly my thoughts were more about parties than databases. The SEC announced that some of that mountain of XBRL data that's been piling up on their computers is now available as a series of flat files. The Data Transparency Coalition certainly thought it was significant.

So what are these Structured Data Sets and what does this all mean for accessing Comparative XBRL data for financial analysis? By bulking it up and stripping out the markup they've made part of the XBRL filing data more accessible but with a number of caveats. Note I said "more accessible" and not "accessible". You can't open these files in Excel - you might think you can because they are tab separated flat files but actually you can't; they are too big. They are designed to be read into a database (from whence they come). So if you quickly just want to get hold of some tagged data for a few companies from the SEC, you're out of luck. Note you can also load the XBRL instance document into Excel as XML but that doesn't make it any more readable.

But it is now much easier to read them into a database. Yes, you still need to build an intermediary database. But you don't have to worry about context references and dimensions and all that messy XML tagging. For example, in an existing XBRL filing I might find 43 values for "Revenues" - but which is the one I actually want? In the flat file, num.txt, there will probably be just 3 values - one for each of the primary financial periods.

Of course you don't have to build your own database - because we built one earlier! So if you quickly just want to get hold of some tagged data for a few companies then you can! We thought it would be an interesting exercise in evaluating the worth of this pilot program. We found it relatively easy and we like what we see. We chose to add a little pre-processing to the files, to make the resulting database run more efficiently and coalesce better with our existing ones.

Our copy can be accessed in exactly the same way as all our data - through Excel. A simple web query brings the data into a sheet according to the parameters you supply. It works just like the existing XBRL Sheet. The query is available here and the example XBRL Sheet here. You can also find a video here that shows you how to get started with the example sheet. Probably worth pointing out that because access is simply through the rendering of a customisable web page - the web query, access isn't confined to Excel; many other applications and programming languages can interface with this.

So what are the caveats? It is only data from the Financial Statements themselves that is in these files. Nothing from the Notes to Financial Statements for now, and new files only appear once a quarter. And it is just as it was filed - there are no corrections in these data sets.

In the next couple of posts I'll talk more about the technical aspects, our implementation and the current limitations of this initiative.

Monday, 2 February 2015

Dealing with XBRL data issues part 1 - missing totals

As can be detected from the title, there are numerous issues, many of which in reality are (or can be seen as) positive features of XBRL! I'm gonna deal with each of them one by one by demonstrating the various strategies we use to create comparative data.

I first touched on this in an earlier post and explained what was then our rather clumsy solution to the problem.

Missing totals mean missing standard values when trying to do comparative analysis. They are missing because XBRL preserves the presentation that companies have always used to show their financial performance, namely the succinct and more readable presentation you see in a paper report. Why would you want to repeat a figure just to create a complete set of totals? There's no need, it will only create clutter and make it less readable.

Fortunately we can do something about this and we do in the latest version of XBRL Sheet (the latest version is not yet generally available so email us - - if you would like to start using it). You can find more info on XBRL Sheet in this post and you can watch XBRL Sheet solving the missing totals problem in this video.

Let's have a look at an XBRL Sheet download...

Before, we used to download one column of tags; now we download two! The 1st column contains the standard tag, which is our understanding of the high level tag that the company should be using in a "Standard" rendition of the values. This will usually marry up with the actual tag they used (in column 3), as per line 4 - Cost of Revenue - in the example above. But if they have been more specific (which is great, as more precise tags give us a better understanding of their business), then we supply the different high level tag as well (which they haven't used, as they would have to duplicate the line, creating a cluttered presentation as discussed above). So line 3 - Revenues - is a case in point. Microsoft used the more specific tag "SalesRevenueNet". To make these easy to spot and check if necessary, the different standard tag will appear in a different colour. So we see another one further down. Again, we may be interested in quantifying all provisions when a business is restructured, rather than just goodwill. These two different tags enable us to do this.

So how do we use this info in our model? Well as we always recommend (see this post - specifically the bit about transparent data), you should connect with this data via an intermediate sheet (see below), to create values that can plug straight into your model.

Now with this extra column, we don't need to create a calculation to catch all the multiple alternatives for revenue (as represented by the Total Revenues line); we just need to use a simple lookup on "revenues" in this new column and the values will appear as shown. The 2nd column above contains the tag that is looked for. We in fact do a double lookup - we look in both tag columns in the Filings sheets to make sure we never miss an item, and this also allows us to pick up the specific values if we want (e.g. the Revenue - Sales line). The names in the 1st column, by the way, are our names for the items and demonstrate how having an in-between sheet enables you to customise the data before it hits your model.
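The double-lookup logic is easy to sketch outside Excel too. Here it is in Python, with illustrative tags and made-up values rather than anything from a real filing:

```python
def find_value(rows, wanted_tag):
    """Search the standard-tag column first, then the as-filed tag
    column, so a standard item like "Revenues" is found even when the
    filer used a more specific tag such as "SalesRevenueNet".
    Each row is (standard_tag, filed_tag, value)."""
    for std, filed, value in rows:
        if wanted_tag in (std, filed):
            return value
    return None

# Illustrative rows only - not real figures
rows = [("Revenues", "SalesRevenueNet", 100),
        ("CostOfRevenue", "CostOfRevenue", 40)]
```

Looking up "Revenues" here finds the SalesRevenueNet line via the standard-tag column, while looking up "SalesRevenueNet" directly still works - the same behaviour as the double lookup in the sheet.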

Friday, 24 January 2014

Fixing errors in XBRL Instance documents again

You can find a previous example in the post "Fixing errors in XBRL Instance documents" and an introduction to XBRL data errors here.

The next example is a little more subtle. Wrong but not immediately obvious, unless you are trying to model some business sectors, perhaps using our Sector3 product! - find loads more info on our award winning product here.

Boeing in their 2012 10-K slightly changed the tag for one of their top level business segments in one of the sections showing data for their businesses. It was referring to exactly the same segment (the label i.e. the description was the same) but the tag was different. A mistake - someone wasn't paying attention. In fact with reference to the previous example, they created an entirely bogus "context" for this identical segment.

So we got rid of it, replacing all connections to it with the correct context reference (shown below). In fact there was more than one wrong context so they all went the same way.
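For a flavour of what such a repair involves, here's a simplified sketch using Python's standard XML library. The element names and context IDs are stand-ins, not Boeing's actual ones, and a real instance would need namespace handling on top:

```python
import xml.etree.ElementTree as ET

def replace_context_refs(instance_xml, bogus_id, correct_id):
    """In an XBRL instance, repoint every fact that references the
    bogus context at the correct one, then drop the bogus context
    element itself (a top-level child of the instance)."""
    root = ET.fromstring(instance_xml)
    for elem in root.iter():
        if elem.get("contextRef") == bogus_id:
            elem.set("contextRef", correct_id)
    for child in list(root):
        if child.get("id") == bogus_id:
            root.remove(child)
    return ET.tostring(root, encoding="unicode")
```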

There was a little more work to do here than previously as a tag comes with a panoply of associated data - labels, definitions & the like, all of which we felt it was prudent to remove. Details were as ever recorded in the "xsd" file as shown below.

The consequence of this error was that data was missing for assets in Boeing's Defense, Space And Security business. Well it isn't anymore in Sector3.