Intraday data is rapidly becoming an essential resource for market forecasters, as numerous studies now point to its superiority in predicting market movements, even over longer time horizons such as one year or more (see Introduction to Intraday ETF Data).
Selecting a data vendor is, however, fraught: there are many to choose from, and the quality and consistency of the data are not easy to verify prior to purchase. At FirstRate Data we recently selected our two primary data vendors after vetting over twenty, and we found that several criteria were indicative of a high-quality vendor.
The quality of support correlates directly with the quality of the data. We consistently found that vendors with high-quality intraday data also provided excellent pre-sales support. We would typically test this by sending a detailed list of questions to a vendor on a Friday and timing the response. High-quality vendors would typically reply the same day or even on the Saturday, whilst low-quality vendors would take upwards of five days to respond and often gave incomplete or superficial answers.
The questions we typically asked covered how dividends were adjusted in stock data, which exchanges (including dark pools) the data was sourced from, how frequently (and at what times during the day) datasets were updated, and what work was done to validate the completeness and accuracy of a dataset.
Vendors that did not provide sample files were generally of poor quality. In addition, sample files should be large enough for a meaningful test (usually 10 days for intraday data and 2 days for tick data).
Sample files should be accompanied by full details on the dataset (i.e. timezone and timestamp details, policy on zero-volume bars, volume units, exchanges covered).
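One quick way to check a vendor's stated zero-volume policy against a sample file is to compare the bars received with the bars a full session should contain. The sketch below is illustrative only: it assumes 1-minute bars stamped at the bar open, and the function name bar_coverage and the session times are hypothetical choices, not part of any vendor's format.

```python
from datetime import datetime, timedelta

def bar_coverage(bars, session_open, session_close, step_minutes=1):
    """Compare received bars against the expected intraday grid.

    bars: list of (timestamp, volume) tuples, timestamps at bar open (assumed).
    Returns (missing_timestamps, zero_volume_timestamps).
    """
    expected = set()
    t = session_open
    while t < session_close:          # the session close itself opens no new bar
        expected.add(t)
        t += timedelta(minutes=step_minutes)
    seen = {ts for ts, _ in bars}
    missing = sorted(expected - seen)                       # bars omitted entirely
    zero_volume = sorted(ts for ts, vol in bars if vol == 0)  # bars kept with no trades
    return missing, zero_volume

# Tiny made-up session: 09:30 to 09:35, with one gap and one zero-volume bar.
day = datetime(2024, 1, 2)
sample = [
    (day.replace(hour=9, minute=30), 100),
    (day.replace(hour=9, minute=31), 0),
    (day.replace(hour=9, minute=33), 50),
    (day.replace(hour=9, minute=34), 20),
]
missing, zero_volume = bar_coverage(
    sample, day.replace(hour=9, minute=30), day.replace(hour=9, minute=35)
)
```

If the documentation says zero-volume bars are omitted, missing minutes are expected and zero-volume rows are not; if it says they are retained, the reverse should hold. A mix of both within one file suggests the policy is not applied consistently.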
High-quality vendors normally provide details on how the datasets are cleaned and tested. For intraday data, expect to see details such as how zero-volume bars are dealt with, whether outlier data points are removed or only flagged, and whether stock prices are adjusted for splits and dividends (and how the dividend-adjusted price is calculated).
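To make the dividend question concrete, here is a minimal sketch of one common back-adjustment convention: every price before an ex-dividend date is multiplied by 1 - dividend / previous close. This is an assumption for illustration, not a description of any particular vendor's method (vendors differ, which is exactly why the question is worth asking), and the function name back_adjust is hypothetical.

```python
def back_adjust(closes, dividends):
    """Back-adjust closing prices for cash dividends.

    closes: list of (date, close) in chronological order.
    dividends: dict mapping ex-date -> cash dividend per share.
    Assumed convention: prices before each ex-date are scaled by
    (1 - dividend / close of the bar before the ex-date).
    """
    adjusted = []
    factor = 1.0
    # Walk backwards so the most recent prices stay unadjusted.
    for i in range(len(closes) - 1, -1, -1):
        date, close = closes[i]
        adjusted.append((date, round(close * factor, 4)))
        if date in dividends and i > 0:
            prev_close = closes[i - 1][1]
            factor *= 1.0 - dividends[date] / prev_close
    adjusted.reverse()
    return adjusted

# Made-up prices: a 1.00 dividend goes ex on the second day.
closes = [("2024-01-02", 100.0), ("2024-01-03", 99.0), ("2024-01-04", 100.0)]
adjusted = back_adjust(closes, {"2024-01-03": 1.0})
```

In this toy series the raw close drops from 100.0 to 99.0 on the ex-date purely because of the dividend; after adjustment the earlier close is scaled down so that the drop no longer appears as a loss. Recomputing a vendor's adjusted series from raw prices and known dividends in this way is a simple check on whether their stated method matches their data.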
For tick data, expect to see details on how simultaneous ticks are dealt with, whether only trade-tick data is available or whether it is combined with bid-ask tick data, and how errors coming from the exchange datafeed, such as zero volume or zero/negative prices, are dealt with.
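The tick-level issues listed above are easy to count in a sample file before asking the vendor about them. The following is a rough sketch, assuming each tick has already been parsed into a (timestamp, price, size) tuple; the function name audit_ticks and the tuple layout are hypothetical and will need adapting to the vendor's actual file format.

```python
from collections import Counter

def audit_ticks(ticks):
    """Count common feed problems in a list of trade ticks.

    ticks: iterable of (timestamp, price, size) tuples (assumed layout).
    """
    ticks = list(ticks)
    stamp_counts = Counter(ts for ts, _, _ in ticks)
    return {
        # ticks that share a timestamp with at least one other tick
        "simultaneous": sum(n for n in stamp_counts.values() if n > 1),
        # ticks reporting no traded size
        "zero_size": sum(1 for _, _, size in ticks if size == 0),
        # ticks with a zero or negative price
        "bad_price": sum(1 for _, price, _ in ticks if price <= 0),
    }

# Made-up ticks: two share a timestamp, one has zero size, one a negative price.
sample_ticks = [
    ("09:30:00.001", 10.00, 100),
    ("09:30:00.001", 10.01, 200),
    ("09:30:00.050", 10.02, 0),
    ("09:30:00.120", -1.00, 50),
]
report = audit_ticks(sample_ticks)
```

Non-zero counts are not necessarily errors (simultaneous ticks occur naturally at fine timestamp resolution), but they show which cleaning rules the vendor should be able to explain.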
Once we were satisfied with the above checks, we proceeded to a test purchase (of 10+ years of intraday data or 2+ years of tick data) for detailed testing.
We found that several types of errors were indicative of issues in the broader datasets, namely: