Tuesday, January 26, 2010

General notes for linking databases

I was thinking of writing about the linkage issues between financial databases, but I found that Jie Cao got there before me. So I am quoting him instead of rewriting what he has already documented (http://ihome.cuhk.edu.hk/~b121456/tools.html).

  • NCUSIP is the historical CUSIP and changes over time. CUSIP is the current CUSIP and does not change over time. A historical NCUSIP during a specific period will correspond to only one current CUSIP. [www.cusip.com]
  • The NCUSIP in Thomson, I/B/E/S, ISSM, TAQ and Option-Metrics is labeled as 'CUSIP'.
  • In Compustat, CNUM + the first 2 digits of CIC is the CUSIP.
  • The major matching variables across databases are NCUSIP and then Ticker.
  • The CUSIP-NCUSIP transition file builds a link between NCUSIP and CUSIP as well as PERMNO at a specified time interval. [Download the transition file here]
  • For the ISSM database, all NYSE and AMEX stocks from 1983 to 1992, and NASDAQ stocks after 1990, can be matched by NCUSIP. NASDAQ stocks before 1990 can be matched by SMBL, which for a given month and exchange corresponds to the Ticker in CRSP.
  • For the TAQ database, stocks can be matched by the first 8 digits of TAQ's 12-digit NCUSIP.
  • Mutual Fund Links (MFLINKS) connects CRSP mutual fund information to Thomson (S12) mutual fund holding data. 
  • Matching by company or fund name is difficult and should be the last resort. The SAS function 'SPEDIS' can determine the likelihood of two words matching.
  • Extra effort is needed for precise matching. See this sample SAS code to generate a link between I/B/E/S and CRSP using multiple identifiers. (Internet connection and access to both I/B/E/S and CRSP data at WRDS are required)
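
To make the identifier rules above concrete, here is a minimal Python sketch (Jie Cao's own examples are in SAS; this is only my illustration). It builds an 8-character CUSIP from Compustat's CNUM and CIC, truncates a TAQ 12-digit identifier to 8 characters, and looks up a PERMNO from a transition table for a given date. The function names, the layout of the transition rows and all sample values are made up for the example; they are not the layout of the actual transition file.

```python
from datetime import date

def compustat_cusip8(cnum, cic):
    """Build an 8-character CUSIP from CNUM plus the first 2 characters of CIC."""
    return cnum.strip() + cic.strip()[:2]

def taq_cusip8(taq_cusip):
    """Keep the first 8 characters of TAQ's 12-character identifier."""
    return taq_cusip.strip()[:8]

# Hypothetical transition rows: (ncusip, permno, start_date, end_date).
# The real transition file may use different columns and date formats.
LINKS = [
    ("12345610", 10001, date(1990, 1, 1), date(1995, 12, 31)),
    ("65432110", 10001, date(1996, 1, 1), date(2009, 12, 31)),
]

def permno_for(ncusip8, on_date, links=LINKS):
    """Return the PERMNO whose NCUSIP interval contains on_date, else None."""
    for ncusip, permno, start, end in links:
        if ncusip == ncusip8 and start <= on_date <= end:
            return permno
    return None

if __name__ == "__main__":
    key = compustat_cusip8("123456", "104")
    print(key, permno_for(key, date(1993, 6, 30)))
```

The date check matters because an NCUSIP is only valid for a specific period; matching on the identifier alone, without the interval, can link a record to the wrong period of a security's history.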

Wednesday, January 06, 2010

CUSIP and CFMRC Dataset

If you work with Canadian stock data, you will notice that CFMRC (the leading database for Canadian securities and the equivalent of CRSP for US stocks) does not always list the CUSIP identifier with the data.

CUSIP is very important for linking the CFMRC data with other databases, as tickers are not always reliable. In CFMRC (the Windows client), CUSIP is not produced in all formats. One has to select the extended format to get CUSIP in the output.