Author Topic: SEC EDGAR Gurus (and financial programmers)  (Read 16865 times)

oddballstocks

  • Lifetime Member
  • Hero Member
  • *****
  • Posts: 2238
    • Oddball Stocks Blog
SEC EDGAR Gurus (and financial programmers)
« on: March 17, 2013, 06:22:48 PM »
I'm looking for a way to find a ticker for a stock if I already have a CIK. The SEC is CIK driven and I can't seem to find a link between the two. Does anyone know of a list that's updated frequently with a CIK and associated ticker?

I have already surfed the SEC's FTP server without luck.

If not, any idea how to get this programmatically?

On a side note, can somebody explain the SEC accession numbering scheme?
The ultimate edge for bank investors: http://www.completebankdata.com


Hielko

  • Hero Member
  • *****
  • Posts: 1125
Re: SEC EDGAR Gurus (and financial programmers)
« Reply #1 on: March 17, 2013, 06:38:15 PM »
From a quick Google search: http://www.jot.fm/issues/issue_2008_09/column2/index.html

This shows how to go from ticker to CIK.

oddballstocks

Re: SEC EDGAR Gurus (and financial programmers)
« Reply #2 on: March 17, 2013, 06:58:40 PM »
Thanks but I already have the CIK, I'm trying to get a ticker from the CIK.

The SEC has a master file, updated nightly on their FTP server, with every company and its associated CIK. But nowhere in anything they publish (filings/XBRL) is the ticker listed, and that's the problem.

I can get a list from NASDAQ, but then I'm left with string matching, and it isn't very accurate. I get oddities like Apple Community Bank getting the ticker AAPL, and there are too many for a human to look up manually in Yahoo Finance.
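The Apple Community Bank / AAPL mix-up is typical of substring matching. A minimal sketch of a stricter alternative (Python 3), assuming you already have (ticker, exchange name) pairs from a listing file: normalize both names by uppercasing, turning punctuation into spaces and dropping a few common corporate words, then accept only exact matches. The `NOISE_WORDS` set and the sample listings are illustrative assumptions, not a complete rule set.

```python
import re

# Illustrative, not exhaustive: words ignored when comparing names
NOISE_WORDS = {"INC", "CORP", "CORPORATION", "CO", "COMPANY", "LTD", "PLC", "THE"}


def normalize(name):
    """Uppercase, replace punctuation with spaces, drop noise words."""
    words = re.sub(r"[^A-Z0-9 ]", " ", name.upper()).split()
    return " ".join(w for w in words if w not in NOISE_WORDS)


def match_ticker(sec_name, listings):
    """Tickers whose normalized listing name equals the normalized SEC name.

    listings: list of (ticker, company_name) pairs, e.g. parsed from an
    exchange symbol file (its exact layout is not assumed here).
    """
    target = normalize(sec_name)
    return [ticker for ticker, name in listings if normalize(name) == target]


listings = [("AAPL", "Apple Inc."), ("DJCO", "Daily Journal Corp.")]
print(match_ticker("APPLE COMMUNITY BANK", listings))  # → []
print(match_ticker("DAILY JOURNAL CORP", listings))    # → ['DJCO']
```

Exact matching on normalized names trades recall for precision: names that really are spelled differently come back empty and still need a manual look, but a bank no longer gets matched to Apple.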

compoundinglife

  • Hero Member
  • *****
  • Posts: 645
Re: SEC EDGAR Gurus (and financial programmers)
« Reply #3 on: March 17, 2013, 07:01:01 PM »
You could get it programmatically by scraping the data off the EDGAR website. I usually use Python for stuff like this, but it could easily be adapted to other languages.

This code (in Python) takes the ticker in the variable "ticker", fetches the company page from EDGAR, and parses out the CIK.

Code:
import re
import urllib2

ticker = 'BAC'
string_match = 'rel="alternate"'
url = 'http://www.sec.gov/cgi-bin/browse-edgar?company=&match=&CIK=%s&owner=exclude&Find=Find+Companies&action=getcompany' % ticker
response = urllib2.urlopen(url)

# The line holding the Atom feed link contains "CIK=<digits>";
# pull out just the digits rather than the whole "CIK=..." fragment.
for line in response:
    if string_match in line:
        match = re.search(r'CIK=(\d+)', line)
        if match:
            print match.group(1)
            break

compoundinglife

Re: SEC EDGAR Gurus (and financial programmers)
« Reply #4 on: March 17, 2013, 07:02:49 PM »
Whoops, this does the opposite of what you asked for. The reverse is easy to do as well; happy to post an example after dinner if you want/need it.

Hielko

Re: SEC EDGAR Gurus (and financial programmers)
« Reply #5 on: March 17, 2013, 07:17:01 PM »
I don't think there is a clean and easy solution. The CIK is a unique key for entries in the SEC database, but I think you also get a CIK if, for example, you file as an individual holding a 5% position. And the SEC doesn't care about ticker symbols; that's something between companies and exchanges. So you either need to build your own database or find someone who has done the work. Probably not available for free...

oddballstocks

Re: SEC EDGAR Gurus (and financial programmers)
« Reply #6 on: March 17, 2013, 07:30:02 PM »
Correct. As of this quarter there are 65,000 CIKs, but I already have them filtered down to around 5,000 that I know pertain to what I need. The problem is, as you say, that the ticker is an exchange mechanism and the SEC couldn't care less.
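Filtering the full CIK list down to the ones you care about is a set lookup once the file is parsed. A minimal sketch (Python 3), assuming the nightly company list has one `COMPANY NAME:CIK:` entry per line; that colon-delimited layout is an assumption, so adjust the parsing to whatever file you actually download. The sample lines reuse names and CIKs from this thread, except the individual filer, which is hypothetical, as is the `wanted` set.

```python
def parse_cik_list(lines):
    """Parse 'COMPANY NAME:0001234567:' lines into {cik_int: name}.

    The colon-delimited layout is an assumption; rsplit keeps any colons
    that appear inside a company name.
    """
    companies = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, cik, _ = line.rsplit(":", 2)
        companies[int(cik)] = name  # int() drops the zero padding
    return companies


sample = [
    "NATIONAL OILWELL VARCO INC:0001021860:",
    "BERKSHIRE HATHAWAY INC:0001067983:",
    "DAILY JOURNAL CORP:0000783412:",
    "DOE JOHN:0009999999:",  # hypothetical individual filer
]
wanted = {1067983, 783412}  # hypothetical filtered set of CIKs

companies = parse_cik_list(sample)
subset = {cik: name for cik, name in companies.items() if cik in wanted}
print(subset)
```

Keying on the integer CIK means padded and unpadded forms from different sources collapse to the same key.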

There is one possible, extremely clunky way. When a company files with XBRL, the standard naming convention is TICKER-date.xml, and those files are in the zip files linked from the SEC XBRL feed. The problem is that I can't get the historic zips from the SEC, because everything historic is stored on the FTP server in a different format. That format is passable, but it doesn't contain the tickers.
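Pulling the ticker out of a file name that follows the TICKER-date.xml convention takes one regular expression. A minimal sketch (Python 3), assuming the date part is eight digits (YYYYMMDD) and the ticker part is letters and digits only; names that don't fit, such as companion files with a suffix or a different extension, return None. The file names are illustrative.

```python
import re

# Assumed pattern: <ticker>-<YYYYMMDD>.xml, e.g. "djco-20120930.xml"
XBRL_NAME = re.compile(r"^([A-Za-z0-9]+)-(\d{8})\.xml$")


def ticker_from_xbrl_filename(filename):
    """Return (TICKER, 'YYYYMMDD'), or None if the name doesn't fit."""
    match = XBRL_NAME.match(filename)
    if match is None:
        return None
    return match.group(1).upper(), match.group(2)


print(ticker_from_xbrl_filename("djco-20120930.xml"))      # → ('DJCO', '20120930')
print(ticker_from_xbrl_filename("djco-20120930_lab.xml"))  # → None
```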

The problem with using the RSS feed is it would take up to six months to build this as companies file. And I wouldn't have any historic tickers.
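If you do go the feed route, recording every (CIK, ticker) pair with the first and last filing date it was seen at least preserves ticker changes from the day you start collecting. A minimal in-memory sketch (Python 3); the observation tuples, CIK 111 and the tickers OLDT/NEWT are hypothetical inputs, for example produced from XBRL file names as filings arrive.

```python
from collections import defaultdict


def record(history, cik, ticker, filed):
    """Track first/last filing date (YYYYMMDD strings) per (cik, ticker)."""
    seen = history[cik].get(ticker)
    if seen is None:
        history[cik][ticker] = [filed, filed]
    else:
        seen[0] = min(seen[0], filed)
        seen[1] = max(seen[1], filed)


def current_ticker(history, cik):
    """Ticker most recently seen for a CIK, or None if never seen."""
    tickers = history.get(cik)
    if not tickers:
        return None
    return max(tickers, key=lambda t: tickers[t][1])


history = defaultdict(dict)
# Hypothetical observations: (cik, ticker, filing date)
for cik, ticker, filed in [
    (111, "OLDT", "20120515"),
    (111, "OLDT", "20120814"),
    (111, "NEWT", "20121114"),
]:
    record(history, cik, ticker, filed)

print(dict(history[111]))
print(current_ticker(history, 111))  # → NEWT
```

YYYYMMDD strings compare correctly as plain strings, which keeps the bookkeeping simple.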

There are no paid solutions that I can find either, but I can build this myself, no need to pay.

oddballstocks

Re: SEC EDGAR Gurus (and financial programmers)
« Reply #7 on: March 17, 2013, 07:31:27 PM »
Yes I'd love to see this in reverse if you have a chance, thanks!

I was going to post this on StackOverflow, then realized it's more of a business-domain problem than a coding one. You guys knew exactly what I meant; this forum is awesome!

compoundinglife

Re: SEC EDGAR Gurus (and financial programmers)
« Reply #8 on: March 17, 2013, 09:47:35 PM »
OK, this is very hacky, but it works for the few symbols I tested. If I were going to use this on a regular basis I would use one of the HTML parsing modules available for Python, but I wanted a quick proof of concept that did not require installing additional software.
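For the HTML-parsing route, Python 3's standard library `html.parser` needs no extra install. A minimal sketch that reads the company name from the `<span class="companyName">` element, stopping at the first child tag just as the `(.*)<acronym` regex in this post's script does. The sample HTML is a simplified, made-up fragment; the real EDGAR page layout may differ or change.

```python
from html.parser import HTMLParser


class CompanyNameParser(HTMLParser):
    """Grab the text at the start of <span class="companyName">.

    Collection stops at the first child tag (an <acronym> on the EDGAR
    page), mirroring the '(.*)<acronym' regex in the script.
    """

    def __init__(self):
        super().__init__()
        self.collecting = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Start collecting at the first companyName span; any other tag stops it
        if tag == "span" and ("class", "companyName") in attrs and not self.parts:
            self.collecting = True
        else:
            self.collecting = False

    def handle_endtag(self, tag):
        self.collecting = False

    def handle_data(self, data):
        if self.collecting:
            self.parts.append(data)

    def company_name(self):
        # Collapse whitespace; returns "" if the span was never found
        return " ".join("".join(self.parts).split())


sample_html = (
    '<div><span class="companyName">DAILY JOURNAL CORP '
    '<acronym title="Central Index Key">CIK</acronym>#: '
    '<a href="#">0000783412</a></span></div>'
)
parser = CompanyNameParser()
parser.feed(sample_html)
print(parser.company_name())  # → DAILY JOURNAL CORP
```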

As mentioned earlier in this thread, the SEC offers ticker -> CIK resolution via their web interface, so you can screen-scrape the data that way if you have the ticker and want the CIK. However, they do not offer the reverse.
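Since EDGAR only resolves ticker to CIK, another angle is to run that lookup over a whole ticker list (from an exchange, say) and invert the results locally. One CIK can map to several tickers, for example Berkshire Hathaway's two share classes, so the inverse should keep a list. A minimal sketch (Python 3); the dictionary below is an illustrative stand-in for results you would collect with the ticker-to-CIK scraper earlier in the thread.

```python
from collections import defaultdict


def invert(ticker_to_cik):
    """Turn {ticker: cik} into {cik: sorted list of tickers}."""
    cik_to_tickers = defaultdict(list)
    for ticker, cik in ticker_to_cik.items():
        cik_to_tickers[cik].append(ticker)
    return {cik: sorted(tickers) for cik, tickers in cik_to_tickers.items()}


# Illustrative ticker -> CIK results (CIKs as zero-padded strings)
ticker_to_cik = {
    "BRK-A": "0001067983",
    "BRK-B": "0001067983",
    "DJCO": "0000783412",
}
print(invert(ticker_to_cik))
```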

So basically my script hits the SEC website with the CIK to get a company name. It then searches Yahoo Finance for that company name to get the ticker. If Yahoo Finance returns more than one ticker, it only grabs the first. I tested it on a few tickers and it worked OK, but I doubt it will work in all cases. The code also depends on neither Yahoo nor the SEC changing the layout of their websites.

I also do some fuzzy matching on the names: I only take the first two words of the company name, unless the second word is 2 letters or fewer, in which case I take three (so "BANK OF AMERICA CORP" becomes "BANK OF AMERICA"). This is to try to avoid situations where one data source has "corp" and the other has "corporation", etc.
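The shortening rule fits in a small function that also copes with one- and two-word names without indexing past the end of the word list. A sketch in Python 3; the sample names are the ones from the usage examples in this post.

```python
def shorten_name(company_name):
    """Keep the first two words, or three when the second is <= 2 chars.

    Shorter names are returned unchanged (with whitespace collapsed).
    """
    words = company_name.split()
    if len(words) >= 3 and len(words[1]) <= 2:
        return " ".join(words[:3])
    return " ".join(words[:2])


for name in ["NATIONAL OILWELL VARCO INC", "BANK OF AMERICA CORP", "DAILY JOURNAL CORP"]:
    print(shorten_name(name))
# → NATIONAL OILWELL
# → BANK OF AMERICA
# → DAILY JOURNAL
```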

The best way to do this long term would be to harvest the data yourself with some scripts and then run your software or site or whatever off your normalized data.
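A minimal sketch of that harvest-then-query setup using `sqlite3` from Python 3's standard library. The table name, columns and `source` values are hypothetical; the idea is to key on CIK, allow several tickers per CIK, and record where each mapping came from so doubtful rows can be rechecked.

```python
import sqlite3

# Hypothetical schema: one row per (CIK, ticker) pair, plus where it came
# from ('xbrl', 'yahoo', 'manual', ...) so doubtful rows can be rechecked.
conn = sqlite3.connect(":memory:")  # pass a file path for a persistent DB
conn.execute(
    """CREATE TABLE cik_ticker (
           cik    INTEGER NOT NULL,
           ticker TEXT    NOT NULL,
           source TEXT    NOT NULL,
           PRIMARY KEY (cik, ticker)
       )"""
)

rows = [
    (783412, "DJCO", "yahoo"),
    (1067983, "BRK-A", "manual"),
    (1067983, "BRK-B", "yahoo"),
]
# INSERT OR IGNORE skips pairs that are already stored
conn.executemany("INSERT OR IGNORE INTO cik_ticker VALUES (?, ?, ?)", rows)
conn.commit()


def tickers_for(conn, cik):
    """All tickers stored for a CIK (padded string or int), sorted."""
    cur = conn.execute(
        "SELECT ticker FROM cik_ticker WHERE cik = ? ORDER BY ticker",
        (int(cik),),
    )
    return [row[0] for row in cur]


print(tickers_for(conn, "0001067983"))  # → ['BRK-A', 'BRK-B']
```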

Here are some usage examples, assuming the script is saved as "cik.py":

Code:
python cik.py 0001021860
Searching for symbol that matches 0001021860
Found company name NATIONAL OILWELL VARCO INC  from SEC
Attempting to search yahoo finance for NATIONAL OILWELL
Yahoo search URL is http://finance.yahoo.com/lookup?s=NATIONAL%20OILWELL
NOV

python cik.py 0001067983
Searching for symbol that matches 0001067983
Found company name BERKSHIRE HATHAWAY INC  from SEC
Attempting to search yahoo finance for BERKSHIRE HATHAWAY
Yahoo search URL is http://finance.yahoo.com/lookup?s=BERKSHIRE%20HATHAWAY
BRK-B

python cik.py 0000783412
Searching for symbol that matches 0000783412
Found company name DAILY JOURNAL CORP  from SEC
Attempting to search yahoo finance for DAILY JOURNAL
Yahoo search URL is http://finance.yahoo.com/lookup?s=DAILY%20JOURNAL
DJCO

python cik.py 1085917
Searching for symbol that matches 1085917
Found company name BANK OF AMERICA CORP  from SEC
Attempting to search yahoo finance for BANK OF AMERICA
Yahoo search URL is http://finance.yahoo.com/lookup?s=BANK%20OF%20AMERICA
BAC
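The examples pass the CIK both zero-padded (0001021860) and unpadded (1085917), and both work in the EDGAR URL. If you store CIKs from several sources, normalizing them to one form avoids duplicate keys. A small Python 3 sketch that pads to ten digits and builds the same browse-edgar URL the script uses, letting `urllib.parse.urlencode` handle the query string; the helper names are hypothetical.

```python
from urllib.parse import urlencode

EDGAR_BROWSE = "http://www.sec.gov/cgi-bin/browse-edgar"


def normalize_cik(cik):
    """Return the CIK as a 10-digit zero-padded string."""
    digits = str(cik).strip()
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError("not a CIK: %r" % (cik,))
    return digits.zfill(10)


def company_url(cik):
    """EDGAR company page URL, same parameters as the script's edgar_url."""
    return EDGAR_BROWSE + "?" + urlencode(
        {"CIK": normalize_cik(cik), "action": "getcompany"}
    )


print(normalize_cik("1085917"))  # → 0001085917
print(company_url(783412))
```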

And the python code:

Code:
import re
import sys
import urllib
import urllib2

cik = sys.argv[1]

print "Searching for symbol that matches %s" % cik

edgar_url = 'http://www.sec.gov/cgi-bin/browse-edgar?CIK=%s&action=getcompany' % cik
string_match = 'companyName'

# Fetch the company page from EDGAR using the CIK and pull out the name
company_name = None
response = urllib2.urlopen(edgar_url)
for line in response:
    if string_match in line:
        name_match = re.search('<span class="companyName">(.*)<acronym', line)
        if name_match:
            company_name = name_match.group(1)
            print "Found company name %s from SEC" % company_name
            break

if company_name is None:
    sys.exit("No company name found on EDGAR for CIK %s" % cik)

# Fuzzy logic: keep only the first two words of the name, or the
# first three when the second word is 2 chars or less ("BANK OF ...").
company_name_words = company_name.split()
if len(company_name_words) >= 3 and len(company_name_words[1]) <= 2:
    company_name = ' '.join(company_name_words[:3])
elif len(company_name_words) >= 2:
    company_name = ' '.join(company_name_words[:2])

print "Attempting to search yahoo finance for %s" % company_name

# URL encode the company name
company_name = urllib.quote(company_name)

# Take the company name to Yahoo and get the ticker
yahoo_url = 'http://finance.yahoo.com/lookup?s=%s' % company_name
print "Yahoo search URL is %s" % yahoo_url
response = urllib2.urlopen(yahoo_url)
# Iterate through the HTML and print the first ticker. If there
# is more than one, we only take the first.
for line in response:
    # "ticker_up" or "ticker_down" marks the line with the first symbol
    if "ticker_up" in line or "ticker_down" in line:
        cell = re.search('<td>(.*?)</td>', line)
        link = cell and re.search('">(.*?)</a>', cell.group(1))
        if link:
            print link.group(1)
            break
« Last Edit: March 17, 2013, 11:19:09 PM by compoundinglife »