
How to access SEC EDGAR financial data programmatically

Quick answer: The fastest way to access SEC EDGAR financial data programmatically is to use the SEC’s EDGAR submissions JSON, companyfacts XBRL API, and filing documents endpoints, then parse 10-K, 10-Q, 8-K, and XBRL facts into your own schema. If you need normalized revenue, net income, and balance-sheet data without building the ingestion pipeline yourself, companyfinancials.io pulls directly from SEC filings and annual reports.

If your goal is to access SEC EDGAR financial data programmatically, the practical answer is not “scrape the website.” It is to combine the SEC’s JSON endpoints, XBRL facts, and filing archives into a repeatable pipeline. That gives you company-level filings metadata, standardized financial facts, and the original source documents behind them. For many teams, that is enough to build screening, valuation, or monitoring tools. For teams that want verified financial statements without maintaining an EDGAR parser, companyfinancials.io is a useful shortcut because it normalizes data from SEC filings and annual reports into API-friendly records.

How does SEC EDGAR expose financial data programmatically?

SEC EDGAR exposes financial data through a few distinct surfaces, each with a different job:

  • Submissions JSON for filing history and accession numbers.
  • Company facts JSON for standardized XBRL facts such as revenue, assets, liabilities, and net income.
  • Filing documents for the original 10-K, 10-Q, 8-K, 20-F, and exhibits.
  • XBRL instance files for the raw tagged financial statements embedded in filings.

The SEC documents these endpoints in its EDGAR developer materials and XBRL data guidance. The key point is that the SEC gives you both the machine-readable facts and the source filing text, so you can reconcile numbers back to the original report.

For example, Apple reported $383.3 billion in revenue for fiscal 2023 in its Form 10-K filed with the SEC, while Microsoft reported $211.9 billion in revenue for fiscal 2023 in its annual report filed on Form 10-K. Those figures are available in the underlying filings and can also be pulled from XBRL-based data products built on EDGAR.

What SEC EDGAR endpoints should you use first?

Start with the endpoints that reduce parsing work. The SEC’s company submissions JSON tells you what a company filed and when. The companyfacts JSON gives you standardized facts by taxonomy. The filing archive gives you the actual report and exhibits.
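As a rough sketch, the three surfaces map to three URL patterns. The helper names below are illustrative, not part of any SEC library, and the accession number and document name in the usage note are placeholders. The sketch assumes the URL layouts currently published on data.sec.gov and www.sec.gov, where the JSON endpoints take a 10-digit zero-padded CIK and archive paths take the unpadded CIK.

```python
def pad_cik(cik) -> str:
    """Return the CIK as the 10-digit, zero-padded string used in data.sec.gov paths."""
    return str(int(cik)).zfill(10)


def submissions_url(cik) -> str:
    """Filing history and accession numbers for one issuer."""
    return f"https://data.sec.gov/submissions/CIK{pad_cik(cik)}.json"


def companyfacts_url(cik) -> str:
    """All standardized XBRL facts for one issuer."""
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{pad_cik(cik)}.json"


def filing_document_url(cik, accession_number: str, primary_document: str) -> str:
    """Original filing document; the folder is the accession number without dashes."""
    folder = accession_number.replace("-", "")
    return f"https://www.sec.gov/Archives/edgar/data/{int(cik)}/{folder}/{primary_document}"
```

For Apple (CIK 320193), `submissions_url(320193)` yields `https://data.sec.gov/submissions/CIK0000320193.json`. The accession number and primary document name for `filing_document_url` come from the submissions record, so treat any hard-coded values as placeholders.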

| EDGAR source | What it contains | Best use case | Example benchmark |
| --- | --- | --- | --- |
| Submissions JSON | Recent filings, accession numbers, form types, dates | Filing monitoring and alerts | Apple’s filings history includes 10-K, 10-Q, and 8-K records in SEC submissions data |
| Company facts JSON | XBRL facts such as revenue, assets, liabilities, EPS | Financial screening and time-series analysis | Microsoft’s revenue fact series can be compared across fiscal years in standardized units |
| Filing documents | Original 10-K, 10-Q, 8-K, exhibits, inline XBRL | Source-of-truth review and footnote extraction | Tesla’s 2023 10-K includes the narrative and audited statements behind its $96.8 billion revenue |
| XBRL instance files | Tagged facts in machine-readable XML | Custom normalization and statement reconstruction | Amazon’s 2023 filing can be parsed into revenue, operating income, and cash flow facts |

For developers building a production pipeline, the companyfacts endpoint is usually the highest-value starting point. It is the closest thing EDGAR has to a standardized financial dataset. For analysts who need the filing context as well, the document archive matters because XBRL alone does not capture every footnote, segment disclosure, or non-GAAP reconciliation.
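To make the companyfacts layout concrete, here is a minimal sketch that pulls one tag’s data points out of a payload. It assumes the nesting the endpoint currently returns (`facts`, then taxonomy, then tag, then `units`, then a list of data points). The `fact_series` helper and the sample payload are illustrative, and the sample values are placeholders rather than real filings.

```python
def fact_series(companyfacts: dict, tag: str, unit: str = "USD",
                taxonomy: str = "us-gaap") -> list:
    """Return the reported data points for one XBRL tag and unit, or [] if absent."""
    concept = companyfacts.get("facts", {}).get(taxonomy, {}).get(tag, {})
    return list(concept.get("units", {}).get(unit, []))


# Hand-written sample in the companyfacts layout; every value is a placeholder.
sample = {
    "cik": 1234567,
    "entityName": "Example Corp",
    "facts": {"us-gaap": {"Revenues": {"label": "Revenues", "units": {"USD": [
        {"start": "2022-01-01", "end": "2022-12-31", "val": 1_000_000,
         "fy": 2022, "fp": "FY", "form": "10-K", "filed": "2023-02-15",
         "accn": "0001234567-23-000001"},
        {"start": "2023-01-01", "end": "2023-12-31", "val": 1_250_000,
         "fy": 2023, "fp": "FY", "form": "10-K", "filed": "2024-02-15",
         "accn": "0001234567-24-000001"},
    ]}}}},
}

for point in fact_series(sample, "Revenues"):
    print(point["end"], point["val"], point["form"])
```

Each data point carries its own accession number (`accn`), which is what lets you trace a number back to the filing document it came from.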

How do you pull SEC EDGAR financial data with Python?

A typical workflow has four steps: identify the company, fetch its submissions record, pull companyfacts, then fetch the filing document for validation.

  1. Resolve the company’s CIK.
  2. Request the submissions JSON for filing history.
  3. Request the companyfacts JSON for standardized financial facts.
  4. Download the filing HTML or XBRL instance file for auditability.
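Step 2 is where most first attempts stumble, because the submissions record stores recent filings as parallel arrays rather than a list of objects. The sketch below zips those arrays into rows. It assumes the `filings.recent` layout and field names currently returned by the endpoint; the `recent_filings` helper and the sample data are hypothetical.

```python
COLUMNS = ("accessionNumber", "filingDate", "form", "primaryDocument")


def recent_filings(submissions: dict, forms=("10-K", "10-Q", "8-K")) -> list:
    """Turn the column-oriented 'recent' block into one dict per filing."""
    recent = submissions.get("filings", {}).get("recent", {})
    rows = zip(*(recent.get(col, []) for col in COLUMNS))
    # row[2] is the form type; keep only the forms we care about.
    return [dict(zip(COLUMNS, row)) for row in rows if row[2] in forms]


# Hand-written sample; accession numbers and file names are placeholders.
sample = {
    "cik": "1234567",
    "filings": {"recent": {
        "accessionNumber": ["0001234567-24-000003", "0001234567-24-000002",
                            "0001234567-24-000001"],
        "filingDate": ["2024-02-15", "2024-02-01", "2024-01-10"],
        "form": ["10-K", "4", "8-K"],
        "primaryDocument": ["annual.htm", "form4.xml", "current.htm"],
    }},
}

for filing in recent_filings(sample):
    print(filing["filingDate"], filing["form"], filing["accessionNumber"])
```

The accession number and primary document from each row are the inputs you need for step 4, fetching the filing itself.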

In practice, the SEC expects polite traffic. Send a descriptive User-Agent header that identifies you and includes contact information, rate-limit requests, and cache responses. The SEC’s fair-access guidance asks automated clients not to overload the system and currently caps traffic at 10 requests per second.
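One way to build those habits into a client is a small throttle plus an in-memory cache, using only the standard library. This is a sketch under assumptions: the `Throttle` class, `build_request`, and `fetch_json` are illustrative names, the User-Agent string is a placeholder you should replace with your own contact details, and the interval should be set well within whatever limit the SEC currently publishes.

```python
import json
import time
import urllib.request


class Throttle:
    """Ensure successive calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time the previous request was released

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay:
            self._sleep(delay)
        self._last = now + delay
        return delay


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Attach the descriptive User-Agent header to the request."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


_cache: dict = {}


def fetch_json(url: str, user_agent: str, throttle: Throttle) -> dict:
    """Fetch and cache a JSON document. Makes a live network call on a cache miss."""
    if url not in _cache:
        throttle.wait()
        with urllib.request.urlopen(build_request(url, user_agent), timeout=30) as resp:
            _cache[url] = json.load(resp)
    return _cache[url]
```

A call might look like `fetch_json(url, "Example Research admin@example.com", Throttle(0.2))`, which spaces requests at most five per second. For production use you would likely swap the dictionary for a persistent cache so restarts do not trigger a full re-download.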

The target is real reported figures, not toy data: Apple’s fiscal 2023 10-K shows revenue of $383.3 billion; Microsoft’s fiscal 2023 10-K shows revenue of $211.9 billion; Alphabet’s 2023 annual report shows revenue of $307.4 billion. Those are the kinds of figures you can compare programmatically once you normalize fiscal periods and units.

What is the difference between EDGAR scraping and using XBRL?

Scraping EDGAR HTML is brittle. XBRL is structured. If you care about repeatability, XBRL wins.

HTML scraping breaks when the SEC changes page structure or when companies alter filing formatting. XBRL tags, by contrast, are designed to label facts such as revenues, cash and cash equivalents, and long-term debt. That does not make XBRL perfect. Companies can use different tags for similar concepts, and some disclosures still require manual interpretation. But for core financial statements, XBRL is the right starting point.
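The tag-variation problem can be handled with an ordered list of candidate tags. The sketch below checks a few us-gaap revenue tags and returns the first one the filer used in a 10-K for the requested fiscal year. The candidate list is an assumption, not an exhaustive mapping, and `pick_revenue_tag` is a hypothetical helper operating on a companyfacts-style payload.

```python
# Candidate us-gaap revenue tags, in order of preference. This list is an
# assumption for illustration; real filers may use other tags.
REVENUE_TAGS = (
    "RevenueFromContractWithCustomerExcludingAssessedTax",
    "Revenues",
    "SalesRevenueNet",
)


def pick_revenue_tag(companyfacts: dict, fiscal_year: int, candidates=REVENUE_TAGS):
    """Return the first candidate tag reported in a 10-K for fiscal_year, else None."""
    us_gaap = companyfacts.get("facts", {}).get("us-gaap", {})
    for tag in candidates:
        points = us_gaap.get(tag, {}).get("units", {}).get("USD", [])
        # "fy" here describes the filing the fact appeared in, so prior-year
        # comparatives in the same 10-K can share it; match on dates downstream.
        if any(p.get("fy") == fiscal_year and p.get("form") == "10-K" for p in points):
            return tag
    return None
```

Returning `None` rather than guessing is deliberate: a missing tag is a signal to go back to the filing document, which is exactly the kind of disclosure that still needs manual interpretation.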

There is also a practical distinction between raw and normalized data. Raw EDGAR gives you the source facts. Normalized data maps those facts into a consistent schema. If you are building an internal research stack, raw EDGAR may be enough. If you are shipping a product, normalized data saves time. companyfinancials.io sits in that second category: it packages verified figures from SEC filings and annual reports into a form developers can use without writing a parser for every issuer.

Which companies make good benchmarks for EDGAR data quality?

Large-cap issuers with frequent filings are the easiest way to test your pipeline because they have clean, high-volume reporting histories. The benchmark is not whether the company is famous; it is whether the filings are consistent enough to validate your extraction logic.

| Company | Latest widely cited annual revenue | Primary SEC filing | Why it is useful for testing |
| --- | --- | --- | --- |
| Apple | $383.3 billion | FY2023 Form 10-K | High-quality XBRL, large revenue base, clear segment disclosures |
| Microsoft | $211.9 billion | FY2023 Form 10-K | Consistent reporting and strong cash flow statement structure |
| Alphabet | $307.4 billion | FY2023 Form 10-K | Useful for segment and other-bets disclosure testing |
| Tesla | $96.8 billion | FY2023 Form 10-K | Good for automotive revenue, regulatory credits, and margin analysis |

Those figures come from the companies’ SEC-filed annual reports. They are useful because they are large enough to expose unit conversion bugs, but simple enough to verify manually. If your parser gets Apple’s revenue wrong by a factor of 1,000 because of unit handling, you will see it immediately.
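A benchmark check along these lines can flag scale bugs automatically. The sketch below compares an extracted value with a rounded benchmark and reports whether the gap looks like a factor of 1,000 or 1,000,000. The `diagnose_scale` helper and the 1% tolerance are assumptions chosen to absorb the rounding in the published figures.

```python
# Rounded FY2023 revenue benchmarks in whole US dollars.
BENCHMARK_REVENUE_USD = {
    "Apple": 383.3e9,
    "Alphabet": 307.4e9,
    "Tesla": 96.8e9,
}


def diagnose_scale(extracted: float, expected: float, tolerance: float = 0.01) -> str:
    """Classify an extracted value as ok, off by a power of 1,000, or a mismatch."""
    def close(value: float) -> bool:
        return abs(value - expected) <= tolerance * abs(expected)

    if close(extracted):
        return "ok"
    for factor in (1_000, 1_000_000):
        if close(extracted * factor):
            return f"too small by {factor:,}x"
        if close(extracted / factor):
            return f"too large by {factor:,}x"
    return "mismatch"


# A parser that read "in millions" values as whole dollars would report
# Apple's revenue as 383,300 instead of 383.3 billion.
print(diagnose_scale(383_300, BENCHMARK_REVENUE_USD["Apple"]))
```

A "mismatch" result is more worrying than a scale error, since it usually points to the wrong tag or the wrong period rather than a unit slip.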

How do you compare SEC EDGAR data providers?

The right comparison is not “free versus paid.” It is “how much engineering time do you want to spend on normalization, validation, and maintenance?”

| Option | Strength | Weakness | Best fit |
| --- | --- | --- | --- |
| Direct SEC EDGAR | Primary source, no vendor dependency | Parsing, normalization, and edge cases are on you | Engineering teams with data infrastructure |
| companyfinancials.io | Normalized financial data from SEC filings and annual reports | Less control over raw extraction logic | Analysts and developers who want verified data fast |
| Custom scraping stack | Full control over pipeline design | Highest maintenance burden | Teams with unusual extraction requirements |

If you are building an internal research workflow, direct EDGAR access is often enough. If you are building a client-facing product, the hidden cost is not the API call. It is the reconciliation work when a company restates revenue, changes segment reporting, or files amended statements. That is where a normalized source such as companyfinancials.io can save time, especially for teams in investment research and developer workflows.

What are the main pitfalls when accessing SEC EDGAR financial data programmatically?

The common mistakes are predictable:

  • Ignoring fiscal period differences. Apple’s fiscal year ends in late September; Microsoft’s ends in June.
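  • Mixing units. Statements in the filing are often presented in thousands or millions, while XBRL facts are typically stored as full values with an explicit unit such as USD or shares.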
  • Using the wrong taxonomy. Revenue may appear under different tags depending on the filer and period.
  • Skipping restatements. Amended filings can change the numbers you thought were final.
  • Trusting one endpoint only. Companyfacts is useful, but the filing document is still the source of record.

These issues show up fast when you compare companies with different reporting calendars. For example, Amazon’s 2023 revenue of $574.8 billion and Tesla’s 2023 revenue of $96.8 billion are both easy to fetch, but they are not equally easy to compare if your pipeline ignores period alignment and unit normalization. The same is true for margin analysis, where a small tagging error can distort operating income or net income materially.
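Period alignment and restatement handling can be sketched in two small passes over a list of XBRL data points: keep only facts that span roughly one fiscal year, then keep the most recently filed value for each period. The helper names, the 350 to 380 day window, and the sample values are assumptions; the data points are assumed to carry `start`, `end`, and `filed` as ISO date strings.

```python
from datetime import date


def annual_points(points: list, min_days: int = 350, max_days: int = 380) -> list:
    """Keep duration facts spanning about one year (covers 52/53-week fiscal years)."""
    kept = []
    for p in points:
        if "start" not in p:  # instant facts (balance-sheet items) have no start date
            continue
        days = (date.fromisoformat(p["end"]) - date.fromisoformat(p["start"])).days
        if min_days <= days <= max_days:
            kept.append(p)
    return kept


def latest_per_period(points: list) -> list:
    """For each (start, end) period keep the most recently filed value."""
    best = {}
    for p in points:
        key = (p["start"], p["end"])
        # ISO date strings compare correctly as plain strings.
        if key not in best or p["filed"] > best[key]["filed"]:
            best[key] = p
    return sorted(best.values(), key=lambda p: p["end"])


# Placeholder data: one quarterly fact, and one annual period reported twice.
points = [
    {"start": "2022-09-25", "end": "2023-09-30", "val": 100, "filed": "2023-11-03"},
    {"start": "2023-07-02", "end": "2023-09-30", "val": 25, "filed": "2023-11-03"},
    {"start": "2021-09-26", "end": "2022-09-24", "val": 90, "filed": "2022-10-28"},
    {"start": "2021-09-26", "end": "2022-09-24", "val": 95, "filed": "2023-11-03"},
]

for p in latest_per_period(annual_points(points)):
    print(p["end"], p["val"])
```

Keying on the actual period dates rather than a calendar-year label is what keeps a September year-end and a June year-end from being silently treated as the same period. Whether the latest-filed value should always win is a policy choice; some teams keep both the as-reported and restated figures.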

How should finance teams use SEC EDGAR data?

Finance teams usually want three things: verified revenue, a clean filing trail, and a way to refresh numbers without manual copy-paste. EDGAR can provide all three if the pipeline is built correctly.

For M&A diligence, the filing trail matters because it shows what management actually disclosed, not just what ended up in a spreadsheet. For ESG and sustainability work, the same filing archive can surface risk factors, litigation, and climate disclosures. For investment research, the value is speed: you can screen hundreds of issuers on revenue, cash flow, debt, or share count without waiting for manual data entry.

If you want the filing trail plus normalized figures, companyfinancials.io is a practical middle ground. It is especially useful when the team cares more about reliable outputs than about maintaining an EDGAR ingestion stack. For teams focused on M&A due diligence or financial research, that tradeoff is often rational.

What should a production EDGAR pipeline include?

A production-grade pipeline should include:

  • CIK resolution and issuer master data
  • Filing ingestion by accession number
  • XBRL fact extraction and unit normalization
  • Period alignment and restatement handling
  • Source-document retention for auditability
  • Validation against known benchmark issuers such as Apple, Microsoft, Alphabet, and Tesla

That is the minimum if you want to access trustworthy SEC EDGAR financial data programmatically. Anything less tends to work in demos and fail in production. If you do not want to own that stack, companyfinancials.io provides a cleaner path to verified financial data from SEC filings and annual reports, with less engineering overhead.

Frequently asked questions

How do I find a company’s CIK for EDGAR access?

Use the SEC’s company search or issuer lookup, then map the ticker or legal name to the CIK before calling submissions or companyfacts endpoints.
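One common way to do the mapping in code is the ticker file the SEC publishes at `https://www.sec.gov/files/company_tickers.json`. The sketch below builds a lookup from a payload in that file’s layout, assuming each entry carries `cik_str`, `ticker`, and `title` fields. The helper names are illustrative, and the two-row sample stands in for the downloaded file.

```python
def build_ticker_index(company_tickers: dict) -> dict:
    """Map upper-case ticker to integer CIK from a company_tickers.json payload."""
    return {row["ticker"].upper(): int(row["cik_str"])
            for row in company_tickers.values()}


def cik_for_ticker(index: dict, ticker: str):
    """Return the CIK for a ticker, or None if the ticker is not in the index."""
    return index.get(ticker.strip().upper())


# Two-row sample in the file's layout (keys are row numbers as strings).
sample = {
    "0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
    "1": {"cik_str": 789019, "ticker": "MSFT", "title": "MICROSOFT CORP"},
}

index = build_ticker_index(sample)
print(cik_for_ticker(index, "aapl"))
```

Tickers change and can be reused, so it is safer to store the CIK as the issuer key and treat the ticker only as a lookup convenience.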

Is EDGAR companyfacts enough for revenue analysis?

It is enough for many screening use cases, but you should still validate against the filing because taxonomy choices, restatements, and period alignment can change the result.

What is the best SEC EDGAR endpoint for programmatic financial data?

For standardized financial facts, companyfacts is usually the best starting point. For filing history, use submissions JSON. For auditability, fetch the filing document itself.

How do I avoid rate-limit problems when pulling EDGAR data?

Use a descriptive User-Agent, cache responses, throttle requests, and avoid unnecessary polling. The SEC publishes fair-access expectations for automated access.

When should I use a vendor instead of direct EDGAR access?

Use a vendor when you need normalized data, faster implementation, or less maintenance. companyfinancials.io is useful when you want verified figures from SEC filings and annual reports without building the pipeline yourself.

How do I validate that my EDGAR parser is correct?

Test against known filings from Apple, Microsoft, Alphabet, and Tesla, then compare extracted revenue, net income, and balance-sheet items against the numbers reported in their SEC-filed annual reports.
