Is Web Scraping Legal in 2026?

Updated May 2026

Web scraping of publicly accessible data is generally legal in the United States and the EU when done without bypassing technical access controls, though legality depends on what data you scrape, how you access it, and what you do with it. The CFAA, GDPR, copyright law, and terms of service each impose different requirements and limitations that scraping operations must navigate carefully.

The CFAA and Public Data

The Computer Fraud and Abuse Act (CFAA) is the primary US federal law relevant to web scraping. Its key prohibition is against accessing computers "without authorization" or "exceeding authorized access." The central legal question for scraping has been whether automated access to publicly available web data constitutes unauthorized access under the CFAA.

The Ninth Circuit's ruling in hiQ Labs v. LinkedIn (2022), issued on remand after the Supreme Court's decision in Van Buren v. United States (2021), remains the most important precedent. Reviewing a preliminary injunction, the court concluded that scraping publicly accessible data from a website likely does not violate the CFAA, because the statute was designed to prevent hacking into protected systems, not to restrict access to information that anyone with a web browser can view. On this reasoning, if data is available to the general public without any authentication barrier, accessing it automatically is not "without authorization" under the CFAA.

The Meta v. Bright Data decision in 2024 pointed in the same direction, although it was decided on contract grounds rather than under the CFAA. The Northern District of California granted summary judgment to Bright Data on Meta's breach of contract claim, finding that Meta's terms did not bar logged-out scraping of publicly available data on Facebook and Instagram. The outcome turned on whether Bright Data was acting as a logged-in user bound by those terms, illustrating that CFAA compliance does not eliminate all legal risk: contract claims remain a separate source of exposure.

Bypassing technical access controls changes the analysis significantly. If a website requires a login, uses CAPTCHAs, implements IP blocking, or otherwise restricts access, circumventing these measures may constitute unauthorized access under the CFAA. The distinction between publicly accessible data and access-controlled data is the critical line.
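One way to keep a scraper on the public side of that line is to treat any sign of access restriction as a signal to stop, rather than something to work around. The following is a minimal sketch of that idea. The status codes, the login path hints, and the names FetchResult and should_stop are illustrative assumptions, not a legal standard or part of any library.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Status codes treated here as "the site is restricting access".
# This set is an illustrative assumption, not a legal test.
ACCESS_CONTROL_STATUSES = {401, 403, 407, 429}

# Hypothetical path prefixes that suggest a redirect to a login wall.
LOGIN_PATH_HINTS = ("/login", "/signin", "/auth")


@dataclass
class FetchResult:
    url: str        # URL that was requested
    status: int     # HTTP status code of the final response
    final_url: str  # URL after any redirects


def should_stop(result: FetchResult) -> bool:
    """Return True when the response looks access-controlled.

    The scraper is expected to halt and escalate for review
    instead of retrying with different credentials, IPs, or headers.
    """
    if result.status in ACCESS_CONTROL_STATUSES:
        return True
    final_path = urlparse(result.final_url).path.lower()
    return any(final_path.startswith(hint) for hint in LOGIN_PATH_HINTS)
```

A crawler loop would call should_stop on each response and skip or pause the target when it returns True. Whether a particular signal amounts to an access control in the legal sense is a question for counsel, not for this heuristic.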

Does violating a website's terms of service make scraping illegal under the CFAA?

Under current Ninth Circuit precedent, generally no. hiQ indicates that scraping a public website in violation of its terms of service does not by itself create CFAA liability. However, terms of service violations can give rise to separate breach of contract claims: hiQ itself was ultimately found to have breached LinkedIn's user agreement, and the dispute in Meta v. Bright Data turned on whether the scraper was bound by Meta's terms at all. The distinction matters because the CFAA carries criminal penalties in addition to civil liability, while breach of contract is a purely civil matter with more limited remedies.

Can I scrape data that requires creating a free account to access?

This is a gray area. If the site offers free accounts to anyone and the data is accessible to all logged-in users, some courts have treated this as effectively public data. However, the account creation process typically involves agreeing to terms of service, which may prohibit scraping. Additionally, using an account solely for scraping purposes could be viewed as exceeding authorized access under some interpretations of the CFAA. Legal counsel is recommended for this scenario.

Data Protection: GDPR and CCPA

Data protection regulations add requirements when scraped data includes personal information. The GDPR, which applies to data subjects in the European Economic Area regardless of where the scraper operates, requires a lawful basis for processing personal data. Simply because personal data is publicly visible (a LinkedIn profile, a Twitter bio, a business directory listing) does not automatically provide a lawful basis for collecting and processing it at scale.

Legitimate interest is the most commonly claimed lawful basis for scraping personal data under GDPR. This requires demonstrating that your interest in the data is legitimate, that the processing is necessary for that interest, and that the data subject's rights do not override your interest. Market research, competitive analysis, and fraud prevention have been recognized as legitimate interests, but the specific circumstances of each case determine whether the claim holds.

The CCPA gives California residents the right to know what personal information is collected about them, to delete that information, and to opt out of its sale. Organizations that scrape personal data of California residents and meet the CCPA's threshold requirements (revenue, data volume, or business model) must comply with these provisions.
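Honoring access and deletion requests is only practical if scraped personal data is stored in a way that can be looked up by data subject. The sketch below shows one possible shape for that, using an in-memory store keyed by email address. The class names, the fields, and the choice of email as the lookup key are assumptions made for illustration; a real system would use a database and verify the requester's identity first.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ScrapedRecord:
    subject_email: str  # identifier used to match requests (assumption)
    source_url: str     # where the data was collected
    data: dict          # the personal data fields that were kept
    collected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class RecordStore:
    """Hypothetical in-memory store supporting access and deletion requests."""

    def __init__(self):
        self._records = []

    @staticmethod
    def _key(email: str) -> str:
        # Normalize so that lookups ignore case and stray whitespace.
        return email.strip().lower()

    def add(self, record: ScrapedRecord) -> None:
        self._records.append(record)

    def find(self, subject_email: str) -> list:
        """Right to know: return every record held about this person."""
        key = self._key(subject_email)
        return [r for r in self._records if self._key(r.subject_email) == key]

    def delete(self, subject_email: str) -> int:
        """Right to delete: remove matching records, return how many."""
        key = self._key(subject_email)
        before = len(self._records)
        self._records = [
            r for r in self._records if self._key(r.subject_email) != key
        ]
        return before - len(self._records)
```

Recording the source URL and collection time alongside each record also makes it easier to answer what was collected, from where, and when.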

Data minimization principles under both GDPR and CCPA require collecting only the personal data necessary for your stated purpose. Scraping entire profiles when you only need business email addresses violates minimization principles and increases legal exposure. Design your extraction schemas to capture only the specific personal data fields your use case requires.
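A simple way to enforce this in code is an allowlist applied before anything is stored, so fields outside the stated purpose never reach disk. This is a minimal sketch; the field names and the minimize helper are hypothetical and would be replaced by whatever your documented purpose actually requires.

```python
# Fields the (hypothetical) use case needs. Everything else is dropped.
ALLOWED_FIELDS = frozenset({"company", "job_title", "business_email"})


def minimize(record: dict) -> dict:
    """Keep only allowlisted fields from a scraped record."""
    return {key: value for key, value in record.items() if key in ALLOWED_FIELDS}


scraped_profile = {
    "name": "Example Person",
    "company": "Example Corp",
    "job_title": "Head of Procurement",
    "business_email": "procurement@example.com",
    "home_city": "Springfield",
    "profile_photo_url": "https://example.com/photo.jpg",
}

stored = minimize(scraped_profile)
```

An allowlist is preferable to a blocklist here: when the source site adds a new field, it is excluded by default instead of being collected silently.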

Can I scrape and store publicly visible email addresses?

Technically possible, but legally nuanced. In the US, collecting publicly visible email addresses is generally not prohibited by the CFAA. However, using them for unsolicited commercial email may violate the CAN-SPAM Act. Under GDPR, collecting email addresses constitutes processing personal data and requires a lawful basis. Under CCPA, individuals can request deletion of their collected data. The legality depends on your jurisdiction, the source of the data, and your intended use.

Copyright Considerations

Copyright law protects original creative expression but does not protect facts. This distinction is critical for web scraping. Product prices, business hours, stock availability, weather data, and sports scores are facts that cannot be copyrighted. Articles, blog posts, product descriptions with creative elements, photographs, and videos are copyrightable creative works.

Extracting factual data from websites generally does not raise copyright concerns. A scraper that extracts product names, prices, and specifications from e-commerce sites is dealing in facts, not creative expression. However, scraping and republishing entire articles, product reviews, or creative descriptions may constitute copyright infringement.

The database rights concept, recognized in the EU but not the US, protects the investment in compiling databases even when the individual data points are factual. A scraped compilation of factual data from an EU-based database may infringe the database maker's sui generis rights if it represents a substantial part of the database.

Robots.txt and Technical Standards

The robots.txt standard is a voluntary convention, not a legal requirement. Websites publish robots.txt files to indicate which parts of the site automated crawlers should not access. Compliance with robots.txt is not legally mandated, but courts have considered it as evidence of good faith in scraping disputes.

Ignoring robots.txt does not automatically make scraping illegal, but it does weaken your legal position if a dispute arises. Respecting robots.txt directives as a baseline practice demonstrates that your scraping operation makes a good-faith effort to comply with site operators' wishes. When you have a legitimate reason to access content that robots.txt restricts (such as security research or academic study), document that reason.
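The baseline practice described above can be sketched with Python's standard urllib.robotparser module. The example parses a sample robots.txt from a string so it runs offline; the user agent name, the sample rules, and the check helper with its override_reason parameter are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-research-bot"  # hypothetical crawler name

# Sample rules for illustration; a live crawler would fetch the site's file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def check(parser: RobotFileParser, url: str, override_reason=None) -> dict:
    """Decide whether to fetch a URL and record why.

    A disallowed URL is fetched only when a documented reason is given,
    and that reason is kept in the returned decision record.
    """
    allowed = parser.can_fetch(USER_AGENT, url)
    return {
        "url": url,
        "robots_allowed": allowed,
        "proceed": allowed or override_reason is not None,
        "override_reason": None if allowed else override_reason,
    }


rules = load_rules(SAMPLE_ROBOTS_TXT)
public_decision = check(rules, "https://example.com/products")
blocked_decision = check(rules, "https://example.com/private/data")
```

Against a live site, the parser's set_url and read methods load the file over the network instead of parse, and crawl_delay returns any Crawl-delay value declared for the user agent. Writing each decision record to a log gives you the documentation trail if the crawl is later questioned.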

Emerging Legal Landscape

Several ongoing legal developments may reshape the legal framework for web scraping in the coming years. Reddit's lawsuit against Perplexity AI and data collection services invokes DMCA Section 1201, alleging that circumventing rate limits and anti-bot systems constitutes circumvention of technological protection measures. If this argument succeeds, it could significantly expand the legal tools available to website operators against scrapers.

The broader wave of AI training data litigation, involving news publishers, authors, visual artists, and content platforms, may produce rulings that affect web scraping for AI purposes specifically. While scraping for data extraction and scraping for model training raise different legal questions, court decisions in these cases may establish precedents that apply to both activities.

State-level legislation is also evolving. Several US states have proposed or enacted laws addressing automated data collection, bot activity, and digital privacy that may impose additional requirements on scraping operations. Staying current with these developments is important for organizations that operate at scale.

Key Takeaway

Web scraping of public data is generally treated as lawful in the US following hiQ v. LinkedIn, though that ruling binds only the Ninth Circuit. Legality in any given case depends on four things: what you scrape (facts vs. creative works), how you access it (public vs. access-controlled), what you do with it (especially personal data under GDPR/CCPA), and whether you bypass technical protections. Consult legal counsel for your specific use case.
