Data Scraping Engineer

We are hiring a Data Scraping Engineer to help expand the world’s largest database of basketball stats by gathering structured data from publicly available sources. This role involves developing and maintaining scraping pipelines, automating data extraction workflows, and ensuring that collected data is accurate, clean, and formatted for seamless integration into Cerebro Sports’ database.

Key Responsibilities

  • Design, build, and maintain scalable web scraping pipelines.
  • Automate data extraction from public websites, APIs, and unstructured sources.
  • Implement mechanisms to handle dynamic content, pagination, and authentication challenges.
  • Ensure data integrity by implementing validation, deduplication, and cleaning processes.
  • Convert raw scraped data into structured formats (CSV, JSON, SQL) for ingestion.
  • Optimize data pipelines for speed, efficiency, and minimal server load.
  • Develop and manage scraping infrastructure using Python, Selenium, and WebDriver.
  • Implement proxy management and CAPTCHA bypass techniques to avoid blocking.
  • Monitor and maintain scraping scripts to ensure uptime and scalability.
  • Work closely with engineering and analytics teams to identify compliance impacts.

Qualifications & Skills

  • Strong proficiency in Python for web scraping and automation.
  • Experience with Selenium, WebDriver, and/or BeautifulSoup for extracting structured and dynamic content.
  • Familiarity with APIs, JSON, XML, and web crawling best practices.
  • Knowledge of SQL and database integration for structured data storage.
  • Experience with ETL pipelines and transforming raw data into structured formats.
  • Familiarity with AWS, GCP, or Azure for managing scraping workflows.
  • Understanding of proxy management, headless browsers, and request optimization.
  • Experience working with basketball data and analytics is a plus.