We are seeking an exceptionally skilled Senior Data Engineer with extensive experience in ETL development and bulk data extraction through web scraping and crawling. The ideal candidate will have a solid background in building, optimizing, and maintaining ETL pipelines, with strong expertise in data management and cloud technologies. This is a challenging role for a detail-oriented professional who is passionate about delivering high-quality data solutions.
You will be responsible for managing large-scale data projects, ensuring best practices in security, scalability, and performance. If you have a proven track record of delivering comprehensive data engineering solutions across diverse industries, we would love to hear from you.
Roles & Responsibilities
As a Senior Data Engineer, you will:
1. ETL Pipeline Management:
- Develop, optimize, and troubleshoot ETL pipelines across production, QA, and UAT environments.
- Handle large datasets and manage server infrastructure effectively.
2. Web Scraping and Crawling:
- Implement advanced web scraping techniques to extract data rapidly and efficiently from hundreds of large-scale websites.
3. Performance Optimization:
- Apply techniques such as parallelism, concurrency, and threading to improve data extraction and processing efficiency (see the scraping sketch after this list).
4. Library and Framework Expertise:
- Utilize relevant tools such as requests, XPath, Selenium's ActionChains, urllib, NumPy, pandas, and lxml.
- Work with frameworks like Selenium and Scrapy to build scalable scraping solutions.
5. Database Management:
- Work with both relational databases (PostgreSQL, MySQL) and NoSQL databases.
- Optimize database performance, including schemas, triggers, and queries.
6. Cloud and API Integration:
- Implement cloud best practices for security, scalability, and performance using AWS, GCP, or Azure.
- Utilize AWS services such as Lambda, EC2, and S3 alongside external APIs.
7. Data Engineering:
- Participate in data warehousing, architecture, and ETL pipeline development at an enterprise scale.
- Create and optimize SQL procedures and data pipelines for seamless integration.
8. Collaboration and Project Delivery:
- Work cross-functionally with teams to deliver end-to-end data solutions.
- Ensure source data quality, develop mappings and transformations, and validate jobs across various environments.
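To illustrate the kind of work items 2 and 3 describe, here is a minimal sketch of concurrent page fetching with XPath parsing. It is an illustration only: the URLs, the selector, and the extracted field are hypothetical, and a production crawler would add retries, rate limiting, and politeness controls.

```python
import concurrent.futures

import requests
from lxml import html

# Hypothetical seed URLs; a real crawler would load these from a
# crawl frontier or seed file rather than hard-coding them.
URLS = [f"https://example.com/products?page={n}" for n in range(1, 51)]

def fetch_titles(url: str) -> list[str]:
    """Fetch one page and pull product titles out with an XPath query."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    tree = html.fromstring(response.content)
    # Illustrative selector; real selectors depend on the target site.
    return tree.xpath("//h2[@class='product-title']/text()")

def main() -> None:
    # A thread pool parallelizes the I/O-bound HTTP requests.
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
        for titles in pool.map(fetch_titles, URLS):
            for title in titles:
                print(title.strip())

if __name__ == "__main__":
    main()
```

A thread pool fits here because fetching is I/O-bound; at larger scale, or for JavaScript-heavy sites, the same job would typically move to Scrapy's asynchronous crawling or Selenium-driven browsers, as noted in item 4.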
Skills & Requirements
To be successful in this role, you should possess:
- ETL Development Expertise: Proficiency in ETL pipeline development and optimization for large datasets.
- Bulk Data Extraction: Experience with web scraping and crawling technologies.
- Parallel Processing: Strong understanding of parallelism, concurrency, and threading.
- Technical Stack: Expertise in Python-based libraries and frameworks such as NumPy, pandas, Selenium, and Scrapy.
- Database Management: Proficient in PostgreSQL, MySQL, and NoSQL databases.
- Cloud Technologies: Experience with AWS, GCP, or Azure, including hands-on knowledge of AWS Lambda, EC2, and S3 (see the pipeline sketch after this list).
- REST APIs: Experience developing and integrating REST APIs.
- SQL Proficiency: Skilled in SQL procedures, query optimization, and pipeline creation.
- Version Control: Familiarity with code versioning tools like Git and SVN.
- SDLC Knowledge: Solid understanding of the Software Development Life Cycle (SDLC), testing methodologies, and CI/CD processes.
- Problem-Solving Skills: Exceptional analytical and problem-solving abilities with a strong focus on delivering scalable data solutions.
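As a small example of the extract-transform-load pattern these requirements point at, the sketch below pulls a CSV from S3 with boto3, cleans it with pandas, and writes it back as Parquet. The bucket and object keys are hypothetical, and writing Parquet assumes pyarrow (or fastparquet) is installed; a real pipeline would take these values from configuration and add logging and error handling.

```python
import io

import boto3
import pandas as pd

# Hypothetical bucket and keys; a real job would read these from config.
BUCKET = "example-data-lake"
SOURCE_KEY = "raw/orders.csv"
TARGET_KEY = "clean/orders.parquet"

def transform_and_load() -> None:
    s3 = boto3.client("s3")

    # Extract: stream the raw CSV from S3 into a DataFrame.
    raw = s3.get_object(Bucket=BUCKET, Key=SOURCE_KEY)
    df = pd.read_csv(raw["Body"])

    # Transform: deduplicate rows and normalize column names.
    df = df.drop_duplicates()
    df.columns = [c.strip().lower() for c in df.columns]

    # Load: write the cleaned data back to S3 as Parquet
    # (DataFrame.to_parquet requires pyarrow or fastparquet).
    buffer = io.BytesIO()
    df.to_parquet(buffer, index=False)
    s3.put_object(Bucket=BUCKET, Key=TARGET_KEY, Body=buffer.getvalue())

if __name__ == "__main__":
    transform_and_load()
```

Packaged as a single function like this, the same step could run on a schedule inside AWS Lambda or on an EC2 worker, which is one deployment pattern consistent with the Lambda/EC2/S3 requirement above.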
Experience
- A minimum of 4-5 years of experience in ETL development, including bulk data extraction through web scraping and crawling technologies.
- Proven track record of delivering large-scale data engineering solutions.
- Prior experience with cloud environments like AWS, GCP, or Azure is highly preferred.
Why Join Us?
- Opportunity to work on high-impact, large-scale data projects.
- Collaborative, growth-oriented environment where innovation is encouraged.
- Competitive compensation and benefits package.
Apply Today
Email: careers@webdataguru.com