We are seeking an exceptionally skilled Senior Data Engineer with extensive experience in ETL development and bulk data extraction through web scraping and crawling. The ideal candidate will have a solid background in building, optimizing, and maintaining ETL pipelines, with strong expertise in data management and cloud technologies. This is a challenging role for a detail-oriented professional who is passionate about delivering high-quality data solutions.
You will be responsible for managing large-scale data projects, ensuring best practices in security, scalability, and performance. If you have a proven track record of delivering comprehensive data engineering solutions across diverse industries, we would love to hear from you.
Roles & Responsibilities
As a Senior Data Engineer, you will:
1. ETL Pipeline Management:
- Develop, optimize, and troubleshoot ETL pipelines across production, QA, and UAT environments.
- Handle large datasets and manage server infrastructure effectively.
2. Web Scraping and Crawling:
- Implement advanced web scraping techniques to extract data rapidly and efficiently from hundreds of large-scale websites.
3. Performance Optimization:
- Apply techniques such as parallelism, concurrency, and threading to improve data extraction and processing efficiency (see the scraping sketch after this list).
4. Library and Framework Expertise:
- Utilize relevant tools such as requests, XPath, Selenium's ActionChains, urllib, NumPy, pandas, and lxml.
- Work with frameworks like Selenium and Scrapy to build scalable scraping solutions.
5. Database Management:
- Work with both relational databases (PostgreSQL, MySQL) and NoSQL databases.
- Optimize database performance, including schemas, triggers, and queries.
6. Cloud and API Integration:
- Implement cloud best practices for security, scalability, and performance using AWS, GCP, or Azure.
- Utilize AWS services such as Lambda, EC2, and S3 alongside external APIs.
7. Data Engineering:
- Participate in data warehousing, architecture, and ETL pipeline development at an enterprise scale.
- Create and optimize SQL procedures and data pipelines for seamless integration.
8. Collaboration and Project Delivery:
- Work cross-functionally with teams to deliver end-to-end data solutions.
- Ensure source data quality, develop mappings and transformations, and validate jobs across various environments.
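To illustrate the kind of work items 2 and 3 describe, here is a minimal sketch of concurrent page fetching with XPath parsing. It is an illustration only: the URLs, the selector, and the extracted field are hypothetical, and a production crawler would add retries, rate limiting, and politeness controls.

```python
import concurrent.futures

import requests
from lxml import html

# Hypothetical seed URLs; a real crawler would load these from a
# crawl frontier or seed file rather than hard-coding them.
URLS = [f"https://example.com/products?page={n}" for n in range(1, 51)]

def fetch_titles(url: str) -> list[str]:
    """Fetch one page and pull product titles out with an XPath query."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    tree = html.fromstring(response.content)
    # Illustrative selector; real selectors depend on the target site.
    return tree.xpath("//h2[@class='product-title']/text()")

def main() -> None:
    # A thread pool parallelizes the I/O-bound HTTP requests.
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
        for titles in pool.map(fetch_titles, URLS):
            for title in titles:
                print(title.strip())

if __name__ == "__main__":
    main()
```

A thread pool fits here because fetching is I/O-bound; at larger scale, or for JavaScript-heavy sites, the same job would typically move to Scrapy's asynchronous crawling or Selenium-driven browsers, as noted in item 4.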
Skills & Requirements
To be successful in this role, you should possess:
- ETL Development Expertise: Proficiency in ETL pipeline development and optimization for large datasets.
- Bulk Data Extraction: Experience with web scraping and crawling technologies.
- Parallel Processing: Strong understanding of parallelism, concurrency, and threading.
- Technical Stack: Expertise in Python-based libraries and frameworks such as NumPy, pandas, Selenium, and Scrapy.
- Database Management: Proficient in PostgreSQL, MySQL, and NoSQL databases.
- Cloud Technologies: Experience with AWS, GCP, or Azure, including hands-on knowledge of AWS Lambda, EC2, and S3 (see the pipeline sketch after this list).
- REST APIs: Experience developing and integrating REST APIs.
- SQL Proficiency: Skilled in SQL procedures, query optimization, and pipeline creation.
- Version Control: Familiarity with code versioning tools like Git and SVN.
- SDLC Knowledge: Solid understanding of the Software Development Life Cycle (SDLC), testing methodologies, and CI/CD processes.
- Problem-Solving Skills: Exceptional analytical and problem-solving abilities with a strong focus on delivering scalable data solutions.
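As a small example of the extract-transform-load pattern these requirements point at, the sketch below pulls a CSV from S3 with boto3, cleans it with pandas, and writes it back as Parquet. The bucket and object keys are hypothetical, and writing Parquet assumes pyarrow (or fastparquet) is installed; a real pipeline would take these values from configuration and add logging and error handling.

```python
import io

import boto3
import pandas as pd

# Hypothetical bucket and keys; a real job would read these from config.
BUCKET = "example-data-lake"
SOURCE_KEY = "raw/orders.csv"
TARGET_KEY = "clean/orders.parquet"

def transform_and_load() -> None:
    s3 = boto3.client("s3")

    # Extract: stream the raw CSV from S3 into a DataFrame.
    raw = s3.get_object(Bucket=BUCKET, Key=SOURCE_KEY)
    df = pd.read_csv(raw["Body"])

    # Transform: deduplicate rows and normalize column names.
    df = df.drop_duplicates()
    df.columns = [c.strip().lower() for c in df.columns]

    # Load: write the cleaned data back to S3 as Parquet
    # (DataFrame.to_parquet requires pyarrow or fastparquet).
    buffer = io.BytesIO()
    df.to_parquet(buffer, index=False)
    s3.put_object(Bucket=BUCKET, Key=TARGET_KEY, Body=buffer.getvalue())

if __name__ == "__main__":
    transform_and_load()
```

Packaged as a single function like this, the same step could run on a schedule inside AWS Lambda or on an EC2 worker, which is one deployment pattern consistent with the Lambda/EC2/S3 requirement above.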
Experience
- A minimum of 4-5 years of experience in ETL development, including bulk data extraction through web scraping and crawling technologies.
- Proven track record of delivering large-scale data engineering solutions.
- Prior experience with cloud environments like AWS, GCP, or Azure is highly preferred.
Why Join Us?
- Opportunity to work on high-impact, large-scale data projects.
- Collaborative, growth-oriented environment where innovation is encouraged.
- Competitive compensation and benefits package.
Apply Today
Email: careers@webdataguru.com