
Data Engineer II, ROW AOP at Amazon (China) Holding Company Limited

Required Skills

Python, SQL, AWS, ETL, CI/CD, Machine Learning, Data Pipelines, Data Warehousing

About the Role

The Data Engineer II role involves designing and maintaining scalable data pipelines to support ML model development and production deployment. You will collaborate with data scientists and product managers to implement efficient data processing solutions and ensure proper data governance.

Key Responsibilities

  • Design, develop, and maintain scalable data pipelines to support ML model development and production deployment.
  • Implement and maintain CI/CD pipelines for data and ML solutions.
  • Collaborate with data scientists and other team members to understand data requirements and implement efficient data processing solutions.
  • Create and manage data warehouses and data lakes, ensuring proper data governance and security measures are in place.
  • Stay current with emerging technologies and best practices in data engineering, and propose innovative solutions to improve data infrastructure and processes for ML models and analytics applications.
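To make the pipeline work above concrete, here is a toy sketch of the extract-transform-load pattern in Python and SQL, two of the listed skills. This is illustrative only and not part of the posting: the record fields, table name, and sample data are invented, and a production pipeline would run on managed cloud services rather than an in-memory database.

```python
import sqlite3

# Hypothetical raw records as they might arrive from an upstream source.
raw_events = [
    {"user_id": "u1", "amount": "19.99", "country": "CN"},
    {"user_id": "u2", "amount": "bad", "country": "US"},  # malformed row
    {"user_id": "u1", "amount": "5.00", "country": "CN"},
]

def extract(records):
    # In practice this would read from S3, a queue, or an API.
    yield from records

def transform(records):
    # Validate and normalize; drop rows that fail type checks.
    for r in records:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue
        yield (r["user_id"], amount, r["country"])

def load(rows, conn):
    # Land cleaned rows in a warehouse table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases "
        "(user_id TEXT, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(raw_events)), conn)
total = conn.execute("SELECT SUM(amount) FROM purchases").fetchone()[0]
```

The same three-stage shape scales up directly: swap the in-memory store for a warehouse, the list for a stream, and add orchestration and monitoring around it.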

Required Skills & Qualifications

Must Have:

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
  • 3+ years of experience in data engineering or related roles.
  • Strong programming skills in languages such as Python, Java, or Scala.
  • Expertise in SQL and experience with both relational and NoSQL databases.
  • Familiarity with cloud platforms (e.g., AWS) and their services.
  • Knowledge of data modeling, data warehousing, and ETL design patterns.
  • Experience with version control systems (e.g., Git) and CI/CD pipelines.
  • Strong problem-solving skills and attention to detail.
  • Excellent communication skills and ability to work in a collaborative team environment.

Nice to Have:

  • Experience working in a scientific or research-oriented environment.
  • Familiarity with machine learning workflows and model deployment.
  • Experience with Infrastructure as Code (IaC) using tools such as AWS CDK.
  • Experience with streaming data processing and real-time analytics.
  • Experience with big data technologies (e.g., Hadoop, Spark, Hive).