Zyte

Machine Learning Engineer - Web Data Quality

Posted 23 Days Ago
In-Office or Remote
Hiring Remotely in Rio de Janeiro, Rio, Rio de Janeiro
Mid level
You will design and implement AI systems to assess and improve the quality of web datasets, collaborating with data, product, and engineering teams.
The summary above was generated by AI

At Zyte, we make the world’s web data accessible to everyone. Our technology powers data extraction at scale, helping businesses and researchers unlock the full potential of the web.

We’re a remote-first, multicultural team of engineers, data scientists, and innovators who believe in curiosity, collaboration, and continuous learning. If you’re passionate about building reliable AI systems and improving the quality of web data, we’d love to hear from you.

About the Role

As a Machine Learning Engineer (Web Data Quality), you’ll design and implement intelligent systems that automatically detect, measure, and improve the quality of large-scale web datasets. You’ll work at the intersection of data science, AI, and distributed systems, collaborating closely with product, engineering, and data teams to make data accuracy measurable, scalable, and actionable.


Requirements

What You’ll Do
  • Develop and deploy ML models for anomaly detection, schema drift, and content validation
  • Build and improve data quality pipelines leveraging modern data and MLOps tools
  • Design and optimize embeddings and GenAI models to enhance data consistency
  • Collaborate with engineers to integrate AI systems into production workflows
  • Conduct experiments, evaluate performance, and iterate for continuous improvement
  • Stay up to date on AI/ML and GenAI research to guide innovation within Zyte
Required
  • 3+ years of experience in Machine Learning / Data Science / AI Engineering
  • Strong Python skills and experience with ML frameworks (PyTorch, TensorFlow, scikit-learn)
  • Experience with data validation, anomaly detection, or data quality systems
  • Familiarity with data pipelines (Airflow, Spark, or similar)
  • Understanding of model evaluation, metrics, and deployment best practices
  • Excellent problem-solving, communication, and collaboration skills
Preferred
  • Experience with LangChain, LlamaIndex, or GenAI model orchestration
  • Familiarity with data labeling tools and active learning approaches
  • Contributions to open-source or public ML projects
  • Experience working in a remote, cross-functional team environment

Benefits
  • 35 days of paid time off
  • Health & wellness support
  • Inclusive and supportive team environment
  • Opportunities to attend conferences and meet team members from across the globe
  • Work with cutting-edge open source technologies and tools

Top Skills

Airflow
Python
PyTorch
Scikit-Learn
Spark
TensorFlow


