AI Model Evaluator (LLM & Agent Systems)

Work from home Full-time role Hiring

Job Title: AI Model Evaluator (LLM & Agent Systems)
Job Type: Contract (Minimum 2 weeks, with potential extension)
Location: Remote

Job Summary:

Join our customer's team as an AI Model Evaluator (LLM & Agent Systems) and play a pivotal role in shaping the future of generative AI and autonomous agents. You'll help benchmark, analyze, and assess cutting-edge AI systems in real-world scenarios, providing structured insights that drive improvements. This position is ideal for analytical professionals passionate about AI quality and real-world impact.

Key Responsibilities:

Evaluate outputs from large language models (LLMs) and autonomous agent systems against defined guidelines and rubrics
Review multi-step agent actions, including screenshots and reasoning traces, to determine accuracy and quality
Consistently apply evaluation standards, flagging edge cases and identifying recurring patterns or failure modes
Provide detailed, structured feedback to inform benchmarking, product evolution, and model refinement
Participate in calibration and alignment sessions to ensure consistent application of evaluation criteria
Work collaboratively to adapt to evolving scenarios and ambiguous evaluation situations
Document findings and communicate insights clearly both in writing and verbally to relevant stakeholders

Required Skills and Qualifications:

Demonstrated experience with LLM evaluation, AI output analysis, QA/testing, UX research, or similar analytical roles
Strong background in AI model evaluation, benchmarking, and applying rubric-based scoring frameworks
Exceptional attention to detail and sound judgement in ambiguous or edge-case scenarios
Proficiency in English (B2+ or equivalent) with excellent written and verbal communication skills
Ability to adapt quickly to evolving guidelines and work independently
Comfort with remote work and a commitment of at least 20 hours per week for the initial term
Analytical mindset with a focus on actionable, qualitative feedback

Preferred Qualifications:

Experience with RLHF, annotation workflows, or AI benchmarking frameworks
Familiarity with autonomous agent systems or workflow automation tools
Background in mobile apps or digital product evaluation processes

Required Skills

LLMs
Generative AI
AI Model Evaluation
AI Benchmarking
AI Quality Assessment
Model Performance Evaluation
Prompt Response Evaluation
AI Output Analysis
Rubric-Based Scoring
