Data Scientist


Contract Type: Hourly (Remote)
Rate: $100–$120/hr

AI Task Evaluation & Statistical Analysis Specialist

Role Overview

We are looking for a highly analytical Data Scientist with expertise in statistical analysis, AI evaluation, and failure pattern detection to assess the performance of AI agents on finance-related tasks. This role focuses on uncovering performance breakdowns, identifying root causes, and improving the overall evaluation framework across multiple task dimensions.

Key Responsibilities

  • Statistical Failure Analysis: Detect trends and recurring issues in AI agent failure modes across prompts, rubrics, templates, tags, and file types.
  • Root Cause Identification: Determine whether errors originate from task design, rubric clarity, data or file complexity, or model-specific limitations.
  • Multi-Dimensional Analysis: Evaluate performance variations across finance sub-domains, task categories, and diverse file formats.
  • Reporting & Visualization: Develop insightful dashboards and reports that surface failure clusters, edge cases, and optimization opportunities.
  • Quality Framework Improvements: Recommend data-driven enhancements to task structures, evaluation criteria, and rubric design.
  • Stakeholder Collaboration: Communicate findings clearly to data labeling teams, product stakeholders, and technical partners.

Required Qualifications

  • Statistical Expertise: Strong foundation in statistical modeling, hypothesis testing, and trend recognition.
  • Programming Skills: Proficiency in Python (pandas, scipy, matplotlib/seaborn) or R for deep-dive analysis.
  • Data Analysis Experience: Skilled in exploratory data analysis (EDA) and generating actionable insights from complex datasets.
  • AI/ML Knowledge: Familiarity with LLM evaluation methodologies, metrics, and quality frameworks.
  • Tools: Comfortable using Excel, SQL, and visualization platforms such as Tableau or Looker.

Preferred Qualifications

  • Hands-on experience with AI/ML model evaluation, quality assurance, or benchmarking frameworks
  • Background in finance or willingness to learn key financial concepts
  • Experience conducting multi-dimensional failure analysis
  • Knowledge of industry evaluation datasets
  • 2–4 years of relevant professional experience

If you’re interested in this role, find more information on the official website and apply now!
