Contract Type: Hourly (Remote)
Rate: $100–$120/hr
AI Task Evaluation & Statistical Analysis Specialist
Role Overview
We are looking for a highly analytical Data Scientist with expertise in statistical analysis, AI evaluation, and failure pattern detection to assess the performance of AI agents on finance-related tasks. This role focuses on uncovering performance breakdowns, identifying root causes, and improving the overall evaluation framework across multiple task dimensions.
Key Responsibilities
- Statistical Failure Analysis: Detect trends and recurring issues in AI agent failure modes across prompts, rubrics, templates, tags, and file types.
- Root Cause Identification: Determine whether errors originate from task design, rubric clarity, data or file complexity, or model-specific limitations.
- Multi-Dimensional Analysis: Evaluate performance variations across finance sub-domains, task categories, and diverse file formats.
- Reporting & Visualization: Develop insightful dashboards and reports that surface failure clusters, edge cases, and optimization opportunities.
- Quality Framework Improvements: Recommend data-driven enhancements to task structures, evaluation criteria, and rubric design.
- Stakeholder Collaboration: Communicate findings clearly to data labeling teams, product stakeholders, and technical partners.
Required Qualifications
- Statistical Expertise: Strong foundation in statistical modeling, hypothesis testing, and trend recognition.
- Programming Skills: Proficiency in Python (pandas, scipy, matplotlib/seaborn) or R for deep-dive analysis.
- Data Analysis Experience: Skilled in exploratory data analysis (EDA) and generating actionable insights from complex datasets.
- AI/ML Knowledge: Familiarity with LLM evaluation methodologies, metrics, and quality frameworks.
- Tools: Comfortable using Excel, SQL, and visualization platforms such as Tableau or Looker.
Preferred Qualifications
- Hands-on experience with AI/ML model evaluation, quality assurance, or benchmarking frameworks.
- Background in finance or willingness to learn key financial concepts.
- Experience conducting multi-dimensional failure analysis.
- Knowledge of industry evaluation datasets.
- 2–4 years of relevant professional experience.