Responsibilities:
• Design and execute test strategies for machine learning models (classification, regression, NLP, CV, etc.).
• Validate model accuracy, fairness, explainability, and robustness across varying datasets, including bias detection.
• Perform adversarial testing and edge-case validation for AI systems.
• Test autonomous agents, multi-step reasoning paths, and state transitions.
• Validate correctness, grounding, consistency, and safety of LLM outputs.
• Evaluate prompt robustness and behavioral variations across scenarios.
• Validate retrieval accuracy, grounding quality, and hallucination reduction.
• Test vector store behavior, document chunking logic, and retriever configurations.
• Validate fallback behaviors when tools or external services fail.
• Execute adversarial, prompt-injection, and red-team testing.
• Validate compliance with data privacy, business rules, and insurance guidelines.
• Benchmark LLM latency, throughput, and multi-agent performance.
• Validate concurrency handling, degradation logic, and retry mechanisms.
• Document test cases, behavior maps, prompt variations, and evaluation reports.
• Work closely with AI engineers and SMEs to refine agent workflows.
• Functional & Non-Functional Testing:
o Conduct API testing for AI services and model endpoints.
o Validate performance, scalability, and latency of AI inference services.
o Perform security testing for data pipelines and model-serving endpoints.
• Quality Governance & Compliance:
o Ensure compliance with AI ethics, data privacy, and regulatory standards.
o Establish KPIs for model quality (precision, recall, F1-score, drift detection).
o Document test strategies and results, and provide audit-ready evidence.
• Collaboration & Leadership:
o Work closely with data scientists, ML engineers, and DevOps teams to integrate testing into the AI lifecycle.
o Mentor junior testers and contribute to building reusable test accelerators.
o Represent the testing function in client discussions, providing insights on AI quality assurance.
• Develop automated test frameworks for AI/ML pipelines.
• Implement synthetic data generation for edge-case testing.
• Conduct bias and fairness testing across diverse datasets.
• Validate monitoring dashboards for model drift and performance degradation.
• Perform regression testing for retrained models.
• Create reusable test assets for AI projects (scripts, datasets, frameworks).
• Provide detailed defect analysis and drive defect closure within agreed SLAs.