Job Description
Company Overview:
JR Software Solutions is looking for an experienced and innovative Azure OpenAI and Generative AI Architect to lead the development of our AI capabilities. Join us in shaping cutting-edge AI infrastructure and collaborating with our team of researchers and engineers.
Responsibilities:
- AI Infrastructure Development: Lead initiatives to build large-scale distributed training clusters, deploy large language models (LLMs) on GPU instances, support AI research, and enhance decisioning systems in our public cloud infrastructure.
- MLOps and Azure Services Integration: Drive MLOps practices by leveraging Azure ML, Databricks ML, Cognitive Services, and other Azure-native tools to streamline AI development workflows, ensuring scalability, reliability, and efficiency.
- Collaboration and Implementation: Work closely with cloud and container infrastructure teams, alongside AI researchers, to design and implement advanced AI capabilities.
Project Examples:
- Deploy a thousand-node training cluster in the public cloud with optimized storage and networking.
- Design fault-tolerant infrastructure for large-scale training tasks using containers and checkpointing libraries.
- Develop runtime infrastructure for serving large ML models, such as LLMs and foundation models (FMs), in our public cloud.
- Create infrastructure for deploying search indexes and embeddings in vector databases, integrated with our existing capabilities.
Basic Qualifications:
- Bachelor's degree in Computer Science, Computer Engineering, or a related technical field.
- Minimum of 8 years of experience designing and building data-intensive solutions using distributed computing.
- At least 4 years of experience with high-performance computing (HPC), vector embeddings, or semantic search technologies.
- Minimum of 4 years of programming experience with Python, Go, Scala, or Java.
- At least 3 years of experience building, scaling, and optimizing training and inference systems for deep neural networks.
Preferred Qualifications:
- Master's or Doctoral degree in Computer Science, Computer Engineering, Electrical Engineering, Mathematics, or similar.
- Expertise in MLOps practices, leveraging tools like Azure ML, Databricks ML, and Cognitive Services for AI development.
- Proficiency in machine learning frameworks such as TensorFlow, PyTorch, Lightning, or MosaicML.
- Ability to navigate ambiguity, iterate rapidly with researchers and engineers, and prioritize effectively in a fast-paced, technology-driven environment.
- Experience deploying large neural network models in demanding production environments.
- Knowledge of building GPU clusters in the public cloud with tightly coupled storage and networking.