Job Description
We’re looking for an experienced Data Scientist to join our product development team in creating cutting-edge, AI-driven assessment tools. DDI is a global leadership consulting firm that helps organizations hire, promote, and develop exceptional leaders. We are seeking a highly skilled and motivated Data Scientist with expertise in Natural Language Processing and experience in data infrastructure (e.g., Azure and Databricks).
In this role, you will support the product development team, led by our principal scientist, in developing NLP algorithms and models for our new products. You will also support our developers in launching these products. You will build and maintain data, code, and model pipelines for responsible and reproducible machine learning operations. Finally, you will be expected to support and contribute to scientific publications.
This work requires technical skills as well as creativity and curiosity for working with behavioral data and learning about psychometrics. Experience in data infrastructure and the application of machine learning models (specifically NLP and LLMs) is a must. Come join our success story and help us create a truly innovative assessment experience!
Key Responsibilities:
- Data Pipeline Development: Design, develop, and maintain data pipelines for collecting, processing, and storing large volumes of structured and unstructured data.
- Python Coding: Utilize Python to write efficient and scalable code for data transformation, data integration, and ETL processes.
- Machine Learning: Knowledge and application of machine learning fundamentals, including supervised learning, neural networks, and deep learning architectures.
- Natural Language Processing: Application of Natural Language Processing (NLP) techniques, including tokenization, word embeddings, sequence-to-sequence models, and attention mechanisms.
- Large Language Models (LLMs): Work with cutting-edge LLMs (e.g., GPT, LLaMA, Claude, MPT), applying knowledge and skills in prompt engineering, quantization, and fine-tuning.
- Data Modeling and Architecture: Design and implement data models that align with business requirements and promote data consistency and quality. Familiarity with state-of-the-art deep learning architectures for NLP.
- GPU/TPU Programming: Knowledge and application of GPU/TPU programming (e.g., CUDA) to accelerate model training.
- Hyperparameter Tuning: Skill in optimizing hyperparameters and model architectures to achieve the best performance.
- Model Evaluation: Expertise in evaluation metrics for NLP tasks and the ability to design appropriate evaluation procedures.
- Software Engineering: Proficiency in software engineering for developing scalable and maintainable code, including version control (e.g., Git).
- DevOps and Deployment: Knowledge of deploying models in production, containerization (e.g., Docker), and continuous integration/continuous deployment (CI/CD) pipelines.
- Azure Expertise: Utilize Azure services, such as Azure Data Factory, Azure Databricks, Azure SQL Data Warehouse, and Azure Storage, to architect and deploy data solutions.
- Databricks Experience: Work with Databricks to process and analyze data, optimize performance, and troubleshoot issues.
- Data Integration: Collaborate with cross-functional teams to ensure seamless integration of data from various sources into the data platform.
- Data Quality Assurance: Develop and implement data quality checks and validation processes to maintain data accuracy and integrity.
- Scalability and Performance: Optimize data pipelines and infrastructure for scalability, reliability, and performance.
- Documentation: Create and maintain documentation for data pipelines, infrastructure configurations, and best practices.
- Monitoring and Troubleshooting: Skills in monitoring model performance in production, handling model drift, and maintaining the model over time with regular updates and improvements.
- Security and Compliance: Ensure data security, privacy, and compliance with relevant regulations and company policies.
- Continuous Learning: A commitment to continuous learning and staying updated with the latest research in LLMs and their supporting technologies is required.
- Thought Leadership: Keep up with generative AI innovations and models, show interest in publishing peer-reviewed research, and organize upskilling sessions for the development of other scientists.
Qualifications:
- Master's degree in computer science, data science, data engineering, or a related field.
- Proven experience as a Data Scientist, with a strong focus on Python coding.
- Experience with Azure services, particularly Azure Data Factory, Azure Databricks, and Azure SQL Data Warehouse.
- Strong understanding of data science and engineering best practices, data integration, ETL processes, and data warehousing concepts.
- Proficiency in data modeling and database design.
- Excellent problem-solving and troubleshooting skills.
- Strong communication and collaboration abilities.
- Certification in Azure and Databricks is a plus.
Salary: The anticipated hiring range for this position is listed below.
Variable Pay: 10% of Salary
The exact compensation offered will vary based on skills, experience, and geographic location.
Benefits: Click here for an overview of the benefits DDI offers.