Data Scientist or AI Developer: That's the question

Author: Klaus Puchner (Program Manager AI & Team Lead)

Presentation of the role of Data Scientist / AI Developer

In the last part of our mini blog series, we take a look at the one role not yet presented: the Data Scientist. The media often call it the “sexiest job of the 21st century”: the Data Scientist plays a crucial role in creating value from data. This is absolutely true for our projects. However, turning a great model into a successful AI product that can be incorporated into existing business processes brings additional challenges in practice.

What tasks does a Data Scientist in the AI team have?

When our team started out, we too had a fairly traditional idea of the Data Scientist role. It focused mainly on data acquisition, exploratory data analysis (EDA), feature engineering as well as model selection, training, possibly hyperparameter tuning and evaluation in a data science language of one’s choice (R or Python).

In our AI team, a Data Scientist can apply the best solution for a task and its surrounding conditions without any restrictions. If, for example, only little data is available, they can resort to classical machine learning models (in one project, for example, we tested more than 100 models). If a large amount of data is available, deep learning can also be applied.

Being able to leverage an AI model into business value is a vital challenge.

When we did our first two projects, we quickly realised that we needed more than just a model with good predictive power. In the following, we present what we learned from this insight.

How does the Data Scientist ensure that their model is used in practice?

As already mentioned, a model only creates value in practice once it is actually integrated into business processes. So it must be possible to integrate a model into existing system architectures and software products in the company without a lot of effort. For this reason, we decided to build AI features as microservices for a flexible approach.

This approach made it necessary to include development tasks in our Data Scientists’ responsibilities. In addition to their classical tasks, every Data Scientist now had to be able to create an API (Plumber or FastAPI) in their primary data science language (R or Python) that allows for easy interaction with their model, and to make it portable by packaging it in a Docker container.
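To illustrate the pattern, here is a minimal, dependency-free sketch of such a scoring service using only Python’s standard library. It is purely illustrative: the actual services use FastAPI or Plumber, and the fixed linear “model”, the `/predict` route and the request schema are assumptions for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    # Stand-in for a trained model: a fixed linear scorer (illustrative only).
    # In a real service, a serialized model would be loaded at startup instead.
    weights = [0.4, -0.2, 0.7]
    score = sum(w * x for w, x in zip(weights, features))
    return {"prediction": round(score, 4)}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Accept JSON like {"features": [1.0, 2.0, 3.0]} on POST /predict.
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(predict(payload["features"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep request logging quiet


def run(port=8080):
    # In a container, this function would be the service entry point.
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Because the service only speaks HTTP and JSON, the surrounding systems never need to know whether the model behind it was built in R or Python.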

Our integration solution: provide AI models in the form of microservices.

With this knowledge, our Data Scientists can now not only provide fully trained models, but also break all required steps down into individual tasks within automatable pipelines. This makes it possible to automate individual steps such as model retraining, and to detect data and concept drift.
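As a sketch of what such a drift check in a pipeline can look like, here is a two-sample Kolmogorov–Smirnov statistic in plain Python. The threshold of 0.2 is an illustrative assumption, not a value from our projects; in practice one would calibrate it or use a proper statistical test library.

```python
import bisect


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum distance between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    ref, cur = sorted(reference), sorted(current)

    def ecdf(sample, v):
        # Fraction of sample points <= v.
        return bisect.bisect_right(sample, v) / len(sample)

    return max(abs(ecdf(ref, v) - ecdf(cur, v)) for v in ref + cur)


def drift_detected(reference, current, threshold=0.2):
    # Flag drift when a feature's serving distribution has shifted noticeably
    # from the training distribution; this could then trigger retraining.
    return ks_statistic(reference, current) > threshold
```

A pipeline step would compare each feature’s recent values against the training data and trigger the retraining task when `drift_detected` fires.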

It is important to mention that Data Scientists who are new to our team are specifically trained in these technologies during their onboarding process.

How would a Data Scientist describe their work in the AI team?

Let’s ask Daniel. Daniel is a Data Scientist in the AI team. He has witnessed the development of this role in the team from the very beginning. In the following statement you can read about his thoughts on it:

“One of the best aspects of being a Data Scientist in the AI team is the privilege of having a lot of freedom to decide on how I organise my work. Once a project goal has been sufficiently specified, it is up to us Data Scientists to solve a certain machine learning problem with the required expertise and a lot of creativity. I especially like the research phase which is scheduled at the beginning of each of our projects. It gives me the chance to continually extend my expertise in many areas such as computer vision and natural language processing (NLP), but also regarding classical machine learning models.

I also appreciate the regular exchange of knowledge within our team as well as the professional cooperation with all colleagues as they always provide helpful advice and honest feedback on one’s projects. Exchanging our views is a lot of fun and everyone’s opinion is appreciated. The technical aspect of my work is also something I enjoy a lot. We constantly work on automating as many process steps as possible via Kubeflow Pipelines, Docker, Google Cloud Platform and on our deep learning rig (codename Rick – and yes, there’s also a Morty). It always gives me a good feeling to see how different components (different programming languages) work together harmoniously and become part of the big picture.”

Curious?

We still have many aspirations. That’s why we are looking for people to support us with their personalities and skills as well as their courage to learn new things and their motivation to help shape the future with us. We look forward to getting to know you during a personal interview.

More interesting articles

Would you like to read the other parts of this AI mini-series? Here’s the article overview for you:

* You can find the German version here.