Dear Mr. Hübner,
I am looking for computer science students as interns, who may afterwards also write their Bachelor's/Master's thesis with us. If the topics described below appeal to your students, please forward them!
The current topics are:
Topic name: License Management Automation with KNIME
Topic description:
The task is to identify "relevant" entries within a pool of > 100,000 software registration entries from worldwide Conti devices. From a license management viewpoint, we are interested in entries where the software has to be licensed and that are widely used. The challenge is 1) that there are few defined rules for identifying what is relevant (the judgement is mostly based on experience), and 2) that the data is not normalized (e.g. it includes Chinese characters). The proposed solution is to use KNIME's text processing capabilities to normalize the data and to derive "fuzzy" rules from those software registration entries that are already categorized (datasets exist for both relevant and non-relevant entries). Basic knowledge of KNIME is needed; good documentation and learning material for the text processing features is available on the KNIME website.
Skills needed: KNIME / text mining
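To give a feel for the approach, here is a rough sketch in plain Python (the actual work would use KNIME's text processing nodes) of the normalization and fuzzy-matching idea; the sample entries and the similarity threshold are made up for illustration:

```python
import unicodedata
from difflib import SequenceMatcher

def normalize(entry: str) -> str:
    """Normalize a registration entry: Unicode NFKC (folds full-width
    characters), lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", entry)
    return " ".join(text.lower().split())

def fuzzy_relevant(entry: str, known_relevant: list[str], threshold: float = 0.8) -> bool:
    """Mark an entry as relevant if it is sufficiently similar to any
    entry from the already-categorized 'relevant' dataset."""
    norm = normalize(entry)
    return any(
        SequenceMatcher(None, norm, normalize(known)).ratio() >= threshold
        for known in known_relevant
    )

known = ["Microsoft Visual Studio", "MATLAB"]
print(fuzzy_relevant("Microsoft  Visual Studio 2019", known))  # → True
```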
-----------------------------------------------------------------------------------------------------------------------------------------------
Topic name: Consolidate monitoring in the Continental.datalake
Topic description:
The Continental.datalake is an AWS cloud based infrastructure solution within Continental. Besides storage, it serves a wide range of infrastructure components to internal customers from business and IT. The tech stack involves "leading edge" technologies like AWS, Terraform, Serverless, Kubernetes, and Prometheus. Our workflow includes Scrum with weekly sprints, utilizing Jira, Confluence, and modern communication tools.
* The Continental.datalake currently has a couple of different monitoring solutions in place, involving CloudWatch alarms, CloudWatch dashboards, Prometheus for Kubernetes, and Grafana.
* The Continental.datalake currently has no (or only a very limited) "status" page to visualize the status of the Continental.datalake services.
* TODO:
  * Consolidate and improve the different solutions.
  * Possibly create a concept with a central monitoring solution.
  * Improve the status page and create a front-to-back, fully automated monitoring solution.
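To illustrate the status-page idea, here is a minimal sketch of aggregating per-service health probes into one payload; the service names, probe mechanism, and status labels are placeholders, and a real solution would pull from CloudWatch/Prometheus instead:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ServiceCheck:
    name: str
    probe: Callable[[], bool]  # returns True when the service is healthy

def run_safely(probe: Callable[[], bool]) -> bool:
    """Treat a crashing probe the same as an unhealthy service."""
    try:
        return bool(probe())
    except Exception:
        return False

def status_page(checks: list) -> dict:
    """Run all probes and aggregate them into a simple status-page payload."""
    services = {c.name: ("up" if run_safely(c.probe) else "down") for c in checks}
    overall = "operational" if all(v == "up" for v in services.values()) else "degraded"
    return {"overall": overall, "services": services}
```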
Skills needed: Infrastructure
-----------------------------------------------------------------------------------------------------------------------------------------------
Topic name: EMR scheduling (EMR is a Hadoop cluster on AWS)
Topic description:
* Users want to schedule clusters as EMR jobs in the datalake. Currently, users can spin up a cluster via click but cannot define a certain point in time when a workload should start.
* We have a similar solution in place for appliances, which should be extended to EMR jobs.
* TODO:
  * Create a list of requirements based on demand from business customers.
  * Evaluate a "self-written solution" with an integration with Kubeflow components.
  * Extend the backend API to create CloudWatch events that handle the scheduling process.
  * Add EMR information in DynamoDB.
  * Refactor the frontend to manage EMR jobs.
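As a small taste of the scheduling part: CloudWatch Events / EventBridge schedule rules accept a six-field cron expression (minutes, hours, day-of-month, month, day-of-week, year; '?' means unspecified). A hypothetical helper that turns a user-chosen start time into such an expression could look like this:

```python
from datetime import datetime

def one_shot_cron(start: datetime) -> str:
    """Build a CloudWatch Events / EventBridge cron expression that fires
    once at the given UTC time. The backend would pass this string to the
    rule-creation API call (e.g. via boto3)."""
    return f"cron({start.minute} {start.hour} {start.day} {start.month} ? {start.year})"

print(one_shot_cron(datetime(2025, 6, 15, 10, 0)))  # → cron(0 10 15 6 ? 2025)
```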
Skills needed:
* Python (boto3), JavaScript (VueJS, Vuex, Vuetify)
* Serverless Framework, AWS (DynamoDB, EMR, CloudWatch)
* Jenkins, Git
-----------------------------------------------------------------------------------------------------------------------------------------------
Topic name: Implement a datalake Credential Store
Topic description:
* One major problem of the Continental.datalake is that there is no proper way of handling credentials except writing them into an .env file. AWS offers a solution here with the AWS Parameter Store: https://docs.aws.amazon.com/systemsmanager/latest/userguide/systems-manager-...
* TODO:
  * Extend our backend with an API to create / update / delete / read parameters (CRUD).
  * Extend the DL frontend with a user interface to handle the complete parameter lifecycle.
  * Ensure that a user only has access to his own parameters.
  * Evaluate and distribute those parameters to every datalake service.
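The intended CRUD surface and per-user access rule can be sketched with an in-memory stand-in (the real store would be backed by SSM Parameter Store via boto3; class and method names here are illustrative):

```python
class ParameterStore:
    """In-memory stand-in for the intended credential store: full CRUD,
    with each parameter visible only to the user who created it."""

    def __init__(self):
        self._params = {}  # (user, name) -> value

    def create(self, user: str, name: str, value: str) -> None:
        if (user, name) in self._params:
            raise KeyError(f"parameter {name!r} already exists for {user}")
        self._params[(user, name)] = value

    def read(self, user: str, name: str) -> str:
        # A missing key and a foreign key look the same to the caller,
        # so parameter names are never leaked across users.
        if (user, name) not in self._params:
            raise PermissionError(f"{name!r} not found or not owned by {user}")
        return self._params[(user, name)]

    def update(self, user: str, name: str, value: str) -> None:
        self.read(user, name)  # enforces ownership
        self._params[(user, name)] = value

    def delete(self, user: str, name: str) -> None:
        self.read(user, name)  # enforces ownership
        del self._params[(user, name)]
```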
Skills needed:
* Python (boto3), JavaScript (VueJS, Vuex, Vuetify)
* Serverless Framework, AWS (IAM, SSM)
* Jenkins, Git
-----------------------------------------------------------------------------------------------------------------------------------------------
Topic name: S3 Fileuploader
Topic description:
* Datalake file browser: the Continental.datalake uses AWS S3 as object storage for data.
* Users have different options for injecting data into S3. The easiest one is to use the self-built file browser.
* In its current form this file browser is very basic and should be enhanced. The task is to have a fully functional S3 file browser embedded into our frontend, e.g.: https://s3browser.com/
* TODO:
  * Remove our old implementation of the file browser.
  * Design a new frontend and the implementation logic.
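One core piece of any S3 file browser is turning the flat object keys S3 returns into the folder hierarchy the UI shows. A minimal sketch of that transformation (the backend would feed it keys from a `list_objects_v2` call; the function name is illustrative):

```python
def build_tree(keys: list) -> dict:
    """Turn flat S3 object keys into a nested folder tree for a file-browser
    UI. Files map to None; folders map to sub-dicts."""
    tree: dict = {}
    for key in keys:
        node = tree
        parts = key.split("/")
        for folder in parts[:-1]:
            node = node.setdefault(folder, {})
        if parts[-1]:  # skip trailing-slash "folder marker" keys
            node[parts[-1]] = None
    return tree

print(build_tree(["raw/2024/a.csv", "raw/2024/b.csv", "readme.md"]))
# → {'raw': {'2024': {'a.csv': None, 'b.csv': None}}, 'readme.md': None}
```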
Skills needed:
* Python (boto3), JavaScript (VueJS, Vuex, Vue Router, Vuetify)
* Jenkins, Git
This topic could possibly also serve as a thesis.
-----------------------------------------------------------------------------------------------------------------------------------------------
Topic name: Jenkins replacement
Topic description:
* The Continental.datalake uses Jenkins as its CI/CD solution.
* The full-flavored feature set of Jenkins is not really required, and operating Jenkins comes with lots of problems.
* TODO:
  * Create a list of requirements for a CI/CD solution for the Continental.datalake.
  * Evaluate different solutions; GitHub Actions on our internal GitHub Enterprise Server is preferred.
  * Migrate to the new solution.
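For orientation, a GitHub Actions migration replaces Jenkins pipelines with per-repository workflow files. A hypothetical minimal workflow (job names, runner labels, and steps are placeholders; a GitHub Enterprise Server setup would typically use self-hosted runners):

```yaml
# .github/workflows/ci.yml — illustrative only
name: datalake-ci
on: [push, pull_request]

jobs:
  test:
    runs-on: [self-hosted, linux]   # GHES usually runs jobs on self-hosted runners
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: |
          pip install -r requirements.txt
          pytest
```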
Skills needed:
* Jenkins, Git, CI/CD knowledge, Linux, Docker
-----------------------------------------------------------------------------------------------------------------------------------------------
Topic name: Compare and integrate Sagemaker in Kubeflow
Topic description:
* Continental operates an MLOps platform based on Kubeflow.
* SageMaker is (in parts) a competing service.
* Parts of SageMaker (training) can also be integrated in Kubeflow.
* TODO:
  * Evaluate SageMaker and compare it to Kubeflow.
  * Evaluate how to integrate SageMaker in Kubeflow.
  * Write a Kubeflow component to start a SageMaker job.
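The integration point boils down to a Kubeflow component calling SageMaker's `CreateTrainingJob` API. A sketch of building that request body (field names follow the SageMaker API; the helper function, example values, and defaults are assumptions — a component would pass the dict to boto3's `sagemaker` client):

```python
def training_job_request(job_name: str, image_uri: str, role_arn: str,
                         s3_input: str, s3_output: str,
                         instance_type: str = "ml.m5.xlarge") -> dict:
    """Assemble the request body for SageMaker's CreateTrainingJob API,
    e.g. to be sent from a Kubeflow component via boto3."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,       # Docker image with the training code
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,                  # IAM role SageMaker assumes
        "InputDataConfig": [{
            "ChannelName": "training",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_input,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": s3_output},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }
```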
Skills needed:
* Docker, Kubernetes, YAML, Python, AWS, Linux
Many thanks!
Mit freundlichen Grüßen/Best regards,
Orsolya Rappne Pogany
Data Scientist
Group Functions IT / DAP
Continental AG
Vahrenwalder Str. 9, 30165 Hannover, Germany
Phone: +49 511 938 - 13486
Mobile: +49 151 74616236
E-Mail: orsolya.rappne.pogany@conti.de
http://www.continental-corporation.com
https://www.continental.com
________________________________________________________________________
Continental Aktiengesellschaft, Postfach/Postbox 1 69, D-30001 Hannover
Vorsitzender des Aufsichtsrats/Chairman of the Supervisory Board: Prof. Dr.-Ing. Wolfgang Reitzle
Vorstand/Executive Board: Nikolai Setzer (Vorsitzender/Chairman), Frank Jourdan, Christian Kötz, Helmut Matschi, Philip Nelles, Dr. Ariane Reinhart, Wolfgang Schäfer, Andreas Wolf
Sitz der Gesellschaft/Registered Office: Hannover
Registergericht/Registered Court: Amtsgericht Hannover, HRB 3527, USt.-ID-Nr./VAT-ID-No. DE 115645799
________________________________________________________________________
Proprietary and confidential. Distribution only by express authority of Continental AG or its subsidiaries.