• Spain

IKERLAN

Transactional Data Lake

Ikerlan, S. Coop.

Address: J.M. Arizmendiarrieta 2, 20500 Arrasate-Mondragon, Spain

Company size: >250 employees

 

Contact person & email: Jaione Iriondo, Jiriondo@mondragon.edu

 

Field of traineeship: Engineering, manufacturing and construction

Level of study: Master

Language requirements: B2 / C1 English

Salary: Bachelor Part-Time (16-20h): 630,00€ / Bachelor Full-Time: 709,00€

 

Remote position:

About the company: IKERLAN is a technology center with a solid specialization in the development of embedded systems capable of operating in industrial environments in compliance with the regulations of different sectors (railway, automotive, aeronautical, etc.). To this end, IKERLAN has consolidated capabilities and methodologies for the design and development of high-performance hardware and software systems that comply with non-functional technical requirements (EMC, electrical safety, environmental), sensor systems with energy autonomy, and reliable wireless and wired communication systems adapted to the requirements of industrial applications.

IKERLAN combines technology transfer activities (contract research), internal research and training of highly qualified personnel. It is currently the trusted technological partner of major companies in Spain and the Basque Country. It has a staff of over 400 people. Apart from contract research for industrial companies, IKERLAN also collaborates with first-class European research partners (both academic and private research actors) within the framework of Horizon 2020 and other competitive research programmes.

Job duties: In recent years, data lakes have become increasingly popular, especially in the field of data analytics, because of the benefits they bring in terms of data format flexibility, scalability, etc. However, data lakes offer limited support for transactions, so keeping data consistent remains a challenge. This is where tools such as Apache Hudi or Apache Iceberg come into play: they record the transactions performed (modifications and/or deletions) and always provide a consistent view of the data within the Data Lake. The objective of this project is to investigate these tools and apply them to a real industrial use case focused on pre-processing and formatting massive data for AI tasks.

Concretely, the main goals of the project are as follows:
• Analyze and study the tools for transactional Data Lakes (Apache Hudi and Apache Iceberg).
• Design and develop a transactional Data Lake with these tools.
• Implement the transactional Data Lake in an industrial demonstrator.

The phases of the project will include:
1. Training (learning the concepts related to the project).
2. Review of tools for transactional Data Lakes.
3. Implementation of the transactional Data Lake.
4. Implementation of a demonstrator in which the Data Lake is used for ML processes.
5. Report writing.

 

Specific qualifications: Computer Engineering with Python knowledge.

 

Additional information: