I put together a study plan that I think will be useful to others as well. It is built around self-study courses, with priority given to free courses in Russian.
Sections:
- Algorithms and data structures. This is the key section: if you master it, everything else will follow. It is important to get hands-on practice writing code and using basic data structures and algorithms.
- Databases and data warehouses, Business Intelligence. We move from algorithms to data storage and processing.
- Hadoop and Big Data. Big data begins when the database no longer fits on a hard drive, or when the data needs to be analyzed but Excel can no longer load it. In my opinion, you should move on to this section only after thoroughly studying the previous two.
Algorithms and data structures
My plan includes learning Python and reviewing the basics of mathematics and algorithms.
- Python Programming
- Python: basics and application
- Linear algebra
- A crash course in discrete mathematics
- Algorithms: theory and practice. Methods
- Algorithms: theory and practice. Data structures
Databases and data warehouses, Business Intelligence
- Book: Martin Kleppmann, Designing Data-Intensive Applications (published in Russian as "Highly Loaded Applications. Programming, Scaling, Support"). The book describes how different data models work, how they are implemented internally, what their limitations are, and how to choose between them depending on the task.
- Introduction to Databases
- A Deep Dive into DBMS
- Introduction to non-relational databases
Topics related to building data warehouses, ETL, and OLAP cubes depend heavily on specific tools, so I don't give links to courses in this document. It is best to study such systems while working on a specific project at a particular company. To get familiar with ETL, you can try Talend or Airflow.
In my opinion, it is important to study Data Vault, a modern methodology for designing data warehouses: link 1, link 2. The best way to learn it is to implement it yourself on a simple example. There are several example Data Vault implementations on GitHub: link. A book on modern data warehousing: Modeling the Agile Data Warehouse with Data Vault by Hans Hultgren.
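As a starting point for such a simple example, here is a minimal sketch of the three basic Data Vault table types (hub, link, satellite) using Python's built-in sqlite3 module. Everything specific in it is an assumption made for illustration: the customer/order domain, the table and column names, the helper functions, and the use of SHA-256 hashes as keys. It is not taken from the linked materials and leaves out most of what a real loader needs.

```python
import hashlib
import sqlite3

# Hypothetical schema: two hubs (business keys), one link (relationship),
# one satellite (descriptive attributes with history).
SCHEMA = """
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,
    customer_id   TEXT NOT NULL,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_order (
    order_hk      TEXT PRIMARY KEY,
    order_id      TEXT NOT NULL,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE link_customer_order (
    customer_order_hk TEXT PRIMARY KEY,
    customer_hk       TEXT NOT NULL REFERENCES hub_customer (customer_hk),
    order_hk          TEXT NOT NULL REFERENCES hub_order (order_hk),
    load_date         TEXT NOT NULL,
    record_source     TEXT NOT NULL
);
CREATE TABLE sat_customer_details (
    customer_hk   TEXT NOT NULL REFERENCES hub_customer (customer_hk),
    load_date     TEXT NOT NULL,
    name          TEXT,
    city          TEXT,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_hk, load_date)
);
"""


def hash_key(*business_keys):
    """Derive a deterministic key from one or more business keys."""
    joined = "|".join(str(key).strip().upper() for key in business_keys)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def build_vault():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def load_customer(conn, customer_id, name, city, source, load_date):
    """Register the customer in the hub and append its attributes to the satellite."""
    customer_hk = hash_key(customer_id)
    # Hub: an already known business key is skipped, never updated.
    conn.execute(
        "INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?, ?)",
        (customer_hk, customer_id, load_date, source),
    )
    # Satellite: this sketch appends a row on every load; a real loader
    # would first check whether the attributes actually changed.
    conn.execute(
        "INSERT INTO sat_customer_details VALUES (?, ?, ?, ?, ?)",
        (customer_hk, load_date, name, city, source),
    )


def load_order(conn, order_id, customer_id, source, load_date):
    """Register the order in its hub and link it to the customer."""
    order_hk = hash_key(order_id)
    conn.execute(
        "INSERT OR IGNORE INTO hub_order VALUES (?, ?, ?, ?)",
        (order_hk, order_id, load_date, source),
    )
    conn.execute(
        "INSERT OR IGNORE INTO link_customer_order VALUES (?, ?, ?, ?, ?)",
        (hash_key(customer_id, order_id), hash_key(customer_id), order_hk,
         load_date, source),
    )
```

Loading the same customer twice with different attributes leaves one row in the hub and two rows in the satellite, which is how the model keeps history without updating existing records.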
To get acquainted with Business Intelligence tools for end users, you can use Power BI Desktop, a free tool for building reports, dashboards, and small data warehouses. Training materials: link 1, link 2.
Hadoop and big data
- Start by implementing MapReduce yourself, without third-party libraries. This will later help you better understand multi-threaded implementations. A great Python example is described here.
- Hadoop. A system for processing large amounts of data.
- Introduction to Big Data Engineering
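To give an idea of what the first item in this list involves, here is a minimal single-process sketch of MapReduce that uses only the Python standard library. It is not the example referenced above. The word-count task and the function names (map_phase, shuffle_phase, reduce_phase, map_reduce) are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run map, shuffle, and reduce one after another in a single process."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "data warehouses store data"]
    print(map_reduce(sample))
```

Because each call to map_phase depends only on its own document and each call to reduce_phase only on its own key, both steps could later be spread across threads, processes, or machines, which is the idea that Hadoop builds on.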
Conclusion
Not everything you study ends up being used at work, so you need a graduation project in which you try to apply the new knowledge.
There are no topics related to data analysis and Machine Learning, since those belong more to the Data Scientist profession. There are also no topics related to the AWS and Azure clouds, since these topics are highly platform-dependent.
Questions to the community:
How adequate is my skill-building plan? What should be removed or added?
What project would you recommend as the graduation project?