{"id":86905,"date":"2025-05-11T21:57:00","date_gmt":"2025-05-11T14:57:00","guid":{"rendered":"https:\/\/itviec1.uptech.vn\/?p=86905"},"modified":"2025-05-11T21:57:00","modified_gmt":"2025-05-11T14:57:00","slug":"cau-hoi-phong-van-data-engineer","status":"publish","type":"post","link":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/","title":{"rendered":"Top 40+ common Data Engineer interview questions"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_85 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of contents<\/p>\n<\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link 
ez-toc-heading-1\" href=\"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/#Cau_hoi_phong_van_Data_Engineer_ve_Tong_quan_Kien_thuc_nen_tang_CSDL\" >Data Engineer interview questions: Overview &amp; Database Fundamentals<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/#Cau_hoi_phong_van_Data_Engineer_ve_Data_Architecture_ETLELT\" >Data Engineer interview questions: Data Architecture &amp; ETL\/ELT<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/#Cau_hoi_phong_van_Data_Engineer_ve_Big_Data_Streaming\" >Data Engineer interview questions: Big Data &amp; Streaming<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/#Cau_hoi_phong_van_Data_Engineer_ve_Cloud_Workflow_Orchestration\" >Data Engineer interview questions: Cloud &amp; Workflow Orchestration<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/#Cau_hoi_phong_van_Data_Engineer_ve_Lap_trinh_Data_Quality\" >Data Engineer interview questions: Programming &amp; Data Quality<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/#Cau_hoi_phong_van_Data_Engineer_ve_Kinh_nghiem_thuc_chien_Ky_nang_coding\" >Data Engineer interview questions: Hands-on Experience &amp; Coding Skills<\/a><\/li><li class='ez-toc-page-1 
ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/#Tong_ket\" >Summary<\/a><\/li><\/ul><\/nav><\/div>\n<p><em><strong>According to the Dice 2025 report, demand for Data Engineers in Southeast Asia is growing by more than 40% a year, far outpacing Data Analyst roles and approaching Software Engineer roles. Businesses understand that even an expensive AI\/BI model is worthless when the underlying data is dirty, fragmented, and hard to scale, and that is exactly where the Data Engineer becomes the \u201cunsung hero\u201d of every data-driven organization. If you are preparing for an upcoming Data Engineer interview, the question groups below are the ones to review beforehand.<\/strong><\/em><\/p>\n<p><span style=\"font-weight: 400;\">The Vietnamese market is becoming as demanding as Singapore or India: <\/span>Data Engineer<span style=\"font-weight: 400;\"> candidates have to get through HackerRank\/LeetCode rounds, advanced SQL exercises, data warehouse system design on a whiteboard, and even live coding in PySpark or of an Airflow DAG. 
Thorough preparation not only builds your confidence, it also demonstrates systems thinking, which recruiters value more than any cloud certification.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Below is a selection of common Data Engineer interview questions with short, practical suggested answers, to help you show both your knowledge and your systems thinking <\/span><span style=\"font-weight: 400;\">across six topic groups, which are also the six most frequently asked clusters of questions:<\/span><\/p>\n<ul>\n<li>Group 1: Overview &amp; Database Fundamentals<\/li>\n<li>Group 2: Data Architecture &amp; ETL\/ELT<\/li>\n<li>Group 3: Big Data &amp; Streaming<\/li>\n<li>Group 4: Cloud &amp; Workflow Orchestration<\/li>\n<li>Group 5: Programming &amp; Data Quality<\/li>\n<li>Group 6: Hands-on Experience &amp; Coding Skills<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Cau_hoi_phong_van_Data_Engineer_ve_Tong_quan_Kien_thuc_nen_tang_CSDL\"><\/span><b>Data Engineer interview questions: Overview &amp; Database Fundamentals<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">This section lets you show your basic understanding of the Data Engineer role, how it relates to other positions on the data team, and traditional data-handling skills such as SQL and system architecture. Recruiters usually use this group to assess your foundational thinking and your ability to communicate technically.<\/span><\/p>\n<h3><b>What role does a Data Engineer play in a company's data value chain?<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">A Data Engineer builds and maintains the data processing systems that turn raw data into clean, structured data for Data Scientists and BI Analysts, which in turn supports business decisions.<\/span><\/p>\n<h3><b>How do you usually work with Data Scientists, BI Analysts, or DevOps on a data project?<\/b><\/h3>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">With Data Scientists: provide data that has been cleaned, standardized, and sensibly organized for analysis and model training.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">With BI Analysts: prepare data optimized for reports and dashboards.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">With DevOps: make sure data pipelines are deployed in an automated, efficient, stable, and secure way.<\/span><\/li>\n<\/ul>\n<h3><b>When you are asked to design a new data pipeline, how do you approach it and analyze the requirements?<\/b><\/h3>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Clarify the business requirements and the goals to be met.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Gather information about the data sources (data types, formats, volume).<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Choose suitable tools and architecture (batch or streaming).<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Design the storage and processing model (schema, partitioning, indexing).<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Test the pipeline, evaluate its performance, and maintain the system.<\/span><\/li>\n<\/ul>\n<h3><b>Can you briefly distinguish OLTP (Online Transaction Processing) from OLAP (Online Analytical Processing)?<\/b><\/h3>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">OLTP (Online Transaction Processing) systems handle frequent transactions such as inserting, updating, and deleting rows in a database, for example banking or retail sales systems.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">OLAP (Online Analytical Processing) systems are designed for complex analytical queries that aggregate and analyze large volumes of data to support decision-making, for example business reports and trend analysis.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">The main differences between the two are:<\/span><\/p>\n<table>\n<tbody>\n<tr>\n<td><b>Criterion<\/b><\/td>\n<td><b>OLTP<\/b><\/td>\n<td><b>OLAP<\/b><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Purpose<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Fast, real-time transactions<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Analysis of historical data<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Data<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Detailed, short-term<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Aggregated, long-term<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Queries<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Simple CRUD<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Complex analytical queries<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3><b>How do you optimize SQL queries (indexing, partitioning, explain plan, caching\u2026)?<\/b><\/h3>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span 
style=\"font-weight: 400;\">Use indexes sensibly, on the columns that appear in filters and joins.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Partition large tables by time or by category so that queries scan only the relevant partitions.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Use EXPLAIN PLAN to analyze how a query is executed.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Cache the results of frequently run queries.<\/span><\/li>\n<\/ul>\n<blockquote><p><em>Read more: <a href=\"https:\/\/itviec.com\/blog\/function-trong-sql\/\" target=\"_blank\" rel=\"noopener\"><strong>90+ SQL functions you need to know<\/strong><\/a><\/em><\/p><\/blockquote>\n<h3><b>When do you prefer NoSQL over SQL?<\/b><\/h3>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">When the data has no clear structure, or its structure changes frequently.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">When horizontal scalability is required.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">When very fast, real-time response times are required.<\/span><\/li>\n<\/ul>\n<blockquote><p><em>Read more: <a href=\"https:\/\/itviec.com\/blog\/sql-vs-nosql\/\" target=\"_blank\" rel=\"noopener\"><strong>SQL vs NoSQL: how to choose the right database management system<\/strong><\/a><\/em><\/p><\/blockquote>\n<h2><span class=\"ez-toc-section\" 
id=\"Cau_hoi_phong_van_Data_Engineer_ve_Data_Architecture_ETLELT\"><\/span><b>Data Engineer interview questions: Data Architecture &amp; ETL\/ELT<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Once data gets complex, the job is no longer just writing queries: it is designing systems, choosing a storage model, and managing data quality and data flow (the data pipeline). This group of questions tests your ability to design data infrastructure that is flexible and scalable while keeping operations efficient and costs under control.<\/span><\/p>\n<h3><b>Compare Data Warehouse and Data Lake. When should you consider a \u201clakehouse\u201d model?<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">A Data Warehouse stores structured, processed data and is optimized for in-depth analysis and business reporting. 
A Data Lake stores raw, unprocessed data and suits varied data that needs flexible handling.<\/span><\/p>\n<table>\n<tbody>\n<tr>\n<td><strong>Criterion<\/strong><\/td>\n<td><strong>Data Warehouse<\/strong><\/td>\n<td><strong>Data Lake<\/strong><\/td>\n<td><strong>Lakehouse<\/strong><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Purpose<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Analysis of structured data<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Storage of raw data<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Combines both<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Data<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Clearly structured<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Unstructured or semi-structured<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Structured and semi-structured<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Processing flow<\/span><\/td>\n<td><span style=\"font-weight: 400;\">ETL (Extract-Transform-Load)<\/span><\/td>\n<td><span style=\"font-weight: 400;\">ELT (Extract-Load-Transform)<\/span><\/td>\n<td><span style=\"font-weight: 400;\">ELT, with support for fast processing<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Query performance<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Fast<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Slower<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Fast, comparable to a warehouse<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span style=\"font-weight: 400;\">A lakehouse is worth considering when the business needs both the flexibility of a Data Lake and the fast, powerful analytics of a Data Warehouse.<\/span><\/p>\n<h3><b>What difficulties have you faced in maintaining data quality in a Data Warehouse or Data Lake? How did you handle them?<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">Some common difficulties and ways to handle them:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Schema drift, where the data structure changes without notice. The fix is to stop other teams from changing schemas directly: any change must be approved by the data team or made with the data team's help.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Missing or late-arriving data, which skews reports. In this case I talk to the departments involved to understand the problem and find where the bottleneck is.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Duplicate or inconsistent data, which reduces trust in the analysis. 
This can have many causes and needs a closer investigation.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Lineage and data provenance that are hard to track when integrating many systems. This calls for reviewing logs or designing an ETL system that can trace sources (who uploads which data); it is usually a long project and sometimes requires a system upgrade.<\/span><\/li>\n<\/ul>\n<h3><b>Distinguish Star Schema from Snowflake Schema. What are the pros and cons of each model?<\/b><\/h3>\n<table>\n<tbody>\n<tr>\n<td><b>Criterion<\/b><\/td>\n<td><b>Star Schema<\/b><\/td>\n<td><b>Snowflake Schema<\/b><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Structure<\/span><\/td>\n<td><span style=\"font-weight: 400;\">One fact table joined directly to several dimension tables<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Dimension tables split into hierarchies of more detailed tables<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Queries<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Fast queries that are easy to write<\/span><\/td>\n<td><span style=\"font-weight: 400;\">More complex queries<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Maintenance<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Easy to maintain<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Harder to maintain because of the more complex structure<\/span><\/td>\n<\/tr>\n<tr>\n<td><span 
style=\"font-weight: 400;\">Storage<\/span><\/td>\n<td><span style=\"font-weight: 400;\">More redundant data<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Less redundancy, saves storage<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span style=\"font-weight: 400;\">Star Schema suits applications that need fast, simple queries. Snowflake Schema suits cases where you need to reduce data redundancy and organize data in finer detail.<\/span><\/p>\n<h3><b>What are Slowly Changing Dimensions (SCD)? Have you ever combined Type 1 and Type 2?<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">SCD is a technique for managing how dimension data changes over time in a Data Warehouse.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Type 1<\/b><span style=\"font-weight: 400;\">: Overwrite the old data and keep no history.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Type 2<\/b><span style=\"font-weight: 400;\">: Create a new record to preserve the history of changes.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Combine Type 1 and Type 2 when some attributes should simply reflect the latest value (for example, a customer's current address) while others need their change history preserved (for example, a product's selling price history).<\/span><\/p>\n<h3><b>When do you choose a surrogate key over a natural key when designing a Dimension table?<\/b><\/h3>\n<p><span style=\"font-weight: 
400;\">A natural key is a primary key taken from real data that carries business meaning, such as an employee ID or a national ID card number (CMND).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A surrogate key is a substitute key with no business meaning, usually an auto-incrementing integer, used to uniquely identify each record in the table.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Choose a surrogate key when the natural key is unstable or may change, or when join performance matters, since an auto-incrementing integer joins more efficiently. It also helps when a sensitive natural key needs to be hidden for security reasons.<\/span><\/p>\n<h3><b>What is the main difference between ETL and ELT in the context of Cloud Data Warehouses such as BigQuery, Snowflake, and Redshift?<\/b><\/h3>\n<table>\n<tbody>\n<tr>\n<td><b>Criterion<\/b><\/td>\n<td><b>ETL (Extract-Transform-Load)<\/b><\/td>\n<td><b>ELT (Extract-Load-Transform)<\/b><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Process<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Data is transformed before it is loaded into the DW<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Data is loaded into the DW first and transformed afterwards<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Flexibility<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Lower, because the schema must be defined up front<\/span><\/td>\n<td><span 
style=\"font-weight: 400;\">Higher, because transformation happens after loading<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Performance<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Limited by the processing capacity available on the source side<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Uses the strong processing power of the cloud DW<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span style=\"font-weight: 400;\">ELT is generally preferred with a Cloud Data Warehouse because of the warehouse's processing power and the flexibility to analyze data after it has been loaded.<\/span><\/p>\n<h3><b>What experience do you have with ETL\/ELT tools (Airflow, Glue, dbt\u2026)?<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">Here are some common use cases for each tool:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Airflow: orchestrating, managing, and scheduling data workflows.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Glue: automated ETL jobs that run on serverless infrastructure.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">dbt: managing and reusing SQL code, and supporting effective data modeling.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Pick from the use cases above when you answer this question, based on which tools you have personally used and for what purpose.<\/span><\/p>\n<h3><b>What latency or data consistency problems have you run into when synchronizing data from multiple sources?<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">Latency becomes a problem when data from the source systems is out of sync, so reports are not updated in time.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Possible solutions: adopt a streaming pipeline with near-real-time synchronization, using tools such as Kafka, Spark Streaming, or Apache Flink to reduce latency; and build automated alerts that fire when anomalous data or slow synchronization is detected.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Cau_hoi_phong_van_Data_Engineer_ve_Big_Data_Streaming\"><\/span><b>Data Engineer interview questions: Big Data &amp; Streaming<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">When a system processes millions of records a day, or data arrives in real time, you need a solid grasp of technologies such as Hadoop, Spark, and Kafka, not just in theory but in practical use. 
C\u00e1c c\u00e2u h\u1ecfi trong nh\u00f3m n\u00e0y gi\u00fap nh\u00e0 tuy\u1ec3n d\u1ee5ng \u0111\u00e1nh gi\u00e1 b\u1ea1n c\u00f3 kh\u1ea3 n\u0103ng l\u00e0m vi\u1ec7c v\u1edbi d\u1eef li\u1ec7u l\u1edbn, x\u1eed l\u00fd ph\u00e2n t\u00e1n v\u00e0 ki\u1ebfn tr\u00fac streaming hay kh\u00f4ng.<\/span><\/p>\n<h3><b> B\u1ea1n c\u00f3 th\u1ec3 m\u00f4 t\u1ea3 ki\u1ebfn tr\u00fac Hadoop (HDFS, YARN) v\u00e0 m\u1ee5c \u0111\u00edch c\u1ee7a t\u1eebng th\u00e0nh ph\u1ea7n c\u1ed1t l\u00f5i?<\/b><\/h3>\n<ul>\n<li><b>HDFS (Hadoop Distributed File System)<\/b><span style=\"font-weight: 400;\">: l\u01b0u tr\u1eef d\u1eef li\u1ec7u ph\u00e2n t\u00e1n, ch\u1ecbu l\u1ed7i t\u1ed1t, t\u1ed1i \u01b0u h\u00f3a cho truy xu\u1ea5t batch v\u1edbi d\u1eef li\u1ec7u l\u1edbn.<\/span><\/li>\n<li><b>YARN (Yet Another Resource Negotiator)<\/b><span style=\"font-weight: 400;\">: qu\u1ea3n l\u00fd t\u00e0i nguy\u00ean, ph\u00e2n ph\u1ed1i c\u00f4ng vi\u1ec7c t\u00ednh to\u00e1n gi\u1eefa c\u00e1c node trong cluster, t\u1ed1i \u01b0u hi\u1ec7u su\u1ea5t v\u00e0 \u0111\u1ed9 \u1ed5n \u0111\u1ecbnh.<\/span><\/li>\n<\/ul>\n<h3><b> Khi l\u00e0m vi\u1ec7c v\u1edbi Spark, b\u1ea1n quan t\u00e2m nh\u1eefng g\u00ec \u0111\u1ec3 t\u1ed1i \u01b0u partition, shuffle, v\u00e0 tr\u00e1nh bottlenecks?<\/b><\/h3>\n<ul>\n<li><span style=\"font-weight: 400;\">X\u00e1c \u0111\u1ecbnh s\u1ed1 l\u01b0\u1ee3ng partitions ph\u00f9 h\u1ee3p d\u1ef1a tr\u00ean k\u00edch th\u01b0\u1edbc d\u1eef li\u1ec7u.<\/span><\/li>\n<li><span style=\"font-weight: 400;\">Gi\u1ea3m thi\u1ec3u shuffle b\u1eb1ng c\u00e1ch t\u1ed1i \u01b0u h\u00f3a vi\u1ec7c s\u1eafp x\u1ebfp, aggregate d\u1eef li\u1ec7u s\u1edbm nh\u1ea5t.<\/span><\/li>\n<li><span style=\"font-weight: 400;\">S\u1eed d\u1ee5ng broadcast join khi m\u1ed9t b\u1ea3ng nh\u1ecf, gi\u00fap gi\u1ea3m l\u01b0\u1ee3ng shuffle.<\/span><\/li>\n<li><span style=\"font-weight: 400;\">Monitoring Spark UI \u0111\u1ec3 x\u00e1c \u0111\u1ecbnh v\u00e0 x\u1eed l\u00fd nhanh c\u00e1c 
bottleneck.<\/span><\/li>\n<\/ul>\n<h3><b> B\u1ea1n t\u1eebng s\u1eed d\u1ee5ng Spark Streaming ho\u1eb7c Structured Streaming ch\u01b0a? Ch\u00fang c\u00f3 \u01b0u\/nh\u01b0\u1ee3c \u0111i\u1ec3m g\u00ec?<\/b><\/h3>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Spark Streaming<\/b><span style=\"font-weight: 400;\">: X\u1eed l\u00fd micro-batch, t\u00edch h\u1ee3p d\u1ec5 d\u00e0ng v\u1edbi h\u1ec7 sinh th\u00e1i Spark, nh\u01b0ng latency cao h\u01a1n real-time th\u1ef1c s\u1ef1.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Structured Streaming<\/b><span style=\"font-weight: 400;\">: API d\u1ec5 d\u00f9ng h\u01a1n, h\u1ed7 tr\u1ee3 x\u1eed l\u00fd li\u00ean t\u1ee5c, t\u1ed1t h\u01a1n v\u1ec1 fault-tolerance v\u00e0 \u0111\u1ed9 ch\u00ednh x\u00e1c.<\/span><\/li>\n<\/ul>\n<table>\n<tbody>\n<tr>\n<td><b>Ti\u00eau ch\u00ed<\/b><\/td>\n<td><b>Spark Streaming (DStream API)<\/b><\/td>\n<td><b>Structured Streaming (Dataset\/DataFrame API)<\/b><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">M\u00f4 h\u00ecnh x\u1eed l\u00fd<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Micro\u2011batch c\u1ed1 \u0111\u1ecbnh (th\u01b0\u1eddng \u2265 500 ms)<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Micro\u2011batch theo trigger (m\u1eb7c \u0111\u1ecbnh); c\u00f3 th\u00eam ch\u1ebf \u0111\u1ed9 continuous processing th\u1eed nghi\u1ec7m (\u2248 1 ms)<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">API<\/span><\/td>\n<td><span style=\"font-weight: 400;\">RDD\u2011like, r\u1eddi r\u1ea1c, ph\u1ea3i qu\u1ea3n l\u00fd tr\u1ea1ng th\u00e1i th\u1ee7 c\u00f4ng<\/span><\/td>\n<td><span style=\"font-weight: 400;\">T\u01b0\u01a1ng t\u1ef1 SQL\/DataFrame; khai b\u00e1o (declarative), qu\u1ea3n l\u00fd tr\u1ea1ng th\u00e1i t\u1ef1 \u0111\u1ed9ng<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Latency<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Cao h\u01a1n real\u2011time (v\u00e0i tr\u0103m ms \u2192 
gi\u00e2y)<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Th\u1ea5p h\u01a1n (kho\u1ea3ng 100 ms v\u1edbi micro\u2011batch; c\u1ee1 1 ms v\u1edbi ch\u1ebf \u0111\u1ed9 continuous)<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Kh\u1ea3 n\u0103ng ch\u1ecbu l\u1ed7i<\/span><\/td>\n<td><span style=\"font-weight: 400;\">D\u1ef1a v\u00e0o lineage + checkpoint; kh\u00f4i ph\u1ee5c ph\u1ee9c t\u1ea1p khi tr\u1ea1ng th\u00e1i l\u1edbn<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Checkpoint + state store t\u1ed1i \u01b0u; b\u1ea3o \u0111\u1ea3m exactly\u2011once m\u1ea1nh h\u01a1n<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">T\u00ednh ch\u00ednh x\u00e1c<\/span><\/td>\n<td><span style=\"font-weight: 400;\">At\u2011least\u2011once m\u1eb7c \u0111\u1ecbnh; d\u1ec5 tr\u00f9ng l\u1eb7p n\u1ebfu kh\u00f4ng c\u1ea9n th\u1eadn<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Exactly\u2011once end\u2011to\u2011end (khi source replay \u0111\u01b0\u1ee3c v\u00e0 sink idempotent)<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Semantics th\u1eddi gian<\/span><\/td>\n<td><span style=\"font-weight: 400;\">H\u1ea1n ch\u1ebf; x\u1eed l\u00fd processing\u2011time l\u00e0 ch\u1ee7 y\u1ebfu<\/span><\/td>\n<td><span style=\"font-weight: 400;\">H\u1ed7 tr\u1ee3 event\u2011time, watermark, window tu\u1ef3 bi\u1ebfn<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">T\u00edch h\u1ee3p ngu\u1ed3n\/\u0111\u00edch<\/span><\/td>\n<td><span style=\"font-weight: 400;\">H\u1ec7 sinh th\u00e1i Spark (Kafka, Flume, HDFS, etc.)<\/span><\/td>\n<td><span style=\"font-weight: 400;\">K\u1ebf th\u1eeba to\u00e0n b\u1ed9 + th\u00eam FileSink, JDBC (qua foreachBatch), Delta Lake, Iceberg\u2026<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Back\u2011pressure<\/span><\/td>\n<td><span style=\"font-weight: 400;\">B\u1eadt qua c\u1ea5u h\u00ecnh (spark.streaming.backpressure.enabled)<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Kh\u00f4ng c\u00f3 c\u01a1 ch\u1ebf t\u1ef1 \u0111\u1ed9ng; d\u00f9ng maxOffsetsPerTrigger, maxFilesPerTrigger \u0111\u1ec3 \u0111i\u1ec1u 
ch\u1ec9nh t\u1ed1c \u0111\u1ed9 v\u00e0 k\u00edch th\u01b0\u1edbc batch<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">B\u1ea3o tr\u00ec &amp; t\u01b0\u01a1ng lai<\/span><\/td>\n<td><span style=\"font-weight: 400;\">\u0110\u00e3 \u0111\u1ee9ng y\u00ean t\u1eeb Spark 3.x; ch\u1ec9 s\u1eeda l\u1ed7i<\/span><\/td>\n<td><span style=\"font-weight: 400;\">\u0110\u01b0\u1ee3c \u01b0u ti\u00ean ph\u00e1t tri\u1ec3n; t\u00ednh n\u0103ng m\u1edbi (Lakehouse, incremental ETL)<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Khi n\u00e0o n\u00ean d\u00f9ng?<\/span><\/td>\n<td><span style=\"font-weight: 400;\">H\u1ec7 c\u0169 c\u1ea7n gi\u1eef nguy\u00ean; y\u00eau c\u1ea7u \u0111\u01a1n gi\u1ea3n, \u00edt thay \u0111\u1ed5i<\/span><\/td>\n<td><span style=\"font-weight: 400;\">H\u1ea7u h\u1ebft use\u2011case m\u1edbi: CDC, streaming ETL, real\u2011time analytics<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>\u01afu \u0111i\u1ec3m chung<\/b><span style=\"font-weight: 400;\">: d\u1ec5 t\u00edch h\u1ee3p, x\u1eed l\u00fd d\u1eef li\u1ec7u l\u1edbn hi\u1ec7u qu\u1ea3.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Nh\u01b0\u1ee3c \u0111i\u1ec3m chung<\/b><span style=\"font-weight: 400;\">: \u0111\u1ed9 tr\u1ec5 cao h\u01a1n c\u00e1c gi\u1ea3i ph\u00e1p streaming real-time kh\u00e1c (Flink, Kafka Streams).<\/span><\/li>\n<\/ul>\n<h3><b> Apache Kafka ho\u1ea1t \u0111\u1ed9ng theo m\u00f4 h\u00ecnh pub-sub nh\u01b0 th\u1ebf n\u00e0o? 
Gi\u1ea3i th\u00edch Topic, Partition, Consumer Group m\u1ed9t c\u00e1ch d\u1ec5 hi\u1ec3u.<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">M\u00ecnh c\u00f3 th\u1ec3 \u0111\u01b0a ra m\u1ed9t \u1ea9n d\u1ee5 v\u1ec1 th\u01b0 vi\u1ec7n s\u00e1ch v\u00e0 c\u00e1ch s\u1eafp x\u1ebfp s\u00e1ch \u0111\u1ec3 d\u1ec5 h\u00ecnh dung c\u00e1ch ho\u1ea1t \u0111\u1ed9ng c\u1ee7a pub-sub trong Kafka.<\/span><\/p>\n<p><b>Pub-sub trong Kafka<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><i><span style=\"font-weight: 400;\">Ng\u01b0\u1eddi g\u1eedi (Producer)<\/span><\/i><span style=\"font-weight: 400;\"> \u201c\u0111\u1ea9y\u201d th\u00f4ng \u0111i\u1ec7p v\u00e0o Kafka.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><i><span style=\"font-weight: 400;\">Ng\u01b0\u1eddi nh\u1eadn (Consumer)<\/span><\/i> <b>t\u1ef1 k\u00e9o<\/b><span style=\"font-weight: 400;\"> (pull) th\u00f4ng \u0111i\u1ec7p v\u1ec1, ch\u1ee9 Kafka kh\u00f4ng ch\u1ee7 \u0111\u1ed9ng \u201cnh\u00e9t\u201d d\u1eef li\u1ec7u v\u00e0o tay ng\u01b0\u1eddi nh\u1eadn. C\u00e1ch l\u00e0m n\u00e0y gi\u00fap consumer \u0111\u1ecdc nhanh hay ch\u1eadm t\u00f9y s\u1ee9c m\u00e0 kh\u00f4ng l\u00e0m ngh\u1ebdn h\u1ec7 th\u1ed1ng.<\/span><\/li>\n<\/ul>\n<p><b>Topic \u2013 k\u1ec7 s\u00e1ch<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">H\u00e3y h\u00ecnh dung <\/span><b>Topic<\/b><span style=\"font-weight: 400;\"> nh\u01b0 m\u1ed9t k\u1ec7 s\u00e1ch g\u1eafn nh\u00e3n. 
M\u1ecdi cu\u1ed1n s\u00e1ch (th\u00f4ng \u0111i\u1ec7p) li\u00ean quan \u0111\u1ebfn c\u00f9ng ch\u1ee7 \u0111\u1ec1 \u0111\u1ec1u \u0111\u01b0\u1ee3c x\u1ebfp l\u00ean \u0111\u00fang k\u1ec7 \u0111\u00f3.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">B\u1ea1n c\u00f3 th\u1ec3 c\u00f3 nhi\u1ec1u k\u1ec7 kh\u00e1c nhau: \u201corders\u201d, \u201clogs\u201d, \u201cpayments\u201d\u2026<\/span><\/li>\n<\/ul>\n<p><b>Partition \u2013 ng\u0103n k\u1ec7<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">M\u1ed7i k\u1ec7 (topic) l\u1ea1i \u0111\u01b0\u1ee3c chia th\u00e0nh nhi\u1ec1u <\/span><b>ng\u0103n (partition)<\/b><span style=\"font-weight: 400;\">.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">Nh\u1edd chia ng\u0103n, b\u1ea1n c\u00f3 th\u1ec3 \u0111\u1eb7t c\u00e1c ng\u0103n \u1edf nhi\u1ec1u th\u01b0 vi\u1ec7n (broker) kh\u00e1c nhau \u2192 Kafka m\u1edf r\u1ed9ng ngang h\u00e0ng (scale-out) r\u1ea5t d\u1ec5.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">B\u00ean trong m\u1ed9t partition, s\u00e1ch \u0111\u01b0\u1ee3c s\u1eafp theo th\u1ee9 t\u1ef1 th\u1eddi gian \u2013 \u0111\u1ecdc l\u1ea1i lu\u00f4n ra \u0111\u00fang tr\u00ecnh t\u1ef1 g\u1eedi v\u00e0o.<\/span><\/li>\n<\/ul>\n<p><b>Consumer Group \u2013 nh\u00f3m \u0111\u1ed9c gi\u1ea3<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">M\u1ed9t <\/span><b>Consumer Group<\/b><span style=\"font-weight: 400;\"> gi\u1ed1ng \u0111\u1ed9i \u0111\u1ed9c gi\u1ea3 c\u00f9ng \u0111\u1ecdc chung m\u1ed9t k\u1ec7, nh\u01b0ng chia nhau c\u00e1c ng\u0103n.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">Quy t\u1eafc: m\u1ed7i partition ch\u1ec9 \u0111\u01b0\u1ee3c m\u1ed9t th\u00e0nh vi\u00ean trong nh\u00f3m x\u1eed l\u00fd 
\u2192 kh\u00f4ng b\u1ecb tr\u00f9ng l\u1eb7p.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">N\u1ebfu nh\u00f3m c\u00f3 \u00edt ng\u01b0\u1eddi h\u01a1n s\u1ed1 partition, m\u1ed9t ng\u01b0\u1eddi c\u00f3 th\u1ec3 \u0111\u1ecdc nhi\u1ec1u ng\u0103n; th\u00eam ng\u01b0\u1eddi m\u1edbi, Kafka t\u1ef1 c\u00e2n b\u1eb1ng l\u1ea1i \u0111\u1ec3 t\u1ea3i \u0111\u01b0\u1ee3c chia \u0111\u1ec1u.<\/span><\/li>\n<\/ul>\n<blockquote><p><em>Ph\u1ecfng v\u1ea5n \u0111\u1ed9c quy\u1ec1n v\u1edbi Senior Software Engineer t\u1ea1i Ninja Van: <a href=\"https:\/\/itviec.com\/blog\/kafka-la-gi\/\" target=\"_blank\" rel=\"noopener\"><strong>Kafka l\u00e0 g\u00ec? Nh\u1eefng l\u1ee3i \u00edch tuy\u1ec7t v\u1eddi m\u00e0 Kafka mang l\u1ea1i cho Dev<\/strong><\/a><\/em><\/p><\/blockquote>\n<h3><b> Exactly-Once Semantics trong Kafka \u0111\u01b0\u1ee3c \u0111\u1ea3m b\u1ea3o b\u1eb1ng c\u00e1ch n\u00e0o?<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">Exactly\u2011once semantics l\u00e0 cam k\u1ebft c\u1ee7a h\u1ec7 th\u1ed1ng ph\u00e2n t\u00e1n (v\u00ed d\u1ee5: message queue, stream processing) r\u1eb1ng m\u1ed7i th\u00f4ng \u0111i\u1ec7p hay b\u1ea3n ghi s\u1ebd <\/span><b>\u0111\u01b0\u1ee3c x\u1eed l\u00fd \u0111\u00fang m\u1ed9t v\u00e0 ch\u1ec9 m\u1ed9t l\u1ea7n<\/b><span style=\"font-weight: 400;\"> &#8211; kh\u00f4ng b\u1ecb b\u1ecf s\u00f3t (r\u1ee7i ro c\u1ee7a at\u2011most\u2011once) v\u00e0 c\u0169ng kh\u00f4ng b\u1ecb x\u1eed l\u00fd l\u1eb7p (r\u1ee7i ro c\u1ee7a at\u2011least\u2011once) &#8211; k\u1ec3 c\u1ea3 khi x\u1ea3y ra l\u1ed7i, retry ho\u1eb7c kh\u1edfi \u0111\u1ed9ng l\u1ea1i.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Kafka \u0111\u1ea1t \u0111\u01b0\u1ee3c \u0111i\u1ec1u n\u00e0y b\u1eb1ng c\u00e1ch k\u1ebft h\u1ee3p: idempotent producer (enable.idempotence=true) \u0111\u1ec3 broker lo\u1ea1i b\u1ecf b\u1ea3n ghi tr\u00f9ng khi producer retry; transactional API (transactional.id) \u0111\u1ec3 ghi message v\u00e0 commit offset trong c\u00f9ng m\u1ed9t transaction; v\u00e0 consumer \u0111\u1eb7t isolation.level=read_committed \u0111\u1ec3 ch\u1ec9 \u0111\u1ecdc d\u1eef li\u1ec7u \u0111\u00e3 commit. B\u1ea3o \u0111\u1ea3m n\u00e0y \u00e1p d\u1ee5ng cho lu\u1ed3ng \u0111\u1ecdc\u2011x\u1eed l\u00fd\u2011ghi trong Kafka; khi ghi ra h\u1ec7 th\u1ed1ng b\u00ean ngo\u00e0i, sink c\u1ea7n idempotent ho\u1eb7c h\u1ed7 tr\u1ee3 transaction.<\/span><\/p>\n<h3><b> B\u1ea1n x\u1eed l\u00fd d\u1eef 
li\u1ec7u tr\u1ec5 (late data) hay d\u1eef li\u1ec7u \u0111\u1ebfn kh\u00f4ng \u0111\u00fang th\u1ee9 t\u1ef1 th\u1eddi gian trong ki\u1ebfn tr\u00fac streaming ra sao?<\/b><\/h3>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">T\u00ecm hi\u1ec3u nguy\u00ean nh\u00e2n d\u1eef li\u1ec7u b\u1ecb tr\u1ec5: ch\u1ec9 tr\u1ec5 trong h\u00f4m nay hay m\u1ed7i schedule \u0111\u1ec1u tr\u1ec5, tr\u1ec5 do batch x\u1eed l\u00fd l\u1ea7n n\u00e0y b\u1ecb m\u1eafc k\u1eb9t hay m\u1ed7i schedule \u0111\u1ec1u m\u1eafc k\u1eb9t, t\u1eeb \u0111\u00f3 x\u00e1c \u0111\u1ecbnh b\u1ea3ng ho\u1eb7c logic c\u1ee5 th\u1ec3 d\u1eabn \u0111\u1ebfn vi\u1ec7c \u0111\u00f3.\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">\u00c1p d\u1ee5ng c\u01a1 ch\u1ebf watermarking \u0111\u1ec3 x\u00e1c \u0111\u1ecbnh kho\u1ea3ng th\u1eddi gian d\u1eef li\u1ec7u c\u00f3 th\u1ec3 tr\u1ec5.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">S\u1eed d\u1ee5ng windowing \u0111\u1ec3 nh\u00f3m d\u1eef li\u1ec7u theo kho\u1ea3ng th\u1eddi gian h\u1ee3p l\u00fd.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">K\u1ebft h\u1ee3p c\u00e1c chi\u1ebfn l\u01b0\u1ee3c l\u01b0u tr\u1eef t\u1ea1m th\u1eddi (buffering) d\u1eef li\u1ec7u tr\u1ec5 v\u00e0 x\u1eed l\u00fd khi d\u1eef li\u1ec7u \u0111\u1ebfn \u0111\u1ee7 ho\u1eb7c \u0111\u1ea1t timeout nh\u1ea5t \u0111\u1ecbnh.<\/span><\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Cau_hoi_phong_van_Data_Engineer_ve_Cloud_Workflow_Orchestration\"><\/span><b>C\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Data Engineer v\u1ec1 Cloud &amp; Workflow Orchestration<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Trong th\u1eddi \u0111\u1ea1i cloud-first, vi\u1ec7c tri\u1ec3n khai pipeline tr\u00ean AWS, GCP, Azure kh\u00f4ng c\u00f2n l\u00e0 
\u201cc\u1ed9ng th\u00eam\u201d, m\u00e0 g\u1ea7n nh\u01b0 l\u00e0 y\u00eau c\u1ea7u c\u01a1 b\u1ea3n. Nh\u00e0 tuy\u1ec3n d\u1ee5ng s\u1ebd h\u1ecfi b\u1ea1n c\u00e1ch t\u1ed1i \u01b0u chi ph\u00ed cloud, b\u1ea3o m\u1eadt d\u1eef li\u1ec7u, v\u00e0 t\u1ef1 \u0111\u1ed9ng h\u00f3a workflow b\u1eb1ng c\u00e1c c\u00f4ng c\u1ee5 nh\u01b0 Airflow, Prefect. \u0110\u00e2y c\u0169ng l\u00e0 n\u01a1i th\u1ec3 hi\u1ec7n b\u1ea1n c\u00f3 t\u01b0 duy h\u1ec7 th\u1ed1ng v\u00e0 DevOps mindset kh\u00f4ng.<\/span><\/p>\n<h3><b> B\u1ea1n \u0111\u00e3 tri\u1ec3n khai data pipeline tr\u00ean n\u1ec1n t\u1ea3ng Cloud (AWS, GCP, Azure) ch\u01b0a? Chia s\u1ebb m\u1ed9t v\u00ed d\u1ee5 c\u1ee5 th\u1ec3.<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">D\u01b0\u1edbi \u0111\u00e2y l\u00e0 v\u00ed d\u1ee5 quy tr\u00ecnh tri\u1ec3n khai tr\u00ean t\u1eebng n\u1ec1n t\u1ea3ng, b\u1ea1n h\u00e3y l\u1ef1a ch\u1ecdn d\u1ef1a theo kinh nghi\u1ec7m c\u00e1 nh\u00e2n:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>AWS:<\/b><span style=\"font-weight: 400;\"> D\u1eef li\u1ec7u raw \u0111\u01b0\u1ee3c \u0111\u1ea9y v\u00e0o Amazon S3 \u25ba AWS Lambda t\u1ef1 \u0111\u1ed9ng k\u00edch ho\u1ea1t AWS Glue Job (PySpark) \u0111\u1ec3 l\u00e0m s\u1ea1ch &amp; chu\u1ea9n ho\u00e1 \u25ba Ghi v\u00e0o Amazon Redshift; Glue Catalog gi\u1eef metadata, CloudWatch theo d\u00f5i l\u1ed7i.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>GCP:<\/b><span style=\"font-weight: 400;\"> File CSV \u0111\u1ebfn Cloud Storage \u25ba Cloud Functions t\u1ea1o th\u00f4ng \u0111i\u1ec7p Pub\/Sub \u25ba Cloud Dataflow (Apache Beam) bi\u1ebfn \u0111\u1ed5i &amp; chu\u1ea9n ho\u00e1 \u25ba N\u1ea1p k\u1ebft qu\u1ea3 v\u00e0o BigQuery, d\u00f9ng Cloud Monitoring gi\u00e1m s\u00e1t.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Azure:<\/b><span style=\"font-weight: 400;\"> IoT\/CSV \u0111\u1ed5 v\u00e0o Azure Blob Storage \u25ba Event Grid g\u1ecdi Azure Data Factory 
pipeline, trong \u0111\u00f3 Mapping Data Flows l\u00e0m ETL \u25ba L\u01b0u tr\u1eef ph\u00e2n t\u00edch tr\u00ean Azure Synapse Analytics (SQL Pool); Azure Monitor c\u1ea3nh b\u00e1o l\u1ed7i.<\/span><\/li>\n<\/ul>\n<h3><b> B\u1ea1n l\u00e0m g\u00ec \u0111\u1ec3 \u0111\u1ea3m b\u1ea3o b\u1ea3o m\u1eadt v\u00e0 quy\u1ec1n truy c\u1eadp (IAM) \u0111\u1ed1i v\u1edbi d\u1eef li\u1ec7u b\u1ea3o m\u1eadt cao tr\u00ean Cloud?<\/b><\/h3>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">X\u00e2y d\u1ef1ng ch\u00ednh s\u00e1ch truy c\u1eadp theo nguy\u00ean t\u1eafc \u00edt quy\u1ec1n nh\u1ea5t (principle of least privilege).<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">S\u1eed d\u1ee5ng Multi-factor Authentication (MFA) v\u00e0 ki\u1ec3m so\u00e1t truy c\u1eadp b\u1eb1ng IAM roles.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">M\u00e3 h\u00f3a d\u1eef li\u1ec7u l\u01b0u tr\u1eef (at rest) v\u00e0 d\u1eef li\u1ec7u truy\u1ec1n t\u1ea3i (in transit).<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Th\u01b0\u1eddng xuy\u00ean audit v\u00e0 review quy\u1ec1n truy c\u1eadp \u0111\u1ecbnh k\u1ef3.<\/span><\/li>\n<\/ul>\n<h3><b> B\u1ea1n c\u00f3 gi\u1ea3i ph\u00e1p ho\u1eb7c kinh nghi\u1ec7m g\u00ec trong vi\u1ec7c ki\u1ec3m so\u00e1t chi ph\u00ed x\u1eed l\u00fd d\u1eef li\u1ec7u l\u1edbn tr\u00ean Cloud?<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">\u0110\u1ea7u ti\u00ean b\u1ea1n c\u00f3 th\u1ec3 chia s\u1ebb nh\u1eadn \u0111\u1ecbnh c\u1ee7a m\u00ecnh v\u1ec1 t\u1ea7m quan tr\u1ecdng c\u1ee7a vi\u1ec7c ki\u1ec3m so\u00e1t chi ph\u00ed x\u1eed l\u00fd d\u1eef li\u1ec7u l\u1edbn tr\u00ean Cloud nh\u01b0 sau:\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">D\u1eef li\u1ec7u ph\u00ecnh to nhanh, ch\u1ec9 m\u1ed9t c\u1ee5m Spark qu\u00ean t\u1eaft c\u0169ng 
\u201c\u0111\u1ed1t\u201d ng\u00e2n s\u00e1ch trong v\u00e0i gi\u1edd. Vi\u1ec7c theo d\u00f5i s\u00e1t chi ph\u00ed kh\u00f4ng ch\u1ec9 gi\u00fap d\u1ef1 \u00e1n d\u1eef li\u1ec7u c\u00f3 ROI t\u00edch c\u1ef1c, m\u00e0 c\u00f2n gi\u00fap team Data n\u00f3i chuy\u1ec7n d\u1ec5 d\u00e0ng v\u1edbi ph\u00f2ng T\u00e0i ch\u00ednh.<\/span><\/p>\n<p><b>Sau \u0111\u00f3, h\u00e3y \u0111\u1ec1 xu\u1ea5t c\u00e1ch ki\u1ec3m so\u00e1t chi ph\u00ed:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">\u0110\u1eb7t budget alert + dashboard (AWS Cost Explorer, GCP Billing, Azure Cost Management).<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Right\u2011size &amp; auto\u2011scale; d\u00f9ng Spot\/Preemptible VM cho batch, serverless cho workload dao \u0111\u1ed9ng.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">B\u1eadt lifecycle tiering (S3 IA \u2192 Glacier, GCS Coldline, Azure Archive) v\u00e0 d\u1ecdn \u201czombie\u201d resources \u0111\u1ecbnh k\u1ef3.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Partition\/n\u00e9n d\u1eef li\u1ec7u, t\u1eadn d\u1ee5ng query cache; cam k\u1ebft Reserved\/Committed Use cho d\u1ecbch v\u1ee5 ch\u1ea1y 24\/7.<\/span><\/li>\n<\/ul>\n<h3><b> Airflow DAG (Directed Acyclic Graph) ho\u1ea1t \u0111\u1ed9ng th\u1ebf n\u00e0o? 
\u01afu th\u1ebf c\u1ee7a DAG so v\u1edbi vi\u1ec7c ch\u1ea1y job th\u1ee7 c\u00f4ng l\u00e0 g\u00ec?<\/b><\/h3>\n<p><b>C\u00e1ch ho\u1ea1t \u0111\u1ed9ng:<\/b><span style=\"font-weight: 400;\"> M\u1ed9t DAG (Directed Acyclic Graph) trong Apache Airflow m\u00f4 t\u1ea3 workflow d\u01b0\u1edbi d\u1ea1ng c\u00e1c \u201cn\u00fat\u201d task n\u1ed1i v\u1edbi nhau b\u1eb1ng quan h\u1ec7 ph\u1ee5 thu\u1ed9c c\u00f3 h\u01b0\u1edbng v\u00e0 kh\u00f4ng c\u00f3 chu tr\u00ecnh; Scheduler d\u1ef1a v\u00e0o \u0111\u00f3 \u0111\u1ec3 t\u1ef1 \u0111\u1ed9ng ch\u1ea1y task theo \u0111\u00fang th\u1ee9 t\u1ef1, gi\u00e1m s\u00e1t v\u00e0 retry khi l\u1ed7i.<\/span><\/p>\n<p><b>\u01afu th\u1ebf c\u1ee7a DAG so v\u1edbi vi\u1ec7c ch\u1ea1y job th\u1ee7 c\u00f4ng:<\/b><\/p>\n<table>\n<tbody>\n<tr>\n<td><b>Ti\u00eau ch\u00ed<\/b><\/td>\n<td><b>Airflow DAG<\/b><\/td>\n<td><b>Ch\u1ea1y th\u1ee7 c\u00f4ng (cron\/script)<\/b><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Ph\u1ee5 thu\u1ed9c &amp; th\u1ee9 t\u1ef1<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Khai b\u00e1o graph r\u00f5 r\u00e0ng; \u00e9p ch\u1ea1y \u0111\u00fang th\u1ee9 t\u1ef1<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Logic c\u00e0i tay; d\u1ec5 ch\u1ea1y sai b\u01b0\u1edbc<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">L\u1eadp l\u1ecbch &amp; trigger<\/span><\/td>\n<td><span style=\"font-weight: 400;\">schedule (t\u00ean c\u0169: schedule_interval), sensor s\u1ef1 ki\u1ec7n; s\u1eeda ngay trong code<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Cron c\u1ed1 \u0111\u1ecbnh; th\u00eam trigger ph\u1ea3i t\u1ef1 vi\u1ebft<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Gi\u00e1m s\u00e1t<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Web UI, tr\u1ea1ng th\u00e1i m\u00e0u, log t\u1eadp trung<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Grep log t\u1eebng m\u00e1y; kh\u00f4ng c\u00f3 dashboard<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Retry &amp; kh\u00f4i ph\u1ee5c<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Retry t\u1ef1 \u0111\u1ed9ng t\u1eebng task; backfill ch\u1ecdn 
l\u1ecdc<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Th\u01b0\u1eddng rerun to\u00e0n job, d\u1ec5 tr\u00f9ng d\u1eef li\u1ec7u<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">M\u1edf r\u1ed9ng<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Worker pool, scale h\u00e0ng tr\u0103m task<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Parallel kh\u00f3, d\u1ec5 tranh ch\u1ea5p t\u00e0i nguy\u00ean<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3><b>N\u1ebfu m\u1ed9t task trong Airflow th\u1ea5t b\u1ea1i, chi\u1ebfn l\u01b0\u1ee3c retry v\u00e0 alert c\u1ee7a b\u1ea1n th\u01b0\u1eddng th\u1ebf n\u00e0o?<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">B\u1ea1n c\u00f3 th\u1ec3 m\u1edf \u0111\u1ea7u b\u1eb1ng nguy\u00ean nh\u00e2n ph\u1ed5 bi\u1ebfn khi\u1ebfn task Airflow th\u1ea5t b\u1ea1i:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Network\/API timeout ho\u1eb7c v\u01b0\u1ee3t quota.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Worker thi\u1ebfu RAM\/disk.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Credential ho\u1eb7c IAM h\u1ebft h\u1ea1n\/thi\u1ebfu quy\u1ec1n.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">D\u1eef li\u1ec7u ho\u1eb7c schema \u0111\u1ed5i b\u1ea5t ng\u1edd.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Bug trong code, version library l\u1ec7ch.<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">Sau \u0111\u00f3, tr\u1ea3 l\u1eddi chi\u1ebfn l\u01b0\u1ee3c retry &amp; alert:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">C\u1ea5u h\u00ecnh <\/span><span style=\"font-weight: 400;\">retries=3<\/span><span style=\"font-weight: 400;\"> v\u1edbi <\/span><span style=\"font-weight: 
retry_delay=10 \u2192 20">
400;\">retry_delay=timedelta(minutes=10)<\/span><span style=\"font-weight: 400;\"> v\u00e0 <\/span><span style=\"font-weight: 400;\">retry_exponential_backoff=True<\/span><span style=\"font-weight: 400;\">, \u0111\u1ec3 kho\u1ea3ng ch\u1edd t\u0103ng d\u1ea7n sau m\u1ed7i l\u1ea7n retry (exponential back\u2011off).<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">D\u00f9ng <\/span><span style=\"font-weight: 400;\">on_retry_callback<\/span><span style=\"font-weight: 400;\">: g\u1eedi Slack + email ngay l\u1ea7n l\u1ed7i \u0111\u1ea7u; <\/span><span style=\"font-weight: 400;\">on_failure_callback<\/span><span style=\"font-weight: 400;\"> (ch\u1ea1y khi \u0111\u00e3 h\u1ebft retry m\u00e0 v\u1eabn l\u1ed7i): escalate t\u1edbi DevOps\/PM.<\/span><\/li>\n<\/ul>\n<h3><b> Kh\u00e1c bi\u1ec7t ch\u00ednh gi\u1eefa Airflow v\u1edbi Prefect hay Luigi l\u00e0 g\u00ec?<\/b><\/h3>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Airflow, Prefect v\u00e0 Luigi \u0111\u1ec1u l\u00e0 workflow orchestrator m\u00e3 ngu\u1ed3n m\u1edf, gi\u00fap khai b\u00e1o, l\u1eadp l\u1ecbch v\u00e0 gi\u00e1m s\u00e1t c\u00e1c pipeline ETL\/batch. Ch\u00fang di\u1ec5n t\u1ea3 quy tr\u00ecnh d\u01b0\u1edbi d\u1ea1ng \u0111\u1ed3 th\u1ecb task-ph\u1ee5 thu\u1ed9c, h\u1ed7 tr\u1ee3 retry, alert v\u00e0 logging, qua \u0111\u00f3 gi\u1ea3m thao t\u00e1c th\u1ee7 c\u00f4ng.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">M\u1ed9t s\u1ed1 \u0111i\u1ec3m kh\u00e1c bi\u1ec7t ch\u00ednh:<\/span><\/p>\n<table>\n<tbody>\n<tr>\n<td><b>Ti\u00eau ch\u00ed<\/b><\/td>\n<td><b>Airflow<\/b><\/td>\n<td><b>Prefect<\/b><\/td>\n<td><b>Luigi<\/b><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Giao di\u1ec7n<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Web UI m\u1ea1nh m\u1ebd, ph\u1ed5 bi\u1ebfn<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Web UI hi\u1ec7n \u0111\u1ea1i, d\u1ec5 d\u00f9ng<\/span><\/td>\n<td><span style=\"font-weight: 400;\">UI \u0111\u01a1n gi\u1ea3n h\u01a1n<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Ki\u1ebfn tr\u00fac<\/span><\/td>\n<td><span style=\"font-weight: 400;\">DAG-based, scheduler trung t\u00e2m<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Task-based, 
distributed d\u1ec5 h\u01a1n<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Task-based \u0111\u01a1n gi\u1ea3n<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">M\u1ee9c \u0111\u1ed9 linh ho\u1ea1t<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Cao, h\u1ed7 tr\u1ee3 nhi\u1ec1u operator<\/span><\/td>\n<td><span style=\"font-weight: 400;\">R\u1ea5t cao, h\u1ed7 tr\u1ee3 state management t\u1ed1t<\/span><\/td>\n<td><span style=\"font-weight: 400;\">\u00cdt h\u01a1n, \u0111\u01a1n gi\u1ea3n h\u01a1n<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">\u0110\u1ed9 ph\u1ed5 bi\u1ebfn<\/span><\/td>\n<td><span style=\"font-weight: 400;\">R\u1ed9ng r\u00e3i, nhi\u1ec1u c\u1ed9ng \u0111\u1ed3ng<\/span><\/td>\n<td><span style=\"font-weight: 400;\">\u0110ang t\u0103ng tr\u01b0\u1edfng nhanh<\/span><\/td>\n<td><span style=\"font-weight: 400;\">\u00cdt ph\u1ed5 bi\u1ebfn h\u01a1n<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><span class=\"ez-toc-section\" id=\"Cau_hoi_phong_van_Data_Engineer_ve_Lap_trinh_Data_Quality\"><\/span><b>C\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Data Engineer v\u1ec1 L\u1eadp tr\u00ecnh &amp; Data Quality<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">M\u1ed9t Data Engineer gi\u1ecfi kh\u00f4ng th\u1ec3 thi\u1ebfu k\u1ef9 n\u0103ng l\u1eadp tr\u00ecnh, \u0111\u1eb7c bi\u1ec7t l\u00e0 x\u1eed l\u00fd d\u1eef li\u1ec7u b\u1eb1ng Python, Scala, ho\u1eb7c Java. B\u1ea1n c\u1ea7n ch\u1ee9ng minh kh\u1ea3 n\u0103ng vi\u1ebft m\u00e3 s\u1ea1ch, t\u1ed1i \u01b0u hi\u1ec7u n\u0103ng v\u00e0 ki\u1ec3m so\u00e1t ch\u1ea5t l\u01b0\u1ee3ng d\u1eef li\u1ec7u t\u1eeb \u0111\u1ea7u v\u00e0o \u0111\u1ebfn \u0111\u1ea7u ra. 
C\u00e1c c\u00e2u h\u1ecfi \u1edf \u0111\u00e2y gi\u00fap b\u1ea1n th\u1ec3 hi\u1ec7n kh\u1ea3 n\u0103ng coding th\u1ef1c chi\u1ebfn, c\u0169ng nh\u01b0 hi\u1ec3u bi\u1ebft v\u1ec1 ki\u1ec3m th\u1eed v\u00e0 b\u1ea3o tr\u00ec.<\/span><\/p>\n<h3><b> B\u1ea1n th\u01b0\u1eddng x\u1eed l\u00fd d\u1eef li\u1ec7u b\u1eb1ng Python (pandas, PySpark) hay Scala\/Java? L\u00fd do l\u1ef1a ch\u1ecdn?<\/b><\/h3>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\"><a href=\"https:\/\/itviec.com\/blog\/python-la-gi\/\" target=\"_blank\" rel=\"noopener\"><strong>Python<\/strong><\/a> (pandas, PySpark): D\u1ec5 d\u00f9ng, c\u1ed9ng \u0111\u1ed3ng h\u1ed7 tr\u1ee3 l\u1edbn, th\u01b0 vi\u1ec7n phong ph\u00fa, ph\u00f9 h\u1ee3p v\u1edbi c\u00e1c t\u00e1c v\u1ee5 ph\u00e2n t\u00edch nhanh.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Scala\/<a href=\"https:\/\/itviec.com\/blog\/java-la-gi\/\" target=\"_blank\" rel=\"noopener\"><strong>Java<\/strong><\/a>: Hi\u1ec7u n\u0103ng cao h\u01a1n, m\u1ea1nh m\u1ebd trong x\u1eed l\u00fd d\u1eef li\u1ec7u l\u1edbn, ph\u00f9 h\u1ee3p v\u1edbi c\u00e1c pipeline d\u1eef li\u1ec7u l\u1edbn v\u00e0 ph\u1ee9c t\u1ea1p.<\/span><\/li>\n<\/ul>\n<h3><b> Khi d\u00f9ng PySpark, b\u1ea1n ch\u00fa \u00fd g\u00ec v\u1ec1 lazy evaluation v\u00e0 c\u00e1ch gi\u1ea3m shuffle \u0111\u1ec3 c\u1ea3i thi\u1ec7n hi\u1ec7u n\u0103ng?<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">Lazy evaluation c\u00f3 l\u1ee3i \u00edch: Spark ch\u1ec9 ch\u1ea1y khi g\u1eb7p action, n\u00ean c\u00f3 th\u1eddi gian t\u1ed1i \u01b0u to\u00e0n b\u1ed9 DAG. 
Nh\u01b0ng h\u1ea1n ch\u1ebf l\u00e0:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Debug kh\u00f3: l\u1ed7i xu\u1ea5t hi\u1ec7n mu\u1ed9n \u1edf b\u01b0\u1edbc action, kh\u00f3 khoanh v\u00f9ng.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Job d\u1ed3n c\u1ee5c: nhi\u1ec1u transformation t\u00edch l\u0169y \u2192 action cu\u1ed1i \u0111\u00f2i RAM cao, d\u1ec5 OOM.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">C\u1ea7n gi\u1ea3m shuffle v\u00ec shuffle di chuy\u1ec3n d\u1eef li\u1ec7u qua m\u1ea1ng + ghi \u0111\u0129a, t\u1ea1o stage m\u1edbi \u2192 ch\u1eadm v\u00e0 t\u1ed1n t\u00e0i nguy\u00ean.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">H\u1ec7 qu\u1ea3 n\u1ebfu l\u1ea1m d\u1ee5ng:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Network I\/O cao, <\/span><i><span style=\"font-weight: 400;\">spill<\/span><\/i><span style=\"font-weight: 400;\"> sang disk, k\u00e9o d\u00e0i GC.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Data skew l\u00e0m v\u00e0i executor ch\u1ea1y l\u00e2u, to\u00e0n job ch\u1edd, d\u1ec5 timeout.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Chi\u1ebfn l\u01b0\u1ee3c t\u1ed1i \u01b0u hi\u1ec7u n\u0103ng c\u00f3 th\u1ec3 l\u00e0:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Gi\u1eef nguy\u00ean partition key; <\/span><span style=\"font-weight: 400;\">broadcast join<\/span><span style=\"font-weight: 400;\"> cho b\u1ea3ng nh\u1ecf.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Tr\u00e1nh <\/span><span style=\"font-weight: 400;\">groupBy<\/span><span style=\"font-weight: 400;\">, <\/span><span style=\"font-weight: 400;\">distinct<\/span><span style=\"font-weight: 400;\">, 
<\/span><span style=\"font-weight: 400;\">join<\/span><span style=\"font-weight: 400;\"> kh\u00f4ng c\u1ea7n thi\u1ebft; \u01b0u ti\u00ean <\/span><span style=\"font-weight: 400;\">mapPartitions<\/span><span style=\"font-weight: 400;\">\/<\/span><span style=\"font-weight: 400;\">reduceByKey<\/span><span style=\"font-weight: 400;\">.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">D\u00f9ng <\/span><span style=\"font-weight: 400;\">repartition()<\/span><span style=\"font-weight: 400;\"> c\u00f3 ch\u1ee7 \u0111\u00edch (\u0111\u1ea7u job), <\/span><span style=\"font-weight: 400;\">coalesce()<\/span><span style=\"font-weight: 400;\"> thu g\u1ecdn (cu\u1ed1i job).<\/span><\/li>\n<\/ul>\n<h3><b> \u1ede Scala\/Java, b\u1ea1n \u0111\u00e3 t\u1eebng ph\u1ea3i x\u1eed l\u00fd v\u1ea5n \u0111\u1ec1 GC (Garbage Collection) khi x\u1eed l\u00fd data l\u1edbn ch\u01b0a?<\/b><\/h3>\n<p><b>Garbage Collection (GC)<\/b><span style=\"font-weight: 400;\"> l\u00e0 c\u01a1 ch\u1ebf JVM t\u1ef1 \u0111\u1ed9ng thu h\u1ed3i v\u00f9ng nh\u1edb c\u1ee7a c\u00e1c \u0111\u1ed1i t\u01b0\u1ee3ng kh\u00f4ng c\u00f2n \u0111\u01b0\u1ee3c tham chi\u1ebfu.<\/span><\/p>\n<p><b>GC l\u00e0 v\u1ea5n \u0111\u1ec1 v\u1edbi kh\u1ed1i l\u01b0\u1ee3ng d\u1eef li\u1ec7u l\u1edbn v\u00ec:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><i><span style=\"font-weight: 400;\">Stop\u2011the\u2011world pauses<\/span><\/i><span style=\"font-weight: 400;\">: GC t\u1ea1m d\u1eebng to\u00e0n b\u1ed9 threads, g\u00e2y \u0111\u1ed9 tr\u1ec5, ch\u1eadm stage Spark\/Hadoop.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">N\u1ebfu heap c\u1ea5u h\u00ecnh sai \u2192 Full GC d\u00e0i, Out\u2011Of\u2011Memory, task retry \u2192 t\u1ed1n th\u1eddi gian &amp; chi ph\u00ed.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Pauses l\u00e0m m\u1ea5t k\u1ebft n\u1ed1i 
driver\u2013executor, job fail, d\u1eef li\u1ec7u ph\u1ea3i x\u1eed l\u00fd l\u1ea1i.<\/span><\/li>\n<\/ul>\n<p><b>Chi\u1ebfn l\u01b0\u1ee3c x\u1eed l\u00fd GC khi ch\u1ea1y data pipeline Scala\/Java:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>T\u1ed1i \u01b0u JVM<\/b><span style=\"font-weight: 400;\">: <\/span><span style=\"font-weight: 400;\">-Xms\/-Xmx<\/span><span style=\"font-weight: 400;\"> ph\u00f9 h\u1ee3p, ch\u1ecdn GC G1\/ZGC, b\u1eadt <\/span><span style=\"font-weight: 400;\">-XX:+UseStringDeduplication<\/span><span style=\"font-weight: 400;\">.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Theo d\u00f5i GC log<\/b><span style=\"font-weight: 400;\">: b\u1eadt <\/span><span style=\"font-weight: 400;\">-Xlog:gc*<\/span><span style=\"font-weight: 400;\"> (JDK &gt;= 9) \u2192 ph\u00e2n t\u00edch b\u1eb1ng GCViewer; ch\u1ec9nh <\/span><span style=\"font-weight: 400;\">NewRatio<\/span><span style=\"font-weight: 400;\">, <\/span><span style=\"font-weight: 400;\">G1HeapRegionSize<\/span><span style=\"font-weight: 400;\"> d\u1ef1a tr\u00ean pattern.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Gi\u1ea3m \u0111\u1ed1i t\u01b0\u1ee3ng t\u1ea1o m\u1edbi<\/b><span style=\"font-weight: 400;\">: d\u00f9ng primitive arrays, <\/span><span style=\"font-weight: 400;\">mapPartitions<\/span><span style=\"font-weight: 400;\"> thay v\u00ec <\/span><span style=\"font-weight: 400;\">map<\/span><span style=\"font-weight: 400;\">, cache ch\u1ecdn l\u1ecdc, t\u00e1i s\u1eed d\u1ee5ng buffer.<\/span><\/li>\n<\/ul>\n<h3><b> B\u1ea1n \u0111\u1ea3m b\u1ea3o ch\u1ea5t l\u01b0\u1ee3ng d\u1eef li\u1ec7u (Data Quality) nh\u01b0 th\u1ebf n\u00e0o xuy\u00ean su\u1ed1t pipeline?<\/b><\/h3>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">S\u1eed d\u1ee5ng validation rule, schema enforcement t\u1ea1i b\u01b0\u1edbc nh\u1eadp li\u1ec7u.<\/span><\/li>\n<li style=\"font-weight: 400;\" 
aria-level=\"1\"><span style=\"font-weight: 400;\">Check data integrity and consistency across the intermediate processing steps.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Audit the data periodically to catch problems early.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Automate alerts for when data falls short of quality standards.<\/span><\/li>\n<\/ul>\n<h3><b> Do you use any data validation tools, or do you write your own scripts?<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">You can talk about two data validation tools, Great Expectations and Soda:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Great Expectations: powerful validation rules, easy to integrate into a pipeline.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Soda: user-friendly, with a good interface, suitable for less technical users.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">If you write your own scripts, describe the use case: when you need flexibility and deeper customization for a project's particular requirements.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Cau_hoi_phong_van_Data_Engineer_ve_Kinh_nghiem_thuc_chien_Ky_nang_coding\"><\/span><b>Data Engineer interview questions on 
Hands-on Experience &amp; Coding Skills<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Strong theory alone is not enough; this is where you recount your real-world experience: the projects you have worked on, how you solved problems, and the lessons you learned. Recruiters value concrete situations highly because they reflect your Data Engineering ability and thinking most faithfully. It is also the part where you can score the most points if you prepare well.<\/span><\/p>\n<blockquote><p><i><span style=\"font-weight: 400;\">Note: In this section I share one example of a real project for reference, but it is still best to build a project with complete code yourself so you can answer easily when the recruiter digs deeper.<\/span><\/i><\/p><\/blockquote>\n<h3><b> Can you share a typical project in which you built a pipeline from ingest \u2192 transform \u2192 load \u2192 reporting?<\/b><\/h3>\n<p><b>Suggested answer:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Start by describing the pipeline's inputs: where does the data come 
from (APIs, logs, batch files&#8230;)?<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Spell out your experience at each stage:<\/span>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">Which technologies did you use to ingest data? (Kafka, Flink, API Gateway&#8230;)<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">How did you transform the data? Did you clean it, enrich it, or join multiple sources?<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">Where was the final data loaded (Data Warehouse, Data Lake)?<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Finally, how were the dashboards\/reports built? 
Which tools did you use?<\/span><\/li>\n<\/ul>\n<p><b>A good answer should demonstrate:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">The ability to present an end-to-end pipeline clearly and logically.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Understanding of the technologies and the reasons for choosing them.<\/span><\/li>\n<\/ul>\n<p><b>Reference project example:<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Real-Time E-commerce Funnel: a system that collects click logs and orders in real time, cleans and enriches them with Spark Structured Streaming, writes to Delta Lake, and then visualizes funnel &amp; cohort right away in Tableau.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Source<\/b><span style=\"font-weight: 400;\">: click\u2011stream logs (Web &amp; App) + an orders REST API.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Ingest<\/b><span style=\"font-weight: 400;\">: Kafka Connect (Debezium for MySQL, an HTTP source connector for the API), partitioned by <\/span><span style=\"font-weight: 400;\">event_date<\/span><span style=\"font-weight: 400;\">.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Transform<\/b><span style=\"font-weight: 400;\">: Spark Structured Streaming on Databricks: normalize timestamps to UTC, deduplicate, enrich with user profiles, join click \u2194 order.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Load<\/b><span style=\"font-weight: 400;\">: write to Delta Lake on S3 \u2192 Snowflake via 
Snowpipe.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Report<\/b><span style=\"font-weight: 400;\">: Tableau live\u2011queries Snowflake to display funnel &amp; cohort.<\/span><\/li>\n<\/ul>\n<p><b>Why these technologies:<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Kafka + Kafka Connect, Spark Structured Streaming, Delta Lake, Snowflake \/ Snowpipe, and Tableau were chosen because each has an OSS edition or a free-trial plan: you can spin up Docker on a laptop with 8 GB of RAM to ingest logs, process them with Spark in local mode, store Delta files on disk, then use the Snowflake free tier (5 GB) as the warehouse and Tableau Public for visualization.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As a result, newcomers and budget-constrained projects can still build a sample pipeline for a portfolio that is easy to learn and extend, without heavy hardware costs.<\/span><\/p>\n<h3><b> Give an example of a time you successfully optimized a pipeline (reduced cost or increased processing speed). 
What did you do?<\/b><\/h3>\n<p><b>Suggested answer:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Describe the state of the pipeline before optimization.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Describe the specific changes you made (batch \u2192 streaming, caching, indexing, parallelism&#8230;)<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Measured improvement: by what percentage did latency drop, and how much did infrastructure cost fall?<\/span><\/li>\n<\/ul>\n<p><b>A good answer should demonstrate:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">A mindset of optimizing for both time and cost.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">The ability to use metrics to prove that the optimization worked.<\/span><\/li>\n<\/ul>\n<p><b>Reference project example:<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Cutting Cost with Batch \u2192 Streaming: move a pipeline that ran once every 6 hours to a streaming flow that writes Parquet, with a Redis cache, bringing latency down to 35 minutes and cutting EMR cost by 52%.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Before<\/b><span style=\"font-weight: 400;\">: Spark batch every 6 h, stored as CSV; latency ~4 h, 12 EC2 r5.4xlarge instances.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Change<\/b><span 
style=\"font-weight: 400;\">: switch to streaming, change CSV \u2192 Parquet, enable auto\u2011scaling, cache small dimension tables in Redis.<\/span><\/li>\n<\/ul>\n<h3><b> Have you ever faced data with many missing or incorrect values? How did you detect and handle it?<\/b><\/h3>\n<p><b>Suggested answer:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Describe the situation concretely (wrong data formats, unusual nulls, duplicates&#8230;)<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">How did you detect the problem (alerts, tests, logs)?<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">How you handled it: cleaning rules, recovery techniques, contacting other teams&#8230;<\/span><\/li>\n<\/ul>\n<p><b>A good answer should demonstrate:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Skill in detecting data errors automatically or with tooling.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">A process for fixing the issue and improving the pipeline so the error does not recur.<\/span><\/li>\n<\/ul>\n<p><b>Reference project example: <\/b><span style=\"font-weight: 400;\">Restoring the price column: detect a spike of nulls in the <\/span><span style=\"font-weight: 400;\">price<\/span><span style=\"font-weight: 400;\"> field with Great Expectations, 
patch the API, backfill three days from Kafka, and add schema tests to prevent a recurrence.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Incident<\/b><span style=\"font-weight: 400;\">: the <\/span><span style=\"font-weight: 400;\">price<\/span><span style=\"font-weight: 400;\"> column was occasionally null because the upstream service omitted the field when its value was 0.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Detection<\/b><span style=\"font-weight: 400;\">: a Great Expectations rule \u201cnon\u2011null &gt; 99.5%\u201d sent a Slack alert.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Fix<\/b><span style=\"font-weight: 400;\">: update the API to return 0, backfill 3 days from raw Kafka, add a JSON Schema contract &amp; unit tests.<\/span><\/li>\n<\/ul>\n<h3><b> With SQL, which area do you find hardest (window functions, CTE, pivot\u2026)? 
How do you overcome it?<\/b><\/h3>\n<p><b>Suggested answer:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Identify the SQL technique you have struggled with.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Pick and share a specific example that you solved successfully.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Where did you learn it, and how did you apply it to get past the problem?<\/span><\/li>\n<\/ul>\n<p><b>A good answer should demonstrate:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Critical thinking and proactive learning.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Knowing how to apply advanced SQL in the right context.<\/span><\/li>\n<\/ul>\n<p><b>Sample interview answers; adapt them to your own experience:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">My difficulty was window functions, which I struggled with for a long time. When I needed the latest record for each user, I switched from a sub-query to <\/span><span style=\"font-weight: 400;\">row_number()<\/span><span style=\"font-weight: 400;\"> in a CTE, cutting query time from 12 s to 3 s. 
The lesson: always try it on small data first, benchmark, then write down notes.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">I have not run into many difficulties yet, but to get good at window functions I spend half an hour each evening on an Advanced SQL course, practice <\/span><span style=\"font-weight: 400;\">lag()<\/span><span style=\"font-weight: 400;\"> and <\/span><span style=\"font-weight: 400;\">rank()<\/span><span style=\"font-weight: 400;\"> on old reports, and then ask a senior to review. My goal is to ship a window query to production within two months.<\/span><\/li>\n<\/ul>\n<h3><b> Do you practice algorithm\/data-structure problems (HackerRank, LeetCode) for Data Engineer roles? What kinds of problems have you met?<\/b><\/h3>\n<p><b>Suggested answer:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Name the platforms and how many problems you have solved.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Common topics: arrays, strings, heaps, hashmaps, two pointers&#8230;<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">What is your practice strategy? 
By topic, by difficulty?<\/span><\/li>\n<\/ul>\n<p><b>A good answer should demonstrate:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">A serious investment in sharpening your coding skills.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">A clear understanding of what technical test rounds are for.<\/span><\/li>\n<\/ul>\n<p><b>Sample interview answers; adapt them to your own experience:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">If you have a lot of experience: I have solved about 250 LeetCode problems, focusing on heaps, hashmaps, and sliding window; I practiced by topic for three weeks and did 90-minute mock tests in the final week. I finished the coding round at company X in 35 minutes with full marks.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">If you have not done much: I have solved 30 LeetCode problems, prioritizing two pointers and heaps because they are close to Data Engineer tests. 
I aim for 10 problems a week and write short blog posts about my solutions to build fluency.<\/span><\/li>\n<\/ul>\n<h3><b> Do you have a personal (portfolio) project on ETL, streaming, or a Machine Learning pipeline that you can share?<\/b><\/h3>\n<p><b>Suggested answer:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Briefly describe the problem, the data, and the tools used.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">What steps does the pipeline include? Is processing batch or streaming?<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">A GitHub link or demo, if available.<\/span><\/li>\n<\/ul>\n<p><b>A good answer should demonstrate:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Proactive learning through real projects.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">The ability to present your own work clearly and confidently.<\/span><\/li>\n<\/ul>\n<p><b>A practical example:<\/b><\/p>\n<p><span style=\"font-weight: 400;\">In this project, my goal was to give traders a \u201cmarket sentiment temperature\u201d indicator (bullish or bearish) as close to real time as possible, to support their trading decisions.<\/span><\/p>\n<p><span 
style=\"font-weight: 400;\">I built a near-real-time crypto sentiment system for traders. Tweets matching 50 keywords are pulled from the <\/span><b>Twitter Streaming API<\/b><span style=\"font-weight: 400;\">, trade prices come in through <\/span><b>Binance WebSocket<\/b><span style=\"font-weight: 400;\">, and both are pushed into <\/span><b>Kafka<\/b><span style=\"font-weight: 400;\"> (6 partitions) to absorb bursts. <\/span><b>Flink SQL<\/b><span style=\"font-weight: 400;\"> cleans the tweets, calls <\/span><b>BERT\u2011sentiment<\/b><span style=\"font-weight: 400;\"> in a UDF, then joins them with the price stream over a 30 s window; the streamed results are written to <\/span><b>ClickHouse<\/b><span style=\"font-weight: 400;\"> (partitioned by minute, 180-day TTL). At the end of each day, Airflow exports Parquet to S3 and runs <\/span><b>dbt<\/b><span style=\"font-weight: 400;\"> for offline analysis. <\/span><b>Grafana<\/b><span style=\"font-weight: 400;\"> reads ClickHouse to show a sentiment heat\u2011map overlaid on price and fires a Telegram alert when FUD &gt; 0.7; <\/span><b>Tableau<\/b><span style=\"font-weight: 400;\"> uses the batch layer to plot the 7-day correlation.<\/span><\/p>\n<h3><b> Have you worked with version control for data? 
Which tools do you use to manage schema\/data versioning?<\/b><\/h3>\n<p><b>Suggested answer:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Have you used Delta Lake, Apache Hudi, or similar tools?<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">How do you manage schema versions when changes occur?<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">How do you ensure backward\/forward compatibility?<\/span><\/li>\n<\/ul>\n<p><b>A good answer should demonstrate:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Knowing how to apply modern tools to manage data changes.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">A clear understanding of why versioning matters in production.<\/span><\/li>\n<\/ul>\n<p><b>Sample interview answers; adapt them to your own experience:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">If you have done this before: In a marketing-analytics project, I used Delta Lake + dbt. 
Every schema change goes through a PR, and CI checks for breaking changes; thanks to time travel, I once rolled a fact table back to its 01:00 12-04-2025 version in just two minutes.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">If you have not done much: I only manage migration scripts in Git. I am setting up an Iceberg lab on Docker to try out snapshots; next quarter I plan to propose Delta Lake for the event tables, starting in staging before moving to production.<\/span><\/li>\n<\/ul>\n<h3><b> Do you have experience with CI\/CD for data pipelines? What does the process you built look like?<\/b><\/h3>\n<p><b>Suggested answer:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Describe the tools you use (GitHub Actions, GitLab CI, dbt Cloud&#8230;)<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">What steps does the build\/test\/deploy pipeline have?<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">How do you validate data and test code?<\/span><\/li>\n<\/ul>\n<p><b>A good answer should demonstrate:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">A DevOps mindset for data systems.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Knowing how to ensure quality when releasing a new pipeline.<\/span><\/li>\n<\/ul>\n<h3><b> When writing scripts 
for data processing, what do you do to keep them scalable and easy to maintain later?<\/b><\/h3>\n<p><b>Suggested answer:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Do you split the code into modules, separate the logic, and write reusable functions?<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Do you include logging and error handling?<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Do you write accompanying documentation or a README?<\/span><\/li>\n<\/ul>\n<p><b>A good answer should demonstrate:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Adherence to clean code and best practices.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">A focus on work that others can easily take over and extend.<\/span><\/li>\n<\/ul>\n<blockquote><p><em>Read more: <a href=\"https:\/\/itviec.com\/blog\/ci-cd-la-gi\/\" target=\"_blank\" rel=\"noopener\"><strong>CI\/CD l\u00e0 g\u00ec? 
L\u1ee3i \u00edch v\u00e0 c\u00e1c nguy\u00ean t\u1eafc tri\u1ec3n khai CI\/CD v\u00e0o quy tr\u00ecnh ph\u00e1t tri\u1ec3n ph\u1ea7n m\u1ec1m<\/strong><\/a><\/em><\/p><\/blockquote>\n<h2><span class=\"ez-toc-section\" id=\"Tong_ket\"><\/span><b>Summary<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Preparing for a Data Engineer interview is not just about memorizing knowledge or sample answers; it is a journey of building systems thinking, logic, and the ability to respond quickly in real situations. From basic SQL and data structures to Big Data architecture, Cloud, and complex ETL processes, every aspect can be a piece that sets you apart in the interview.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The most important point: <\/span><b>you do not need to know everything, but you must truly understand what you have done and learned<\/b><span style=\"font-weight: 400;\">. Take the time to build a high-quality portfolio, sharpen your problem-solving skills, and keep up with new technologies. 
Whether you are a fresh graduate or already experienced, a serious attitude toward learning and persistence will always lead you to the right opportunity.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We hope you find plenty of value in this article and soon land your dream job in Data Engineering. Stay confident and positive, and remember: <\/span><b>every great engineer once started with their first lines of data.<\/b><\/p>\n<blockquote><p><em>Read more: <a href=\"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-analyst\/\" target=\"_blank\" rel=\"noopener\"><strong>Top 30+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Data Analyst th\u01b0\u1eddng g\u1eb7p<\/strong><\/a><\/em><\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Theo b\u00e1o c\u00e1o c\u1ee7a Dice 2025, nhu c\u1ea7u tuy\u1ec3n d\u1ee5ng Data Engineer t\u1ea1i \u0110\u00f4ng Nam \u00c1 t\u0103ng h\u01a1n 40\u202f% m\u1ed7i n\u0103m, v\u01b0\u1ee3t xa Data Analyst v\u00e0 ti\u1ec7m c\u1eadn Software Engineer. 
Doanh nghi\u1ec7p hi\u1ec3u r\u1eb1ng m\u00f4 h\u00ecnh AI\/BI d\u00f9 \u0111\u1eaft ti\u1ec1n c\u0169ng s\u1ebd v\u00f4 ngh\u0129a n\u1ebfu n\u1ec1n m\u00f3ng d\u1eef li\u1ec7u b\u1ea9n, ph\u00e2n m\u1ea3nh v\u00e0 kh\u00f3 m\u1edf [&hellip;]<\/p>\n","protected":false},"author":222,"featured_media":86908,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_gspb_post_css":"","footnotes":""},"categories":[109,105,94],"tags":[],"class_list":["post-86905","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-chuyen-mon-it","category-phong-van-it","category-su-nghiep-it"],"blocksy_meta":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.8 (Yoast SEO v27.8) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Top 40+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Data Engineer ph\u1ed5 bi\u1ebfn - ITviec Blog<\/title>\n<meta name=\"description\" content=\"T\u1ed5ng h\u1ee3p 6 nh\u00f3m c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Data Engineer th\u01b0\u1eddng g\u1eb7p theo \u0111a d\u1ea1ng ch\u1ee7 \u0111\u1ec1 k\u00e8m tips tr\u1ea3 l\u1eddi, gi\u00fap b\u1ea1n t\u00f9y bi\u1ebfn theo kinh nghi\u1ec7m.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/\" \/>\n<meta property=\"og:locale\" content=\"vi_VN\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Top 40+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Data Engineer ph\u1ed5 bi\u1ebfn\" \/>\n<meta property=\"og:description\" content=\"Theo b\u00e1o c\u00e1o c\u1ee7a Dice 2025, nhu c\u1ea7u tuy\u1ec3n d\u1ee5ng Data Engineer t\u1ea1i \u0110\u00f4ng Nam \u00c1 t\u0103ng h\u01a1n 40\u202f% m\u1ed7i n\u0103m, v\u01b0\u1ee3t xa Data Analyst v\u00e0 ti\u1ec7m 
c\u1eadn Software Engineer.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/\" \/>\n<meta property=\"og:site_name\" content=\"ITviec Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/ITviec\" \/>\n<meta property=\"article:published_time\" content=\"2025-05-11T14:57:00+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2025\/05\/cau-hoi-phong-van-data-engineer-vippro-scaled.png\" \/>\n\t<meta property=\"og:image:width\" content=\"640\" \/>\n\t<meta property=\"og:image:height\" content=\"337\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Nguy\u1ec5n H\u1eefu V\u0103n\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@ITviec\" \/>\n<meta name=\"twitter:site\" content=\"@ITviec\" \/>\n<meta name=\"twitter:label1\" content=\"\u0110\u01b0\u1ee3c vi\u1ebft b\u1edfi\" \/>\n\t<meta name=\"twitter:data1\" content=\"Nguy\u1ec5n H\u1eefu V\u0103n\" \/>\n\t<meta name=\"twitter:label2\" content=\"\u01af\u1edbc t\u00ednh th\u1eddi gian \u0111\u1ecdc\" \/>\n\t<meta name=\"twitter:data2\" content=\"36 ph\u00fat\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Top 40+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Data Engineer ph\u1ed5 bi\u1ebfn - ITviec Blog","description":"T\u1ed5ng h\u1ee3p 6 nh\u00f3m c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Data Engineer th\u01b0\u1eddng g\u1eb7p theo \u0111a d\u1ea1ng ch\u1ee7 \u0111\u1ec1 k\u00e8m tips tr\u1ea3 l\u1eddi, gi\u00fap b\u1ea1n t\u00f9y bi\u1ebfn theo kinh nghi\u1ec7m.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/","og_locale":"vi_VN","og_type":"article","og_title":"Top 40+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Data Engineer ph\u1ed5 bi\u1ebfn","og_description":"Theo b\u00e1o c\u00e1o c\u1ee7a Dice 2025, nhu c\u1ea7u tuy\u1ec3n d\u1ee5ng Data Engineer t\u1ea1i \u0110\u00f4ng Nam \u00c1 t\u0103ng h\u01a1n 40\u202f% m\u1ed7i n\u0103m, v\u01b0\u1ee3t xa Data Analyst v\u00e0 ti\u1ec7m c\u1eadn Software Engineer.","og_url":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/","og_site_name":"ITviec Blog","article_publisher":"https:\/\/www.facebook.com\/ITviec","article_published_time":"2025-05-11T14:57:00+00:00","og_image":[{"width":640,"height":337,"url":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2025\/05\/cau-hoi-phong-van-data-engineer-vippro-scaled.png","type":"image\/png"}],"author":"Nguy\u1ec5n H\u1eefu V\u0103n","twitter_card":"summary_large_image","twitter_creator":"@ITviec","twitter_site":"@ITviec","twitter_misc":{"\u0110\u01b0\u1ee3c vi\u1ebft b\u1edfi":"Nguy\u1ec5n H\u1eefu V\u0103n","\u01af\u1edbc t\u00ednh th\u1eddi gian \u0111\u1ecdc":"36 ph\u00fat"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/#article","isPartOf":{"@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/"},"author":{"name":"Nguy\u1ec5n H\u1eefu 
V\u0103n","@id":"https:\/\/itviec.com\/blog\/#\/schema\/person\/a77cc13f89eaa58f59d8772448febe5f"},"headline":"Top 40+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Data Engineer ph\u1ed5 bi\u1ebfn","datePublished":"2025-05-11T14:57:00+00:00","mainEntityOfPage":{"@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/"},"wordCount":9565,"publisher":{"@id":"https:\/\/itviec.com\/blog\/#organization"},"image":{"@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/#primaryimage"},"thumbnailUrl":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2025\/05\/cau-hoi-phong-van-data-engineer-vippro-scaled.png","articleSection":["Chuy\u00ean m\u00f4n IT","Ph\u1ecfng v\u1ea5n IT","S\u1ef1 nghi\u1ec7p IT"],"inLanguage":"vi"},{"@type":"WebPage","@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/","url":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/","name":"Top 40+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Data Engineer ph\u1ed5 bi\u1ebfn - ITviec Blog","isPartOf":{"@id":"https:\/\/itviec.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/#primaryimage"},"image":{"@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/#primaryimage"},"thumbnailUrl":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2025\/05\/cau-hoi-phong-van-data-engineer-vippro-scaled.png","datePublished":"2025-05-11T14:57:00+00:00","description":"T\u1ed5ng h\u1ee3p 6 nh\u00f3m c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Data Engineer th\u01b0\u1eddng g\u1eb7p theo \u0111a d\u1ea1ng ch\u1ee7 \u0111\u1ec1 k\u00e8m tips tr\u1ea3 l\u1eddi, gi\u00fap b\u1ea1n t\u00f9y bi\u1ebfn theo kinh 
nghi\u1ec7m.","breadcrumb":{"@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/#breadcrumb"},"inLanguage":"vi","potentialAction":[{"@type":"ReadAction","target":["https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/"]}]},{"@type":"ImageObject","inLanguage":"vi","@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/#primaryimage","url":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2025\/05\/cau-hoi-phong-van-data-engineer-vippro-scaled.png","contentUrl":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2025\/05\/cau-hoi-phong-van-data-engineer-vippro-scaled.png","width":640,"height":337,"caption":"c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n data engineer - itviec blog"},{"@type":"BreadcrumbList","@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Chuy\u00ean m\u00f4n IT","item":"https:\/\/itviec.com\/blog\/chuyen-mon-it\/"},{"@type":"ListItem","position":2,"name":"Top 40+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Data Engineer ph\u1ed5 bi\u1ebfn"}]},{"@type":"WebSite","@id":"https:\/\/itviec.com\/blog\/#website","url":"https:\/\/itviec.com\/blog\/","name":"ITviec Blog","description":"IT Jobs &amp; People in 
Vietnam","publisher":{"@id":"https:\/\/itviec.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/itviec.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"vi"},{"@type":"Organization","@id":"https:\/\/itviec.com\/blog\/#organization","name":"ITviec","url":"https:\/\/itviec.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"vi","@id":"https:\/\/itviec.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2018\/12\/itviec-black-square-facebook.png","contentUrl":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2018\/12\/itviec-black-square-facebook.png","width":1800,"height":1800,"caption":"ITviec"},"image":{"@id":"https:\/\/itviec.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/ITviec","https:\/\/x.com\/ITviec","https:\/\/www.linkedin.com\/company\/itviec","https:\/\/www.youtube.com\/channel\/UCYthAQ3bcGr57M_ag5gHDvQ"]},{"@type":"Person","@id":"https:\/\/itviec.com\/blog\/#\/schema\/person\/a77cc13f89eaa58f59d8772448febe5f","name":"Nguy\u1ec5n H\u1eefu V\u0103n","image":{"@type":"ImageObject","inLanguage":"vi","@id":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2024\/03\/TR-Nguyen-Huu-Van-vippro-e1712136004193-100x100.jpg","url":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2024\/03\/TR-Nguyen-Huu-Van-vippro-e1712136004193-100x100.jpg","contentUrl":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2024\/03\/TR-Nguyen-Huu-Van-vippro-e1712136004193-100x100.jpg","caption":"Nguy\u1ec5n H\u1eefu 
V\u0103n"},"url":"https:\/\/itviec.com\/blog\/author\/nguyen-huu-van-2\/"}]}},"_links":{"self":[{"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/posts\/86905","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/users\/222"}],"replies":[{"embeddable":true,"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/comments?post=86905"}],"version-history":[{"count":0,"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/posts\/86905\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/media\/86908"}],"wp:attachment":[{"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/media?parent=86905"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/categories?post=86905"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/tags?post=86905"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}