{"id":90220,"date":"2025-07-31T22:09:30","date_gmt":"2025-07-31T15:09:30","guid":{"rendered":"https:\/\/itviec.com\/blog\/?p=90220"},"modified":"2025-07-31T22:09:34","modified_gmt":"2025-07-31T15:09:34","slug":"cau-hoi-phong-van-big-data-engineer","status":"publish","type":"post","link":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-big-data-engineer\/","title":{"rendered":"Top 30+ Common Big Data Engineer Interview Questions"},"content":{"rendered":"<p><em><strong>Big Data has become indispensable to businesses in the data era. If you are about to interview for a Big Data Engineer position, this article compiles common Big Data Engineer interview questions to help you prepare thoroughly and go into the interview with confidence.<\/strong><\/em><\/p>\n\n\n\n<p>Read on for sample answers to Big Data Engineer interview questions about:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>General Big Data knowledge<\/li>\n\n\n\n<li>The Hadoop ecosystem, Apache Spark, and streaming systems<\/li>\n\n\n\n<li>Data management and optimization<\/li>\n\n\n\n<li>Hands-on experience<\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>Read more: <strong><a 
href=\"https:\/\/itviec.com\/blog\/luong-big-data-engineer\/\" target=\"_blank\" rel=\"noreferrer noopener\">Latest Big Data Engineer salaries in Vietnam and abroad<\/a><\/strong><\/em><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-cac-ch\u1ee7-d\u1ec1-cau-h\u1ecfi-ph\u1ecfng-v\u1ea5n-big-data-engineer-ph\u1ed5-bi\u1ebfn\"><span class=\"ez-toc-section\" id=\"Cac_chu_de_cau_hoi_phong_van_Big_Data_Engineer_pho_bien\"><\/span><strong>Common Big Data Engineer interview topics<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Questions in a Big Data Engineer interview go beyond technical skills: they also help the employer assess a candidate's logical thinking, problem-solving ability, and adaptability in a real working environment.<\/p>\n\n\n\n<p>The topics below come up most often in Big Data Engineer interviews:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-ki\u1ebfn-th\u1ee9c-t\u1ed5ng-quan-v\u1ec1-big-data\"><strong>General Big Data knowledge<\/strong><\/h3>\n\n\n\n<p>General Big Data knowledge covers the concept, characteristics, role, and business applications of Big Data.<\/p>\n\n\n\n<p>Employers ask about these topics to check a candidate's basic understanding and awareness of how much big data matters in their work.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>Read more: <strong><a href=\"https:\/\/itviec.com\/blog\/dinh-nghia-big-data-la-gi\/\" target=\"_blank\" rel=\"noreferrer noopener\">What is Big Data: 7 important characteristics and properties of Big Data<\/a><\/strong><\/em><\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-h\u1ec7-sinh-thai-hadoop\"><strong>The Hadoop ecosystem<\/strong><\/h3>\n\n\n\n<p>The Hadoop ecosystem is a collection of open-source tools and technologies for storing, processing, and analyzing very large volumes of data efficiently and flexibly. Hadoop matters to a Big Data Engineer because it provides a foundation for building and managing big data solutions.<\/p>\n\n\n\n<p>With this topic, employers want to gauge operational experience and understanding of Hadoop's architecture, core components, strengths, and weaknesses.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-apache-spark-va-h\u1ec7-th\u1ed1ng-streaming\"><strong>Apache Spark and streaming systems<\/strong><\/h3>\n\n\n\n<p>Apache Spark is a fast data processing framework that supports both batch processing and real-time (streaming) data. 
Spark plays an important role in processing and analyzing big data in real time.<\/p>\n\n\n\n<p>This part tests data processing skills, stream processing in particular, and hands-on experience with widely used tools such as Spark and Hadoop MapReduce.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-qu\u1ea3n-ly-va-t\u1ed1i-\u01b0u-hoa-d\u1eef-li\u1ec7u\"><strong>Data management and optimization<\/strong><\/h3>\n\n\n\n<p>Data management and optimization is about ensuring performance, scalability, security, and governance of distributed data in a Big Data system.<\/p>\n\n\n\n<p>Here employers assess experience in improving system performance, optimizing cost, protecting data, and keeping the system flexibly scalable.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-kinh-nghi\u1ec7m-lam-vi\u1ec7c-va-th\u1ef1c-chi\u1ebfn\"><strong>Work and hands-on experience<\/strong><\/h3>\n\n\n\n<p>Beyond theory, employers usually ask about practical experience: projects the candidate has delivered and problems they have solved. 
Through these questions, employers want to check the candidate's ability to apply theory in practice, handle problems as they arise, and work in a team.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>Read more: <a href=\"https:\/\/itviec.com\/blog\/big-data-engineer-la-gi\/\"><strong>What is a Big Data Engineer: Why the role matters in a company<\/strong><\/a><\/em><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-cau-h\u1ecfi-ph\u1ecfng-v\u1ea5n-big-data-engineer-v\u1ec1-ki\u1ebfn-th\u1ee9c-t\u1ed5ng-quan\"><span class=\"ez-toc-section\" id=\"Cau_hoi_phong_van_Big_Data_Engineer_ve_kien_thuc_tong_quan\"><\/span><strong>Big Data Engineer interview questions on general knowledge<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-cau-h\u1ecfi-danh-cho-fresher-junior\"><strong>Questions for Fresher\/Junior candidates<\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-hay-gi\u1ea3i-thich-ba-lo\u1ea1i-d\u1eef-li\u1ec7u-structured-semi-structured-va-unstructured\"><strong>Explain the three types of data: structured, semi-structured, and unstructured.<\/strong><\/h4>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Data type<\/strong><\/td><td><strong>Definition<\/strong><\/td><td><strong>Examples<\/strong><\/td><td><strong>Common tools<\/strong><\/td><\/tr><tr><td><strong>Structured<\/strong><\/td><td>Data organized strictly into rows and columns, usually stored in relational database management systems. It is easy to query with SQL and suits systems that require integrity and a clearly defined schema.<\/td><td>Student grade tables, customer records in a database.<\/td><td>SQL, RDBMS (MySQL, PostgreSQL, Oracle)<\/td><\/tr><tr><td><strong>Semi-structured<\/strong><\/td><td>Data that does not follow a rigid table model but still carries implicit structure through tagged or keyed formats such as JSON, XML, or key-value pairs. It commonly appears in system logs, API responses, and configuration files.<\/td><td>JSON data from APIs, XML files, server logs, image metadata.<\/td><td>NoSQL (MongoDB, Cassandra), Hadoop, Spark, Hive<\/td><\/tr><tr><td><strong>Unstructured<\/strong><\/td><td>Data with no predefined structure that traditional systems cannot analyze directly. 
It usually requires specialized techniques such as NLP, computer vision, or other AI methods.<\/td><td>Free text, images, video, audio, social media comments.<\/td><td>Spark, Hadoop, TensorFlow, NLP toolkits, computer vision tools<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-hay-cho-bi\u1ebft-pipeline-d\u1eef-li\u1ec7u-la-gi\"><strong>What is a data pipeline?<\/strong><\/h4>\n\n\n\n<p>A data pipeline is an automated sequence of steps that moves and processes data from its sources to a destination system such as a data warehouse, a data lake, or an analytics system.<\/p>\n\n\n\n<p>The process typically consists of three steps: extract data from sources such as databases, APIs, or log files; transform it to clean, normalize, and integrate the data; and finally load it into storage that serves analysis. Pipelines are key to keeping data consistently processed, reliable, and ready for downstream systems such as dashboards, AI\/ML, or business reporting.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-overfitting-la-gi-no-co-th\u1ec3-x\u1ea3y-ra-trong-cac-bai-toan-big-data-nh\u01b0-th\u1ebf-nao\"><strong>What is overfitting? 
How can it occur in Big Data problems?<\/strong><\/h4>\n\n\n\n<p>Overfitting occurs when a machine learning model fits the training data too closely, to the point of memorizing noise and unimportant details instead of learning the general pattern. As a result, the model performs very well on the training data but poorly on new data, because it fails to generalize.<\/p>\n\n\n\n<p>People often assume that overfitting only happens when there is not enough data, but in Big Data problems it can still occur even when you are working with a very large volume of data. 
This typically happens when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The model is too complex for the task (more parameters than necessary).<\/li>\n\n\n\n<li>The data is imbalanced or contains too many irrelevant features.<\/li>\n\n\n\n<li>Techniques such as model constraints (to limit complexity) or cross-validation (to check predictive ability on new data) are missing.<\/li>\n<\/ul>\n\n\n\n<p>In practice, Big Data does not automatically prevent overfitting. If the model is not well controlled, you can still &#8220;learn the wrong things&#8221; from noise in millions of rows of data. 
It is therefore very important to evaluate the model on an independent test set, use cross-validation, apply regularization (L1\/L2), or simplify the model.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-cau-h\u1ecfi-danh-cho-middle-senior\"><strong>Questions for Middle\/Senior candidates<\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-data-lake-la-gi-co-gi-khac-bi\u1ec7t-so-v\u1edbi-data-warehouse\"><strong>What is a Data Lake, and how does it differ from a Data Warehouse?<\/strong><\/h4>\n\n\n\n<p>A Data Lake is a centralized storage system that can hold data in any format, from structured and semi-structured to unstructured, without preprocessing it or defining a schema up front (schema-on-read). Data is usually stored in its raw format, so users can flexibly process, analyze, and explore it later with different tools.<\/p>\n\n\n\n<p>A Data Warehouse, by contrast, is a storage system for structured data that is processed and cleaned before it is stored (schema-on-write). 
It is optimized for traditional data analysis, fast queries, and periodic reporting.<\/p>\n\n\n\n<p>In short, a Data Lake favors flexibility and the ability to store large volumes of data in many formats, which suits AI\/ML use cases, while a Data Warehouse suits traditional business analytics that needs fast queries and clearly structured data.<\/p>\n\n\n\n<p>Detailed comparison:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Criterion<\/strong><\/td><td><strong>Data Lake<\/strong><\/td><td><strong>Data Warehouse<\/strong><\/td><\/tr><tr><td><strong>Type of data stored<\/strong><\/td><td>Raw data: structured, semi-structured, unstructured<\/td><td>Mostly structured data that has already been processed<\/td><\/tr><tr><td><strong>Schema<\/strong><\/td><td>Applied when data is read (schema-on-read)<\/td><td>Applied when data is written (schema-on-write)<\/td><\/tr><tr><td><strong>Scalability<\/strong><\/td><td>High and flexible, thanks to distributed storage technologies (HDFS, S3&#8230;)<\/td><td>More limited, and costly to scale<\/td><\/tr><tr><td><strong>Storage cost<\/strong><\/td><td>Usually lower, because data is stored raw<\/td><td>Higher, because data must be processed and normalized first<\/td><\/tr><tr><td><strong>Query speed\/performance<\/strong><\/td><td>Slower, because data must be processed at query time<\/td><td>Optimized for fast queries, especially complex queries and BI<\/td><\/tr><tr><td><strong>Common technologies<\/strong><\/td><td>Hadoop, Apache Spark, Amazon S3, Azure Data Lake<\/td><td>Amazon Redshift, Google BigQuery, Snowflake, Teradata<\/td><\/tr><tr><td><strong>Typical use cases<\/strong><\/td><td>Storing diverse data for Machine Learning and Data Science<\/td><td>Business reporting, dashboards, traditional OLAP queries<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-khi-nao-thi-nosql-la-l\u1ef1a-ch\u1ecdn-phu-h\u1ee3p-h\u01a1n-so-v\u1edbi-sql\"><strong>When is NoSQL a better choice than SQL?<\/strong><\/h4>\n\n\n\n<p>NoSQL fits when you need a system that is flexible, easy to scale, able to handle heavy load, and not bound to a rigid schema, especially in Big Data systems, large-scale web applications, or modern data ecosystems. However, NoSQL is not a complete replacement for SQL. 
In systems that require strong data integrity, complex (ACID) transactions, or analysis with complex queries, SQL remains the best choice.<\/p>\n\n\n\n<p>The situations below are where NoSQL is a better fit than SQL:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Use case<\/strong><\/td><td><strong>Why choose NoSQL<\/strong><\/td><\/tr><tr><td>The application needs to scale horizontally<\/td><td>NoSQL systems such as MongoDB and Cassandra support sharding and distributed processing better than a typical RDBMS<\/td><\/tr><tr><td>The data has a flexible or frequently changing structure<\/td><td>NoSQL does not require a fixed schema, so it adapts easily to a changing data model<\/td><\/tr><tr><td>Storage for logs, events, IoT sensors, social networks<\/td><td>The data is unstructured or semi-structured and written at a high rate, which NoSQL handles better<\/td><\/tr><tr><td>The application has heavy traffic and needs fast responses<\/td><td>NoSQL is optimized for high-speed reads\/writes, especially in real-time applications such as games and chat<\/td><\/tr><tr><td>Relationships between data items are not complex<\/td><td>No complex JOINs or ACID transactions are needed, so NoSQL is simpler and more efficient<\/td><\/tr><tr><td>The application needs to store semi-structured data such as JSON or XML<\/td><td>NoSQL systems such as MongoDB store JSON-like documents directly, which are easy to manipulate and query as documents<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>Read more: <strong><a href=\"https:\/\/itviec.com\/blog\/sql-vs-nosql\/\" target=\"_blank\" rel=\"noreferrer noopener\">SQL vs NoSQL: How to choose the right database management system<\/a><\/strong><\/em><\/p>\n<\/blockquote>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-khi-nao-nen-x\u1eed-ly-d\u1eef-li\u1ec7u-theo-lo-khi-nao-nen-x\u1eed-ly-d\u1eef-li\u1ec7u-lu\u1ed3ng-hay-so-sanh-2-ph\u01b0\u01a1ng-phap-nay\"><strong>When should you use batch processing, and when stream processing? 
Compare the two approaches.<\/strong><\/h4>\n\n\n\n<p><strong>Batch processing<\/strong> suits tasks that do not need immediate results and that prioritize throughput on large volumes of data within specific time windows.<\/p>\n\n\n\n<p><strong>Stream processing<\/strong> is the ideal solution when a business needs to react quickly to data, especially in real-time systems and continuous event flows.<\/p>\n\n\n\n<p>Detailed comparison of the two approaches:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Criterion<\/strong><\/td><td><strong>Batch Processing<\/strong><\/td><td><strong>Stream Processing<\/strong><\/td><\/tr><tr><td><strong>Definition<\/strong><\/td><td>Processes a large volume of data collected over a given period.<\/td><td>Processes data as soon as it is generated, in near real time.<\/td><\/tr><tr><td><strong>Response speed<\/strong><\/td><td>Slower: processing starts after enough data has been gathered.<\/td><td>Near-instant: each record or event is processed continuously.<\/td><\/tr><tr><td><strong>Latency<\/strong><\/td><td>Minutes to hours, depending on batch size.<\/td><td>Milliseconds to a few seconds.<\/td><\/tr><tr><td><strong>Data volume<\/strong><\/td><td>Large; suited to processing gigabytes to terabytes per run.<\/td><td>Small per unit, but continuous.<\/td><\/tr><tr><td><strong>Common use cases<\/strong><\/td><td>Periodic reports, daily ETL jobs, aggregation over time windows.<\/td><td>Real-time fraud detection, real-time recommender systems, IoT.<\/td><\/tr><tr><td><strong>Typical tools<\/strong><\/td><td>Apache Hadoop, Apache Spark (batch mode), Amazon EMR.<\/td><td>Apache Kafka, Apache Flink, Apache Storm, Spark Streaming, Google Cloud Dataflow.<\/td><\/tr><tr><td><strong>Implementation complexity<\/strong><\/td><td>Simpler, and errors are easier to control because everything is processed after the data has been collected.<\/td><td>More complex: requires continuous processing, error handling, and handling of late-arriving data (late events).<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-commodity-hardware-la-gi-gi\u1ea3i-thich-t\u1ea7m-quan-tr\u1ecdng-c\u1ee7a-no-trong-cac-h\u1ec7-th\u1ed1ng-big-data\"><strong>What is commodity hardware? 
Explain its importance in Big Data systems.<\/strong><\/h4>\n\n\n\n<p>Commodity hardware is inexpensive, widely available hardware that does not require high-end specifications, for example ordinary servers, SATA hard drives, and mainstream CPUs.<\/p>\n\n\n\n<p>In Big Data systems, especially Hadoop or Spark, the distributed architecture lets data and tasks be split up and run in parallel across many commodity machines. This lowers capital cost, makes horizontal scaling easy, and provides high fault tolerance: if one node fails, the data is still available from replicas on other nodes and the work is processed there. 
It is this combination of distributed software and inexpensive hardware that makes Big Data a viable option for organizations that need to process large datasets while keeping costs under control.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-cau-h\u1ecfi-ph\u1ecfng-v\u1ea5n-big-data-engineer-v\u1ec1-h\u1ec7-sinh-thai-hadoop\"><span class=\"ez-toc-section\" id=\"Cau_hoi_phong_van_Big_Data_Engineer_ve_he_sinh_thai_Hadoop\"><\/span><strong>Big Data Engineer interview questions<\/strong> <strong>on the Hadoop ecosystem<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-cau-h\u1ecfi-danh-cho-fresher-junior-0\"><strong>Questions for Fresher\/Junior candidates<\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-mo-t\u1ea3-ch\u1ee9c-nang-c\u1ee7a-namenode-va-datanode-trong-hadoop\"><strong>Describe the roles of the NameNode and DataNode in Hadoop.<\/strong><\/h4>\n\n\n\n<p>The <strong>NameNode<\/strong> is often described as the &#8220;brain&#8221; of HDFS. It manages all file system structure information, including file and directory names, the location of each data block, access permissions, and the status of the nodes in the cluster. The NameNode does not store the actual data, however; it stores only metadata, that is, data that describes the data.<\/p>\n\n\n\n<p>A <strong>DataNode<\/strong> is where the data blocks are actually stored. 
Each DataNode is responsible for storing data, serving reads and writes, and periodically reporting its status to the NameNode. When a client requests data, the NameNode tells it where the blocks are located, and the DataNodes carry out the actual data transfer.<\/p>\n\n\n\n<p>HDFS follows a model of centralized management and distributed storage: a single active NameNode manages the metadata, while many distributed DataNodes store the data. This model lets Hadoop handle large datasets with high scalability, but it also requires the NameNode to be available at all times. 
If the NameNode fails and there is no standby, the entire system can become unavailable.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-hay-li\u1ec7t-ke-m\u1ed9t-s\u1ed1-tinh-nang-n\u1ed5i-b\u1eadt-c\u1ee7a-hadoop\"><strong>List some of Hadoop's key features.<\/strong><\/h4>\n\n\n\n<p>Several key features have made Hadoop a popular platform for Big Data systems:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Distributed processing:<\/strong> Hadoop splits data into blocks and processes them in parallel across many nodes in the cluster, which speeds up processing and makes full use of the available resources.<\/li>\n\n\n\n<li><strong>Horizontal scaling:<\/strong> The system can be expanded easily by adding more inexpensive machines (commodity hardware) without changing the software architecture.<\/li>\n\n\n\n<li><strong>High fault tolerance:<\/strong> When a node fails, the system automatically reads a replica of the data from another node thanks to replication, so processing is not interrupted.<\/li>\n\n\n\n<li><strong>Low-cost storage for large datasets<\/strong>: By combining commodity hardware with the HDFS distributed file system, Hadoop can store petabytes of data cost-effectively.<\/li>\n\n\n\n
<li><strong>Support for many data formats<\/strong>: Hadoop handles structured, semi-structured, and unstructured data, including text, images, logs, and video.<\/li>\n\n\n\n<li><strong>A rich, extensible ecosystem:<\/strong> Hadoop is surrounded by powerful tools such as Hive (data querying), Pig (script-based data processing), HBase (NoSQL database), and Spark (in-memory processing), which broaden the range of analyses it can support.<\/li>\n\n\n\n<li><strong>Automatic data distribution and processing<\/strong>: Users do not need to partition data or coordinate work by hand. Hadoop automatically splits the data, distributes the work, and gathers the results.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-b\u1ea1n-bi\u1ebft-gi-v\u1ec1-apache-hive-hay-gi\u1ea3i-thich-s\u1ef1-khac-bi\u1ec7t-gi\u1eefa-hive-va-rdbms\"><strong>What do you know about Apache Hive? Explain the differences between Hive and an RDBMS.<\/strong><\/h4>\n\n\n\n<p>Apache Hive is a tool in the Hadoop ecosystem for querying, analyzing, and processing large datasets stored on HDFS using a SQL-like syntax called HiveQL. 
Instead of executing queries directly as a relational database management system (RDBMS) does, Hive compiles queries into MapReduce jobs (or Tez or Spark jobs) that run on the distributed platform.<\/p>\n\n\n\n<p>Although its syntax resembles SQL, Hive is not a relational database, and it differs from an RDBMS in several clear ways:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Criterion<\/strong><\/td><td><strong>Hive<\/strong><\/td><td><strong>RDBMS (relational database)<\/strong><\/td><\/tr><tr><td><strong>Storage structure<\/strong><\/td><td>Data is stored on HDFS (distributed files)<\/td><td>Data is stored in relational tables with a fixed schema<\/td><\/tr><tr><td><strong>Execution mechanism<\/strong><\/td><td>Runs on MapReduce, Tez, or Spark; latency is usually high<\/td><td>Runs directly inside the database engine; responses are fast<\/td><\/tr><tr><td><strong>Workload type<\/strong><\/td><td>Large datasets, batch processing<\/td><td>Well-structured data, typically transactional processing (OLTP)<\/td><\/tr><tr><td><strong>Data update support<\/strong><\/td><td>Limited; not optimized for row-level INSERT\/UPDATE\/DELETE<\/td><td>Supports row-level updates with consistency guarantees (ACID)<\/td><\/tr>
<tr><td><strong>Scalability<\/strong><\/td><td>Very high, thanks to the distributed Hadoop platform<\/td><td>More limited; depends on the hardware configuration<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>In short, Hive suits large-scale analytical workloads (OLAP) that process huge volumes of data in batches. An RDBMS, by contrast, is optimized for small transactional applications with real-time access that demand accuracy and fast responses.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-cau-h\u1ecfi-danh-cho-middle-senior-0\"><strong>Questions for Middle\/Senior candidates<\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-t\u1ea1i-sao-kich-th\u01b0\u1edbc-block-trong-hdfs-l\u1ea1i-\u1ea3nh-h\u01b0\u1edfng-d\u1ebfn-hi\u1ec7u-su\u1ea5t-h\u1ec7-th\u1ed1ng\"><strong>Why does the HDFS block size affect system performance?<\/strong><\/h4>\n\n\n\n<p>Block size (128 MB by default in Hadoop 2 and later) directly affects performance because it determines how many blocks must be handled and tracked when data is stored, transferred, and processed.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>If blocks are too small:<\/strong> A large file is split 
into a great many small blocks, which overloads the NameNode with metadata because it has to track every block. This lowers overall performance and increases access latency.<\/li>\n\n\n\n<li><strong>If blocks are too large:<\/strong> Parallelism is limited because there are fewer blocks to spread across DataNodes, so the cluster's distributed capacity is underused. In addition, each task has more data to process, so a slow or failed task costs more time to rerun.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-di\u1ec3m-khac-bi\u1ec7t-chinh-gi\u1eefa-hadoop-phien-b\u1ea3n-1-va-2-la-gi\"><strong>What are the main differences between Hadoop 1 and Hadoop 2?<\/strong><\/h4>\n\n\n\n<p>The core difference between Hadoop 1 and Hadoop 2 lies in the resource management architecture and the system's scalability.<\/p>\n\n\n\n<p>In Hadoop 1, the JobTracker both manages resources and coordinates MapReduce jobs. 
This creates a bottleneck when many jobs run at the same time, which makes the system hard to scale and inflexible.<\/p>\n\n\n\n<p>Hadoop 2 addressed this by introducing YARN (Yet Another Resource Negotiator), a resource management layer that is separate from job processing. With YARN:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The ResourceManager manages resources for the whole cluster.<\/li>\n\n\n\n<li>An ApplicationMaster coordinates each individual application.<\/li>\n<\/ul>\n\n\n\n<p>As a result, Hadoop 2 supports not only MapReduce but also other processing engines such as Spark, Tez, and Storm. This makes the system more flexible, uses resources more efficiently, and makes it easier to scale processing. Hadoop 2 also added NameNode high availability (an active and a standby NameNode), which removes the single point of failure in HDFS.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-hadoop-co-nh\u1eefng-c\u01a1-ch\u1ebf-b\u1ea3o-m\u1eadt-nao-giup-h\u1ea1n-ch\u1ebf-cac-truy-c\u1eadp-khong-d\u01b0\u1ee3c-phep\"><strong>Which security mechanisms does Hadoop provide to prevent unauthorized access?<\/strong><\/h4>\n\n\n\n<p>The main security mechanisms in Hadoop include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Authentication<\/strong>: Hadoop supports user authentication through Kerberos, a strong ticket-based authentication protocol. 
This ensures that only legitimate users can access the system.<\/li>\n\n\n\n<li><strong>Authorization:<\/strong> HDFS uses a UNIX-like permission model based on owner, group, and others, with read, write, and execute permissions on files and directories. In addition, tools such as Apache Ranger or Apache Sentry allow fine-grained policies at the table, column, or action level.<\/li>\n\n\n\n<li><strong>Encryption:<\/strong> Hadoop supports encrypting data both at rest and in transit. 
This protects data from theft even if a disk or a network connection is compromised.<\/li>\n\n\n\n<li><strong>Network-level security<\/strong>: Hadoop can be configured to accept connections only from specific IP addresses or subnets, and SSL\/TLS can secure the communication channels between nodes in the cluster.<\/li>\n\n\n\n<li><strong>Auditing &amp; Monitoring<\/strong>: Tools such as Apache Ranger, or external monitoring solutions (Splunk, ELK), can track and record access to the system, which makes it easier to detect suspicious behavior or trace events after an incident.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-cau-h\u1ecfi-ph\u1ecfng-v\u1ea5n-big-data-engineer-v\u1ec1-apache-spark\"><span class=\"ez-toc-section\" id=\"Cau_hoi_phong_van_Big_Data_Engineer_ve_Apache_Spark\"><\/span><strong>Big Data Engineer interview questions on Apache Spark<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-cau-h\u1ecfi-danh-cho-fresher-junior-1\"><strong>Questions for Fresher\/Junior candidates<\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-spark-khac-gi-so-v\u1edbi-hadoop-mapreduce\"><strong>How does Spark differ from Hadoop MapReduce?<\/strong><\/h4>\n\n\n\n<p>Both Apache Spark and Hadoop MapReduce are distributed data processing platforms. 
Spark came later, with the goal of overcoming MapReduce's limitations in performance and flexibility. The main differences are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Much faster processing:<\/strong> Spark can keep intermediate data in memory, which greatly reduces the disk reads and writes that are a weak point of MapReduce. For workloads that fit in memory, especially iterative ones, Spark is often cited as 10 to 100 times faster than MapReduce; the actual gain depends on the workload.<\/li>\n\n\n\n<li><strong>More flexible processing models:<\/strong> MapReduce supports only batch processing, whereas Spark supports batch, streaming, graph processing (GraphX), and machine learning (MLlib), all on one platform.<\/li>\n\n\n\n<li><strong>Easier programming with less code:<\/strong> Spark offers rich, developer-friendly APIs (Scala, Python, Java, and R) that let you write complex data pipelines in a few lines of code, where MapReduce needs a lot of boilerplate.<\/li>\n\n\n\n<li><strong>More efficient job scheduling:<\/strong> Spark builds a DAG (Directed Acyclic Graph) of operations, which lets it plan execution more flexibly 
and efficiently than MapReduce's rigid step-by-step flow (map \u2192 shuffle \u2192 reduce).<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-spark-streaming-ho\u1ea1t-d\u1ed9ng-nh\u01b0-th\u1ebf-nao\"><strong>How does Spark Streaming work?<\/strong><\/h4>\n\n\n\n<p>Spark Streaming processes streaming data in near real time by building on Spark Core. Instead of handling records one at a time, as record-at-a-time stream processors do (for example, Storm), Spark Streaming uses a micro-batch model: it splits the continuous stream into small batches at a fixed interval (for example, every 1 or 2 seconds). 
Each batch is then processed as a static dataset by the regular Spark engine.<\/p>\n\n\n\n<p>The Spark Streaming workflow:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Ingest<\/strong>: Receive data from sources such as Kafka, Flume, sockets, or HDFS.<\/li>\n\n\n\n<li><strong>Split by time (micro-batch)<\/strong>: The data is divided into small batches.<\/li>\n\n\n\n<li><strong>Parallel processing<\/strong>: Each batch is processed like an RDD or DataFrame, using operations such as map, reduce, and join.<\/li>\n\n\n\n<li><strong>Output<\/strong>: Results can be written to HDFS or a database, or shown on a real-time dashboard.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-rdd-va-dataframe-trong-spark-co-gi-khac-nhau\"><strong>How do RDDs and DataFrames differ in Spark?<\/strong><\/h4>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Criterion<\/strong><\/td><td><strong>RDD<\/strong><\/td><td><strong>DataFrame<\/strong><\/td><\/tr><tr><td>Level of abstraction<\/td><td>Low; data is manipulated as distributed objects with no schema. 
Developers must spell out the processing logic in detail, much like working with a list in a programming language.<\/td><td>High; data is organized as a table with named columns, like a database table. It hides technical details and allows more concise code.<\/td><\/tr><tr><td>Automatic optimization<\/td><td>None; the developer has to optimize by hand<\/td><td>Yes, through the Catalyst Optimizer<\/td><\/tr><tr><td>Operations and integration<\/td><td>More flexible for complex operations, especially on unstructured data or logic that does not map well to SQL expressions.<\/td><td>Integrates easily with Spark SQL; very useful for data analysis, reporting, or working with well-structured data.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-cau-h\u1ecfi-ph\u1ecfng-v\u1ea5n-big-data-engineer-v\u1ec1-h\u1ec7-th\u1ed1ng-streaming\"><span class=\"ez-toc-section\" id=\"Cau_hoi_phong_van_Big_Data_Engineer_ve_he_thong_Streaming\"><\/span><strong>Big Data Engineer interview questions<\/strong> <strong>on streaming systems<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-cau-h\u1ecfi-danh-cho-fresher-junior-2\"><strong>Questions for Fresher\/Junior candidates<\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" 
id=\"h-hay-gi\u1ea3i-thich-ki\u1ebfn-truc-lambda-va-tr\u01b0\u1eddng-h\u1ee3p-s\u1eed-d\u1ee5ng-c\u1ee7a-no\"><strong>Explain the Lambda architecture and its use cases.<\/strong><\/h4>\n\n\n\n<p>The Lambda architecture is a design pattern for big data systems that processes data in real time while preserving accuracy and completeness by combining batch processing and stream processing in one system. It has three main layers: the Batch Layer, which periodically recomputes accurate views from the full historical dataset; the Speed Layer, which processes recent data with low latency; and the Serving Layer, which exposes the combined results to queries.<\/p>\n\n\n\n<p>Typical use cases for the Lambda architecture:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Real-time user behavior analytics (for example, tracking activity on a website or app).<\/li>\n\n\n\n<li>Fraud detection in financial transactions, where a fast reaction is needed but results must also be confirmed accurately after full processing.<\/li>\n\n\n\n<li>Recommender systems or server log analysis that combine current and historical data.<\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>Read more: 
<strong><a href=\"https:\/\/itviec.com\/blog\/aws-lambda-la-gi\/\" target=\"_blank\" rel=\"noreferrer noopener\">What is AWS Lambda? A guide to using AWS Lambda<\/a><\/strong> (AWS Lambda is a serverless compute service and is distinct from the Lambda architecture)<\/em><\/p>\n<\/blockquote>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-khi-nao-thi-kappa-architecture-la-l\u1ef1a-ch\u1ecdn-t\u1ed1i-\u01b0u-h\u01a1n-lambda-architecture\"><strong>When is the Kappa Architecture a better choice than the Lambda Architecture?<\/strong><\/h4>\n\n\n\n<p>The Kappa Architecture is the better choice in the following cases:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>When the system needs to be simple and easy to maintain: <\/strong>Lambda requires running two systems in parallel (batch + stream), which complicates development and makes it hard to keep the processing logic in sync. 
With Kappa, you maintain a single processing pipeline, which lowers operating costs and the risk of the logic diverging.<\/li>\n\n\n\n<li><strong>When data is processed mainly in real time: <\/strong>If the system rarely needs to reprocess all historical data, or the data does not change once it has been recorded, a single continuous processing stream is enough.<\/li>\n\n\n\n<li><strong>When the stream can be replayed (replayable stream): <\/strong>Kappa relies on platforms such as Apache Kafka to store the stream durably, so data can be replayed when reprocessing is needed, which takes over the role of the batch layer in Lambda.<\/li>\n\n\n\n<li><strong>When the application needs low latency and continuous updates: <\/strong>Systems such as security monitoring, fraud alerts, and social media sentiment analysis usually prioritize fast responses over periodic large-scale aggregation, so they fit the Kappa architecture well.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-cau-h\u1ecfi-danh-cho-middle-senior-1\"><strong>Questions for Middle\/Senior candidates<\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" 
id=\"h-lam-th\u1ebf-nao-b\u1ea1n-x\u1eed-ly-tinh-tr\u1ea1ng-tr\u1ec5-late-events-trong-h\u1ec7-th\u1ed1ng-x\u1eed-ly-stream\"><strong>How do you handle late events in a stream processing system?<\/strong><\/h4>\n\n\n\n<p>The following techniques can be applied:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Use watermarks<\/strong>: A watermark is a threshold in event time that tracks how far the stream has progressed; any event with a timestamp older than the watermark is treated as &#8220;late&#8221;. Tools such as Apache Flink and Spark Structured Streaming support watermarks to bound the lateness they tolerate, so that late data is still processed if it arrives within the allowed interval.<\/li>\n\n\n\n<li><strong>Set a \u201cgrace period\u201d or \u201callowed lateness\u201d<\/strong>: A waiting period (for example, 5 minutes) can be configured after a window ends to accept late events. During that period, the system still updates its output when late data arrives.<\/li>\n\n\n\n<li><strong>Buffer data temporarily<\/strong>: Some systems can buffer data for a few extra seconds or minutes before processing it, to wait for late data. 
This suits cases where accuracy matters more than an immediate response.<\/li>\n\n\n\n<li><strong>Separate the main path from a correction path:<\/strong> Instead of slowing down the whole pipeline, you can process on-time data first and then handle late data on a separate branch (for example, by writing it to a side table or sending a correction alert afterwards).<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-b\u1ea1n-da-t\u1eebng-g\u1eb7p-v\u1ea5n-d\u1ec1-v\u1ec1-back-pressure-trong-cac-h\u1ec7-th\u1ed1ng-x\u1eed-ly-stream-ch\u01b0a-b\u1ea1n-x\u1eed-ly-no-nh\u01b0-th\u1ebf-nao\"><strong>Have you ever run into back-pressure in a stream processing system? How did you handle it?<\/strong><\/h4>\n\n\n\n<p>You can describe a real situation that you resolved successfully.<\/p>\n\n\n\n<p>Suggested answer:<\/p>\n\n\n\n<p>I ran into back-pressure while working on a real-time log analytics system built on Apache Kafka and Apache Spark Streaming (the DStream API). 
During peak periods, when traffic spiked, the system could not keep up with the data flowing in from Kafka: output latency kept growing, Spark memory filled up, and some batches hung and had to be retried several times. After checking the logs and monitoring with Spark UI and Kafka Lag Monitor, I confirmed that the cause was downstream processing falling behind the rate at which data was read from Kafka, so the queue kept growing. This is a typical symptom of back-pressure.<\/p>\n\n\n\n<p>The steps I took to resolve it:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Enable Spark's automatic back-pressure mechanism:<\/strong> I set spark.streaming.backpressure.enabled = true so that Spark adjusts its Kafka ingestion rate to the actual processing capacity. Note that this setting belongs to the DStream-based Spark Streaming API; in Structured Streaming, intake from Kafka is instead capped with the source option maxOffsetsPerTrigger.<\/li>\n\n\n\n<li><strong>Optimize the processing logic in the pipeline<\/strong>: Part of the latency came from an expensive groupBy and per-session aggregation. 
I split the processing into two stages, reduced the number of shuffles, and used reduceByKey instead of groupByKey, which eased the load on the driver and executors.<\/li>\n\n\n\n<li><strong>Scale up system resources<\/strong>: I increased the number of executors and raised the batch interval from 1 second to 5 seconds, giving Spark more time to process each batch and relieving the constant pressure.<\/li>\n\n\n\n<li><strong>Temporarily scale in the Kafka consumers<\/strong>: During the emergency, I deliberately reduced the number of threads reading from Kafka to slow down ingestion and avoid overloading Spark in the short term.<\/li>\n<\/ul>\n\n\n\n
<h2 class=\"wp-block-heading\" id=\"h-cau-h\u1ecfi-ph\u1ecfng-v\u1ea5n-big-data-engineer-v\u1ec1-qu\u1ea3n-ly-va-t\u1ed1i-\u01b0u-hoa-d\u1eef-li\u1ec7u\"><span class=\"ez-toc-section\" id=\"Cau_hoi_phong_van_Big_Data_Engineer_ve_quan_ly_va_toi_uu_hoa_du_lieu\"><\/span><strong>Big Data Engineer interview questions on data management and optimization<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-cau-h\u1ecfi-danh-cho-fresher-junior-3\"><strong>Questions for Fresher\/Junior<\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-vi-sao-fault-tolerance-va-data-replication-l\u1ea1i-c\u1ea7n-thi\u1ebft-trong-h\u1ec7-th\u1ed1ng-big-data\"><strong>Why are fault tolerance and data replication necessary in a Big Data system?<\/strong><\/h4>\n\n\n\n<p>Fault tolerance and data replication are necessary in a Big Data system because:<\/p>\n\n\n\n
<ul class=\"wp-block-list\">\n<li><strong>Fault tolerance <\/strong>lets a Big Data system keep running even when one or more nodes fail. Data and in-flight processing are not lost, which preserves service continuity and reliability for users and the business.<\/li>\n\n\n\n<li><strong>Data replication<\/strong> means storing multiple copies of the data on different nodes in the cluster. When a node fails or loses connectivity, the system can read the data from a replica, avoiding data loss and reducing the risk of downtime.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-theo-b\u1ea1n-scalability-trong-linh-v\u1ef1c-big-data-d\u01b0\u1ee3c-hi\u1ec3u-nh\u01b0-th\u1ebf-nao\"><strong>In your view, what does scalability mean in Big Data?<\/strong><\/h4>\n\n\n\n<p>Scalability in Big Data is a data system's ability to keep processing efficiently as the volume, velocity, or variety of data grows, without degrading performance or interrupting service.<\/p>\n\n\n\n<p>Scalability is not just \u201cscaling out to handle more data\u201d; it is also about keeping the system stable, using resources economically, and holding latency at an acceptable level when data volume spikes or the user base grows over time.<\/p>\n\n\n\n<p>Scalability in Big Data is not merely a system feature but a core principle for keeping the data system ready to serve as data grows at an unpredictable scale or pace. A system that scales well is a prerequisite for an organization's digital transformation and for effective real-time decision-making.<\/p>\n\n\n\n
<h4 class=\"wp-block-heading\" id=\"h-s\u1ef1-khac-bi\u1ec7t-gi\u1eefa-m\u1edf-r\u1ed9ng-ngang-horizontal-scaling-va-m\u1edf-r\u1ed9ng-d\u1ecdc-vertical-scaling-la-gi\"><strong>What is the difference between horizontal scaling and vertical scaling?<\/strong><\/h4>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Criterion<\/strong><\/td><td><strong>Vertical scaling<\/strong><\/td><td><strong>Horizontal scaling<\/strong><\/td><\/tr><tr><td><strong>Concept<\/strong><\/td><td>Upgrade the existing server's hardware (CPU, RAM, SSD&#8230;)<\/td><td>Add more servers (nodes) to share the workload<\/td><\/tr><tr><td><strong>Deployment<\/strong><\/td><td>Increase hardware capacity on a single server<\/td><td>Distribute processing across multiple servers or instances<\/td><\/tr><tr><td><strong>Initial cost<\/strong><\/td><td>Usually cheaper (for small upgrades)<\/td><td>Higher, because a distributed architecture must be set up<\/td><\/tr><tr><td><strong>Limits<\/strong><\/td><td>Bounded by the physical hardware limits of one machine<\/td><td>Much higher ceiling: capacity grows by adding nodes<\/td><\/tr><tr><td><strong>System complexity<\/strong><\/td><td>Low: easy to deploy and maintain<\/td><td>Higher: requires data distribution, synchronization, and load balancing<\/td><\/tr><tr><td><strong>High availability<\/strong><\/td><td>Hard to achieve: if the main server fails, the whole system is affected<\/td><td>Easier to achieve: clusters, failover, and replication can be set up<\/td><\/tr><tr><td><strong>Downtime during upgrades<\/strong><\/td><td>Can be high (the system must be stopped to upgrade hardware)<\/td><td>Close to none: nodes can be added dynamically in the cloud<\/td><\/tr><tr><td><strong>Scalability over time<\/strong><\/td><td>Limited: past a certain threshold, no further upgrade is possible<\/td><td>Very flexible: well suited to Big Data &amp; Cloud systems<\/td><\/tr><tr><td><strong>Suitable use cases<\/strong><\/td><td>Simple apps, monoliths, small OLAP workloads, experiments<\/td><td>Big Data platforms, real-time systems, cloud-native apps<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n
<h4 class=\"wp-block-heading\" id=\"h-vai-tro-c\u1ee7a-qu\u1ea3n-ly-sieu-d\u1eef-li\u1ec7u-metadate-va-catalog-d\u1eef-li\u1ec7u-trong-moi-tr\u01b0\u1eddng-big-data-la-gi\"><strong>What role do metadata management and data catalogs play in a Big Data environment?<\/strong><\/h4>\n\n\n\n<p>These two components serve the following roles:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Improve data discovery and reuse:<\/strong> When working with hundreds of tables in a data lake or data warehouse, a standardized data catalog (such as Apache Atlas, AWS Glue Data Catalog, or Amundsen) helps users quickly find the table that fits their needs and understand column definitions, lineage, and related attributes. 
This is especially useful for cross-team work or when onboarding new members.<\/li>\n\n\n\n<li><strong>Provide context and quality signals for data: <\/strong>Metadata, especially business metadata and technical metadata, helps users understand what the data represents, where it came from, how it has changed, and whether it is still usable. With complete metadata, I can tell which data has been standardized and which needs further checks, and so produce more accurate analysis.<\/li>\n\n\n\n<li><strong>Support governance &amp; compliance<\/strong>: In projects involving sensitive data (such as finance or healthcare), metadata management helps identify which information is PII, who has access, and which tables need periodic audits. 
This is the foundation for complying with standards such as GDPR and HIPAA.<\/li>\n\n\n\n<li><strong>Improve operational efficiency and debugging<\/strong>: Because lineage (the origin and flow of data) is recorded, I can easily trace back through the processing steps when a data problem occurs (for example incorrect figures, delayed batches, or analysis errors) without manually walking the whole pipeline.<\/li>\n\n\n\n<li><strong>Optimize cost and processing resources<\/strong>: In large data systems, you do not always need to process the entire dataset. 
Using metadata such as table size, row count, and access frequency, I can decide which tables to archive, cache, or restructure in storage, which lowers cloud cost and improves pipeline performance.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-cau-h\u1ecfi-danh-cho-middle-senior-2\"><strong>Questions for Middle\/Senior<\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-khi-lam-vi\u1ec7c-v\u1edbi-h\u1ec7-th\u1ed1ng-phan-tan-b\u1ea1n-qu\u1ea3n-ly-partitioning-va-shuffling-d\u1eef-li\u1ec7u-ra-sao-d\u1ec3-t\u1ed1i-\u01b0u-hi\u1ec7u-su\u1ea5t\"><strong>When working with distributed systems, how do you manage data partitioning and shuffling to optimize performance?<\/strong><\/h4>\n\n\n\n<p>For partitioning:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Choose a partition count that matches the system's resources so that processing capacity is fully used.<\/li>\n\n\n\n<li>Partition data by key (hash or range) so that operations such as join and groupBy run faster because less data moves between nodes.<\/li>\n\n\n\n<li>Always check for and handle data skew (oversized partitions) to avoid bottlenecks.<\/li>\n<\/ul>\n\n\n\n<p>For shuffling:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Minimize shuffle-heavy operations such as groupByKey and joins against large tables. 
Prefer reduceByKey, or a broadcast join when one side is small enough to be sent to every executor.<\/li>\n\n\n\n<li>Profile and monitor jobs to find shuffle bottlenecks, then refine the logic or the partitioning accordingly.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-khi-schema-thay-d\u1ed5i-ho\u1eb7c-c\u1ea7n-qu\u1ea3n-ly-version-d\u1eef-li\u1ec7u-trong-kho-d\u1eef-li\u1ec7u-b\u1ea1n-s\u1ebd-x\u1eed-ly-th\u1ebf-nao\"><strong>When a schema changes or you need to manage data versions in the data warehouse, how do you handle it?<\/strong><\/h4>\n\n\n\n<p>When facing schema changes or data-versioning requirements in a data warehouse, I favor an approach that is stable, extensible, and traceable. Specifically, I apply the following principles and techniques:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Design flexible, versionable schemas<\/strong>: Use models such as SCD Type 2 or Data Vault to preserve change history. 
For each record, I add columns such as valid_from, valid_to, is_current, or record_version to track versions and reconstruct the state of the data at any point in time.<\/li>\n\n\n\n<li><strong>Manage schemas with Git &amp; CI\/CD<\/strong>: All schema definitions and ETL pipelines are kept in Git, combined with CI\/CD that automatically checks schema compatibility before deployment. This limits breaking schema changes and makes rollback easier.<\/li>\n\n\n\n<li><strong>Separate data versions with snapshots or partitions<\/strong>: For large tables or data that must be audited, store additional snapshots by day (snapshot_date) or by business logic (version_id) so that you can query as of a point in time or compare records across versions.<\/li>\n\n\n\n<li><strong>Apply schema evolution when needed<\/strong>: In systems using Spark or Delta Lake, enable schema evolution so that the schema can update itself as new data is ingested. However, always pair it with a schema diff check to stay safe.<\/li>\n\n\n\n<li><strong>Ensure backward compatibility<\/strong>: Before changing a schema, check the impact on downstream pipelines. 
If there are dependencies, use intermediate view layers to preserve compatibility during the transition.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-b\u1ea1n-th\u01b0\u1eddng-ap-d\u1ee5ng-nh\u1eefng-cach-nao-d\u1ec3-tang-t\u1ed1c-d\u1ed9-truy-v\u1ea5n-sql\"><strong>What techniques do you typically use to speed up SQL queries?<\/strong><\/h4>\n\n\n\n<p>Some commonly used techniques for optimizing query performance:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Use indexes sensibly:<\/strong> Creating indexes on columns that are frequently used in WHERE, JOIN, or ORDER BY can make queries much faster. However, avoid over-indexing, because every extra index slows down INSERT and UPDATE.<\/li>\n\n\n\n<li><strong>Limit the use of SELECT *<\/strong>: Instead of SELECT *, list only the columns you need. This reduces unnecessary data transfer and saves processing resources.<\/li>\n\n\n\n<li><strong>Avoid unnecessary nested queries:<\/strong> Poorly optimized subqueries (especially ones that are re-evaluated repeatedly in SELECT or WHERE) can slow down the whole query. 
I usually rewrite them as a JOIN or a WITH clause (CTE), which often improves performance, though the gain depends on the database engine.<\/li>\n\n\n\n<li><strong>Optimize JOINs<\/strong>: Order the JOINs from smaller to larger tables (this matters mostly on engines that do not reorder joins automatically), avoid unnecessary JOINs, always give an explicit ON condition, and use INNER JOIN instead of LEFT JOIN when unmatched rows from the left table do not need to be kept.<\/li>\n\n\n\n<li><strong>Use partitioning:<\/strong> When working with large tables, use partitioning to split the table by date or category, so that a query scans only the relevant partitions instead of the whole dataset.<\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>This is also one of the common questions in the <a href=\"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-analyst\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Data Analyst interview questions<\/strong><\/a> and <strong><a href=\"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-data-engineer\/\" target=\"_blank\" rel=\"noreferrer noopener\">Data Engineer interview questions<\/a><\/strong> collections, which are worth consulting.<\/em><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-cau-h\u1ecfi-ph\u1ecfng-v\u1ea5n-big-data-engineer-v\u1ec1-kinh-nghi\u1ec7m-lam-vi\u1ec7c-va-th\u1ef1c-chi\u1ebfn\"><span class=\"ez-toc-section\" 
id=\"Cau_hoi_phong_van_Big_Data_Engineer_ve_kinh_nghiem_lam_viec_va_thuc_chien\"><\/span><strong>Big Data Engineer interview questions on work experience and hands-on practice<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-cau-h\u1ecfi-danh-cho-fresher-junior-4\"><strong>Questions for Fresher\/Junior<\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-theo-b\u1ea1n-di\u1ec1u-gi-quan-tr\u1ecdng-h\u01a1n-d\u1eef-li\u1ec7u-ch\u1ea5t-l\u01b0\u1ee3ng-hay-mo-hinh-t\u1ed1t-gi\u1ea3i-thich-ly-do\"><strong>In your view, which matters more: quality data or a good model? Explain why.<\/strong><\/h4>\n\n\n\n<p><strong>What the interviewer wants to assess:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>How you prioritize and make decisions in a real-world context.<\/li>\n\n\n\n<li>How well you understand the data workflow: are you chasing algorithms, or do you understand the role of input quality?<\/li>\n\n\n\n<li>How you reason about and defend your point of view.<\/li>\n<\/ul>\n\n\n\n<p><strong>Suggested answer:<\/strong><\/p>\n\n\n\n<p>In my view, quality data matters more than a good model. 
The reason is that even the most sophisticated model cannot produce reliable results if the input data is skewed, noisy, or incomplete.<\/p>\n\n\n\n<p>A familiar principle in the data field is \u201cGarbage in, garbage out\u201d: if the input is inaccurate, the output cannot be trusted either, no matter how well tuned the model is.<\/p>\n\n\n\n<p>I once worked on a user-behavior analysis project for an e-commerce platform. At first, my team tried many models, from simple ones to complex ones such as Random Forest and XGBoost, but the results did not improve. We then focused on reviewing the entire dataset: handling duplicates, standardizing formats, and above all rebuilding the features that describe user behavior over time. 
As a result, a simple Logistic Regression model trained on the \u201cclean\u201d dataset achieved better accuracy and F1-score than the earlier XGBoost model.<\/p>\n\n\n\n<p>Of course, the model still matters a lot, especially for complex problems such as image processing or NLP. But if I had to choose, I would prioritize data quality first: with good data you can always improve the model over time, whereas with poor data even the best model can hardly rescue the result.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-trong-tr\u01b0\u1eddng-h\u1ee3p-kho-xac-d\u1ecbnh-bug-b\u1ea1n-th\u01b0\u1eddng-ap-d\u1ee5ng-ph\u01b0\u01a1ng-phap-nao-d\u1ec3-tim-ra-nguyen-nhan\"><strong>When a bug is hard to pin down, what approach do you usually take to find the root cause?<\/strong><\/h4>\n\n\n\n<p><strong>What the interviewer wants to assess:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Your analytical thinking and fault-reasoning process: do you work logically and systematically, or just by gut feeling?<\/li>\n\n\n\n<li>Your debugging ability and practical problem-solving mindset: do you handle bugs by guessing, or do you know how to narrow the scope of investigation?<\/li>\n<\/ul>\n\n\n\n
<p><strong>Suggested answer:<\/strong><\/p>\n\n\n\n<p>When I hit a bug that is hard to pin down, I usually \u201cnarrow down and verify\u201d, testing one hypothesis at a time in a systematic way. The process is as follows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Reproduce the bug: First, I always try to reproduce the bug reliably by recording the input, the environment, and the user's actions if any.<\/li>\n\n\n\n<li>Read the logs and follow the error traces: I use logs at logical checkpoints to determine at which stage the bug occurs: input \u2192 processing \u2192 storage \u2192 display. Many bugs are caused not by wrong code but by input data that is abnormal or different from what was expected.<\/li>\n\n\n\n<li>Test each hypothesis in turn: I list the possible causes (for example wrong processing logic, null data, type-cast errors, or race conditions) and rule them out one by one by writing small tests or adding more detailed logging.<\/li>\n\n\n\n<li>Compare with the previous version (if any): If the bug appeared recently, I check git diff to see which commit may be related. 
This shortens the analysis time considerably.<\/li>\n\n\n\n<li>Search for similar issues: For complex errors or ones involving third-party libraries, I search GitHub Issues, Stack Overflow, or the library's changelog to see whether anyone has hit a similar problem.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-cau-h\u1ecfi-danh-cho-middle-senior-1\"><strong>Questions for Middle\/Senior<\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-n\u1ebfu-d\u01b0\u1ee3c-giao-thi\u1ebft-k\u1ebf-h\u1ec7-th\u1ed1ng-x\u1eed-ly-d\u1eef-li\u1ec7u-phan-tan-b\u1ea1n-s\u1ebd-ti\u1ebfp-c\u1eadn-ra-sao\"><strong>If you were asked to design a distributed data processing system, how would you approach it?<\/strong><\/h4>\n\n\n\n<p><strong>What the interviewer wants to assess:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Your systems thinking and how you approach a problem from the big picture down to the details.<\/li>\n\n\n\n<li>Your practical knowledge of the main components of the big data ecosystem.<\/li>\n\n\n\n<li>Your ability to weigh correctness, performance, scalability, and reliability against each other.<\/li>\n<\/ul>\n\n\n\n
<p><strong>Suggested answer:<\/strong><\/p>\n\n\n\n<p>If I were asked to design a distributed data processing system, I would start by clarifying the problem from both the business and the data perspective. I would ask: does the system need batch or streaming processing? Where does the data come from, does it need real-time processing, and how much data arrives per day? What latency is acceptable? And who will consume the output: data analysts, a recommendation system, or a BI dashboard?<\/p>\n\n\n\n<p>Once the requirements are clear, I would split the system into these main layers:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Ingestion layer: If data arrives continuously from many sources (logs, sensors, event tracking&#8230;), I would use Kafka to collect it via pub\/sub, which scales well and protects against data loss.<\/li>\n\n\n\n<li>Storage layer: For batch data, I could choose S3 or HDFS. 
For streaming workloads or when fast responses are needed, I prefer Redis or a NoSQL database such as Cassandra.<\/li>\n\n\n\n<li>Processing layer:\n<ul class=\"wp-block-list\">\n<li>For batch processing, I use Spark because of its strong distributed processing capabilities and its support for both batch and streaming.<\/li>\n\n\n\n<li>If the requirements call for real-time processing with lower latency, I would consider Flink or Kafka Streams, especially when a lot of state has to be maintained (stateful streaming).<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Serving layer: Depending on the output goal, the processed data is loaded into BigQuery or Snowflake, or exposed through an API for other systems to consume.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-trong-m\u1ed9t-d\u1ef1-an-b\u1ea1n-da-t\u1eebng-t\u1ed1i-\u01b0u-pipeline-ra-sao-d\u1ec3-d\u1ea1t-hi\u1ec7u-qu\u1ea3-t\u1ed1t-h\u01a1n\"><strong>In a past project, how did you optimize a pipeline to make it more efficient?<\/strong><\/h4>\n\n\n\n<p><strong>What the interviewer wants to assess:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Your ability to identify bottlenecks and improve data workflows.<\/li>\n\n\n\n<li>A systematic optimization mindset, not just \u201cit runs, so it's done\u201d.<\/li>\n\n\n\n<li>Hands-on technical skills: which tools do you use, and where do you optimize (ingestion, transform, or load)?<\/li>\n\n\n\n<li>Trade-off thinking: speed, cost, and accuracy.<\/li>\n<\/ul>\n\n\n\n<p><strong>Suggested answer:<\/strong><\/p>\n\n\n\n<p>In a recent project, I was asked to optimize an ETL pipeline that processed user behavior data for a recommendation system. The original pipeline was written in plain Python, ran daily as a single cron job, processed data from 3 different sources, and stored the results in PostgreSQL. However, a run took more than 2 hours and frequently failed with timeouts or wrote duplicate records to the database.<\/p>\n\n\n\n<p>I started by breaking the process into smaller steps and timing each one to find the bottleneck. 
The results showed that most of the time was spent processing JSON data and writing records to the database one at a time.<\/p>\n\n\n\n<p>I made the following changes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Used Pandas to process the data in batches instead of a row-by-row for-loop.<\/li>\n\n\n\n<li>Switched from sequential writes to batch inserts with SQLAlchemy or the COPY command to reduce the number of transactions.<\/li>\n\n\n\n<li>Split the pipeline into small tasks orchestrated with Airflow, which made it easier to isolate failures and retry individual steps.<\/li>\n\n\n\n<li>Added checkpoints and standardized logging, which made it easier to monitor performance and debug incidents.<\/li>\n<\/ul>\n\n\n\n<p>After the optimization, processing time dropped from more than 2 hours to about 25 minutes. At the same time, the job failure rate fell noticeably thanks to Airflow's per-task retries. 
For the longer term, I also proposed moving intermediate storage to S3 and using Athena for fast queries, which would reduce the load on PostgreSQL as the data grows.<\/p>\n\n\n\n<p>The lesson I took away: instead of focusing on optimizing individual lines of code, look at the pipeline as a whole, identify the real bottleneck, and choose the right tool for each task.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-theo-b\u1ea1n-ba-b\u01b0\u1edbc-quan-tr\u1ecdng-nh\u1ea5t-khi-xay-d\u1ef1ng-gi\u1ea3i-phap-big-data-la-gi\"><strong>In your opinion, what are the three most important steps when building a Big Data solution?<\/strong><\/h4>\n\n\n\n<p><strong>What the interviewer wants to assess:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>How you think about data systems at scale, beyond a few tables or files.<\/li>\n\n\n\n<li>How well you understand the workflow, from ingestion to analysis, in a Big Data environment.<\/li>\n\n\n\n<li>Your ability to set the right priorities.<\/li>\n<\/ul>\n\n\n\n<p><strong>Suggested answer:<\/strong><\/p>\n\n\n\n<p>In my view, building a Big Data solution involves many steps, but if I had to pick the 3 most important, I would prioritize them as follows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Design a suitable ingestion and storage architecture from the start<\/strong>: This is the foundation. Choosing the wrong ingestion mode (batch\/stream) or a suboptimal storage layout (file format, partitioning) can cause bottlenecks once the system scales up.<\/li>\n\n\n\n<li><strong>Choose a distributed processing tool that fits the goal<\/strong>: Spark is not always the answer. If I need low-latency streaming, I would consider Flink or Kafka Streams. If I am processing large batches without real-time requirements, Spark is a strong choice.<\/li>\n\n\n\n<li><strong>Ensure scalability and observability<\/strong>: A good Big Data system does not just \u201crun\u201d; it must be easy to scale, debug, and monitor. 
That is why I always prioritize integrating centralized logging, alerting, and dashboards that track performance metrics, especially when processing real-time data.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-theo-kinh-nghi\u1ec7m-c\u1ee7a-b\u1ea1n-cac-r\u1ee7i-ro-b\u1ea3o-m\u1eadt-th\u01b0\u1eddng-g\u1eb7p-khi-tri\u1ec3n-khai-big-data-la-gi\"><strong>In your experience, what are the common security risks when deploying Big Data?<\/strong><\/h4>\n\n\n\n<p><strong>What the interviewer wants to assess:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Your awareness of security, an aspect that is often overlooked in data systems.<\/li>\n\n\n\n<li>Your hands-on experience deploying or managing large data pipelines.<\/li>\n\n\n\n<li>Your ability to anticipate and prevent data incidents.<\/li>\n<\/ul>\n\n\n\n<p><strong>Suggested answer:<\/strong><\/p>\n\n\n\n<p>In my experience deploying Big Data systems, the following security risks are common, and if they are not well controlled they can have serious consequences for both legal compliance and the company's reputation:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Sensitive data leaks due to missing access control or encryption: Many Big Data systems ingest directly from multiple sources such as user logs, IoT devices, or financial data. Without role-based access control (RBAC), employees can easily end up with access to information they should not see. In addition, failing to encrypt data at rest or in transit is a major vulnerability, especially for systems that handle PII or other sensitive data.<\/li>\n\n\n\n<li>Insufficient monitoring and auditing of data access: A commonly overlooked gap is the lack of logging, monitoring, and an audit trail for operations on the data.<\/li>\n\n\n\n<li>Over-reliance on tools while ignoring basic security principles: Many engineering teams place too much trust in systems such as Spark, Kafka, or Hadoop that come \u201cpre-installed\u201d and forget to review their security configuration. 
For example, a Kafka cluster without SSL\/TLS enabled, or a Hadoop cluster with default ports open to any IP address, can be exploited from outside.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-t\u1ed5ng-k\u1ebft-cau-h\u1ecfi-ph\u1ecfng-v\u1ea5n-big-data-engineer\"><span class=\"ez-toc-section\" id=\"Tong_ket_cau_hoi_phong_van_Big_Data_Engineer\"><\/span><strong>Summary of Big Data Engineer interview questions<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Big Data Engineer is one of the key roles in today's data era. To get through the interview rounds, candidates need not only solid technical knowledge of the Hadoop ecosystem, Apache Spark, and streaming data processing systems, but also systems thinking, optimization skills, and hands-on experience processing data at scale.<\/p>\n\n\n\n<p>The Big Data Engineer interview questions in this article help you review core knowledge and also practice critical thinking, problem solving, and presenting your work experience convincingly. 
Take the time to practice, prepare concrete examples from projects you have worked on, and confidently share how you approach and solve real-world challenges.<\/p>\n\n\n\n<p>Good luck, and may you soon land the Big Data Engineer role you are aiming for!<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>Read more: <strong><a href=\"https:\/\/itviec.com\/blog\/big-data-engineer-roadmap\/\" target=\"_blank\" rel=\"noreferrer noopener\">Big Data Engineer Roadmap: L\u1ed9 tr\u00ecnh h\u1ecdc t\u1eadp v\u00e0 ph\u00e1t tri\u1ec3n t\u1eeb A-Z<\/a><\/strong><\/em><\/p>\n<\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Big Data \u0111ang tr\u1edf th\u00e0nh m\u1ed9t xu h\u01b0\u1edbng kh\u00f4ng th\u1ec3 thi\u1ebfu \u0111\u1ed1i v\u1edbi doanh nghi\u1ec7p trong k\u1ef7 nguy\u00ean d\u1eef li\u1ec7u. 
N\u1ebfu b\u1ea1n s\u1eafp tham gia ph\u1ecfng v\u1ea5n cho v\u1ecb tr\u00ed Big Data Engineer, b\u00e0i vi\u1ebft sau \u0111\u00e2y s\u1ebd t\u1ed5ng h\u1ee3p b\u1ed9 c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Big Data Engineer nh\u1eb1m gi\u00fap b\u1ea1n chu\u1ea9n b\u1ecb k\u1ef9 l\u01b0\u1ee1ng [&hellip;]<\/p>\n","protected":false},"author":247,"featured_media":90267,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_gspb_post_css":"","footnotes":""},"categories":[10345,105,94],"tags":[],"class_list":["post-90220","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-analyst-data-engineer","category-phong-van-it","category-su-nghiep-it"],"blocksy_meta":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.8 (Yoast SEO v27.7) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Top 30+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Big Data Engineer ph\u1ed5 bi\u1ebfn - ITviec Blog<\/title>\n<meta name=\"description\" content=\"Tham kh\u1ea3o 30+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Big Data Engineer ph\u1ed5 bi\u1ebfn t\u1eeb ki\u1ebfn th\u1ee9c Hadoop, Spark, x\u1eed l\u00fd d\u1eef li\u1ec7u v\u00e0 kinh nghi\u1ec7m th\u1ef1c chi\u1ebfn.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-big-data-engineer\/\" \/>\n<meta property=\"og:locale\" content=\"vi_VN\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Top 30+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Big Data Engineer ph\u1ed5 bi\u1ebfn\" \/>\n<meta property=\"og:description\" content=\"Big Data \u0111ang tr\u1edf th\u00e0nh m\u1ed9t xu h\u01b0\u1edbng kh\u00f4ng th\u1ec3 thi\u1ebfu \u0111\u1ed1i v\u1edbi doanh nghi\u1ec7p trong k\u1ef7 nguy\u00ean 
d\u1eef li\u1ec7u. N\u1ebfu b\u1ea1n s\u1eafp tham gia ph\u1ecfng v\u1ea5n cho v\u1ecb tr\u00ed Big Data\" \/>\n<meta property=\"og:url\" content=\"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-big-data-engineer\/\" \/>\n<meta property=\"og:site_name\" content=\"ITviec Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/ITviec\" \/>\n<meta property=\"article:published_time\" content=\"2025-07-31T15:09:30+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-07-31T15:09:34+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2025\/07\/cau-hoi-phong-van-big-data-engineer-scaled.png\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1347\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Th\u1ee7y C\u00fac\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@ITviec\" \/>\n<meta name=\"twitter:site\" content=\"@ITviec\" \/>\n<meta name=\"twitter:label1\" content=\"\u0110\u01b0\u1ee3c vi\u1ebft b\u1edfi\" \/>\n\t<meta name=\"twitter:data1\" content=\"Th\u1ee7y C\u00fac\" \/>\n\t<meta name=\"twitter:label2\" content=\"\u01af\u1edbc t\u00ednh th\u1eddi gian \u0111\u1ecdc\" \/>\n\t<meta name=\"twitter:data2\" content=\"52 ph\u00fat\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Top 30+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Big Data Engineer ph\u1ed5 bi\u1ebfn - ITviec Blog","description":"Tham kh\u1ea3o 30+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Big Data Engineer ph\u1ed5 bi\u1ebfn t\u1eeb ki\u1ebfn th\u1ee9c Hadoop, Spark, x\u1eed l\u00fd d\u1eef li\u1ec7u v\u00e0 kinh nghi\u1ec7m th\u1ef1c chi\u1ebfn.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-big-data-engineer\/","og_locale":"vi_VN","og_type":"article","og_title":"Top 30+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Big Data Engineer ph\u1ed5 bi\u1ebfn","og_description":"Big Data \u0111ang tr\u1edf th\u00e0nh m\u1ed9t xu h\u01b0\u1edbng kh\u00f4ng th\u1ec3 thi\u1ebfu \u0111\u1ed1i v\u1edbi doanh nghi\u1ec7p trong k\u1ef7 nguy\u00ean d\u1eef li\u1ec7u. N\u1ebfu b\u1ea1n s\u1eafp tham gia ph\u1ecfng v\u1ea5n cho v\u1ecb tr\u00ed Big Data","og_url":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-big-data-engineer\/","og_site_name":"ITviec Blog","article_publisher":"https:\/\/www.facebook.com\/ITviec","article_published_time":"2025-07-31T15:09:30+00:00","article_modified_time":"2025-07-31T15:09:34+00:00","og_image":[{"width":2560,"height":1347,"url":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2025\/07\/cau-hoi-phong-van-big-data-engineer-scaled.png","type":"image\/png"}],"author":"Th\u1ee7y C\u00fac","twitter_card":"summary_large_image","twitter_creator":"@ITviec","twitter_site":"@ITviec","twitter_misc":{"\u0110\u01b0\u1ee3c vi\u1ebft b\u1edfi":"Th\u1ee7y C\u00fac","\u01af\u1edbc t\u00ednh th\u1eddi gian \u0111\u1ecdc":"52 
ph\u00fat"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-big-data-engineer\/#article","isPartOf":{"@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-big-data-engineer\/"},"author":{"name":"Th\u1ee7y C\u00fac","@id":"https:\/\/itviec.com\/blog\/#\/schema\/person\/c8886a21239e42a8518930575eb56e01"},"headline":"Top 30+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Big Data Engineer ph\u1ed5 bi\u1ebfn","datePublished":"2025-07-31T15:09:30+00:00","dateModified":"2025-07-31T15:09:34+00:00","mainEntityOfPage":{"@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-big-data-engineer\/"},"wordCount":14170,"publisher":{"@id":"https:\/\/itviec.com\/blog\/#organization"},"image":{"@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-big-data-engineer\/#primaryimage"},"thumbnailUrl":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2025\/07\/cau-hoi-phong-van-big-data-engineer-scaled.png","articleSection":["Data Analyst \/ Data Engineer","Ph\u1ecfng v\u1ea5n IT","S\u1ef1 nghi\u1ec7p IT"],"inLanguage":"vi"},{"@type":"WebPage","@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-big-data-engineer\/","url":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-big-data-engineer\/","name":"Top 30+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Big Data Engineer ph\u1ed5 bi\u1ebfn - ITviec Blog","isPartOf":{"@id":"https:\/\/itviec.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-big-data-engineer\/#primaryimage"},"image":{"@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-big-data-engineer\/#primaryimage"},"thumbnailUrl":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2025\/07\/cau-hoi-phong-van-big-data-engineer-scaled.png","datePublished":"2025-07-31T15:09:30+00:00","dateModified":"2025-07-31T15:09:34+00:00","description":"Tham kh\u1ea3o 30+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Big Data Engineer ph\u1ed5 bi\u1ebfn t\u1eeb ki\u1ebfn th\u1ee9c Hadoop, Spark, x\u1eed l\u00fd 
d\u1eef li\u1ec7u v\u00e0 kinh nghi\u1ec7m th\u1ef1c chi\u1ebfn.","breadcrumb":{"@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-big-data-engineer\/#breadcrumb"},"inLanguage":"vi","potentialAction":[{"@type":"ReadAction","target":["https:\/\/itviec.com\/blog\/cau-hoi-phong-van-big-data-engineer\/"]}]},{"@type":"ImageObject","inLanguage":"vi","@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-big-data-engineer\/#primaryimage","url":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2025\/07\/cau-hoi-phong-van-big-data-engineer-scaled.png","contentUrl":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2025\/07\/cau-hoi-phong-van-big-data-engineer-scaled.png","width":800,"height":421,"caption":"c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n big data engineer - itviec blog"},{"@type":"BreadcrumbList","@id":"https:\/\/itviec.com\/blog\/cau-hoi-phong-van-big-data-engineer\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"S\u1ef1 nghi\u1ec7p IT","item":"https:\/\/itviec.com\/blog\/su-nghiep-it\/"},{"@type":"ListItem","position":2,"name":"Top 30+ c\u00e2u h\u1ecfi ph\u1ecfng v\u1ea5n Big Data Engineer ph\u1ed5 bi\u1ebfn"}]},{"@type":"WebSite","@id":"https:\/\/itviec.com\/blog\/#website","url":"https:\/\/itviec.com\/blog\/","name":"ITviec Blog","description":"IT Jobs &amp; People in 
Vietnam","publisher":{"@id":"https:\/\/itviec.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/itviec.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"vi"},{"@type":"Organization","@id":"https:\/\/itviec.com\/blog\/#organization","name":"ITviec","url":"https:\/\/itviec.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"vi","@id":"https:\/\/itviec.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2018\/12\/itviec-black-square-facebook.png","contentUrl":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2018\/12\/itviec-black-square-facebook.png","width":1800,"height":1800,"caption":"ITviec"},"image":{"@id":"https:\/\/itviec.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/ITviec","https:\/\/x.com\/ITviec","https:\/\/www.linkedin.com\/company\/itviec","https:\/\/www.youtube.com\/channel\/UCYthAQ3bcGr57M_ag5gHDvQ"]},{"@type":"Person","@id":"https:\/\/itviec.com\/blog\/#\/schema\/person\/c8886a21239e42a8518930575eb56e01","name":"Th\u1ee7y C\u00fac","image":{"@type":"ImageObject","inLanguage":"vi","@id":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2025\/07\/dvthuycuc_ava-scaled-e1751357915570-200x185.jpg","url":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2025\/07\/dvthuycuc_ava-scaled-e1751357915570-200x185.jpg","contentUrl":"https:\/\/itviec.com\/blog\/wp-content\/uploads\/2025\/07\/dvthuycuc_ava-scaled-e1751357915570-200x185.jpg","caption":"Th\u1ee7y 
C\u00fac"},"url":"https:\/\/itviec.com\/blog\/author\/thuy-cuc\/"}]}},"_links":{"self":[{"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/posts\/90220","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/users\/247"}],"replies":[{"embeddable":true,"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/comments?post=90220"}],"version-history":[{"count":4,"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/posts\/90220\/revisions"}],"predecessor-version":[{"id":90268,"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/posts\/90220\/revisions\/90268"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/media\/90267"}],"wp:attachment":[{"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/media?parent=90220"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/categories?post=90220"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/itviec.com\/blog\/wp-json\/wp\/v2\/tags?post=90220"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}