A Comprehensive Review on Data Science Frameworks for Big Data Analytics

Authors

  • Hassan Raza, Washington University of Science and Technology, USA
  • Tsendayush Erdenetsogt, University of the Potomac, USA
  • A Singh, University of North America (UoNA), USA
  • Mazhar Farooq, Southern New Hampshire University
  • Muhammad Mohsin Kabeer, Gannon University
  • Muhammad Shahrukh Aslam, Concordia University, USA

DOI:

https://doi.org/10.62671/perfect.v3i1.217

Keywords:

Big Data, Data Science Frameworks, Hadoop, Spark, Real-Time Analytics

Abstract

Big data analytics has become essential for deriving insights from large and complex data sets across industries. This review discusses major data science frameworks, including Apache Hadoop, Spark, Flink, and Storm, covering their architectures, capabilities, and relative advantages for batch and real-time processing. It also presents the main challenges that can affect framework efficiency, including scalability, latency, data heterogeneity, security, and operational complexity. Finally, emerging trends such as the adoption of AI, cloud-native architectures, real-time streaming, and intelligent automation are discussed to show how the field is changing. The review offers an in-depth account of big data frameworks and how they support effective analytics.

Published

2026-01-06

How to Cite

Raza, H., Erdenetsogt, T., Singh, A., Farooq, M., Kabeer, M. M., & Aslam, M. S. (2026). A Comprehensive Review on Data Science Frameworks for Big Data Analytics. PERFECT: Journal of Smart Algorithms, 3(1), 1-10. https://doi.org/10.62671/perfect.v3i1.217
