From Static to Contextual: A Survey of Embedding Advances in NLP

Hussein Alkaabi; Ali Kadhim Jasim; Ali Darroudi

doi:10.62671/perfect.v2i2.77

Authors

Hussein Alkaabi, Ministry of Education Iraq, General Direction of Vocational Education, Al-Najaf, 54001, Iraq
Ali Kadhim Jasim, Imam Ja‘far al Sadiq University – Maysan Branch, Computer Engineering Department, Iraq
Ali Darroudi, Department of Electrical Engineering, Sadjad University of Technology, Mashhad, Iran

Keywords:

BERT, Deep Learning, Machine Learning, Natural Language Processing (NLP), Word Embedding

Abstract

Embedding techniques have been a cornerstone of Natural Language Processing (NLP), enabling machines to represent textual data in a form that captures semantic and syntactic relationships. Over the years, the field has witnessed a significant evolution—from static word embeddings, such as Word2Vec and GloVe, which represent words as fixed vectors, to dynamic, contextualized embeddings like BERT and GPT, which generate word representations based on their surrounding context. This survey provides a comprehensive overview of embedding techniques, tracing their development from early methods to state-of-the-art approaches. We discuss the strengths and limitations of each paradigm, their applications across various NLP tasks, and the challenges they address, such as polysemy and out-of-vocabulary words. Furthermore, we highlight emerging trends, including multimodal embeddings, domain-specific representations, and efforts to mitigate embedding bias. By synthesizing the advancements in this rapidly evolving field, this paper aims to serve as a valuable resource for researchers and practitioners while identifying open challenges and future directions for embedding research in NLP.
