ANALYSIS OF THE EFFICIENCY OF GPT-2 MODEL APPLICATION WITH ADAPTED TRANSFER LEARNING ON VARIOUS HARDWARE ARCHITECTURES
DOI: https://doi.org/10.61837/mbuir020124174d
Keywords: Adaptive Transfer Learning, GPT-2 Efficiency, GPU Architectures, Hardware Impact, Performance Comparison, AI Optimization, Future AI Systems
Abstract
This paper analyses the efficiency of implementing the GPT-2 model, one of the advanced artificial intelligence models for text generation, through adapted transfer learning, with a particular focus on the use of various GPU architectures. The primary goal of this research is to examine the impact of adapted transfer learning on the performance of the GPT-2 model across different GPU architectures, assessing how differences in GPU capability affect the model's efficiency. The work relies on an experimental method to evaluate and compare the model's performance in terms of accuracy, processing speed, and energy efficiency on each of the tested platforms. Special attention is given to analysing how characteristics of the hardware architecture, such as processing power and memory capacity, affect the efficiency of the transfer learning process. This study provides important insights into the potential for optimizing the GPT-2 model for specific hardware platforms, which is crucial for its application in a wide range of real-world scenarios. The results offer valuable information for researchers in the fields of artificial intelligence and machine learning, providing a foundation for further development and improvement of AI technologies.
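For illustration only, the following minimal sketch (not the authors' actual experimental code) shows one way such a measurement could be instrumented in Python with PyTorch and the Hugging Face transformers library: a few fine-tuning steps of GPT-2 on a placeholder corpus, timed to report token throughput and peak GPU memory. The model name, corpus, batch size, and step count are assumptions chosen for brevity.

```python
# Hedged sketch of a GPT-2 transfer-learning benchmark on a single GPU.
# Placeholder data and hyperparameters; energy measurement would require
# additional tooling (e.g. vendor utilities) and is omitted here.
import time
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 defines no pad token
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)

# Placeholder corpus standing in for the task-specific fine-tuning data.
texts = ["Example sentence for adapted transfer learning."] * 64
batch = tokenizer(texts, return_tensors="pt", padding=True,
                  truncation=True, max_length=64).to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

if device.type == "cuda":
    torch.cuda.reset_peak_memory_stats(device)

model.train()
steps = 10                                         # a few steps suffice for a throughput sample
start = time.time()
for _ in range(steps):
    # Simplification: labels reuse input_ids directly; a real setup would
    # mask padding positions with -100 so they do not contribute to the loss.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
if device.type == "cuda":
    torch.cuda.synchronize(device)
elapsed = time.time() - start

tokens_processed = steps * batch["input_ids"].numel()
print(f"throughput: {tokens_processed / elapsed:.0f} tokens/s")
if device.type == "cuda":
    print(f"peak GPU memory: {torch.cuda.max_memory_allocated(device) / 1e9:.2f} GB")
```

Running the same script on different GPU architectures gives directly comparable throughput and memory figures, which is the kind of cross-platform comparison the study describes.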