Multi-Class Classification of Ship Images Using Attention Mechanisms and a Hybrid ViT-ResNet Architecture
Date
2025
Abstract
This thesis develops a hybrid model based on Vision Transformer (ViT) and ResNetRS50 for multi-class classification of ship images. ViT extracts high-level semantic information, while ResNetRS50 captures low- and mid-level spatial features; the two branches are combined through attention mechanisms and a Gated Fusion layer. Training uses MixUp and CutMix data augmentation, Focal Loss combined with a knowledge distillation loss, the OneCycleLR scheduler, automatic mixed precision (AMP), and an exponential moving average (EMA) of the model weights. Experiments on a dataset of eight ship classes show that the proposed architecture outperforms single-stream CNN and ViT models in both accuracy and F1-score. The results indicate that hybrid architectures and attention-based fusion strategies offer an effective solution to the ship classification problem.
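The abstract names a Gated Fusion layer, Focal Loss, and an EMA of the model weights without giving their exact formulations. The sketch below illustrates one common form of each in plain Python, purely for orientation; it is not the thesis implementation. The function names, the gate parameterization (a sigmoid gate computed from the concatenated ViT and CNN feature vectors), and the default values `gamma=2.0` and `decay=0.999` are all assumptions.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(vit_feat: List[float], cnn_feat: List[float],
                 gate_w: List[List[float]], gate_b: List[float]) -> List[float]:
    """Assumed gate form: g = sigmoid(W [v; c] + b), fused = g*v + (1-g)*c.

    gate_w has one row per feature dimension; each row has length 2*d.
    """
    if len(vit_feat) != len(cnn_feat):
        raise ValueError("both branches must have the same feature size")
    concat = vit_feat + cnn_feat  # list concatenation, length 2*d
    fused = []
    for i, (v, c) in enumerate(zip(vit_feat, cnn_feat)):
        z = sum(w * x for w, x in zip(gate_w[i], concat)) + gate_b[i]
        g = sigmoid(z)
        fused.append(g * v + (1.0 - g) * c)
    return fused


def softmax(logits: List[float]) -> List[float]:
    """Softmax with max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits: List[float], target: int, gamma: float = 2.0) -> float:
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    With gamma = 0 this reduces to ordinary cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def ema_update(ema_weights: List[float], weights: List[float],
               decay: float = 0.999) -> List[float]:
    """One EMA step: ema <- decay * ema + (1 - decay) * current weights."""
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, weights)]
```

With all gate weights and biases set to zero the gate is 0.5 everywhere, so the fused vector is the plain average of the two branches; a trained gate would instead weight each feature dimension toward whichever branch is more informative.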
Keywords
Computer Engineering and Computer Science and Control, Classification, Artificial Intelligence
End Page
78
Sustainable Development Goals
9: Industry, Innovation and Infrastructure
