Multi-Class Ship Image Classification Using a Hybrid ViT-ResNet Architecture with Attention Mechanisms and Semantic Gated Fusion

Ergün, Berkay

dc.contributor.advisor	Arslan, Serdar
dc.contributor.author	Ergün, Berkay
dc.date.accessioned	2026-01-05T15:15:55Z
dc.date.available	2026-01-05T15:15:55Z
dc.date.issued	2025
dc.description.abstract	Bu tezde, gemi görüntülerinin çok sınıflı sınıflandırılması için Vision Transformer (ViT) ve ResNetRS50 tabanlı hibrit bir model geliştirilmiştir. ViT yüksek seviyeli anlamsal bilgileri, ResNetRS50 ise düşük ve orta seviyeli mekânsal özellikleri çıkarmakta; bu iki yapı, dikkat (attention) mekanizmaları ve Gated Fusion katmanı ile birleştirilmektedir. Eğitim sürecinde MixUp ve CutMix veri artırma yöntemleri, Focal Loss ile bilgi aktarımı (distillation) kaybı, OneCycleLR zamanlayıcı, otomatik karma hassasiyet (AMP) ve model ağırlıklarının üssel hareketli ortalaması (EMA) kullanılmıştır. Sekiz gemi sınıfından oluşan veri kümesi üzerinde yapılan deneyler, önerilen mimarinin hem doğruluk hem F1 skoru açısından tek başlı CNN veya ViT modellerinden daha yüksek performans gösterdiğini ortaya koymuştur. Sonuçlar, hibrit mimariler ve dikkat tabanlı füzyon stratejilerinin gemi sınıflandırma problemlerinde etkin bir çözüm sunduğunu göstermektedir.
dc.description.abstract	In this thesis, a hybrid model based on Vision Transformer (ViT) and ResNetRS50 is developed for multi-class classification of ship images. While ViT extracts high-level semantic information, ResNetRS50 captures low- and mid-level spatial features; these two structures are integrated through attention mechanisms and a Gated Fusion layer. During training, advanced techniques such as MixUp and CutMix data augmentation, Focal Loss combined with knowledge distillation loss, the OneCycleLR scheduler, automatic mixed precision (AMP), and exponential moving average (EMA) of model weights are employed. Experiments conducted on a dataset consisting of eight ship classes demonstrate that the proposed architecture outperforms single-stream CNN and ViT models in terms of both accuracy and F1-score. The results indicate that hybrid architectures and attention-based fusion strategies provide an effective solution to the ship classification problem.	en_US
dc.identifier.uri	https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=CtwiQkYvArAb95Ufpfs_vhNXmzEjxmF6GJcfxVifd8dAVWAMDb5AIAJMaN6tbl_t
dc.identifier.uri	https://hdl.handle.net/20.500.12416/15834
dc.language.iso	en
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol
dc.subject	Sınıflandırma
dc.subject	Yapay Zeka
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.subject	Classification	en_US
dc.subject	Artificial Intelligence	en_US
dc.title	Attention Mekanizmaları ve Hibrit ViT-ResNet Mimarisi ile Gemi Görüntülerinin Çok Sınıflı Sınıflandırılması
dc.title	Multi-Class Ship Image Classification Using a Hybrid ViT-ResNet Architecture with Attention Mechanisms and Semantic Gated Fusion	en_US
dc.type	Master Thesis	en_US
dspace.entity.type	Publication
gdc.coar.type	text::thesis::master thesis
gdc.description.department	Institute of Science / Department of Computer Engineering / Computer Engineering Division
gdc.description.endpage	78
gdc.identifier.yoktezid	982536
gdc.virtual.author	Arslan, Serdar
relation.isAuthorOfPublication	ee02ccda-1b5e-4bba-b8b3-ece13ce2ec47
relation.isAuthorOfPublication.latestForDiscovery	ee02ccda-1b5e-4bba-b8b3-ece13ce2ec47
relation.isOrgUnitOfPublication	0b9123e4-4136-493b-9ffd-be856af2cdb1
relation.isOrgUnitOfPublication	12489df3-847d-4936-8339-f3d38607992f
relation.isOrgUnitOfPublication	43797d4e-4177-4b74-bd9b-38623b8aeefa
relation.isOrgUnitOfPublication.latestForDiscovery	0b9123e4-4136-493b-9ffd-be856af2cdb1

Collections

Computer Engineering Department Theses
