Computer Engineering Department Publication Collection
Permanent URI for this collection: https://hdl.handle.net/20.500.12416/253
8 results
Search Results

Conference Object; Citation - Scopus: 1
Comparative Analysis of Machine Learning Techniques Using Customer Feedback Reviews of Oil and Gas Companies (Association for Computing Machinery, 2020)
Alrawi, L.N.; Ashour, O.I.A.
Sentiment analysis is the process of computationally identifying and categorizing opinions from a piece of text to determine whether the writer's attitude towards a particular topic, product or service is positive, negative or neutral. In this study, Machine Learning techniques are used to perform sentiment analysis on Oil and Gas customer feedback data. We present a comparison of different classification algorithms used for opinion mining, including Support Vector Machine (SVM), Naïve Bayes (NB), Instance Based Learning (IB3), Random Forest (RF), Partial Decision trees (PART), and Logit Boost (LB). Many studies have been performed on sentiment analysis in different sectors, but research into Oil and Gas customer feedback has been limited. Therefore, we have targeted a largely unexplored sector, namely the Petroleum sector, where companies express their opinions towards specific products or services. Waikato Environment for Knowledge Analysis (WEKA) is used to obtain the experimental results. The WEKA environment is open source software comprising a collection of machine learning algorithms to solve data mining problems. The main aim of this study is to evaluate the efficiency of the above-mentioned classifiers in terms of Precision, Recall, F-Measure and Accuracy. The findings of the comparison analysis indicate that the Naïve Bayes classifier gives the best Accuracy of all classifiers. The small dataset could be considered a limitation of our study, owing to the difficulty of obtaining more data at the time of the research. However, this research can help researchers decide which algorithm to use to solve their data mining problems.
© 2020 ACM.

Book Part; Citation - Scopus: 1
Text-Based Fake News Detection Via Machine Learning (Springer Science and Business Media Deutschland GmbH, 2021)
Genç, B.; Sever, H.; Mertoğlu, U.
The nature of information literacy is changing as people incline more towards using digital media to consume content. Consequently, this easier way of consuming information has sparked off a challenge called “Fake News”. One of the risky effects of this notorious term is to influence people’s views of the world, as in the recent example of coronavirus misinformation that is flooding the internet. Nowadays, it seems the world needs “information hygiene” more than anything. Yet real-world solutions in practice are not qualified to determine the verifiability of the information in circulation. Our work presents an automated, adaptable solution to detect fake news in practice. Our approach proposes a set of carefully selected features combined with word embeddings to predict fake or valid texts. We evaluated our proposed model in terms of efficacy through intensive experimentation. Additionally, we present an analysis linked with linguistic features for detecting fake and valid news content. An overview of text-based fake news detection guidance derived from experiments, including promising results of our work, is also presented. © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Conference Object; Citation - WoS: 40; Citation - Scopus: 77
Malware Classification Using Deep Learning Methods (Assoc Computing Machinery, 2018)
Dogdu, Erdogan; Cakir, Bugra
Malware, short for Malicious Software, is growing continuously in numbers and sophistication as our digital world continues to grow. It is a very serious problem, and many efforts are devoted to malware detection in today's cybersecurity world. Many machine learning algorithms have been used for the automatic detection of malware in recent years.
Most recently, deep learning is being used with better performance. Deep learning models are shown to work much better in the analysis of long sequences of system calls. In this paper, a shallow deep learning-based feature extraction method (word2vec) is used for representing any given malware based on its opcodes. A Gradient Boosting algorithm is used for the classification task. Then, k-fold cross-validation is used to validate the model performance without sacrificing a validation split. Evaluation results show up to 96% accuracy with limited sample data.

Article
The impact of feature types, classifiers, and data balancing techniques on software vulnerability prediction models (2019)
Kaya, Aydın; Keçeli, Ali Seydi; Çatal, Çağatay; Tekinerdoğan, Bedir
Software vulnerabilities form an increasing security risk for software systems, as they might be exploited to attack and harm the system. Some security vulnerabilities can be detected by static analysis tools and penetration testing, but these usually suffer from relatively high false positive rates. Software vulnerability prediction (SVP) models can be used to categorize software components into vulnerable and neutral components before the software testing phase and thereby increase the efficiency and effectiveness of the overall verification process. The performance of a vulnerability prediction model is usually affected by the adopted classification algorithm, the adopted features, and the data balancing approaches. In this study, we empirically investigate the effect of these factors on the performance of SVP models. Our experiments cover four data balancing methods, seven classification algorithms, and three feature types. The experimental results show that data balancing methods are effective for highly unbalanced datasets, text-based features are more useful, and ensemble-based classifiers provide mostly better results.
For smaller datasets, the Random Forest algorithm provides the best performance, and for larger datasets, RusboostTree achieves better performance.

Article; Citation - WoS: 39; Citation - Scopus: 52
Development of a Recurrent Neural Networks-Based Calving Prediction Model Using Activity and Behavioral Data (Elsevier Sci Ltd, 2020)
Keceli, Ali Seydi; Catal, Cagatay; Kaya, Aydin; Tekinerdogan, Bedir
Accurate prediction of calving time in dairy cattle is crucial for dairy herd management to reduce risks like dystocia and pain. Prediction of calving using traditional, manual observation, such as observing breeding records and visual cues, however, is a complicated and error-prone task whereby even experts can fail to provide a proper prediction. Moreover, manual prediction does not scale for larger farms and soon becomes time-consuming, inefficient, and costly. In this context, automated solutions are considered promising for providing both better and more efficient predictions, thereby supporting the health of the dairy cows and reducing unnecessary overhead for farmers. Although the first automated solutions appear to have mainly focused on statistical methods, machine learning approaches are now increasingly being considered as a feasible and promising approach for accurate prediction of calving. In this context, the objective of this study is to develop machine learning-based prediction models that provide higher performance compared to the existing tools, methods, and techniques. This study shows that the calving of cattle can be predicted by using several cattle behaviors, behavioral monitoring sensors, and machine learning models. The Bi-directional Long Short-Term Memory (Bi-LSTM) method has been applied for the prediction of the calving day, and the RusBoosted Tree classifier has been used to predict the remaining 8 h before calving.
The experimental results demonstrated that Bi-LSTM provides better performance compared to the LSTM algorithm in terms of classification accuracy, while the RusBoosted Tree algorithm predicts the remaining 8 h accurately before calving. Furthermore, Recurrent Neural Networks provide high performance for the prediction of calving day.

Conference Object; Citation - WoS: 7
Phishing E-Mail Detection by Using Deep Learning Algorithms (Assoc Computing Machinery, 2018)
Hassanpour, Reza; Dogdu, Erdogan; Choupani, Roya; Goker, Onur; Nazli, Nazli

Conference Object; Citation - WoS: 2; Citation - Scopus: 2
Clinical Decision Support Systems: From the Perspective of Small and Imbalanced Data Set (Ios Press, 2019)
Akcapinar Sezer, Ebru; Sever, Hayri; Par, Oznur Esra
Clinical decision support systems are data analysis software that supports health professionals' decision-making process to reach their ultimate outcome, taking into account patient information. The need for decision support systems cannot be denied, because most activities in the field of health care fall within the decision-making process. Decision support systems used for diagnosis are designed on a per-disease basis due to the complexity of diseases, symptoms, and disease-symptom relationships. In the design and implementation of clinical decision support systems, mathematical modeling, pattern recognition and statistical analysis techniques for large databases, and data mining techniques such as classification, are also widely used. Classification of data is difficult in the case of a small and/or imbalanced data set, and this problem directly affects the classification performance. Small and/or imbalanced data sets have become a major problem in data mining because classification algorithms are developed based on the assumption that the data sets are balanced and large enough. Most of the algorithms ignore or misclassify examples of the minority class and focus on the majority class.
Most health data are small and imbalanced by nature. Learning from imbalanced and small data sets is an important and unsettled problem. Within the scope of the study, the publicly accessible hepatitis data set was oversampled by distance-based data generation methods. The oversampled data sets were classified by using four different machine learning algorithms. Considering the classification scores of these four algorithms (Artificial Neural Networks, Support Vector Machines, Naive Bayes and Decision Tree), an optimal synthetic data generation rate is recommended.

Conference Object; Citation - WoS: 140; Citation - Scopus: 214
Intrusion Detection Using Big Data and Deep Learning Techniques (Assoc Computing Machinery, 2019)
Dogdu, Erdogan; Faker, Osama
In this paper, Big Data and Deep Learning Techniques are integrated to improve the performance of intrusion detection systems. Three classifiers are used to classify network traffic datasets: a Deep Feed-Forward Neural Network (DNN) and two ensemble techniques, Random Forest and Gradient Boosting Tree (GBT). To select the most relevant attributes from the datasets, we use a homogeneity metric to evaluate features. Two recently published datasets, UNSW NB15 and CICIDS2017, are used to evaluate the proposed method. 5-fold cross validation is used in this work to evaluate the machine learning models. We implemented the method using the distributed computing environment Apache Spark, integrated with the Keras Deep Learning Library to implement the deep learning technique, while the ensemble techniques are implemented using the Apache Spark Machine Learning Library. The results show a high accuracy with DNN on the UNSW NB15 dataset, at 99.16% for binary classification and 97.01% for multiclass classification.
While the GBT classifier achieved the best accuracy for binary classification on the CICIDS2017 dataset at 99.99%, DNN has the highest accuracy for multiclass classification at 99.56%.
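
Several of the abstracts listed here report accuracy under k-fold (for example 5-fold) cross-validation. As a rough illustration of that evaluation procedure only, the following is a minimal pure-Python sketch. It is not the implementation used in any of the listed papers (which rely on WEKA, Apache Spark, or Keras); the function names `k_fold_indices`, `cross_val_accuracy`, and `majority_baseline` are hypothetical, and the classifier is a toy stand-in.

```python
import random
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, test) index pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


def cross_val_accuracy(
    X: Sequence,
    y: Sequence,
    fit_predict: Callable[[list, list, list], list],
    k: int = 5,
    seed: int = 0,
) -> float:
    """Mean test accuracy over k folds.

    fit_predict(train_X, train_y, test_X) is a hypothetical callback that
    trains on the training rows and returns one label per test row.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / k


def majority_baseline(train_X: list, train_y: list, test_X: list) -> list:
    """Toy stand-in classifier: always predict the most common training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_X)
```

Calling `cross_val_accuracy(X, y, majority_baseline, k=5)` returns the mean accuracy across the five held-out folds. On an imbalanced data set, a majority-class baseline like this one can already score high accuracy, which is one reason some of the abstracts above also report Precision, Recall, and F-Measure.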
