Computer Engineering Department Publication Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12416/253

  • Conference Object
    Citation - Scopus: 1
    Comparative Analysis of Machine Learning Techniques Using Customer Feedback Reviews of Oil and Gas Companies
    (Association for Computing Machinery, 2020) Alrawi, L.N.; Ashour, O.I.A.
    Sentiment analysis is the process of computationally identifying and categorizing opinions in a piece of text to determine whether the writer's attitude towards a particular topic, product or service is positive, negative or neutral. In this study, Machine Learning techniques are used to perform sentiment analysis on Oil and Gas customer feedback data. We present a comparison of different classification algorithms used for opinion mining, including Support Vector Machine (SVM), Naïve Bayes (NB), Instance Based Learning (IB3), Random Forest (RF), Partial Decision trees (PART), and Logit Boost (LB). Many studies have been performed on sentiment analysis in different sectors, but research into Oil and Gas customer feedback has been limited. Therefore, we have targeted a largely unexplored sector, namely the Petroleum sector, where companies express their opinions towards specific products or services. The Waikato Environment for Knowledge Analysis (WEKA) is used to obtain the experimental results. WEKA is open-source software that provides a collection of machine learning algorithms for solving data mining problems. The main aim of this study is to evaluate the efficiency of the above-mentioned classifiers in terms of Precision, Recall, F-Measure and Accuracy. The findings of the comparative analysis indicate that the Naïve Bayes classifier achieves the highest Accuracy of all classifiers. The small dataset, owing to the difficulty of obtaining more data at the time of the research, is a limitation of this study. However, this research will play a vital role in helping researchers decide which algorithm to use to solve their data mining problems. © 2020 ACM.
  • Book Part
    Citation - Scopus: 1
    Text-Based Fake News Detection Via Machine Learning
    (Springer Science and Business Media Deutschland GmbH, 2021) Genç, B.; Sever, H.; Mertoğlu, U.
    The nature of information literacy is changing as people increasingly turn to digital media to consume content. Consequently, this easier way of consuming information has given rise to a challenge called “Fake News”. One of the dangers of fake news is its influence on people’s views of the world, as in the recent example of the coronavirus misinformation flooding the internet. Nowadays, the world seems to need “information hygiene” more than anything. Yet existing real-world solutions cannot reliably determine the verifiability of the information in circulation. Our work presents an automated, adaptable solution for detecting fake news in practice. Our approach combines a set of carefully selected features with word embeddings to classify texts as fake or valid. We evaluated the efficacy of the proposed model through extensive experimentation. Additionally, we present an analysis of linguistic features for distinguishing fake from valid news content. Finally, we present guidance for text-based fake news detection derived from our experiments, including the promising results of our work. © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.
  • Conference Object
    Citation - WoS: 40
    Citation - Scopus: 77
    Malware Classification Using Deep Learning Methods
    (Assoc Computing Machinery, 2018) Dogdu, Erdogan; Cakir, Bugra
    Malware, short for Malicious Software, is growing continuously in numbers and sophistication as our digital world continues to grow. It is a very serious problem, and many efforts are devoted to malware detection in today's cybersecurity world. Many machine learning algorithms have been used for the automatic detection of malware in recent years. Most recently, deep learning has been used with better performance. Deep learning models have been shown to work much better in the analysis of long sequences of system calls. In this paper, word2vec, a shallow deep learning-based feature extraction method, is used to represent any given malware sample based on its opcodes. The Gradient Boosting algorithm is used for the classification task. Then, k-fold cross-validation is used to validate the model performance without sacrificing a validation split. Evaluation results show up to 96% accuracy with limited sample data.
  • Article
    The impact of feature types, classifiers, and data balancing techniques on software vulnerability prediction models
    (2019) Kaya, Aydın; Keçeli, Ali Seydi; Çatal, Çağatay; Tekinerdoğan, Bedir
    Software vulnerabilities form an increasing security risk for software systems, as they might be exploited to attack and harm the system. Some security vulnerabilities can be detected by static analysis tools and penetration testing, but these usually suffer from relatively high false positive rates. Software vulnerability prediction (SVP) models can be used to categorize software components into vulnerable and neutral components before the software testing phase, thereby increasing the efficiency and effectiveness of the overall verification process. The performance of a vulnerability prediction model is usually affected by the adopted classification algorithm, the adopted features, and the data balancing approach. In this study, we empirically investigate the effect of these factors on the performance of SVP models. Our experiments cover four data balancing methods, seven classification algorithms, and three feature types. The experimental results show that data balancing methods are effective for highly unbalanced datasets, text-based features are more useful, and ensemble-based classifiers mostly provide better results. For smaller datasets, the Random Forest algorithm provides the best performance, and for larger datasets, RusboostTree achieves better performance.
  • Article
    Citation - WoS: 39
    Citation - Scopus: 52
    Development of a Recurrent Neural Networks-Based Calving Prediction Model Using Activity and Behavioral Data
    (Elsevier Sci Ltd, 2020) Keceli, Ali Seydi; Catal, Cagatay; Kaya, Aydin; Tekinerdogan, Bedir
    Accurate prediction of calving time in dairy cattle is crucial for dairy herd management to reduce risks like dystocia and pain. Prediction of calving using traditional, manual observation such as checking breeding records and visual cues, however, is a complicated and error-prone task in which even experts can fail to provide a proper prediction. Moreover, manual prediction does not scale to larger farms and quickly becomes time-consuming, inefficient, and costly. In this context, automated solutions are considered promising for providing both better and more efficient predictions, thereby supporting the health of the dairy cows and reducing unnecessary overhead for farmers. Although the first automated solutions appear to have focused mainly on statistical methods, machine learning approaches are now increasingly considered a feasible and promising approach for accurate prediction of calving. Accordingly, the objective of this study is to develop machine learning-based prediction models that provide higher performance than existing tools, methods, and techniques. This study shows that the calving of cattle can be predicted by combining several cattle behaviors, captured by behavioral monitoring sensors, with machine learning models. The Bi-directional Long Short-Term Memory (Bi-LSTM) method was applied to predict the calving day, and the RusBoosted Tree classifier was used to predict the final 8 h before calving. The experimental results demonstrated that Bi-LSTM provides better classification accuracy than the LSTM algorithm, while the RusBoosted Tree algorithm accurately predicts the final 8 h before calving. Furthermore, Recurrent Neural Networks provide high performance for the prediction of the calving day.
  • Conference Object
    Software Vulnerability Prediction using Extreme Learning Machines Algorithm
    (2019) Keçeli, Ali Seydi; Kaya, Aydın; Çatal, Çağatay; Tekinerdoğan, Bedir
    Software vulnerability prediction aims to detect vulnerabilities in the source code before the software is deployed into the operational environment. Accurate prediction of vulnerabilities helps allocate more testing resources to vulnerability-prone modules. From the machine learning perspective, this problem is a binary classification task that classifies software modules into vulnerability-prone and non-vulnerability-prone categories. Several machine learning models have been built to address the software vulnerability prediction problem, but the performance of state-of-the-art models is not yet at an acceptable level. In this study, we aim to improve the performance of software vulnerability prediction models by using Extreme Learning Machines (ELM) algorithms, which had not previously been investigated for this problem. Before applying ELM algorithms to three selected public datasets, we use data balancing algorithms to balance the data points belonging to the two classes. We discuss our initial experimental results and provide the lessons learned. In particular, we observed that ELM algorithms have high potential for addressing the software vulnerability prediction problem.
  • Article
    Citation - WoS: 20
    Citation - Scopus: 29
    Sensor Failure Tolerable Machine Learning-Based Food Quality Prediction Model
    (Mdpi, 2020) Kaya, Aydin; Keceli, Ali Seydi; Catal, Cagatay; Tekinerdogan, Bedir
    For the agricultural food production sector, the control and assessment of food quality is an essential issue, with a direct impact on both human health and the economic value of the product. One of the fundamental properties from which the quality of food can be derived is the smell of the product. A significant trend in this context is machine olfaction, the automated simulation of the sense of smell using a so-called electronic nose or e-nose. Here, many sensors are used to detect the compounds that define the odors and hence the quality of the product. Proper assessment of food quality depends on the correct functioning of the adopted sensors. Unfortunately, sensors may fail to provide correct measurements due to, for example, physical aging or environmental factors. Various approaches have been applied to address this problem, often focusing on correcting the input data from the failed sensor. In this study, we adopt an alternative approach and propose machine learning-based failure tolerance that ignores failed sensors. To tolerate failed sensors while keeping the overall prediction accuracy acceptable, a Single Plurality Voting System (SPVS) classification approach is used. Here, a single classifier is trained on each feature, and a composed classifier is built from the outcomes of these classifiers. To build our SPVS-based technique, K-Nearest Neighbor (kNN), Decision Tree, and Linear Discriminant Analysis (LDA) classifiers are applied as the base classifiers. Our proposed approach has a clear advantage over traditional machine learning models since it can tolerate sensor failures or other types of failures by ignoring the failed inputs, thus enhancing the assessment of food quality. To illustrate our approach, we use the case study of beef cut quality assessment. The experiments showed promising results for beef cut quality prediction in particular, and food quality assessment in general.
  • Conference Object
    Citation - WoS: 7
    Phishing E-Mail Detection by Using Deep Learning Algorithms
    (Assoc Computing Machinery, 2018) Hassanpour, Reza; Dogdu, Erdogan; Choupani, Roya; Goker, Onur; Nazli, Nazli
  • Conference Object
    Citation - WoS: 2
    Citation - Scopus: 2
    Clinical Decision Support Systems: From the Perspective of Small and Imbalanced Data Set
    (Ios Press, 2019) Akcapinar Sezer, Ebru; Sever, Hayri; Par, Oznur Esra
    Clinical decision support systems are data analysis software that supports health professionals' decision-making process in reaching their ultimate outcome, taking patient information into account. The need for decision support systems cannot be denied, because most activities in the field of health care involve decision making. Decision support systems used for diagnosis are designed for specific diseases because of the complexity of diseases, symptoms, and disease-symptom relationships. In the design and implementation of clinical decision support systems, mathematical modeling, pattern recognition, statistical analysis of large databases, and data mining techniques such as classification are widely used. Classification is difficult when the data set is small and/or imbalanced, and this problem directly affects classification performance. Small and/or imbalanced data sets have become a major problem in data mining because classification algorithms are developed on the assumption that data sets are balanced and large enough. Most algorithms ignore or misclassify examples of the minority class and focus on the majority class. Most health data are small and imbalanced by nature. Learning from imbalanced and small data sets is an important and unsettled problem. Within the scope of the study, the publicly accessible hepatitis data set was oversampled using distance-based data generation methods. The oversampled data sets were classified using four different machine learning algorithms. Based on the classification scores of these four algorithms (Artificial Neural Networks, Support Vector Machines, Naive Bayes and Decision Tree), an optimal synthetic data generation rate is recommended.
  • Conference Object
    Citation - WoS: 140
    Citation - Scopus: 214
    Intrusion Detection Using Big Data and Deep Learning Techniques
    (Assoc Computing Machinery, 2019) Dogdu, Erdogan; Faker, Osama
    In this paper, Big Data and Deep Learning Techniques are integrated to improve the performance of intrusion detection systems. Three classifiers are used to classify network traffic datasets: a Deep Feed-Forward Neural Network (DNN) and two ensemble techniques, Random Forest and Gradient Boosting Tree (GBT). To select the most relevant attributes from the datasets, we use a homogeneity metric to evaluate features. Two recently published datasets, UNSW NB15 and CICIDS2017, are used to evaluate the proposed method. 5-fold cross-validation is used in this work to evaluate the machine learning models. We implemented the method using the distributed computing environment Apache Spark, integrated with the Keras Deep Learning Library to implement the deep learning technique, while the ensemble techniques are implemented using the Apache Spark Machine Learning Library. On the UNSW NB15 dataset, the DNN achieves high accuracy for both binary classification (99.16%) and multiclass classification (97.01%). On the CICIDS2017 dataset, the GBT classifier achieved the best binary classification accuracy at 99.99%, while the DNN had the highest multiclass accuracy at 99.56%.
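The oil and gas sentiment analysis entry compares classifiers by Precision, Recall, F-Measure and Accuracy. As a minimal sketch of how those four numbers are derived from predictions (not the WEKA implementation used in that study), the hypothetical helper `binary_metrics` below scores one class at a time as the "positive" class; for a three-way positive/negative/neutral task one would call it once per class.

```python
from typing import Hashable, Sequence


def binary_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                   positive: Hashable = 1) -> dict:
    """Precision, Recall, F-Measure and Accuracy, treating `positive` as the target class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives for the chosen class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-Measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Accuracy counts every correct prediction, whatever the class.
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


if __name__ == "__main__":
    print(binary_metrics(["pos", "neg", "neu", "pos"],
                         ["pos", "pos", "neu", "neg"], positive="pos"))
```

Note that Accuracy does not depend on which class is treated as positive, while the other three metrics do; that is why per-class (or averaged) precision and recall are usually reported alongside a single accuracy figure.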
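Both the malware classification entry (k-fold) and the intrusion detection entry (5-fold) validate their models with k-fold cross-validation. The sketch below illustrates the idea in plain Python and is not either paper's implementation: the data are shuffled once, split into k folds whose sizes differ by at most one, and each fold serves once as the test set. `cross_validate` and the majority-class baseline `majority_fit` are hypothetical helpers; `fit` is assumed to return a prediction function.

```python
import random
from collections import Counter
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(xs: Sequence, ys: Sequence, fit: Callable, k: int = 5, seed: int = 0) -> float:
    """Mean test accuracy over k folds; every sample is tested exactly once."""
    scores = []
    for train, test in k_fold_indices(len(xs), k, seed):
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(xs[i]) == ys[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_fit(train_x: Sequence, train_y: Sequence) -> Callable:
    """Trivial baseline: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label
```

Because every sample lands in exactly one test fold, no data has to be held back permanently for validation, which is the "without sacrificing a validation split" point made in the malware entry.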
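The food quality entry describes a Single Plurality Voting System (SPVS): one classifier per feature (sensor), combined by voting, with failed sensors ignored. The sketch below is a simplified reading of that description, not the authors' code. It assumes a one-feature kNN as the only base classifier (the paper also used Decision Tree and LDA), assumes a failed sensor is signalled by `None` (the abstract does not say how failures are detected), and breaks vote ties by first-seen label. The class names and the toy "good"/"spoiled" data are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence


class SingleFeatureKNN:
    """k-nearest-neighbour classifier that looks at a single feature (sensor)."""

    def __init__(self, k: int = 3):
        self.k = k
        self.values: List[float] = []
        self.labels: List[str] = []

    def fit(self, values: Sequence[float], labels: Sequence[str]) -> "SingleFeatureKNN":
        self.values, self.labels = list(values), list(labels)
        return self

    def predict(self, value: float) -> str:
        # Rank training points by absolute distance along this one feature.
        ranked = sorted(range(len(self.values)), key=lambda i: abs(self.values[i] - value))
        votes = Counter(self.labels[i] for i in ranked[: self.k])
        return votes.most_common(1)[0][0]


class PluralityVotingEnsemble:
    """One classifier per feature; the final label is the plurality of their votes."""

    def __init__(self, k: int = 3):
        self.k = k
        self.members: List[SingleFeatureKNN] = []

    def fit(self, rows: Sequence[Sequence[float]], labels: Sequence[str]) -> "PluralityVotingEnsemble":
        n_features = len(rows[0])
        self.members = [
            SingleFeatureKNN(self.k).fit([row[j] for row in rows], labels)
            for j in range(n_features)
        ]
        return self

    def predict(self, row: Sequence[Optional[float]]) -> str:
        # A reading of None marks a failed sensor: its classifier simply does not vote.
        votes = Counter(
            member.predict(value)
            for member, value in zip(self.members, row)
            if value is not None
        )
        if not votes:
            raise ValueError("all sensors failed; no prediction is possible")
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rows = [[0.1, 1.0, 5.0], [0.2, 1.1, 5.2], [0.15, 0.9, 4.9],
            [0.9, 3.0, 9.0], [1.0, 3.2, 9.1], [0.95, 2.9, 8.8]]
    labels = ["good"] * 3 + ["spoiled"] * 3
    model = PluralityVotingEnsemble(k=3).fit(rows, labels)
    print(model.predict([0.12, None, 5.1]))  # second sensor treated as failed
```

The design choice worth noting is that no input value is imputed: a missing sensor only removes one voter, and with three or more sensors a single sensor that silently reports a wrong value can also be outvoted by the others.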
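The clinical decision support entry oversamples the hepatitis data set with "distance-based data generation methods" and recommends a synthetic data generation rate, and the vulnerability prediction entries also rely on data balancing. The abstracts do not name the exact methods, so the sketch below swaps in a well-known distance-based example, SMOTE-style interpolation: each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours. The function name and the `rate` parameter (synthetic samples per original minority sample) are assumptions for illustration only.

```python
import math
import random
from typing import List, Sequence


def smote_like_oversample(minority: Sequence[Sequence[float]], rate: float,
                          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate round(rate * len(minority)) synthetic minority samples by interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    n_new = round(rate * len(minority))
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # The k nearest minority neighbours of the base point (Euclidean distance).
        neighbours = sorted(
            (i for i in range(len(minority)) if i != base_idx),
            key=lambda i: math.dist(base, minority[i]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment between base and the chosen neighbour.
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

Only the minority class is passed in; the synthetic points are then appended to the training set with the minority label. Varying `rate` and re-scoring the classifiers is one way to search for a generation rate like the one the hepatitis study recommends. Oversampling should be applied inside each training fold only, so that synthetic copies of test points do not leak into training.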