Security Issues
Correct citation for this article:
Pleshakova E.S., Filimonov A.V., Osipov A.V., Gataullin S.T.
Identification of cyberbullying by neural network methods
// Security Issues (Вопросы безопасности).
2022. No. 3.
pp. 28-38.
DOI: 10.25136/2409-7543.2022.3.38488 EDN: BEINMG URL: https://nbpublish.com/library_read_article.php?id=38488
Identification of cyberbullying by neural network methods
DOI: 10.25136/2409-7543.2022.3.38488
EDN: BEINMG
Date submitted to the editorial office: 20-07-2022
Date of publication: 29-07-2022
The article was prepared as part of the state assignment of the Government of the Russian Federation to the Financial University for 2022 on the topic "Models and methods of text recognition in systems for countering telephone fraud" (ВТК-ГЗ-ПИ-30-2022).
Abstract: The authors examine in detail the identification of cyberbullying carried out by fraudsters who illegally use a victim's personal data. The main sources of this information are social networks and e-mail. The use of social networks is growing exponentially every day. Alongside its many advantages, it also has a negative side: users face numerous cyber threats, including the use of personal data for criminal purposes, cyber-intimidation, cybercrime, phishing, and cyberbullying. This article focuses on the task of identifying trolls. Identifying trolls in social networks is difficult because social network data are dynamic and amount to several billion records. One possible solution is to apply machine learning algorithms. The authors' main contribution is a method for identifying trolls in social networks based on analysis of users' emotional state and behavioral activity. To identify trolls, users are combined into groups according to similarities in how they communicate. Users are distributed among groups automatically by a special type of neural network, the self-organizing Kohonen map. The group number is also determined automatically.
The user characteristics on which the grouping is based are the number of comments, the average comment length, and an indicator of the user's emotional state.
Keywords: artificial intelligence, cyberbullying, machine learning, Kohonen map, neural networks, personal data, computer crime, cybercrimes, social network, bullying
Introduction
The influence of the mass media on the formation of the national, political, and religious views of the population is undeniable. Limited volume and the accountability of authors make it possible to strictly control all material printed in newspapers or broadcast on radio and television. In recent years, social networks have become very widespread. Their influence on people's views is now comparable to that of television [1-3]. The conversational style of communication, the limited liability of authors, and the huge number of publications do not allow the standard tools and methods of media control to be used. Social networks are currently being actively used for malicious influence [4-5].
This paper deals with the problem of identifying cyberbullying in social networks. Cyberbullies are users of social networks, forums, and other online discussion platforms who escalate anger and conflict through covert or overt bullying, belittling, or insulting of other participants. Cyberbullying is expressed in aggressive, mocking, and offensive behavior [6-9]. Online cyberbullying causes great harm, since such users can incite conflicts based on religious hostility, ethnic hatred, etc. [10-13]. Even the mere participation of a cyberbully in a discussion makes the other participants nervous and wastes their time on replies to such a user. As a result, the discussion of any topic becomes littered with unnecessary messages.
The problem of moderating cyberbullying comments is growing. One possible solution is to use machine learning to recognize cyberbullying. Thus, the task of identifying and blocking cyberbullying is relevant.
Models and Methods
There are various methods for finding users who engage in cyberbullying. The simplest and most reliable approach is manual moderation of discussions [14-15]. However, given that millions of users communicate on social networks, manual search becomes too costly. In this case, methods of automated search for cyberbullying are needed.
In 2011, a group of researchers from Canada developed an interesting method for identifying users who engaged in cyberbullying for money [1]. The method is based on analysis of the comments left by attackers and of their behavior on the network. It assumes that such users have similar behavior patterns, which makes it possible to identify them. Table 1 lists the characteristics of user comments that were analyzed.
Table 1. List of characteristics of user comments
The researchers used a classification based on semantic and non-semantic analysis. The maximum detection accuracy was 88.79%. As the researchers themselves note, in earlier studies based only on analysis of message content, accuracy did not exceed 50% [2-4]. A significant drawback of the proposed algorithm, however, is its dependence on easily changed indicators such as Post Time or Post Location. Sequence No is also an unreliable parameter.
In our opinion, the most accurate way to determine whether a user is an attacker is to analyze only the text of the user's comments and the frequency of posting. Moreover, since attackers can write with errors, use formulaic phrases, or disguise themselves as normal users, it is better to analyze not the semantic part of their comments but the emotional component, which is much more difficult to fake. In addition, attackers do not just write comments; they try to manipulate other participants in the discussion, which should also show up in the emotional component.
Let us describe qualitatively the typical models of user behavior when posting comments in a social network. Most users leave practically no comments on a message they like. They usually click "like" or "class". At best, they write something like "super!" or "Cool". That is, we are dealing with single comments, which are usually very short. The next category of users writes longer comments, which, as a rule, reflect their emotional attitude to the message. Another category of discussion participants is the "victims" of cyberbullying. As a rule, they try to prove something and write very detailed comments full of emotion. Finally, there are the cyberbullying attackers. These are people who constantly participate in the discussion and try to provoke other participants, and therefore cannot get by with short phrases.
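The behavior models above differ mainly in how many comments a user posts and how long those comments are. The following is a minimal Python sketch of how these two behavioral characteristics could be computed per user. The function name `behavior_features`, the `(user, text)` input format, and measuring length in characters are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def behavior_features(comments):
    """Return {user: (comment_count, average_comment_length)}.

    `comments` is an iterable of (user, text) pairs (an assumed input format).
    Length is measured in characters after trimming surrounding whitespace.
    """
    lengths = defaultdict(list)
    for user, text in comments:
        lengths[user].append(len(text.strip()))
    # Every user in `lengths` has at least one comment, so no division by zero.
    return {user: (len(vals), sum(vals) / len(vals))
            for user, vals in lengths.items()}


# Hypothetical sample: one short single comment versus a persistent commenter.
sample = [
    ("anna", "super!"),
    ("boris", "I disagree, and here is why this argument fails."),
    ("boris", "You keep avoiding the question."),
    ("boris", "Cool"),
]
features = behavior_features(sample)
```

Under these assumptions, a user with a single very short comment and a user with many long comments end up far apart in this two-dimensional feature space, which is what a later grouping step would rely on.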
To identify the emotional state of a discussion participant, we relied on the fact that the structure of an informational text is fundamentally different from that of a suggestive (manipulative) text: an informational text is characterized by the absence of intentional rhythmization of its lexical and phonetic units [16-17]. In practice, this means that some sound combinations can not only evoke certain emotions but can also be perceived as certain images [5]. For example, the letter "и", in combination with an indication of the subject, tends to "diminish" the object in front of which (or in which) it clearly dominates. The sound "o" gives the impression of softness and relaxation. The predominance of the sounds "a" and "e" is, as a rule, associated with an emotional upsurge.
Based on the prerequisites listed above, we proposed the fields listed in Table 2 for analysis.
Table 2. List of fields for analysis
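The remarks above about the predominance of particular sounds suggest measuring how often selected vowels occur in a user's comments. The sketch below computes the share of each chosen vowel among all letters of a text. It is only an illustration under stated assumptions: the function name `vowel_shares` and the default vowel set are hypothetical, the contents of Table 2 are not reproduced here, and for Russian comments the corresponding Cyrillic letters would be passed in `vowels`.

```python
def vowel_shares(text, vowels=("a", "e", "o")):
    """Return {vowel: share of that vowel among all letters in `text`}.

    Counting is case-insensitive; digits, spaces, and punctuation are ignored.
    The default vowel set is an assumption made for illustration.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    total = len(letters)
    if total == 0:
        # No letters at all: report zero shares instead of dividing by zero.
        return {v: 0.0 for v in vowels}
    return {v: letters.count(v) / total for v in vowels}


shares = vowel_shares("Banana")
```

A share computed this way could serve as one candidate input for an indicator of emotional state, alongside the number of comments and the average comment length.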