Security Issues
Identification of cyberbullying by neural network methods

Pleshakova Ekaterina Sergeevna

ORCID: 0000-0002-8806-1478

PhD in Technical Science

Associate Professor, Department of Information Security, Financial University under the Government of the Russian Federation

125167, Russia, Moscow, 4th Veshnyakovsky Ave., 12k2, building 2

espleshakova@fa.ru
Filimonov Andrei Viktorovich

PhD in Physics and Mathematics

Associate Professor, Department of Information Security, Federal State Educational Budgetary Institution of Higher Education "Financial University under the Government of the Russian Federation"

125167, Russia, Moscow, 4th Veshnyakovsky Ave., 4, building 2

remueur@yandex.ru
Osipov Aleksei Viktorovich

PhD in Physics and Mathematics

Associate Professor, Department of Data Analysis and Machine Learning, Financial University under the Government of the Russian Federation

125167, Russia, Moscow, 4th Veshnyakovsky Str., 4, building 2

avosipov@fa.ru
Gataullin Sergei Timurovich

PhD in Economics

Dean of the Faculty of Digital Economy and Mass Communications, Moscow Technical University of Communications and Informatics; Leading Researcher, Department of Information Security, Financial University under the Government of the Russian Federation

8A Aviamotornaya Str., Moscow, 111024, Russia

stgataullin@fa.ru
DOI: 10.25136/2409-7543.2022.3.38488

EDN: BEINMG

Date of submission to the editorial office: 20-07-2022

Date of publication: 29-07-2022


The article was prepared as part of the state assignment of the Government of the Russian Federation to the Financial University for 2022 on the topic "Models and methods of text recognition in systems for countering telephone fraud" (VTK-GZ-PI-30-2022).

Abstract: The authors examine in detail the identification of cyberbullying carried out by fraudsters who illegally use the victim's personal data. The main sources of such data are social networks and e-mail. The use of social networks is growing exponentially. Alongside its many advantages, it also has a negative side: users face numerous cyber threats, including the use of personal data for criminal purposes, cyber-intimidation, cybercrime, phishing and cyberbullying. This article focuses on the task of identifying trolls. Identifying trolls in social networks is difficult because social network data are dynamic and amount to several billion records. One possible solution is to apply machine learning algorithms. The authors' main contribution is a method for identifying trolls in social networks based on an analysis of users' emotional state and behavioral activity. To identify trolls, users are combined into groups according to the similarity of their manner of communication. The assignment of users to groups is automatic and relies on a special type of neural network, the self-organizing Kohonen map. The group number is also determined automatically. The user characteristics on which the grouping is based are the number of comments, the average comment length and an indicator of the user's emotional state.


Keywords:

artificial intelligence, cyberbullying, machine learning, Kohonen map, neural networks, personal data, computer crime, cybercrimes, social network, bullying

Introduction

The influence of the mass media on the formation of the national, political and religious views of the population is undeniable. The limited volume of material and the accountability of authors make it possible to strictly control everything that is printed in newspapers or broadcast on radio and television. In recent years, social networks have become very widespread, and their influence on people's views is now comparable to that of television [1-3]. The conversational style of communication, the limited liability of authors and the huge number of publications make the standard tools and methods of media control inapplicable. Social networks are currently actively used for malicious influence [4-5].

This paper addresses the problem of identifying cyberbullying in social networks. Cyberbullies are users of social networks, forums and other online discussion platforms who escalate anger and conflict by covertly or overtly bullying, belittling or insulting other participants. Cyberbullying takes the form of aggressive, mocking and offensive behavior [6-9].

Online cyberbullying causes great harm, since such users can incite conflicts based on religious hostility, ethnic hatred and so on [10-13]. Even the mere presence of a cyberbully in a discussion makes the other participants nervous and wastes their time on replies to that user. As a result, a discussion on any topic becomes littered with unnecessary messages. The problem of moderating cyberbullying comments is growing. One possible solution is to use machine learning to recognize cyberbullying.

Thus, the task of identifying and blocking cyberbullying is relevant.

Models and Methods

There are various methods for finding cyberbullying users. The simplest and most reliable approach is manual moderation of discussions [14-15]. However, given that millions of users communicate on social networks, manual search becomes too costly, and automated methods of detecting cyberbullying are needed.

In 2011, a group of researchers from Canada developed an interesting method for identifying users who engaged in cyberbullying for money [1]. The method is based on the analysis of comments left by attackers and their behavior on the network. It is assumed that such users have similar behavior patterns, which makes it possible to identify them.

Table 1 lists the characteristics of user comments that were analyzed.

Table 1. List of characteristics of user comments

| Characteristic | Meaning |
| --- | --- |
| Report ID | The ID of the new post, which users will then comment on. |
| Sequence No. | Sequence number of the comment on the post. |
| Post Time | Comment posting time. |
| Post Location | The geographic location of the user who posted the comment. |
| User ID | The ID of the user who left the comment. |
| Content | Comment content. |
| Response Indicator | Shows whether the comment is a new comment or a reply to another comment. |
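As a rough illustration, the characteristics in Table 1 can be thought of as one record per comment. The sketch below is an assumption made for readability, not the data format used in the cited study: the class name `CommentRecord`, the field names, the Python types and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CommentRecord:
    """One user comment; fields mirror Table 1 (all names are hypothetical)."""
    report_id: str        # Report ID: the post being commented on
    sequence_no: int      # Sequence No.: position of the comment under the post
    post_time: datetime   # Post Time: when the comment was posted
    post_location: str    # Post Location: where the commenting user is located
    user_id: str          # User ID: who left the comment
    content: str          # Content: the comment text
    is_reply: bool        # Response Indicator: True for a reply, False for a new comment


# Example record with made-up values.
example = CommentRecord(
    report_id="r-001",
    sequence_no=3,
    post_time=datetime(2011, 5, 1, 12, 30),
    post_location="unknown",
    user_id="u-42",
    content="I disagree with this post.",
    is_reply=False,
)
```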

The researchers used a classification based on semantic and non-semantic analysis. The maximum detection accuracy was 88.79%. As the researchers themselves note, earlier studies based only on the analysis of message content achieved an accuracy of no more than 50% [2-4].

A significant drawback, however, is that the proposed algorithm depends on easily changed indicators such as Post Time and Post Location. Sequence No. is also an unreliable parameter.

In our opinion, the most accurate way to determine whether a user is an attacker is to analyze only the text of the user's comments and the frequency of posting.

Moreover, since attackers can write with errors, use formulaic phrases or disguise themselves as ordinary users, it is better to analyze not the semantic content of their comments but the emotional component, which is much more difficult to fake.

In addition, attackers do not just write comments but try to manipulate other participants in the discussion, which should also show in the emotional component.

Let us try to describe qualitatively the typical models of user behavior when posting comments in a social network.

Most users leave practically no comments on posts they like. They usually just click "like" or "class". At best, they write something like "super!", "cool" or similar. That is, we are dealing with single comments, which are usually very short.

The next category of users writes longer comments, which, as a rule, reflect their emotional attitude to the message.

Another category of participants in the discussion is the “victims” of cyberbullying. As a rule, they try to prove something and at the same time write very detailed comments with an abundance of emotions.

And finally, there are the cyberbullying attackers. These are people who constantly take part in the discussion and try to provoke other participants, and who therefore cannot get by with short phrases.
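The behavior models described here differ mainly in how often a user comments and how long the comments are. A minimal sketch of computing these two quantities per user is given below. The function name `activity_profile`, the input format (pairs of user ID and comment text), the choice to measure length in characters and the sample data are assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def activity_profile(comments):
    """Return {user_id: (comment_count, average_comment_length)}.

    `comments` is an iterable of (user_id, text) pairs. Length is measured
    in characters, which is an assumption of this sketch.
    """
    counts = defaultdict(int)
    total_chars = defaultdict(int)
    for user_id, text in comments:
        counts[user_id] += 1
        total_chars[user_id] += len(text)
    return {u: (counts[u], total_chars[u] / counts[u]) for u in counts}


# Made-up sample: one casual user and one persistent commenter.
sample = [
    ("casual", "super!"),
    ("persistent", "You clearly did not read the post."),
    ("persistent", "Still waiting for an actual argument."),
]
profile = activity_profile(sample)
```

In this toy sample the casual user has a single very short comment, while the persistent user has several longer ones, which is the kind of contrast the qualitative description relies on.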

To identify the emotional state of a discussion participant, we used the fact that the structure of informational text differs fundamentally from that of suggestive (manipulative) text: informational text lacks intentional rhythmization of its lexical and phonetic units [16-17].

In practice, this means that some sound combinations can not only evoke certain emotions but can also be perceived as certain images [5]. For example, the sound "и" (i), in combination with a reference to an object, tends to "diminish" the object in front of which (or in which) it clearly dominates. The sound "o" gives an impression of softness and relaxation. A predominance of the sounds "a" and "e" is, as a rule, associated with an emotional upsurge.
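One simple way to turn this observation into numbers is to measure how often particular letters occur in a user's comments. The sketch below is illustrative only: the function name `letter_frequencies`, the default set of tracked letters and the normalization by the total number of alphabetic characters are assumptions, not details given in the text.

```python
from collections import Counter


def letter_frequencies(texts, letters="аеио"):
    """Relative frequency of each tracked letter across a user's comments.

    Counts are divided by the total number of alphabetic characters;
    this normalization is an assumption of the sketch.
    """
    counts = Counter()
    total = 0
    for text in texts:
        for ch in text.lower():
            if ch.isalpha():
                total += 1
                if ch in letters:
                    counts[ch] += 1
    if total == 0:
        # No alphabetic characters at all: report zero for every tracked letter.
        return {ch: 0.0 for ch in letters}
    return {ch: counts[ch] / total for ch in letters}
```

A call such as `letter_frequencies(user_comments)` returns one value per tracked letter, which could then serve as a numeric description of the emotional coloring of that user's texts.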

Based on the prerequisites listed above, we proposed the fields listed in Table 2 for analysis.

Table 2. List of fields for analysis

Field

Comment

Field

Comment

Mp

Number of messages for the p-th user

fу,p

The frequency of occurrence of the symbol "y" for the p-th user

Lp

Average message length for p-th user

fэ,p

The frequency of occurrence of t