Abstract:
The objective of this research is to discover the web navigational behavior of Wolkite
University web users. An experimental research design was used, and Sharma's web usage
mining process model was followed to discover web users’ behavior. In
this study, the dataset was collected from the Wolkite University proxy server and covers three
months, from February 1, 2021 to April 30, 2021. For data cleaning, Python was used to extract
the URL path, and MS Excel 2021 was used to split the VLANs from the IP addresses for VLAN
identification. Because the dataset is large, Minitab and Python were additionally used for
statistical analysis and association rule mining, respectively. To discover association rules,
the FP-growth and Apriori algorithms were used in
this study. The statistical analysis shows that Facebook and YouTube are the websites most
frequently accessed by students. In terms of website category, entertainment websites are the
students' primary interest, educational websites the second, and social media websites the
third. In the staff dataset, Gmail, Facebook, and YouTube are the most frequently accessed
websites. In terms of website category, however, staff users access educational websites as
their primary interest, followed by entertainment, social media, and email websites as their
second, third, and fourth interests. In terms of web traffic, some of the VLANs in the
student dataset carry more web traffic than others: in particular, VLANs 90 and 120 have more
web traffic than the rest, whereas VLANs 78 and 81 have low web traffic compared to the
remaining VLANs. Among the staff VLANs, VLAN 2 has more web traffic than the others, whereas
VLAN 50 has low web traffic compared to the others. From
the association rule discovery, the FP-growth algorithm shows that students browse
entertainment and social media websites together. Staff VLAN users access the following
categories together: email and social media, email and entertainment, entertainment and social
media, and educational and educational websites. The key challenges in this work include
preparing the log files, because weblog data are enormous, noisy, and complex, and coping with
the complexity of the existing network VLANs, which makes it difficult to identify which users
submitted which requests and to characterize their behavior accordingly.
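
The URL extraction and association rule steps described above can be illustrated with a minimal Python sketch. It is not the study's code. The log entries, the client identifiers, the host-to-category table, and the thresholds are all made up for illustration, and the rule search is a simple pairwise support and confidence count, a simplified stand-in for FP-growth or Apriori rather than either algorithm proper.

```python
from collections import Counter
from itertools import combinations
from urllib.parse import urlsplit

# Hypothetical host-to-category lookup; the study's real category list is not shown here.
CATEGORY = {
    "www.youtube.com": "entertainment",
    "www.facebook.com": "social media",
    "mail.google.com": "email",
}


def host_of(url):
    """Return the host part of a URL, or None if the URL has no host."""
    return urlsplit(url).hostname


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return sorted (antecedent, consequent, support, confidence) rules over item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = sorted(set(items))  # count each item once per transaction
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is not frequent enough
        # Test the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Made-up log entries for illustration: (client identifier, requested URL).
log = [
    ("client-1", "https://www.youtube.com/watch"),
    ("client-1", "https://www.facebook.com/"),
    ("client-2", "https://www.youtube.com/"),
    ("client-2", "https://www.facebook.com/home"),
    ("client-3", "https://mail.google.com/mail"),
    ("client-3", "https://www.facebook.com/"),
]

# Group the visited categories per client to form transactions.
transactions = {}
for client, url in log:
    category = CATEGORY.get(host_of(url))
    if category is not None:
        transactions.setdefault(client, []).append(category)

rules = pair_rules(list(transactions.values()))
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

On real proxy logs the transactions would more likely be grouped per VLAN or per session, and a library implementation of FP-growth or Apriori would replace the pairwise count so that itemsets larger than two can be found.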