
Monday, 4 June 2018

SociRank: Identifying and Ranking Prevalent News Topics Using Social Media Factors

ABSTRACT:
Mass media sources, specifically the news media, have traditionally informed us of daily events. In modern times, social media services such as Twitter provide an enormous amount of user-generated data, which have great potential to contain informative news-related content. For these resources to be useful, we must find a way to filter noise and only capture the content that, based on its similarity to the news media, is considered valuable. However, even after noise is removed, information overload may still exist in the remaining data—hence, it is convenient to prioritize it for consumption. To achieve prioritization, information must be ranked in order of estimated importance considering three factors. First, the temporal prevalence of a particular topic in the news media is a factor of importance, and can be considered the media focus (MF) of a topic. Second, the temporal prevalence of the topic in social media indicates its user attention (UA). Last, the interaction between the social media users who mention this topic indicates the strength of the community discussing it, and can be regarded as the user interaction (UI) toward the topic. We propose an unsupervised framework—SociRank—which identifies news topics prevalent in both social media and the news media, and then ranks them by relevance using their degrees of MF, UA, and UI. Our experiments show that SociRank improves the quality and variety of automatically identified news topics.
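The ranking step described above can be illustrated with a minimal sketch. It assumes each topic already has MF, UA, and UI values normalized to the range 0 to 1, and it combines them with a weighted sum. The weighted sum, the class and method names, and the sample values are assumptions made for illustration; the abstract does not state how SociRank combines the three factors.

```java
import java.util.*;

final class TopicRankingSketch {

    /** A topic with its three (assumed, pre-normalized) importance factors. */
    static final class Topic {
        final String label;
        final double mf; // media focus
        final double ua; // user attention
        final double ui; // user interaction

        Topic(String label, double mf, double ua, double ui) {
            this.label = label;
            this.mf = mf;
            this.ua = ua;
            this.ui = ui;
        }

        /** Assumed combination rule: a plain weighted sum of the factors. */
        double score(double wMf, double wUa, double wUi) {
            return wMf * mf + wUa * ua + wUi * ui;
        }
    }

    /** Returns a new list sorted from highest to lowest combined score. */
    static List<Topic> rank(List<Topic> topics, double wMf, double wUa, double wUi) {
        List<Topic> sorted = new ArrayList<>(topics);
        sorted.sort(Comparator.comparingDouble((Topic t) -> t.score(wMf, wUa, wUi)).reversed());
        return sorted;
    }

    /** Hypothetical sample data, not taken from the paper. */
    static List<Topic> sampleTopics() {
        return Arrays.asList(
                new Topic("election", 0.9, 0.4, 0.2),
                new Topic("earthquake", 0.5, 0.8, 0.7),
                new Topic("budget", 0.6, 0.1, 0.1));
    }

    public static void main(String[] args) {
        double w = 1.0 / 3.0; // equal weights, an arbitrary choice
        for (Topic t : rank(sampleTopics(), w, w, w)) {
            System.out.println(t.label + " " + t.score(w, w, w));
        }
    }
}
```

With equal weights the sample topic with strong social media signals ranks first; weighting MF alone would instead favor the topic the news media covers most, which shows why the choice of weights matters.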
EXISTING SYSTEM:
  • Two traditional methods for detecting topics are latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA). LDA is a generative probabilistic model that can be applied to different tasks, including topic identification.
  • PLSA, similarly, is a statistical technique that can also be applied to topic modeling. Both approaches, however, discard temporal information, which is paramount in identifying prevalent topics and is an important characteristic of social media data.
  • Matsuo et al. employed a different approach to achieve the clustering of co-occurrence graphs. They used Newman clustering to efficiently identify word clusters. The core idea behind Newman clustering is the concept of edge betweenness. The betweenness measure of an edge is the number of shortest paths between pairs of nodes that run along it. If a network contains clusters that are loosely connected by a few intercluster edges, then all shortest paths between different clusters must go along one of these edges. Consequently, the edges connecting different clusters will have high edge betweenness, and removing them iteratively will yield well-defined clusters.
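The edge-betweenness idea above can be sketched in a few lines. The example below computes edge betweenness for a small unweighted, undirected graph using a Brandes-style breadth-first accumulation, then removes the highest-scoring edge once. It is a minimal illustration, not the implementation used by Matsuo et al.; the class name, helper methods, and the sample graph (two triangles joined by a single bridge) are assumptions. The full procedure, commonly known as the Girvan-Newman algorithm, repeats the compute-and-remove step until the desired clusters emerge.

```java
import java.util.*;

final class EdgeBetweennessSketch {

    static String edgeKey(String u, String v) {
        return u.compareTo(v) < 0 ? u + "|" + v : v + "|" + u;
    }

    static void addEdge(Map<String, Set<String>> adj, String u, String v) {
        adj.computeIfAbsent(u, k -> new TreeSet<>()).add(v);
        adj.computeIfAbsent(v, k -> new TreeSet<>()).add(u);
    }

    static void removeEdge(Map<String, Set<String>> adj, String key) {
        String[] ends = key.split("\\|");
        adj.get(ends[0]).remove(ends[1]);
        adj.get(ends[1]).remove(ends[0]);
    }

    /** Number of shortest paths running along each edge (unweighted graph). */
    static Map<String, Double> edgeBetweenness(Map<String, Set<String>> adj) {
        Map<String, Double> score = new HashMap<>();
        for (String s : adj.keySet()) {
            // BFS from s: count shortest paths (sigma) and record predecessors.
            Map<String, Integer> dist = new HashMap<>();
            Map<String, Double> sigma = new HashMap<>();
            Map<String, List<String>> preds = new HashMap<>();
            List<String> order = new ArrayList<>();
            Deque<String> queue = new ArrayDeque<>();
            dist.put(s, 0);
            sigma.put(s, 1.0);
            queue.add(s);
            while (!queue.isEmpty()) {
                String v = queue.poll();
                order.add(v);
                for (String w : adj.get(v)) {
                    if (!dist.containsKey(w)) {
                        dist.put(w, dist.get(v) + 1);
                        queue.add(w);
                    }
                    if (dist.get(w) == dist.get(v) + 1) {
                        sigma.merge(w, sigma.get(v), Double::sum);
                        preds.computeIfAbsent(w, k -> new ArrayList<>()).add(v);
                    }
                }
            }
            // Walk back from the farthest nodes, pushing path credit onto edges.
            Map<String, Double> delta = new HashMap<>();
            for (int i = order.size() - 1; i >= 0; i--) {
                String w = order.get(i);
                for (String v : preds.getOrDefault(w, Collections.emptyList())) {
                    double c = sigma.get(v) / sigma.get(w) * (1.0 + delta.getOrDefault(w, 0.0));
                    score.merge(edgeKey(v, w), c, Double::sum);
                    delta.merge(v, c, Double::sum);
                }
            }
        }
        // Every unordered node pair was counted once from each endpoint.
        score.replaceAll((k, val) -> val / 2.0);
        return score;
    }

    static String highestEdge(Map<String, Double> score) {
        return Collections.max(score.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    static int componentCount(Map<String, Set<String>> adj) {
        Set<String> seen = new HashSet<>();
        int count = 0;
        for (String start : adj.keySet()) {
            if (!seen.add(start)) continue;
            count++;
            Deque<String> stack = new ArrayDeque<>();
            stack.push(start);
            while (!stack.isEmpty()) {
                for (String w : adj.get(stack.pop())) {
                    if (seen.add(w)) stack.push(w);
                }
            }
        }
        return count;
    }

    /** Hypothetical sample: triangles a-b-c and d-e-f joined by the edge c-d. */
    static Map<String, Set<String>> sampleGraph() {
        Map<String, Set<String>> adj = new TreeMap<>();
        String[][] edges = {{"a", "b"}, {"b", "c"}, {"a", "c"}, {"c", "d"},
                            {"d", "e"}, {"e", "f"}, {"d", "f"}};
        for (String[] e : edges) addEdge(adj, e[0], e[1]);
        return adj;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> g = sampleGraph();
        String top = highestEdge(edgeBetweenness(g));
        removeEdge(g, top);
        System.out.println("removed " + top + ", components: " + componentCount(g));
    }
}
```

In the sample graph the bridge c|d scores 9.0, because all 3 × 3 node pairs that span the two triangles must cross it, so it is removed first and the graph splits into two clusters. Recomputing betweenness after every removal is what makes the approach computationally demanding on large graphs.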
DISADVANTAGES OF EXISTING SYSTEM:
  • Even after the removal of unimportant content, there is still information overload in the remaining news-related data, which must be prioritized for consumption.
  • LDA and PLSA only discover topics from text corpora; they do not rank based on popularity or prevalence.
  • The main disadvantage of the Newman clustering algorithm is its high computational demand.
  • Some existing work considers only the personal interests of users, not topics that are prevalent at a global scale.
  • Other methods use only microblog data and do not attempt to integrate it with real news. Additionally, the topics they detect are not ranked by popularity or prevalence.
PROPOSED SYSTEM:
  • We propose an unsupervised system, SociRank, which effectively identifies news topics that are prevalent in both social media and the news media, and then ranks them by relevance using their degrees of MF, UA, and UI. Even though this paper focuses on news topics, the system can be easily adapted to a wide variety of fields, from science and technology to culture and sports.
  • To achieve its goal, SociRank uses keywords from news media sources (for a specified period of time) to identify the overlap with social media from that same period.
  • We then build a graph whose nodes represent these keywords and whose edges depict their co-occurrences in social media. The graph is then clustered to clearly identify distinct topics. After obtaining well-separated topic clusters (TCs), the factors that signify their importance are calculated. Finally, the topics are ranked.
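The first two steps above, finding the keyword overlap and building the co-occurrence graph, can be sketched as follows. This is a minimal illustration under stated assumptions: posts are tokenized by a naive lowercase split, an edge weight is the raw number of posts in which both keywords appear, and the class name, method names, and sample data are hypothetical. The paper's actual keyword extraction and edge weighting may differ, and the clustering and ranking steps are not shown here.

```java
import java.util.*;

final class KeywordGraphSketch {

    /** Naive tokenizer (an assumption): lowercase, split on non-letters. */
    static Set<String> tokenize(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    /** Keeps only the news keywords that also appear in at least one post. */
    static Set<String> overlap(Set<String> newsKeywords, List<Set<String>> posts) {
        Set<String> shared = new TreeSet<>();
        for (Set<String> post : posts) {
            for (String term : post) {
                if (newsKeywords.contains(term)) shared.add(term);
            }
        }
        return shared;
    }

    /** Edge "a|b" maps to the number of posts that mention both a and b. */
    static Map<String, Integer> coOccurrences(Set<String> keywords, List<Set<String>> posts) {
        Map<String, Integer> edges = new TreeMap<>();
        for (Set<String> post : posts) {
            List<String> present = new ArrayList<>();
            for (String k : keywords) {
                if (post.contains(k)) present.add(k);
            }
            Collections.sort(present); // fixed endpoint order gives one key per pair
            for (int i = 0; i < present.size(); i++) {
                for (int j = i + 1; j < present.size(); j++) {
                    edges.merge(present.get(i) + "|" + present.get(j), 1, Integer::sum);
                }
            }
        }
        return edges;
    }

    /** Hypothetical keywords extracted from news articles for one time window. */
    static Set<String> sampleNewsKeywords() {
        return new HashSet<>(Arrays.asList("election", "vote", "earthquake", "rescue", "budget"));
    }

    /** Hypothetical social media posts from the same time window. */
    static List<Set<String>> samplePosts() {
        List<Set<String>> posts = new ArrayList<>();
        posts.add(tokenize("Election day: go vote!"));
        posts.add(tokenize("Long lines to vote in the election"));
        posts.add(tokenize("Earthquake rescue teams arrive"));
        posts.add(tokenize("Lunch was great"));
        return posts;
    }

    public static void main(String[] args) {
        List<Set<String>> posts = samplePosts();
        Set<String> nodes = overlap(sampleNewsKeywords(), posts);
        System.out.println("nodes: " + nodes);
        System.out.println("edges: " + coOccurrences(nodes, posts));
    }
}
```

In the sample data, "budget" appears in the news keywords but in no post, so it never becomes a node; the remaining keywords form two disconnected pairs, which is the kind of structure a later clustering step would separate into distinct topics.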
ADVANTAGES OF PROPOSED SYSTEM:
  • To the best of our knowledge, no other work attempts to use either the social media interests of users or their social relationships to aid in the ranking of topics.
  • Moreover, SociRank follows an empirical framework that integrates several techniques, such as keyword extraction, similarity measures, graph clustering, and social network analysis.
  • The effectiveness of our system is validated by extensive controlled and uncontrolled experiments.
SYSTEM ARCHITECTURE:
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS: 
  • System : Pentium Dual Core
  • Hard Disk : 120 GB
  • Monitor : 15" LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB
SOFTWARE REQUIREMENTS: 
  • Operating System : Windows 7
  • Coding Language : Java/J2EE
  • Tool : Eclipse Luna
  • Database : MySQL
REFERENCE:
Derek Davis, Gerardo Figueroa, and Yi-Shin Chen, "SociRank: Identifying and Ranking Prevalent News Topics Using Social Media Factors," IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017.

