dc.contributor.advisor Rafea, Ahmed
dc.contributor.author Mostafa, Nada Ayman A.
dc.date.accessioned 2016-04-11T09:30:25Z
dc.date.available 2016-04-11T22:00:10Z
dc.date.created Spring 2016 en_US
dc.date.issued 2016-04-11
dc.identifier.uri http://dar.aucegypt.edu/handle/10526/4691
dc.description.abstract Social media has become the first source of information for many people. The amount of information posted on social media every day has become so vast that it is difficult to track. One of the most popular social media applications is Twitter. Users follow many news accounts, public figures, and friends so that they can keep up with the latest events around them. Since dialect and writing style differ from one region to another, our objective in this research is to extract trending topics for an Egyptian Twitter user. In this way, the user can see at a glance the trending topics discussed by the people he follows. To find the best approach for achieving this objective, we investigate the document pivot and the feature pivot approaches. Applying the document pivot approach to the baseline data, using a tf-itf (term frequency-inverse tweet frequency) representation, the repeated bisecting k-means clustering technique, and extraction of the most frequent n-grams from each cluster, achieved a recall of 100% and an F1 measure of 0.8. Applying the feature pivot approach to the baseline data, using the content similarity algorithm to group related unigrams together, achieved a recall of 100% and an F1 measure of 0.923. To validate our results, we collected 12 data sets of different sizes (200, 400, 600, and 1200) from three different domains (sports, entertainment, and news) and applied both approaches to them. The average recall, precision, and F1 measure values obtained with the feature pivot approach are higher than those obtained with the document pivot approach.
To check that this difference is statistically significant, we applied a two-sample, one-tailed paired t-test, which showed that the feature pivot results are significantly better at a confidence level of 90%. The document pivot approach extracted the trending topics for an Egyptian Twitter user with an average recall of 0.714, an average precision of 0.521, and an average F1 measure of 0.556, versus average recall, precision, and F1 measure values of 0.981, 0.754, and 0.833, respectively, for the feature pivot approach. en_US
dc.format.extent 103 p. en_US
dc.format.medium theses en_US
dc.language.iso en en_US
dc.rights Author retains all rights with regard to copyright. en
dc.subject Data Mining en_US
dc.subject Social Media en_US
dc.subject Topic Extraction en_US
dc.subject.lcsh Thesis (M.S.)--American University in Cairo en_US
dc.title Trending topic extraction from social media en_US
dc.type Text en_US
dc.subject.discipline Computer Science en_US
dc.rights.access This item is available en_US
dc.contributor.department American University in Cairo. Dept. of Computer Science and Engineering en_US
dc.description.irb American University in Cairo Institutional Review Board approval has been obtained for this item. en_US
dc.contributor.committeeMember Aly, Sherif G.
dc.contributor.committeeMember El Kadi, Amr
dc.contributor.committeeMember Rashwan, Mohsen
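
Note on the reported measures: the abstract gives recall and F1 for the baseline data but not precision. The sketch below is not taken from the thesis. It assumes the standard definition of F1 as the harmonic mean of precision and recall, and the function names f1_score and precision_from_f1 are hypothetical helpers introduced only for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_from_f1(f1: float, recall: float) -> float:
    """Invert F1 = 2PR / (P + R) to recover P from a known F1 and R."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("F1 and recall are inconsistent with each other")
    return f1 * recall / denominator


if __name__ == "__main__":
    # Baseline figures quoted in the abstract: recall is 1.0 in both cases.
    print(precision_from_f1(0.8, 1.0))    # document pivot approach
    print(precision_from_f1(0.923, 1.0))  # feature pivot approach
```

Under that assumption, the baseline figures would correspond to a precision of roughly 0.667 for the document pivot approach and roughly 0.857 for the feature pivot approach. The averaged figures behave differently: applying f1_score to the average precision and recall does not reproduce the reported average F1, presumably because F1 was averaged per data set rather than computed from the averaged precision and recall.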


This item appears in the following Collection(s)

  • Theses and Dissertations [1728]
    This collection includes theses and dissertations authored by American University in Cairo graduate students.
