
dc.contributor.advisor Moustafa, Mohamed Hosny, Karim Mohamed 2020-02-09T09:12:36Z Fall 2019 en_US 2020-02-09
dc.description.abstract Human action recognition attempts to identify what action a person is performing in a given video, and it is considered one of the important topics in machine learning and computer vision. Its importance comes from its use in many applications, such as security and human-computer interaction. Many methods have been researched to solve the problem, ranging from handcrafted techniques to deep neural network techniques; methods such as 3D convolution and recurrent neural networks have been used as well. Popular datasets have been curated to benchmark these methods; UCF-101 and HMDB-51 are the most popular and are used to test current and past techniques in human action recognition. Two-stream convolutional networks, a deep learning technique, have become a trend in recent years for solving the human action recognition problem. The best-known approach pre-processes the video to generate optical flow data or dense trajectories, then feeds them to a deep neural network alongside static individual frames of the video. We ask two questions: can human action be classified without pre-processing or handcrafted feature generation before deep learning is used for classification, and how does 3D convolution affect the temporal stream and the overall classification accuracy? We contribute to solving the human action recognition problem by introducing a new end-to-end solution using a two-stream convolutional network that learns static and temporal features without any pre-processing of the data to generate optical flow or dense trajectories for video temporal information. Our method has been tested on the UCF-101 and HMDB-51 datasets to compete with state-of-the-art techniques.
The results show that we achieved high accuracy without any pre-processing, unlike current popular methods. Our method ranked among the highest on UCF-101; the only method with higher accuracy was a work that modified the original two-stream network by adding new fusion techniques. It ranked the highest on HMDB-51 in comparison with the other techniques. en_US
dc.format.extent 69 p. en_US
dc.format.medium theses en_US
dc.language.iso en en_US
dc.rights Author retains all rights with regard to copyright. en
dc.subject human action recognition en_US
dc.subject machine learning en_US
dc.subject UCF-101 en_US
dc.subject HMDB-51 en_US
dc.subject convolutional networks en_US
dc.subject CNN en_US
dc.subject 3D convolution en_US
dc.subject two stream convolutional network en_US
dc.subject deep learning en_US
dc.subject artificial intelligence en_US
dc.subject computer vision en_US
dc.subject ResNet-50 en_US
dc.subject video recognition en_US
dc.subject.lcsh Thesis (M.S.)--American University in Cairo en_US
dc.title 3D convolution with two-stream convNets for human action recognition en_US
dc.type Text en_US
dc.subject.discipline Computer Science en_US
dc.rights.access This item is restricted for 1 year from the date issued en_US
dc.contributor.department American University in Cairo. Dept. of Computer Science and Engineering en_US
dc.embargo.lift 2021-02-08T09:12:36Z
dc.description.irb American University in Cairo Institutional Review Board approval is not necessary for this item, since the research is not concerned with living human beings or bodily tissue samples. en_US
dc.contributor.committeeMember Goneid, Amr
dc.contributor.committeeMember Khalil, Mahmoud

This item appears in the following Collection(s)

  • Theses and Dissertations [1788]
    This collection includes theses and dissertations authored by American University in Cairo graduate students.
