COMP30770 Programming for Big Data

Academic Year 2023/2024

'Big Data' refers to datasets that are too large, or that change too quickly, for traditional data management and data processing approaches. Big Data has forced the field of data management to rethink some of its design concepts and architectural patterns. This module walks students through the complex set of concepts and projects that form the Big Data stack. Students will learn how to set up Big Data environments, how to use efficient data management operations, and how to run algorithms at the scale and speed that Big Data datasets require. By the end of the module, students will also be able to design and implement their own solutions to Big Data problems.


Curricular information is subject to change

Learning Outcomes:

On successful completion of this module the learner will be able to:
- Review data processing with the Shell and traditional data management with SQL;
- Understand the problem of managing data at scale and why traditional data management systems fail at that scale;
- Understand the various data management paradigms used in the context of Big Data (e.g., relational, NoSQL);
- Understand the role of distributed file systems (e.g., HDFS) in supporting Big Data programming;
- Understand Big Data programming models such as Map/Reduce and Spark, and how to apply them to real examples;
- Understand Spark extensions for various Big Data applications, such as MLlib, GraphX, and Spark Streaming.

Student Effort Hours: 
Student Effort Type | Hours
Lectures | 12
Practical | 24
Autonomous Student Learning | 64
Total | 100

Approaches to Teaching and Learning:
Peer and group work; lectures; lab/studio work.
Requirements, Exclusions and Recommendations

Not applicable to this module.


Module Requisites and Incompatibles
Not applicable to this module.
 
Assessment Strategy  
Description | Timing | Open Book Exam | Component Scale | Must Pass Component | % of Final Grade
Examination: 2-hour closed-book paper-based exam | 2 hour End of Trimester Exam | No | Graded | No | 60
Class Test: MCQs for basic big data programming concepts | End of trimester MCQ | n/a | Graded | No | 10
Group Project: A comparative study on solving a data-intensive task with and without big data programming. | Coursework (End of Trimester) | n/a | Graded | No | 30


Carry forward of passed components
Yes
 
Resit In Terminal Exam
Summer Yes - 2 Hour
Please see Student Jargon Buster for more information about remediation types and timing. 
Feedback Strategy/Strategies

• Feedback individually to students, on an activity or draft prior to summative assessment
• Group/class feedback, post-assessment
• Self-assessment activities

How will my Feedback be Delivered?

Solutions to the lab practicals will be provided.

Name Role
Priscilla Adong Tutor
Ms Cassidy Aytan Gigan Tutor
Riju Das Tutor
Zhongping Dong Tutor
Mossoun Franck Malick Jaures Ebiele Tutor
Mr Patrick English Tutor
Jiaying Guo Tutor
Nils Höhing Tutor
Haotian Li Tutor
Xiao Li Tutor
Mr Hrishikesh Dilip Mulay Tutor
Furqan Rustam Tutor
Weijiong You Tutor
Timetabling information is displayed only for guidance purposes, relates to the current Academic Year only and is subject to change.
 
Spring
     
Practical Offering 1 Week(s) - 20, 21, 22, 23, 24, 25, 26 Fri 09:00 - 10:50
External & School Exams Offering 1 Week(s) - 28 Fri 10:00 - 12:50
External & School Exams Offering 1 Week(s) - 28 Fri 10:00 - 13:50
Lecture Offering 1 Week(s) - 20, 21, 22, 23, 24, 25 Thurs 11:00 - 12:50
Lecture Offering 1 Week(s) - 26 Thurs 11:00 - 12:50
Lecture Offering 1 Week(s) - 26 Thurs 11:00 - 13:50
Practical Offering 1 Week(s) - 20, 21, 22, 23, 24, 25 Thurs 14:00 - 15:50