COMP30770 Programming for Big Data

Academic Year 2024/2025

'Big Data' refers to datasets that are too large, or change too quickly, for traditional data management and data processing approaches. Big Data has forced the field of data management to rethink some of its design concepts and architectural patterns. This module walks students through the complex set of concepts and projects that form the Big Data stack. Students will learn how to set up Big Data environments, how to use efficient data management operations, and how to run algorithms at the scale and speed required by Big Data datasets. By the end of this module, students will also be able to design and implement their own solutions to Big Data problems.

Curricular information is subject to change

What will I learn?

Learning Outcomes:

On successful completion of this module the learner will be able to:
- Review data processing using Shell and traditional data management using SQL;
- Understand the problem of managing data at scale and why traditional data management systems fail at that scale;
- Understand the various data management paradigms used in the context of Big Data (e.g., relational, NoSQL);
- Understand the role of distributed file systems (e.g., HDFS) in supporting big data programming;
- Understand Big Data programming models such as MapReduce and Spark, and how to use them on real examples;
- Understand Spark extensions for various big data applications, such as MLlib, GraphX and Spark Streaming.

How will I learn?

Student Effort Hours:

Student Effort Type	Hours
Lectures	12
Practical	24
Autonomous Student Learning	64
Total	100

Approaches to Teaching and Learning:
Peer and group work; lectures; lab/studio work.

Am I eligible to take this module?

Requirements, Exclusions and Recommendations

Not applicable to this module.

Module Requisites and Incompatibles

Pre-requisite:
COMP20250 - Introduction to Java, COMP20350 - Object-Oriented Programming

How will I be assessed?

Assessment Strategy

Description	Timing	Open Book Exam	Component Scale	Must Pass Component	% of Final Grade
Group Work Assignment: A comparative study on solving a data-intensive task with and without big data programming.		n/a	Graded	No	30
Exam (In-person): 2-hour closed-book paper-based exam		n/a	Graded	No	70

Carry forward of passed components
Yes

What happens if I fail?

Resit In	Terminal Exam
Summer	Yes - 2 Hour

Please see Student Jargon Buster for more information about remediation types and timing.

Assessment feedback

Feedback Strategy/Strategies

• Feedback individually to students, on an activity or draft prior to summative assessment
• Group/class feedback, post-assessment
• Self-assessment activities

How will my Feedback be Delivered?

Solutions to the lab practicals will be provided.

The information contained in this document is, to the best of our knowledge, true and accurate at the time of publication, and is solely for informational purposes. University College Dublin accepts no liability for any loss or damage howsoever arising as a result of use or reliance on this information.

Subject: Computer Science
College: Science
School: Computer Science
Level: 3 (Degree)
Credits: 5.0
Trimester: Spring
Module Coordinator: Dr Shen Wang
Mode of Delivery: Face-to-Face
Internship Module: No
How will I be graded?: Letter grades
