COMP47470

Academic Year 2025/2026

Big Data Programming (COMP47470)

Subject:: Computer Science
College:: Science
School:: Computer Science
Level:: 4 (Masters)
Credits:: 5
Module Coordinator:: Dr Ravi Manumachu
Trimester:: Autumn and Spring (separate)
Mode of Delivery:: On Campus
Internship Module:: No
How will I be graded?:: Letter grades

Curricular information is subject to change.

Big data refers to high-volume, high-velocity and/or high-variety data that is too complex to be handled by traditional (relational) data management and data processing systems. The data-intensive nature of big data applications has pushed researchers and industry practitioners to build innovative solutions: inherently distributed software systems with novel programming and execution models. This module describes, compares and contrasts the pioneering and leading big data technologies (NoSQL, batch, streaming, and graph). It teaches students how to install big data software technologies (NoSQL, batch, streaming, graph) and how to develop (code and test) a big data application.

About this Module

Learning Outcomes:

(a). Explain and illustrate properties of traditional and big data management and processing systems (ACID, CAP, BASE).
(b). Compare and contrast relational, NoSQL and NewSQL database management systems.
(c). Describe, distinguish, and work with big data technologies for batch, stream, and graph processing.
(d). Develop (design and implement) NoSQL, batch, streaming and graph big data applications.

Indicative Module Content:

Introduction to Big Data (Characteristics and classifications)
Big Data reference architectures and Classification of Data Intensive Distributed Systems
Database Concepts and Architecture
The Relational Data Model, SQL, and Introduction to MySQL
Introduction to NoSQL Databases and the MongoDB document database
MapReduce Programming Model, Introduction to Apache Hadoop, HDFS and YARN
Distributed Data Processing using Apache Spark
Introduction to Graph processing and developing large graph applications using Spark's GraphX
Introduction to Data Streams and developing streaming applications using Spark's Structured Streaming API
Machine learning (supervised, unsupervised, recommendation) using Spark's MLlib API
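The MapReduce programming model listed above can be illustrated without a cluster. The following is a minimal, single-process Python sketch of the classic word-count job, assuming the input is a plain in-memory list of text lines. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and are not part of the Hadoop or Spark APIs; the sketch only shows the three phases: map emits (key, value) pairs, shuffle groups values by key, and reduce combines each group.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every whitespace-separated word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key.

    In a real framework this step moves data between machines;
    here it is simulated with an in-memory dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over a list of lines (hypothetical helper)."""
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data is big", "data streams and data batches"]
    print(word_count(docs))
    # prints {'and': 1, 'batches': 1, 'big': 2, 'data': 3, 'is': 1, 'streams': 1}
```

Because each map call looks only at its own line and each reduce call looks only at one key's values, both phases can run in parallel across many machines; frameworks such as Hadoop and Spark exploit that independence.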

Student Effort Hours:

Student Effort Type	Hours
Autonomous Student Learning	62
Lectures	24
Practical	24
Total	110

Approaches to Teaching and Learning:

Lectures, Laboratory Practicals, Weekly Quizzes, Mid-term Assignments, End-term Exam

Requirements, Exclusions and Recommendations

Learning Recommendations:

It is strongly recommended that students have an acceptable level of competency in Bash scripting and the Python programming language.

Module Requisites and Incompatibles

Not applicable to this module.

Assessment Strategy

Description	Timing	Component Scale	Must Pass Component	% of Final Grade	In Module Component Repeat Offered
Assignment (Including Essay): Continuous Assessment Throughout the Trimester	Week 6, Week 12	Alternative linear conversion grade scale 40%	No	40	No
Exam (In-person): 2-hour End-of-Trimester Exam	End of trimester (duration: 2 hours)	Alternative linear conversion grade scale 40%	No	50	No
Quizzes/Short Exercises: A quiz each week comprising questions from the lecture delivered in the same week.	Week 12	Alternative linear conversion grade scale 40%	No	10	No

Carry forward of passed components

Yes

Remediation Type	Remediation Timing
Repeat	Within Two Trimesters

Please see Student Jargon Buster for more information about remediation types and timing.

Feedback Strategy/Strategies

• Feedback individually to students, post-assessment
• Group/class feedback, post-assessment
• Online automated feedback

How will my Feedback be Delivered?

Solutions to weekly quizzes. Solutions to continuous assessment assignments.

Reading List

Fundamentals of Database Systems, 7th Edition
by Ramez Elmasri and Shamkant Navathe

Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale, 4th Edition
by Tom White

Spark: The Definitive Guide: Big Data Processing Made Simple
by Bill Chambers and Matei Zaharia

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
by Martin Kleppmann
