COMP47470

Academic Year 2024/2025

Big Data Programming (COMP47470)

Subject: Computer Science
College: Science
School: Computer Science
Level: 4 (Masters)
Credits: 5
Module Coordinator: Dr Ravi Manumachu
Trimester: Autumn and Spring (separate)
Mode of Delivery: On Campus
Internship Module: No
How will I be graded? Letter grades

Curricular information is subject to change.

Big data refers to high-volume, high-velocity and/or high-variety data that is too complex for traditional (relational) data management and data processing systems to handle. The data-intensive nature of big data applications has pushed researchers and industry practitioners to build innovative solutions: inherently distributed software systems with novel programming and execution models. This module describes, compares and contrasts the pioneering and leading big data technologies (NoSQL, batch, streaming, and graph). Students will learn how to install a big data software technology (NoSQL, batch, streaming, graph) and develop (code and test) a big data application using it.

About this Module

Learning Outcomes:

(a) Explain and illustrate properties of traditional and big data management and processing systems (ACID, CAP, BASE).
(b) Compare and contrast relational, NoSQL and NewSQL data management systems.
(c) Describe, distinguish, and use big data technologies for batch, stream, and graph processing.
(d) Develop (design and implement) NoSQL, batch, streaming and graph big data applications.

Indicative Module Content:

Introduction to Big Data (Characteristics and classifications)
Database Concepts, Architecture, Database Modelling and Design
The Relational Data Model, SQL, and Introduction to MySQL
Introduction to NoSQL Databases, MongoDB document NoSQL database
MapReduce Programming Model, Introduction to Apache Hadoop, HDFS
Distributed Data Processing using Apache Spark
Introduction to Graph processing and developing large graph applications using Spark's GraphX
Introduction to Data Streams and developing streaming applications using Spark's structured streaming API
Machine learning (supervised, unsupervised, recommendation) using Spark's MLlib API

Student Effort Hours:

Student Effort Type	Hours
Autonomous Student Learning	62
Lectures	24
Practical	24
Total	110

Approaches to Teaching and Learning:

Lectures, Laboratory Practicals, Weekly Quizzes, Mid-term Assignments, End-term Exam

Requirements, Exclusions and Recommendations

Learning Recommendations:

It is strongly recommended that students have an acceptable level of competency in Bash scripting and the Python programming language.

Module Requisites and Incompatibles

Not applicable to this module.

Assessment Strategy

Description	Timing	Component Scale	Must Pass Component	% of Final Grade	In Module Component Repeat Offered
Exam (In-person): 2 hour End of Trimester Exam	End of trimester Duration: 2 hr(s)	Alternative linear conversion grade scale 40%	No	40	No
Assignment (Including Essay): Continuous Assessment Throughout the Trimester.	Week 3, Week 6, Week 9, Week 11, Week 12	Alternative linear conversion grade scale 40%	No	50	No
Quizzes/Short Exercises: A quiz each week comprising questions from the lecture delivered in the same week.	Week 2, Week 3, Week 4, Week 5, Week 6, Week 7, Week 8, Week 9, Week 10, Week 11	Alternative linear conversion grade scale 40%	No	10	No

Carry forward of passed components
Yes

Remediation Type	Remediation Timing
Repeat	Within Two Trimesters

Please see Student Jargon Buster for more information about remediation types and timing.

Feedback Strategy/Strategies

• Feedback individually to students, post-assessment
• Group/class feedback, post-assessment
• Online automated feedback

How will my Feedback be Delivered?

Through solutions to the weekly quizzes and to the continuous assessment assignments.

Reading List

Fundamentals of Database Systems, 7th Edition
by Ramez Elmasri and Shamkant B. Navathe

Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale, 4th Edition
by Tom White

Spark: The Definitive Guide: Big Data Processing Made Simple
by Bill Chambers, Matei Zaharia

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
by Martin Kleppmann
