MA3831 - Natural Language Processing, Web Scraping and Large Data Processing

Credit points: 3
Year: 2021
Student Contribution Band: Band 2
Administered by: College of Science and Engineering

This subject provides students with cutting-edge tools and techniques for data science. It has two parts. In the first half, students will explore natural language processing (NLP), web scraping and APIs to harvest data with Python, and will explore the data science workbench approach to managing production pipelines of work that can be reused across different data science projects. In the second half, students will focus on computer models and software designed to handle Big Data sets in a distributed and/or parallel fashion. Particular focus will be given to distributed and parallel computing using MapReduce/Hadoop and similar models for processing Big Data sets.

Learning Outcomes

  • understand and apply new data science skills, knowledge and techniques to solve problems in data science using NLP;
  • apply data science skills, knowledge and techniques to solve problems in data science NLP projects with a focus on web scraping;
  • understand how to deploy data science projects into production pipelines;
  • compare and evaluate different systems and approaches for high-performance and large-scale computing for analytics for standard data and big data;
  • manage and prepare data using standard management frameworks for the purpose of transforming and cleaning data to ensure classical characteristic outcomes are achieved;
  • examine and deploy data processing tasks in the Hadoop ecosystem for big data.

Subject Assessment

  • Written > Case report 1 - (20%) - Individual
  • Written > Project report - (50%) - Individual
  • Written > Technical report - (30%) - Individual

Prerequisites: CP1404

Availabilities

Townsville, Internal, Study Period 1
Census Date 25-Mar-2021
Workload expectations:

The student workload for this 3 credit point subject is approximately 130 hours.

  • 26 hours pre-recorded content/lectures
  • 26 hours online workshops
  • assessment and self-directed study

Cairns, Internal, Study Period 1
Census Date 25-Mar-2021
Workload expectations:

The student workload for this 3 credit point subject is approximately 130 hours.

  • 26 hours pre-recorded content/lectures
  • 26 hours online workshops
  • assessment and self-directed study

Note: Minor variations may occur due to the continuous subject quality improvement process. Where assessment details vary, the Subject Outline represents the latest official information.