MapReduce-Based Big Data Processing Technology

In Web 3.0, the Resource Description Framework (RDF) is used for the conceptual description or modeling of information held in Web resources because it can represent data in a machine-readable format. It has been widely applied in the knowledge management applications of Web 3.0. An RDF data set consists of a set of RDF triples in the format (subject, predicate, object). To identify a Web resource uniquely on a global scale, RDF uses Uniform Resource Identifiers (URIs) (Brickley & Guha, 2014). A set of such triples can be represented as a directed labeled graph, in which subjects and objects are nodes and predicates are edge labels. Querying RDF triples is technically equivalent to performing a large number of join operations, and traditional relational query processing cannot handle such a large number of “star joins” efficiently. Many research works have therefore introduced the MapReduce framework into their solutions, using parallel computing to improve RDF query performance. Because the subject involves many aspects, however, these works differ considerably in how they incorporate the MapReduce framework (Sakr & Gaber, 2014).
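To make the link between RDF queries and joins concrete, the following is a minimal single-process sketch of a subject-keyed ("star") join written in the map, shuffle, and reduce style. It is an illustration under stated assumptions rather than the method of any surveyed system: the `ex:` names, the toy triples, and every function name are hypothetical, and a real deployment would run the phases on a framework such as Hadoop rather than in plain Python.

```python
from collections import defaultdict

# Hypothetical toy data set; URIs are shortened to prefixed names.
TRIPLES = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:livesIn", "ex:paris"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:livesIn", "ex:rome"),
    ("ex:carol", "ex:livesIn", "ex:paris"),
]

# Star query with two triple patterns sharing one subject variable:
#   ?s ex:worksFor ?employer .  ?s ex:livesIn ?city
PATTERN_PREDICATES = ("ex:worksFor", "ex:livesIn")


def map_phase(triple):
    """Emit (subject, (predicate, object)) for triples matching a pattern."""
    subject, predicate, obj = triple
    if predicate in PATTERN_PREDICATES:
        yield subject, (predicate, obj)


def shuffle(pairs):
    """Group mapper output by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(subject, values):
    """Join on the subject: one row per (employer, city) combination."""
    by_predicate = defaultdict(list)
    for predicate, obj in values:
        by_predicate[predicate].append(obj)
    for employer in by_predicate["ex:worksFor"]:
        for city in by_predicate["ex:livesIn"]:
            yield subject, employer, city


def run_star_join(triples):
    """Run map, shuffle, and reduce in sequence and collect the joined rows."""
    pairs = [kv for triple in triples for kv in map_phase(triple)]
    groups = shuffle(pairs)
    return [row for s in sorted(groups) for row in reduce_phase(s, groups[s])]


if __name__ == "__main__":
    for row in run_star_join(TRIPLES):
        print(row)
```

In this sketch, subjects that match only one of the two patterns (here `ex:carol`) produce no output row, which is the inner-join behavior a star query expects. Each additional join variable in a query would typically need another pass of this kind, which is one reason the surveyed solutions differ in how they plan and group joins.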

In this assignment, you are asked to survey existing research on approaches that apply MapReduce to improve RDF query processing performance. Write a short research report that compares at least two solutions that take distinct approaches to introducing the MapReduce framework into RDF query processing.

Create a survey report with a focus on the following four aspects:

Identify the main focus of each solution.
Identify the main technical changes made to the MapReduce framework by the solution.
Specify the rationale for these technical changes.
Analyze the pros and cons of each solution.
