BA03 Introduction to Big Data

WARNING - Clicking on the "SUBMIT ASSIGNMENT" button will submit the assignment. Be sure that you have reviewed your answers before clicking it. Attempt all the questions. All questions are compulsory. Each question carries 4 marks. There is no negative marking for wrong answers.

Please note: There are 25 questions, of which Questions 21-25 are based on the Case Study.

Subject Code: BA03

Subject Name: Introduction to Big Data



Component name: TERM


Question 1:- What are the four V’s of Big Data?

a)         Volume                       

b)         Velocity                       

c)         Variety                       

d)         All of the mentioned                       

Question 2:- Input to the _______ is the sorted output of the mappers.

a)         Reducer

b)         Mapper                       

c)         Shuffle                       

d)         All of the mentioned                       

Question 3:- According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop?

a)         Data warehousing and business intelligence                       

b)         Big data management and data mining                       

c)         Management of Hadoop clusters                       

d)         Collecting and storing unstructured data                       

Question 4:- __________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer.

a)         Partitioner                       

b)         OutputCollector                       

c)         Reporter                       

d)         All of the mentioned                       

Question 5:- ________ is the slave/worker node and holds the user data in the form of Data Blocks.

a)         DataNode

b)         NameNode                       

c)         Data block                       

d)         Replication                       

Question 6:- What are the different features of Big Data Analytics?


  Data Recovery                       



  All of the mentioned                        

Question 7:- ___________ is the world’s most complete, tested, and popular distribution of Apache Hadoop and related projects.





Question 8:- Cloudera ___________ includes CDH and an annual subscription license (per node) to Cloudera Manager and technical support.




  All of the mentioned                        

Question 9:- __________ is an online NoSQL database developed by Cloudera.



  Impala

  Oozie

Question 10:- CDH processes and controls sensitive data and facilitates _____________.

  flexibility

  scalability

  multi-tenancy

  reusability

Question 11:- Which of the following is a distributed graph processing framework on top of Spark?

  Spark Streaming                       



  All of the mentioned                       

Question 12:- The Spark optimizer is based on constructs of which functional programming language?





Question 13:- You can delete a column family from a table using the method _________ of the HBaseAdmin class.

  delColumn()

  removeColumn()

  deleteColumn()

  None of the mentioned                       

Question 14:- What is the default size of distributed cache?

  8 GB                       

  10 GB                       

  16 GB                       

  20 GB                       

Question 15:- Which of the following is a data processing engine for clustered computing?




  All of the mentioned                       

Question 16:- Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________

  MapReduce, Hive and HBase

  MapReduce, MySQL and Google Apps                       

  MapReduce, Hummer and Iguana                       

  MapReduce, Heron and Trumpet                       

Question 17:- In NameNode HA, when the active node fails, which node takes over the responsibility of the active node?

  Secondary NameNode                       

  Backup node                       

  Standby node                       

  Checkpoint node                       

Question 18:- As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including:

  Improved data storage and information retrieval                       

  Improved extract, transform and load features for data integration                       

  Improved data warehousing functionality                       

  Improved security, workload management and SQL support                       

Question 19:- Which of the following is the reason Spark executes faster than MapReduce?

  It supports different programming languages like Scala, Python, R, and Java.                       


  DAG execution engine and in-memory computation (RAM based)                       

  All of the mentioned                       

Question 20:- The ________ class provides the getValue() method to read the values from its instance.





Case Study

Employees are both a business’s greatest asset and its greatest expense. So hitting on the right formula for selecting them, and keeping them in place, is absolutely essential. One company offering unique solutions to help others tackle this challenge is Cornerstone. Cornerstone is a software tool which helps assess and understand employees and candidates by crunching half a billion data points on everything from gas prices and unemployment rates to social media use. Clients such as Xerox use it to predict, for example, how long an employee is likely to stay in his or her job, and remarkable insights gleaned include the fact that in some careers, such as call centre work, employees with criminal records perform better than those without. Its prowess has made Cornerstone into a huge success, with sales growing by 150% from 2012 to 2013 and the software being put to use by 20 of the Fortune 100 companies.

The “data points” are measurements taken from employees working across 18 industries in 13 different countries, providing information on everything from how long they take to travel to work, to how often they speak to their managers. Data collection methods include the controversial “smart badges” that monitor employee movements and track which employees interact with each other. Cornerstone has certainly caused positive change in companies using it: Bank of America reportedly improved performance metrics by 23% and decreased stress levels (measured by analysing workers’ voices) by 19%, simply by allowing more staff to take their breaks together. And Xerox reduced call centre turnover by 20% by applying analytics to prospective candidates, finding among other things that creative people were more likely than inquisitive people to remain with the company for the 6 months necessary to recoup the $6,000 cost of their training.

So far data gathering and analysis has focused mainly on customer-facing members of staff, who in larger organizations will tend to be those with less responsibility and decision-making power. Could even greater benefits be gained by applying the same principles to the movers and shakers in the boardroom, who hold the keys to wider-reaching business change? Certainly some companies are starting to think that way. The director of research and strategy at one firm that uses the software, David Lathrop of Steelcase, told the Financial Times this year that improving the performance of top executives has a “disproportionate effect on the company”. Although he did not disclose precise details of methods or results, much research is being carried out in the name of finding exactly what it is that makes high-fliers tick. This will inevitably find its way into analytical projects at big companies which spend millions hiring executives.

Crunching employee data at this level plainly has the potential to bring huge benefits, but it could also prove disastrous if a company gets it wrong. Failing to take proper consideration of individuals’ rights to privacy in some jurisdictions (e.g. Europe) can lead to severe legal penalties. In my opinion, any company thinking about carrying out data gathering and analysis for these purposes needs to take great care. In workplaces where morale is low or relationships between workers and managers are not good, it could very easily be seen as a case of taking snooping too far.

Interestingly, Cornerstone’s privacy policy makes it clear that information on applicants is provided to them by their clients, including names, work history and contact details. How many people know that simply by applying for a job with one of these clients, their personal data will be made available for analysis? It appears that Cornerstone absolves itself of responsibility here by declaring itself a “mere data processor”, putting the onus on the client businesses to gain permission to distribute their applicants’ and employees’ data. It is vitally important that staff are made aware of precisely what data is being gathered from them, and what it is being used for. Everyone (and certainly those running the operation) needs to be aware that the purpose is to increase overall company efficiency, rather than assess or monitor individual members of staff.

With more than half of human resources departments reporting an increase in data analytics since 2010, according to a report by the Economist Intelligence Unit, it’s obvious that like it or not, it’s here to stay. Companies that use it well, with respect for their employees’ privacy and an understanding of the vital principle mentioned above, are likely to prosper.

Question 21:- You have website data containing information about logged-in users. One user may have multiple fields, but the number of fields per user may vary based on the user's actions. In that case, which component of Hadoop would you use to store the data?

a)         Pig

b)         MSSQL

c)         HBase

d)         ORACLE 8I

Question 22:- Assume data from external sources is populated into HDFS in CSV format on a daily basis. How would you handle it efficiently so that it can be processed by other applications while also reducing data storage?

a)            HBase

b)            Using ORC or Parquet format in Hive, deleting old HDFS data, and creating the business partdate as a partition in Hive.

c)            Pig                       

d)            ORACLE 8I                       

Question 23:- Which tool is helpful to establish relationships and find the employment life cycle in the above situation?

a)         Neo4j

b)         Pig                       

c)         ORACLE 8I                       

d)         Hive                       

Question 24:- For the management and analytics of the data in the above situation, what is a block in HDFS and what is its default size in Hadoop 1 and Hadoop 2?

a)         32 MB and 64 MB                       

b)         16 MB and 32 MB                       

c)         128 MB and 256 MB                        

d)         64 MB and 128 MB                       

Question 25:- Which machine learning algorithm is most suitable for analytics and forecasting?

a)         Decision Tree

b)         Regression                       

c)         Classification                        

d)         Random Forest                       

