Processing data with Apache Spark
In this section, we will implement the examples from Chapter 3, Processing – MapReduce and Beyond, using the Scala API. We will consider both the batch and real-time processing scenarios. We will show you how Spark Streaming can be used to compute statistics on the live Twitter stream.
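To give a flavor of what such a streaming job looks like, the following is a minimal sketch of a hashtag counter over the live stream. The object name TwitterHashtagCount is illustrative, and the code assumes a twitter4j.properties file containing the Twitter OAuth credentials is present in the same directory where sbt or spark-shell is being invoked:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.twitter.TwitterUtils

// Illustrative example: counts hashtags seen on the live Twitter stream
object TwitterHashtagCount {
  def main(args: Array[String]): Unit = {
    // The master URI is taken from the command line, for example local[2]
    val conf = new SparkConf().setAppName("TwitterHashtagCount").setMaster(args(0))
    val ssc = new StreamingContext(conf, Seconds(10))

    // Passing None makes twitter4j read the OAuth credentials from
    // twitter4j.properties in the current working directory
    val tweets = TwitterUtils.createStream(ssc, None)

    // Extract hashtags and count them over a sliding 60-second window
    val hashTags = tweets.flatMap(_.getText.split(" ").filter(_.startsWith("#")))
    val counts = hashTags.map((_, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(60))
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}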
We will be using sbt to build and run the examples. The source code can be compiled with:
$ sbt compile
Or, it can be packaged into a JAR file with:
$ sbt package
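These commands are driven by an sbt build definition. As a minimal sketch, a build.sbt along the following lines pins the version of the Scala interpreter that Spark links to and pulls in the Spark and Hadoop libraries; the version numbers shown are assumptions and should be matched to your cluster:

// build.sbt -- a minimal sketch; version numbers are illustrative
name := "spark-examples"

// Spark 1.x artifacts are published for Scala 2.10, so the build must
// use the Scala version that Spark links against
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // Core Spark plus the streaming and Twitter connector modules
  "org.apache.spark" %% "spark-core" % "1.2.0",
  "org.apache.spark" %% "spark-streaming" % "1.2.0",
  "org.apache.spark" %% "spark-streaming-twitter" % "1.2.0",
  // The Hadoop client libraries should match the version on the cluster
  "org.apache.hadoop" % "hadoop-client" % "2.4.0"
)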
A compiled class can be run through the launcher generated by the sbt start-script task; the helper can be invoked as follows:
$ target/start <class name> <master> <param1> … <param n>
Here, <master> is the URI of the master node. An interactive Scala session can be invoked via sbt with the following command:
$ sbt console
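To execute code from such a session, we first need to instantiate a SparkContext object. A minimal sketch, assuming a local master, follows:

import org.apache.spark.{SparkConf, SparkContext}

// "local[*]" runs Spark in-process using all available cores; replace it
// with the URI of a real master node to run against a cluster
val conf = new SparkConf().setAppName("console-session").setMaster("local[*]")
val sc = new SparkContext(conf)

// Smoke test: distribute the numbers 1 to 100 and sum them
val total = sc.parallelize(1 to 100).reduce(_ + _)   // 5050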
To run the examples on a YARN grid, we first build a JAR file using:
$ sbt package
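The JAR can then be submitted to the grid with the spark-submit script shipped with Spark. The class name and JAR path below are illustrative and should be adjusted to your project:

$ spark-submit --class TwitterHashtagCount \
    --master yarn-cluster \
    target/scala-2.10/spark-examples_2.10-0.1-SNAPSHOT.jar <param1> … <param n>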