scala - Custom input reader in Spark


I'm new to Spark and would like to load the page records from a Wikipedia dump into an RDD.

I tried to use the record reader provided in Hadoop Streaming, but I could not figure out how to use it. Can anyone help me fix the following code so that it gives me an RDD of page records?

    import org.apache.hadoop.io.Text
    import org.apache.hadoop.streaming.StreamXmlRecordReader
    import org.apache.hadoop.mapred.JobConf
    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext

    object WikiTest {
      def main(args: Array[String]) {
        // configuration
        val sparkConf = new SparkConf()
          .setMaster("local[4]")
          .setAppName("WikiDumpTest")
        val jobConf = new JobConf()
        jobConf.set("input", "enwikisource-20140906-pages-article-multistream.xml")
        jobConf.set("stream.recordreader.class", "org.apache.hadoop.streaming.StreamXmlRecordReader")
        jobConf.set("stream.recordreader.begin", "<page>")
        jobConf.set("stream.recordreader.end", "</page>")
        val sparkContext = new SparkContext(sparkConf)

        // read data
        val wikiData = sparkContext.hadoopRDD(jobConf, classOf[StreamXmlRecordReader], classOf[Text], classOf[Text])

        // count lines
        println(wikiData.count)
      }
    }
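As a rough, Hadoop-free illustration of what the `stream.recordreader.begin` / `stream.recordreader.end` settings in the code above are meant to produce, the sketch below cuts a string into records that start at each `<page>` and end at the following `</page>`, tags included. `PageSplitSketch` and `extractRecords` are hypothetical names, not part of Hadoop or Spark; the real `StreamXmlRecordReader` works on file splits and byte streams, so details such as split boundaries and whitespace handling may differ.

```scala
object PageSplitSketch {
  /** Hypothetical helper: returns every substring that starts with `begin`
    * and ends at the next `end`, with both markers included.
    * An in-memory stand-in for what StreamXmlRecordReader does over a file split. */
  def extractRecords(text: String, begin: String, end: String): List[String] = {
    val out = List.newBuilder[String]
    var from = text.indexOf(begin)
    while (from >= 0) {
      val stop = text.indexOf(end, from + begin.length)
      if (stop < 0) {
        from = -1 // unterminated record: stop scanning
      } else {
        val until = stop + end.length
        out += text.substring(from, until) // one record, markers included
        from = text.indexOf(begin, until)  // look for the next record
      }
    }
    out.result()
  }

  def main(args: Array[String]): Unit = {
    // A tiny made-up document shaped like a MediaWiki dump.
    val dump =
      """<mediawiki>
        |  <page><title>A</title></page>
        |  <page><title>B</title></page>
        |</mediawiki>""".stripMargin
    extractRecords(dump, "<page>", "</page>").foreach(println)
  }
}
```

Running `main` prints the two `<page>` elements, one per line.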

The compiler refuses to accept StreamXmlRecordReader here. I get the following error:

    [error] found   : Class[org.apache.hadoop.streaming.StreamXmlRecordReader](classOf[org.apache.hadoop.streaming.StreamXmlRecordReader])
    [error] required: Class[_ <: org.apache.hadoop.mapreduce.InputFormat[,]]
    [error] classOf[StreamXmlRecordReader]

If I ignore the Eclipse warning and launch the program anyway, I get a java.lang.ClassNotFoundException.

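You should use classOf[org.apache.hadoop.streaming.StreamInputFormat] instead of classOf[StreamXmlRecordReader]. hadoopRDD expects an InputFormat class, not a RecordReader; StreamInputFormat is the input format that creates the record reader named in stream.recordreader.class.

The java.lang.ClassNotFoundException occurs because you are trying to run your class WikiTest, but it does not exist because it could not be compiled.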
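Putting the answer together, a corrected version of the question's program might look like the sketch below. Assumptions that are not in the original: the input path is registered with `FileInputFormat.setInputPaths` (as far as I know, Hadoop's `FileInputFormat` does not read the question's `"input"` key), each record is copied to a `String` because Hadoop record readers reuse their `Text` objects, and the `hadoop-streaming` jar is on the classpath next to Spark. It has not been run against the actual dump.

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.{FileInputFormat, JobConf}
import org.apache.hadoop.streaming.StreamInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object WikiTest {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
      .setMaster("local[4]")
      .setAppName("WikiDumpTest")
    val sparkContext = new SparkContext(sparkConf)

    val jobConf = new JobConf()
    // Assumption: register the input path via FileInputFormat, which is
    // where StreamInputFormat (a FileInputFormat subclass) looks for it.
    FileInputFormat.setInputPaths(jobConf, "enwikisource-20140906-pages-article-multistream.xml")
    // StreamInputFormat instantiates the record reader named here.
    jobConf.set("stream.recordreader.class", "org.apache.hadoop.streaming.StreamXmlRecordReader")
    jobConf.set("stream.recordreader.begin", "<page>")
    jobConf.set("stream.recordreader.end", "</page>")

    // Pass the InputFormat (not the RecordReader) to hadoopRDD.
    val wikiData = sparkContext.hadoopRDD(
      jobConf,
      classOf[StreamInputFormat],
      classOf[Text],
      classOf[Text])

    // Copy each record out of the reused Text object; with StreamXmlRecordReader
    // the whole <page>...</page> element is expected in the key.
    val pages = wikiData.map { case (key, _) => key.toString }

    println(pages.count())
    sparkContext.stop()
  }
}
```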
