March 25, 2022 08:35 pm GMT

Create a Hadoop playground with Docker Desktop on Windows in minutes

This semester, I've chosen to take a course about parallel computing. One of the projects involves writing a MapReduce program on Hadoop in Java. Connecting to the school's computing resources might be difficult at times, especially when a due date is approaching. As a result, I looked for an easy way to set up a local Hadoop environment with Docker on my Windows laptop, so that I could conduct some experiments quickly.

Preparations

Docker

Docker saves us from complicated installation procedures for software such as Hadoop, and it lets us remove that software cleanly when we need to free up disk space.

The first step is to download and install Docker Desktop for Windows (or for Mac, if you are on macOS). Docker Desktop now supports WSL 2 (Windows Subsystem for Linux 2) as its backend instead of Hyper-V. If you do not have WSL on your Windows machine, you can follow the official guide to enable it.

Once Docker Desktop is installed and running, verify that both Docker and Docker Compose work by checking their versions in a terminal (PowerShell or a WSL shell).

$ docker --version
Docker version 20.10.13, build a224086
$ docker-compose --version
Docker Compose version v2.3.3

You can also check that Docker works by launching a sample container.

$ docker run -d -p 80:80 --name myserver nginx

VSCode

Most developers already have VS Code installed, so you only need to make sure the Remote Development extension is installed as well. It lets you develop inside a container, on a remote machine, or in WSL.

Let's go Hadoop

Setup

As you can see from the steps below, setting up a Hadoop environment with Docker is rather simple.

Clone the repo big-data-europe/docker-hadoop into a directory of your choice, then set up the Hadoop cluster with docker-compose.

git clone git@github.com:big-data-europe/docker-hadoop.git
cd docker-hadoop
docker-compose up -d

Now, you are all set :). After a few moments, you can check if it is working properly by visiting http://localhost:9870/.

Get on the train

It's finally time to meet your new Hadoop cluster. Start VS Code, then open the Remote Development view in the left panel. Select "Containers" from the dropdown at the top, then locate the container named "namenode" and connect to it by clicking the "Attach to the container" icon. You've arrived in the Hadoop world!

VS Code Remote Development

Hello "WordCount"

This is how we will test the Hadoop cluster. We will run the Word Count example (from source code in Java) to see how it works.

But don't rush just yet. There is one more thing to do first: we need some sample input data.

Open a terminal in VS Code, then run:

mkdir input
echo "Hello World" > input/f1.txt
echo "Hello Docker" > input/f2.txt

The input files we just created live on the local filesystem (more precisely, the local filesystem of the Docker container). We also need to copy them to HDFS.

hadoop fs -mkdir -p input
hdfs dfs -put ./input/* input

After the preparation, you can get the official WordCount example from this link.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Emits (word, 1) for every whitespace-separated token in a line.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Sums all counts received for a word and emits (word, total).
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // args[0] is the HDFS input directory, args[1] the HDFS output directory.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Save it with the filename WordCount.java.

Now, let's find out if it works.

export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
hadoop com.sun.tools.javac.Main WordCount.java
jar cf wordcount.jar WordCount*.class
hadoop jar wordcount.jar WordCount input output

The job prints a lot of logs, but what we care about is the output. Print it with the cat command.

$ hdfs dfs -cat output/part-r-0000*
Docker 1
Hello 2
World 1

Hooray! We did it!
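To see why the job prints those three counts, here is a minimal sketch in plain Java that mimics the map, shuffle, and reduce steps on the two sample lines. It uses no Hadoop classes at all. The class name LocalWordCount and its methods are hypothetical names chosen for illustration, and the sketch deliberately ignores what a real cluster adds on top (input splits, the combiner, partitioning across reducers).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java sketch of the WordCount data flow; not a Hadoop API.
// LocalWordCount, map, reduce, and count are illustrative names only.
final class LocalWordCount {

    // Map step: emit a (word, 1) pair for every whitespace-separated token,
    // mirroring what TokenizerMapper does for one line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle and reduce steps: group the pairs by word and sum the values,
    // mirroring IntSumReducer. The TreeMap keeps the words sorted.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs the map step on every line, then reduces all emitted pairs.
    static Map<String, Integer> count(List<String> lines) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String line : lines) {
            allPairs.addAll(map(line));
        }
        return reduce(allPairs);
    }

    public static void main(String[] args) {
        // Same contents as input/f1.txt and input/f2.txt.
        List<String> lines = Arrays.asList("Hello World", "Hello Docker");
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Saved as LocalWordCount.java and run with a JDK (javac LocalWordCount.java, then java LocalWordCount), it should print the same three word counts as the Hadoop job: the mapper emits "Hello" twice, once per file, and the reduce step adds those two ones together.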

Clean-up

This is simple. Run the following command from the docker-hadoop directory to stop and remove the cluster's containers.

docker-compose down

Acknowledgement

This article draws heavily on an article by Jos Lise. I've added some content on how to develop with VSCode.


Original Link: https://dev.to/txfs19260817/create-a-hadoop-playground-with-docker-desktop-on-windows-in-minutes-10im
