How to install and work with Pig in Notebooks
Basic commands to work with Pig in Notebooks
Summary
I will install Hadoop together with Pig and use a Python library to write a job that answers the question: how many rows exist for each rating?
First I install Hadoop using the same commands I have used before, but without numbering the steps.
Install Hadoop
I use the following command, but you can change it to get the latest version:
!wget https://downloads.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
If you need another version, you can find it at https://downloads.apache.org/hadoop/common/ and replace the version in the previous command.
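Since only the version number changes between releases, the download URL can be built from it. This is a minimal sketch, assuming the directory layout stays the same as in the command above; `hadoop_download_url` is a hypothetical helper name, not part of any library.

```python
def hadoop_download_url(version: str) -> str:
    """Build the tarball URL for a Hadoop release, following the URL pattern used above."""
    base = "https://downloads.apache.org/hadoop/common"
    return f"{base}/hadoop-{version}/hadoop-{version}.tar.gz"


# The version used in this post; swap in another release listed in the directory.
url = hadoop_download_url("3.3.4")
print(url)
```

In a notebook cell, `!wget {url}` passes the Python variable to the shell command.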
Unzip and copy
I use the following command:
!tar -xzvf hadoop-3.3.4.tar.gz && cp -r hadoop-3.3.4/ /usr/local/
Set up Hadoop's Java
I use the following command:
# To find the default Java path and add export in hadoop-env.sh
JAVA_HOME = !readlink -f /usr/bin/java | sed "s:bin/java::"
java_home_text = JAVA_HOME[0]
java_home_text_command = f"$ {JAVA_HOME[0]} "
!echo export JAVA_HOME=$java_home_text >>/usr/local/hadoop-3.3.4/etc/hadoop/hadoop-env.sh
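The same Java path can be derived in plain Python, which makes the step easier to inspect. This is only a sketch: `java_home_from_binary` is a hypothetical helper that mirrors the sed expression for paths ending in bin/java, and the printed result depends on the machine it runs on.

```python
import os


def java_home_from_binary(java_binary: str) -> str:
    """Strip a trailing 'bin/java' from a resolved java path."""
    suffix = "bin/java"
    if java_binary.endswith(suffix):
        return java_binary[: -len(suffix)]
    return java_binary


# os.path.realpath follows symlinks, which is what `readlink -f` does in the shell.
resolved = os.path.realpath("/usr/bin/java")
print(java_home_from_binary(resolved))
```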
Set Hadoop home variables
I use the following command:
# Set environment variables
import os
os.environ['HADOOP_HOME']="/usr/local/hadoop-3.3.4"
os.environ['JAVA_HOME']=java_home_text
1st - Install Pig
I use the following command, but you can change it to get the latest version:
!wget https://downloads.apache.org/pig/pig-0.17.0/pig-0.17.0.tar.gz
If you need another version, you can find it at https://downloads.apache.org/pig/ and replace the version in the previous command.
2nd - Unzip and copy
I use the following command:
!tar -xzvf pig-0.17.0.tar.gz
3rd - Set Pig home variables
I use the following command:
# Set environment variables
import os
os.environ['PIG_HOME']="/content/pig-0.17.0"
# Point to the configuration folder of the Hadoop version installed above (3.3.4)
os.environ['PIG_CLASSPATH']="/usr/local/hadoop-3.3.4/etc/hadoop"
os.environ["PATH"] += os.pathsep + "/content/pig-0.17.0/bin"
We can validate the installation with the command:
!pig -version
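If the version check fails, a quick look at the environment variables usually helps. The following is a minimal sketch that reports which variables are unset or point to a path that does not exist; `missing_paths` is a hypothetical helper name, and the variable names are the ones set in the steps above.

```python
import os


def missing_paths(env_vars, environ=os.environ):
    """Return the names of variables that are unset or point to a path that does not exist."""
    missing = []
    for name in env_vars:
        value = environ.get(name)
        if not value or not os.path.exists(value):
            missing.append(name)
    return missing


# Variables set in the previous steps of this post.
print(missing_paths(["HADOOP_HOME", "JAVA_HOME", "PIG_HOME", "PIG_CLASSPATH"]))
```

An empty list means every variable points to an existing path.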
4th - Create a folder with HDFS
I use the following command:
!/usr/local/hadoop-3.3.4/bin/hadoop fs -mkdir file:///content/data_pig
4.1 - Remove folder with HDFS
You may need to remove it later. To do that, use the following command:
!/usr/local/hadoop-3.3.4/bin/hadoop fs -rm -r file:///content/data_pig
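Because the file:// prefix makes these commands work on the notebook's local disk, the same folder can also be managed from Python. This is a minimal sketch using only the standard library; `make_dir` and `remove_dir` are hypothetical helper names, and unlike the commands above they silently accept a folder that already exists or is already gone.

```python
import os
import shutil
import tempfile


def make_dir(path: str) -> None:
    """Create the directory (and its parents) if it does not exist yet."""
    os.makedirs(path, exist_ok=True)


def remove_dir(path: str) -> None:
    """Delete the directory and everything inside it, ignoring a missing directory."""
    shutil.rmtree(path, ignore_errors=True)


# Demonstrated on a temporary location so that running the sketch touches nothing real.
demo = os.path.join(tempfile.mkdtemp(), "data_pig")
make_dir(demo)
print(os.path.isdir(demo))
remove_dir(demo)
print(os.path.isdir(demo))
```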
5th - Getting a dataset to analyze with Pig
I use a dataset from GroupLens. You can find others at:
http://files.grouplens.org/datasets/
This time I use MovieLens, and you can download it using:
!wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
To use the data, extract the files. The path after -d in the command is the destination folder (unzip expects a plain path here, not a file:// URI):
!unzip "/content/ml-100k.zip" -d "/content/data_pig"
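The extraction can also be done with Python's zipfile module. This is a minimal sketch: `extract_zip` is a hypothetical helper name, and the demo builds a tiny made-up archive so the code runs anywhere. With the real download, the call would be `extract_zip("/content/ml-100k.zip", "/content/data_pig")`.

```python
import os
import tempfile
import zipfile


def extract_zip(zip_path: str, target_dir: str) -> list:
    """Extract every member of the archive into target_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Self-contained demo: build a tiny archive that mimics the ml-100k layout, then extract it.
work = tempfile.mkdtemp()
demo_zip = os.path.join(work, "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("ml-100k/u.data", "1\t10\t5\t100\n")

names = extract_zip(demo_zip, os.path.join(work, "data_pig"))
print(names)
```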
To list them:
!/usr/local/hadoop-3.3.4/bin/hadoop fs -ls /content/data_pig/ml-100k
6th - Creating a process with Pig syntax
To create a job in Pig, you must first look at the structure of the dataset so you can configure the job.
In this case we print the first lines of the dataset with the following command:
!head /content/data_pig/ml-100k/u.data
From the output I get the following information about the dataset:
- The first column is the userID.
- The second column is the movieID.
- The third column is the rating.
- The fourth column is the timestamp.
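The column layout above can be written down as a small Python parser, which is handy for checking a few lines by hand. This is a sketch assuming tab-separated fields; `Rating` and `parse_rating` are hypothetical names, and the sample line is made up rather than copied from the dataset.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    user_id: int
    movie_id: int
    rating: int
    timestamp: int


def parse_rating(line: str) -> Rating:
    """Split one tab-separated line of u.data into its four integer fields."""
    user_id, movie_id, rating, timestamp = line.rstrip("\n").split("\t")
    return Rating(int(user_id), int(movie_id), int(rating), int(timestamp))


# Made-up sample line in the same layout as u.data.
print(parse_rating("7\t42\t5\t891350000\n"))
```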
Create the Pig script (%%writefile must be the first line of its notebook cell):
%%writefile id.pig
/* id.pig */
-- u.data is tab-separated, so the delimiter is '\t'
student = LOAD 'file:///content/data_pig/ml-100k/u.data' USING PigStorage('\t') AS (userId:int, movieId:int, rating:int, timestamp:int);
student_order = ORDER student BY rating DESC;
DUMP student_order;
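To see what the ORDER ... BY rating DESC step does, here is the same idea in plain Python on a few made-up rows. It is only an illustration of the logic, not how Pig runs it; `order_by_rating_desc` is a hypothetical name, and Python keeps ties in their input order, which Pig does not promise.

```python
# Made-up rows in (userId, movieId, rating, timestamp) order.
rows = [
    (1, 10, 3, 100),
    (2, 11, 5, 101),
    (3, 12, 1, 102),
    (4, 13, 5, 103),
]


def order_by_rating_desc(rows):
    """Return the rows sorted by the rating field (index 2), highest first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


for row in order_by_rating_desc(rows):
    print(row)
```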
7th - Running the process
Here we run the process, specifying some parameters:
- The Pig program file is id.pig
- The dataset is in file:///content/data_pig/ml-100k/u.data
Running the process may take a few minutes.
You can run the script with:
!pig -x local id.pig
But here we run the script and save the results in a .txt file:
!pig -x local id.pig > results.txt
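Once the results are in a file, Python can read them back and answer the question from the summary: how many rows exist for each rating. This is a sketch assuming each result line in results.txt is one parenthesized, comma-separated tuple; `parse_dump_line` and `count_by_rating` are hypothetical names, the sample lines are made up, and any line that does not look like a four-field tuple is skipped.

```python
from collections import Counter


def parse_dump_line(line: str):
    """Turn a line such as '(7,42,5,891350000)' into a tuple of ints, or None otherwise."""
    line = line.strip()
    if not (line.startswith("(") and line.endswith(")")):
        return None
    try:
        return tuple(int(field) for field in line[1:-1].split(","))
    except ValueError:
        return None


def count_by_rating(lines):
    """Count how many parsed rows carry each rating (third field)."""
    counts = Counter()
    for line in lines:
        row = parse_dump_line(line)
        if row is not None and len(row) == 4:
            counts[row[2]] += 1
    return counts


# Made-up sample of what results.txt is assumed to contain.
sample = ["(2,11,5,101)", "(4,13,5,103)", "(1,10,3,100)", "some log line"]
print(count_by_rating(sample))
```

With the real file, `count_by_rating(open("results.txt"))` would do the same over every line.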
8th - Advancing the logic of the scripts
Now we will extend the logic of the script to answer the following questions:
- What are the oldest 5 star movies?
- What are the worst movies?
8.1 - Find the oldest 5 star movies
%%writefile fiveStarMovies.pig
ratings = LOAD 'file:///content/data_pig/ml-100k/u.data' AS (userID:int, movieID:int, rating:int, ratingTime:int);
metadata = LOAD 'file:///content/data_pig/ml-100k/u.item' USING PigStorage('|') AS (movieID:int, movieTitle:chararray, releaseDate:chararray, videoRealese:chararray, imdblink:chararray);
nameLookup = FOREACH metadata GENERATE movieID, movieTitle, ToUnixTime(ToDate(releaseDate, 'dd-MMM-yyyy')) AS releaseTime;
ratingsByMovie = GROUP ratings BY movieID;
avgRatings = FOREACH ratingsByMovie GENERATE group as movieID, AVG(ratings.rating) as avgRating;
fiveStarMovies = FILTER avgRatings BY avgRating > 4.0;
fiveStarsWithData = JOIN fiveStarMovies BY movieID, nameLookup BY movieID;
oldestFiveStarMovies = ORDER fiveStarsWithData BY nameLookup::releaseTime;
DUMP oldestFiveStarMovies;
Run the script and save the results in a .txt file:
!pig -x local fiveStarMovies.pig > fiveStarMovies.txt
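The script groups ratings by movie, averages them, keeps the movies with an average above 4.0, joins the titles and release dates, and sorts by release date. The same steps can be sketched in plain Python on made-up data to make the logic easier to follow. `oldest_high_rated` is a hypothetical name and this is not how Pig executes the job.

```python
from collections import defaultdict
from datetime import datetime

# Made-up stand-ins for u.data rows (userID, movieID, rating) and u.item rows.
ratings = [(1, 10, 5), (2, 10, 4), (1, 20, 5), (2, 20, 5), (1, 30, 2)]
metadata = {
    10: ("Movie A", "01-Jan-1995"),
    20: ("Movie B", "15-Mar-1960"),
    30: ("Movie C", "01-Jan-1990"),
}


def oldest_high_rated(ratings, metadata, threshold=4.0):
    """Average the ratings per movie, keep averages above threshold, sort by release date."""
    by_movie = defaultdict(list)
    for _user_id, movie_id, rating in ratings:  # GROUP ratings BY movieID
        by_movie[movie_id].append(rating)

    results = []
    for movie_id, values in by_movie.items():
        avg = sum(values) / len(values)  # AVG(ratings.rating)
        if avg > threshold and movie_id in metadata:  # FILTER, then inner JOIN
            title, release = metadata[movie_id]
            released = datetime.strptime(release, "%d-%b-%Y")
            results.append((movie_id, avg, title, released))

    return sorted(results, key=lambda item: item[3])  # ORDER BY releaseTime


for movie_id, avg, title, released in oldest_high_rated(ratings, metadata):
    print(movie_id, avg, title, released.date())
```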
8.2 - Find most rated bad movies
%%writefile BadPopularMovies.pig
ratings = LOAD 'file:///content/data_pig/ml-100k/u.data' AS (userID:int, movieID:int, rating:int, ratingTime:int);
metadata = LOAD 'file:///content/data_pig/ml-100k/u.item' USING PigStorage('|') AS (movieID:int, movieTitle:chararray, releaseDate:chararray, videoRealese:chararray, imdblink:chararray);
nameLookup = FOREACH metadata GENERATE movieID, movieTitle;
groupedRating = GROUP ratings by movieID;
avgRatings = FOREACH groupedRating GENERATE group as movieID, AVG(ratings.rating) as avgRating, COUNT(ratings.rating) AS numRatings;
badMovies = FILTER avgRatings BY avgRating < 2.0;
namedBadMovies = JOIN badMovies BY movieID, nameLookup BY movieID;
results = FOREACH namedBadMovies GENERATE nameLookup::movieTitle as movieName, badMovies::avgRating as avgRating, badMovies::numRatings as numRatings;
finalResults = ORDER results BY numRatings DESC;
DUMP finalResults;
Run the script and save the results in a .txt file:
!pig -x local BadPopularMovies.pig > BadPopularMovies.txt
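Here the script also counts the ratings per movie, keeps the movies with an average below 2.0, and puts the most rated ones first. A plain Python sketch of that logic on made-up data follows. `most_rated_bad_movies` is a hypothetical name used only for illustration.

```python
from collections import defaultdict

# Made-up stand-ins for u.data rows (userID, movieID, rating) and u.item titles.
ratings = [(1, 10, 1), (2, 10, 2), (3, 10, 1), (1, 20, 1), (1, 30, 4)]
titles = {10: "Movie A", 20: "Movie B", 30: "Movie C"}


def most_rated_bad_movies(ratings, titles, threshold=2.0):
    """Return (title, average, count) for movies averaging below threshold, most rated first."""
    by_movie = defaultdict(list)
    for _user_id, movie_id, rating in ratings:
        by_movie[movie_id].append(rating)

    results = []
    for movie_id, values in by_movie.items():
        avg = sum(values) / len(values)
        if avg < threshold and movie_id in titles:
            results.append((titles[movie_id], avg, len(values)))

    # Sort by the number of ratings, largest first.
    return sorted(results, key=lambda item: item[2], reverse=True)


for title, avg, count in most_rated_bad_movies(ratings, titles):
    print(title, round(avg, 2), count)
```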
9th - Say thanks, give a like and share if this has been helpful or interesting
Original Link: https://dev.to/xlmriosx/how-workinginstall-pig-with-notebooks-54km