April 2, 2022 12:28 pm GMT

How I built my own calculator that groups any given data set into classes using R


I can vividly remember my time in college. One of our lecturers liked to keep us busy with too many assignments. One time we were given 20 different data sets, all of them large, and the task was to group each of them and calculate the necessary statistical values for each.

I must say I was a bit lazy then. I just don't like repeating something and stressing myself when I already understand the basics. So I tried to think of a faster way, but at first I couldn't see one. Luckily, it occurred to me that I should be able to build a calculator that could do this for me. That was how I got the idea. lol.

How to group a large data set into classes and calculate the important statistical values

First, if you don't yet know how to group a data set, kindly read up on that first (the original post links to an introduction) so that my code doesn't look like magic. But before I show you the code, let me explain the steps I took to arrive at this simple project.

Steps taken to create a calculator that groups a large data set into classes and calculates some statistical values, showing the important workings

Step 1: I need code to group the data into classes with any given class width, to form a grouped frequency table.

Step 2: I need to calculate the class boundaries; this is done by subtracting 0.5 from the lower class limit and adding 0.5 to the upper class limit of each class.

Step 3: I need code to calculate the class mark; this is the midpoint of each class.

Step 4: I need code to multiply each class mark by its frequency (Fx).

Step 5: I need code to calculate the mean as the sum of Fx divided by the sum of the frequencies.

Step 6: I need code to get the deviation of each class mark from the mean, and its square.

Step 7: I have to calculate the variance and standard deviation using the formula.

As you can see, the problems I want my calculator to solve for me are listed above. I will take the steps one by one.
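The steps above can be written compactly as formulas. This is only a sketch, in notation introduced here for illustration: L_i and U_i are the lower and upper limits of class i, f_i is its frequency and x_i its class mark. The 0.5 adjustment assumes whole-number data, as in the example later on. The variance shown divides by the total frequency (the population form); the steps do not say which form is meant, so that choice is an assumption.

```latex
\begin{aligned}
\text{class boundary}_i &= \left[\, L_i - 0.5,\; U_i + 0.5 \,\right] \\
x_i &= \frac{L_i + U_i}{2} \\
\bar{x} &= \frac{\sum_i f_i x_i}{\sum_i f_i} \\
\sigma^2 &= \frac{\sum_i f_i \,(x_i - \bar{x})^2}{\sum_i f_i},
\qquad \sigma = \sqrt{\sigma^2}
\end{aligned}
```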

How to write code that groups any data set into classes using a given width

I will write the code as a function because I want it to be reusable: everything goes inside the function so that I can call it whenever I need it. If you don't know how to group data, read my article on that first (linked from the original post), as there is too much to explain here and I won't go into detail. Take a look at the following code.

Codes>>

createGroupTable=function(data,classwidth){
  # largest multiple of classwidth that is not above the minimum: the first lower class limit
  minimumValue=(min(data)%/%classwidth)*classwidth
  # first multiple of classwidth above the maximum value
  MaximumValue=((max(data)%/%classwidth)+1)*classwidth
  d=MaximumValue+classwidth # d-1 is the last upper class limit
  lowerclass=seq(minimumValue,MaximumValue,classwidth) # sequence of all lower class limits
  upperclass=lowerclass+classwidth-1 # sequence of all upper class limits
  classInterval=paste(lowerclass,'-',upperclass) # labels for each class
  # tabulate the data: count the values that fall in each class
  alldata=table(cut(data,seq(minimumValue-1,d-1,classwidth), labels=classInterval))
  mytable=data.frame(alldata) # turn the table into a two-column data frame
  mytable
}

The code above only defines a function, so running it gives no output until the function is called. So let us test it with the following data.
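Before the full example, it may help to see what the two key ideas in the function do on their own: `%/%` (integer division) rounds a value down to a multiple of the class width, and `cut()` sorts values into right-closed intervals. The snippet below is a minimal sketch using a handful of the example scores; the names `firstLower`, `pastMax`, `breaks` and `counts` are made up for this illustration and do not appear in the function.

```r
classwidth = 10
score = c(8, 13, 19, 20, 56)   # a few of the example scores used below

# %/% is integer division, so this rounds the minimum down to a multiple of classwidth.
firstLower = (min(score) %/% classwidth) * classwidth      # 8 %/% 10 is 0, so this is 0
# Rounding the maximum down and adding one class width gives the next multiple above it.
pastMax = ((max(score) %/% classwidth) + 1) * classwidth   # 56 %/% 10 is 5, so this is 60

# cut() uses right-closed intervals by default: (-1,9], (9,19], (19,29], ...
# Starting the breaks at firstLower - 1 therefore puts the values 0..9 in the first class.
breaks = seq(firstLower - 1, pastMax - 1, classwidth)
counts = table(cut(score, breaks))
print(counts)
```

Here 8 lands in the first interval, 13 and 19 in the second, 20 in the third and 56 in the last, which is why the function subtracts 1 from its break points.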

Example.

Construct a grouped frequency table for the following scores of 30 students.
24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30, 18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13

14, 16, 16, 13, 16, 13, 18, 19, 7, 9, 8, 6, 20, 26, 28, 29, 30, 18, 19, 15, 17, 12, 14, 15, 14, 16, 13, 12, 14, 13.

Codes>>

createGroupTable=function(data,classwidth){
  minimumValue=(min(data)%/%classwidth)*classwidth # first lower class limit
  MaximumValue=((max(data)%/%classwidth)+1)*classwidth # first multiple of classwidth above the maximum
  d=MaximumValue+classwidth # d-1 is the last upper class limit
  lowerclass=seq(minimumValue,MaximumValue,classwidth) # sequence of all lower class limits
  upperclass=lowerclass+classwidth-1 # sequence of all upper class limits
  classInterval=paste(lowerclass,'-',upperclass) # labels for each class
  alldata=table(cut(data,seq(minimumValue-1,d-1,classwidth), labels=classInterval)) # tabulate the data
  mytable=data.frame(alldata) # turn the table into a two-column data frame
  mytable
} # the function's code ends here

score=c(24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30, 18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13)
# now call the function with a class width of 10
createGroupTable(score,10)

Result>>

     Var1 Freq
1   0 - 9    1
2 10 - 19   13
3 20 - 29    8
4 30 - 39    3
5 40 - 49    4
6 50 - 59    1
7 60 - 69    0


If you try this with a different data set and your preferred class width, you will notice an extra class at the end that we never need. That is a flaw in the code: MaximumValue is always the next multiple of the class width above the largest value, so the class that starts there is always empty. So I need to adjust the output by removing the last row.

How to remove the last row of a table using R

Removing the last row is no problem: all I need to do is find the index of the last row, then form a new table without it, like the following:

Codes>>

createGroupTable=function(data,classwidth){
  minimumValue=(min(data)%/%classwidth)*classwidth # first lower class limit
  MaximumValue=((max(data)%/%classwidth)+1)*classwidth # first multiple of classwidth above the maximum
  d=MaximumValue+classwidth # d-1 is the last upper class limit
  lowerclass=seq(minimumValue,MaximumValue,classwidth) # sequence of all lower class limits
  upperclass=lowerclass+classwidth-1 # sequence of all upper class limits
  classInterval=paste(lowerclass,'-',upperclass) # labels for each class
  alldata=table(cut(data,seq(minimumValue-1,d-1,classwidth), labels=classInterval)) # tabulate the data
  mytable=data.frame(alldata) # turn the table into a two-column data frame
  lastIndex=length(mytable$Freq) # index of the last row
  newTable=mytable[-lastIndex,] # keep every row except the last
  newTable
} # the function's code ends here

score=c(24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30, 18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13)
# now call the function
createGroupTable(score,10)

Result>>

     Var1 Freq
1   0 - 9    1
2 10 - 19   13
3 20 - 29    8
4 30 - 39    3
5 40 - 49    4
6 50 - 59    1

As you can see, we have eliminated the last row. You can call the function as many times as you want for any large data set: just pass in the variable holding the data and your class width.
Now I need to deal with the other steps. These are not going to take long.
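Before moving on, note that the last-row removal used above can be written a little more directly. The sketch below is not the function from this article: it uses a small made-up table (`mytable` here is only a stand-in with an empty last class) to compare two base R ways of dropping the final row.

```r
# A small stand-in table with an empty last class, like the one produced above.
mytable = data.frame(
  Var1 = c("40 - 49", "50 - 59", "60 - 69"),
  Freq = c(4, 1, 0)
)

# Drop the last row by negative index; nrow() counts the rows directly.
byIndex = mytable[-nrow(mytable), ]

# head() with a negative n keeps everything except the last n rows.
byHead = head(mytable, -1)

print(byIndex)
print(byHead)
```

Both versions keep the first two rows. Which one to use is a matter of taste; `head(mytable, -1)` simply avoids computing the index by hand.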

How to create a grouped frequency table with class boundaries using R

Now that we can construct the class intervals with their frequencies, we need to subtract 0.5 from every lower class limit and add 0.5 to every upper class limit. Study the following code.

createGroupTable=function(data,classwidth){
  minimumValue=(min(data)%/%classwidth)*classwidth
  MaximumValue=((max(data)%/%classwidth)+1)*classwidth
  d=MaximumValue+classwidth
  lowerclass=seq(minimumValue,MaximumValue,classwidth)
  upperclass=lowerclass+classwidth-1
  classInterval=paste(lowerclass,'-',upperclass)
  lowerclassBound=lowerclass-0.5 # lower class boundaries
  upperclassBound=upperclass+0.5 # upper class boundaries
  classBoundary=paste(lowerclassBound,'-', upperclassBound)
  alldata=table(cut(data,seq(minimumValue-1,d-1,classwidth), labels=classInterval))
  mytable=data.frame(alldata)
  mytable$classBound=classBoundary
  # drop every class with zero frequency; this also removes the empty last class
  pureTable=mytable[!(mytable$Freq==0),]
  pureTable
} # the function's code ends here

score=c(24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30, 18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13)
# now call the function
createGroupTable(score,10)

Result>>

     Var1 Freq  classBound
1   0 - 9    1  -0.5 - 9.5
2 10 - 19   13  9.5 - 19.5
3 20 - 29    8 19.5 - 29.5
4 30 - 39    3 29.5 - 39.5
5 40 - 49    4 39.5 - 49.5
6 50 - 59    1 49.5 - 59.5

Now we should add the code to calculate the class mark (x) and Fx.
See the code below.

createGroupTable=function(data,classwidth){
  minimumValue=(min(data)%/%classwidth)*classwidth
  MaximumValue=((max(data)%/%classwidth)+1)*classwidth
  d=MaximumValue+classwidth
  lowerclass=seq(minimumValue,MaximumValue,classwidth)
  upperclass=lowerclass+classwidth-1
  classInterval=paste(lowerclass,'-',upperclass)
  lowerclassBound=lowerclass-0.5
  upperclassBound=upperclass+0.5
  classBoundary=paste(lowerclassBound,'-', upperclassBound)
  classMark=(lowerclass+upperclass)/2 # midpoint of each class
  alldata=table(cut(data,seq(minimumValue-1,d-1,classwidth), labels=classInterval))
  mytable=data.frame(alldata)
  Freq=mytable$Freq
  Fx=Freq*classMark # frequency times class mark
  mytable$classBound=classBoundary
  mytable$classMark=classMark # a column name cannot be written as classMark(x)
  mytable$Fx=Fx
  pureTable=mytable[!(mytable$Freq==0),] # drop classes with zero frequency
  pureTable
} # the function's code ends here

score=c(24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30, 18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13)
# now call the function
createGroupTable(score,10)

Result>>

     Var1 Freq  classBound classMark    Fx
1   0 - 9    1  -0.5 - 9.5       4.5   4.5
2 10 - 19   13  9.5 - 19.5      14.5 188.5
3 20 - 29    8 19.5 - 29.5      24.5 196.0
4 30 - 39    3 29.5 - 39.5      34.5 103.5
5 40 - 49    4 39.5 - 49.5      44.5 178.0
6 50 - 59    1 49.5 - 59.5      54.5  54.5

As you can see above, we have just added two more columns. Now we can go ahead and calculate the mean as the sum of the Fx column divided by the sum of the frequency column.
To do that we would add a few lines of code.
I shall show you that in my next article, together with the standard deviation.
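As a rough preview of those remaining steps, here is a minimal sketch. It is not necessarily the code the next article uses: it rebuilds the Freq and classMark columns by hand from the result table above, the names `grouped`, `groupedMean`, `groupedVariance` and `groupedSD` are made up for this illustration, and the variance divides by the total frequency (the population form), which is an assumption because the steps do not say which formula is intended.

```r
# Frequency and class mark per class, copied from the result table above.
grouped = data.frame(
  Freq      = c(1, 13, 8, 3, 4, 1),
  classMark = c(4.5, 14.5, 24.5, 34.5, 44.5, 54.5)
)
grouped$Fx = grouped$Freq * grouped$classMark

# Step 5: mean = sum of Fx divided by the sum of the frequencies.
groupedMean = sum(grouped$Fx) / sum(grouped$Freq)

# Step 6: deviation of each class mark from the mean, and its square.
grouped$deviation   = grouped$classMark - groupedMean
grouped$deviationSq = grouped$deviation^2

# Step 7 (ASSUMPTION: population formula, dividing by the total frequency).
groupedVariance = sum(grouped$Freq * grouped$deviationSq) / sum(grouped$Freq)
groupedSD = sqrt(groupedVariance)

print(groupedMean)
print(groupedSD)
```

For this table the Fx column sums to 725 and the frequencies sum to 30, so the mean comes out at about 24.17.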
Consider following me so that you don't miss any of my articles.
Happy coding!


Original Link: https://dev.to/maxwizard01/how-i-build-my-own-calculator-that-group-any-data-set-given-in-to-classes-using-r-2529
