The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
data mining with dataset
Date: September 08, 2017 09:43PM

I have a data set with 1500 tuples and 8 attributes. I want to find the best tuples out of the 1500 according to the 8 attributes. Can anyone tell me what the exact procedure would be?
mtech.sureshkumar@gmail.com

Re: data mining with dataset
Date: September 09, 2017 12:46AM

What do you mean by "best"?

Re: data mining with dataset
Date: September 09, 2017 02:59AM

I have website data containing some parameters in terms of numerical values: some agencies evaluated the performance of each website and assigned it a value. I will give sample data. Please help me in this regard.

website  index  alexa  DA   PA  PR  Domainage   Google  FB  Twitter
xyz      2256   2600   156  0   86  iyear7days  256     0   145

The above data is a sample for one website. All the other tuples are similar. I want to combine all the parameters to pick the best website. Should I use clustering, classification, or some other method?

regards

Re: data mining with dataset
Date: September 09, 2017 06:09AM

Still, "the best website" is something very subjective. The best website for me may not be the best website for you. Typically, when you use a search engine like Google, the "best website" is the one that answers your search query, which represents what you are looking for at this exact moment. Answering your query can be done by analyzing the keywords in your search query and comparing them to the webpages, to then show the most relevant websites (this is called "information retrieval").

My point is that "the best website" depends on the user and what they are looking for, rather than just on some characteristics of the website.

If you don't take this into account, then how do you define best?



Edited 3 time(s). Last edit at 09/09/2017 06:11AM by webmasterphilfv.

Re: data mining with dataset
Date: September 09, 2017 06:22AM

Sir,
According to the data, several ranking agencies have ranked the websites based on their performance (website content, likes, etc.) and have assigned ranks to each website. The agencies are Alexa, Google, Facebook, Twitter, MozDA, Page Authority (PA), Domain Authority (DA), etc. All the values are numerical. As you said, a website may be good according to one agency and not good according to another. We have to determine the best website by taking all the parameters into account.

I think you get my point.

Re: data mining with dataset
Date: September 09, 2017 07:09AM

So you have 8 numerical scores given by 8 agencies, and each of these scores is computed based on different criteria. Some of these criteria may be more important than others for some users of your system. Let's say that you design a system that computes the "best" website based on these numerical scores. How will you then validate that your system has truly shown you the best websites?

What I am trying to say is that you need to make your goal clear about what "best" means. Is it "best" in terms of user satisfaction when users use your system? If you cannot define your goal clearly (what a best website is), then it is hard to suggest which method to use to find the best websites using your numeric data. Moreover, if you don't have a way of checking whether what your system produces is really the best or not, then it will be hard to make a system that finds the "best" websites.

If you don't have a clear goal, you could just compute the sum of all these numerical values in each tuple and take the website with the highest sum as the best website. But that would not make much sense. And since you have not defined how to evaluate whether a website is the best, how can you define a method?
You always need to start with a clear goal, and a way of evaluating the solution, to then find a solution to a problem. Here you have a goal (find the best website), but how you want to evaluate the result does not seem to be clear.
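
To make the "sum of all values" idea above concrete, here is a minimal Python sketch. The website names and scores are made up for illustration, and the function name best_by_sum is just a placeholder. As noted above, this naive approach is questionable, because the attributes have different ranges and for some of them a smaller value is better.

```python
# Hypothetical sample rows: (website name, list of numeric scores).
# The names and values are invented for illustration only.
rows = [
    ("site_a", [10, 5, 3]),
    ("site_b", [2, 9, 8]),
    ("site_c", [4, 4, 4]),
]


def best_by_sum(rows):
    """Return the name of the row whose scores have the largest sum."""
    return max(rows, key=lambda row: sum(row[1]))[0]


print(best_by_sum(rows))
```

With these made-up values the sums are 18, 19 and 12, so site_b is selected. Whether that is really the "best" website is exactly the question that remains open.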



Edited 1 time(s). Last edit at 09/09/2017 07:12AM by webmasterphilfv.

Re: data mining with dataset
Date: September 09, 2017 09:27AM

I will explain clearly.

Alexa - starts from 0 and goes to a large value (smaller is better)
DA - ranked from 1 to 100 (1 is the best)
PA - ranked from 1 to 100 (scaled; 1 is the best)
PR - ranked from 1 to 10 (scaled; 1 is the best)
Domain age - given in days (older is better)
Google - starts from 0 and goes to a large value (more hits give a larger value, which is better)
Now please suggest what to do with this type of data.

Re: data mining with dataset
Date: September 09, 2017 07:03PM

A first thing that you could do is to normalize these values, since the minimum and maximum are different for each attribute (some range from 0 to 10 and others from 0 to 100). So rescaling all these attributes to values between 0 and 100, or between 0 and 1, would be a good first step.

After that, it depends on your goal and how you will evaluate the result.
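
The normalization step suggested above could look like the following Python sketch. It uses min-max scaling to the range 0 to 1, which is one common choice and not the only one. The function name, the higher_is_better flag and the sample values are assumptions made for illustration. The flag flips the scale for attributes where a smaller raw value is better (as described for Alexa), so that after normalization 1 always means "best" and 0 means "worst" within the data set.

```python
def min_max_normalize(values, higher_is_better=True):
    """Scale values to [0, 1]; flip the scale when smaller raw values are better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: the attribute cannot distinguish
        # between websites, so every website gets 0.0.
        return [0.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    if higher_is_better:
        return scaled
    return [1.0 - s for s in scaled]


# Hypothetical values for three websites (not taken from the real data set).
alexa = [2600, 100, 5100]            # smaller is better
domain_age_days = [372, 1000, 686]   # older (larger) is better

print(min_max_normalize(alexa, higher_is_better=False))
print(min_max_normalize(domain_age_days))
```

Once every attribute is on the same scale and points in the same direction, the columns can at least be compared or combined. How to combine them still depends on the goal and on how the result will be evaluated.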



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).