The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
High utility mining
Posted by: ASHISH
Date: September 02, 2017 05:57AM

Hello Sir,

I would like to know how to calculate the high utility patterns and rules from a database.

Say the database is:

T1: A(10) B(20) C(10) D(15)

T2: A(25) B(10) C(15) D(12)

T3: A(15) B(15) C(20) D(18)

T4: A(20) B(25) C(25) D(27)

From the above transactions I want to calculate the high utility itemsets and also the rules.

Thanks sir
Ashish

Re: High utility mining
Date: September 02, 2017 06:32AM

In high utility itemset mining, there are usually two types of numeric values: purchase quantities and unit profit values. For example, a customer can buy 3 apples, and each apple yields a 5 $ profit.

In your example database, you have a single table. So I assume that each item is associated with some amount of money. For example, your transaction:

T1: A(10) B(20) C(10) D(15)

means that item A was sold and generated a 10 $ profit, B was sold with a 20 $ profit, etc.

How do we calculate the utility? If you want to calculate the utility of AD, you sum the profit of A and D in all transactions where A and D appear together. Here, A and D appear together in transactions T1, T2, T3 and T4.

Thus the utility of AD is 10 + 15 (the profit of AD in T1) + 25 + 12 (the profit of AD in T2) + 15 + 18 (the profit of AD in T3) + 20 + 27 (the profit of AD in T4) = 142.
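
The sum above can be checked with a few lines of Python. This is only a minimal sketch: the dictionary layout and the function name `utility` are illustrative choices, not part of any library, and each number is taken to be the profit of the item in that transaction, as assumed in this reply.

```python
# Each transaction maps an item to the profit it generated in that transaction
# (assumption: the numbers in parentheses are already profits, not quantities).
database = {
    "T1": {"A": 10, "B": 20, "C": 10, "D": 15},
    "T2": {"A": 25, "B": 10, "C": 15, "D": 12},
    "T3": {"A": 15, "B": 15, "C": 20, "D": 18},
    "T4": {"A": 20, "B": 25, "C": 25, "D": 27},
}


def utility(itemset, database):
    """Sum the profit of `itemset` over the transactions that contain all its items."""
    total = 0
    for items in database.values():
        if all(item in items for item in itemset):
            total += sum(items[item] for item in itemset)
    return total


print(utility({"A", "D"}, database))  # 142
```

The same function works for single items or larger itemsets, for example `utility({"A"}, database)`.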

This is the main idea. I also wrote a blog post that explains this with another example and some pictures:

http://data-mining.philippe-fournier-viger.com/introduction-high-utility-itemset-mining/

You can read it.

For the other part of your question, it depends what kind of rules you want to find. If you want to find sequential rules (rules with time), you can check this paper:

Zida, S., Fournier-Viger, P., Wu, C.-W., Lin, J. C. W., Tseng, V. S. (2015). Efficient Mining of High Utility Sequential Rules. Proc. 11th Intern. Conference on Machine Learning and Data Mining (MLDM 2015). Springer, LNAI 9166, pp. 157-171.

Re: High utility mining
Posted by: Ashish Gupta
Date: September 02, 2017 11:26PM

Thanks sir for replying back.

I'm unable to understand:

1) What is the purpose of the rules, or what can we achieve by calculating them?

2) Sir, can we calculate the rules from the above database? I have read the paper but am still confused about how to calculate the rules from the database.

Thanks Sir
Ashish Gupta

Re: High utility mining
Posted by: Ashish Gupta
Date: September 03, 2017 04:12AM

Sir

I'm calculating the utility of each item in the above database.

1) The utility of A is 10 (T1) + 25 (T2) + 15 (T3) + 20 (T4) = 70

2) Similarly, the utility of B in transactions T1, T2, T3, T4 is 20 + 10 + 15 + 25 = 70

3) Similarly, the utility of C in transactions T1, T2, T3, T4 is 10 + 15 + 20 + 25 = 70

4) Similarly, the utility of D in transactions T1, T2, T3, T4 is 15 + 12 + 18 + 27 = 72

So the utilities for the above database are

A = 70, B = 70, C = 70, D = 72

Now, in the paper Efficient Mining of High-Utility Sequential Rules, Table 1 gives a sequence database:

SID  Sequence
s1   {(a, 1)(b, 2)}(c, 2)(f, 3)(g, 2)(e, 1)
s2   {(a, 1)(d, 3)}(c, 4),(b, 2), {(e, 1)(g, 2)}
s3   {(a, 1)(b, 2)(f, 3)(e, 1)
s4   {(a, 3)(b, 2)(c, 1)}{(f, 1)(g, 1)}

On the right side, the external utility values are given in Table 2:

Item    a  b  c  d  e  f  g
Profit  1  2  5  4  1  3  1

Now if we calculate the utility of a, it is 1 in s1, 1 in s2, 1 in s3 and 3 in s4, which gives a = 6, while in Table 2 the value for a is 1.

Similarly, d appears only in sequence s2 with the value 3, but Table 2 shows the value 4.

I am not able to understand how to calculate the utility of items a, b, c, d individually for the whole database.


My second question is how to calculate the sequential rules (with time) from the database shown in my first post.

Thanks sir

Re: High utility mining
Date: September 03, 2017 05:52AM

I think you are confusing some definitions in the paper. Table 2 does not give the utility of items.

In high utility pattern mining, there are three concepts that you should not confuse:
- the internal utility (or purchase quantity: the number of units of an item that were purchased in a transaction)
- the external utility (or unit profit: the amount of profit generated by the sale of one unit of an item)
- the utility, which is the profit generated by an itemset. We look for patterns with a high utility, that is, patterns that yield a lot of money.

In Table 1, consider this sequence:

s1 {(a, 1)(b, 2)}(c, 2)(f, 3)(g, 2)(e, 1)

It means that:
(a, 1) the customer has bought 1 unit of item "a" (for example 1 apple)
(b, 2) the customer has bought 2 units of item "b" (for example, 2 breads)
...

These numbers are purchase quantities (also called internal utility).

Now, the numbers in Table 2 are NOT utilities. They are external utility values, that is, how much money the store earns from the sale of 1 unit of each item. So if you look at that table:

Item a b c d e f g
Profit 1 2 5 4 1 3 1

It means that if you sell 1 unit of "a", you earn 1 $ of profit.
It means that if you sell 1 unit of "b", you earn 2 $ of profit.

...


Now, if you want to calculate the profit of item "b" in the first sequence:

s1 {(a, 1)(b, 2)}(c, 2)(f, 3)(g, 2)(e, 1)

you need to multiply the purchase quantity (2) by the external utility of "b" in Table 2 (also 2). Thus the utility of "b" in sequence "s1" is 2 x 2 = 4. In other words, if the customer "s1" buys two breads and each of them generates a 2 $ profit, the total profit (utility) of bread in "s1" is 2 x 2 $ = 4 $.
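
As a rough illustration of this multiplication, here is a small Python sketch. The data structures (a dictionary for Table 2, a list of dictionaries for s1) and the function name `item_utility` are just one possible encoding chosen for this example.

```python
# External utility (unit profit) of each item, copied from Table 2.
unit_profit = {"a": 1, "b": 2, "c": 5, "d": 4, "e": 1, "f": 3, "g": 1}

# Sequence s1 from Table 1: a list of itemsets, each mapping item -> purchase quantity.
s1 = [{"a": 1, "b": 2}, {"c": 2}, {"f": 3}, {"g": 2}, {"e": 1}]


def item_utility(item, sequence, unit_profit):
    """Utility of one item in a sequence: quantity x unit profit, summed over its occurrences."""
    return sum(itemset.get(item, 0) * unit_profit[item] for itemset in sequence)


print(item_utility("b", s1, unit_profit))  # 4
print(sum(item_utility(i, s1, unit_profit) for i in "abcefg"))  # 27, total profit of s1
```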

I think you should take the time to read the definitions carefully. There are examples for each definition in the paper.

Re: High utility mining
Posted by: ashish gupta
Date: September 03, 2017 06:54AM

Sir

I understand now. To be precise, we need both an internal utility table and an external utility table to calculate the overall profit value of an item.

Now sir, please do me one last favor.

1) What is the purpose of the rules, or what can we achieve by calculating them?

2) Sir, can we calculate the rules from the above database? I have read the paper but am still confused about how to calculate the rules from the database.

Thanks Sir
Ashish Gupta

Re: High utility mining
Date: September 03, 2017 07:19AM

Hi

> I understand now. To be precise, we need both an internal utility table and an external utility table to calculate the overall profit value of an item.

Yes. Exactly.

> 1) What is the purpose of the rules, or what can we achieve by calculating them?

In general, the purpose of discovering patterns in databases is to find patterns that help us understand the data or make decisions. In that paper, we find high utility sequential rules because we want patterns that represent customer behavior that generated a lot of profit for the company. Profit is obviously important for a business. For example, if you find a rule:

BookA --> BookB with a utility of 77000 $ and a confidence of 90 %

It means that this rule generates a lot of money (77000 $) and that 90 % of the time, when someone buys BookA, he will later buy BookB.

So if you are a business manager, you could use this knowledge to earn more money. For example, you could set up a promotion in your store such that customers who buy BookA and then BookB get a discount. This could help to increase the profit.

Another possible use is for prediction or recommendation. For example, if I buy BookA on Amazon, the above rule could be used to recommend BookB, because 90 % of the people who buy BookA later buy BookB.

So this is the main idea about why we want to find the high utility rules.

By the way, in this example, I talk about profit. But instead of profit it could be something else, like the time spent on a website, etc.

> 2) Sir, can we calculate the rules from the above database? I have read the paper but am still confused about how to calculate the rules from the database.

Yes. The user must set two parameters, the minutil and minconf thresholds. Then, using the database in the tables of the article, you can obtain some rules.

In the paper, for Table 1 and Table 2 with minutil = 40 and minconf = 0.65, you obtain the rules shown in Table 3.
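
To make the confidence of a rule such as BookA --> BookB more concrete, here is a minimal Python sketch. It assumes the usual reading of a sequential rule X --> Y: a sequence follows the rule if all items of X appear before all items of Y, and the confidence is the number of sequences that follow the rule divided by the number of sequences that contain X. The five purchase sequences are hypothetical toy data invented for this illustration, and the function names are not taken from any library.

```python
def contains_rule(sequence, antecedent, consequent):
    """True if every antecedent item occurs before every consequent item.

    `sequence` is a list of itemsets (sets). The rule holds if the sequence can
    be cut at some position so that the antecedent lies entirely in the first
    part and the consequent entirely in the second part.
    """
    for cut in range(1, len(sequence)):
        before = set().union(*sequence[:cut])
        after = set().union(*sequence[cut:])
        if antecedent <= before and consequent <= after:
            return True
    return False


def confidence(sequences, antecedent, consequent):
    """Fraction of the sequences containing the antecedent that also follow the rule."""
    with_antecedent = [s for s in sequences if antecedent <= set().union(*s)]
    if not with_antecedent:
        return 0.0
    matching = [s for s in with_antecedent if contains_rule(s, antecedent, consequent)]
    return len(matching) / len(with_antecedent)


# Hypothetical purchase sequences (toy data, one list of itemsets per customer).
sequences = [
    [{"BookA"}, {"BookB"}],
    [{"BookA"}, {"Pen"}, {"BookB"}],
    [{"BookB"}, {"BookA"}],          # BookB bought first: the rule does not hold
    [{"BookA", "Pen"}, {"BookB"}],
    [{"Pen"}],                       # no BookA: not counted in the denominator
]

conf = confidence(sequences, {"BookA"}, {"BookB"})
print(conf)          # 0.75
print(conf >= 0.65)  # True: the rule would pass a minconf threshold of 0.65
```

A rule is reported only if it also passes the minutil threshold; the utility part is computed as explained earlier in this thread (quantity multiplied by unit profit).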

Re: High utility mining
Posted by: ashish gupta
Date: September 03, 2017 09:41AM

Sir,

Thanks a lot, sir, for your kind help. A very well explained answer.

Thank you very much, sir.

Ashish

Re: High utility mining
Posted by: manish
Date: September 04, 2017 06:46AM

Hi,

High utility pattern mining is used to find the itemsets with the highest utility.

Can high utility pattern mining also be used to calculate the lift, leverage and bond from a database, to find the interesting items in the database?

Thanks

Re: High utility mining
Date: September 04, 2017 07:05AM

Hello,

There are a lot of patterns in a database. To select the patterns, we need to use some measures to decide whether the patterns are interesting or not.

In high utility itemset mining, the interestingness measure is the utility. The assumption is that if an itemset has a high utility (makes a lot of money), then it is interesting for the user.

Now, you could always combine several measures to select patterns. For example, in a paper that I wrote, I have designed an algorithm that considers both the "bond" and the "utility" measure to find patterns with a high utility and a high bond.

Fournier-Viger, P., Lin, C. W., Dinh, T., Le, H. B. (2016). Mining Correlated High-Utility Itemsets Using the Bond Measure. Proc. 11th International Conference on Hybrid Artificial Intelligence Systems (HAIS 2016), Springer LNAI, pp. 53-65.

Or you could combine the utility with the lift or any other measures.

It depends on what you want to do.
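
As a rough sketch of what combining two measures can look like, the Python code below computes both the utility and the bond of an itemset and keeps the itemset only if both are high enough. It assumes that the bond of an itemset is the number of transactions containing all of its items divided by the number of transactions containing at least one of its items. The database is the seven-transaction example with unit profits that is also quoted further down in this thread. The thresholds `MIN_UTILITY` and `MIN_BOND` are hypothetical values chosen for illustration, and this is a naive check, not the algorithm of the paper.

```python
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}

# Each transaction maps an item to its purchase quantity (T1 to T7).
database = [
    {"a": 1, "c": 1},
    {"e": 1},
    {"a": 1, "b": 5, "c": 1, "d": 3, "e": 1},
    {"b": 4, "c": 3, "d": 3, "e": 1},
    {"a": 1, "c": 1, "d": 1},
    {"a": 2, "c": 6, "e": 2},
    {"b": 2, "c": 2, "e": 1},
]


def utility(itemset, database, unit_profit):
    """Quantity x unit profit, summed over the transactions containing the whole itemset."""
    return sum(
        sum(t[i] * unit_profit[i] for i in itemset)
        for t in database
        if all(i in t for i in itemset)
    )


def bond(itemset, database):
    """Transactions containing all items divided by transactions containing any item."""
    conjunctive = sum(1 for t in database if all(i in t for i in itemset))
    disjunctive = sum(1 for t in database if any(i in t for i in itemset))
    return conjunctive / disjunctive if disjunctive else 0.0


MIN_UTILITY = 30  # hypothetical threshold
MIN_BOND = 0.6    # hypothetical threshold

for itemset in ({"a", "c"}, {"b", "d"}):
    u, b = utility(itemset, database, unit_profit), bond(itemset, database)
    keep = u >= MIN_UTILITY and b >= MIN_BOND
    print(sorted(itemset), u, round(b, 2), keep)
# ['a', 'c'] 34 0.67 True
# ['b', 'd'] 30 0.5 False
```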

Best,

Re: High utility mining
Posted by: manish
Date: September 04, 2017 07:27AM

Hi,

In data mining, what interesting things can be found in a database?

Besides high utility itemsets, rules, leverage and bond, what else can be found from the items in a database as of today?


Thanks

Re: High utility mining
Posted by: Phil
Date: September 04, 2017 10:25PM

I am not sure what you mean. Do you mean what other kinds of patterns we can find in a database?

Re: High utility mining
Posted by: gaurav
Date: September 04, 2017 09:23PM

Hello Good day,

I'm referring to your PHM paper. I'm unable to work out how the periodicity is measured in high utility mining.

External utility values:
Item         a  b  c  d  e
Unit profit  5  2  1  2  3

A transaction database:
TID  Transaction
T1   (a, 1),(c, 1)
T2   (e, 1)
T3   (a, 1),(b, 5),(c, 1),(d, 3),(e, 1)
T4   (b, 4),(c, 3),(d, 3),(e, 1)
T5   (a, 1),(c, 1),(d, 1)
T6   (a, 2),(c, 6),(e, 2)
T7   (b, 2),(c, 2),(e, 1)

Example 7. The periods of itemsets {a, c} and {e} are respectively ps({a, c}) = {1, 2, 2, 1, 1} and ps({e}) = {2, 1, 1, 2, 1, 0}. The average periodicities of these itemsets are respectively avgper({a, c}) = 1.4 and avgper({e}) = 1.16.

How is the periodicity calculated in Example 7?

Re: High utility mining
Posted by: Phil
Date: September 04, 2017 10:24PM

Hello,

I will explain for {a,c}.

The itemset {a,c} appears in transactions T1, T3, T5 and T6.

Based on this information, we need to calculate the periods of the itemset {a,c}, that is, the number of transactions between each pair of consecutive occurrences of {a,c} in the database. Note that the first and last periods are special cases.

The result is: 1, 2, 2, 1, 1

Now I will explain how this result is obtained. It is calculated based on definition 6 in the paper.

To calculate the periods of {a,c}, we take each pair of consecutive transactions where {a,c} appears in the database and subtract their IDs:

T3 - T1 = 3 - 1 = 2
T5 - T3 = 5 - 3 = 2
T6 - T5 = 6 - 5 = 1

Moreover, we add the following special case for the first occurrence of {a,c}:

T1 - T0 = 1 - 0 = 1 (we assume that a transaction T0 would exist)

And we add the following special case for the last occurrence of {a,c}:

T7 - T6 = 7 - 6 = 1 (we use the id of the last transaction T7)

If we put all this together:

T1 - T0 = 1 - 0 = 1
T3 - T1 = 3 - 1 = 2
T5 - T3 = 5 - 3 = 2
T6 - T5 = 6 - 5 = 1
T7 - T6 = 7 - 6 = 1

This is how the periods of {a,c} are calculated as 1, 2, 2, 1, 1.

More formally, this is definition 6 in the paper.

Now if you want to calculate the average periodicity:

Average ( 1, 2, 2, 1, 1 ) = (1+ 2 + 2 + 1+ 1 ) / 5 = 1.4
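
The same steps can be written as a short Python sketch. It follows the explanation above (a virtual transaction T0 at the start, and the ID of the last transaction at the end). The function names `periods` and `average_periodicity` are chosen for this example only. Quantities are left out because only the presence of the items matters for the periods.

```python
# Items of transactions T1 to T7, in order.
database = [
    {"a", "c"},
    {"e"},
    {"a", "b", "c", "d", "e"},
    {"b", "c", "d", "e"},
    {"a", "c", "d"},
    {"a", "c", "e"},
    {"b", "c", "e"},
]


def periods(itemset, database):
    """Differences between consecutive transaction IDs containing `itemset`,
    with 0 (virtual T0) added at the start and the last transaction ID at the end."""
    tids = [tid for tid, items in enumerate(database, start=1) if itemset <= items]
    boundaries = [0] + tids + [len(database)]
    return [later - earlier for earlier, later in zip(boundaries, boundaries[1:])]


def average_periodicity(itemset, database):
    """Arithmetic mean of the periods of `itemset`."""
    ps = periods(itemset, database)
    return sum(ps) / len(ps)


print(periods({"a", "c"}, database))              # [1, 2, 2, 1, 1]
print(average_periodicity({"a", "c"}, database))  # 1.4
print(periods({"e"}, database))                   # [2, 1, 1, 2, 1, 0]
```

For {e}, the last period is 0 because {e} appears in the last transaction T7, so T7 - T7 = 0. The average is 7 / 6, about 1.17, which the paper reports as 1.16.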

Re: High utility mining
Posted by: ashish
Date: September 06, 2017 06:08AM

Hello sir,

Following up on the question above: what other patterns can be found in a database, for example with sequential pattern mining, association rules, clustering, time series, etc.? We would like to know how many kinds of patterns can be found in a database overall.

Thanks sir

Ashish

Re: High utility mining
Date: September 06, 2017 06:28AM

There are many kinds of patterns that you can find in a database. It would take too long to list all of them. For example:
- frequent itemsets
- closed frequent itemsets
- maximal frequent itemsets
- rare itemsets
- perfectly rare itemsets
- periodic itemsets
- association rules
- negative association rules
- generator itemsets
- etc.

Making a list of all kinds of patterns that you can find in such database would be too long.

Re: High utility mining
Posted by: ashish gupta
Date: September 11, 2017 05:48AM

sir,

External utility values:
Item         a  b  c  d  e
Unit profit  5  2  1  2  3

A transaction database:
TID  Transaction
T1   (a, 1),(c, 1)
T2   (e, 1)
T3   (a, 1),(b, 5),(c, 1),(d, 3),(e, 1)
T4   (b, 4),(c, 3),(d, 3),(e, 1)
T5   (a, 1),(c, 1),(d, 1)
T6   (a, 2),(c, 6),(e, 2)
T7   (b, 2),(c, 2),(e, 1)

For a 1-itemset, say a: it appears in T1, T3, T5 and T6 with a total quantity of 5, so the total utility of a is 5 x 5 = 25.

Now for a 2-itemset, say ab: it appears in transaction T3 with a total quantity of 6, times the external utility of ab, which is 10, so it comes out to 60.

Is this the right answer?

Thanks sir
ashish

Re: High utility mining
Date: September 11, 2017 07:27AM

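For item a, yes, that answer is correct in high utility itemset mining: a is purchased 1 + 1 + 1 + 2 = 5 times in T1, T3, T5 and T6, and 5 x 5 $ = 25 $.

For ab, the result is not 60. You should not multiply the total quantity by the product of the unit profits. Instead, multiply each item's quantity by its own unit profit and add the results. In T3, this gives (1 x 5) + (5 x 2) = 15. Since T3 is the only transaction that contains both a and b, the utility of ab in the database is 15.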
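
Both values in the question can be recomputed directly from the definition: purchase quantity multiplied by unit profit, summed over the items of the itemset in every transaction that contains the whole itemset. The following Python code is a minimal sketch of that definition; the data layout and the function name `utility` are illustrative choices.

```python
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}

# Each transaction maps an item to its purchase quantity (T1 to T7).
database = [
    {"a": 1, "c": 1},
    {"e": 1},
    {"a": 1, "b": 5, "c": 1, "d": 3, "e": 1},
    {"b": 4, "c": 3, "d": 3, "e": 1},
    {"a": 1, "c": 1, "d": 1},
    {"a": 2, "c": 6, "e": 2},
    {"b": 2, "c": 2, "e": 1},
]


def utility(itemset, database, unit_profit):
    """Utility of `itemset` in the database."""
    total = 0
    for transaction in database:
        # Only transactions containing every item of the itemset count.
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] * unit_profit[item] for item in itemset)
    return total


print(utility({"a"}, database, unit_profit))       # 25
print(utility({"a", "b"}, database, unit_profit))  # 15
```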

Re: High utility mining
Posted by: Gabriela Basidas
Date: September 30, 2017 12:33PM

Hi all!!
I'm new here, and I would like to ask for your help. I want to know how much it takes to implement a data mining system like Oracle DM or IBM SPSS.
I hope you can help me. Thank you!

Re: High utility mining
Date: September 30, 2017 05:27PM

You mean how complicated?

It can be quite complicated since data mining software typically offers many algorithms and features. For example, I am the founder of the SPMF data mining software. It took a few years of work to develop that software. But if someone has more programmers, it could be done faster.

But in real life, you don't need to implement all algorithms. For example, if you want to solve one specific problem, you perhaps need just 1 algorithm. In that case, you can maybe implement it very quickly, in 1 day or 1 week for example. So, some data mining software is very complicated, but as users we do not need all the features.


