The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
Database conversion
Posted by: qinqinzhou
Date: November 28, 2021 05:34AM

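Hello,
I would like to convert the text-type case database shown in image 1 into the binary-type transaction database shown in image 2. I could not find a suitable tool among the SPMF preprocessing tools. Do you have any suggestions? Thank you very much.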

1.
2.

Re: Database conversion
Posted by: qinqinzhou
Date: November 28, 2021 04:06PM

Image 2:

Re: Database conversion
Date: November 28, 2021 09:16PM

Good afternoon!

Thanks for using SPMF.

This data is suitable for SPMF. However, as you have noticed, the data must be transformed before SPMF can use it.

SPMF provides some tools to transform data. However, it is impossible to have a tool for every possible type of data. If your type of data is not supported, you can write a simple program or script to convert it yourself.

I see that your data is like a table. If your data is in an Excel file, you may first export it to a CSV file.

After that, you could change the format by hand or with your own script.


What format should you use? It depends on what you want to do.

If you want to apply a frequent itemset mining algorithm, you could encode the data like this:


@CONVERTED_FROM_TEXT
@ITEM=1=apple
@ITEM=2=orange
@ITEM=3=tomato
@ITEM=4=milk
@ITEM=5=bread
1 3 4
2 3 5
1 2 3 5
2 5
1 2 3 5

This is the format required by the Apriori algorithm (see the documentation: http://www.philippe-fournier-viger.com/spmf/Apriori.php ).

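In this format, the @ITEM lines give a name to each item number, and each following line is one transaction. The first transaction contains the items 1, 3 and 4, which are apple, tomato and milk.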

The second transaction contains the items 2, 3 and 5, which are orange, tomato and bread.
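
To make the reading of this format concrete, here is a minimal Python sketch that parses the example above and maps the numbers back to item names. The function name read_spmf is only an illustration and is not part of SPMF; the sketch assumes the file contains only @ITEM lines, other lines starting with @, and transactions of space-separated integers.

```python
def read_spmf(lines):
    """Parse SPMF-style lines into (id -> name mapping, list of transactions)."""
    names = {}
    transactions = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if line.startswith("@ITEM="):
            # "@ITEM=1=apple" -> id 1, name "apple"
            _, item_id, name = line.split("=", 2)
            names[int(item_id)] = name
        elif line.startswith("@"):
            continue  # other metadata such as @CONVERTED_FROM_TEXT
        else:
            transactions.append([int(tok) for tok in line.split()])
    return names, transactions


sample = """@CONVERTED_FROM_TEXT
@ITEM=1=apple
@ITEM=2=orange
@ITEM=3=tomato
@ITEM=4=milk
@ITEM=5=bread
1 3 4
2 3 5
1 2 3 5
2 5
1 2 3 5"""

names, transactions = read_spmf(sample.splitlines())
decoded = [[names[i] for i in t] for t in transactions]
print(decoded[0])  # ['apple', 'tomato', 'milk']
print(decoded[1])  # ['orange', 'tomato', 'bread']
```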

For your data, the numbers 1, 2, 3, 4, 5... would have a different meaning: each one would stand for one of the Chinese terms.
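
As a rough illustration of such a conversion script, the Python sketch below turns CSV rows of text items into the format shown above. It is only a sketch under assumptions: the function name rows_to_spmf and the sample CSV content are made up for this example, each CSV row is assumed to be one transaction, and every non-empty cell is assumed to be one item. Item numbers are assigned in order of first appearance, so they will not necessarily match the numbering in the example above.

```python
import csv
import io


def rows_to_spmf(rows):
    """Convert rows of item names into SPMF-style lines with @ITEM metadata."""
    item_ids = {}      # item name -> integer id, in order of first appearance
    transactions = []
    for row in rows:
        ids = set()    # a set removes duplicate items within one transaction
        for cell in row:
            name = cell.strip()
            if not name:
                continue  # ignore empty cells
            if name not in item_ids:
                item_ids[name] = len(item_ids) + 1
            ids.add(item_ids[name])
        if ids:
            # items inside a transaction are written in ascending order
            transactions.append(sorted(ids))
    lines = ["@CONVERTED_FROM_TEXT"]
    lines += [f"@ITEM={i}={name}" for name, i in item_ids.items()]
    lines += [" ".join(map(str, t)) for t in transactions]
    return lines


# Hypothetical CSV content; real data would be read from the exported file.
csv_text = "apple,tomato,milk\norange,tomato,bread\n"
spmf_lines = rows_to_spmf(csv.reader(io.StringIO(csv_text)))
print("\n".join(spmf_lines))
```

To use it on a real file, the io.StringIO object could be replaced by a file opened with open(path, newline="", encoding="utf-8"), which also handles Chinese terms as long as the CSV was saved as UTF-8.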


I hope this helps!

Best regards,

Philippe



Edited 1 time(s). Last edit at 11/28/2021 09:17PM by webmasterphilfv.



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).