Journal: Jurnal Keuangan dan Perbankan (2008)
Web document classification based on key phrases in the informative section
Journal from JIPTUNMERPP / 2011-12-22 01:44:07
By: Amak Yunus E.P; Arif Djunaidy, Diploma 3 of Finance and Banking, Merdeka University Malang (jurkubank@yahoo.com)
Created: 2008-08-01, with file
Keywords: Web documents, Informative section
As the web continues to grow, documents must be classified well in order to keep it usable. According to Pierre (2000), about one billion web pages are available, with about 1.5 million pages added each day. With this growth, web pages vary widely in content, information, and quality. If the data are poorly organized, users will find it difficult to locate the information they want, so an efficient means of classification is needed to improve the quality of the information retrieved. A web page also contains parts that users are not actually looking for, such as advertisements, logos, and copyright notices. If classification is applied directly to the whole page, without first isolating the important part, the classification of the web document becomes inaccurate. This research therefore addresses how to classify web documents based on key phrases taken from the informative part of the document. The research proceeds in three stages. The first stage extracts the informative part of a web document using the Feature Extractor method. The second stage extracts key phrases using the tf-idf method. The last stage classifies the web page using the Bayesian method, which is known as one of the reasonably good methods for text classification. In the experiments conducted, the Feature Extractor method proposed by Zhang did extract the informative part of a web page, and it could be integrated into a program that classifies a web page based on key phrases from that informative part. Using the holdout method in the experiments, the integration of the three modules (Feature Extractor, TF-IDF, and Naïve Bayesian) gave reasonably good results. Training on 25% of the total data gave a classification accuracy of 79%, whereas training on 70% gave an accuracy of 89%.
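The second and third stages described in the abstract (tf-idf key term selection followed by Naïve Bayes classification, evaluated with a holdout split) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function and class names, the smoothed idf formula, the Laplace smoothing, the use of single words rather than multi-word phrases, and the toy documents. It also assumes the informative part of each page has already been extracted, since the abstract does not describe the Feature Extractor method in detail. It is not the authors' implementation and does not reproduce the reported accuracies.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep purely alphabetic tokens (a simplifying assumption)."""
    return [w for w in text.lower().split() if w.isalpha()]


def top_tfidf_terms(docs, k):
    """Return, for each document, its k terms with the highest tf-idf weight."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for toks in tokenized for t in set(toks))
    result = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed idf (assumed variant): log((1 + n) / (1 + df)) + 1.
        scores = {
            t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        }
        # Sort by descending score, then alphabetically for a stable order.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result


class NaiveBayes:
    """Multinomial Naive Bayes over key terms, with Laplace smoothing."""

    def fit(self, term_lists, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        self.vocab = set()
        for terms, label in zip(term_lists, labels):
            self.term_counts[label].update(terms)
            self.vocab.update(terms)
        return self

    def predict(self, terms):
        total = sum(self.class_counts.values())
        best_label, best_logp = None, float("-inf")
        for label in sorted(self.class_counts):
            logp = math.log(self.class_counts[label] / total)
            denom = sum(self.term_counts[label].values()) + len(self.vocab)
            for t in terms:
                if t in self.vocab:  # terms unseen in training are skipped
                    logp += math.log((self.term_counts[label][t] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


def holdout_accuracy(term_lists, labels, train_fraction):
    """Train on the first train_fraction of the data, test on the rest."""
    cut = int(len(term_lists) * train_fraction)
    model = NaiveBayes().fit(term_lists[:cut], labels[:cut])
    test = list(zip(term_lists[cut:], labels[cut:]))
    correct = sum(model.predict(terms) == label for terms, label in test)
    return correct / len(test)


# Toy data (hypothetical), standing in for already extracted informative text.
docs = [
    "bank loan interest credit bank",
    "football match goal team football",
    "credit interest rate bank",
    "team goal league match",
]
labels = ["finance", "sport", "finance", "sport"]

key_terms = top_tfidf_terms(docs, k=3)
accuracy = holdout_accuracy(key_terms, labels, train_fraction=0.5)
print(key_terms)
print(accuracy)
```

In this sketch the tf-idf weights are computed over all documents before the split, which is a simplification; a stricter holdout would compute document frequencies on the training portion only.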
Property | Property Value |
---|---|
Publisher ID | JIPTUNMERPP |
Organization | D |
Contact Name | Dra. Wiwik Supriyanti, SS |
Address | Jl. Terusan Halimun 11 B |
City | Malang |
Region | Jawa Timur |
Country | Indonesia |
Telephone | 0341-563504 |
Fax | 0341-563504 |
Administrator E-mail | perpus@unmer.ac.id |
CKO E-mail | wsupriyanti@yahoo.com |
Contributors...
- Editor: Wiwik Supriyanti, Dra. SS.