Implementation of Information Extraction from Maritime Science Journal Abstracts Using a Rule-Based Method

Iwan Mahendro; Iman Mujiarto

doi:10.61132/jupiter.v1i6.169

Authors

Iwan Mahendro, Universitas Maritim AMNI Semarang
Iman Mujiarto, Universitas Maritim AMNI Semarang

DOI:

https://doi.org/10.61132/jupiter.v1i6.169

Keywords:

extraction, information, abstract, method, rule-based

Abstract

Universities store many important documents, including scientific works in the form of journals. Each stored journal carries identifying information so that it can be found easily when needed. A library is where scientific works, both journals and other documents, are kept. The current problem is that so many documents and journals are stored that people looking for journal-related information need a long time to find it. This research aims to make it easier to find information across many journals. The proposed method is information extraction, the task of finding structured information such as entities and attributes. The process begins with collecting the dataset, followed by preprocessing, in which documents are converted from PDF to HTML. Data cleaning is also carried out at this stage: the conversion can introduce new paragraph breaks, and these are removed. The next stage is rule-based information extraction driven by keywords and rules. Extraction did not fail on any of the 50 journal abstract documents, giving an accuracy of 100%. The information extraction approach in this research can therefore be used to search for information in journal abstracts.
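The cleaning and rule-based stages described in the abstract can be illustrated with a minimal Python sketch. The abstract does not list the keywords or rules actually used, so the cue phrases, the field names (`objective`, `method`, `result`), and the functions `clean_text` and `extract` below are all assumptions made for illustration, not the authors' implementation. The PDF-to-HTML conversion step is left out; the sketch starts from text that has already been extracted.

```python
import re

# Hypothetical keyword rules: each field is tied to a cue phrase that often
# introduces it in an abstract. These patterns are illustrative assumptions,
# not the rules used in the paper.
RULES = {
    "objective": re.compile(
        r"\b(?:aim|purpose|objective) of this (?:research|study) is to\s+(.+?)\.(?:\s|$)",
        re.IGNORECASE,
    ),
    "method": re.compile(
        r"\bthe method (?:proposed|used) in this (?:research|study) is\s+(.+?)\.(?:\s|$)",
        re.IGNORECASE,
    ),
    "result": re.compile(
        r"\bthe results? of this (?:research|study) (?:is|are)(?: that)?\s+(.+?)\.(?:\s|$)",
        re.IGNORECASE,
    ),
}


def clean_text(raw: str) -> str:
    """Remove paragraph and line breaks left by conversion, collapsing all whitespace runs."""
    return re.sub(r"\s+", " ", raw).strip()


def extract(raw: str) -> dict:
    """Apply every rule to the cleaned text; a field is None when its rule does not match."""
    text = clean_text(raw)
    found = {}
    for field, pattern in RULES.items():
        match = pattern.search(text)
        found[field] = match.group(1).strip() if match else None
    return found
```

Calling `extract` on an abstract returns a dictionary with one entry per field. A field whose cue phrase does not appear comes back as `None`, which makes it easy to count failed documents when measuring accuracy over a collection.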
