May 19, 2022 08:56 am GMT

CUSTOM SWAHILI NAMED ENTITY RECOGNITION USING SPACY

Named Entity Recognition, a potential "game-changer" for many businesses, has helped organizations around the world tackle complex challenges, because identifying boilerplate text and extracting even standard information from a large corpus by hand is difficult and error-prone.

Financial professionals, business leaders, and innovators are increasingly turning to artificial intelligence (AI) technologies so they can spend less time finding data and more time acting on insights from it to improve their businesses. This is where Named Entity Recognition (NER), one of the most widely used tools of the AI era, comes into play.

This article discusses NER as it is used extensively by Neurotech Africa, a leading African startup focused on building powerful artificial intelligence and NLP algorithms that automate African businesses through its sarufi solutions. I will explain common use cases and demonstrate how to create a custom Swahili named entity recognition model using the spaCy library.

Meaning: Named Entity Recognition

Named Entity Recognition (NER) is the technique of automatically identifying the important, useful named entities mentioned or discussed in an unstructured text document and classifying them into predefined categories such as person names, organizations, locations, monetary values, and so on. The image below illustrates this.

https://blog.neurotech.africa/content/images/2022/03/1_OZaHa-z7A4Xny3dN1qbsQg.png

Named Entity Recognition is often the first step in information retrieval tasks. It is also known as entity chunking, entity identification, or entity extraction, and it is widely used in Natural Language Processing (NLP) and Machine Learning.
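To make the input and output of NER concrete, here is a toy sketch: a plain dictionary (gazetteer) lookup that returns labeled character spans. This is an illustrative assumption, not how spaCy works; statistical models such as spaCy's learn to recognize entities from context rather than from a fixed list. The gazetteer_ner function and its sample names are hypothetical.

```python
import re


def gazetteer_ner(text, gazetteer):
    """Toy NER: label every whole-word occurrence of a known name.

    gazetteer maps an entity string to its label, e.g. {"IPFsoftwares": "ORG"}.
    Returns (start_char, end_char, label) tuples sorted by position.
    """
    spans = []
    for name, label in gazetteer.items():
        # \b anchors stop "Oxford" from matching inside "Oxfordshire"
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


if __name__ == "__main__":
    text = "Mimi ni Innocent Charles kutoka kampuni ya IPFsoftwares"
    gazetteer = {"Innocent Charles": "PER", "IPFsoftwares": "ORG"}
    for start, end, label in gazetteer_ner(text, gazetteer):
        print(text[start:end], label)
```

The (start, end, label) tuples have the same shape as the character-offset annotations used for the training data later in this article.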

Myth: Named Entity Recognition is not important to digital business or its future.

In my opinion, the power of Named Entity Recognition lies in how easily basic models can be customized, or even built from scratch, to extract business-specific information from a company's various data sources. This creates high commercial value, which is why NER matters so much to the future of digital business.

The figure below shows how NER can be matched to business use cases.

https://blog.neurotech.africa/content/images/2022/04/NER_auto_x2.jpg

Image credit: Saurbh

Use Cases: Relevance of Named Entity Recognition in Business Operations

The most successful business operations depend on their customers, and AI-powered Named Entity Recognition tools can open up possibilities, in Africa and around the world, to drive economic value through customer satisfaction. Here are some of the ways Named Entity Recognition is used in business operations.

Automating and Simplifying Customer Support

NER can recognize useful entities in customer complaints and feedback so that each message can be routed to the department in charge of the recognized product. This saves time and cost and speeds up customer care and feedback handling, which adds business value. For example, Neurotech provides entity recognition APIs that businesses can integrate to automate their customer-handling process.
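A minimal sketch of entity-based routing, assuming the NER step returns (text, label) pairs (for a spaCy Doc, [(ent.text, ent.label_) for ent in doc.ents]). The PRODUCT label, the routing table, and the department names are hypothetical; a real system would use whatever labels its own model predicts.

```python
from collections import Counter


def route_ticket(entities, routing_table, default="customer-care"):
    """Pick the department whose products are mentioned most often.

    entities: (text, label) pairs produced by an NER model.
    routing_table: maps (label, lower-cased entity text) to a department name.
    """
    votes = Counter()
    for text, label in entities:
        department = routing_table.get((label, text.lower()))
        if department is not None:
            votes[department] += 1
    if not votes:
        # Nothing recognized: fall back to a general queue
        return default
    # On ties, most_common keeps the department that was counted first
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    routing_table = {  # hypothetical mapping for illustration
        ("PRODUCT", "router"): "networking",
        ("PRODUCT", "smartphone"): "devices",
    }
    complaint_entities = [("Innocent", "PER"), ("Router", "PRODUCT")]
    print(route_ticket(complaint_entities, routing_table))  # networking
```

Keying the table on both label and text keeps a word that happens to be tagged with a different label (say, a person's name) from triggering a product route.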

Powering Recommendation Engine Algorithms

Recommendation systems govern how we discover new content and ideas in an interconnected world. Named Entity Recognition can be used to build algorithms that automatically filter the information we might be interested in and help us uncover similar, previously undiscovered content based on our prior behavior. This increases customer engagement with products and brings more business value.
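One simple way to use entities for recommendations, sketched under the assumption that every catalogue item has already been run through an NER model: rank items by the Jaccard similarity (shared entities divided by all distinct entities) between their entity set and the entities found in what a user has already read. The catalogue and item IDs are hypothetical, and production recommenders usually combine many more signals.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(history_entities, catalogue, top_k=3):
    """Return up to top_k item IDs sharing the most entities with the user's history.

    history_entities: set of entity strings from items the user engaged with.
    catalogue: dict of item_id -> set of entity strings extracted by NER.
    """
    scored = []
    for item_id, entities in catalogue.items():
        score = jaccard(history_entities, entities)
        if score > 0:  # skip items with no entity in common
            scored.append((score, item_id))
    # Highest score first; ties broken by item ID so the order is deterministic
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored[:top_k]]


if __name__ == "__main__":
    catalogue = {
        "article-a": {"Oxford", "WHO"},
        "article-b": {"COVID-19", "Oxford"},
        "article-c": {"Tripoli"},
    }
    print(recommend({"Oxford", "COVID-19"}, catalogue))  # ['article-b', 'article-a']
```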

Effective and Efficient Optimization of Search Engine Algorithms

A search engine's algorithm is a set of rules that determines how listings are ranked in response to a search query. Instead of scanning millions of articles and websites for every query, a more efficient design runs an NER model over the articles once and permanently stores the entities associated with each one. Queries can then be matched against those stored entities, which speeds up search and increases business value.
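A minimal sketch of the "extract once, store, then look up" idea: an inverted index from (entity, label) to document IDs. The extract_entities argument stands in for any NER model (with spaCy, a function returning [(ent.text, ent.label_) for ent in nlp(text).ents]); toy_extractor and the sample documents are hypothetical stand-ins so the example runs without a trained model.

```python
from collections import defaultdict


def build_entity_index(documents, extract_entities):
    """Run NER once per document and map each (entity, label) to the IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for entity_text, label in extract_entities(text):
            index[(entity_text.lower(), label)].add(doc_id)
    return index


def search(index, entity_text, label):
    """Look up documents by entity without re-reading any text."""
    return sorted(index.get((entity_text.lower(), label), set()))


def toy_extractor(text):
    """Hypothetical stand-in for an NER model: tags two known place names."""
    places = {"Oxford", "Tripoli"}
    words = (word.strip(".,") for word in text.split())
    return [(word, "MAHALI") for word in words if word in places]


if __name__ == "__main__":
    documents = {
        1: "Chanjo ilitengenezwa Oxford.",
        2: "Wahamiaji walirejeshwa Tripoli.",
        3: "Oxford na Tripoli",
    }
    index = build_entity_index(documents, toy_extractor)
    print(search(index, "oxford", "MAHALI"))  # [1, 3]
```

The expensive step (NER) runs once when a document is added; each query afterwards is a dictionary lookup.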

Implementation: Creating a Custom Swahili Named Entity Recognition Model Using spaCy

Now that you understand Named Entity Recognition, its importance, and its uses in business operations, let's dive into our topic and see how to create a simple named entity recognition model for the Swahili language using spaCy. But wait! You may be wondering what spaCy is.

Meaning: spaCy

Simply put, spaCy is an open-source Python library for advanced natural language processing. It is designed for production use and helps you build applications that process and "understand" large volumes of text; you can learn more on the official spaCy website. In spaCy, Named Entity Recognition is performed by the pipeline component ner, and it is easy to use. In short, spaCy is to NLP what NumPy is to data science.

Now, let's get started.

Using the Pre-trained spaCy NER Model

We first explore the trained model xx_ent_wiki_sm, a multilingual model trained to handle several languages. We use it because some languages, including Swahili, do not have a dedicated spaCy NER model. This solution was built with spaCy version 3.2.1, the latest version at the time of writing.

Let's start by installing the libraries we need. The code below installs spaCy and downloads the multilingual model:

! pip install -U spacy   # install spaCy and upgrade to the latest version
! python -m spacy download xx_ent_wiki_sm   # download the multi-language model
! python -m spacy info   # check details about the installed spaCy


https://blog.neurotech.africa/content/images/2022/03/image.png

Import the necessary libraries for the project:

import spacy
import xx_ent_wiki_sm  # multi-language model
from tqdm import tqdm  # shows a progress bar for loops
from spacy.tokens import DocBin  # efficiently holds serialized annotations
from spacy import displacy  # highlights the discovered named entities in text
import warnings

warnings.filterwarnings("ignore")  # suppress warnings

model = xx_ent_wiki_sm.load()  # load the multi-language model


Test the loaded NER model by passing it some text, as in the code below:

# English: "I am Innocent Charles, an expert in artificial intelligence and data science from the company IPFsoftwares"
text_swahili = "Mimi ni Innocent Charles , mjuzi wa akili bandia na sayansi ya data kutoka kampuni ya IPFsoftwares"
preds = model(text_swahili)  # predict the named entities in the text
for preds_show in preds.ents:
    print(preds_show.text, preds_show.label_)  # print each named entity and its label
displacy.render(preds, style="ent", jupyter=True)  # visualize the entities


Like magic, the pre-trained spaCy model does a good job of recognizing the named entities, as shown in the image below.

https://blog.neurotech.africa/content/images/2022/03/image-4.png

Let's look at what the predefined entity labels recognized above mean, using the code below:

print("PER Meaning:", spacy.explain("PER"))    # meaning of PER
print("ORG Meaning:", spacy.explain("ORG"))    # meaning of ORG
print("MISC Meaning:", spacy.explain("MISC"))  # meaning of MISC
print("LOC Meaning:", spacy.explain("LOC"))    # meaning of LOC


Nice! The image below shows the meaning of each entity label, which gives you a sense of what NER can do. The model was able to recognize names, such as Innocent Charles, and organizations, such as the one where he might work.

https://blog.neurotech.africa/content/images/2022/03/image-5.png

The images and code above use spaCy's already-trained NER model without any fine-tuning.

Now, let's create our own custom NER model for the Swahili language using spaCy.

Training a Custom Swahili NER Model Using spaCy by Updating the Existing Pre-trained Multilingual Model

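First, prepare the custom data. Here I have prepared some training and validation data labeled with predefined Swahili entity labels (MTU = person, SHIRIKA = organization, MAHALI = place, SIKU = day, IDADI = number, MUDA = duration, UGONJWA = disease, DAWA = medicine), as shown in the code below: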

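# training data: (text, {"entities": [[start_char, end_char, label], ...]})
Swahili_training_data = [
    # English: WHO officials say that for several weeks monitoring of the outbreak has focused on the Americas,
    # and Sunday's figures showed a one-day rise of more than 116,000 infections in Latin America and North America.
    ("Maafisa wa WHO wamesema kwa wiki kadhaa ufuatiliaji wa mlipuko huo umeangazia mabara ya Marekani, na idadi ya Jumapili imeonyesha ongezeko la siku moja la zaidi ya maambukizi 116,000 katika eneo Latin Amerika na Amerika ya Kaskazini.",
     {"entities": [[0, 7, "MTU"], [11, 14, "SHIRIKA"], [88, 96, "MAHALI"], [110, 118, "SIKU"],
                   [175, 182, "IDADI"], [195, 208, "MAHALI"], [212, 232, "MAHALI"]]}),
    # English: Two volunteers were given the vaccine on Thursday in Oxford, where the university's team
    # developed it in under three months.
    ("Watu wawili waliojitolea walipatiwa chanjo hiyo Alhamisi mjini Oxford ambapo timu ya Chuo kikuu hicho ilitengeneza chanjo hiyo katika kipindi chini ya miezi mitatu.",
     {"entities": [[0, 4, "MTU"], [5, 11, "IDADI"], [48, 56, "SIKU"], [63, 69, "MAHALI"], [85, 95, "SHIRIKA"]]}),
]

# validation data
# note: MUDA, UGONJWA and DAWA appear only here, so the model gets no training examples for them
Swahili_validation_data = [
    # English: Canada, Russia and other countries are also working on a vaccine, but experts say that even if
    # a suitable one is found soon, producing and distributing it could take a year or more.
    ("Canada, Russia na nchi nyingine pia wanashughulika kutengeneza chanjo, lakini wataalam wanasema hata kama itapatikana inayofaa hivi karibuni, utengenezaji wa chanjo hiyo na usambazaji wake unaweza kuchukua mwaka mmoja au zaidi.",
     {"entities": [[0, 6, "MAHALI"], [8, 14, "MAHALI"], [78, 86, "MTU"], [206, 217, "MUDA"]]}),
    # English: Various studies also show that the malaria drug hydroxychloroquine does not cure the virus and may,
    # in fact, endanger the lives of COVID-19 patients.
    # (end offset 161 stops before the final "."; 162 would pull the period into the COVID-19 span)
    ("Tafiti mbalimbali pia zinaonyesha dawa ya malaria hydroxychloroquine haiponyi virusi hivyo na pengine, ukweli ulivyo, inahatarisha maisha ya wagongwa wa COVID-19.",
     {"entities": [[42, 49, "UGONJWA"], [50, 68, "DAWA"], [141, 149, "MTU"], [153, 161, "UGONJWA"]]}),
]

# load the pre-trained multilingual model; its tokenizer (make_doc) is used below to build the docs
custom_NER_model = xx_ent_wiki_sm.load()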

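Character offsets are easy to get wrong by hand, and a span that is off by one can silently swallow punctuation or be dropped during conversion. A small pure-Python check run before conversion can catch this. The check_entity_offsets helper below is a hypothetical sketch, not part of spaCy: it flags out-of-range offsets, spans with surrounding whitespace or a trailing punctuation mark, and overlapping spans (which a spaCy Doc's ents cannot hold).

```python
def check_entity_offsets(text, entities):
    """Return a list of problems found in [start_char, end_char, label] annotations."""
    problems = []
    furthest_end = -1  # end of the furthest-reaching span seen so far
    for start, end, label in sorted(entities, key=lambda ent: (ent[0], ent[1])):
        if not 0 <= start < end <= len(text):
            problems.append(f"{label} [{start}, {end}]: offsets out of range")
            continue
        surface = text[start:end]
        if surface != surface.strip():
            problems.append(f"{label} {surface!r}: leading or trailing whitespace")
        if surface[-1] in ".,;:!?":
            problems.append(f"{label} {surface!r}: ends with punctuation")
        if start < furthest_end:
            problems.append(f"{label} {surface!r}: overlaps an earlier span")
        furthest_end = max(furthest_end, end)
    return problems


if __name__ == "__main__":
    sample = "wagonjwa wa COVID-19."
    # The second span ends one character too late and includes the final "."
    for problem in check_entity_offsets(sample, [[0, 8, "MTU"], [12, 21, "UGONJWA"]]):
        print(problem)
```

In the notebook, it could be run over every example with: for text, annot in Swahili_training_data + Swahili_validation_data: print(check_entity_offsets(text, annot["entities"])).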

Double-check that the model is loaded:

if custom_NER_model:
    print("Existing Model is Loaded", custom_NER_model)
else:
    print("Existing Model is not Loaded")


https://blog.neurotech.africa/content/images/2022/03/image-1.png

Check the pipeline components and their entity labels:

print(custom_NER_model.pipe_names)   # components in the pipeline
print(custom_NER_model.pipe_labels)  # labels each component can predict


https://blog.neurotech.africa/content/images/2022/03/image-2.png

Now for the key step. The code below converts the prepared data into spaCy's binary training format (files with the .spacy extension), attaching the custom entities to each document, and saves the results to disk.

db = DocBin()  # efficiently serializes the annotated training docs
# training data
for text, annot in tqdm(Swahili_training_data):  # data in the (text, {"entities": ...}) format
    doc = custom_NER_model.make_doc(text)  # create a Doc object (tokenization only)
    ents = []
    for start, end, label in annot["entities"]:
        # keep only the tokens that lie fully inside the character offsets
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is None:
            print("Skipping entity")
        else:
            ents.append(span)
    doc.ents = ents  # label the text with the entities
    db.add(doc)
db.to_disk("Swahili_training_data.spacy")  # save the DocBin object

# validation data: a fresh DocBin, so the training docs are not saved into the dev set as well
db_dev = DocBin()
for text, annot in tqdm(Swahili_validation_data):
    doc = custom_NER_model.make_doc(text)
    ents = []
    for start, end, label in annot["entities"]:
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is None:
            print("Skipping entity")
        else:
            ents.append(span)
    doc.ents = ents
    db_dev.add(doc)
db_dev.to_disk("Swahili_validation_data.spacy")

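The alignment_mode="contract" argument determines when "Skipping entity" is printed. Doc.char_span snaps character offsets to token boundaries: "contract" keeps only tokens that lie completely inside the offsets (and returns None if there are none), "expand" keeps every token the offsets touch, and the default "strict" returns None unless the offsets match token boundaries exactly. The pure-Python sketch below imitates "contract" and "expand"; it assumes a simplified whitespace tokenizer (spaCy's real tokenizer also splits off punctuation), and the char_span_text helper is hypothetical.

```python
import re


def whitespace_tokens(text):
    """(start, end) offsets of whitespace-separated tokens, a simplification of spaCy's tokenizer."""
    return [(match.start(), match.end()) for match in re.finditer(r"\S+", text)]


def char_span_text(text, start, end, alignment_mode="contract"):
    """Snap [start, end) to token boundaries and return the covered text, or None."""
    tokens = whitespace_tokens(text)
    if alignment_mode == "contract":
        # Only tokens lying completely inside the character offsets
        chosen = [(s, e) for s, e in tokens if s >= start and e <= end]
    elif alignment_mode == "expand":
        # Every token that overlaps the character offsets at all
        chosen = [(s, e) for s, e in tokens if s < end and e > start]
    else:
        raise ValueError(f"unsupported alignment_mode: {alignment_mode!r}")
    if not chosen:
        return None  # in the conversion loop, this is when "Skipping entity" is printed
    return text[chosen[0][0]:chosen[-1][1]]


if __name__ == "__main__":
    text = "mjini Oxford ambapo"
    print(char_span_text(text, 6, 10))            # None: no token fits inside "Oxfo"
    print(char_span_text(text, 6, 10, "expand"))  # Oxford
    print(char_span_text(text, 3, 12))            # Oxford: "mjini" is only partly covered
```

"contract" is the safer choice for hand-written offsets: a slightly-off span shrinks to the tokens it really covers instead of growing to include neighbours.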

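Next, create the config file for training. spaCy fills in the necessary hyperparameters based on the chosen pipeline and language, which saves you from defining them manually in code. There are several ways to create a config file, but the CLI is the simplest. Note that a config generated this way trains a new ner component from freshly initialized weights; the pre-trained xx_ent_wiki_sm model loaded earlier is used only to tokenize the data. To start from its pre-trained weights instead, spaCy's config system lets a component be sourced from an installed pipeline (for example, source = "xx_ent_wiki_sm" in the [components.ner] block of config.cfg).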

! python -m spacy init config config.cfg --lang xx --pipeline ner --optimize efficiency


Finally, use spacy train with the config file to train the model on the prepared .spacy data:

! python -m spacy train config.cfg --output ./ --paths.train ./Swahili_training_data.spacy --paths.dev ./Swahili_validation_data.spacy


https://blog.neurotech.africa/content/images/2022/03/image-6.png
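During training, spaCy reports entity-level precision, recall and F-score (ENTS_P, ENTS_R and ENTS_F) on the validation data. As a rough sketch of what those numbers measure, assuming exact matching of (start, end, label) spans: precision is the share of predicted entities that are correct, recall is the share of gold entities that were found, and F is their harmonic mean. The entity_prf function is hypothetical, the gold spans reuse offsets from the first validation sentence while the predictions are made up, and spaCy's own Scorer adds further details such as combining counts across all validation documents.

```python
def entity_prf(gold_spans, predicted_spans):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold, predicted = set(gold_spans), set(predicted_spans)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    gold = [(0, 6, "MAHALI"), (8, 14, "MAHALI"), (78, 86, "MTU")]
    predicted = [(0, 6, "MAHALI"), (8, 14, "SHIRIKA")]  # right span, wrong label for the second
    p, r, f = entity_prf(gold, predicted)
    print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=0.50 R=0.33 F=0.40
```

A span with the right boundaries but the wrong label counts as both a false positive and a false negative under exact matching.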

Load the custom Swahili NER model and test it on an unseen Swahili text:

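model_test = spacy.load("./model-best")  # spacy train saved the best model under --output ./
# English: Libya's coast guard intercepted 400 migrants who were on their way off the country's Mediterranean coast
# heading for Europe and returned them to the capital Tripoli in the past 24 hours, the UN migration agency said on Sunday.
test_preds = model_test("Walinzi wa pwani ya Libya wamekamata wahamiaji 400 waliokuwa wakonjiani katika pwani ya Mediterranean ya nchi hiyo wakielekea Ulaya na kuwarejesha katika mji mkuu wa Tripoli masaa 24 yaliyopita, Shirika la uhamiaji la Umoja wa Mataifa UN limesema Jumapili.")
for x in test_preds.ents:
    print(x.text, x.label_)
displacy.render(test_preds, style="ent", jupyter=True)  # display the recognized named entities in the text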


https://blog.neurotech.africa/content/images/2022/03/image-8.png

Nice job! We have created a simple custom Swahili NER model using spaCy. In this article, you learned what NER is, saw its business use cases, and walked through building a custom NER model with spaCy.

Bottom line

Following my recent work with NER, I am confident that it is a highly useful tool across a wide range of business scenarios. However, several challenges must be addressed to get the most out of it.

At the same time, the rapid advancement of deep learning algorithms, as offered by Neurotech Africa and other organizations, has produced far more powerful NLP models in recent years. Consider contacting us to scale up your business and make the most of them.

Author: Innocent Charles, a machine learning data scientist and NLP developer advocate based in Africa, focuses on harnessing data and technology to build smart solutions to complex challenges across Africa. I'm eager to hear about your experience in the data space, so let's keep in touch on LinkedIn.


Original Link: https://dev.to/neurotech_africa/custom-swahili-named-entity-recognition-using-spacy-5p5
