Unsupervised Cross-Lingual Part-of-Speech Tagging
Version: 1.5

Unsupervised Cross-Lingual Part-of-Speech Tagging for Truly Low-Resource Scenarios.
Unsupervised Stem-based Cross-lingual Part-of-Speech Tagging for Morphologically Rich Low-Resource Languages.

MorphAGram (add a MorphAGram directory in the main directory of this repo).

We use GIZA++ to train and produce word-level alignments between the target language and a source language for which POS annotations are available, based on a parallel corpus that is white-space tokenized.

1. Create a directory alignments that contains the GIZA++ and mkcls installation directories, the run_gizapp.sh and giza-convert.py scripts, and a workspace directory to store the inputs and outputs.
2. For the source language (ISO3 code), the target language (ISO3 code) and the dataset, produce the following files:
   - the source-target GIZA++ input parallel file -.parallel (one sentence pair per line, with the two sentences separated by |||).
   - the GIZA++ input configuration file -.nfig. Use the config file data/nfig, and replace 'ENG' with the source language code, 'AFR' with the target language code and 'bible' with the dataset name.
   - a key file of sentence IDs -.keys, one ID per line. The order of the IDs should correspond to the order of the sentences in -.parallel.
3. Run the run_gizapp.sh script with the three parameters (source language, target language and dataset) to train and produce the alignments from the source to the target. This will create a new directory workspace/-gfiles with the necessary GIZA++ output files.
4. Run the giza-convert.py script to produce the forward alignments as follows: python giza-convert.py workspace/-gfiles/-.alignments N > workspace/-giza.forward
5. Repeat the second and third steps while switching the source and target languages in order to produce the backward alignments.
6. Run the giza-convert.py script to produce the backward alignments as follows: python giza-convert.py workspace/-gfiles/-.alignments Y > workspace/-giza.backward
7. Use an off-the-shelf POS tagger to tag the source text.
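
The input-file preparation described above (a parallel file plus a key file whose IDs follow the same order) could be sketched roughly as follows. This is a minimal illustration and not part of the repository: the function name `write_giza_inputs`, the example file names and the sentence IDs are all hypothetical. It also assumes that the `|||` separator is surrounded by single spaces and that empty sentence pairs should be skipped.

```python
from pathlib import Path
from typing import Iterable, Tuple


def write_giza_inputs(
    triples: Iterable[Tuple[str, str, str]],
    parallel_path: str,
    keys_path: str,
) -> int:
    """Write a parallel file and a matching key file; return the pair count.

    Each triple is (sentence_id, source_sentence, target_sentence).
    Hypothetical helper: names and the exact separator spacing are assumptions.
    """
    parallel_lines = []
    key_lines = []
    for sentence_id, source, target in triples:
        # Collapse runs of whitespace so each sentence stays on one line
        # and remains white-space tokenized.
        source = " ".join(source.split())
        target = " ".join(target.split())
        if not source or not target:
            # Assumption: pairs with an empty side are dropped.
            continue
        if "|||" in source or "|||" in target:
            raise ValueError(f"separator found inside sentence {sentence_id!r}")
        parallel_lines.append(f"{source} ||| {target}")
        # Keys are appended in lockstep so line i of the key file
        # always identifies line i of the parallel file.
        key_lines.append(sentence_id.strip())

    Path(parallel_path).write_text(
        "".join(line + "\n" for line in parallel_lines), encoding="utf-8"
    )
    Path(keys_path).write_text(
        "".join(key + "\n" for key in key_lines), encoding="utf-8"
    )
    return len(parallel_lines)


if __name__ == "__main__":
    # Hypothetical IDs, sentences and output names, for illustration only.
    count = write_giza_inputs(
        [("s1", "In the beginning", "In die begin")],
        "example.parallel",
        "example.keys",
    )
    print(f"wrote {count} sentence pair(s)")
```

Writing both files from a single loop keeps the key order and the sentence order from drifting apart, which is the one constraint the steps above state explicitly.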