Medpost

Configuration

Download and install MedPost (see http://www.ncbi.nlm.nih.gov/staff/lsmith/MedPost.html). Re-run make whenever the installation directory changes.

Processing text

Using bash or sh:

 medpost -text <input.txt >output.tagged

This reads the text in 'input.txt' and writes the tagged output to 'output.tagged'.
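
To tag several files with the same command, the call can be wrapped in a loop. The following is a minimal sketch, not part of the MedPost distribution: the function name tag_directory and the variable MEDPOST_CMD are hypothetical, and the only MedPost option used is the -text option shown above.

```shell
#!/bin/sh
# Tag every .txt file in a directory, writing NAME.tagged next to NAME.txt.
# MEDPOST_CMD is a hypothetical variable naming the tagger executable;
# it defaults to "medpost", invoked with -text as in the command above.
tag_directory() {
    dir=$1
    for input in "$dir"/*.txt; do
        [ -e "$input" ] || continue   # skip when the glob matches nothing
        output="${input%.txt}.tagged"
        "${MEDPOST_CMD:-medpost}" -text <"$input" >"$output" || return 1
    done
}
```

Calling tag_directory with a directory name processes each file in turn and stops at the first file for which the tagger reports a failure.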

Processing MEDLINE no-title files

  • Extract the docNumber and pmid fields:
 cut -f 1,2 ~/projects/trec-2004-no-title/b-trec-gen-2004-2005/twease/data/trec/2004/100-abstracts-sentence3.tsv >1_2.txt
docNumber       pmid
27087   8122812
47510   8146329

  • Extract the sentence field:
 cut -f 3 ~/projects/trec-2004-no-title/b-trec-gen-2004-2005/twease/data/trec/2004/100-abstracts-sentence3.tsv >3.txt
  • Remove dots from the sentences. Medpost interprets most dots as sentence breaks, and each sentence break starts a new line in the output. The extra lines would break the line-by-line alignment with the input file:
 sed 's/\./ /g' 3.txt >3_nodot.txt

  • Process sentence field with Medpost:
 ./medpost -text <3_nodot.txt >3-tagged.txt
sentence_NN
methods_NNS :_: This_DD article_NN evaluates_VVZ the_DD liquid_NN and_CC microbial_JJ barrier_NN properties_NNS of_II 13_MC
reusable_JJ and_CC disposable_JJ gowns_NNS and_CC investigates_VVZ the_DD cumulative_JJ effects_NNS of_II laundering_VVGN 
and_CC sterilizing_NN on_II the_DD barrier_NN efficacy_NN of_II reusable_JJ gowns_NNS by_II means_NNS of_II the_DD impact_NN
penetration_NN (_( splash_NN )_) Test_NN ,_, the_DD synthetic_JJ blood_NN resistance_NN Test_NN ,_, the_DD viral_JJ resistance_NN
Test_NN ,_, and_CC the_DD elbow_NN Lean_NN (_( demonstration_NN )_) Test_NN ._.
  • Verify that 1_2.txt and 3-tagged.txt have the same number of lines:
 wc -l 1_2.txt 3-tagged.txt
  • Reattach docNumber and pmid to the tagged text:
 paste 1_2.txt 3-tagged.txt >1_2_3-tagged.txt
  • The output should look like this:
docNumber       pmid    sentence_NN
27087   8122812 methods_NNS :_: This_DD article_NN evaluates_VVZ the_DD liquid_NN 
and_CC microbial_JJ barrier_NN properties_NNS of_II 13_MC reusable_JJ and_CC 
disposable_JJ gowns_NNS and_CC investigates_VVZ the_DD cumulative_JJ effects_NNS
of_II laundering_VVGN and_CC sterilizing_NN on_II the_DD barrier_NN efficacy_NN
of_II reusable_JJ gowns_NNS by_II means_NNS of_II the_DD impact_NN penetration_NN
(_( splash_NN )_) Test_NN ,_, the_DD synthetic_JJ blood_NN resistance_NN Test_NN
,_, the_DD viral_JJ resistance_NN Test_NN ,_, and_CC the_DD elbow_NN Lean_NN
(_( demonstration_NN )_) Test_VVI
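
The dot removal and the line-count check from the steps above can be wrapped in two small helpers so that a misaligned file is never pasted by accident. This is a minimal sketch, assuming a POSIX shell with sed, wc, tr and paste; the function names strip_dots and paste_if_aligned are hypothetical, and the medpost call appears only as a comment because the location of the binary depends on the installation.

```shell
#!/bin/sh
# Replace every dot with a space so the tagger does not split one input
# row into several output lines (same substitution as the sed step above).
strip_dots() {
    sed 's/\./ /g' "$1"
}

# Paste two files side by side only if they have the same number of lines.
# Returns 1 and prints a message on stderr when the counts differ.
paste_if_aligned() {
    left_count=$(wc -l <"$1" | tr -d ' ')
    right_count=$(wc -l <"$2" | tr -d ' ')
    if [ "$left_count" -ne "$right_count" ]; then
        echo "line count mismatch: $1 has $left_count, $2 has $right_count" >&2
        return 1
    fi
    paste "$1" "$2"
}

# Intended use, following the steps above (not run here):
#   strip_dots 3.txt >3_nodot.txt
#   ./medpost -text <3_nodot.txt >3-tagged.txt
#   paste_if_aligned 1_2.txt 3-tagged.txt >1_2_3-tagged.txt
```

If the tagger still splits a row (for example on a sentence break that is not a dot), paste_if_aligned fails instead of silently attaching tagged text to the wrong docNumber and pmid.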