How can I extract triples from the Freebase dump?

1 year ago

#280351

Faisal Mirza

I would like to build a large knowledge base of (subject, predicate, object) triples, so I downloaded the Freebase dump from the developers page. It contains the triples in RDF (N-Triples) format, and I want to convert it to a readable format. How can I achieve this?
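The dump is a single gzip-compressed file. One way to get a readable sample without unpacking the whole archive to disk is to stream it through gzip and keep only the first N lines. This is a minimal sketch that assumes a POSIX shell with gzip and head available. sample_dump is a hypothetical helper name and is not part of any repository:

```shell
#!/bin/sh
# Hypothetical helper (sketch): write the first N lines of a gzip-compressed
# dump to a plain-text file, decompressing only as much as head needs.
# Arguments: $1 = compressed dump, $2 = number of lines, $3 = output file.
sample_dump() {
  gzip -dc "$1" | head -n "$2" > "$3"
}
```

Example: `sample_dump freebase-rdf-latest.gz 100 freebase-triples.txt`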

I am currently following nchah's GitHub repository and running the shell script s0-run-parse-extract-triples.sh on Ubuntu in VirtualBox. The script is supposed to clean the RDF input by removing the URLs while keeping the IDs. As its argument I pass freebase-triples.txt, a 100-row sample taken from the 30 GB freebase-rdf-latest.gz.
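The cleaning step described here (drop the URL, keep the ID) can be sketched as a single sed substitution. This is an assumption about what such a step might look like, not the code from nchah's repository. strip_freebase_prefixes is a hypothetical name, and only the http://rdf.freebase.com/ns/ namespace is handled, so any other URIs pass through unchanged:

```shell
#!/bin/sh
# Hypothetical helper (sketch): replace every <http://rdf.freebase.com/ns/ID>
# term with the bare ID, e.g. <http://rdf.freebase.com/ns/m.0abc> -> m.0abc.
# Literals and URIs in other namespaces are left untouched.
# Reads stdin, writes stdout.
strip_freebase_prefixes() {
  sed -e 's|<http://rdf\.freebase\.com/ns/\([^>]*\)>|\1|g'
}
```

Example: `strip_freebase_prefixes < freebase-triples.txt > freebase-triples-clean.txt`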

You can find the code here.

Note that I was getting the message "No such file or directory", so I removed line 8 and used $1 instead of $INPUT_FILE on line 17, which got rid of that message. On line 21 I also removed the # sign and changed gsed to sed, and I added some echo messages for tracing.
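A common pattern for this kind of input handling is to read the path from $1, fall back to a default, and fail early with a clear message instead of a bare "No such file or directory". This is only a sketch. resolve_input and the default file name are hypothetical and are not taken from the original script:

```shell
#!/bin/sh
# Hypothetical helper (sketch): print the input path to use, taken from the
# first argument or a default, and return non-zero if the file is missing.
resolve_input() {
  input_file="${1:-freebase-rdf-latest.gz}"   # default name is an assumption
  if [ ! -f "$input_file" ]; then
    echo "Input file not found: $input_file" >&2
    return 1
  fi
  printf '%s\n' "$input_file"
}
```

Inside a script this might be used as `INPUT_FILE=$(resolve_input "$1") || exit 1`.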

This is how I am running it:

sh s0-run-parse-extract-triples.sh freebase-triples.txt

You can see the error I am getting here.

I do get the output file fb-rdf-s01-c01, but it still contains the URLs and is unchanged from my input. The other output file, fb-rdf-s01-c02, is also created, but it is empty.
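One quick way to confirm whether the cleaning step matched anything is to count how many lines still contain the Freebase namespace URL, first in the input and then in the output. This is a sketch. count_unstripped is a hypothetical helper, and the namespace string is assumed from the standard Freebase RDF dump:

```shell
#!/bin/sh
# Hypothetical diagnostic (sketch): print how many lines of a file still
# contain the Freebase namespace URL. After a successful cleaning step this
# should be 0. grep exits with status 1 when nothing matches, so "|| true"
# keeps a zero count from being treated as an error.
count_unstripped() {
  grep -c 'rdf\.freebase\.com/ns/' "$1" || true
}
```

Running `count_unstripped freebase-triples.txt` and `count_unstripped fb-rdf-s01-c01` and getting the same number for both would suggest that the substitution never matched the input lines.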

Tags: rdf, freebase, triples, knowledge-graph, n-triples

0 Answers
