Why PubMed Central?
Verbatim quotes from their web page:
"PubMed Central® (PMC) is a free full-text archive of biomedical and life sciences journal literature at the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM)"
"...a digital archive of scholarly articles, spanning centuries of scientific research."
"The PMC Open Access Subset (or PMC OA Subset) contains millions of full-text open access article files made available under a Creative Commons or similar license terms or with publisher permission."
The permissive licensing of the OA Subset allows us to analyze its articles by machine. We have already built a semantic index of every article in the subset (over 7 million), which enables search and retrieval in more than 100 languages.