FAQ

Updated on 2023-05-30

Thanks for your interest in Paper Digest (PD)! Below are the questions our users ask most often. We update this page as needed.

1. How does PD work?

Enter a DOI or PDF URL in the search box, and PD will try to fetch, parse, and summarize the article. This works only if the article is Open Access.

2. What is a DOI?

A Digital Object Identifier (DOI) is a standardized alphanumeric code starting with “10…” that is assigned to most academic articles. Clicking a DOI link resolves to the full text, or at least to a page with the article abstract.
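
As a rough illustration of that format, the sketch below checks whether a string looks like a DOI. The pattern is an assumption (a loose rule of "10.", a 4 to 9 digit registrant code, a slash, and a non-empty suffix), and the function names are hypothetical. It is not how PD validates DOIs, and a string that passes may still not be a registered DOI.

```python
import re

# Assumption: a deliberately loose pattern, not an official DOI validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common prefixes people paste along with the DOI itself.
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(text: str) -> str:
    """Strip whitespace and a leading resolver URL or 'doi:' prefix, if any."""
    text = text.strip()
    for prefix in PREFIXES:
        if text.lower().startswith(prefix):
            return text[len(prefix):]
    return text


def looks_like_doi(text: str) -> bool:
    """Return True if the text has the general shape of a DOI."""
    return bool(DOI_PATTERN.match(normalize_doi(text)))
```

For example, `looks_like_doi("https://doi.org/10.1000/182")` accepts the pasted link form, while a plain title or a journal home page URL is rejected.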

3. What is a PDF URL?

A PDF URL is what you see in your browser address bar when viewing a PDF article. The following are not valid PDF URLs:

  • URLs to PDFs stored in your private folders, Google Drive, Dropbox, etc.
  • Other “smart” PDF types. Note that some websites offer “Enhanced PDF” files; PD cannot read those.
  • URLs to web pages that embed the PDF as an element of the UI.
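
The cases above can be turned into a rough pre-check before submitting a URL. This is only a heuristic sketch: the host list and the ".pdf" suffix rule are assumptions made for illustration, the function name is hypothetical, and PD's actual checks are not described here. Some valid PDF URLs do not end in ".pdf", so a warning is a hint, not a verdict.

```python
from urllib.parse import urlparse

# Assumption: an illustrative, incomplete list of private cloud-storage hosts.
PRIVATE_STORAGE_HOSTS = (
    "drive.google.com",
    "docs.google.com",
    "dropbox.com",
    "www.dropbox.com",
)


def pdf_url_warnings(url: str) -> list[str]:
    """Return human-readable reasons why a URL may not be a usable PDF URL."""
    parsed = urlparse(url)
    warnings = []
    if parsed.scheme not in ("http", "https"):
        warnings.append("not an http(s) URL (local or private file?)")
    host = (parsed.hostname or "").lower()
    if host in PRIVATE_STORAGE_HOSTS:
        warnings.append("hosted on private cloud storage")
    if not parsed.path.lower().endswith(".pdf"):
        warnings.append("path does not end in .pdf (may be a page embedding the PDF)")
    return warnings
```

An empty list means none of the heuristics fired. It does not guarantee that PD can read the file.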

4. Which publishers/journals does PD work with?

For PD to successfully generate a digest, the article has to be Open Access. PD checks the license type and looks for the full text in either XML or PDF format. Even when the article is open, PD may not succeed, because each publisher follows different conventions in its file formats. The table below summarizes our general experience, and we will continue to improve PD so that it can accommodate more publishers/journals.

5. I entered the DOI of an Open Access article, but PD failed to summarize it. What happened?

  • DOIs assigned through local RAs (registration agencies) may not be deposited with Crossref, which is our primary source for validating DOIs.
  • Recently published articles might not yet be available through DOI search.
  • Some Open Access articles published in non-Open Access journals might be out of reach of the DOI search.

In these cases, we suggest using the PDF URL instead.
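
To see whether a DOI is known to Crossref at all, a lookup against the public Crossref REST API can be sketched as below. This is an independent illustration rather than PD's own validation code; the function names and the timeout value are assumptions. The `https://api.crossref.org/works/{doi}` route returns a JSON record for a deposited DOI and an HTTP 404 otherwise.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi: str) -> str:
    """Build the Crossref lookup URL, percent-encoding unusual characters."""
    return CROSSREF_WORKS + urllib.parse.quote(doi.strip(), safe="/")


def is_in_crossref(doi: str, timeout: float = 10.0) -> bool:
    """Return True if Crossref has a record for this DOI (network call)."""
    try:
        with urllib.request.urlopen(crossref_url(doi), timeout=timeout) as resp:
            return json.load(resp).get("status") == "ok"
    except urllib.error.HTTPError as err:
        if err.code == 404:  # DOI not deposited with Crossref
            return False
        raise  # other HTTP errors (rate limits, outages) are not a "no"
```

If `is_in_crossref` returns False for an article you know exists, the DOI was probably registered with another agency or has not been deposited yet, and the PDF URL is the better input.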

6. Why is PD sometimes too slow?

The performance of PD largely depends on the size of the file being summarized. Larger files take longer to process.

7. Is there any difference between using a DOI and using a PDF URL?

  • Using a DOI might be faster and produce better output.
  • Using a PDF URL ensures greater coverage, because not all publishers provide XML full text. The downside is that a PDF usually provides less structured metadata than XML, so it is difficult to reach the same level of output in PD.

8. Does the summary differ depending on whether I use the DOI or the PDF URL of the same article?

It might.

  • When a DOI is provided, PD first looks for the XML version, which ensures higher fidelity to the text contents and sections. In this case, the output can be better than what the PDF would give.
  • If the XML is not reachable, PD fetches the PDF instead. In this case, providing a DOI or a PDF URL leads to the same results.
  • There is no way to know in advance if XML will be available for an article.
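
The XML-first, PDF-fallback order described above can be sketched as follows. The two fetcher callables are hypothetical stand-ins supplied by the caller; PD's real fetching code is not documented here, so treat this as a minimal sketch of the control flow only.

```python
from typing import Callable, Optional, Tuple

# A fetcher takes a DOI and returns the document text, or None if unavailable.
Fetcher = Callable[[str], Optional[str]]


def fetch_full_text(doi: str, fetch_xml: Fetcher, fetch_pdf: Fetcher) -> Tuple[str, str]:
    """Try the XML full text first; fall back to the PDF if XML is unreachable."""
    xml = fetch_xml(doi)
    if xml is not None:
        return "xml", xml
    pdf = fetch_pdf(doi)
    if pdf is not None:
        return "pdf", pdf
    raise LookupError(f"no full text found for {doi}")
```

The returned label ("xml" or "pdf") records which source was used, which is the only point at which the caller learns whether XML was available.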

9. Why does PD fail to extract some content?

Currently, PD is trained on the common headers we see in academic articles, such as Introduction, Data and Methods, Results, Discussion, and Conclusion. With articles that do not include such common headers, PD might struggle a bit. We are working on ways to improve this.
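
To show why unusual headers cause trouble, here is a deliberately simplified sketch that finds section headers by plain string matching. It is not PD's actual model: the header list is taken from the examples above, and the function name and the numbering rule are assumptions.

```python
import re

# Header names taken from the examples in this answer; a real list would be longer.
COMMON_HEADERS = (
    "introduction",
    "data and methods",
    "results",
    "discussion",
    "conclusion",
)


def find_section_headers(text: str) -> list[tuple[int, str]]:
    """Return (line number, header) pairs for lines that are common headers."""
    found = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Drop leading numbering such as "1." or "2.3", then trailing punctuation.
        cleaned = re.sub(r"^\s*\d+(\.\d+)*\.?\s*", "", line)
        cleaned = cleaned.strip().rstrip(".:").lower()
        if cleaned in COMMON_HEADERS:
            found.append((line_no, cleaned))
    return found
```

An article whose sections are titled, say, "What we did" and "What we saw" yields an empty list here, which mirrors the kind of article PD currently finds difficult.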

10. Why does PD give me gibberish or unreadable characters?

When a PDF URL is provided, PD will do its best to read it. Some older PDFs (typically scanned images processed with OCR) might not contain complete text for PD to process.

11. I landed on an error page. What does that mean?

Depending on the nature of the error, you will receive one of the following error messages:

12. Is Paper Digest available as an API?

No. The API is now deprecated.