
Documentation for Recommend Papers

Table of Contents

  1. Bulk Download
  2. Web Site
  3. APIs

Bulk Download

See here for more information on bulk downloads.

Web Site

See web site. You can enter keywords in either search box. The box on the left searches papers and the box on the right searches authors.

APIs

These APIs provide a federated view over three APIs: Semantic Scholar (S2), PubMed and Astrophysics Data System (ADS). In addition, the APIs support recommendations based on cached embeddings from Specter and ProNE. Specter is a BERT-like model, fine-tuned on a few million citations. ProNE uses spectral clustering to embed 2 billion citations. Recommendations from ProNE tend to have more citations, whereas recommendations from Specter tend to be more recent (papers with more citations tend to be older because it takes time for papers to accumulate citations). The API provides access to 4 embeddings: Specter, ProNE, SciNCL and GNN; see discussion of score1, score2 and embeddings.

All of the APIs use HTTP GET requests and return JSON objects. Click on the examples below to see the input GET request and the output JSON object.
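
As a rough illustration of what such a GET request looks like, the sketch below assembles a query URL from keyword arguments. The base URL `https://example.org/api`, the path layout, and the id value `CorpusId:12345` are placeholders chosen for illustration, not the service's real address or a real paper; only the argument names (`id`, `fields`) come from this documentation.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute the real address of the service.
BASE_URL = "https://example.org/api"

def build_request_url(endpoint, **args):
    """Return a GET URL for `endpoint` with `args` as the query string.

    List or tuple values are joined with commas, matching the comma
    separated convention used by arguments such as `fields`.
    """
    params = {}
    for name, value in args.items():
        if value is None:
            continue  # skip arguments that were not supplied
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        params[name] = value
    # Keep commas and colons readable instead of percent-encoding them.
    query = urlencode(params, safe=",:")
    url = f"{BASE_URL}/{endpoint}"
    return f"{url}?{query}" if query else url

# Placeholder id and endpoint path, used only to show the URL shape.
url = build_request_url("lookup_paper", id="CorpusId:12345",
                        fields=["title", "authors"])
print(url)
```

Building the query string in one place keeps the comma joining consistent across the arguments that accept lists.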

API | Examples | Arguments | Description
Lookup Paper help, id, fields, embeddings, score2, get_pdfs, get_bibtex
  • Input one or more comma separated paper ids and output fields from Semantic Scholar, as well as embeddings.
  • If embeddings argument is specified, then output embedding vectors for each input paper (missing values will have vectors of 0).
  • See documentation on embeddings for details on how to specify combinations of different embeddings to return.
bibtex help, id
  • Input one or more comma separated paper ids and output bibtex entries
pdfs help, id
  • Input one or more comma separated paper ids and output urls for pdfs
Lookup Author help, id, fields, sort_by, limit, embeddings, score2, get_pdfs, get_bibtex
  • Input author id and output author fields from Semantic Scholar.
  • Field argument can request list of papers
  • Limit argument will truncate list of papers returned
  • sort_by argument can be citationCount; if so, then field argument should contain papers.citationCount
  • Embeddings can return vectors; see documentation on embeddings
  • Note: author ids are different from paper ids and author fields are different from paper fields.
Lookup Citations example help, offset (defaults to 0), limit (defaults to 100; max is 1000), id, fields
  • Lookup Citations for paper id and output fields from Semantic Scholar for each citation.
  • A useful field to request is contexts; that field returns citing sentences, sentences from other papers that cite the input paper id.
  • Another useful field to request is intents; that field returns the intent of each citing sentence.
  • For papers with more than 1000 citations, call this API multiple times with different offsets.
Lookup References example help, offset (defaults to 0), limit (defaults to 100; max is 1000), id, fields
  • Lookup References for paper id and output fields from Semantic Scholar for each reference.
  • A useful field to request is contexts; that field returns citing sentences, sentences in the input paper id that cite other papers.
  • Another useful field to request is intents; that field returns the intent of each citing sentence.
  • For papers with more than 1000 references, call this API multiple times with different offsets.
Coauthors example help, query, after_year
  • Input query (a string); for each matching author id, return a list of coauthors filtered by after_year (a 4 digit number).
  • Note: since Semantic Scholar may have multiple author ids for the same author, the json object contains a list of coauthors for each author matching the input query
Recommend Papers help, id, limit, recommend_method
  • Recommend papers similar to paper id using recommend_method.
  • See documentation on recommend_method for choices of recommend_methods that are currently supported. New: one or more methods can be provided, separated by commas.
Recommend Authors example help, id, limit, recommend_method
Compare and Contrast example1, example2 help, ids (two or more ids, separated by commas)
  • Use RAG to compare and contrast the first id with the rest.
Compare and Contrast Texts example help, text1, text2
  • Use RAG to compare and contrast text1 with text2, where both texts are strings.
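
The citation and reference lookups above cap `limit` at 1000, so longer lists have to be fetched in several calls with different offsets. A minimal sketch of how the `offset` and `limit` pairs could be generated is shown below; `page_offsets` is a hypothetical helper, and the total is assumed to come from a prior lookup (for example the citationCount field).

```python
def page_offsets(total, limit=1000):
    """Yield (offset, limit) pairs that together cover `total` items.

    `limit` must respect the documented maximum of 1000 per call.
    """
    if limit < 1 or limit > 1000:
        raise ValueError("limit must be between 1 and 1000")
    for offset in range(0, total, limit):
        # The last page may be shorter than `limit`.
        yield offset, min(limit, total - offset)

# A paper with 2500 citations needs three calls.
pages = list(page_offsets(2500))
print(pages)
```

Each pair maps directly onto the `offset` and `limit` arguments of one request.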

Arguments

  1. help: return short documentation
  2. query: input string
  3. id: input for lookup_paper, recommendations and lookup_citations; many of the externalId formats are supported, including:
    1. sha (40 hex characters); example
    2. CorpusId (the primary key in Semantic Scholar); example
    3. PMID (pubmed ids); example
    4. ACL (acl anthology ids); example
    5. arXiv; example
    6. MAG (Microsoft Academic Graph); example
  4. id: input for lookup_author (Note: author ids are different from paper ids)
  5. offset: start of papers to return (defaults to 0)
  6. limit: number of results to return
  7. fields: one or more comma separated values. Many values are supported including title, authors, publication year, bibtex entries, references, citations, citing sentences and much more (see discussion below)
  8. recommend_method (for generating recommendations); recommend_method should be one of the following (case insensitive):
    1. combined: example (a fast precomputed combination of ProNE and Specter)
    2. ProNE: example
    3. Specter: example
    4. s2_api: example (based on an API from Semantic Scholar)
    5. pubmed_api: example (based on an API from National Center for Biotechnology Information (NCBI)).
    6. ads_api: example (based on an API from Astrophysics Data System)
    ProNE and Specter use cached embeddings to generate recommendations. The last three generate recommendations from APIs provided by Semantic Scholar, PubMed and ADS, respectively.
  9. Embeddings (for lookup paper); one or more of the following (comma separated and case insensitive):
    1. ProNE: example
    2. Specter: example
    3. SciNCL: example
    4. GNN: example
    5. s2_api: example
    The first four (ProNE, Specter, SciNCL and GNN) use cached vectors. The last one, s2_api, uses an API from Semantic Scholar to return the most recent values. Specter and s2_api should return the same vectors, as long as the Specter vector is not missing. There will be one vector for each input paper. Vectors of zeros are returned for missing values.
  10. Accepted embedding names: prone, scincl, specter, gnn, s2_api (case insensitive).
  11. score1: outputs 1d vectors of cosine scores between each recommendation. The value of score1 is a (case insensitive) comma separated list of embeddings: ProNE, Specter, SciNCL, GNN; example.
    Note: missing values will have cosines of 0
  12. score2: outputs pairwise cosine scores between each pair of recommendations. The value of score2 is a (case insensitive) comma separated list of embeddings: ProNE, Specter, SciNCL, GNN; example.
    Note: missing values will have cosines of 0
  13. limit: limit the number of results
  14. get_pdfs=True returns list of urls to pdfs
  15. get_bibtex=True returns bibtex entry
  16. sort_by can be assigned to a field that can be converted to an integer, such as sort_by=citationCount
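
To make the score2 convention concrete, the sketch below computes pairwise cosines over a small set of vectors and returns 0 whenever a vector is all zeros, which is how missing embeddings are described above. This is an illustration only, not the service's implementation; the function names are hypothetical and the exact layout of the real output is not specified in this document.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # missing embedding: cosine is reported as 0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)

def pairwise_cosines(vectors):
    """Return the square matrix of cosines between every pair of vectors."""
    return [[cosine(u, v) for v in vectors] for u in vectors]

# Toy 2-d vectors; the last one stands in for a missing embedding.
vectors = [[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
matrix = pairwise_cosines(vectors)
for row in matrix:
    print(row)
```

Note that under this convention a missing vector scores 0 even against itself, so a zero on the diagonal is a quick way to spot missing embeddings.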

Fields

Fields are based on Semantic Scholar APIs; see here for their documentation. Some useful values are shown below (with separate lists for papers, authors and citations). Fields is set to a comma separated list such as fields=title,authors.
  1. Fields for papers:
    1. externalIds (outputs one or more of the following ids from the 8 sources behind Semantic Scholar)
      1. CorpusId (the primary key for Semantic Scholar); example
      2. MAG (Microsoft Academic Graph); example
      3. DOI; example
      4. PubMed; example
      5. PubMedCentral; example
      6. DBLP; example
      7. arXiv; example
      8. ACL; example
    2. url (pointer into Semantic Scholar); example
    3. title (of paper); example
    4. abstract; example
    5. tldr; example
    6. authors; you can request a list of author objects and/or specific fields from the author objects:
      1. authors (list of author objects); example
      2. authors.name (list of author names); example
      3. authors.authorId (list of author ids); example
    7. year; example
    8. venue; example
    9. citationStyles; example (outputs bibtex entries)
    10. referenceCount; example
    11. citationCount; example
    12. openAccessPdf (pointer to PDF file, if known); example
    13. fieldsOfStudy (probably from MAG); example
    14. s2FieldsOfStudy (like fieldsOfStudy, but from Semantic Scholar); example
    15. embedding.specter_v2 (vector of 768 floats based on an encoding of the title and abstract using a BERT-like model); example
    16. citations (list of papers that cite this paper); example
      1. citations.title; example
      2. citations.authors; example
      3. citations.citationCount; example
      4. citations.xyz where xyz is authors, citationCount and most other paper fields
    17. references; example
      1. references.title; example
      2. references.authors; example
      3. references.citationCount; example
      4. references.xyz where xyz is authors, citationCount and most other paper fields
  2. Fields for authors:
    1. authorId; example
    2. externalIds; example
    3. url; example
    4. name; example
    5. affiliations; example
    6. homepage; example
    7. paperCount; example
    8. hIndex; example
    9. citationCount; example
    10. papers (list of papers); example
      1. papers.title; example
      2. papers.externalIds; example
      3. papers.xyz where xyz is authors, citationCount and most other paper fields
  3. Fields for citations:
    1. contexts (citing sentences): example
    2. intents: example
    3. xyz where xyz is authors, citationCount and most other paper fields
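
The dotted sub-fields listed above (authors.name, citations.title and so on) can be tedious to type by hand. One possible way to assemble the value of the fields argument is sketched below; `fields_param` is a hypothetical helper, not part of the API, and it does not check that the names it is given are actually supported.

```python
def fields_param(*names, **nested):
    """Build a comma separated value for the `fields` argument.

    Positional names are used as-is. Keyword arguments expand to dotted
    sub-fields, e.g. citations=["title"] becomes "citations.title".
    """
    parts = list(names)
    for parent, children in nested.items():
        parts.extend(f"{parent}.{child}" for child in children)
    return ",".join(parts)

value = fields_param("title", "year",
                     authors=["name", "authorId"],
                     citations=["title"])
print(value)
```

The resulting string can be passed directly as fields=... in a GET request.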