Documentation for Recommend Papers

Bulk Download
Web Site
APIs

Bulk Download

See here for more information on bulk downloads.

Web Site

See web site. You can enter keywords in either search box. The box on the left searches papers and the box on the right searches authors.

APIs

These APIs provide a federated view over three APIs: Semantic Scholar (S2), PubMed and Astrophysics Data System (ADS). In addition, the APIs support recommendations based on cached embeddings from Specter and ProNE. Specter is a BERT-like model, fine-tuned on a few million citations. ProNE uses spectral clustering to embed 2 billion citations. Recommendations from ProNE tend to have more citations, whereas recommendations from Specter tend to be more recent (papers with more citations tend to be older because it takes time for papers to accumulate citations). The API provides access to 4 embeddings: Specter, ProNE, SciNCL and GNN; see discussion of score1, score2 and embeddings.

All of the APIs use HTTP GET requests and return JSON objects. Click on the examples below to see the input GET request and the output JSON objects.
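As a minimal sketch of this request pattern, the Python below builds a GET URL from a set of arguments and decodes a JSON response body. The base URL https://example.org/api and the endpoint names paper_search and lookup_paper are placeholders, since the real addresses are not given in this text; the argument names query, fields and limit come from the Paper Search entry. The helper names build_get_url and decode_response are hypothetical.

```python
import json
from urllib.parse import urlencode

# Placeholder: the real service address is not stated in this document.
BASE_URL = "https://example.org/api"


def build_get_url(endpoint, **args):
    """Return a GET URL for `endpoint` with `args` as the query string."""
    # Drop arguments that were not supplied.
    params = {k: v for k, v in args.items() if v is not None}
    # safe="," keeps comma separated values (e.g. fields=title,authors) readable.
    query = urlencode(params, safe=",")
    if query:
        return f"{BASE_URL}/{endpoint}?{query}"
    return f"{BASE_URL}/{endpoint}"


def decode_response(body):
    """Decode a JSON response body (str or bytes) into Python objects."""
    return json.loads(body)


url = build_get_url("paper_search", query="graph embeddings",
                    fields="title,authors", limit=5)
print(url)
```

The URL string can then be passed to any HTTP client; the response body is expected to be JSON, so decode_response (or the client's own JSON helper) turns it into dictionaries and lists.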

API Examples Arguments Description

Paper Search

example

more challenging example with get_pdfs, get_bibtex and sort_by

help, query, fields, sort_by, limit, get_pdfs, get_bibtex

Find papers matching input query (a string); output fields from Semantic Scholar for each paper.

See documentation on fields for more information on fields in Semantic Scholar.

A common use case is to request paper ids from titles of papers since many of the APIs below are based on ids in Semantic Scholar (and other sources).

sort_by can be any of the fields that can be converted to integers

Author Search

simple example

more challenging example

example with embeddings arg

help, query, fields, embeddings, sort_by, limit

Find authors matching input query (a string); output fields from Semantic Scholar for each author.

See documentation on fields for more information on fields in Semantic Scholar.

sort_by can be any of the fields that can be converted to integers

Limit argument will truncate results (after sorting)

Note: author fields are different from paper fields.

Lookup Paper

simple example

more challenging example (with score2)

more challenging example (with embeddings)

more challenging example (with get_pdfs)

more challenging example (with get_bibtex)

help, id, fields, embeddings, score2, get_pdfs, get_bibtex

Input one or more comma separated paper ids and output fields from Semantic Scholar, as well as embeddings.

If the embeddings argument is specified, then output an embedding vector for each input paper (papers with missing embeddings get vectors of zeros).

See documentation on embeddings for details on how to specify combinations of different embeddings to return.

bibtex

simple example

more challenging example (with score2)

help, id

Input one or more comma separated paper ids and output bibtex entries.

pdfs

simple example

help, id

Input one or more comma separated paper ids and output urls for pdfs.

Lookup Author

example1

more challenging example (with score2)

more challenging example (with embeddings)

more challenging example (with get_pdfs and get_bibtex)

help, id, fields, sort_by, limit, embeddings, score2, get_pdfs, get_bibtex

Input author id and output author fields from Semantic Scholar.

Field argument can request list of papers

Limit argument will truncate list of papers returned

sort_by argument can be citationCount; if so, then field argument should contain papers.citationCount

Embeddings can return vectors; see documentation on embeddings

Note: author ids are different from paper ids and author fields are different from paper fields.

Lookup Citations

example

help, offset (defaults to 0), limit (defaults to 100; max is 1000), id, fields

Look up citations for paper id and output fields from Semantic Scholar for each citation.

A useful field to request is contexts; that field returns citing sentences, sentences from other papers that cite the input paper.

Another useful field to request is intents; that field returns the intent of each citing sentence.

For papers with more than 1000 citations, call this API multiple times with different offsets.

Lookup References

example

help, offset (defaults to 0), limit (defaults to 100; max is 1000), id, fields

Look up references for paper id and output fields from Semantic Scholar for each reference.

A useful field to request is contexts; that field returns citing sentences, sentences in the input paper that cite other papers.

Another useful field to request is intents; that field returns the intent of each citing sentence.

For papers with more than 1000 references, call this API multiple times with different offsets.

Coauthors

example

help, query, after_year

Input query (a string); for each matching author id, return a list of coauthors filtered by after_year (a 4 digit number).

Note: since Semantic Scholar may have multiple author ids for the same author, the JSON object contains a list of coauthors for each author id matching the input query.

Recommend Papers

example

example with fields

example with fields and more

help, id, limit, recommend_method

Recommend papers similar to paper id using recommend_method.

See documentation on recommend_method for choices of recommend_methods that are currently supported. New: one or more methods can be provided, separated by commas.

Recommend Authors

example

help, id, limit, recommend_method

Recommend authors near paper id using recommend_method.

Output fields from Semantic Scholar for each recommended author.

Compare and Contrast

example1, example2

help, ids (two or more ids, separated by commas)

Use RAG to compare and contrast the first id with the rest.

Compare and Contrast Texts

example

help, text1, text2

Use RAG to compare and contrast text1 with text2, where both texts are strings.
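The paging advice for Lookup Citations and Lookup References can be sketched as follows. The helper only computes the (offset, limit) pairs to request; how the total count is obtained (for example, from a citationCount field) and how each request is sent are assumptions left to the caller. The name page_windows is hypothetical.

```python
MAX_LIMIT = 1000  # maximum value of `limit` stated for these two APIs


def page_windows(total, page_size=MAX_LIMIT):
    """Return (offset, limit) pairs that together cover `total` items."""
    if not 1 <= page_size <= MAX_LIMIT:
        raise ValueError("page_size must be between 1 and 1000")
    windows = []
    for offset in range(0, total, page_size):
        # The last window may be shorter than page_size.
        windows.append((offset, min(page_size, total - offset)))
    return windows


# A paper with 2500 citations needs three calls.
print(page_windows(2500))
```

Each pair maps directly onto the offset and limit arguments of one call; the results of the calls are concatenated on the client side.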

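Because recommend_method accepts one or more comma separated, case insensitive names, a client may want to check a value before sending it. The sketch below is a client-side check only; it does not describe how the service itself validates the argument. The method names are the ones listed under Arguments, and normalize_methods is a hypothetical helper.

```python
# Names as listed in this document for recommend_method.
SUPPORTED_METHODS = ("combined", "ProNE", "Specter",
                     "s2_api", "pubmed_api", "ads_api")


def normalize_methods(value):
    """Split a comma separated recommend_method value into canonical names."""
    by_lower = {m.lower(): m for m in SUPPORTED_METHODS}
    result = []
    for part in value.split(","):
        key = part.strip().lower()
        if not key:
            continue  # ignore empty items such as a trailing comma
        if key not in by_lower:
            raise ValueError(f"unsupported recommend_method: {part.strip()!r}")
        if by_lower[key] not in result:
            result.append(by_lower[key])  # keep first occurrence only
    return result


print(",".join(normalize_methods("prone, SPECTER")))
```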

Arguments

help: return short documentation
query: input string
id: input for lookup_paper, recommendations and lookup_citations; many of the externalId formats are supported, including:
1. sha (40 hex characters); example
2. CorpusId (the primary key in Semantic Scholar); example
3. PMID (pubmed ids); example
4. ACL (acl anthology ids); example
5. arXiv; example
6. MAG (Microsoft Academic Graph); example
id: input for lookup_author (Note: author ids are different from paper ids)
offset: start of papers to return (defaults to 0)
limit: number of results to return
fields: one or more comma separated values. Many values are supported including title, authors, publication year, bibtex entries, references, citations, citing sentences and much more (see discussion below)
recommend_method (for generating recommendations); recommend_method should be one of the following (case insensitive):
1. combined: example (a fast precomputed combination of ProNE and Specter)
2. ProNE: example
3. Specter: example
4. s2_api: example (based on an API from Semantic Scholar)
5. pubmed_api: example (based on an API from National Center for Biotechnology Information (NCBI)).
6. ads_api: example (based on an API from Astrophysics Data System)
ProNE and Specter use cached embeddings to generate recommendations. The last three (s2_api, pubmed_api and ads_api) generate recommendations from the Semantic Scholar, PubMed and ADS APIs, respectively.
Embeddings (for lookup paper); one or more of the following (comma separated and case insensitive):
1. ProNE: example
2. Specter: example
3. SciNCL: example
4. GNN: example
5. s2_api: example
The first four, ProNE, Specter, SciNCL and GNN, use cached vectors. The last one, s2_api, uses an API from Semantic Scholar to return the most recent values. Specter and s2_api should return the same vectors, as long as the Specter vector is not missing. There will be one vector for each input paper. Vectors of zeros are returned for missing values.
score1: outputs a 1d vector of cosine scores, one for each recommendation. The value of score1 is a (case insensitive) comma separated list of embeddings: ProNE, Specter, SciNCL, GNN; example.
Note: missing values will have cosines of 0
score2: outputs pairwise cosine scores between each pair of recommendations. The value of score2 is a (case insensitive) comma separated list of embeddings: ProNE, Specter, SciNCL, GNN; example.
Note: missing values will have cosines of 0
get_pdfs=True: returns a list of urls to pdfs
get_bibtex=True: returns bibtex entries
sort_by: can be assigned to a field that can be converted to an integer, such as sort_by=citationCount
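
To make score1, score2 and the zero-vector convention concrete, here is a small sketch in plain Python. It assumes that score1 compares one reference vector (for example, the vector of the input paper) with each recommendation; that reading is an interpretation, not something this text states. score2 is computed over every pair of recommendations. The function names are hypothetical and this is not the service's own implementation. A vector of zeros (a missing value) gives a cosine of 0, matching the notes above.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # missing embeddings are represented as zero vectors
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def score1(reference, recommendations):
    """1d list: cosine of the reference vector with each recommendation (assumed reading)."""
    return [cosine(reference, r) for r in recommendations]


def score2(recommendations):
    """2d list: cosine between every pair of recommendations."""
    return [[cosine(a, b) for b in recommendations] for a in recommendations]


vectors = [[2.0, 0.0], [0.0, 3.0], [0.0, 0.0]]  # the last one is "missing"
print(score1([1.0, 0.0], vectors))
print(score2(vectors))
```

Note that under this convention a missing vector also has a cosine of 0 with itself, so the diagonal of the score2 matrix is not always 1.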

Fields

Fields are based on Semantic Scholar APIs; see here for their documentation. Some useful values are shown below (with separate lists for papers, authors and citations). The fields argument is set to a comma separated list, such as fields=title,authors.