Rewrite ACM fetcher, because of ACM web site changes #5804

Closed
koppor opened this issue Jan 1, 2020 · 18 comments · Fixed by #7733
Labels
component: fetcher, good first issue

Comments

@koppor
Member

koppor commented Jan 1, 2020

Test query: https://dl.acm.org/exportformats_search.cfm?query=%28%25252Bjabref%2520%25252Barchitectural%2520%25252Bchurn%29&within=owners.owner%3DGUIDE&expformat=bibtex

[Image]

Asked on twitter:

.@TheOfficialACM Your exportformats_search.cfm seems to be down. This feature was very useful and widely used in the web. Is it intended that it currently does not work? #opencitations - example query: https://t.co/TOf3sgqvL1

— Oliver Kopp (@koppor) January 1, 2020
@koppor
Member Author

koppor commented Jan 2, 2020

The fetcher has to be rewritten. See zotero/translators#2101 for some code ideas.

(Thank you @zuphilip for the pointer)

@koppor changed the title from "ACM fetcher broken, because of ACM" to "Rewrite ACM fetcher, because of ACM web site changes" on Jan 2, 2020
@Siedlerchr added the "good first issue" label on Feb 9, 2020
@github-actions
Contributor

github-actions bot commented Dec 8, 2020

This issue has been inactive for half a year. Since JabRef is constantly evolving this issue may not be relevant any longer and it will be closed in two weeks if no further activity occurs.

As part of an effort to ensure that the JabRef team is focusing on important and valid issues, we would like to ask if you could update the issue if it still persists. This could be in the following form:

  • If there has been a longer discussion, add a short summary of the most important points as a new comment (if not yet existing).
  • Provide further steps or information on how to reproduce this issue.
  • Upvote the initial post if you like to see it implemented soon. Votes are not the only metric that we use to determine the requests that are implemented, however, they do factor into our decision-making process.
  • If all information is provided and still up-to-date, then just add a short comment that the issue is still relevant.

Thank you for your contribution!

@github-actions bot added the "status: stale" label on Dec 8, 2020
@koppor
Member Author

koppor commented Dec 8, 2020

We need the ACM fetcher - or we need to drop it.

@JofielB
Contributor

JofielB commented Dec 9, 2020

Hello, is this issue still available? If it is, I would like to take it.

@calixtus
Member

calixtus commented Dec 9, 2020

Go for it! :-) If you have questions, don't hesitate to ask at our gitter dev chat ( https://gitter.im/JabRef/jabref ). If you want to learn more about JabRef-programming in general have a look at our dev docs ( https://devdocs.jabref.org/ )

NatashaDudi pushed a commit to NatashaDudi/jabref-1 that referenced this issue Dec 9, 2020
@JofielB
Contributor

JofielB commented Dec 10, 2020

Nice, I am gonna start working on it

@koppor
Member Author

koppor commented Dec 24, 2020

Internal reference for JabRef developers: Implementation using the screen-scraping technology is available (privately) at https://github.com/NatashaDudi/jabref/blob/develop/src/main/java/org/jabref/logic/importer/fetcher/ACMPortalFetcher.java?rgh-link-date=2020-12-21T19%3A46%3A53Z. --> Now at JabRef#476

@JofielB How are your implementation efforts going?

@koppor
Member Author

koppor commented Jan 1, 2021

@JofielB Happy new year. May I ask for some progress regarding that issue?

You can get some inspiration from JabRef#476

@JofielB
Contributor

JofielB commented Jan 4, 2021

@koppor Hello and happy new year. I am sorry for taking so long to respond. I wasn't able to make any progress, and I am not sure that I will be able to keep working on this issue.

@jvera701

Hello, can I work on this issue? I am new to open source software and want to contribute.

@koppor
Member Author

koppor commented Jan 16, 2021

@jvera701 After checking @JofielB's reply, it is fine for you to try. Always keep test-driven development in mind and try to use the "play" button in IntelliJ. See https://www.jetbrains.com/help/idea/performing-tests.html for details.

@github-actions bot removed the "status: stale" label on Jan 22, 2021
@jvera701

I am sorry for replying this late, but I couldn't make any substantial progress, and I am not sure I can continue working on this. Writing this feature was harder than I first thought.

@XDZhelheim
Contributor

I think I got some clues.

According to zotero/translators#2101, there is an API for querying BibTeX data by DOI.

The URL format is: https://dl.acm.org/action/exportCiteProcCitation?targetFile=custom-bibtex&dois=<doi1>,<doi2>,... (with the DOI list URL-encoded).
For example: https://dl.acm.org/action/exportCiteProcCitation?targetFile=custom-bibtex&dois=10.1145%2F3129790.3129810%2C10.1145%2F1961189.1961199. This URL can be opened directly in a browser.

The query result is a JSON file. Its "items" field, the last part of the file, contains the citation data.
[Screenshot: query_result_json]

After that, we can extract all the information we need from the JSON.
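
As a minimal sketch of the URL format described above (not the code that was eventually merged), the export URL could be assembled in plain Java as follows. The class and method names are hypothetical, and the endpoint is taken from this thread rather than from any official ACM documentation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class AcmExportUrl {
    // Endpoint as quoted in this thread; assumed, not an officially documented API.
    private static final String EXPORT_BASE =
            "https://dl.acm.org/action/exportCiteProcCitation?targetFile=custom-bibtex&dois=";

    /** Joins the DOIs with commas and percent-encodes the joined list. */
    static String build(List<String> dois) {
        String joined = String.join(",", dois);
        // URLEncoder turns '/' into %2F and ',' into %2C, matching the example URL.
        return EXPORT_BASE + URLEncoder.encode(joined, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build(List.of("10.1145/3129790.3129810", "10.1145/1961189.1961199")));
    }
}
```

For the two DOIs in `main`, this yields the same URL as the example given above. Note that `URLEncoder` performs form encoding (a space becomes `+`), which should not matter for typical DOIs.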

However, the original fetcher searches by keywords, such as "jabref architectural churn" in the example given by @koppor. So the open question is how to get a paper's DOI from its title. I have no idea how to do that yet.

Hope this helps.

@ruanych
Contributor

ruanych commented May 12, 2021

One solution is to get the DOIs from the HTML of the search results page, then use the export interface to get the data in JSON format, and finally parse it.

Search API: https://dl.acm.org/action/doSearch?AllField=
Export API: https://dl.acm.org/action/exportCiteProcCitation?targetFile=custom-bibtex&format=bibTex&dois=<doi1>,<doi2>,...
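
The DOI-extraction step of this approach might look roughly like the sketch below. It assumes that result links in the search page have the form `href="/doi/10.1145/..."`, optionally with a path segment such as `abs/` before the DOI; that markup is an assumption and would need to be checked against the real page. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AcmDoiExtractor {
    // Assumed link shape: href="/doi/10.1145/1234567.1234568" or href="/doi/abs/10.1145/...".
    private static final Pattern DOI_LINK =
            Pattern.compile("href=\"/doi/(?:[a-z]+/)?(10\\.\\d{4,9}/[^\"?#]+)\"");

    /** Returns the distinct DOIs found in the HTML, in order of first appearance. */
    static List<String> extractDois(String html) {
        Set<String> dois = new LinkedHashSet<>(); // keeps order, drops duplicates
        Matcher matcher = DOI_LINK.matcher(html);
        while (matcher.find()) {
            dois.add(matcher.group(1));
        }
        return new ArrayList<>(dois);
    }

    public static void main(String[] args) {
        String html = "<a href=\"/doi/10.1145/3129790.3129810\">A</a>"
                + "<a href=\"/doi/abs/10.1145/1961189.1961199\">B</a>";
        System.out.println(extractDois(html));
    }
}
```

A regular expression keeps the sketch dependency-free; an HTML parser would likely be more robust against markup changes. The resulting list can then be passed to the export interface.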

@XDZhelheim
Contributor

> One solution is to get the DOIs from the HTML of the search results page, then use the export interface to get the data in JSON format, and finally parse it.
>
> Search API: https://dl.acm.org/action/doSearch?AllField=
> Export API: https://dl.acm.org/action/exportCiteProcCitation?targetFile=custom-bibtex&format=bibTex&dois=<doi1>,<doi2>,...

Yes, I have tried that. The difficulty is writing a script that gets the DOIs from the web page. The ACM Digital Library uses some anti-crawler techniques.

@ruanych
Contributor

ruanych commented May 12, 2021

> Yes, I have tried that. The difficulty is writing a script that gets the DOIs from the web page. The ACM Digital Library uses some anti-crawler techniques.

I tested with a browser and with curl. Access from the browser works normally. With curl, the response indicates that cookies are required, so the cookie should be included in the request.

C:\Users\ruan>curl https://dl.acm.org/action/doSearch?AllField=lidar
The URL has moved <a href="https://dl.acm.org/action/doSearch?AllField=lidar&cookieSet=1">here</a>
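
One way to handle this from Java, sketched under the assumption that the response above is an ordinary HTTP redirect that also sets a cookie, is to use `java.net.http.HttpClient` with a cookie store and redirect following enabled. The class and method names are hypothetical, and this does not address any other anti-crawler measures the site may apply.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class AcmSearchClient {
    /** Client that stores cookies in memory and follows redirects. */
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    /** Builds a GET request against the search URL quoted in this thread. */
    static HttpRequest searchRequest(String query) {
        String url = "https://dl.acm.org/action/doSearch?AllField="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url)).GET().build();
    }

    /** Sends the search request and returns the response body (needs network access). */
    static String fetch(HttpClient client, String query) throws IOException, InterruptedException {
        return client.send(searchRequest(query), HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        // Only prints the request URI; call fetch(newClient(), "lidar") to hit the site.
        System.out.println(searchRequest("lidar").uri());
    }
}
```

Reusing one client for the search and the later export request keeps any cookie set by the first response available to the second.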

@XDZhelheim
Contributor

Nice work.

But if I remember correctly, the old ACM fetcher searched the "ACM Guide to Computing Literature", which means the query URL is https://dl.acm.org/action/doSearch?AllField=lidar&expand=all

@ruanych
Contributor

ruanych commented May 12, 2021

> Nice work.
>
> But if I remember correctly, the old ACM fetcher searched the "ACM Guide to Computing Literature", which means the query URL is https://dl.acm.org/action/doSearch?AllField=lidar&expand=all

Thank you for the reminder, but I could not find this in the original code file.
Maybe we need a little help from @koppor.
