Add NY Misc. court #827
Comments
Hi @mlissner the issue you linked gives me a 404. Is it broken or is it just me? |
That's to our CRM, which we keep pretty private, but I wanted to link it so that we remember we did this for the client. We have three closed repos: |
@mlissner - do you mean this reporter? |
Maybe? I'll forward you a message with details. |
Adding this scraper requires a significant extension to Juriscraper to accommodate the scraping of various "other courts" in New York, beyond the scope of specific individual courts. This development will necessitate two key changes:

Modification to Juriscraper Paradigm: The current structure of Juriscraper is predominantly focused on scraping specific courts. The proposed scraper will need to be more flexible, allowing it to handle a range of different courts under the "other courts" category in New York.

Updates to CourtListener (cl_scrape_opinions file): To integrate this new scraper effectively, we need to modify the [...]. Additionally, we will need to update the [...].

Identified Courts for Inclusion

So far, we have identified the following courts for inclusion in this scraper:
This code is more or less all that is needed to get the scraper working once we make a slight modification to |
This is the complete list of courts for NYMisc from 2018 to today, covering 35 048 cases (27 798 pdf, 7 250 htm). Some had no court data, or had it outside the usual places. Got data for 34 886. There are some errors like
courts-db has been updated for all of the non-typo examples found. @grossir - I still need to release the version and update our court list in courtlistener |
Background

Not sure if you feel strongly about this, Mike, but I need to lay out a few things.

NYMisc is actually a bit easier than first anticipated, and also more difficult. Traditionally, we would have scraped from this search page to collect [...]. The difficult part was that there are hundreds of other courts and no way to identify them without parsing the html/pdf files. This meant that we would have to break the paradigm for miscellaneous courts.

But in unraveling a "bug" that wasn't a bug yesterday, we realized that there is a second listing of opinions provided by the court that we sometimes use. NY also provides the list here, which curiously publishes MISC opinions, but if you look closely you start to see them out of date order.

So I called the New York State Law Reporting Bureau this morning and spoke to a nice New Yorker who explained that they often add new courts years later if an opinion becomes relevant later. For example, a higher court could be referencing this case, and so they decide to publish it later. If you look at the numbering (which they do), they number sequentially based on when an opinion is posted here and not on the date filed.

Additionally, this means we can easily grab the parent_court, or child court, when we scrape and can avoid any post-download extraction from text, because they have a handy citation lookup tool that provides that information, which we should grab via a deferring list. Here is an example which can easily be parsed and extracted.

Proposal

I suggest that we take our parent_courts and build one scraper for each. I think that would be these parent scrapers
Additionally, I think we should add a field for if the nysupreme -scraper returns |
If you got her info, please put it in the CRM for future readers.
This all sounds solid to me! |
NY Misc Reporter freelawproject/juriscraper#827 has opinions for a few hundred small courts, which we have grouped into 10 families, each with its own scraper. To avoid both losing data granularity and creating a scraper for each court, juriscraper will return a child_court field for each opinion, which will be transformed into the proper court object. This changes the usual way, where the court object is obtained from the scraper module name.

Duplicate checking would not be affected, since it does not use the court object. DupChecker:
- first uses the court url to check if the site is the same
- then checks the downloaded content to check if it changed
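The child_court lookup with a fallback to the parent court could be sketched roughly as below. This is only an illustration, not the actual CourtListener code: the `Court` class, the `KNOWN_COURTS` table, the `resolve_court` helper, and the `example_child_ct` id are all hypothetical names made up for the example; only `nytrial` and `child_court` come from the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Court:
    """Simplified, hypothetical stand-in for a court record."""
    id: str


# Hypothetical lookup table; a real system would query its court database.
KNOWN_COURTS = {
    court.id: court
    for court in (Court("nytrial"), Court("example_child_ct"))
}


def resolve_court(parent_court_id: str, child_court_id: Optional[str]) -> Court:
    """Return the child court when it is given and known, else the parent."""
    if child_court_id:
        child = KNOWN_COURTS.get(child_court_id)
        if child is not None:
            return child
    # Scrapers that never send child_court keep the old behavior:
    # the court comes from the scraper (parent) id.
    return KNOWN_COURTS[parent_court_id]
```

With this shape, a scraper that does not return child_court, or returns an id that is not known yet, still resolves to the parent court, so other sources are unaffected.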
These changes are needed to support freelawproject/juriscraper#827
- If possible, get court object from child_court field passed by nytrial families of scrapers. Otherwise, default to parent court. This does not alter behavior for other sources. Solves freelawproject/juriscraper#827
- Pass opinion.html to site.extract_from_text, if opinion.plain_text does not exist. Solves freelawproject#3549
- Add support to update Opinion object from extract_from_text metadata dict.
- Update juriscraper to 2.5.78
- Update courts-db to 0.10.22
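As a rough sketch of the second and third items above: pick the text to hand to the extractor, then copy any returned Opinion fields onto the object. The `Opinion` dataclass, its `author_str` field, the helper names, and the assumption that the metadata dict is keyed by model name (here `"Opinion"`) are illustrative guesses, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Opinion:
    """Simplified, hypothetical stand-in for the Opinion object."""
    plain_text: str = ""
    html: str = ""
    author_str: str = ""


def text_for_extraction(opinion: Opinion) -> str:
    """Prefer plain_text; fall back to html when plain_text is empty."""
    return opinion.plain_text or opinion.html


def apply_metadata(opinion: Opinion, metadata: Dict[str, Dict[str, Any]]) -> None:
    """Copy values under the assumed "Opinion" key onto the opinion."""
    for field, value in metadata.get("Opinion", {}).items():
        # Ignore keys the object does not define, so a stray key in the
        # metadata cannot create unexpected attributes.
        if hasattr(opinion, field):
            setattr(opinion, field, value)
```

For example, an opinion that only has html would be passed to the extractor as html, and a returned dict such as `{"Opinion": {"author_str": "Doe"}}` would set `author_str` on it.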
One of our clients wants to get alerts for cases in the NY Misc. court. Now that @grossir is here, we should make this client happy.