LAANE Data Cleaning and Analysis #36
Comments
Just a quick update: Albert and I are working to combine the Airbnb listing dataset with the permit dataset from the City of LA. Some complications we are handling include:
Currently, we are trying to see if there are "obvious" citation breakers or followers that we can filter out immediately. Additionally, we are writing functions and a script to merge the permit dataset with the listing dataset using coordinate and address data, starting with a single neighborhood. Once we have a working script, we can apply it to the full datasets. |
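For anyone picking this up later, here is a minimal sketch of what a coordinate-and-address merge could look like. It is not the script Albert and Karina wrote. The field names (`id`, `permit_id`, `address`, `lat`, `lon`), the suffix table, and the 50 m threshold are all assumptions made for illustration. It tries an exact match on a normalized address first and falls back to the nearest permit within the distance threshold.

```python
import math
import re

# Hypothetical abbreviation table; a real script would need a fuller list.
SUFFIXES = {"street": "st", "avenue": "ave", "boulevard": "blvd", "drive": "dr", "road": "rd"}


def normalize_address(raw):
    """Lowercase, replace punctuation with spaces, and abbreviate common street suffixes."""
    cleaned = re.sub(r"[^\w\s]", " ", raw.lower())
    return " ".join(SUFFIXES.get(tok, tok) for tok in cleaned.split())


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def match_listings(listings, permits, max_dist_m=50.0):
    """Pair each listing with a permit.

    Exact normalized-address match first; otherwise the nearest permit
    within max_dist_m. Returns (listing id, permit id or None, method or None).
    """
    by_address = {normalize_address(p["address"]): p for p in permits}
    matches = []
    for listing in listings:
        permit = by_address.get(normalize_address(listing.get("address", "")))
        how = "address"
        if permit is None:
            how = "distance"
            best = None
            for p in permits:
                d = haversine_m(listing["lat"], listing["lon"], p["lat"], p["lon"])
                if d <= max_dist_m and (best is None or d < best[0]):
                    best = (d, p)
            permit = best[1] if best else None
        matches.append(
            (listing["id"], permit["permit_id"] if permit else None, how if permit else None)
        )
    return matches
```

The distance fallback compares every listing against every permit, which is fine for a single neighborhood but would likely need a spatial index or a grid bucket before running on the full datasets.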
I added a new folder "airbnblistings" where Karina and I can upload our files related to the project and I created two functions that are going to help with data cleaning. |
Cleaned up the City of LA registrant dataset, merging all sheets into a single CSV file and correcting mistyped entries. Next steps include geocoding each address and connecting registrant entries with Airbnb listing host IDs. |
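As a rough illustration of the sheet-merging step, the sketch below combines several sheets into one CSV using only the standard library. It assumes each sheet has already been exported to its own CSV file. The function name `merge_sheets` and the added `source_sheet` column are hypothetical and do not come from the project scripts.

```python
import csv
import io


def merge_sheets(sheets, out):
    """Write rows from several CSV sheets into one CSV with a unified header.

    sheets: mapping of sheet name -> open text file (or any iterable of lines).
    out: writable text file for the combined CSV.
    Returns the number of data rows written.
    """
    rows, columns = [], ["source_sheet"]
    for name, handle in sheets.items():
        for row in csv.DictReader(handle):
            # Trim stray whitespace in headers and values; skip overflow cells
            # (DictReader stores those under the key None).
            clean = {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
            clean["source_sheet"] = name
            for col in clean:
                if col not in columns:
                    columns.append(col)
            rows.append(clean)
    # Columns missing from a given sheet are written as empty strings.
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)
```

Keeping the sheet name in its own column makes it possible to trace a mistyped entry back to where it came from after the merge.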
Met with Jon to do final review of columns to keep vs. remove. Jon is onboard with having a SQLite database and is willing to learn SQL. Currently working on designing the SQLite database with Albert. |
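To give a feel for what a small SQLite design and a first query might look like, here is a sketch using Python's built-in `sqlite3` module. The actual schema is not shown in this thread, so every table and column name below (`address`, `registrant`, `listing`, and their fields) is an assumption for illustration only.

```python
import sqlite3

# Hypothetical three-table layout: listings and registrants both point at a shared address.
SCHEMA = """
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    street     TEXT NOT NULL UNIQUE
);
CREATE TABLE registrant (
    registrant_id INTEGER PRIMARY KEY,
    address_id    INTEGER NOT NULL REFERENCES address(address_id),
    permit_number TEXT
);
CREATE TABLE listing (
    listing_id INTEGER PRIMARY KEY,
    host_id    INTEGER NOT NULL,
    address_id INTEGER REFERENCES address(address_id)
);
"""


def build_database(path=":memory:"):
    """Create the sketch schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
    conn.executescript(SCHEMA)
    return conn


def listings_without_registration(conn):
    """Return IDs of listings whose address has no registrant row."""
    rows = conn.execute(
        """
        SELECT l.listing_id
        FROM listing AS l
        LEFT JOIN registrant AS r ON r.address_id = l.address_id
        WHERE r.registrant_id IS NULL
        ORDER BY l.listing_id
        """
    )
    return [row[0] for row in rows]
```

The `LEFT JOIN ... IS NULL` pattern is one common way to ask "which listings have no matching registration", which is the kind of question the analysis seems to be aiming at.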
The first iteration of the design is complete. We will make some adjustments, then start writing the scripts to transform the data. |
Jon has been updated on progress and Albert and I will be meeting tomorrow to discuss next steps. |
Organized a sheet containing new table and column names following data warehouse schema to facilitate data transformation. Beginning data dictionary to pass along to Jon following completion of data warehouse. I will be taking a short break from 06/26-07/06 and have communicated this to Jon + Albert. Albert and I plan to complete data transformation scripts starting 07/07. |
Here are some updates/milestones. For the next week and a half, Karina and I will be taking a break - we'll be meeting again starting on July 7th. I'll still be around if anyone needs anything, so feel free to email me. Milestones will be as follows:
|
Working through assessor data, we did some ERD modifications and discussed how to handle unique entries. |
Quick update: |
Hey @ryanmswan @salice ,
|
@ryanmswan @salice |
I added the SQLAlchemy code. I still need to add the Airbnb tables to that file, and it might need some refactoring to reflect the database relationships. |
The Airbnb tables are now in the SQLAlchemy file. |
Quick update. |
Here's a quick update on where the project is. |
Update: added more scripts. |
Quick update, all but one dataset are complete. |
@ryanmswan @salice @KarinaLopez19 |
@KarinaLopez19 @ryanmswan @salice |
Video on using scripts, courtesy of Albert and Karina |
We should do a cleanup of this issue to summarize any info we need to keep from the comments into the top part. |
@KarinaLopez19 we want to add a "size" label to this, just so we can keep track of the number of hours it took. Do you have an estimate of how much time you and Albert spent on this, and how much more there is to go? |
Hey @AlbertUlysses and @KarinaLopez19, please provide any appropriate updates on this issue in the comments, since we haven't had anything documented here for a few months now. Progress: "What is the current status of your project? What have you completed and what is left to do?" |
Hi @akhaleghi
I don't have any updates - from our last conversations this is the way the project should move forward:
Karina -> add data dict
Team -> reach out to LAANE to see if someone else on our team can do some analysis - last we spoke, Karina suggested that she might take this on.
I don't know Karina's status, but if she's no longer contributing, maybe someone else is interested in taking the data and analyzing it? It definitely could be a good learning experience. |
@Abe Khaleghi ***@***.***>
I was going to check this out. I've looked at the code but I can't find
the data? I see a long list of datasets... are they all open data? Is
there a repository with the csv files? I suspect I can find many of them,
but ...
Is it better for me to ask these questions on the github issue or here?
Let me know.
- mcm
…On Fri, Apr 8, 2022 at 9:12 AM Albert Ulysses ***@***.***> wrote:
Hi @akhaleghi <https://github.com/akhaleghi>
I don't have any updates - from our last conversations this is the way the
project should move forward:
Karina -> add data dict
Team -> reach out to LAANE to see if someone else on our team can do some
analysis - last we spoke, Karina suggested that she might take this on.
I don't know Karina's status, but if she's no longer contributing, maybe
someone else is interested in taking the data and analyzing it? It
definitely could be a good learning experience.
|
@mcmorgan27 Data should be in S3 - it's not open data - it's data from LAANE. |
Thanks. Sounds like data issues, so I'll hold off.
…On Fri, Apr 8, 2022 at 3:30 PM Albert Ulysses ***@***.***> wrote:
@mcmorgan27 <https://github.com/mcmorgan27> Data should be in S3 - it's
not open data - it's data from LAANE.
I have no knowledge where specifically it's stored- I believe Sophia was
the person in charge of storing that data - could be wrong.
Also, I believe that the data wasn't stored more publicly because LAANE
had some reservations in that regard.
|
We are going to close this issue. It can be reopened or referred to if the stakeholder reaches back out to us. He has not responded to our requests for additional engagement. |
Data Scientist
Project Name: LA Alliance for a New Economy, Housing Project
Volunteer Opportunity: Assist a local nonprofit by joining several datasets that rely on physical street address as the primary identifier. After joining these datasets, map the density of Airbnb rental properties by approximate region in relation to properties cited for complaints, in order to identify potential "party houses" that may not be in compliance with local ordinances.
Duration: 4-6 weeks
Who to communicate your interest to
Primary Stakeholder: LAANE
Currently Staffed: @AlbertUlysses @karinalopez
Resources
Albert's video on using scripts he wrote for project