This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

Prefect #115

Merged
merged 24 commits into main from Prefect
Oct 17, 2024
Conversation

bram-vdberg
Collaborator

This PR creates a Prefect deployment to run the dune-sync order rewards job once every 3 hours.


socket-security bot commented Oct 15, 2024

@bram-vdberg bram-vdberg marked this pull request as draft October 15, 2024 09:13
Contributor

@fhenneke fhenneke left a comment


I have not checked the actual code in detail and was not yet able to run it locally. What are the required steps?


RUN apt-get update \
&& apt-get -y install libpq-dev gcc \
&& pip install psycopg2
Contributor


Why is this required?

Collaborator Author


I don't use the Dockerfile anymore, so it's no longer needed. These packages are required if you're using SQLAlchemy with Postgres inside a Docker container.

.gitignore (outdated)
@@ -129,3 +129,6 @@ dmypy.json

# Pyre type checker
.pyre/

# Vim swap files
**/*.swp
Contributor


This is fine, I guess, but my personal preference would be to have a small .gitignore with project-specific content. For local setup, I use .git/info/exclude.

Is this approach with a huge .gitignore better for, say, using dev containers?

Contributor


Or a global .gitignore

In ~/.gitconfig:

[core]
	excludesfile = /Users/bh2smith/.gitignore_global

which contains things like:

TODO.md
.venv/
.env.bkp
env/
venv/
.DS_Store
.idea/
.vscode/
out/

Collaborator Author


Is this approach with a huge .gitignore better for, say, using dev containers?

Not that I know of, I didn't know you could add it to .git/info/exclude. Thanks!
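
For readers following along: .git/info/exclude takes the same pattern syntax as .gitignore but stays local to the clone and is never committed. A sketch, with purely illustrative patterns:

```
# .git/info/exclude -- local-only ignore patterns, never committed
.venv/
*.swp
TODO.md
```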

@@ -0,0 +1,4 @@
prefect==3.0.2
Contributor


Alternatively, we can start with a prefect.in and a generated prefect.txt right away here.

We could also make that a separate change, but I do not see much momentum for any type of refactoring.
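
For illustration, the suggested split might look like the following, with only top-level pins in prefect.in and the fully pinned tree compiled into prefect.txt by pip-tools; the exact workflow is an assumption, not part of this PR:

```
# requirements/prefect.in (hypothetical) -- top-level dependencies only
prefect==3.0.2

# then generate the pinned file with:
#   pip-compile requirements/prefect.in --output-file requirements/prefect.txt
```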

requirements/prefect.txt (resolved)
prefect/deployment.py (outdated, resolved)
prefect/deployment.py (outdated, resolved)
@bram-vdberg
Copy link
Collaborator Author

I have not checked the actual code in detail and was not yet able to run it locally. What are the required steps?

Running it locally requires changing the way we serve the deployment. I'll make a local_deploy.py file to make this easier, so you can just run docker compose up instead.

Contributor

@fhenneke fhenneke left a comment


I like the structure of the order_rewards flow. The logic is really clear there.

I do not fully understand the logic behind the batching in time. Some clarification would help there.

There are a few unused variables here and there.

Otherwise it looks good for finally using prefect systematically.

prefect/deployment.py (outdated, resolved)

load_dotenv()

def get_last_monday_midnight_utc():
Contributor


I would have expected to see data uploaded aligned with accounting periods, so Tuesday 00:00:00.

In principle, that should not have an effect on the aggregated table. Uploading new values for an accounting week before payments should be easier with alignment.
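
As a rough sketch of the alignment being discussed, a parameterized helper could pick either Monday or Tuesday as the week boundary. The function name and signature here are hypothetical, not the PR's actual code:

```python
from datetime import datetime, timedelta, timezone

def last_weekday_midnight_utc(weekday: int) -> datetime:
    """Midnight UTC of the most recent given weekday (0 = Monday, 1 = Tuesday)."""
    now = datetime.now(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # Python's % always yields a non-negative result, so this also
    # handles the case where today is earlier in the week than `weekday`.
    days_back = (midnight.weekday() - weekday) % 7
    return midnight - timedelta(days=days_back)
```

Calling last_weekday_midnight_utc(1) would give the Tuesday 00:00:00 boundary suggested above.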



@task
def get_block_range() -> BlockRange:
Contributor


I have to double check this. It looks as if this approach would not be able to handle the transition between weeks. Or is it uploading data twice for one day of the week?

In either case, some comments would be helpful to explain the behavior.

Collaborator Author


At the moment it executes the job once every 30 minutes, but it's very easy to change. I think Haris recommended 3 hours, so I can set it to that. Every time the job runs it re-uploads the entire data for that week, overwriting the current table (for that week). When we get to another week the start block changes and it will instead upload to a new table with a new name.

So on a transition between weeks it will run the query multiple times for that day, and re-upload each time, switching tables once the start block moves up a week.
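
The behavior described above can be sketched as follows; the table-naming scheme is illustrative, not the repo's actual naming:

```python
from datetime import datetime, timedelta

def week_start(now: datetime) -> datetime:
    """Midnight of the Monday starting the week containing `now`."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=midnight.weekday())

def table_name_for(now: datetime) -> str:
    # Re-runs within a week target the same table and overwrite it;
    # the name only changes once the week rolls over.
    return f"order_rewards_{week_start(now):%Y_%m_%d}"
```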

prefect/deployment.py (outdated, resolved)
prefect/deployment.py (outdated, resolved)
blockrange = get_block_range()
orderbook = fetch_orderbook(blockrange)
data = cast_orderbook_to_dune_string(orderbook)
table_name = upload_data_to_dune(data, blockrange.block_from, blockrange.block_to)
Contributor


One could also pass the block range object here. It would look cleaner here but less clean in upload_data_to_dune. Since the latter only uses it to create a string, it should be fine to look less clean there.
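
A minimal sketch of this suggestion, assuming a simple BlockRange dataclass; the actual class and upload logic in the repo may differ:

```python
from dataclasses import dataclass

@dataclass
class BlockRange:
    block_from: int
    block_to: int

def upload_data_to_dune(data: str, blockrange: BlockRange) -> str:
    # The range is only needed to build the table-name suffix,
    # so taking the whole object keeps the call site clean.
    table_name = f"order_rewards_{blockrange.block_from}_{blockrange.block_to}"
    # ... upload `data` under `table_name` ...
    return table_name
```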

@bram-vdberg
Copy link
Collaborator Author

I have not checked the actual code in detail and was not yet able to run it locally. What are the required steps?

I made some updates, now you should be able to run the prefect server with make prefect. Then you can find the dashboard on localhost:4200. There's a local config in src.prefect.local_deploy that deploys the flow to this local dashboard. You can test it with make deployment.

One note on running the flow locally: the dune update query command requires an enterprise API key, so I skip that step in the local deploy. It will just run the commands and upload the data to your dune account.

Contributor

@fhenneke fhenneke left a comment


Approving this so we can test the github flow.

Local testing kind of worked, but there are still a few issues (e.g. dependency conflicts). Creating a PR for those would be good.

@bram-vdberg bram-vdberg marked this pull request as ready for review October 17, 2024 12:31
@bram-vdberg bram-vdberg merged commit c2e956d into main Oct 17, 2024
6 checks passed
@bram-vdberg bram-vdberg deleted the Prefect branch October 17, 2024 12:38
harisang pushed a commit that referenced this pull request Nov 9, 2024
This PR removes the changes introduced by #122, #117, #115, and #116 and
removes Prefect from the dune-sync repo.