Sql example #248
Conversation
Hey Thomas, this looks good to me! Are you now going to add a script that does this end to end (downloads data, converts to SQL, runs PBG, processes the outputs back to SQL)? Maybe we call it `sql_end2end` or something?
Fixes to SQL Commands (7076f8d to fddb146)
Your SQL script takes a CSV file as input. I know that's just an example and not the "intended use", but for this setup, is it faster than our existing scripts? If so, should we replace those scripts with this code, rather than just using it as an example?
```python
def write_rels_dict(rels):
    my_rels = ""
    for _, row in rels.sort_values(by="graph_id").iterrows():
        r = "{"
        r += f"'name': '{row['id']}', 'lhs': '{row['source_type']}', 'rhs': '{row['destination_type']}', 'operator': 'translation'"
        r += "},\n"
        my_rels += r
    return my_rels


def write_entities_dict(entity2partitions):
    my_entities = "{\n"
    for name, part in entity2partitions.items():
        my_entities += '\t"{name}": {{"num_partitions": {part} }},\n'.format(name=name, part=part)
    my_entities += "}\n"
    return my_entities
```
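For reference, a quick check of what the entities helper above emits (the entity names and partition counts here are made-up example inputs):

```python
# Copy of write_entities_dict from the diff, exercised on a made-up input
# to show the string it generates.
def write_entities_dict(entity2partitions):
    my_entities = "{\n"
    for name, part in entity2partitions.items():
        my_entities += '\t"{name}": {{"num_partitions": {part} }},\n'.format(name=name, part=part)
    my_entities += "}\n"
    return my_entities

print(write_entities_dict({"user": 4, "item": 1}))
# Emits a dict literal as text, one '"name": {"num_partitions": N },' line per entity.
```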
The config is an actual Python function... having Python code that generates Python code seems overly complex, no? Either you can call `train` directly from within an end-to-end script, with the config, or perhaps you can write a config with `json.dump` or something?
Here's a schematic that might make this more clear:
```python
>>> cfg = dict(
...     entity_path="/foo/bar",
...     edge_paths=[
...         "path1",
...         "path2",
...     ],
...     relations={
...         "rel1": 3
...     }
... )
>>> import json
>>> print(json.dumps(cfg, indent=2))
{
  "entity_path": "/foo/bar",
  "edge_paths": [
    "path1",
    "path2"
  ],
  "relations": {
    "rel1": 3
  }
}
```
So you would do something like:

```python
cfg = copy.deepcopy(DEFAULT_CFG)
cfg["relations"] = relations
cfg["entities"] = entities
with open(cfg_file, "w") as f:
    f.write(f"def get_torchbiggraph_config():\n    return {json.dumps(cfg, indent=4)}")
```
This makes sense.
In a single test it was faster, but I'm not sure that I'd advocate for replacing the conversion script yet. I'd probably want to spend a little more time optimizing and evaluating before doing that.
Note, this PR is a work in progress. I'm putting it up now to get some guidance on code organization -- where should I put these functions, refactoring, etc?
Types of changes
Motivation and Context / Related issue
This PR provides a set of scripts to process data stored in SQL tables to get them into the right format for use with PBG.
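As a rough illustration of the SQL-to-PBG direction (not this PR's actual code; the table, column names, and edge values are made up), an edge table can be dumped into the tab-separated `lhs<TAB>rel<TAB>rhs` text format that PBG's edge-list importer consumes:

```python
import csv
import io
import sqlite3

# Made-up edge table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (source TEXT, rel TEXT, dest TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [("u1", "follows", "u2"), ("u2", "follows", "u3")],
)

# Stream the rows out as tab-separated lines, one edge per line.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
for row in conn.execute("SELECT source, rel, dest FROM edges"):
    writer.writerow(row)

print(buf.getvalue())
# u1	follows	u2
# u2	follows	u3
```

In practice the query would run against the real SQL tables and write to a file instead of an in-memory buffer.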
Checklist