This project analyzes eight leading in-memory subgraph matching algorithms, including QuickSI, GraphQL, CFL, CECI, DP-iso, RI, and VF2++. We evaluate them based on:
- Vertex filtering methods in data graphs.
- Techniques for ordering query vertices.
- Strategies for enumerating partial results.
- Other optimization approaches.
We also compare these algorithms against Glasgow, a constraint programming-based algorithm. Our findings show the strengths of different methods in various aspects, leading to recommendations for specific scenarios based on graph density and query size.
To compile the source code, navigate to the project's root directory and execute:
mkdir build
cd build
cmake ..
make
To verify the binary's correctness, run the following:
cd test
python test.py ../build/matching/SubgraphMatching.out
Find the compiled 'SubgraphMatching.out' in 'build/matching'. Execute it using:
./SubgraphMatching.out -d [data_graphs] -q [query_graphs] -filter [filter_method] -order [order_method] -engine [enum_method] -num [embeddings_number]
Set -num
to 'MAX' for all results. The program stops upon reaching the limit or finding all results.
Example:
./SubgraphMatching.out -d ../../test/sample_dataset/test_case_1.graph -q ../../test/sample_dataset/query1_positive.graph -filter GQL -order GQL -engine LFTJ -num MAX
The input graphs should be vertex-labeled, starting with 't N M' (N = vertices, M = edges). Format vertices as 'v VertexID LabelId Degree' and edges as 'e VertexId VertexId'. Vertex ids start from 0.
Example:
t 5 6
v 0 0 2
v 1 1 3
v 2 2 3
v 3 1 2
v 4 2 2
e 0 1
e 0 2
e 1 2
e 1 3
e 2 4
e 3 4
- LDF: Label and degree filter (default).
- NLF: Neighborhood label frequency filter.
- GQL: GraphQL's method.
- TSO: TurboIso's method.
- CFL: CFL's method.
- DPiso: DP-iso's method.
- CECI: CECI's method.
- QSI: QuickSI's method.
- GQL: GraphQL's method.
- TSO: TurboIso's method.
- CFL: CFL's method.
- DPiso: DP-iso's method.
- CECI: CECI's method.
- RI: RI's method.
- VF2++: VF2++'s method.
- QSI: QuickSI and RI's method.
- GQL: GraphQL's method.
- EXPLORE: CFL's method.
- DPiso: DP-iso's method.
- CECI: CECI's method.
- LFTJ: Local candidates computation.
Examples of command-line execution are provided for various configurations.
We suggest using GQL, CFL, or DPiso for filtering; GQL or RI for ordering; and always LFTJ for enumeration. Enable failing set pruning for large queries and QFilter for dense graphs.
Run a basic query using GraphQL's filtering and ordering methods with local candidate computation for enumeration:
./SubgraphMatching.out -d ../../test/sample_dataset/test_case_1.graph -q ../../test/sample_dataset/query1_positive.graph -filter GQL -order GQL -engine LFTJ -num MAX
Execute with custom settings, such as using label and degree filter, CFL's ordering method, and set-intersection based local candidates computation:
./SubgraphMatching.out -d ../../test/sample_dataset/test_case_1.graph -q ../../test/sample_dataset/query1_positive.graph -filter LDF -order CFL -engine LFTJ -num MAX
For complex queries, configure the set intersection algorithms and optimization techniques in 'configuration/config.h'.
Perform spectrum analysis on a query with time limits:
./SubgraphMatching.out -d ../../test/sample_dataset/test_case_1.graph -q ../../test/sample_dataset/query1_positive.graph -filter LDF -order Spectrum -engine Spectrum -order_num 100 -time_limit 60 -num 100000