feat: added array type handler & completed well known defs for all PG data types #51

Merged
merged 21 commits into master from feat/fix-generated-types on Oct 4, 2024

Conversation

jimezesinachi
Contributor

@jimezesinachi jimezesinachi commented Sep 27, 2024

SynthQL pull request template

Why

Motivation and context for the change.

What changed

Brief summary of the changes.

Important

Mention important points that require special attention or that might surprise
reviewers.

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced schema generation with the addition of excludeTablesAndViews property, allowing users to specify tables/views to exclude.
    • Updated JSON schemas to support more complex item definitions for columns and properties.
    • New schema definition added for the public.film table, including the special_features field.
    • Improved handling of PostgreSQL built-in types in the schema generation process.
  • Documentation

    • Enhanced comments for better readability in various files.
  • Chores

    • Added a new dependency for improved JSON schema validation.

Signed-off-by: Jim Ezesinachi <ezesinachijim@gmail.com>

vercel bot commented Sep 27, 2024

The latest updates on your projects.

Name: docs
Status: ✅ Ready
Updated (UTC): Oct 4, 2024 7:49am

Contributor

coderabbitai bot commented Sep 27, 2024

Caution

Review failed

The pull request is closed.

Walkthrough

The changes in this pull request encompass various modifications across multiple files, primarily focusing on enhancing the PostgreSQL executor and schema definitions. The PgExecutor class has been refined to streamline type parsing for PostgreSQL types and improve query execution logic. Additionally, new properties have been added to JSON schemas, enhancing their capability to define complex structures. Other changes include updates to test files, new utility functions, and adjustments in configuration files, all aimed at improving the functionality and maintainability of the codebase.

Changes

File Change Summary
packages/backend/src/execution/executors/PgExecutor/PgExecutor.ts Streamlined type parser settings; removed custom parsers for TIME and TIMETZ; enhanced query execution methods.
packages/backend/src/tests/propertyBased/tablesToSkip.ts Improved formatting of TODO comments; no structural changes.
packages/cli/src/tests/synthql.dev.config.json Added new properties items and properties to JSON schema for column attributes.
packages/cli/src/validators/schemas.ts Updated JSON schema definitions to include new items property in schemaDefOverridesSchema and configFileSchema.
packages/docs/static/reference/assets/navigation.js Updated window.navigationData with a new base64-encoded string.
packages/docs/static/reference/assets/search.js Updated window.searchData with a new base64-encoded string.
packages/docs/static/schemas/synthql.config.json Added items property to type definitions in schemaDefOverrides for columns and properties.
packages/introspect/package.json Added new dependency "ajv": "^8.17.1" to the @synthql/introspect package.
packages/introspect/src/e2e.test.ts Introduced excludeTablesAndViews property in multiple generate function calls.
packages/introspect/src/generate.ts Added excludeTablesAndViews property to GenerateProps; introduced helper functions for array type handling.
packages/introspect/src/tests/pg_types.test.ts New test suite for PostgreSQL data types; includes various utility functions for schema validation and data handling.
packages/queries/src/types/Schema.ts Enhanced Schema, ColumnDefTypeDef, and ColumnDefPropertyDef interfaces with new items property.
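
As a quick orientation, here is a minimal sketch of how the new excludeTablesAndViews option is exercised, based on the generate() parameters used in this PR's tests (the connection string handling and the public generate export are assumptions):

import { generate } from '@synthql/introspect';

async function generateSchema() {
    // Exclusion takes precedence over inclusion, so a table listed in both
    // includeTables and excludeTablesAndViews is dropped from the schema.
    const { schema } = await generate({
        connectionString: process.env.DATABASE_URL ?? '',
        includeSchemas: ['public'],
        defaultSchema: 'public',
        excludeTablesAndViews: ['agents', 'suppliers', 'agent_supplier'],
        outDir: './src/generated',
    });

    return schema;
}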

Possibly related PRs

  • Deferred queries #52: The changes in the PgExecutor class in the main PR relate to the modifications in the executeQuery method, which is also part of the broader execution logic being enhanced in the "Deferred queries" PR. Both PRs focus on refining how queries are processed and executed, indicating a connection in their objectives regarding query handling.

Poem

In the garden where queries bloom,
The PgExecutor sheds old gloom.
With schemas now more defined,
New paths for data intertwined.
A hop, a skip, through code we weave,
In every change, new tales we leave! 🐇✨


packages/cli/README.md: review comment resolved (outdated)
jimezesinachi and others added 3 commits October 3, 2024 09:33
@@ -63,6 +63,17 @@
        "type": "string",
        "nullable": true
    },
    "items": {
Collaborator

@fhur fhur Oct 3, 2024


Honestly I think you should just make the whole column type an "any" schema e.g.

{
    schemaDefOverrides: {
        someTable: {
            type: {} <======= any
        }
    }
}

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (7)
packages/cli/src/tests/synthql.config.json (1)

21-31: New schema definition for public.film looks good, but consider adding documentation.

The new schema definition override for the public.film table's special_features field is correctly structured and aligns with the PR objective of adding an array type handler. The use of pg_catalog.text[] as the type ID and the nested array structure with pg_catalog.text items is appropriate for representing an array of text values in PostgreSQL.

Consider adding a comment or description field to explain the purpose and expected content of the special_features array. This would improve the configuration's self-documentation and help other developers understand the data structure. For example:

 "public.film": {
     "special_features": {
+        "description": "An array of special features associated with the film",
         "type": {
             "id": "pg_catalog.text[]",
             "type": "array",
             "items": {
                 "type": "pg_catalog.text"
             }
         }
     }
 },
packages/introspect/src/tests/pg_types.test.ts (2)

53-53: Consider optimizing test performance

The test case has a long timeout of 100,000 ms (100 seconds), which might indicate slow test execution. This could potentially slow down your CI/CD pipeline.

Investigate ways to optimize the test performance:

  1. Consider using a separate test database with a smaller dataset.
  2. Explore if any operations can be mocked or stubbed to reduce execution time.
  3. If possible, parallelize tests or break this large test into smaller, focused tests.

Improving test performance will lead to faster feedback cycles and more efficient CI/CD processes.


1-295: Overall: Well-structured and comprehensive test suite with room for improvements

This test suite for PostgreSQL data types is well-designed and thorough. It covers crucial aspects such as table creation, data insertion, schema generation, and data validation. The code is generally well-structured and includes consistent error handling.

Main areas for improvement:

  1. Security: Use environment variables for database credentials and implement parameterized queries to prevent SQL injection risks.
  2. Performance: Investigate ways to optimize test execution time, considering the long timeout.
  3. Type Safety: Enhance type definitions, especially in the checkRowSchema function, to leverage TypeScript's type system more effectively.
  4. Error Handling: Review the error handling strategy to avoid potential duplicate logging.
  5. Test Data: Diversify example values in the allKnownTypes function for more comprehensive test coverage.

Addressing these points will further enhance the quality and robustness of this test suite.

packages/introspect/src/generate.ts (4)

43-47: Consider clarifying the behavior for conflicting include/exclude specifications.

The addition of the excludeTablesAndViews property is a good feature for customizing the output. However, it would be helpful to clarify in the comment what happens if a table is specified in both includeTables and excludeTablesAndViews. For example:

 /**
  * The tables and/or views to exclude from generation.
  * e.g. `['agents', 'suppliers', 'agent_supplier']`
+ * Note: If a table is specified in both `includeTables` and `excludeTablesAndViews`,
+ * the exclusion takes precedence.
  */
 excludeTablesAndViews?: string[];

This clarification will help users understand the behavior when there are conflicting specifications.


Line range hint 479-693: LGTM! Comprehensive coverage of PostgreSQL data types

The addition of these PostgreSQL data types to the createWellKnownDefs function greatly improves the coverage and accuracy of the generated schema. The definitions are well-structured and include helpful descriptions.

A few suggestions for minor improvements:

  1. For pg_catalog.inet and pg_catalog.cidr, consider adding a format property with value "ipv4" or "ipv6" to enable additional validation.

  2. For pg_catalog.macaddr and pg_catalog.macaddr8, you could add a pattern property with a regular expression to validate the format.

  3. For pg_catalog.money and pg_catalog.numeric, consider adding a pattern property to ensure the string represents a valid number format.

Here's an example of how you might implement these suggestions:

 'pg_catalog.inet': {
     id: 'pg_catalog.inet',
     type: 'string',
+    format: 'ipv4',
     description: 'A PG inet',
 },
 // ...
 'pg_catalog.macaddr': {
     id: 'pg_catalog.macaddr',
     type: 'string',
+    pattern: '^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$',
     description: 'A PG macaddr',
 },
 // ...
 'pg_catalog.money': {
     id: 'pg_catalog.money',
     type: 'string',
+    pattern: '^-?\\d+(\\.\\d+)?$',
     description: [
         'A PG money.',
         'Note that values of the PG money type,',
         'are returned as strings from the database.',
         'This is because that is how they can be best',
         'accurately processed in JavaScript/TypeScript',
     ].join('\n'),
 },

These additions would provide more precise validation for these specific types.


748-751: Consider enhancing descriptions for specialized PostgreSQL types

The additions of pg_catalog.tsquery and pg_catalog.txid_snapshot are good, but these specialized PostgreSQL types might benefit from more detailed descriptions. Consider adding information about their typical use cases, format, or any special considerations when working with these types in JavaScript/TypeScript.

For example:

 'pg_catalog.tsquery': {
     id: 'pg_catalog.tsquery',
     type: 'string',
-    description: 'A PG tsquery',
+    description: 'A PG tsquery. Represents a text search query. Note that complex tsquery operations may not be fully representable in JSON and might require special handling.',
 },
 // ...
 'pg_catalog.txid_snapshot': {
     id: 'pg_catalog.txid_snapshot',
     type: 'string',
-    description: 'A PG txid_snapshot',
+    description: 'A PG txid_snapshot. Represents a snapshot of the transaction ID state. Format is typically "xmin:xmax:xip_list". May require parsing for meaningful use in application logic.',
 },

These more detailed descriptions would provide developers with better context for working with these specialized types.

Also applies to: 758-761


769-782: Enhance type definitions for improved validation

The additions of pg_catalog.varbit, pg_catalog.varchar, and pg_catalog.xml are good, but could be enhanced for better validation:

  1. For pg_catalog.varchar, consider adding a maxLength property. This could be set to a large number (e.g., 65535) to represent PostgreSQL's maximum varchar length, or left as a TODO for users to set based on their specific column definitions.

  2. For pg_catalog.xml, consider adding a contentMediaType property to specify that the content should be XML.

Here's a suggested implementation:

 'pg_catalog.varchar': {
     id: 'pg_catalog.varchar',
     type: 'string',
+    maxLength: 65535, // Or add a comment suggesting users adjust this based on their specific column definition
     description: 'A PG varchar',
 },
 'pg_catalog.xml': {
     id: 'pg_catalog.xml',
     type: 'string',
+    contentMediaType: 'application/xml',
     description: 'A PG xml',
 },

These additions would provide more precise validation for these types.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR, between commits 2b29e68 and a87f604.

📒 Files selected for processing (6)
  • packages/cli/src/tests/synthql.config.json (1 hunks)
  • packages/docs/static/reference/assets/navigation.js (1 hunks)
  • packages/docs/static/reference/assets/search.js (1 hunks)
  • packages/introspect/package.json (1 hunks)
  • packages/introspect/src/generate.ts (10 hunks)
  • packages/introspect/src/tests/pg_types.test.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/docs/static/reference/assets/navigation.js
  • packages/docs/static/reference/assets/search.js
  • packages/introspect/package.json
🔇 Additional comments (9)
packages/cli/src/tests/synthql.config.json (1)

Line range hint 1-48: Consider improving consistency across table definitions and verifying completeness.

While the new public.film schema override is correctly implemented, there are a couple of points to consider for improving the overall configuration:

  1. Consistency: Other table definitions (public.actor, public.customer, public.agents, luminovo.agents) include includable properties for first_name and last_name. Consider whether similar properties should be added to the public.film table for consistency, if applicable.

  2. Completeness: The PR objectives mention completing well-known definitions for all PG data types, but only one new type (pg_catalog.text[]) is added here. It might be worth verifying if there are other PG data types that need to be defined in this configuration file.

To verify the completeness of PG data type definitions, you can run a script along the following lines (a rough sketch rather than a ready-made script):
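
// A rough sketch of such a check; the import path for createWellKnownDefs
// is hypothetical (in the PR it is a non-exported helper in
// packages/introspect/src/generate.ts), and the config path comes from
// this repository's test fixtures.
import { readFileSync } from 'node:fs';
import { createWellKnownDefs } from './generate'; // hypothetical import

const config = readFileSync(
    'packages/cli/src/tests/synthql.config.json',
    'utf8',
);

// Every pg_catalog type id referenced in the config (array markers like
// `[]` are not matched by \w, so `pg_catalog.text[]` yields `pg_catalog.text`)
const referenced = new Set(config.match(/pg_catalog\.\w+/g) ?? []);
const known = new Set(Object.keys(createWellKnownDefs()));

for (const typeId of referenced) {
    if (!known.has(typeId)) {
        console.log(`Missing well-known def for: ${typeId}`);
    }
}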

This script will help identify any PostgreSQL data types that are not yet defined in the configuration file.

packages/introspect/src/tests/pg_types.test.ts (3)

290-295: LGTM: Effective column name generation

The getColumnName function effectively generates unique and valid column names based on PostgreSQL data types. The use of regular expressions to replace spaces and remove special characters is appropriate and results in clean, consistent column names.
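
For illustration, a sketch consistent with that description follows (the exact regular expressions and naming scheme in the PR may differ):

function getColumnName(type: string): string {
    // e.g. 'TIME WITH TIME ZONE' -> 'time_with_time_zone_col'
    const name = type
        .toLowerCase()
        .replace(/\s/g, '_') // replace spaces with underscores
        .replace(/[^a-z0-9_]/g, ''); // strip special characters such as (), [] and commas

    return `${name}_col`;
}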


13-16: ⚠️ Potential issue

Use environment variables for database credentials

Hardcoding database credentials poses a security risk and makes it challenging to manage different environments. Consider using environment variables to store sensitive information.

Update the code to use environment variables:

 const client = new Pool({
-    connectionString: 'postgresql://postgres:postgres@localhost:5432/postgres',
+    connectionString: process.env.DATABASE_URL,
     max: 10,
 });

Ensure that DATABASE_URL is properly set in your environment configuration.

Likely invalid or redundant comment.


39-40: ⚠️ Potential issue

Use environment variables for database connection string

The database connection string is hardcoded again in the generate function call. For consistency and security, use environment variables instead of hardcoding.

Update the code as follows:

 const { schema } = await generate({
-    connectionString:
-        'postgresql://postgres:postgres@localhost:5432/postgres',
+    connectionString: process.env.DATABASE_URL,
     includeSchemas: ['public'],
     defaultSchema: 'public',
     includeTables: [tableName],
     outDir: '../introspect/src/tests/generated',
 });

Likely invalid or redundant comment.

packages/introspect/src/generate.ts (5)

8-11: LGTM! Improved type handling for tables and views.

The introduction of the TableOrViewColumn type and the update to the TableDefTransformer interface are good improvements. This change allows for more uniform handling of both table and view columns, which should lead to better code reuse and maintainability.

Also applies to: 19-23


Line range hint 69-78: LGTM! Consistent implementation of the new exclude feature.

The addition of the excludeTablesAndViews parameter with a default empty array is consistent with the interface update and maintains backwards compatibility. This is a good implementation of the new feature.


110-110: LGTM! Proper propagation of exclusion logic.

The addition of the excludeTablesAndViews parameter to the createRootJsonSchema function call ensures that the exclusion logic is correctly passed down to the schema creation process. This is consistent with the previous updates and completes the implementation of the new feature.
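
The call shape implied by the updated signature is roughly the following (argument names taken from the diff commented on below; the exact call site may differ):

const rootSchema = createRootJsonSchema(schemas, {
    defaultSchema,
    includeTables,
    excludeTablesAndViews,
    tableDefTransformers,
});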


734-745: LGTM! Improved definition for pg_catalog.timetz

The update to the pg_catalog.timetz definition is well-done. The use of the 'time' format for the string type is appropriate, and the detailed description provides valuable information for users working with this data type. This change is consistent with the handling of other time-related types in the schema.
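
For reference, the described shape is roughly the following (a sketch; the exact description text in generate.ts may differ):

'pg_catalog.timetz': {
    id: 'pg_catalog.timetz',
    type: 'string',
    format: 'time',
    description: [
        'A PG timetz.',
        'Note that values of the PG timetz type',
        'are returned as strings from the database.',
    ].join('\n'),
},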


Line range hint 1-784: Overall assessment: Significant improvements with minor enhancement opportunities

This update greatly enhances the schema generation capabilities, particularly in handling array types and providing comprehensive coverage of PostgreSQL data types. The changes are well-implemented and improve the overall functionality of the schema generation process.

Key improvements:

  1. Addition of array type handling
  2. Comprehensive coverage of PostgreSQL data types
  3. Implementation of table/view exclusion feature

Areas for potential enhancement:

  1. Improve error handling in array type processing
  2. Simplify some of the filtering logic
  3. Enhance certain type definitions with more specific validation rules

These changes represent a substantial improvement to the codebase. Great work overall!

Comment on lines 56 to 71
function checkRowSchema(
    schema: JSONSchema,
    tableName: string,
    row: unknown,
): void {
    for (const [name, value] of Object.entries(row ?? {})) {
        const columnType = (schema.properties?.[tableName] as any)?.properties
            ?.columns?.properties?.[name]?.properties?.type;

        const { id, ...columnTypeWithoutId } = columnType || {};

        const validateColumnTypeSchema = ajv.compile(columnTypeWithoutId);

        expect(validateColumnTypeSchema(value)).toEqual(true);
    }
}
Contributor


🛠️ Refactor suggestion

Improve type safety and add explanatory comments

The checkRowSchema function uses type assertions and the any type, which reduces TypeScript's type safety benefits. Additionally, the reason for removing the id property isn't clear from the code.

  1. Define proper types for the schema object to enhance code reliability:
interface SchemaProperties {
  [key: string]: {
    properties?: {
      columns?: {
        properties?: {
          [key: string]: {
            properties?: {
              type: unknown;
            };
          };
        };
      };
    };
  };
}

function checkRowSchema(
    schema: JSONSchema,
    tableName: string,
    row: Record<string, unknown>,
): void {
    for (const [name, value] of Object.entries(row)) {
        const columnType = (schema.properties as SchemaProperties)?.[tableName]?.properties
            ?.columns?.properties?.[name]?.properties?.type;

        // ... rest of the function
    }
}
  2. Add a comment explaining why the id property is removed:
// Remove the 'id' property from the column type as it's not needed for validation
// and might cause issues with the Ajv schema compilation
const { id, ...columnTypeWithoutId } = columnType || {};

These changes will improve type safety and code clarity.

Comment on lines 73 to 91
const createTable = async () => {
    const createTableQuery = `
        CREATE TABLE IF NOT EXISTS all_data_types (
            ${allKnownTypes()
                .map((column) => `${getColumnName(column.type)} ${column.type}`)
                .join(',')}
        );
    `;

    try {
        const results = await client.query(createTableQuery);
        console.log('Table created successfully!');

        return results.rows;
    } catch (err) {
        console.error('Error creating table', err);
        throw err;
    }
};
Contributor


⚠️ Potential issue

Use parameterized queries for better security

The current implementation uses template literals to construct SQL queries, which could potentially lead to SQL injection vulnerabilities if the input is not properly sanitized.

Consider using parameterized queries instead of string concatenation. For example, in the insertData function:

const columns = allKnownTypes().map((column) => getColumnName(column.type));
// $1, $2, ... positional placeholders, one per column
const placeholders = columns.map((_, i) => `$${i + 1}`);

const insertDataQuery = `
    INSERT INTO all_data_types (${columns.join(',')})
    VALUES (${placeholders.join(',')});
`;

// Note: with parameterized queries, pass raw values rather than SQL-quoted literals
const values = allKnownTypes().map((column) => column.examples[0]);

await client.query(insertDataQuery, values);

This approach separates the SQL query from the data, making it much more secure against SQL injection attacks.

Also applies to: 93-115, 117-136, 138-152

Comment on lines 82 to 90
try {
const results = await client.query(createTableQuery);
console.log('Table created successfully!');

return results.rows;
} catch (err) {
console.error('Error creating table', err);
throw err;
}
Contributor


🛠️ Refactor suggestion

Review error handling strategy

The current error handling strategy in these functions logs the error and then throws it. This might lead to duplicate error logging if the error is caught and logged again at a higher level.

Consider one of these approaches:

  1. Only log the error and don't throw it, letting the caller handle the error:
try {
    const results = await client.query(createTableQuery);
    console.log('Table created successfully!');
    return results.rows;
} catch (err) {
    console.error('Error creating table', err);
    return null; // or a specific error value
}
  2. Or, only throw the error without logging, and handle logging at a higher level:
try {
    const results = await client.query(createTableQuery);
    return results.rows;
} catch (err) {
    throw new Error(`Error creating table: ${err.message}`);
}

Choose the approach that best fits your error handling strategy across the application.

Also applies to: 106-114, 127-135, 143-151

Comment on lines 154 to 288
        ],
    },
    {
        type: 'JSON',
        examples: [`'{"key": "value1"}'`, `'{"key": "value2"}'`],
    },
    {
        type: 'JSONB',
        examples: [`'{"key": "value1"}'`, `'{"key": "value2"}'`],
    },
    { type: 'LINE', examples: ["'{1,-1,2}'", "'{1,-1,3}'"] },
    { type: 'LSEG', examples: ["'[(0,0),(1,1)]'", "'[(0,0),(2,2)]'"] },
    {
        type: 'MACADDR',
        examples: ["'08:00:2b:01:02:03'", "'08:00:2b:01:02:04'"],
    },
    {
        type: 'MACADDR8',
        examples: [
            "'08:00:2b:01:02:03:04:05'",
            "'08:00:2b:01:02:03:04:06'",
        ],
    },
    { type: 'MONEY', examples: ['19.98', '19.99'] },
    {
        type: 'NUMERIC(10, 2)',
        examples: ['3.14159265358978', '3.14159265358979'],
    },
    {
        type: 'PATH',
        examples: ["'[(0,0),(1,1),(2,2)]'", "'[(0,0),(1,1),(3,3)]'"],
    },
    { type: 'PG_LSN', examples: ["'16/B374D847'", "'16/B374D848'"] },
    {
        type: 'PG_SNAPSHOT',
        examples: ["'10:20:10,14,15'", "'10:20:10,14,15'"],
    },
    { type: 'POINT', examples: ["'(1,1)'", "'(1,1)'"] },
    {
        type: 'POLYGON',
        examples: [
            "'((0,0),(0,1),(1,1),(1,0))'",
            "'((0,0),(0,1),(1,1),(1,0))'",
        ],
    },
    { type: 'REAL', examples: ['3.14', '3.15'] },
    { type: 'SMALLINT', examples: ['-32768', '32767'] },
    { type: 'SMALLSERIAL', examples: ['1', '32767'] },
    { type: 'SERIAL', examples: ['1', '2147483647'] },
    {
        type: 'TEXT',
        examples: [
            "'This is a text column'",
            "'This is another text column'",
        ],
    },
    { type: 'TIME', examples: ["'04:05:06'", "'04:05:08'"] },
    {
        type: 'TIME WITH TIME ZONE',
        examples: ["'04:05:06 PST'", "'04:05:08 PST'"],
    },
    {
        type: 'TIMESTAMP',
        examples: ["'2023-05-18 04:05:06'", "'2023-05-18 04:05:08'"],
    },
    {
        type: 'TIMESTAMP WITH TIME ZONE',
        examples: [
            "'2023-05-18 04:05:06 PST'",
            "'2023-05-18 04:05:08 PST'",
        ],
    },
    { type: 'TSQUERY', examples: ["'fat & mat'", "'fat & pat'"] },
    {
        type: 'TSVECTOR',
        examples: [
            "'a fat cat sat on a mat and ate a fat rat'",
            "'a fat cat sat on a mat and ate a fat tat'",
        ],
    },
    {
        type: 'TXID_SNAPSHOT',
        examples: ["'10:20:10,14,15'", "'10:20:10,14,15'"],
    },
    {
        type: 'UUID',
        examples: [
            "'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'",
            "'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'",
        ],
    },
    {
        type: 'XML',
        examples: [
            "'<root><element>value</element></root>'",
            "'<root><element>value</element></root>'",
        ],
    },
];
}
Contributor


🛠️ Refactor suggestion

Diversify example values for better test coverage

The allKnownTypes function provides a comprehensive list of PostgreSQL data types, which is excellent for testing. However, some data types have identical example values, which might not provide optimal test coverage.

Consider diversifying the example values for data types where they are currently identical. For instance:

function allKnownTypes(): Array<Column> {
    return [
        // ...
        {
            type: 'CIDR',
            examples: ["'192.168.100.128/25'", "'10.0.0.0/8'"],
        },
        // ...
        {
            type: 'PG_SNAPSHOT',
            examples: ["'10:20:10,14,15'", "'11:30:11,16,17'"],
        },
        // ...
        {
            type: 'UUID',
            examples: [
                "'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'",
                "'123e4567-e89b-12d3-a456-426614174000'",
            ],
        },
        // ...
    ];
}

This will provide more diverse test cases and potentially catch edge cases in your type handling.

Comment on lines +312 to +346
excludeTablesAndViews,
tableDefTransformers,
}: {
    defaultSchema: string;
    includeTables: string[];
    excludeTablesAndViews: string[];
    tableDefTransformers: Array<TableDefTransformer>;
},
): JSONSchema {
    const allTables: TableOrView[] = Object.values(schemas).flatMap(
        (schema) => schema.tables,
    );
    const allViews: TableOrView[] = Object.values(schemas).flatMap(
        (schema) => schema.views,
    );

    const allTablesAndViews: TableOrView[] = allTables.concat(allViews);

    // Check if a list of tables/views is passed, and if so, use as filter
    // The logic here is currently that `exclusion` takes precedence,
    // so if a table is both included & excluded, the exclude takes precedence,
    // so the table will NOT be included in the schema
    let tables: TableOrView[] = allTablesAndViews;

    if (includeTables.length > 0) {
        tables = tables.filter((tableOrView) =>
            includeTables.includes(tableOrView.name),
        );
    }

    if (excludeTablesAndViews.length > 0) {
        tables = tables.filter(
            (tableOrView) => !excludeTablesAndViews.includes(tableOrView.name),
        );
    }
Contributor


🛠️ Refactor suggestion

Simplify the filtering logic for better readability

The implementation of the inclusion and exclusion logic is correct, with exclusion taking precedence as intended. However, the nested if statements could be simplified for better readability. Consider using Array.filter with a single condition that combines both inclusion and exclusion criteria.

Here's a suggested refactoring:

-    let tables: TableOrView[] = allTablesAndViews;
-
-    if (includeTables.length > 0) {
-        tables = tables.filter((tableOrView) =>
-            includeTables.includes(tableOrView.name),
-        );
-    }
-
-    if (excludeTablesAndViews.length > 0) {
-        tables = tables.filter(
-            (tableOrView) => !excludeTablesAndViews.includes(tableOrView.name),
-        );
-    }
+    const tables = allTablesAndViews.filter((tableOrView) => 
+        (includeTables.length === 0 || includeTables.includes(tableOrView.name)) &&
+        !excludeTablesAndViews.includes(tableOrView.name)
+    );

This change combines both filtering conditions into a single filter call, making the logic more concise and easier to understand at a glance.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

excludeTablesAndViews,
tableDefTransformers,
}: {
    defaultSchema: string;
    includeTables: string[];
    excludeTablesAndViews: string[];
    tableDefTransformers: Array<TableDefTransformer>;
},
): JSONSchema {
    const allTables: TableOrView[] = Object.values(schemas).flatMap(
        (schema) => schema.tables,
    );
    const allViews: TableOrView[] = Object.values(schemas).flatMap(
        (schema) => schema.views,
    );

    const allTablesAndViews: TableOrView[] = allTables.concat(allViews);

    // Check if a list of tables/views is passed, and if so, use as filter
    // The logic here is currently that `exclusion` takes precedence,
    // so if a table is both included & excluded, the exclude takes precedence,
    // so the table will NOT be included in the schema
    const tables = allTablesAndViews.filter(
        (tableOrView) =>
            (includeTables.length === 0 ||
                includeTables.includes(tableOrView.name)) &&
            !excludeTablesAndViews.includes(tableOrView.name),
    );

Comment on lines +260 to +305
function isExpandedTypeAnArray(expandedType: string): boolean {
    return expandedType.endsWith('[]');
}

function createArrayDef(expandedType: string, fullName: string): JSONSchema {
    const items = expandedType.split('[]');

    // 1. Starting at the rightmost,
    // create an array schema def,
    // and move leftwards

    // 2. Once the string left is
    // an object pg type, embed
    // the object properties

    if (items[items.length - 1] === fullName) {
        return createWellKnownDefs()[fullName];
    } else if (items[items.length - 1] === '') {
        return {
            id: expandedType,
            type: 'array',
            items: createArrayDef(items.slice(0, -1).join('[]'), fullName),
        };
    }

    console.warn(
        `No type definition found for the pg_type: ${items[items.length - 1]}`,
    );

    // return `unknown` otherwise
    return {};
}

function createColumnTypeDefOrRef(column: TableOrViewColumn) {
    // TODO(fhur): for now, when a type is composite use the "unknown" type.
    // In the future we should add support for composite types.
    if (column.type.kind === 'composite') {
        return {};
    } else if (isExpandedTypeAnArray(column.expandedType)) {
        return createArrayDef(column.expandedType, column.type.fullName);
    }

    return {
        $ref: `#/$defs/${createTypeDefId(column.type)}`,
    };
}
Contributor


⚠️ Potential issue

Improve array type handling and error cases

The new functions for handling array types are a good addition. However, there are a few areas that could be improved:

  1. In createArrayDef, consider adding a maximum depth check to prevent potential infinite recursion for deeply nested arrays.

  2. The warning message in createArrayDef could be more informative. Consider including the full expandedType in the message.

  3. In createColumnTypeDefOrRef, returning an empty object for composite types might lead to unexpected behavior. Consider throwing an error or returning a more descriptive placeholder schema.

Here's a suggested implementation for createArrayDef:

 function createArrayDef(expandedType: string, fullName: string): JSONSchema {
+    const MAX_DEPTH = 10; // Adjust as needed
+    function createArrayDefRecursive(expandedType: string, fullName: string, depth: number): JSONSchema {
+        if (depth > MAX_DEPTH) {
+            throw new Error(`Maximum array nesting depth exceeded for type: ${expandedType}`);
+        }
         const items = expandedType.split('[]');
 
         if (items[items.length - 1] === fullName) {
             return createWellKnownDefs()[fullName];
         } else if (items[items.length - 1] === '') {
             return {
                 id: expandedType,
                 type: 'array',
-                items: createArrayDef(items.slice(0, -1).join('[]'), fullName),
+                items: createArrayDefRecursive(items.slice(0, -1).join('[]'), fullName, depth + 1),
             };
         }
 
         console.warn(
-            `No type definition found for the pg_type: ${items[items.length - 1]}`,
+            `No type definition found for the pg_type: ${items[items.length - 1]} in expanded type: ${expandedType}`,
         );
 
-        // return `unknown` otherwise
-        return {};
+        return { type: 'unknown' };
+    }
+    return createArrayDefRecursive(expandedType, fullName, 0);
 }

And for createColumnTypeDefOrRef:

 function createColumnTypeDefOrRef(column: TableOrViewColumn) {
     if (column.type.kind === 'composite') {
-        return {};
+        return {
+            type: 'object',
+            description: `Composite type: ${column.type.fullName}`,
+            additionalProperties: true
+        };
     } else if (isExpandedTypeAnArray(column.expandedType)) {
         return createArrayDef(column.expandedType, column.type.fullName);
     }
 
     return {
         $ref: `#/$defs/${createTypeDefId(column.type)}`,
     };
 }

These changes should improve the robustness and clarity of the array type handling.

Committable suggestion was skipped due to low confidence.

@fhur fhur merged commit 4102af5 into master Oct 4, 2024
3 of 4 checks passed
@fhur fhur deleted the feat/fix-generated-types branch October 4, 2024 07:52