Skip to content

C++: Convert remaining CSV models to .model.yml and remove CSV model infrastructure#21585

Open
Copilot wants to merge 8 commits intomainfrom
copilot/convert-models-to-yml
Open

C++: Convert remaining CSV models to .model.yml and remove CSV model infrastructure#21585
Copilot wants to merge 8 commits intomainfrom
copilot/convert-models-to-yml

Conversation

Copy link
Contributor

Copilot AI commented Mar 26, 2026

Migrates the last two QL files specifying models as inline CSV to .model.yml data extension files in ext/, and removes the entire CSV model infrastructure from ExternalFlow.qll.

Model migrations

  • ZMQ.qllext/ZMQ.model.yml: Moved 3 source, 3 sink, and 2 summary models for ZeroMQ. Deleted ZMQ.qll (now empty) and removed its import from Models.qll.
  • Gets.qllext/getc.model.yml: Extracted 13 getc-family source models. Other classes in Gets.qll (FgetsFunction, GetsFunction) remain as they use the older QL modeling interfaces.
  • Test models: Converted the models-as-data test models from CSV classes in testModels.qll to a testModels.ext.yml data extension file.

CSV infrastructure removal

  • Removed classes: SourceModelCsv, SinkModelCsv, SummaryModelCsv from ExternalFlow.qll (and the codeql.util.Unit import they required).
  • Removed predicates: Single-argument sourceModel(string row), sinkModel(string row), summaryModel(string row).
  • Simplified MadInput: Removed all additionalSourceModel, additionalSinkModel, additionalSummaryModel CSV-parsing predicates. Now only contains namespaceSegmentSeparator, following the pattern used by Go and C#.
  • Removed CSV-specific validation: getInvalidModelSubtype() and getInvalidModelColumnCount() from the CsvValidation module. All extensible-predicate-based validations are retained.
  • Updated documentation: Changed doc comments from "CSV format" to "data extension files".

After this change, zero CSV model classes or CSV parsing infrastructure remain in cpp/ql/lib/. All models are specified via .model.yml data extensions and extensible predicates.


📱 Kick off Copilot coding agent tasks wherever you are with GitHub Mobile, available on iOS and Android.

Migrate ZeroMQ models from ZMQ.qll and getc-family source models
from Gets.qll into new .model.yml files in the ext/ directory.

Agent-Logs-Url: https://github.com/github/codeql/sessions/da8f5e5b-35f7-47a4-afa0-750616e3df5b

Co-authored-by: owen-mc <62447351+owen-mc@users.noreply.github.com>
Copilot AI changed the title [WIP] Convert models from CSV to .model.yml files C++: Convert remaining CSV models in QL files to .model.yml data extensions Mar 26, 2026
Copilot AI requested a review from owen-mc March 26, 2026 12:39
@owen-mc
Copy link
Contributor

owen-mc commented Mar 26, 2026

@copilot That's good. Please also remove the definitions of SourceModelCsv and everything else that only exists for specifying models using csv in ql. Note that we need to keep everything related to specifying models using extensible predicates and .model.yml files.

Remove SourceModelCsv, SinkModelCsv, SummaryModelCsv classes,
single-argument CSV predicates, CSV parsing in MadInput, and
CSV-specific validation checks. Simplify MadInput to only contain
the namespace separator. Convert test models to .ext.yml format.

Agent-Logs-Url: https://github.com/github/codeql/sessions/89ff81fe-5585-446d-99e2-6fe6966495c5

Co-authored-by: owen-mc <62447351+owen-mc@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 26, 2026

@copilot That's good. Please also remove the definitions of SourceModelCsv and everything else that only exists for specifying models using csv in ql. Note that we need to keep everyth...

Done in d69bcca. Removed SourceModelCsv, SinkModelCsv, SummaryModelCsv classes, the single-argument sourceModel(row)/sinkModel(row)/summaryModel(row) predicates, all CSV parsing code in MadInput (simplified to only namespaceSegmentSeparator), and CSV-specific validation checks (getInvalidModelSubtype, getInvalidModelColumnCount). Also converted the test models from CSV classes to a .ext.yml file and removed the codeql.util.Unit import.

Copilot AI changed the title C++: Convert remaining CSV models in QL files to .model.yml data extensions C++: Convert remaining CSV models to .model.yml and remove CSV model infrastructure Mar 26, 2026
@owen-mc owen-mc force-pushed the copilot/convert-models-to-yml branch from 452d485 to 7cc689f Compare March 26, 2026 16:40
@owen-mc owen-mc force-pushed the copilot/convert-models-to-yml branch from 7cc689f to 8a99ef4 Compare March 26, 2026 16:45
@owen-mc owen-mc marked this pull request as ready for review March 26, 2026 17:03
@owen-mc owen-mc requested a review from a team as a code owner March 26, 2026 17:03
Copilot AI review requested due to automatic review settings March 26, 2026 17:03
Copy link
Contributor

@owen-mc owen-mc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My approval doesn't count because I collaborated with the coding agent, but I wanted to show that I checked everything that it did.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR completes the C++ Models-as-Data migration by moving the last remaining inline-CSV models into .model.yml/.ext.yml data extensions and removing the legacy CSV model infrastructure from ExternalFlow.qll, with corresponding test updates.

Changes:

  • Migrate ZeroMQ and getc-family models from inline CSV-in-QL to .model.yml files under cpp/ql/lib/ext/.
  • Remove CSV model classes/parsing/validation entry points from cpp/ql/lib/semmle/code/cpp/dataflow/ExternalFlow.qll and switch validation users to the new ModelValidation module.
  • Update query tests and library tests to use inline expectations / model pretty-printing and a YAML-based test model extension file.

Reviewed changes

Copilot reviewed 24 out of 26 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
cpp/ql/test/query-tests/Security/CWE/CWE-497/semmle/tests/tests_sysconf.cpp Updates inline expectations for CWE-497 test cases.
cpp/ql/test/query-tests/Security/CWE/CWE-497/semmle/tests/tests_sockets.cpp Adds inline $ Source / $ Alert annotations for socket-flow cases.
cpp/ql/test/query-tests/Security/CWE/CWE-497/semmle/tests/tests2.cpp Annotates additional sources/alerts (including ZeroMQ sinks) for inline expectation testing.
cpp/ql/test/query-tests/Security/CWE/CWE-497/semmle/tests/ExposedSystemData.qlref Converts qlref to YAML format and adds postprocessing for inline expectations/model printing.
cpp/ql/test/query-tests/Security/CWE/CWE-497/semmle/tests/ExposedSystemData.expected Updates expected output to include model provenance and inline-expectations postprocess results.
cpp/ql/test/library-tests/dataflow/models-as-data/testModels.qll Removes CSV-based test model definitions.
cpp/ql/test/library-tests/dataflow/models-as-data/testModels.ql Consolidates models-as-data library tests into a single QL test entry point.
cpp/ql/test/library-tests/dataflow/models-as-data/testModels.ext.yml Adds YAML data extensions for models-as-data library-test models.
cpp/ql/test/library-tests/dataflow/models-as-data/testModels.expected Updates expected results to match the consolidated test and YAML extension models.
cpp/ql/test/library-tests/dataflow/models-as-data/taint.ql Removes now-redundant test entry point (consolidated into testModels.ql).
cpp/ql/test/library-tests/dataflow/models-as-data/interpretElement.ql Removes now-redundant test entry point (consolidated into testModels.ql).
cpp/ql/test/library-tests/dataflow/models-as-data/consistency.ql Removes now-redundant test entry point (consolidated into testModels.ql).
cpp/ql/test/library-tests/dataflow/models-as-data/consistency.expected Removes expected output for deleted test entry point.
cpp/ql/test/library-tests/dataflow/models-as-data/SummaryCall.ql Removes now-redundant test entry point (consolidated into testModels.ql).
cpp/ql/test/library-tests/dataflow/models-as-data/SummaryCall.expected Removes expected output for deleted test entry point.
cpp/ql/test/library-tests/dataflow/models-as-data/FlowSummaryNode.ql Removes now-redundant test entry point (consolidated into testModels.ql).
cpp/ql/test/library-tests/dataflow/external-models/validatemodels.ql Updates validation import to the renamed validation module in ExternalFlow.
cpp/ql/lib/semmle/code/cpp/models/implementations/ZMQ.qll Removes inline CSV-based ZeroMQ models (migrated to .model.yml).
cpp/ql/lib/semmle/code/cpp/models/implementations/Gets.qll Removes inline CSV-based getc-family models (migrated to .model.yml).
cpp/ql/lib/semmle/code/cpp/models/Models.qll Stops importing the removed implementations.ZMQ module.
cpp/ql/lib/semmle/code/cpp/dataflow/ExternalFlow.qll Removes CSV infrastructure and renames validation module to reflect MaD-only modeling.
cpp/ql/lib/ext/getc.model.yml Adds .model.yml source models for getc-family functions.
cpp/ql/lib/ext/ZMQ.model.yml Adds .model.yml source/sink/summary models for ZeroMQ APIs.
cpp/ql/lib/change-notes/2026-03-26-convert-csv-models-to-yml.md Adds breaking-change note documenting removal of CSV model infrastructure.
Comments suppressed due to low confidence (1)

cpp/ql/lib/semmle/code/cpp/dataflow/ExternalFlow.qll:11

  • The header comment’s “columns” lists don’t match the current extensible predicate signatures used in this file. In particular, the models-as-data relations include additional trailing columns (for example provenance and model) that aren’t documented here. Please update these column lists to reflect the current schema used by the .model.yml files in ext/.
 * The extensible relations have the following columns:
 * - Sources:
 *   `namespace; type; subtypes; name; signature; ext; output; kind`
 * - Sinks:
 *   `namespace; type; subtypes; name; signature; ext; input; kind`

- ["", "", False, "madArg0ToReturnFieldIndirect", "", "", "Argument[0]", "ReturnValue.Field[*ptr]", "taint", "manual"]
- ["", "", False, "madFieldToFieldVar", "", "", "Field[value]", "Field[value2]", "taint", "manual"]
- ["", "", False, "madFieldToIndirectFieldVar", "", "", "Field[value]", "Field[*ptr]", "taint", "manual"]
- ["", "", False, "madIndirectFieldToFieldVar", "", "", "", "Field[value]", "Field[value2]", "manual"]
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In testModels.ext.yml, the summaryModel row for madIndirectFieldToFieldVar appears to have its columns shifted: the input is empty, output is Field[value], and kind becomes Field[value2] (with no taint/value kind). This looks inconsistent with the documented column order in the file and with the surrounding summary rows, and may cause kind validation / model interpretation issues. Please correct the row so that input, output, and kind are in the expected positions (and include the intended kind, likely taint).

Suggested change
- ["", "", False, "madIndirectFieldToFieldVar", "", "", "", "Field[value]", "Field[value2]", "manual"]
- ["", "", False, "madIndirectFieldToFieldVar", "", "", "Field[*ptr]", "Field[value2]", "taint", "manual"]

Copilot uses AI. Check for mistakes.
---
category: breaking
---
* ZeroMQ and `getc`-family models have been migrated from inline CSV specifications in QL files to `.model.yml` data extension files in the `ext/` directory. The `SourceModelCsv`, `SinkModelCsv`, and `SummaryModelCsv` classes and the associated CSV parsing infrastructure have been removed from `ExternalFlow.qll`. New models should be added as `.model.yml` files in the `ext/` directory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's relevant to mentioned that some models have been moved away from .ql files to data extensions. There's no observable behavior change here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants