C++: Convert remaining CSV models to .model.yml and remove CSV model infrastructure#21585
C++: Convert remaining CSV models to .model.yml and remove CSV model infrastructure#21585
Conversation
Migrate ZeroMQ models from ZMQ.qll and getc-family source models from Gets.qll into new .model.yml files in the ext/ directory. Agent-Logs-Url: https://github.com/github/codeql/sessions/da8f5e5b-35f7-47a4-afa0-750616e3df5b Co-authored-by: owen-mc <62447351+owen-mc@users.noreply.github.com>
|
@copilot That's good. Please also remove the definitions of SourceModelCsv and everything else that only exists for specifying models using csv in ql. Note that we need to keep everything related to specifying models using extensible predicates and |
cpp/ql/test/query-tests/Security/CWE/CWE-497/semmle/tests/ExposedSystemData.qlref
Fixed
Show fixed
Hide fixed
Remove SourceModelCsv, SinkModelCsv, SummaryModelCsv classes, single-argument CSV predicates, CSV parsing in MadInput, and CSV-specific validation checks. Simplify MadInput to only contain the namespace separator. Convert test models to .ext.yml format. Agent-Logs-Url: https://github.com/github/codeql/sessions/89ff81fe-5585-446d-99e2-6fe6966495c5 Co-authored-by: owen-mc <62447351+owen-mc@users.noreply.github.com>
Done in d69bcca. Removed |
452d485 to
7cc689f
Compare
7cc689f to
8a99ef4
Compare
owen-mc
left a comment
There was a problem hiding this comment.
My approval doesn't count because I collaborated with the coding agent, but I wanted to show that I checked everything that it did.
There was a problem hiding this comment.
Pull request overview
This PR completes the C++ Models-as-Data migration by moving the last remaining inline-CSV models into .model.yml/.ext.yml data extensions and removing the legacy CSV model infrastructure from ExternalFlow.qll, with corresponding test updates.
Changes:
- Migrate ZeroMQ and
getc-family models from inline CSV-in-QL to.model.ymlfiles undercpp/ql/lib/ext/. - Remove CSV model classes/parsing/validation entry points from
cpp/ql/lib/semmle/code/cpp/dataflow/ExternalFlow.qlland switch validation users to the newModelValidationmodule. - Update query tests and library tests to use inline expectations / model pretty-printing and a YAML-based test model extension file.
Reviewed changes
Copilot reviewed 24 out of 26 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| cpp/ql/test/query-tests/Security/CWE/CWE-497/semmle/tests/tests_sysconf.cpp | Updates inline expectations for CWE-497 test cases. |
| cpp/ql/test/query-tests/Security/CWE/CWE-497/semmle/tests/tests_sockets.cpp | Adds inline $ Source / $ Alert annotations for socket-flow cases. |
| cpp/ql/test/query-tests/Security/CWE/CWE-497/semmle/tests/tests2.cpp | Annotates additional sources/alerts (including ZeroMQ sinks) for inline expectation testing. |
| cpp/ql/test/query-tests/Security/CWE/CWE-497/semmle/tests/ExposedSystemData.qlref | Converts qlref to YAML format and adds postprocessing for inline expectations/model printing. |
| cpp/ql/test/query-tests/Security/CWE/CWE-497/semmle/tests/ExposedSystemData.expected | Updates expected output to include model provenance and inline-expectations postprocess results. |
| cpp/ql/test/library-tests/dataflow/models-as-data/testModels.qll | Removes CSV-based test model definitions. |
| cpp/ql/test/library-tests/dataflow/models-as-data/testModels.ql | Consolidates models-as-data library tests into a single QL test entry point. |
| cpp/ql/test/library-tests/dataflow/models-as-data/testModels.ext.yml | Adds YAML data extensions for models-as-data library-test models. |
| cpp/ql/test/library-tests/dataflow/models-as-data/testModels.expected | Updates expected results to match the consolidated test and YAML extension models. |
| cpp/ql/test/library-tests/dataflow/models-as-data/taint.ql | Removes now-redundant test entry point (consolidated into testModels.ql). |
| cpp/ql/test/library-tests/dataflow/models-as-data/interpretElement.ql | Removes now-redundant test entry point (consolidated into testModels.ql). |
| cpp/ql/test/library-tests/dataflow/models-as-data/consistency.ql | Removes now-redundant test entry point (consolidated into testModels.ql). |
| cpp/ql/test/library-tests/dataflow/models-as-data/consistency.expected | Removes expected output for deleted test entry point. |
| cpp/ql/test/library-tests/dataflow/models-as-data/SummaryCall.ql | Removes now-redundant test entry point (consolidated into testModels.ql). |
| cpp/ql/test/library-tests/dataflow/models-as-data/SummaryCall.expected | Removes expected output for deleted test entry point. |
| cpp/ql/test/library-tests/dataflow/models-as-data/FlowSummaryNode.ql | Removes now-redundant test entry point (consolidated into testModels.ql). |
| cpp/ql/test/library-tests/dataflow/external-models/validatemodels.ql | Updates validation import to the renamed validation module in ExternalFlow. |
| cpp/ql/lib/semmle/code/cpp/models/implementations/ZMQ.qll | Removes inline CSV-based ZeroMQ models (migrated to .model.yml). |
| cpp/ql/lib/semmle/code/cpp/models/implementations/Gets.qll | Removes inline CSV-based getc-family models (migrated to .model.yml). |
| cpp/ql/lib/semmle/code/cpp/models/Models.qll | Stops importing the removed implementations.ZMQ module. |
| cpp/ql/lib/semmle/code/cpp/dataflow/ExternalFlow.qll | Removes CSV infrastructure and renames validation module to reflect MaD-only modeling. |
| cpp/ql/lib/ext/getc.model.yml | Adds .model.yml source models for getc-family functions. |
| cpp/ql/lib/ext/ZMQ.model.yml | Adds .model.yml source/sink/summary models for ZeroMQ APIs. |
| cpp/ql/lib/change-notes/2026-03-26-convert-csv-models-to-yml.md | Adds breaking-change note documenting removal of CSV model infrastructure. |
Comments suppressed due to low confidence (1)
cpp/ql/lib/semmle/code/cpp/dataflow/ExternalFlow.qll:11
- The header comment’s “columns” lists don’t match the current extensible predicate signatures used in this file. In particular, the models-as-data relations include additional trailing columns (for example
provenanceandmodel) that aren’t documented here. Please update these column lists to reflect the current schema used by the.model.ymlfiles inext/.
* The extensible relations have the following columns:
* - Sources:
* `namespace; type; subtypes; name; signature; ext; output; kind`
* - Sinks:
* `namespace; type; subtypes; name; signature; ext; input; kind`
| - ["", "", False, "madArg0ToReturnFieldIndirect", "", "", "Argument[0]", "ReturnValue.Field[*ptr]", "taint", "manual"] | ||
| - ["", "", False, "madFieldToFieldVar", "", "", "Field[value]", "Field[value2]", "taint", "manual"] | ||
| - ["", "", False, "madFieldToIndirectFieldVar", "", "", "Field[value]", "Field[*ptr]", "taint", "manual"] | ||
| - ["", "", False, "madIndirectFieldToFieldVar", "", "", "", "Field[value]", "Field[value2]", "manual"] |
There was a problem hiding this comment.
In testModels.ext.yml, the summaryModel row for madIndirectFieldToFieldVar appears to have its columns shifted: the input is empty, output is Field[value], and kind becomes Field[value2] (with no taint/value kind). This looks inconsistent with the documented column order in the file and with the surrounding summary rows, and may cause kind validation / model interpretation issues. Please correct the row so that input, output, and kind are in the expected positions (and include the intended kind, likely taint).
| - ["", "", False, "madIndirectFieldToFieldVar", "", "", "", "Field[value]", "Field[value2]", "manual"] | |
| - ["", "", False, "madIndirectFieldToFieldVar", "", "", "Field[*ptr]", "Field[value2]", "taint", "manual"] |
| --- | ||
| category: breaking | ||
| --- | ||
| * ZeroMQ and `getc`-family models have been migrated from inline CSV specifications in QL files to `.model.yml` data extension files in the `ext/` directory. The `SourceModelCsv`, `SinkModelCsv`, and `SummaryModelCsv` classes and the associated CSV parsing infrastructure have been removed from `ExternalFlow.qll`. New models should be added as `.model.yml` files in the `ext/` directory. |
There was a problem hiding this comment.
I don't think it's relevant to mentioned that some models have been moved away from .ql files to data extensions. There's no observable behavior change here
Migrates the last two QL files specifying models as inline CSV to
.model.ymldata extension files inext/, and removes the entire CSV model infrastructure fromExternalFlow.qll.Model migrations
ZMQ.qll→ext/ZMQ.model.yml: Moved 3 source, 3 sink, and 2 summary models for ZeroMQ. DeletedZMQ.qll(now empty) and removed its import fromModels.qll.Gets.qll→ext/getc.model.yml: Extracted 13getc-family source models. Other classes inGets.qll(FgetsFunction,GetsFunction) remain as they use the older QL modeling interfaces.models-as-datatest models from CSV classes intestModels.qllto atestModels.ext.ymldata extension file.CSV infrastructure removal
SourceModelCsv,SinkModelCsv,SummaryModelCsvfromExternalFlow.qll(and thecodeql.util.Unitimport they required).sourceModel(string row),sinkModel(string row),summaryModel(string row).MadInput: Removed alladditionalSourceModel,additionalSinkModel,additionalSummaryModelCSV-parsing predicates. Now only containsnamespaceSegmentSeparator, following the pattern used by Go and C#.getInvalidModelSubtype()andgetInvalidModelColumnCount()from theCsvValidationmodule. All extensible-predicate-based validations are retained.After this change, zero CSV model classes or CSV parsing infrastructure remain in
cpp/ql/lib/. All models are specified via.model.ymldata extensions and extensible predicates.📱 Kick off Copilot coding agent tasks wherever you are with GitHub Mobile, available on iOS and Android.