To get started with developing SQL queries for using the SQL Query Export, you will need to have familiarity with basic SQL query syntax. The following concepts will be key to writing basic queries against a relational, dimensional model:
- Basic SQL syntax, including:
- JOINs, particularly natural joins (you will rarely need to use an outer join)
- GROUP BY expressions, with built in aggregates such as MIN, MAX, COUNT, etc.
- PostgreSQL 9.0 SQL syntax, if you need non-standard SQL syntax
- PostgreSQL functions and operators (for translating and formatting data, particularly numeric and date outputs)
- Basic dimensional modeling and data warehousing concepts, the Kimball Group Dimensional Modeling Techniques page is a great resource
Step 1: Read the Manual
Yes, there is documentation. Yes, you will need to read it. The Reporting Data Model that the SQL Query Export is built on provides an Application Programming Interface (API) through a set of relational tables and functions. As a result, the facts and dimensions in this model have well-defined documentation for their names, data types and relationships. Open your in-product help links or the Nexpose User's Guide and read the Creating reports based on SQL queries topic. The documentation is split into four parts:
- Introduction - gives a basic overview of dimensional modeling, terms, data types, and a quick primer to writing your SQL queries.
- Facts - provides the API for all facts exposed within the dimensional model, including the name and type of each column, with a graphical star/snowflake dimensional schema for each fact table
- Dimensions - provides the API for all dimensions exposed with the dimensional model,
- Functions - provides the API for several helper functions that will be useful to write more complex queries, particularly historical and trending queries.
Step 2: Run a Sample Query
Several example queries are built into the product. Try running a report with one of these samples to get familiar with the process. Then take a query and try to customize it.
Step 3: Write Your First Query
Finally, jump in. The best way to learn is by doing. If you get stuck, have trouble, or still don't know where to start, post a discussion question or search for help in the Nexpose Reporting side street.
The following are some simple best practices and design tips to keep in mind when writing your SQL queries:
You typically will only need to use natural JOINs when writing your queries. There may be rare cases when a left, right, or outer join is required, but this will typically not be the case. When performing a JOIN, you will notice that the JOIN should always be made with the same column name. All primary keys and foreign keys that should be joined have the same column name. This helps find simple joining bugs where you accidentally join a column from one table to another, but there is in fact no relationship to join on (SQL doesn't prevent you from doing this). This also make using the USING keyword simple and easy.
PostgreSQL supports the USING expression and it is highly recommend. This short-hand notation for JOIN will simplify the queries, and make them much more readable. Our examples will use them, so we recommend your queries do as well.
Use clear, verbose, and readable aliases for column names. Not only does this make the output more clear in the CSV, but it helps you as the SQL designer to see what the output should be. For example, if your output contains an asset name and a site name column, don't use the "name" alias twice. Instead, prefer names like "site_name" and "asset_name" to distinguish them. When aliasing table names, we tend to prefer using the first letter of each table name. So fact_asset_vulnerability_finding is aliased to favf. Feel free to use your own style, but it should be clear and easy to read.
You will notice that our examples use what we think is a clear, readable style. This includes indentation, and upper-casing SQL keywords. Using style like this can really help the readability, particularly for more complex queries with nesting and sub-selects.
The WITH expression can also be used to enhance style. For particularly complex and long queries, the WITH expression can be used to divide a large query into smaller, more logical pieces with clear names. In some cases, the WITH expression can also be used to optimize the performance of a query if the same table is referenced multiple times.
We strongly recommend you build your queries in small iterations. Typically in a dimensional model, you'll start with the fact table. This will usually answer a business question (e.g. what vulnerabilities do I have in my site?). From there, try to format, manipulate and translate your numerical data first. Then move on to dimensional data. Join the dimension tables as required one by one, and add filtering, grouping and ordering as necessary.
Building queries with the scope of all your assets can be time consuming and inefficient. Unless you specify a scope filter, the query will run against all assets you have permissions to access. When building a query, use the scope filters in the report configuration to reduce the size of your results so the response time is fast in your development and testing cycle. Here are the basic steps:
- Create a new report configuration
- Select a subset of scope to test against (a single site, asset or asset group - the smaller the better)
- Edit, preview and save the contents of your query
- Modify your scope to your final preference
- Save the report and run it