Tired of searching Slack, GitHub, and Google Drive separately? Do it all at once in SQL

0
7
Tired of searching Slack, GitHub, and Google Drive separately? Do it all at once in SQL

You know the drill: The words you are looking for might be in Slack, or GitHub, or Google Drive, or Google Sheets, or Zendesk, or … the list goes on. Searching across these silos is a common frustration. It should be frictionless, and this Steampipe dashboard makes it so.

This wasn’t my first rodeo. I started this journey in 1996 and have revisited the idea periodically. In 2018 I wrote about a version that was the classic example of The Simplest Thing That Could Possibly Work: a web page that corrals the search URLs for various services and visits each in its own tab. As silly as that sounds it was helpful enough to get used a bit, and not just by me.

Of course I wanted to use the underlying APIs, normalize the results, and merge them into a common view. But the effort required to wrangle all the APIs made that project more trouble than it was worth. If you’ve done this kind of thing before you know that most services provide search APIs along with adapters for your preferred programming language. But each service will have its own way of calling the API, paginating results, and formatting them. Those differences create friction you need to overcome in order to work with the results in a consistent way.

When API wrangling becomes frictionless, though, many things become possible. Effective metasearch is one of them. Steampipe gets you out of the business of calling APIs, paginating results, and unpacking JSON objects. It calls the APIs for you and streams the results into database tables so you can focus entirely on working with the data. That solves the biggest problem you face when building a metasearch dashboard.

Converging on a schema

The next challenge is to bind search results to a common schema. SQL is a great environment in which to do that. The query that drives the dashboard shown in the screencast includes three stanzas that you don’t have to be a SQL wizard to write. They all follow the same pattern as this one for searching GitHub issues.

select
   'github_issue' as type,
   repository_full_name || ' ' || title as source,
   to_char(created_at, 'YYYY-MM-DD') as date,
   html_url as link,
   substring(body from 1 for 200) || '...' as content
from
   github_search_issue
where
   $1 ~ 'github_issue'
   and query = 'in:body in:comments org:github ' || $2
   limit $3

Items in blue are the names of columns in a database table—in this case github_search_issue, one of the tables made by Steampipe’s GitHub plugin. The Steampipe hub makes it easy to inspect the names and descriptions of the columns in the table, and shows you examples of how to use the information in the table.

Because fetching the data doesn’t require calling APIs and unpacking JSON, you can focus on higher-order search syntax, which is plenty to think about, along with the interesting (and fun!) challenge of mapping source columns to a common schema.

Items in red are the names of the columns that show up in the dashboard. For this dashboard we’ve decided each search result will map to these five columns: type, source, date, link, and content. SQL’s AS clause makes it easy for each stanza to rename its columns to match the schema.

The full query

Here’s the full query that drives the dashboard. There are three stanzas like the one above, each written as a CTE (common table expression) with parameters corresponding to input variables. And there’s almost nothing else! Each stanza queries an API-based table (slack_search, github_search_issue, googleworkspace_drive_my_file), selects (and maybe transforms) columns, then aliases the results to match the schema. All that’s left is to UNION the three CTEs, which act like temporary tables, and order the results.

with slack as (
  select
    'slack' as type,
    user_name || ' in #' || (channel ->> 'name')::text as source,
    to_char(timestamp, 'YYYY-MM-DD') as date,
    permalink as link,
    substring(text from 1 for 200) as content
  from
    slack_search
  where
    $1 ~ 'slack'
    and query = 'in:#steampipe after:${local.config.slack_date} ' || $2
  limit $3
),
github_issue as (
  select
    'github_issue' as type,
    repository_full_name || ' ' || title as source,
    to_char(created_at, 'YYYY-MM-DD') as date,
    html_url as link,
    substring(body from 1 for 200) || '...' as content
  from
    github_search_issue
  where
    $1 ~ 'github_issue'
    and query = ' in:body in:comments org:${local.config.github_org} ' || $2
  limit $3
),
gdrive as (
  select
    'gdrive' as type,
    replace(mime_type,'application/vnd.google-apps.','') as source,
    to_char(created_time, 'YYYY-MM-DD') as date,
    'https://docs.google.com/document/d/' || id as link,
    name as content
  from
    googleworkspace_drive_my_file
  where
    $1 ~ 'gdrive'
    and query = 'fullText contains ' || '''' || $2 || ''''
  limit $3
)

select * from slack
union 
select * from github_issue
union 
select * from gdrive

order by
  date desc

Dashboards as code

Many dashboard systems can work with this query. You can, for example, connect Metabase, or Tableau, or another Postgres client to Steampipe and build the same kind of interactive dashboard as shown here. You’d do that work in a low-code environment where widgets and settings are handled in a user interface. Steampipe’s dashboard subsystem takes a different approach informed by its infrastructure-as-code (IaC) roots. Queries against APIs should be expressed in SQL code that’s managed, like all other code, in version-controlled repositories. The dashboard widgets that display the results of those queries should likewise be expressed in code, and in this case the language is Terraform’s HCL.

Here’s the HCL definition of the metasearch dashboard. It declares three kinds of input block: sources (multi-select), search_term (text), and max_per_source (single-select, which is the default). You can do much more with the input block—notably, you can fill it with results from a SQL query, as shown in the documentation. That’s not needed here, though.

The table block uses the query defined above, and defines the parameters passed to it. The wrap argument ensures that columns with lots of text will be readable.

dashboard "metasearch" {

  input "sources" {
    title = "sources"
    type = "multiselect"
    width = 2
    option "slack" {} 
    option "github_issue" {}
    option "gdrive" {}
  }  

  input "search_term" {
    type = "text"
    width = 2
    title = "search term"
  }

  input "max_per_source" {
    title = "max per source"
    width = 2
    option "2" {}
    option "5" {}
    option "10" {}   
    option "20" {}
  }  

  table {
    title = "search slack + github + gdrive"
    query = query.metasearch
    args = [
      self.input.sources,
      self.input.search_term,
      self.input.max_per_source
    ]
    column "source" {
      wrap = "all"
    }
    column "link" {
      wrap = "all"
    }
    column "content" {
      wrap = "all"
    }
  }

}

Again there’s not much else to see here, nor should there be. Building dashboards as code shouldn’t require lots of complex code, and it doesn’t.

No wizardry required

Just as you don’t need to be a SQL wizard to create new subqueries, you also don’t need to be an HCL wizard to add them to the dashboard. Would you like to add sources? There are dozens of other plugins to choose from, with more added each month. They don’t all offer search but many do, and it’s easy to find them with (of course!) a Steampipe query.

select
  name
  html_url
from
  github_search_code
where
  query = 'search org:turbot org:francois2metz org:ellisvalentiner org:theapsgroup'
  and name ~ 'table'
  and name ~ 'search'
order by
  name

In the steampipe-samples repo we’ve included the code for the dashboard shown here, along with an extra search stanza for Zendesk that we removed when our trial account expired. Have fun extending this dashboard! If a search API you need isn’t already available, drop by our Slack community and let us know. Somebody might already be writing the plugin you need—or maybe you’d like to tackle that yourself. Every new plugin makes it possible for anyone who can work with basic HCL plus SQL to wield APIs like a pro and solve real problems.

Go to Publisher: InfoWorld
Author: