incava.org

tutorial

installing

glark requires the Ruby language to run. Ruby is included with recent Linux and BSD (including OS X) distributions. To check, run:

% ruby -v

If you get a "command not found" error, install Ruby using the package management tool for your operating system, such as APT, RPM, or Yum. Windows users can use the one-click installer instead, although please note that one of the primary features of glark -- highlighting matches -- is not supported by the Windows command prompt.

With Ruby installed, download an RPM or tarball of glark from incava.org. If the site is slow, you can alternatively use a mirror site at SourceForge.net.

Debian users can find a .deb file thanks to the very much appreciated work of Michael Ablassmeier.

Verify that glark is installed by running:

% glark -v

The output should read something like:

glark, version x.y.z
Written by Jeff Pace (jpace@incava.org).
Released under the Lesser GNU Public License.

If it does, we're ready to start. If not, check your path. In either case, webscrapers just got another email address.

running

basics

In its simplest form, glark is just grep plus fancy highlighting of Perl-compatible regular expressions (PCREs). To test out a few, go to the install directory of glark (most likely /usr/share/glark) and run:

% glark puts *.rb

If your terminal supports color highlighting (via ANSI escape codes) you'll see the matching lines, with "puts" highlighted as black text on a yellow background.
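For the curious, here is roughly what "highlighting via ANSI escape codes" amounts to, as a minimal Ruby sketch. The helper name highlight and the exact escape sequence (30 for a black foreground, 43 for a yellow background) are assumptions for illustration; glark's own code may well differ.

```ruby
# Hypothetical helper, not glark's actual code: wrap each match of `pattern`
# in ANSI escape codes for black text (30) on a yellow background (43),
# then reset the colors with code 0.
def highlight(line, pattern)
  line.gsub(pattern) { |match| "\e[30;43m#{match}\e[0m" }
end

puts highlight('puts "hello"', /puts/)
```

On a terminal without ANSI support, the escape sequences show up as literal characters instead of colors.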

OK, that's not very exciting. But try a regular expression:

% glark '\bfiles?\-[-\w]+' *.rb

This searches for strings matching the following:

  • A word boundary, followed by ...
  • The characters "file", followed by ...
  • Optionally, an 's', followed by ...
  • A dash, followed by ...
  • One or more occurrences of either word characters (a-z, A-Z, 0-9, or _), or dashes
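The breakdown above can be tried out directly in Ruby, since that is the regular expression engine glark relies on. This is only a sketch: the constant FILE_OPTION, the helper file_option_matches, and the sample strings are made up for illustration.

```ruby
# The same pattern as in the glark command above, written as a Ruby regexp.
FILE_OPTION = /\bfiles?\-[-\w]+/

# Hypothetical helper: return every substring of `text` the pattern matches.
def file_option_matches(text)
  text.scan(FILE_OPTION)
end

p file_option_matches("use --file-name or --files-from=list")
p file_option_matches("profile-x")  # no word boundary before "file"
p file_option_matches("file")       # no dash, so no match
```

Note that "profile-x" does not match, because the "o" before "file" means there is no word boundary at that point.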

Very exciting. The world of regular expressions is a vast and powerful one, and this isn't the time or place for a regular expression tutorial. Besides, another Jeff, Jeffrey Friedl, already literally wrote the book on regular expressions: Mastering Regular Expressions.

While waiting for the book to arrive, I suggest reading the Perl reference on regular expressions, which is the perlre man page. Yes, glark (thanks to Ruby) supports all of that ... every amazing PCRE that you can muster. And I think most will agree that there's nothing quite like a mustered regular expression.

Some examples? Okay. Here's one: find all variations of the word "Google" in some text files, handling such words as "Google", "Googol", and "gooooogle":

% glark -i 'goo+g(led?|ol)' *.txt
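To see which spellings that expression accepts, here is a small Ruby sketch using the same pattern with the case-insensitive modifier standing in for -i. The constant GOOGLE, the helper google_like?, and the word list are assumptions for illustration.

```ruby
# Same pattern as the glark command above; the /i modifier plays the role of -i.
GOOGLE = /goo+g(led?|ol)/i

# Hypothetical helper: does `word` contain a Google-like spelling?
def google_like?(word)
  word.match?(GOOGLE)
end

%w[Google Googol gooooogle googled gogle Goggle].each do |word|
  puts "#{word}: #{google_like?(word)}"
end
```

The last two words fail because the pattern requires at least two "o" characters before the second "g".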

Oftentimes one is searching for whole words, useful for finding variable names such as "i", which happens also to be a letter that is often found in English, and other words besides "English". To find an expression that is a whole word -- that is, there are word boundaries on either side of it -- use the option "-w", as follows ...

% glark -w i *.java

... and you will probably find every "for" loop in the code, depending on the creativity of the programmers involved. Professional typists who enjoy showing their skills via longer option names can alternatively use "--word" (and "--word-regexp", to match grep's option). glark also tries to figure out when an expression will probably not match what the user intends, such as:

% glark -w '-\s*1' *.c
WARNING: pattern '-\s*1' does not start on a word boundary

(That condition is actually matched by a regular expression (see share/glark/regexp.rb). I'm quite proud of that one, so I beseech you to take a look, but punish me not with your hard thoughts.)
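A rough way to picture both -w and that warning, in Ruby: assume that -w behaves like wrapping the pattern in \b on both sides. That assumption, the helper name whole_word, and the sample lines are mine, not taken from glark's source.

```ruby
# Assumption: -w is roughly equivalent to surrounding the pattern with \b.
# Hypothetical helper, not glark's actual code.
def whole_word(pattern)
  /\b(?:#{pattern})\b/
end

# "i" as a whole word: found in the loop, not inside "int", "limit" or "width".
p whole_word("i").match?("for (int i = 0; i < n; i++)")
p whole_word("i").match?("int limit = width;")

# '-\s*1' starts with a non-word character, so \b before it only holds when
# a word character comes right before the dash.
p whole_word('-\s*1').match?("x = -1")
p whole_word('-\s*1').match?("n-1")
```

Under that assumption, "x = -1" is not found, which is probably not what the user intended, hence the warning.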

Another handy shortcut is "-i", or in long form, "--ignore-case", which makes the regular expression case insensitive.

% glark -i stringio *.rb

Regular expressions can be defined with leading and trailing slashes, such as ...

% glark '/bates/' psycho.txt

... and common modifiers can be used too, for case insensitivity, multiline matching, and extended regular expressions:

% glark '/norman.bates/i' psycho.txt
% glark '/norman.bates/m' psycho.txt
% glark '/ norman . bates /x' psycho.txt
% glark '/ norman . bates /mix' psycho.txt
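Here is what those modifiers do in plain Ruby, as a sketch. The constant names and sample strings are assumptions for illustration, and this shows only Ruby's regexp semantics, not how glark feeds file contents to the expression. Note that in Ruby, /m makes the dot match a newline as well.

```ruby
CASE_INSENSITIVE = /norman.bates/i        # i: ignore case
MULTILINE        = /norman.bates/m        # m: in Ruby, "." also matches "\n"
EXTENDED         = / norman . bates /x    # x: whitespace in the pattern is ignored
ALL_THREE        = / norman . bates /mix  # all of the above combined

p CASE_INSENSITIVE.match?("Norman Bates")
p MULTILINE.match?("norman\nbates")
p EXTENDED.match?("norman bates")
p ALL_THREE.match?("NORMAN\nBATES")

# Without any modifier, neither the capitals nor the newline would match.
p(/norman.bates/.match?("Norman Bates"))
p(/norman.bates/.match?("norman\nbates"))
```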

complex expressions

glark (in modesty, I refuse to capitalize the name, even at the beginning of a sentence) offers something beyond PCREs ... complex expressions. These take regular expressions and allow them to be combined, so that searches can be done as alternatives ("or expressions"):

% glark --or puts print *.rb

which searches for occurrences of "puts" or "print" in the local .rb files. Note that the highlighting is different for "puts" and "print". If that's too colorful, you can go to single-color highlighting by passing the option "--highlight=single":

% glark --highlight=single --or puts print *.rb

To disable highlighting, pass the value "none" as the argument to --highlight:

% glark --highlight=none --or puts print *.rb

Back to complex expressions, you can use combinations of and, or and xor. And these nest, so that you can write:

% glark --and=4 this --or that those

which searches for "this" within four lines of a line containing "that" or "those". The default distance of the --and option is zero, so that the following:

% glark --and this --or that those

searches for "this" on the same line as "that" or "those".
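The distance rule of --and can be pictured with a short Ruby sketch. It is only an illustration of the behavior described here: the helper and_matches, the sample lines, and the reading of "within four lines" as "at most four lines apart" are assumptions, and glark's real implementation is not shown.

```ruby
# Hypothetical helper: return pairs of 0-based line indexes [i, j] where
# lines[i] matches `left`, lines[j] matches `right`, and the two lines are
# at most `distance` lines apart (assumed meaning of "within N lines").
def and_matches(lines, left, right, distance = 0)
  left_idx  = lines.each_index.select { |i| lines[i] =~ left }
  right_idx = lines.each_index.select { |j| lines[j] =~ right }
  left_idx.product(right_idx).select { |i, j| (i - j).abs <= distance }
end

# Line 0 has "this", line 3 has "that", line 9 has both "this" and "those".
LINES = ["this one", "filler", "filler", "that one"] +
        ["filler"] * 5 + ["this and those"]

p and_matches(LINES, /this/, /that|those/, 4)  # like --and=4 this --or that those
p and_matches(LINES, /this/, /that|those/)     # like --and this --or that those
```

With a distance of zero, only the last line qualifies, since both sides must then match on the same line.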

And even xor is supported:

% glark --xor initialize new *.rb

which searches for all lines on which either "initialize" or "new" occurs, but not both.
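As a sketch of that "either, but not both" rule in Ruby (the helper xor_lines and the sample lines are made up for illustration, and this is not glark's own code):

```ruby
# Hypothetical helper: keep the lines where exactly one of the two
# patterns matches.
def xor_lines(lines, first, second)
  lines.select { |line| line.match?(first) ^ line.match?(second) }
end

SAMPLE = [
  "def initialize(name)",
  "widget = Widget.new",
  "def initialize; @w = Widget.new; end",
  "puts widget"
]

puts xor_lines(SAMPLE, /initialize/, /new/)
```

The third line is dropped because it contains both words, and the fourth because it contains neither.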

input files

By default, glark will automatically skip files that are not text. Whether a file is text is determined by its extension; if the extension is not already associated with a type, the file contents are examined. This makes it easy to skip binary files:

% glark -w sprintf *

This will search, for example, all .h and .c files, but skip .o and .a files.

This can be fine-tuned on a project-by-project basis, to make it possible to skip .class files in Java projects (where .class files are binary files), yet search them in PHP projects (where .class is used as the extension of PHP source files). This is done by local .glarkrc files, which override system-wide options set in ~/.glarkrc. When glark starts running, ~/.glarkrc is read, and if this file sets the option local-config-files to true, then .glarkrc files are read from the top of the file system hierarchy down to the current directory. Note that this option is set to false by default, since reading these files consumes a fair amount of time.

So to enable this, first set up /home/you/.glarkrc as:

local-config-files: true

And if you have a Java project, in /home/you/projects/javaapp, you can write /home/you/projects/javaapp/.glarkrc as:

known-nontext-files: class

And in your PHP project, as /home/you/projects/phpapp/.glarkrc:

known-text-files: class

So now, in /home/you/projects/javaapp, running:

% glark -r print .

will skip .class files, whereas the same command, run in /home/you/projects/phpapp, will include .class files in those searched.
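The order in which local .glarkrc files would be looked for, from the top of the file system down to the current directory, can be sketched in Ruby. The helper name local_config_candidates is hypothetical, and the sketch only lists candidate paths; it does not read or merge any files the way glark does.

```ruby
require "pathname"

# Hypothetical helper: list the .glarkrc candidates for `dir`, ordered from
# the top of the file system hierarchy down to `dir` itself.
def local_config_candidates(dir)
  candidates = []
  Pathname.new(dir).descend { |d| candidates << (d + ".glarkrc").to_s }
  candidates
end

puts local_config_candidates("/home/you/projects/javaapp")
```

Checking each of those locations on every run is presumably part of why the option costs time and is off by default.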

And that introduces the feature that directories can be specified, and if glark is run with -r (or --directories=recurse), the directories will be searched recursively. Yes, this can be done with find ... | xargs glark ..., but that can result in xargs passing only one file in a set. Also, when find produces no files, glark will hang, because it has been run with no file arguments and thus expects to read standard input. Running glark with -r does not result in that hang.

When specifying a list of files, it can be helpful to exclude or include file names based on patterns. The simplest form of support for this is via the --exclude-matching option, but much more power and flexibility is offered via the options combining --with- and --without- with "basename" and "fullname". Note that there is some redundancy in the option names, with multiple options provided for parallelism with other options. There are two modes, "with" and "without", and two types of file names, "base" and "full" names.

--basename --with-basename --name --with-name
--fullname --path --with-fullname --with-path
--without-basename --without-name
--without-fullname --without-path

Some examples:

To find "cout" in all files in or under a directory named "src" or "source":

% glark -r --path='\/(src|source)\/' cout .

To skip any files in or under a directory named ".svn" (Subversion):

% glark -r --without-path='\/\.svn\b' cout .
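To check what those two path expressions accept before running them over a whole tree, here is a small Ruby sketch. The constants, helper names, and sample paths are made up for illustration; only the two patterns come from the commands above.

```ruby
# The two patterns from the commands above, as Ruby regexps.
SOURCE_DIR = /\/(src|source)\//
SVN_DIR    = /\/\.svn\b/

# Hypothetical helpers: would a full path pass each filter's pattern?
def in_source_dir?(path)
  path.match?(SOURCE_DIR)
end

def in_svn_dir?(path)
  path.match?(SVN_DIR)
end

p in_source_dir?("./src/main.cpp")
p in_source_dir?("./project/source/io/file.cpp")
p in_source_dir?("./resource/icons.cpp")  # "/resource/" is not "/source/"
p in_svn_dir?("./src/.svn/entries")
p in_svn_dir?("./.svnignore")             # \b fails between "svn" and "i"
```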

the end ... so far

... a work in progress ... check back for updates ...
