glark - Search text files for complex regular expressions
glark [options] expression file ...
Similar to grep, glark offers: Perl-compatible regular expressions, color
highlighting of matches, context around matches, complex expressions (``and'' and
``or''), grep output emulation, and automatic exclusion of non-text files. Its
regular expressions should be familiar to persons experienced in Perl, Python,
or Ruby. File may also be a list of files in the form of a path.
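The handling of a path-style argument (see the --split-as-path option) can be pictured with a short Python sketch. This is an illustration only, not glark's implementation; the function name split_as_path and the choice to drop empty entries are assumptions.

```python
import os


def split_as_path(argument, split=True, separator=os.pathsep):
    """Split one command-line argument into a list of file names.

    split=True mirrors the documented default (--split-as-path=true);
    split=False mimics --no-split-as-path. Dropping empty entries is an
    assumption, not documented behavior.
    """
    if not split or separator not in argument:
        return [argument]
    return [part for part in argument.split(separator) if part]
```

With this sketch, split_as_path("/usr/bin:/bin", separator=":") yields two entries, while the same argument with split=False is kept as a single name.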
- -0[nnn]
-
Use \nnn (octal) as the input record separator. If nnn is omitted, use '\n\n' as
the record separator, which treats paragraphs as lines.
- -d ACTION, --directories=ACTION
-
Directories are processed according to the given ACTION, which by default is
read. If ACTION is recurse, each file in the directory is read and each
subdirectory is recursed into (equivalent to the -r option). If ACTION is
skip, directories are not read, and no message is produced.
- --binary-files=TYPE
-
Specify how to handle binary files, thus overriding the default behavior, which
is to denote the binary files that match the expression, without displaying the
match. TYPE may be one of:
binary, the default; without-match, which
results in binary files being skipped; and text, which results in the binary
file being treated as text, the display of which may have bad side effects with
the terminal. Note that the default behavior has changed; this previously was to
skip binary files. The same effect may be achieved by setting binary-files to
without-match in the ~/.glarkrc file.
- --basename EXPR, --name EXPR
-
Search only files whose names match the given regular expression. As in find(1),
this works on the basename of the file. This expression can be negated and
modified with
i, such as '/io\.[hc]$/i'.
- --fullname EXPR, --path EXPR
-
Search only files whose names, including path, match the given regular
expression. As in find(1), this works on the path of the file. This expression
can be negated and modified with
i, such as '/source/.*/ui/.*widget\.java/i'.
- -M, --exclude-matching
-
Do not search files whose names match the given expression. This can be useful
for finding external references to a file, or to a class (assuming that class
names match file names).
- -r, --recurse
-
Recurse through directories. Equivalent to --directories=recurse.
- --split-as-path(=VALUE), --no-split-as-path
-
Sets whether, if a command line argument includes the path separator (such as
``:''), the argument should be split by the path separator. This functionality is
useful for using environment variables as input, such as $PATH and $CLASSPATH,
which are automatically split and processed as a list of files and directories.
The default value of this option is ``true''. --no-split-as-path is equivalent
to --split-as-path=false.
- --size-limit=SIZE
-
If provided, only files no larger than SIZE will be searched. This is useful
when running the
--recurse option on directories that may contain large files.
- -a NUM expr1 expr2
- --and NUM expr1 expr2
- --and=NUM expr1 expr2
-
Match both of the two expressions, within NUM lines of each other. See the
EXPRESSIONS section for more information.
- -b NUM[%], --before NUM[%]
-
Restrict the search to before the given location, which represents either the
number of the last line within the valid range, or the percentage of lines to be
searched.
- --after NUM[%]
-
Restrict the search to after the given location, which represents either the
number of the first line within the valid range, or the percentage of lines to
be skipped.
- -f FILE, --file=FILE
-
Use the lines in the given file as expressions. Each line consists of a regular
expression.
- -i, --ignore-case
-
Match regular expressions without regard to case. The default is
case sensitive.
- -m NUM, --match-limit NUM
-
Find only the first NUM matches in each file.
- -o expr1 expr2
- --or expr1 expr2
-
Match either of the two expressions. See the EXPRESSIONS section for more
information.
- -R, --range NUM[%] NUM[%]
-
Restrict the search to the given range of lines.
- -v, --invert-match
-
Show lines that do not match the expression.
- -w, --word, --word-regexp
-
Put word boundaries around each pattern, thus matching only where
the full
word(s) occur in the text. Thus, glark -w Foo is the same
as glark '/\bFoo\b/'.
- -x, --line-regexp
-
Select only where the entire line matches the pattern(s).
- --xor expr1 expr2
-
Match either of the two expressions, but not both. See the EXPRESSIONS section
for more information.
- -A NUM, --after-context=NUM
-
Print NUM lines after a matched expression.
- -B NUM, --before-context=NUM
-
Print NUM lines before a matched expression.
- -C [NUM], -NUM, --context[=NUM]
-
Output NUM lines of context around a matched expression. The default is no
context. If no NUM is given for this option, the number of lines of context
is 2.
- -c, --count
-
Instead of normal output, display only the number of matches in each file.
- -F, --file-color COLOR
-
Specify the highlight color for file names. See the HIGHLIGHTING section for
the values that can be used.
- --no-filter
-
Display the entire file(s), presumably with matches highlighted.
- -g, --grep
-
Produce output like the grep default: file names, no line numbers, and a single
line of the match, which will be the first line for matches that span multiple
lines. If the EMACS environment variable is set, this value is set to true.
Thus, running glark under Emacs results in the output format expected by Emacs.
- -h, --no-filename
-
Do not display the names of the files that matched.
- -H, --with-filename
-
Display the names of the files that matched. This is the default
behavior.
- -l, --files-with-matches
-
Print only the names of the files that matched the expression.
- -L, --files-without-match
-
Print only the names of the files that did not match the expression.
- --label=NAME
-
Use NAME as output file name. This is useful when reading from standard input.
- -n, --line-number
-
Display the line numbers. This is the default behavior.
- -N, --no-line-number
-
Do not display the line numbers.
- --line-number-color
-
Specify the highlight color for line numbers. This defaults to none (no
highlighting). See the HIGHLIGHTING section for more information.
- -T, --text-color COLOR
-
Specify the highlight color for text. See the HIGHLIGHTING section for more
information.
- --text-color-NUM COLOR
-
Specify the highlight color for the regular expression capture NUM. Colors are
used by regular expressions in the order they are created (that is, with the
--and and --or options), or with captures within a regular expression (such
as '/(this)|(that)/'). See the HIGHLIGHTING section for more information.
- -u, --highlight=[FORMAT]
-
Enable highlighting. This is the default behavior. Format is ``single'' (one
color) or ``multi'' (different color per regular expression). See the HIGHLIGHTING
section for more information.
- -U, --no-highlight
-
Disable highlighting.
- -y, --extract-matches
-
Display only the region that matched, not the entire line. If the expression
contains ``backreferences'' (i.e., expressions bounded by ``( ... )''), then only
the portion captured will be displayed, not the entire line. This option is
useful with
-g, which eliminates the default highlighting and display of file
names.
- -Z, --null
-
When in -l mode, write file names followed by the ASCII NUL character ('\0')
instead of '\n'.
- -?, --help
-
Display the help message.
- --config
-
Display the settings glark is using, and exit. Since this is run after
configuration files are read, this may be useful for determining values of
configuration parameters.
- --explain
-
Write the expression in a more legible format, useful for debugging.
- -q, -s, --quiet, --no-messages
-
Suppress warnings.
- -Q, --no-quiet
-
Enable warnings. This is the default.
- -V, --version
-
Display version information.
- --verbose
-
Display normally suppressed output, for debugging purposes.
Regular expressions are expected to be in the Perl/Ruby format. perldoc
perlre has more general information. The expression may be of either form:
something
/something/
There is no difference between the two forms, except that with the latter, one
can provide the ``ignore case'' modifier, thus matching ``someThing'' and
``SoMeThInG'':
% glark /something/i
Note that this is redundant with the -i (--ignore-case) option.
All regular expression characters and options are available, such as ``\w'',
``.*?'' and ``[^9]''. For example:
% glark '\b[a-z][^\d]\d{1,3}.*\s*>>\s*\d+\s*.*& +\d{3}'
If the --and and --or options are not used, the first non-option argument is
considered to be the expression to be matched. In the following, ``printf'' is
used as the expression:
% glark -w printf *.c
POSIX character classes (e.g., [[:alpha:]]) are also supported.
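The effect of the -w, -x and -i options on a simple expression can be sketched in Python. This is an approximation under stated assumptions: Python's re module stands in for the Perl/Ruby regular expressions glark uses (the two are similar but not identical), and build_pattern is a hypothetical helper name, not part of glark.

```python
import re


def build_pattern(expr, word=False, line=False, ignore_case=False):
    """Compile expr roughly the way -w, -x and -i would modify it."""
    if word:
        # -w: word boundaries around the pattern, as in '/\bFoo\b/'.
        # The non-capturing group keeps alternations such as 'a|b' intact.
        expr = r"\b(?:" + expr + r")\b"
    if line:
        # -x: the entire line must match the pattern.
        expr = "^(?:" + expr + ")$"
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(expr, flags)
```

For example, build_pattern("printf", word=True) finds ``printf'' as a whole word, but does not match inside ``sprintf''.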
Complex expressions combine regular expressions (and complex expressions
themselves) with logical AND, OR, and XOR operators.
- -o expr1 expr2
- --or expr1 expr2 --end-of-or
-
Match either of the two expressions. The results of the two forms are
equivalent. In the latter syntax, the --end-of-or is optional.
- -a number expr1 expr2
- --and number expr1 expr2 --end-of-and
-
Match both of the two expressions, within <number> lines of each other. As with
the --or option, the results of the two forms are equivalent, and the
--end-of-and is optional. The forms -aNUM and --and=NUM are also
supported.
-
If the number provided is -1 (negative one), the distance is considered to be
``infinite'', and thus, the condition is satisfied if both expressions match
within the same file.
-
If the number provided is 0 (zero), the condition is satisfied if both
expressions match on the same line.
-
A warning will be issued if the value given in the number position does not
appear to be numeric.
- --xor expr1 expr2 --end-of-xor
-
Match either of the two expressions, but not both.
--end-of-xor is optional.
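The distance rule for --and, and the line-level meaning of --or and --xor, can be sketched in Python as follows. This is a minimal model of the semantics described above, not glark's code: the function names are hypothetical, Python's re module approximates the regular expression dialect, and treating --xor line by line is an assumption.

```python
import re


def matching_lines(pattern, lines):
    """Return the 1-based numbers of the lines that match pattern."""
    regex = re.compile(pattern)
    return [n for n, line in enumerate(lines, start=1) if regex.search(line)]


def and_match(lines, expr1, expr2, distance):
    """Pairs (line of expr1, line of expr2) within distance lines.

    distance 0 means the same line; a negative distance (-1) means
    "infinite", that is, anywhere within the same file.
    """
    first = matching_lines(expr1, lines)
    second = matching_lines(expr2, lines)
    return [(a, b) for a in first for b in second
            if distance < 0 or abs(a - b) <= distance]


def or_match(lines, expr1, expr2):
    """Lines matching either expression."""
    return sorted(set(matching_lines(expr1, lines)) |
                  set(matching_lines(expr2, lines)))


def xor_match(lines, expr1, expr2):
    """Lines matching exactly one of the two expressions."""
    return sorted(set(matching_lines(expr1, lines)) ^
                  set(matching_lines(expr2, lines)))
```

Nested complex expressions would combine these building blocks; the sketch handles only two simple expressions.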
Regular expressions can be negated, by being prefixed with '!', and using the
'/' quote characters around the expression, such as:
!/expr/
This has the effect of ``match anything other than this''. For a single
expression, this is no different than the -v/--invert-match option, but it can
be useful in complex expressions, such as:
--and 0 this '!/that/'
which means: match any line that has ``this'', but not ``that''.
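The expression forms described so far (plain text, /expr/, /expr/i and !/expr/) can be told apart with a small parser. The Python sketch below is one possible reading of those rules, not glark's parser; parse_expression and line_matches are hypothetical names, and Python's re module only approximates the Perl/Ruby syntax.

```python
import re

# An optional '!', a slash-delimited body, and an optional 'i' modifier.
# The closing slash must not be escaped with a backslash.
DELIMITED = re.compile(r"(!?)/(.*)(?<!\\)/(i?)", re.DOTALL)


def parse_expression(text):
    """Return (compiled regex, negated) for one expression argument."""
    found = DELIMITED.fullmatch(text)
    if found is None:
        # No balanced slashes: the whole argument is the pattern.
        return re.compile(text), False
    negation, body, modifier = found.groups()
    flags = re.IGNORECASE if modifier else 0
    return re.compile(body, flags), negation == "!"


def line_matches(text, line):
    """True if line satisfies the (possibly negated) expression."""
    regex, negated = parse_expression(text)
    return (regex.search(line) is not None) != negated
```

An argument without balanced slashes, such as /foo, falls through to the plain form, so its slash is matched literally.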
Matching patterns and file names can be highlighted using ANSI escape sequences.
Both the foreground and the background colors may be specified, from the
following:
black
blue
cyan
green
magenta
red
white
yellow
The foreground may have any number of the following modifiers applied:
blink
bold
concealed
reverse
underline
underscore
The format is ``MODIFIERS FOREGROUND on BACKGROUND''. For example:
red
black on yellow (the default for patterns)
reverse bold (the default for file names)
green on white
bold underline red on cyan
By default text is highlighted as black on yellow. File names are written in
reversed bold text.
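To make the ``MODIFIERS FOREGROUND on BACKGROUND'' format concrete, the following Python sketch translates such a string into standard ANSI SGR codes. The numeric codes are the usual ANSI values; the function names are hypothetical, and the sketch does not claim to reproduce glark's own parsing or error handling.

```python
# Foreground colors are 30-37 and background colors 40-47, in this order.
COLORS = ["black", "red", "green", "yellow",
          "blue", "magenta", "cyan", "white"]
MODIFIERS = {"bold": 1, "underscore": 4, "underline": 4,
             "blink": 5, "reverse": 7, "concealed": 8}


def ansi_codes(spec):
    """Translate e.g. 'bold red on cyan' into a list of SGR codes."""
    codes = []
    background = False
    for word in spec.lower().split():
        if word == "on":
            background = True          # the next color is the background
        elif word in MODIFIERS:
            codes.append(MODIFIERS[word])
        elif word in COLORS:
            codes.append((40 if background else 30) + COLORS.index(word))
        else:
            raise ValueError("unknown color or modifier: " + word)
    return codes


def highlight(text, spec):
    """Wrap text in the escape sequence for spec, then reset."""
    codes = ";".join(str(code) for code in ansi_codes(spec))
    return "\033[" + codes + "m" + text + "\033[0m"
```

For instance, ansi_codes("black on yellow") gives the codes 30 and 43, the default for patterns.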
- % glark format *.h
-
Searches for ``format'' in the local .h files.
- % glark --ignore-case format *.h
-
Searches for ``format'' without regard to case. Short form:
% glark -i format *.h
- % glark --context=6 format *.h
-
Produces 6 lines of context around any match for ``format''. Short forms:
% glark -C 6 format *.h
% glark -6 format *.h
- % glark --exclude-matching Object *.java
-
Find references to ``Object'', excluding the files whose names match ``Object''.
Thus, SessionBean.java would be searched; EJBObject.java would not. Short form:
% glark -M Object *.java
- % glark --grep --extract-matches '\w+\.printStackTrace\(.*\)' *.java
-
Show where exceptions are dumped. Note that the
--grep option is used, thus
turning off highlighting and display of file names. If the --no-filename
option is used, the output will consist of only the matching portions. The short
form of this command is:
% glark -gy '\w+\.printStackTrace\(.*\)' *.java
- % glark --grep --extract-matches '(\w+)\.printStackTrace\(.*\)' *.java
-
Show only the variable name of exceptions that are dumped. Short form:
% glark -gy '(\w+)\.printStackTrace\(.*\)' *.java
- % who | glark -gy '^(\S+)\s+\S+\s*May 15'
-
Display only the names of users who logged in on May 15.
- % glark -l '\b\w{25,}\b' *.txt
-
Display (only) the names of the text files that contain ``words'' at least 25
characters long.
- % glark --files-without-match '"\w+"'
-
Display (only) the names of the files that do not contain strings consisting of
a single word. Short form:
% glark -L '"\w+"'
- % for i in *.jar; do jar tvf $i | glark --label=$i Exception; done
-
Search the table of contents of each jar file for ``Exception'', using the jar
file name as the label in the output, since the input is read from standard
input.
- % glark --text-color "red on white" '\b[[:digit:]]{5}\b' *.c
-
Display (in red text on a white background) occurrences of exactly 5 digits.
Short form:
% glark -T "red on white" '\b\d{5}\b' *.c
See the HIGHLIGHTING section for valid colors and modifiers.
- % glark --or format print *.h
-
Searches for either ``format'' or ``print''. Short form:
% glark -o format print *.h
- % glark --and 4 printf format *.c *.h
-
Searches for both ``printf'' and ``format'' within 4 lines of each other. Short form:
% glark -a 4 printf format *.c *.h
- % glark --context=3 --and 0 printf format *.c
-
Searches for both ``printf'' and ``format'' on the same line (``within 0 lines of each
other''). Three lines of context are displayed around any matches. Short form:
% glark -3 -a 0 printf format *.c
- % glark -8 -i -a 15 -a 2 pthx '\.\.\.' -o 'va_\w+t' die *.c
-
(In order of the options:) Produces 8 lines of context around case-insensitive
matches of (``pthx'' within 2 lines of '...' (literal)) within 15 lines of (either
``va_\w+t'' or ``die'').
- % glark --and -1 '#define\s+YIELD' '#define\s+dTHR' *.h
-
Looks for ``#define\s+YIELD'' within the same file (-1 == ``infinite distance'') of
``#define\s+dTHR''. Short form:
% glark -a -1 '#define\s+YIELD' '#define\s+dTHR' *.h
- % glark --before 50% cout *.cpp
-
Find references to ``cout'', within the first half of the file. Short form:
% glark -b 50% cout *.cpp
- % glark --after 20 cout *.cpp
-
Find references to ``cout'', starting at the 20th line in the file.
- % glark --range 20 50% cout *.cpp
-
Find references to ``cout'', in the first half of the file, after the 20th line.
Short form:
% glark -R 20 50% cout *.cpp
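How a NUM or NUM% argument to --before, --after and --range might map to line numbers is sketched below in Python. The rounding of percentages (integer division) and the function names are assumptions made for illustration; the actual rounding used by glark is not stated here.

```python
def resolve_position(spec, total_lines):
    """Convert 'NUM' or 'NUM%' into a line count or line number."""
    if spec.endswith("%"):
        # Assumption: percentages are rounded down.
        return total_lines * int(spec[:-1]) // 100
    return int(spec)


def search_range(total_lines, after=None, before=None):
    """Return (first, last) line numbers to search, both inclusive."""
    first, last = 1, total_lines
    if after is not None:
        if after.endswith("%"):
            # A percentage gives the share of lines to be skipped.
            first = resolve_position(after, total_lines) + 1
        else:
            # A plain number is the first line within the valid range.
            first = int(after)
    if before is not None:
        # Either the last valid line, or the share of lines to search.
        last = min(total_lines, resolve_position(before, total_lines))
    return first, last
```

Under these assumptions, --range 20 50% on a 200-line file corresponds to search_range(200, after="20", before="50%"), that is, lines 20 through 100.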
- GLARKOPTS
-
A string of whitespace-delimited options. Due to parsing constraints, it should
probably not contain complex regular expressions.
- $HOME/.glarkrc
-
A resource file, containing name/value pairs, separated by either ':' or '='.
The valid fields of a .glarkrc file are as follows, with example values:
-
after-context: 1
before-context: 6
context: 5
file-color: blue on yellow
highlight: off
ignore-case: false
quiet: yes
text-color: bold reverse
line-number-color: bold
verbose: false
grep: true
-
``yes'' and ``on'' are synonymous with ``true''. ``no'' and ``off'' signify ``false''.
-
My ~/.glarkrc file is the following:
-
file-color: bold reverse
text-color: bold black on yellow
context: 2
highlight: on
verbose: false
ignore-case: false
quiet: yes
word: false
binary-files: without-match
- local .glarkrc
-
See the local-config-files field below.
- after-context
-
See the
--after-context option. Example, for 3 lines:
-
after-context: 3
- before-context
-
See the
--before-context option. Example, for 7 lines:
-
before-context: 7
- binary-files
-
See the
--binary-files option. Example, to skip binary files:
-
binary-files: without-match
- context
-
See the
--context option. Example, for 2 lines before and after matches:
-
context: 2
- expression
-
See the EXPRESSIONS section. Example:
-
expression: --or '^\s*public\s+class\s+\w+' '^\s*\w+\(
- file-color
-
See the
--file-color option. Example for white on black:
-
file-color: white on black
- filter
-
See the
--no-filter option. Example, to show the entire file:
-
filter: false
- grep
-
See the
--grep option. Example, to run in grep mode:
-
grep: true
- highlight
-
See the
--highlight option. To turn off highlighting:
-
highlight: false
- ignore-case
-
See the
--ignore-case option. To make matching case-insensitive:
-
ignore-case: true
- known-nontext-files
-
The extensions of files that should be considered to always be nontext (binary).
If a file extension is not known, the file contents are examined for nontext
characters. Thus, setting this field can result in faster searches. Example:
-
known-nontext-files: class exe dll com
-
See the Exclusion of Non-Text Files section in NOTES for the default
settings.
- known-text-files
-
The extensions of files that should be considered to always be text. See above
for more. Example:
-
known-text-files: ini bat xsl xml
-
See the Exclusion of Non-Text Files section in NOTES for the default
settings.
- local-config-files
-
By default, glark uses only the configuration file ~/.glarkrc. Enabling this
makes glark search upward from the current directory for the first .glarkrc
file.
-
This can be used, for example, in a Java project, where .class files are binary,
versus a PHP project, where .class files are text:
-
/home/me/.glarkrc
-
local-config-files: true
-
/home/me/projects/java/.glarkrc
-
known-nontext-files: class
-
/home/me/projects/php/.glarkrc
-
known-text-files: class
-
With this configuration, .class files will automatically be treated as binary
files in Java projects, and as text in PHP projects. This can speed up
searches.
-
Note that the configuration file ~/.glarkrc is read first, so the local
configuration file can override those settings.
- quiet
-
See the
--quiet option.
- show-break
-
Whether to display breaks between sections, when displaying context. Example:
-
show-break: true
-
By default, this is false.
- text-color
-
See the
--text-color option. Example:
-
text-color: bold blue on white
- verbose
-
See the
--verbose option. Example:
-
verbose: true
- verbosity
-
See the
--verbosity option. Example:
-
verbosity: 4
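The .glarkrc format described above (name/value pairs separated by ':' or '=', with ``yes''/``on'' and ``no''/``off'' as boolean synonyms) is simple enough to sketch in Python. This is an illustrative reader, not glark's own: the function names are hypothetical, and skipping lines without a separator is an assumption.

```python
import re

TRUE_WORDS = {"true", "yes", "on"}
FALSE_WORDS = {"false", "no", "off"}


def parse_glarkrc(text):
    """Parse name/value pairs separated by ':' or '=' into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first ':' or '=' only, so values may contain them.
        parts = re.split(r"\s*[:=]\s*", line, maxsplit=1)
        if len(parts) != 2:
            continue  # assumption: lines without a separator are ignored
        name, value = parts
        lowered = value.lower()
        if lowered in TRUE_WORDS:
            value = True
        elif lowered in FALSE_WORDS:
            value = False
        settings[name] = value
    return settings


def merge_settings(home, local):
    """~/.glarkrc is read first, so local values override it."""
    merged = dict(home)
    merged.update(local)
    return merged
```

Numeric values such as the context size are left as strings in this sketch; a fuller reader would convert them per field.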
Non-text files are automatically skipped, by taking a sample of the file and
checking for an excessive number of non-ASCII characters. For speed purposes,
this test is skipped for files whose suffixes are associated with text files:
c
cpp
css
h
f
for
fpp
hpp
html
java
mk
php
pl
pm
rb
rbw
txt
Similarly, this test is also skipped for files whose suffixes are associated
with non-text (binary) files:
Z
a
bz2
elc
gif
gz
jar
jpeg
jpg
o
obj
pdf
png
ps
tar
zip
See the known-text-files and known-nontext-files fields for denoting file name
suffixes to associate as text or nontext.
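The decision procedure described above (known suffixes first, then a sample of the file contents) can be sketched in Python. The suffix sets are copied from the lists above; the sample size, the 30% threshold and the function names are assumptions chosen for illustration, since the actual limits are not given here.

```python
import os

KNOWN_TEXT = {"c", "cpp", "css", "h", "f", "for", "fpp", "hpp", "html",
              "java", "mk", "php", "pl", "pm", "rb", "rbw", "txt"}
KNOWN_NONTEXT = {"Z", "a", "bz2", "elc", "gif", "gz", "jar", "jpeg", "jpg",
                 "o", "obj", "pdf", "png", "ps", "tar", "zip"}


def looks_like_text(sample, max_nonascii_ratio=0.3):
    """True unless the sample has an excessive share of non-ASCII bytes.

    The 0.3 threshold is an assumed value, not taken from glark.
    """
    if not sample:
        return True
    nonascii = sum(1 for byte in sample if byte > 127)
    return nonascii / len(sample) <= max_nonascii_ratio


def is_text_file(path, sample_size=1024):
    """Classify by suffix when known; otherwise inspect a sample."""
    suffix = os.path.splitext(path)[1].lstrip(".")
    if suffix in KNOWN_TEXT:
        return True        # content test skipped for speed
    if suffix in KNOWN_NONTEXT:
        return False       # content test skipped for speed
    with open(path, "rb") as handle:
        return looks_like_text(handle.read(sample_size))
```

Because known suffixes short-circuit the check, is_text_file("main.c") and is_text_file("lib.jar") never open the file.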
The exit status is 0 if matches were found; 1 if no matches were found, and 2 if
there was an error. An inverted match (the -v/--invert-match option) will result
in 1 for matches found, 0 for none found.
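The exit status rule can be written out as a small Python function. This is only a restatement of the paragraph above; the function name is hypothetical, and giving errors precedence over the match result is an assumption.

```python
def exit_status(matched, invert=False, error=False):
    """Exit code: 0 for matches, 1 for none, 2 for an error.

    With invert=True (the -v/--invert-match option) the first two
    values are swapped.
    """
    if error:
        return 2
    if invert:
        return 1 if matched else 0
    return 0 if matched else 1
```

This makes it easy to see why a shell test such as ``if glark -v ...'' succeeds when the expression is not found.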
For regular expressions, the perlre man page.
Mastering Regular Expressions, by Jeffrey Friedl, published by O'Reilly.
``Unbalanced'' leading and trailing slashes will result in those slashes being
included as characters in the regular expression. Thus, the following pairs are
equivalent:
/foo "/foo"
/foo\/ "/foo/"
/foo\/i "/foo/i"
foo/ "foo/"
The code to detect nontext files assumes ASCII, not Unicode.
Jeff Pace <jpace at incava dot org>
Copyright (c) 2002, Jeff Pace.
All Rights Reserved. This module is free software. It may be used, redistributed
and/or modified under the terms of the Lesser GNU Public License. See
http://www.gnu.org/licenses/lgpl.html for more information.