Trying to find out whether a file's content/output contains phrases which may contain special characters

Hello,

SHORT DESCRIPTION:
I have a bash script command:
unwanted=$(cat /dev/shm/lastmessage|grep -oE '01[[:alnum:]]{64}|B01[[:alnum:]]{64}[[:punct:]]?|/abc/|badphrase|$1' 2>/dev/null)
Since the number of phrases that grep checks keeps growing, I want to move the phrases into a separate file, one per line (maybe not the ones that contain [[:alnum:]], since grep needs to treat those differently from the rest). The phrases in that file may contain ANY special characters, like " ’ \ / ; $ ?
It should be efficient even for a huge list of bad phrases. Any idea what the command might look like?

REST IS A LONG DESCRIPTION, POSSIBLY NOT NEEDED TO SPEND TIME READING IT:
I am hosting a community for an anonymous messenger called Session. Its profanity blocklist function is badly made: it does not block phrases containing special characters, and it causes a slow start of the server if the profanity list is too long. The developers are inactive, and since I do not know how to prevent an SQLITE3 INSERT command containing a bad phrase, I am looking for a way to delete a message containing a bad phrase post-INSERT. Not optimal, but better than spam.

Currently my bash script contains the following code to write the last message content to a file:

# discover last message in a DB table and insert it into a file
for i in 1 2 3 4 5; do sqlite3 /var/lib/session-open-group-server/sogs.db 'SELECT * FROM message_details ORDER BY id DESC LIMIT 1;' && echo "Success selecting msg" && break || echo "Failure selecting msg, attempt $i/5" && sleep 0.1; done > /dev/shm/lastmessage

Then the script continues. I have tried two variants and neither works:

A)

mapfile -t phrases < /var/lib/session-open-group-server/profanity.txt
pattern=$(printf "%s|" "${phrases[@]}")
unwanted=$(grep -oF "$pattern|$1" /dev/shm/lastmessage 2>/dev/null)

B)

phrases=$(cat /var/lib/session-open-group-server/profanity.txt)
unwanted=$(grep -oF "$phrases|$1" /dev/shm/lastmessage 2>/dev/null)

Inside the profanity.txt file, I have:
specphrasee"'e/d\df
01[[:alnum:]]{64}
B01[[:alnum:]]{64}[[:punct:]]?
/abc/
badphrase

The 2nd and 3rd phrases are regular expressions, so I probably need to separate them from the others, which should be treated as fixed strings.

QUESTION: do you have an idea how to solve this, and what the bash script code should look like?

What works for me is:
unwanted=$(cat /dev/shm/lastmessage|grep -oE '01[[:alnum:]]{64}|B01[[:alnum:]]{64}[[:punct:]]?|/abc/|badphrase|$1' 2>/dev/null)

However, I want a longer list of phrases, so it seems more practical to keep them in a separate file, one per line. The whole task of checking, let's say, a 5MB file of thousands of phrases against the output (the last posted message) should be CPU/memory/disk efficient, since it will be done every couple of seconds.

  1. Don’t use cat to pipe into grep and the many others that can accept a file parameter natively. Apart from making it easier to get to the filename without a mouse (if the cat is in or close to the start of the line and the grep ain’t at or close to the end of the line), it’s a total waste of effort and performance. This needs to end.
  2. You need to escape all RegEx dialect-relevant chars before processing with RegEx. I don’t recommend -E (POSIX ERE (Extended RegEx)) cuz it’s just confusing and also feature-limited, instead I’d go for -P (PCRE2 (Perl-Compatible Regular Expressions)).
  • The reference at Regular Expressions Reference is a very good resource to learn from (I need to remind myself every so often).
  • Regex101 is an extremely useful, borderline mandatory, service for writing PCRE2 (and other dialects): it tells you exactly where a simpler problem lies, complete with highlighting, automatic execution, automatic testing of multiple test units, the ability to replace matched strings, and per-character debugging of deeper problems. I use it almost every time I write a RegEx that is even a little bigger.
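
To illustrate both points, here is a minimal sketch, assuming GNU grep. Note that it swaps in a different technique from escaping: -F with -f makes grep treat every line of a phrase file as a literal string, so nothing needs to be escaped, while a second call with -E -f handles the lines that really are regular expressions. The temporary directory and the file names in it are hypothetical stand-ins for the real paths.

```shell
#!/bin/bash
# Sketch only: every file below is a hypothetical stand-in created in a temp dir.
tmp=$(mktemp -d)

# Fixed phrases, one per line; -F takes special characters literally.
printf '%s\n' 'specphrasee"'\''e/d\df' '/abc/' 'badphrase' > "$tmp/plainphrases"

# Regular expressions from the question, one per line, read as ERE by -E.
printf '%s\n' '01[[:alnum:]]{64}' 'B01[[:alnum:]]{64}[[:punct:]]?' > "$tmp/repatterns"

# Sample message standing in for /dev/shm/lastmessage.
printf '%s\n' 'hello /abc/ and specphrasee"'\''e/d\df here' > "$tmp/inputfile"

# grep opens the input file itself (no cat); -f loads the patterns from a file.
unwanted=$(
  grep -oF -f "$tmp/plainphrases" "$tmp/inputfile" 2>/dev/null
  grep -oE -f "$tmp/repatterns" "$tmp/inputfile" 2>/dev/null
)

# Show whatever was matched, one match per line.
printf '%s\n' "$unwanted"

rm -rf "$tmp"
```

Keeping the fixed strings and the regular expressions in two files means only the short regex list has to be written carefully; the long list of plain phrases can hold any characters as they are.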

Replace ./inputfile, ./plainphrases and ./repatterns with the paths to the appropriate files.

#!/bin/bash
# Pass plain phrases file and RE patterns file to awk variables
LC_ALL=C awk -v plainphrases="./plainphrases" -v repatterns="./repatterns" '
  BEGIN {
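# Plain array ("> 0" ends the loop if the file cannot be read; empty lines are skipped)
    while ((getline line < plainphrases) > 0) { if (line != "") pp[++ppi]=line }
# RE array (an empty pattern would match every line, so empty lines are skipped too)
    while ((getline line < repatterns) > 0) { if (line != "") rp[++rpi]=line }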
  }
# Main loop
  {
# Plain matches, eliminate duplicates
    for (p in pp) if (index($0, pp[p])) { uniq[pp[p]] }
# RE matches, eliminate duplicates
    for (r in rp) if (match($0, rp[r])) { uniq[substr($0, RSTART, RLENGTH)] }
  }
  END {
# Print the collected results
    for (u in uniq) { print u }
  }
' "./inputfile"

If one needs to act depending on whether the awk command produced any output, the whole awk command may be prefixed by variable=$(
and suffixed by )
and then followed by a condition testing whether the variable is empty or not:
if [[ "$variable" != "" ]]; then echo "not empty"; fi
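
Put together, the wrapping described above could look like the following sketch. The sample files, their contents and the temporary directory are hypothetical, and the regex is a shortened stand-in for the real patterns; whether interval expressions such as {64} work depends on the awk implementation, so that is worth testing on the target system.

```shell
#!/bin/bash
# Sketch only: hypothetical sample files in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'badphrase' '/abc/' > "$tmp/plainphrases"
printf '%s\n' 'B?01[0-9A-Za-z]+' > "$tmp/repatterns"   # simplified stand-in pattern
printf '%s\n' 'some text with badphrase and 01abc inside' > "$tmp/inputfile"

# Capture everything awk prints via command substitution.
variable=$(LC_ALL=C awk -v plainphrases="$tmp/plainphrases" -v repatterns="$tmp/repatterns" '
  BEGIN {
    # Load both lists; "> 0" stops the loops on end of file or read error.
    while ((getline line < plainphrases) > 0) { pp[++ppi]=line }
    while ((getline line < repatterns) > 0) { rp[++rpi]=line }
  }
  {
    # Literal phrases via index(), regex patterns via match().
    for (p in pp) if (index($0, pp[p])) { uniq[pp[p]] }
    for (r in rp) if (match($0, rp[r])) { uniq[substr($0, RSTART, RLENGTH)] }
  }
  END { for (u in uniq) { print u } }
' "$tmp/inputfile")

# Act on the result: non-empty means at least one phrase or pattern matched.
if [[ -n "$variable" ]]; then
  echo "not empty"
else
  echo "empty"
fi

rm -rf "$tmp"
```

The order in which awk prints the collected matches is not guaranteed, so the check only relies on the variable being empty or not.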