Hello,
SHORT DESCRIPTION:
I have a bash script command:
unwanted=$(cat /dev/shm/lastmessage|grep -oE '01[[:alnum:]]{64}|B01[[:alnum:]]{64}[[:punct:]]?|/abc/|badphrase|$1' 2>/dev/null)
Since the number of phrases grep checks keeps growing, I want to move the phrases into a separate file, one per line (maybe not the ones containing [[:alnum:]], since grep needs to treat those differently from the rest). The phrases in that file may contain ANY special characters like ", ’ \ / ; $ ?
It should stay efficient even with a huge list of bad phrases. Any idea what the command might look like?
THE REST IS A LONG DESCRIPTION, POSSIBLY NOT WORTH SPENDING TIME READING:
I am hosting a community for an anonymous messenger called Session. Its profanity blocklist function is badly made: it does not block phrases containing special characters, and it causes a slow server start if the profanity list is too long. The developers are inactive, and since I do not know how to prevent an SQLITE3 INSERT command that contains a bad phrase, I am looking for a way to delete a message containing a bad phrase post-INSERT. Not optimal, but better than spam.
Currently my bash script contains the following code to write the last message's content to a file:
# discover last message in a DB table and insert it into a file
for i in 1 2 3 4 5; do sqlite3 /var/lib/session-open-group-server/sogs.db 'SELECT * FROM message_details ORDER BY id DESC LIMIT 1;' && echo "Success selecting msg" && break || echo "Failure selecting msg, attempt $i/5" && sleep 0.1; done > /dev/shm/lastmessage
Then the script continues. I have tried two variants and neither works:
A)
mapfile -t phrases < /var/lib/session-open-group-server/profanity.txt
pattern=$(printf "%s|" "${phrases[@]}")
unwanted=$(grep -oF "$pattern|$1" /dev/shm/lastmessage 2>/dev/null)
B)
phrases=$(cat /var/lib/session-open-group-server/profanity.txt)
unwanted=$(grep -oF "$phrases|$1" /dev/shm/lastmessage 2>/dev/null)
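A likely reason variant A finds nothing: with -F, grep treats "|" as an ordinary character, so the joined string is searched for as one long literal phrase. Fixed-string patterns are separated by newlines instead, which is also why variant B can match the plain phrases but not the regex lines or the last line with "|$1" glued onto it. The sketch below is only a demonstration under assumptions: temporary files stand in for profanity.txt and /dev/shm/lastmessage, and the joined string is built with tr rather than mapfile/printf (it produces the same string).

```shell
#!/usr/bin/env bash
# Demo only: temporary files stand in for profanity.txt and /dev/shm/lastmessage.
demo_dir=$(mktemp -d)
printf '%s\n' '/abc/' 'badphrase' > "$demo_dir/profanity.txt"
printf '%s\n' 'some text with badphrase in it' > "$demo_dir/lastmessage"

# Same string that variant A builds: "/abc/|badphrase|".
joined=$(tr '\n' '|' < "$demo_dir/profanity.txt")

# -F takes "|" literally, so this looks for the whole joined string and finds nothing.
variant_a=$(grep -oF -- "$joined" "$demo_dir/lastmessage")

# -f reads one pattern per line, which is the form -F expects.
from_file=$(grep -oF -f "$demo_dir/profanity.txt" "$demo_dir/lastmessage")

printf 'variant A matched: [%s]\n' "$variant_a"
printf 'grep -F -f matched: [%s]\n' "$from_file"
rm -rf "$demo_dir"
```

In this demo the first result is empty and the second contains badphrase.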
Inside the profanity.txt file, I have:
specphrasee"'e/d\df
01[[:alnum:]]{64}
B01[[:alnum:]]{64}[[:punct:]]?
/abc/
badphrase
The 2nd and 3rd phrases are regular expressions, so I probably need to separate them from the others, which should be treated as fixed strings.
QUESTION: do you have any idea how to solve this, and what the bash script code should look like?
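One way the separation could look, as an untested-in-production sketch: keep the fixed strings and the regular expressions in two files and run grep once per file, with -F -f for the former and -E -f for the latter. The file names fixed.txt and regex.txt and the function find_unwanted are hypothetical, and temporary files stand in for the real paths. Blank lines in either pattern file are probably best avoided, since an empty pattern matches every line.

```shell
#!/usr/bin/env bash
# Sketch only: fixed.txt / regex.txt are hypothetical names for a split list.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Fixed strings, one per line; special characters need no escaping under -F.
printf '%s\n' 'specphrasee"'\''e/d\df' '/abc/' 'badphrase' > "$work/fixed.txt"
# Extended regular expressions, one per line.
printf '%s\n' '01[[:alnum:]]{64}' 'B01[[:alnum:]]{64}[[:punct:]]?' > "$work/regex.txt"

# find_unwanted MESSAGE_FILE [EXTRA_PHRASE]
# Prints every match, one per line; EXTRA_PHRASE is treated as a fixed string.
find_unwanted() {
    local msg=$1 extra=${2-}
    {
        grep -oF -f "$work/fixed.txt" -- "$msg"
        grep -oE -f "$work/regex.txt" -- "$msg"
        if [ -n "$extra" ]; then
            grep -oF -e "$extra" -- "$msg"
        fi
    } 2>/dev/null
    return 0
}

# Usage with a stand-in for /dev/shm/lastmessage.
printf '%s\n' 'hi specphrasee"'\''e/d\df there' > "$work/lastmessage"
unwanted=$(find_unwanted "$work/lastmessage" "there")
printf '%s\n' "$unwanted"
```

In the real script, the second argument would be "$1" and the message file would be /dev/shm/lastmessage. Whether two grep runs are fast enough for a multi-megabyte list is something that would need measuring.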
What works for me so far is:
unwanted=$(cat /dev/shm/lastmessage|grep -oE '01[[:alnum:]]{64}|B01[[:alnum:]]{64}[[:punct:]]?|/abc/|badphrase|$1' 2>/dev/null)
yet I want a longer list of phrases, so it seems more practical to keep them in a separate file, one per line. The whole task of checking, let's say, a 5MB file with thousands of phrases against the output (the last posted message) should be CPU/memory/disk efficient, since it will run every couple of seconds.
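A side note on the working command: inside single quotes the shell does not expand $1, so grep receives the literal characters "$1", which in an extended regex is an end-of-line anchor followed by "1" and can never match. The small check below illustrates this; the message text and the value "extra" are made up for the demo.

```shell
#!/usr/bin/env bash
# Demo only: pretend the script was called with "extra" as its first argument.
set -- extra
msg='message with extra word'

# Single quotes: grep sees the literal pattern badphrase|$1, which never matches here.
single_quoted=$(printf '%s\n' "$msg" | grep -oE 'badphrase|$1')

# Double quotes: the shell expands $1 first, so grep sees badphrase|extra.
double_quoted=$(printf '%s\n' "$msg" | grep -oE "badphrase|$1")

printf 'single quotes: [%s]\n' "$single_quoted"
printf 'double quotes: [%s]\n' "$double_quoted"
```

With double quotes the value of $1 is interpreted as a regular expression, so passing it separately with -F -e may be safer when it can contain special characters.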