[TriLUG] Linux From Scratch
via TriLUG
trilug at trilug.org
Tue Mar 4 14:02:52 EST 2025
On 04/03/2025 13:52, Dwain Sims via TriLUG wrote:
> From the mid 90s on the signal to noise ratio on the Internet has been
> somewhat suspect. It's just getting harder to filter out the noise.
Speaking of signal-to-noise ratio, here's a little bash script to help
you figure out how much signal-versus-noise an HTML page contains:
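#!/usr/bin/env bash
# Print the ratio of text bytes to total HTML bytes for each URL given.
while [ -n "$1" ]; do
    # -sS hides the progress meter but still shows errors; -L follows redirects.
    if ! htmlpage=$(curl -sSL "$1"); then
        echo "error: could not fetch $1" >&2
        exit 1
    fi
    # Total size of the page, markup included.
    htmlbytes=$(echo "$htmlpage" | wc -c)
    # Size of what is left once html2text has stripped the markup.
    txtbytes=$(echo "$htmlpage" | html2text | wc -c)
    echo -e "Signal ratio for \e[1m\e[4m${1}\e[0m: $(echo "$txtbytes / $htmlbytes" | bc -l)"
    unset htmlpage txtbytes htmlbytes
    shift
done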
Save the file as 'signal-ratio', ensure you have curl, bc and html2text
installed, make it executable (chmod +x), put it somewhere in your
${PATH}, and then invoke it as follows:
$ signal-ratio https://example.org
Signal ratio for https://example.org: .18550955414012738853
$ signal-ratio microsoft.com
Signal ratio for microsoft.com: .04468482614010156319
$ signal-ratio en.wikipedia.org/wiki/Bovine
Signal ratio for en.wikipedia.org/wiki/Bovine: .30573320474345781048
$ signal-ratio trilug.org phillylinux.org
Signal ratio for trilug.org: .63920205436552674433
Signal ratio for phillylinux.org: .18757475083056478405
0 means "all noise", 1 means "all signal".
I'm not saying this is perfect, but it's an indicator. I'm also not
making any judgements on any of the websites I happened to have picked.
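For anyone who wants to try the arithmetic without touching the network, here is a minimal sketch of the ratio calculation on its own. The function name signal_ratio and the byte counts are made up for illustration, and it uses awk rather than bc, purely to get a leading zero and a fixed four decimals; treat it as one possible way to do it, not as part of the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): compute the signal
# ratio from two byte counts, text bytes first, HTML bytes second.
signal_ratio() {
    local txtbytes=$1 htmlbytes=$2
    # Guard against an empty page so we never divide by zero.
    if [ "$htmlbytes" -eq 0 ]; then
        echo "error: HTML byte count is zero" >&2
        return 1
    fi
    # awk instead of bc: gives a leading zero and exactly four decimals.
    awk -v t="$txtbytes" -v h="$htmlbytes" 'BEGIN { printf "%.4f\n", t / h }'
}

# Made-up byte counts: 250 bytes of text in a 1000-byte page.
signal_ratio 250 1000    # prints 0.2500
# A page that is nothing but text scores 1.
signal_ratio 1000 1000   # prints 1.0000
```

A page whose text is a quarter of its total bytes scores 0.25, so it sits nearer the "all noise" end of the scale.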