[TriLUG] Linux From Scratch

via TriLUG trilug at trilug.org
Tue Mar 4 14:02:52 EST 2025


On 04/03/2025 13:52, Dwain Sims via TriLUG wrote:
>  From the mid 90s on the signal to noise ratio on the Internet has been
> somewhat suspect.  It's just getting harder to filter out the noise.

Speaking of signal-to-noise ratio, here's a little bash script to help 
you figure out how much signal-versus-noise an HTML page contains:

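#!/usr/bin/env bash
# Print the ratio of text bytes (per html2text) to raw HTML bytes for each URL.
while [ -n "$1" ]; do
	# Fetch the page; on failure, report it on stderr and exit non-zero.
	if ! htmlpage=$(curl -sSL "$1"); then
		echo "error: could not fetch $1" >&2
		exit 1
	fi
	htmlbytes=$(echo "$htmlpage" | wc -c)
	txtbytes=$(echo "$htmlpage" | html2text | wc -c)
	# Keep the bc expression on one line; a line break inside it is a syntax error.
	echo -e "Signal ratio for \e[1m\e[4m${1}\e[0m: " $(echo "$txtbytes / $htmlbytes" | bc -l)
	unset htmlpage txtbytes htmlbytes
	shift
done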

Save the file as 'signal-ratio', ensure you have curl, bc and html2text
installed, make it executable (chmod +x), put it somewhere in your
${PATH}, and then invoke it as follows:

$ signal-ratio https://example.org
Signal ratio for https://example.org:  .18550955414012738853

$ signal-ratio microsoft.com
Signal ratio for microsoft.com:  .04468482614010156319

$ signal-ratio en.wikipedia.org/wiki/Bovine
Signal ratio for en.wikipedia.org/wiki/Bovine:  .30573320474345781048

$ signal-ratio trilug.org phillylinux.org
Signal ratio for trilug.org:  .63920205436552674433
Signal ratio for phillylinux.org:  .18757475083056478405


0 means "all noise", 1 means "all signal".

I'm not saying this is perfect, but it's an indicator. I'm also not 
making any judgements on any of the websites I happened to have picked.
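
If you want to sanity-check the arithmetic without touching the network,
here is a rough offline sketch. The function name local_signal_ratio is
made up for illustration, sed is only a crude stand-in for html2text (it
strips <...> tags that open and close on the same line, nothing more),
and awk does the division so bc isn't needed. Its numbers will not match
the real script's.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read HTML on stdin, print (text bytes / total bytes).
local_signal_ratio() {
	local html total text
	html=$(cat)
	# Count bytes with one trailing newline, as echo does in the script above.
	total=$(printf '%s\n' "$html" | wc -c)
	# Assumption: sed tag-stripping approximates what html2text would keep.
	text=$(printf '%s\n' "$html" | sed 's/<[^>]*>//g' | wc -c)
	# total is always at least 1 (the newline), so the division is safe.
	awk -v t="$text" -v h="$total" 'BEGIN { printf "%.4f\n", t / h }'
}

# 9 bytes of HTML plus a newline = 10; "hi" plus a newline = 3.
printf '<p>hi</p>' | local_signal_ratio   # prints 0.3000
```

Input with no markup at all comes out as 1.0000, which matches the
"all signal" end of the scale.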

