[TriLUG] data set wrangling ---------- Re: TriLUG Digest, Vol 4047, Issue 1

Michael Rulison via TriLUG trilug at trilug.org
Sat Dec 17 19:30:55 EST 2022


On 12/16/2022 12:00, Joe Purvis, via TriLUG wrote:
> ... My other thought was to see if there was a no-code/low-code solution like Airtable that might be able to put some nice forms in front of 7,000+ records worth of info. Part of the dataset is in a MySQL database, but then we'd have to find something to put in front of it to provide gentle data management/searching...
>
> Any ideas, comments, thoughts, shouts of horror, expressions of sympathy, or suggestions for therapy would be appreciated!

Fools walk in.... I have been working, sporadically, with a data set 
that numbers in the scores of thousands of records and a score of 
variables (columns), using the R language. After all, if I can learn to 
put together some code to draw a random sample or partition a set into 
ranks based on a metric, like real estate value, hard-core LUGgers should 
be able to get over those initial hurdles pretty fast. R just eats up my 
data; the slowness of my work comes from getting the syntax right down to 
the last character, not from the speed of my processor (a 4-year-old laptop).
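In R these two jobs would likely use base functions such as `sample()` and `cut()` with `quantile()`. To show the logic without assuming anyone has R installed, here is a rough Python (standard library only) sketch. The records and the `value` column are made up for illustration. Rows are ranked by sorting on the metric and slicing the sorted list into nearly equal groups.

```python
import random

# Hypothetical records: each dict stands in for one row; "value" is an
# assumed column holding a metric such as assessed real-estate value.
records = [{"id": i, "value": v} for i, v in enumerate(
    [120_000, 95_000, 310_000, 180_000, 75_000, 240_000, 150_000, 410_000])]

def random_sample(rows, k, seed=None):
    """Return k rows drawn without replacement (a seed makes it repeatable)."""
    return random.Random(seed).sample(rows, k)

def partition_into_ranks(rows, key, n_groups):
    """Sort rows by `key` and split them into n_groups nearly equal bins.

    Returns a list of lists; group 0 holds the lowest values.
    """
    ordered = sorted(rows, key=lambda r: r[key])
    groups = [[] for _ in range(n_groups)]
    for position, row in enumerate(ordered):
        # Map each sorted position to a bin index in 0..n_groups-1.
        groups[position * n_groups // len(ordered)].append(row)
    return groups

sample = random_sample(records, 3, seed=42)
quartiles = partition_into_ranks(records, "value", 4)
for rank, group in enumerate(quartiles, start=1):
    print(rank, [r["value"] for r in group])
# 1 [75000, 95000]
# 2 [120000, 150000]
# 3 [180000, 240000]
# 4 [310000, 410000]
```

Passing a fixed seed lets you redraw the same sample later, which helps when you rerun a script on updated data.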

More: R will happily import spreadsheets and CSVs, and even parse text. 
And once one has a hunk of code that does what one wants, one can reuse 
it with updated data, etc. It is easy to use regular expressions for 
complex searches. Make a monster flat file, then rip it apart for various needs.
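Here is a minimal Python sketch of "import a CSV, then run regex searches over it," again standing in for the R workflow described above. The CSV text and its column names are hypothetical, and `io.StringIO` stands in for a real file so the example is self-contained. Because loading and searching are plain functions, you can rerun them on a fresh export without changes.

```python
import csv
import io
import re

# Stand-in for a CSV export; in practice you would open() a real file.
# The column names here are hypothetical.
CSV_TEXT = """id,owner,address,notes
1,Jane Doe,12 Oak St,roof replaced 2019
2,J. Smith,400 Elm Ave,needs survey
3,Ann Lee,7 Oak Ct,survey done 2021
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def regex_search(rows, column, pattern):
    """Return rows whose `column` matches `pattern` anywhere (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [row for row in rows if rx.search(row[column])]

rows = load_rows(CSV_TEXT)
# Any address on a street named "Oak", whether it is a St, Ave, or Ct.
oak = regex_search(rows, "address", r"\boak\s+(st|ave|ct)\b")
survey = regex_search(rows, "notes", r"survey")
print([r["id"] for r in oak])     # ['1', '3']
print([r["id"] for r in survey])  # ['2', '3']
```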

Yes, it is code, but there are lots of modules (R calls them packages) to 
handle particular issues. And you can use R to produce subsets that then 
make sense to handle in a spreadsheet.
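The "monster flat file, then subsets for a spreadsheet" step might look roughly like this. It is a Python sketch rather than the poster's R code, and the rows and the `county` column are hypothetical. It writes one CSV per distinct value of a column, and each file opens directly in a spreadsheet.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical "monster flat file" rows; "county" is an assumed column.
rows = [
    {"id": "1", "county": "Wake", "value": "310000"},
    {"id": "2", "county": "Durham", "value": "180000"},
    {"id": "3", "county": "Wake", "value": "240000"},
    {"id": "4", "county": "Orange", "value": "410000"},
]

def split_to_csv(rows, key, outdir):
    """Write one CSV per distinct value of `key`; return {value: path}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    outdir = Path(outdir)
    paths = {}
    for value, group in groups.items():
        # Values are used verbatim in file names; real data may need sanitizing.
        path = outdir / f"{key}_{value}.csv"
        # newline="" is what the csv module expects for files it writes.
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        paths[value] = path
    return paths

outdir = tempfile.mkdtemp()
paths = split_to_csv(rows, "county", outdir)
for county, path in sorted(paths.items()):
    print(county, path.name)
```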

-- 
====================
Michael Rulison
☎ 919 205 9168


