Today we can access a ton of public datasets from sources ranging from Data.gov to Kaggle.com. However, often these datasets are partitioned (split) across many files by some time dimension like year or month.
One such dataset might be the Housing Affordability Data (HADS). For HADS data you can get zipped ASCII files for the years 2013 and 2011. Preparing and loading this dataset would have been a huge pain in the past, but now we have the Lodr Multi-File Processor.
Below both the 2013 and 2011 file was selected for the same load operation:
Great, we can configure a single loader to load multiple files into the same Redshift Table. However, this doesn't solve the whole problem! You see we also need to know what year each record is from. The only place we can get that information is from the file name.
Not a problem with Lodr! We'll use the Row Processing > Insert File Name transform to encode the file name information into the rows of the dataset:
Once the file name is inserted into each row you can apply pattern extract transforms to get the specific year or time dimension out of the file name.
When you run the loader all records from both files will be loaded into the same table and they will have the correct time dimensions associated with them. Easy peasy!