This repository was archived by the owner on Nov 17, 2023. It is now read-only.
@lou-k The dataloader should be able to load multiple .rec files; at least the DALI dataloader can.
For anyone who finds this via Google, this is my current workaround:

import numpy as np
from mxnet.gluon.data.dataset import Dataset
from bisect import bisect_right


class ConcatenatedDataset(Dataset):
    """
    Combines multiple gluon datasets into one. This is useful if, for example,
    you need to combine multiple ImageRecordDatasets after using the '--chunks'
    option in im2rec.
    """

    def __init__(self, datasets):
        self.datasets = datasets
        # offsets[k] is the global index of the first item in datasets[k];
        # the last entry is the total number of items.
        self.offsets = [0] + np.cumsum([len(d) for d in datasets]).tolist()

    def __len__(self):
        return self.offsets[-1]

    def __getitem__(self, idx):
        # Figure out which dataset this index falls into.
        j = bisect_right(self.offsets, idx) - 1
        # Get that item, using the index local to the chosen dataset.
        return self.datasets[j][idx - self.offsets[j]]
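To make the index arithmetic in ConcatenatedDataset easier to check, here is a minimal sketch of the same lookup with plain Python lists standing in for the per-chunk datasets. The class name ConcatenatedSequence, the sample values, and the explicit bounds check are assumptions added for illustration and are not part of the workaround above; with MXNet installed, each list would presumably be replaced by one ImageRecordDataset per chunk.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatenatedSequence:
    """Hypothetical pure-Python stand-in that mirrors the offset lookup."""

    def __init__(self, datasets):
        self.datasets = datasets
        # offsets[k] is the global index of the first item in datasets[k];
        # offsets[-1] is the combined length.
        self.offsets = [0] + list(accumulate(len(d) for d in datasets))

    def __len__(self):
        return self.offsets[-1]

    def __getitem__(self, idx):
        # Assumption: reject out-of-range indices explicitly (not in the original).
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Rightmost offset that is <= idx identifies the owning dataset.
        j = bisect_right(self.offsets, idx) - 1
        return self.datasets[j][idx - self.offsets[j]]


# Three made-up "chunks" of different sizes.
chunks = [["a0", "a1", "a2"], ["b0", "b1"], ["c0", "c1", "c2", "c3"]]
combined = ConcatenatedSequence(chunks)

print(len(combined))                          # total number of items
print(combined[2], combined[3], combined[8])  # last of chunk 0, first of chunk 1, last of chunk 2
```

Because bisect_right returns the position after any equal entries, an index that lands exactly on a chunk boundary is assigned to the chunk that starts there, and empty chunks are skipped.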
I ran into some performance issues packaging about 2 million images into .rec files: after about 200k images, the time to pack each image got really high. As a workaround, I used the --chunks option for im2rec, which resulted in about 10 .rec files with around 200k images each.

That was nice, but I don't see an easy way to combine them. I'm using gluon's ImageRecordDataset, which only accepts a single .rec and idx file.

Is there an easy way to combine these .rec files I've generated? I'd be OK combining them into one file for passing to ImageRecordDataset, or having a dataset support multiple input files.