huggingface / datasets

71 Open 183 Closed

Error in the notebooks/Overview.ipynb notebook

#712 opened Oct 4, 2020 by subhrm

#709 opened Oct 3, 2020 by nsankar

Datasets performance slow? - 6.4x slower than in memory dataset

#708 opened Oct 3, 2020 by eugeneware

Requirements should specify pyarrow<1

#707 opened Oct 2, 2020 by mathcass

TypeError: '<' not supported between instances of 'NamedSplit' and 'NamedSplit'

#705 opened Oct 2, 2020 by pvcastro

Add UI filter to filter datasets based on task

#691 opened Oct 1, 2020 by praateekmahajan

Dataset browser url is still https://huggingface.co/nlp/viewer/

#686 opened Sep 29, 2020 by jarednielsen

train_test_split returns empty dataset item

#676 opened Sep 28, 2020 by HuangLianzhe

Add custom dataset to NLP?

#675 opened Sep 27, 2020 by timpal0l

load_dataset() won't download in Windows

#674 opened Sep 27, 2020 by ThisDavehead

blog_authorship_corpus crashed nlp-viewer

#673 opened Sep 26, 2020 by Moshiii

Questions about XSUM

#672 opened Sep 26, 2020 by danyaljj

How to skip a example when running dataset.map

#669 opened Sep 25, 2020 by xixiaoyao

Loss not decrease with Datasets and Transformers

#667 opened Sep 24, 2020 by wangcongcong123

Does both 'bookcorpus' and 'wikipedia' belong to the same datasets which Google used for pretraining BERT?

#666 opened Sep 23, 2020 by wahab4114

runing dataset.map, it raises TypeError: can't pickle Tokenizer objects

#665 opened Sep 23, 2020 by xixiaoyao

load_dataset from local squad.py, raise error: TypeError: 'NoneType' object is not callable

#664 opened Sep 23, 2020 by xixiaoyao

Problem with JSON dataset format

#651 opened Sep 20, 2020 by vikigenius

Caching processed dataset at wrong folder (label: bug)

#643 opened Sep 18, 2020 by mrm8488

Load large text file for LM pre-training resulting in OOM

#633 opened Sep 16, 2020 by leethu2012

dtype of tensors should be preserved

#625 opened Sep 14, 2020 by BramVanroy

Add learningq dataset (label: dataset request)

#624 opened Sep 13, 2020 by krrishdholakia

load_dataset for text files not working (label: dataset bug)

#622 opened Sep 12, 2020 by BramVanroy

map/filter multiprocessing raises errors and corrupts datasets (label: bug)

#620 opened Sep 11, 2020 by timothyjlaurent

UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors

#616 opened Sep 11, 2020 by BramVanroy
