2,156 contributions in the last year
Activity overview
Contributed to huggingface/datasets, huggingface/transformers, huggingface/huggingface_hub and 5 other repositories
Contribution activity
June 2021
Created 44 commits in 3 repositories
Created 1 repository
- lhoestq/notebooks Jupyter Notebook
Created a pull request in huggingface/datasets that received 9 comments
Better error message when using the wrong load_from_disk
As mentioned in #2424, the error message when one tries to use Dataset.load_from_disk to load a DatasetDict object (or vice versa) can be improved.…
+20 −6 • 9 comments
Opened 18 other pull requests in 2 repositories
huggingface/datasets 16 merged, 1 open
- Update: WebNLG - update checksums
- Fix fever keys
- Fix code_search_net keys
- Add license to the Cambridge English Write & Improve + LOCNESS dataset card
- Add Parquet loader + from_parquet and to_parquet
- Fix dev version
- Replace bad n>1M size tag
- Add align_labels_with_mapping to DatasetDict
- Fix fingerprint when moving cache dir
- Make numpy arrow extractor faster
- JAX integration
- Use gc.collect only when needed to avoid slow downs
- Allow to use tqdm>=4.50.0
- Support sliced list arrays in cast
- Mention that there are no answers in adversarial_qa test set
- Better error message when trying to access elements of a DatasetDict without specifying the split
- Fix NQ features loading: reorder fields of features to match nested fields order in arrow data
patrickvonplaten/notebooks 1 open
Reviewed 62 pull requests in 3 repositories
huggingface/datasets 59 pull requests
- update discofuse link cc @ekQ
- Fix FileSystems documentation
- Implement ClassLabel encoding in JSON loader
- Update README.md
- Raise FileNotFoundError in WindowsFileLock
- Add support for Split.ALL
- Fix logging levels
- Fix DuplicatedKeysError in drop dataset
- Sync with transformers disabling NOTSET
- Remove task templates if required features are removed during Dataset.map
- Improve Features docs
- Add summarization template
- Dataset Streaming
- Allow downloading/processing/caching only specific splits
- Add task template for automatic speech recognition
- Fixed label parsing in the ProductReviews dataset
- pretty_name for dataset in YAML tags
- Fix fingerprint when moving cache dir
- CRD3 dataset card
- Improve performance of pandas arrow extractor
- Use scikit-learn package rather than sklearn in setup.py
- Insert text classification template for Emotion dataset
- Add task templates for tydiqa and xquad
- Add align_labels_with_mapping function
- Rearrange JSON field names to match passed features schema field names
- Some pull request reviews not shown.
huggingface/transformers 2 pull requests
huggingface/datasets-tagging 1 pull request
Created an issue in huggingface/datasets that received 1 comment
Add C4
Adding a Dataset
Name: C4
Description: allenai/allennlp#5056
Paper: https://arxiv.org/abs/1910.10683
Data: https://huggingface.co/datasets/allenai/c4
1 comment
Opened 5 other issues in 1 repository
huggingface/datasets 3 open, 2 closed
11 contributions in private repositories
Jun 2 – Jun 23