Issues: microsoft/DeepSpeed
[BUG] DeepSpeed Hybrid Engine Does not Work for Mistral-7B
bug
Something isn't working
deepspeed-chat
Related to DeepSpeed-Chat
#4954
opened Jan 14, 2024 by
liziniu
[Question] Deepspeed stage 3 OOM on 7B-model with A100-80GB memory card
bug
Something isn't working
training
#4953
opened Jan 13, 2024 by
candygocandy
[BUG] Step 3 with ZeRO=3 see error: RuntimeError: CUDA error: an illegal memory access was encountered
bug
Something isn't working
deepspeed-chat
Related to DeepSpeed-Chat
#4945
opened Jan 12, 2024 by
N33MO
[BUG] Setting Finetune=True causes checkpoint loading to not work correctly
bug
Something isn't working
training
#4944
opened Jan 12, 2024 by
exnx
AttributeError: type object 'Init' has no attribute 'quantizer_module'
bug
Something isn't working
training
#4943
opened Jan 12, 2024 by
ZetangForward
Grad parameters are None with bigger autograd graphs
bug
Something isn't working
training
#4941
opened Jan 11, 2024 by
miguelscarv
[BUG] Running DDP with transformers integrated deepspeed get a deadlock (long time no response) when training model.
bug
Something isn't working
training
#4933
opened Jan 11, 2024 by
PommesPeter
Enable deepspeed.zero.Init causes very strange spikes in PPO policy_loss
#4932
opened Jan 11, 2024 by
wuxibin89
[BUG]deepspeed zero3 gets error in dist.get_rank() in multiple node and multiple gpu
bug
Something isn't working
compression
#4931
opened Jan 11, 2024 by
janenie
[REQUEST] Set num_local_io_workers
bug
Something isn't working
#4925
opened Jan 10, 2024 by
Yangr116
[BUG] gan stage2 training TypeError: 'NoneType' object is not subscriptable
bug
Something isn't working
training
#4923
opened Jan 9, 2024 by
cat6523
[BUG] DeepSpeed Zero3 Inference behavior error when model.train() mode
bug
Something isn't working
inference
#4922
opened Jan 9, 2024 by
liu-zichen
Cannot install deepspeed 0.12.6, fail to produce metadata.
bug
Something isn't working
build
Improvements to the build and testing systems.
#4914
opened Jan 8, 2024 by
simonou99
[BUG] DeepSpeed Zero Inference (stage 3) Stuck When One Process Doesn't Execute model.generate()
bug
Something isn't working
inference
#4910
opened Jan 6, 2024 by
samuel21119
Killing subprocess when saving checkpoints during training with zero2.
bug
Something isn't working
training
#4909
opened Jan 6, 2024 by
IvoryTower800
[BUG] my autocast is not working
bug
Something isn't working
training
#4908
opened Jan 6, 2024 by
YooSungHyun
[BUG] Training hang on when use zero2+cpu offload
bug
Something isn't working
training
#4905
opened Jan 6, 2024 by
boundles
[BUG] ZERO++ | AssertionError: ZeRO parameter intra parallel group is already initialized
bug
Something isn't working
training
#4901
opened Jan 5, 2024 by
dhkim0225
[TASK] Separate AutoTP workflow
enhancement
New feature or request
#4894
opened Jan 4, 2024 by
delock
[BUG] Does DeepSpeed init_inference allow two engines?
bug
Something isn't working
inference
#4893
opened Jan 4, 2024 by
MinghaoYan
[BUG] Error: Attempting to get amgpu ISA Details 'NoneType' object has no attribute 'group'
bug
Something isn't working
build
Improvements to the build and testing systems.
#4891
opened Jan 3, 2024 by
unavailableun
[BUG] [ERROR] [launch.py:321:sigkill_handler [xxx] exits with return code = -9
bug
Something isn't working
training
#4890
opened Jan 3, 2024 by
xinbingzhe