What could have gone wrong #13
Comments
It looks like you are using the default devset size for the Hessian generation (IIRC 256). You need something larger to get a good approximation of the actual Hessians. For our experiments and released models, we used a devset of at least 4096. Hessian generation takes a long time on consumer GPUs because they have poor fp64 performance and the Hessian script accumulates in fp64. If 4096 would take too long for you and you cannot get access to a GPU with fast fp64, you can try accumulating the Hessians in fp32 at the probable loss of some numerical accuracy.
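To illustrate the fp64-vs-fp32 trade-off mentioned above, here is a minimal NumPy sketch (not the actual quip-sharp code) of the kind of outer-product accumulation a Hessian pass performs, with the accumulator dtype as a knob:

```python
import numpy as np

def accumulate_hessian(samples, dtype=np.float64):
    """Accumulate a proxy Hessian H = (1/n) * sum_i x_i x_i^T.

    The dtype of the accumulator controls the speed/accuracy trade-off:
    fp64 is accurate but slow on consumer GPUs; fp32 is faster but
    rounds during the long running sum.
    """
    d = samples.shape[1]
    H = np.zeros((d, d), dtype=dtype)
    for x in samples.astype(dtype):
        H += np.outer(x, x)
    return H / len(samples)

rng = np.random.default_rng(0)
xs = rng.standard_normal((256, 8))          # toy devset: 256 samples, dim 8
H64 = accumulate_hessian(xs, np.float64)    # accurate accumulator
H32 = accumulate_hessian(xs, np.float32)    # faster, some rounding error
drift = np.abs(H64 - H32.astype(np.float64)).max()
```

At toy scale the drift is tiny; over thousands of long-context samples and large hidden dimensions the fp32 rounding error grows, which is why the accuracy loss is only "probable", not guaranteed to matter.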
Hi,
I was going to upload a QuIP 2-bit version of a Llama 2 model, which I took as a chance to experiment with this method:
https://huggingface.co/Yhyu13/Xwin-Math-7B-V1.0-QUIP-2bit
But as I mentioned in its readme, the Hessian pass took quite long, about 6 hours, and the final PPL for QuIP 2-bit is not ideal. The model's performance also degrades noticeably.
The conversion process does not throw any errors, though, and the evaluation process is smooth, too. What could have gone wrong? I might spend some time re-running the whole process as a double check.
Here is my script:
#!/bin/bash
eval "$(conda shell.bash hook)"
conda activate quip
MODEL_NAME=Xwin-Math-7B-V1.0
#MODEL_NAME=ShareGPT4V-7B
BASE_MODEL_DIR=/media/hangyu5/Home/Documents/Hugging-Face/$MODEL_NAME/
SAVE_MODEL_DIR=/media/hangyu5/Home/Documents/Hugging-Face/$MODEL_NAME-QUIP/
TMP_MODEL_DIR=./$MODEL_NAME
BATCH_SIZE=2
CTX_LEN=4096
cd repo/quip-sharp/
if [ ! -d "$TMP_MODEL_DIR" ]; then
mkdir -p $TMP_MODEL_DIR
fi
TRANSFORMERS_VERBOSITY=debug CUDA_VISIBLE_DEVICES=0 python ./hessian_offline_llama.py \
--seed 34 \
--base_model $BASE_MODEL_DIR \
--save_path $TMP_MODEL_DIR/Hessian/ \
--ctx_size $CTX_LEN \
--batch_size $BATCH_SIZE \
| tee $TMP_MODEL_DIR/Hessian-quip.log
TRANSFORMERS_VERBOSITY=debug CUDA_VISIBLE_DEVICES=0 python ./quantize_llama.py \
--seed 34 \
--base_model $BASE_MODEL_DIR \
--hessian_path $TMP_MODEL_DIR/Hessian \
--save_path $TMP_MODEL_DIR/Ckpt \
--ctx_size $CTX_LEN \
--batch_size $BATCH_SIZE \
--codebook E8P12 \
--scale_override 0.9 \
| tee $TMP_MODEL_DIR/Ckpt-quip.log
TRANSFORMERS_VERBOSITY=debug CUDA_VISIBLE_DEVICES=0 python hfize_llama.py \
--quantized_path $TMP_MODEL_DIR/Ckpt \
--hf_output_path $SAVE_MODEL_DIR \
| tee $TMP_MODEL_DIR/HFize-quip.log
TRANSFORMERS_VERBOSITY=debug CUDA_VISIBLE_DEVICES=0 python eval_ppl.py \
--seed 34 \
--hf_path $BASE_MODEL_DIR \
--seqlen $CTX_LEN \
| tee $TMP_MODEL_DIR/PPl-quip.log
cd ../../
Also, if this is a math model, you should probably change the Hessian devset from RedPajama (natural language) to a math dataset.
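For concreteness, here is a sketch of how a calibration devset is typically assembled from a tokenized corpus: chunk it into fixed-length sequences and sample `devset_size` of them. The function and corpus here are illustrative, not quip-sharp's actual API; to target a math model, `token_ids` would come from tokenizing a math corpus instead of RedPajama:

```python
import numpy as np

def build_devset(token_ids, ctx_size, devset_size, seed=34):
    """Chunk a tokenized corpus into up to devset_size sequences
    of ctx_size tokens each, sampled without replacement."""
    n_chunks = len(token_ids) // ctx_size
    # Drop the trailing remainder and reshape into (n_chunks, ctx_size)
    chunks = np.array(token_ids[: n_chunks * ctx_size]).reshape(n_chunks, ctx_size)
    rng = np.random.default_rng(seed)
    idx = rng.choice(n_chunks, size=min(devset_size, n_chunks), replace=False)
    return chunks[idx]

corpus = list(range(100_000))  # stand-in for tokenized math text
devset = build_devset(corpus, ctx_size=4096, devset_size=16)
```

The key point from the comments above is the `devset_size` argument: the default (around 256) is too small for a good Hessian estimate, and the released models used at least 4096 sequences.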
Closed
@yhyu13 do you need any more help on this issue? If not, feel free to close it.