What could have gone wrong #13
Comments
It looks like you are using the default devset size for the Hessian generation (IIRC 256). You need something larger to get a good approximation of the actual Hessians. For our experiments and released models, we used a devset of at least 4096. Hessian generation takes a long time on consumer GPUs because they have poor fp64 performance and the Hessian script accumulates in fp64. If 4096 would take too long for you and you cannot get access to a GPU with fast fp64, you can try accumulating the Hessians in fp32 at the probable loss of some numerical accuracy.
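To illustrate the fp64-vs-fp32 trade-off mentioned above, here is a minimal NumPy sketch (not the actual quip-sharp code) of the kind of outer-product accumulation a Hessian pass performs, with the accumulator dtype as a knob:

```python
import numpy as np

def accumulate_hessian(samples, dtype=np.float64):
    """Accumulate a proxy Hessian H = (1/n) * sum_i x_i x_i^T.

    The dtype of the accumulator controls the speed/accuracy trade-off:
    fp64 is accurate but slow on consumer GPUs; fp32 is faster but
    rounds during the long running sum.
    """
    d = samples.shape[1]
    H = np.zeros((d, d), dtype=dtype)
    for x in samples.astype(dtype):
        H += np.outer(x, x)
    return H / len(samples)

rng = np.random.default_rng(0)
xs = rng.standard_normal((256, 8))          # toy devset: 256 samples, dim 8
H64 = accumulate_hessian(xs, np.float64)    # accurate accumulator
H32 = accumulate_hessian(xs, np.float32)    # faster, some rounding error
drift = np.abs(H64 - H32.astype(np.float64)).max()
```

At toy scale the drift is tiny; over thousands of long-context samples and large hidden dimensions the fp32 rounding error grows, which is why the accuracy loss is only "probable", not guaranteed to matter.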
Hi,
I was going to upload a QuIP 2-bit version of a Llama 2 model, which I took as a chance to experiment with this method:
https://huggingface.co/Yhyu13/Xwin-Math-7B-V1.0-QUIP-2bit
But as I mentioned in its readme, the Hessian pass took quite long, about 6 hours, and the final PPL for QuIP 2-bit is not ideal. The model's performance also degrades noticeably.
The conversion process does not throw any errors, though, and the evaluation process is smooth, too. What could have gone wrong? I might spend some time re-running the whole process as a double check.
Here is my script:
#!/bin/bash
eval "$(conda shell.bash hook)"
conda activate quip
MODEL_NAME=Xwin-Math-7B-V1.0
#MODEL_NAME=ShareGPT4V-7B
BASE_MODEL_DIR=/media/hangyu5/Home/Documents/Hugging-Face/$MODEL_NAME/
SAVE_MODEL_DIR=/media/hangyu5/Home/Documents/Hugging-Face/$MODEL_NAME-QUIP/
TMP_MODEL_DIR=./$MODEL_NAME
BATCH_SIZE=2
CTX_LEN=4096
cd repo/quip-sharp/
if [ ! -d "$TMP_MODEL_DIR" ]; then
mkdir -p $TMP_MODEL_DIR
fi
TRANSFORMERS_VERBOSITY=debug CUDA_VISIBLE_DEVICES=0 python ./hessian_offline_llama.py \
--seed 34 \
--base_model $BASE_MODEL_DIR \
--save_path $TMP_MODEL_DIR/Hessian/ \
--ctx_size $CTX_LEN \
--batch_size $BATCH_SIZE \
| tee $TMP_MODEL_DIR/Hessian-quip.log
TRANSFORMERS_VERBOSITY=debug CUDA_VISIBLE_DEVICES=0 python ./quantize_llama.py \
--seed 34 \
--base_model $BASE_MODEL_DIR \
--hessian_path $TMP_MODEL_DIR/Hessian \
--save_path $TMP_MODEL_DIR/Ckpt \
--ctx_size $CTX_LEN \
--batch_size $BATCH_SIZE \
--codebook E8P12 \
--scale_override 0.9 \
| tee $TMP_MODEL_DIR/Ckpt-quip.log
TRANSFORMERS_VERBOSITY=debug CUDA_VISIBLE_DEVICES=0 python hfize_llama.py \
--quantized_path $TMP_MODEL_DIR/Ckpt \
--hf_output_path $SAVE_MODEL_DIR \
| tee $TMP_MODEL_DIR/HFize-quip.log
TRANSFORMERS_VERBOSITY=debug CUDA_VISIBLE_DEVICES=0 python eval_ppl.py \
--seed 34 \
--hf_path $BASE_MODEL_DIR \
--seqlen $CTX_LEN \
| tee $TMP_MODEL_DIR/PPl-quip.log
cd ../../
Also, if this is a math model, you should probably change the Hessian devset from RedPajama (natural language) to a math dataset.
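For concreteness, here is a sketch of how a calibration devset is typically assembled from a tokenized corpus: chunk it into fixed-length sequences and sample `devset_size` of them. The function and corpus here are illustrative, not quip-sharp's actual API; to target a math model, `token_ids` would come from tokenizing a math corpus instead of RedPajama:

```python
import numpy as np

def build_devset(token_ids, ctx_size, devset_size, seed=34):
    """Chunk a tokenized corpus into up to devset_size sequences
    of ctx_size tokens each, sampled without replacement."""
    n_chunks = len(token_ids) // ctx_size
    # Drop the trailing remainder and reshape into (n_chunks, ctx_size)
    chunks = np.array(token_ids[: n_chunks * ctx_size]).reshape(n_chunks, ctx_size)
    rng = np.random.default_rng(seed)
    idx = rng.choice(n_chunks, size=min(devset_size, n_chunks), replace=False)
    return chunks[idx]

corpus = list(range(100_000))  # stand-in for tokenized math text
devset = build_devset(corpus, ctx_size=4096, devset_size=16)
```

The key point from the comments above is the `devset_size` argument: the default (around 256) is too small for a good Hessian estimate, and the released models used at least 4096 sequences.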
Closed
@yhyu13 do you need any more help on this issue? If not, feel free to close it.