Re: [ESPResSo-users] Problem of checkpointing with mpi
From: ricky <address@hidden>
Subject: Re: [ESPResSo-users] Problem of checkpointing with mpi
Date: Fri, 10 May 2019 23:12:10 +0800

Dear JN,
Thanks for your reply! I will test it again soon.
As for DipolarDirectSum: Rudolf said that the dipolar direct sum on the CPU is not parallelized, so it can only be used on a single core.
Regards!
Ricky

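Because the CPU dipolar direct sum only runs on a single core, a script that may be launched either serially or through mpirun could check the number of ranks before setting up that solver and fail with a clear message. A minimal sketch, assuming the job is started with Open MPI's mpirun (the error output in this thread is in Open MPI's format), which exports OMPI_COMM_WORLD_SIZE to every rank; other MPI launchers use different variables. The function names are hypothetical and not part of ESPResSo.

```python
import os


def mpi_world_size(environ=None):
    """Return the number of MPI ranks this process belongs to.

    Assumption: Open MPI's mpirun exports OMPI_COMM_WORLD_SIZE to each rank.
    A plain serial run (./pypresso script.py) sets nothing, so default to 1.
    """
    if environ is None:
        environ = os.environ
    return int(environ.get("OMPI_COMM_WORLD_SIZE", "1"))


def check_direct_sum_allowed(n_ranks):
    """Raise early if the single-core CPU dipolar direct sum would run in parallel."""
    if n_ranks > 1:
        raise RuntimeError(
            f"the CPU dipolar direct sum is not parallelized, but {n_ranks} "
            "MPI ranks were started; run this script on a single core"
        )
```

In a real script, the call `check_direct_sum_allowed(mpi_world_size())` would sit just before the solver is added to `system.actors`, so the failure is reported before any simulation state is built.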
Dear Ricky,
The scripts you attached contained many commented-out lines and file
operations that are not necessary to reproduce an MPI error. In any
case, I ran your scripts with ESPResSo 4.0.2 on Ubuntu 18 with 8 MPI
processes in a [2, 2, 2] node grid and could not reproduce your error
message. Instead, I got the "Could not activate magnetostatics method
DipolarDirectSumCpu" exception you mentioned in your email of May 7th.
Did you solve the MPI error?
Best regards,
JN
On 5/6/19 4:24 PM, 赵睿祺 wrote:
> Dear all,
>
> I have a problem with checkpointing under MPI. I want to register the
> system that I set up in part1.py and load it in part2.py.
>
> When I run the scripts without MPI, everything works. The command I use is
>
> ./pypresso <SCRIPT>
>
> However, when I run the same scripts with MPI,
>
> mpirun -n 32 ./pypresso <SCRIPT>
>
> the following error occurs:
>
> _______________________________________________________________________________
> terminate called after throwing an instance of 'std::out_of_range'
>   what():  _Map_base::at
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22768] *** Process received signal ***
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22768] Signal: Aborted (6)
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22768] Signal code: (-6)
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22768] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fe4d0054390]
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22768] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7fe4cfcae428]
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22768] [ 2] terminate called after throwing an instance of 'std::out_of_range'
> ……
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22772] *** End of error message ***
> x4bec4b]
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22746] *** End of error message ***
> x4bec4b]
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22747] *** End of error message ***
> x4bec4b]
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22743] *** End of error message ***
> x4bec4b]
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22744] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 4 with PID 22746 on node
> zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11 exited on signal 6 (Aborted).
>
> How can I solve this problem? Thanks so much for your kind help!
>
> Best regards!
> Ricky Zhao
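For readers unfamiliar with the workflow described in the original message (register the system in part1.py, then reload it in part2.py), the following is a toy, stdlib-only sketch of the register/save/load idea. ESPResSo ships its own module for this, espressomd.checkpointing; the MiniCheckpoint class below is hypothetical, is not that module's implementation, and is not a diagnosis of the std::out_of_range error above.

```python
import os
import pickle
import tempfile


class MiniCheckpoint:
    """Hypothetical register/save/load helper (not ESPResSo's class).

    Objects are registered by name, written together to one pickle file,
    and restored into a namespace dict on load, so a second script can
    continue from the state the first one saved.
    """

    def __init__(self, path):
        self.path = path
        self.names = []

    def register(self, *names):
        # Remember which entries of the namespace should be saved.
        self.names.extend(names)

    def save(self, namespace):
        state = {name: namespace[name] for name in self.names}
        with open(self.path, "wb") as fh:
            pickle.dump(state, fh)

    def load(self, namespace):
        # Inject the saved objects back into the given namespace.
        with open(self.path, "rb") as fh:
            namespace.update(pickle.load(fh))


# "part1": build some state and save it.
ckpt_file = os.path.join(tempfile.mkdtemp(), "demo.pkl")
part1 = {"box_l": [10.0, 10.0, 10.0], "positions": [[1.0, 2.0, 3.0]]}
writer = MiniCheckpoint(ckpt_file)
writer.register("box_l", "positions")
writer.save(part1)

# "part2": a fresh namespace restored from the file.
part2 = {}
MiniCheckpoint(ckpt_file).load(part2)
```

Under mpirun every rank executes the same script, so any file-based scheme like this one also has to decide which rank(s) write and read the file; the sketch deliberately leaves that out.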