espressomd-users

Re: [ESPResSo-users] Problem of checkpointing with mpi


From:
Subject: Re: [ESPResSo-users] Problem of checkpointing with mpi
Date: Fri, 10 May 2019 23:12:10 +0800

Dear JN,

Thanks for your reply!

I will test it again soon.

As for DipolarDirectSum, Rudolf said that the dipolar direct sum on the CPU is not parallelized, so it can only be used on a single core.

Regards!
Ricky



ricky
address@hidden


On 05/10/2019 22:05, Jean-Noël Grad wrote:
Dear Ricky,

The scripts you attached contained many commented-out lines and file
operations that are not necessary to reproduce an MPI error. In any
case, I ran your scripts with ESPResSo 4.0.2 on Ubuntu 18 using 8 MPI
processes in a [2, 2, 2] configuration and couldn't reproduce your
error message. Instead, I got the "Could not activate magnetostatics
method DipolarDirectSumCpu" exception you mentioned in your email of
May 7th. Did you solve the MPI error?

Best regards,
JN
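The "[2, 2, 2] configuration" above is the 3D grid of MPI processes over which the simulation box is split: 2 × 2 × 2 = 8 processes. The minimal Python sketch below shows how such a grid can be derived from the process count. It assumes that the default grid is chosen the way MPI_Dims_create describes it (sides as close to each other as possible, in non-increasing order); that assumption is not stated in this thread, and `balanced_node_grid` is a hypothetical helper, not an ESPResSo or MPI function.

```python
from typing import List, Optional, Tuple


def balanced_node_grid(n_ranks: int) -> List[int]:
    """Split n_ranks MPI processes into a 3D grid [a, b, c].

    Hypothetical helper: it mimics the stated intent of MPI_Dims_create
    (sides as close to each other as possible, non-increasing order) by
    brute force. It is not ESPResSo's or any MPI library's actual code.
    """
    if n_ranks < 1:
        raise ValueError("n_ranks must be a positive integer")
    best: Optional[Tuple[Tuple[int, int], List[int]]] = None
    c = 1
    while c * c * c <= n_ranks:  # c is the smallest side, so c**3 <= n
        if n_ranks % c == 0:
            rest = n_ranks // c  # rest == a * b
            b = c
            while b * b <= rest:  # b <= a  <=>  b**2 <= a * b
                if rest % b == 0:
                    a = rest // b
                    # Prefer the smallest spread between the largest and
                    # smallest side; break ties with the smaller largest side.
                    key = (a - c, a)
                    if best is None or key < best[0]:
                        best = (key, [a, b, c])
                b += 1
        c += 1
    assert best is not None  # c = b = 1 always yields a candidate
    return best[1]


if __name__ == "__main__":
    for n in (8, 32):
        print(n, balanced_node_grid(n))  # 8 [2, 2, 2], then 32 [4, 4, 2]
```

With the 32 processes used in the original report, this sketch gives [4, 4, 2]; an actual MPI library is free to return a different, equally balanced factorization.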

On 5/6/19 4:24 PM, 赵睿祺 wrote:
> Dear all,
>
> I have a problem with checkpointing under MPI. What I want to do is
> register the system that I set up in part1.py and load it in
> part2.py.
>
> When I run the scripts without MPI, they work well. The command I use is
>
> ./pypresso <SCRIPT>
>
> However, when I run the command with MPI,
>
> mpirun -n 32 ./pypresso <SCRIPT>
>
> the following error occurs:
>
> _______________________________________________________________________________
>
> terminate called after throwing an instance of 'std::out_of_range'
>   what():  _Map_base::at
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22768] *** Process received signal ***
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22768] Signal: Aborted (6)
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22768] Signal code: (-6)
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22768] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fe4d0054390]
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22768] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7fe4cfcae428]
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22768] [ 2]
> terminate called after throwing an instance of 'std::out_of_range'
> ……
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22772] *** End of error message ***
> x4bec4b]
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22746] *** End of error message ***
> x4bec4b]
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22747] *** End of error message ***
> x4bec4b]
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22743] *** End of error message ***
> x4bec4b]
> [zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11:22744] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 4 with PID 22746 on node zhrq-X10DRi-Invalid-entry-length-16-Fixed-up-to-11 exited on signal 6 (Aborted).
>
> How can I solve this problem? Thank you very much for your help!
>
> Best regards!
>
> Ricky Zhao
>

